Logic & Proof: Propositions, Quantifiers, Truth Tables

FURTHER MATHS T
CHAPTER 1
1. Logic & Proof
1.1 Logic (propositions, quantifiers)

A proposition is a declarative sentence that is either true or false, but not both. Normally, we will use the letters p, q and r to represent propositions.

The negation of a proposition is its opposite. For example, if p: 2 + 3 = 5, then its negation is 2 + 3 ≠ 5. A negation of p can be represented by ~p, p̅ or ¬p. It is recommended that you stick to one, so I will be using ~p throughout this post.

Given 2 propositions p and q,

A conjunction has the meaning of ‘p and q’. Symbolically, it can be written as p∧q.

A disjunction has the meaning of ‘p or q’. Symbolically, it can be written as p∨q. There are some other connectives which we don’t normally use, like the ‘exclusive or’ (p⊕q), ‘not and’ (nand, p | q) and ‘nor’ (p ↓ q).

An implication is a conditional statement, where p is the antecedent (hypothesis or premise) while q is the consequence (or conclusion). It has the meaning ‘if p, then q’. Symbolically, it can be written as p→q. In English, there are many ways to say this other than ‘if p, then q’. Examples are
* if p, q.
* q unless not p.
* p implies q.
* p is sufficient for q.
* p only if q.
* q when / if p.
* q whenever p.
* a necessary condition of p is q.
* a sufficient condition of q is p.
* q is necessary for p.
* q follows from p.

A bi-implication is a bi-conditional statement, which says that both the implication and its converse hold. It has the meaning ‘p if and only if q’. Symbolically, it can be written as p↔q. Other ways of saying it are
* p is necessary and sufficient for q.
* if p then q, and conversely.
* p iff q.

Now using a conditional statement p → q,
* The converse of the statement is q → p.
* The inverse of the statement is ~p → ~q.
* The contrapositive of the statement is ~q → ~p.

To further elaborate the meanings of all the stuff above, I’ll continue by introducing truth tables.

TRUTH TABLES
A truth table is a table which states the truth values of various statements.
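A truth table can also be generated mechanically. The following is only a minimal Python sketch: the helper names `implies`, `tf` and `truth_rows` are made up for illustration, and the column labels are plain-ASCII stand-ins for ~, ∧, ∨, → and ↔.

```python
from itertools import product

def implies(p, q):
    # "if p, then q" is false only when p is true and q is false.
    return (not p) or q

def tf(value):
    # Render a Boolean using the T/F convention of truth tables.
    return "T" if value else "F"

def truth_rows():
    # One row per combination of truth values of p and q.
    rows = []
    for p, q in product([True, False], repeat=2):
        rows.append([
            tf(p), tf(q),
            tf(not p),          # negation ~p
            tf(p and q),        # conjunction
            tf(p or q),         # disjunction
            tf(implies(p, q)),  # implication
            tf(p == q),         # bi-implication
        ])
    return rows

header = ["p", "q", "~p", "and", "or", "->", "<->"]
rows = truth_rows()
print("  ".join(f"{h:<3}" for h in header))
for row in rows:
    print("  ".join(f"{cell:<3}" for cell in row))
```

Each printed row corresponds to one assignment of truth values to p and q, in the order TT, TF, FT, FF.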
Here, ‘T’ means ‘true’, while ‘F’ means ‘false’. So the truth table for negation is

p   ~p
T   F
F   T

which means that whenever p is true, ~p is false, and whenever p is false, ~p is true. Let’s go on for conjunctions, disjunctions, implications and bi-implications:

p   q   p∧q   p∨q   p→q   p↔q
T   T   T     T     T     T
T   F   F     T     F     F
F   T   F     T     T     F
F   F   F     F     T     T

Did you notice something?
* ‘p and q’ is true only if both p and q are true.
* ‘p or q’ is true if at least one of p or q is true.
* p ↔ q is true if both p and q are true, or both p and q are false.
* p → q is a little tricky. It is only false if p is true and q is false.

I’ll illustrate this a little. Let’s say Pakatan Rakyat, the opposition party of Malaysia, gives a conditional statement: “If we win the next elections, we will immediately reduce the price of petrol by 20%.” According to their statement, there are 2 situations.

Situation 1: They win the elections. If they really do reduce the petrol price, then the statement is true. But if they break their promise and the petrol price isn’t reduced, then the statement is false.

Situation 2: They lose the elections. Since they didn’t promise anything about what they would do if they lost, whatever happens, be it the price of petrol increasing, staying the same or reducing, doesn’t make the statement false. So in either case, the statement is true.

Now we’ll go on to see the truth tables for the converse, inverse and contrapositive:

p   q   p→q   q→p   ~p→~q   ~q→~p
T   T   T     T     T       T
T   F   F     T     T       F
F   T   T     F     F       T
F   F   T     T     T       T

Did you see something? The converse is equivalent to the inverse! And besides, the statement itself is equivalent to its contrapositive. This is one very important piece of information for you.

PRECEDENCE
In problem solving, you can combine all the logical operators to make a very complicated statement. For example, you can have

[ (~r ∧ ~p) → q ] ∧ [ q → (~r ∨ ~p) ]

In order to understand the statement, you need a precedence of logical operators, which tells you which symbol is applied first, which comes second, and so on.
The precedence of logical operators, from first to last, is as follows: ~, ∧, ∨, →, ↔.

If 2 statements have the same truth values in every case, we say that the statements are logically equivalent. If p and q are logically equivalent, we write p ≡ q. Note that it is an equal sign with 3 strokes instead of 2.

A tautology is a compound proposition that is always true, no matter what the truth values of the propositions that occur in it. For example, ~p → (p → q) is a tautology. You can check by looking at its truth table below:

p   q   ~p   p→q   ~p→(p→q)
T   T   F    T     T
T   F   F    F     T
F   T   T    T     T
F   F   T    T     T

Opposite to a tautology is a contradiction, a compound proposition that is always false. A contingency is neither a tautology nor a contradiction.

Here are some common laws of logic and identities that you should memorize:

[Table: common laws of logic, including the True-False Laws]
[Table: some laws for implications]
[Table: some laws for bi-conditional statements]

PREDICATION & QUANTIFICATION
Sometimes, we can use predicates to represent a logic statement which depends on an unknown, or various unknowns. A predicate is normally denoted by P(x), or any capital letter followed by a bracket with an unknown in it. For example, C(x) may represent “x is a comedian” while F(x) may represent “x is funny”.

A quantifier is used to generalise or specialise a particular predicate, and is placed in front of it. There are 2 kinds of quantifiers:

1. The universal quantifier, denoted by ∀xP(x), which means ‘for all x, P(x)’. It could also mean ‘for every’, ‘all of’, ‘for each’, ‘given any’, ‘for arbitrary’ or ‘for any’. In terms of P(x), over a domain x₁, x₂, …, xₙ it can be represented by the logical statement

P(x₁) ∧ P(x₂) ∧ … ∧ P(xₙ)

which is the conjunction of P over every element of the domain. Using the example above, if x ranges over ‘people’, then ∀xC(x) means ‘everyone is a comedian’.

2. The existential quantifier, denoted by ∃xP(x), which means ‘for some x, P(x)’. It could also mean ‘at least one’, ‘there is a…’ or ‘there exists’. In terms of P(x), it can be represented by the logical statement

P(x₁) ∨ P(x₂) ∨ … ∨ P(xₙ)

which is the disjunction of P over every element of the domain.
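Over a finite domain, the two quantifiers described above behave exactly like a big ‘and’ and a big ‘or’. Here is a small Python sketch of that idea, using the built-ins `all` and `any`; the list of people and the set of comedians are invented purely for illustration.

```python
# Hypothetical finite domain of 'people' (names are made up).
people = ["Ali", "Bala", "Chong"]

# Hypothetical data: which of them are comedians.
comedians = {"Ali", "Chong"}

def C(x):
    # The predicate C(x): "x is a comedian".
    return x in comedians

# Universal quantifier: conjunction of C(x) over the whole domain.
for_all = all(C(x) for x in people)

# Existential quantifier: disjunction of C(x) over the whole domain.
there_exists = any(C(x) for x in people)

print("for all x, C(x):", for_all)
print("for some x, C(x):", there_exists)
```

With this made-up data, one non-comedian is enough to make the universal statement false, while one comedian is enough to make the existential statement true.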
Again, using the example above, ∃xC(x) means ‘some people are comedians’.

Sometimes, you might also see the expression ∃!xC(x). Instead of ‘for some x’, this means ‘there is exactly one x’. We call it the uniqueness quantifier.

We use these quantifiers in just the same way we used those p’s and q’s earlier on: you can add the negation sign (~) or the conditional arrows (↔ and →) to them.

So what are the truth values of these quantifiers?

The statement ∀xP(x) is TRUE if P(x) is true for all x, where x belongs to a particular domain (people, animals, students, etc.). It is FALSE if we can find an x for which P(x) is false. Using the same example again, ∀xC(x) is true if every human being is a comedian, but is false if you can find one person (yes, you only need one person) who isn’t a comedian.

The statement ∃xP(x) is TRUE if there exists at least one x for which P(x) is true, and is FALSE if P(x) is false for every x. Using the same example, ∃xC(x) is true if there is at least one human being on earth who is a comedian, but is false if you can’t find a single human being who is a comedian. Simple?

Let me give you some common rules for quantifiers:

∀x(P(x) ∧ Q(x)) ≡ ∀xP(x) ∧ ∀xQ(x)
∃x(P(x) ∨ Q(x)) ≡ ∃xP(x) ∨ ∃xQ(x)

This one tells you how you can bring the quantifier into the brackets. Beware though, you can’t bring a universal quantifier in if you use a disjunction, and the same applies to the existential quantifier and conjunctions.

The negations of quantifiers:

~∀xP(x) ≡ ∃x~P(x)
~∃xP(x) ≡ ∀x~P(x)

This is quite straightforward.

A nested quantifier is formed when you use 2 or more quantifiers on 1 predicate. Examples are

∀x∃yP(x,y) and ∃x∀yP(x,y)

Notice that the two mean different things. The first one says ‘for all x, there exists a y such that P(x,y) is true’, while the second one says ‘there exists an x such that P(x,y) is true for all y’. Let me give you a detailed example:

Let P(x,y) be the statement “x has sent an SMS to y”, where the domain of x and y is ‘students’. We can see that
1. ∃x∃yP(x,y) means “There is some student who sent an SMS to some student.”
2. ∃x∀yP(x,y) means “There is a student who sent an SMS to every student.”
3. ∀x∃yP(x,y) means “Every student sent an SMS to at least one student.”
4. ∃y∀xP(x,y) means “There is a student who received an SMS from every student.”
5. ∀y∃xP(x,y) means “Every student has been sent an SMS by at least one student.”
6. ∀x∀yP(x,y) means “All students have sent an SMS to all students.”

Notice that ∃x∃y and ∃y∃x do mean the same thing, and this applies to ∀x∀y and ∀y∀x too, but not to a mixture of both.

Now the problem is: how do you know the truth values for nested quantifiers? I’ll show you in a table below:

[Table: truth values of nested quantifiers]

You can actually work it out yourself by using the negation rules stated above. It’s simple: when a negation sign passes through a universal quantifier, the quantifier turns into an existential quantifier, and vice versa.

Another important remark is this: “if ∃x∀yP(x,y) is true, then ∀y∃xP(x,y) is true, but not conversely.”

That’s all for logic. You should practice more on translating sentences into logical statements, because it can be very confusing sometimes. Also, practice how to prove the equivalence of logical statements using the given laws. For complicated equivalences (more than 2 propositions), you could use a truth table. It will be faster. We’ll start doing proofs in the next section.

1.2 Proof (direct, indirect, induction)

A theorem is a statement that can be shown to be true. A proof is a valid argument that establishes the truth of a theorem. In this section, I will be showing you 3 main kinds of proof (with tonnes of sub-proofs).

DIRECT PROOF
This proof is very straightforward. A direct proof of a conditional statement p→q starts by assuming that p is true, then shows that q must be true. Basically, you take something as true and show that some other thing follows from it. Let me give you an example:

Show that the sum of 2 odd integers is even.

Did you notice that this is actually a conditional statement?
We let p be ‘m & n are odd integers’ and q be ‘the sum of m & n is even’. Putting p→q, we have ‘If m & n are odd, then their sum is even’. So the proof is:

Let m = 2j + 1 and n = 2k + 1, where j and k are integers. Here we can see that m & n are odd.
Then m + n = (2j + 1) + (2k + 1) = 2(j + k + 1), which is even.
Therefore, the statement is proven valid.

Try to get used to the definitions of odd and even integers, being 2k + 1 and 2k respectively, where k is an arbitrary integer (arbitrary means ‘anything’). When you are asked to prove odd and even integer stuff, it will usually be a direct proof: just substitute these definitions in, and you will find the answer. Another case is in proving theorems related to perfect squares, where you use the definition n = k². One more case is in proving things about rational and irrational numbers, where you define a rational number as n = j/k, with j and k integers and k ≠ 0. All these definitions will help you a lot, and you may use them in the later chapters, like Number Theory…

INDIRECT PROOF
Literally, an indirect proof is a proof that is not direct, i.e. you don’t prove p→q straight. There are many kinds of indirect proofs:

1. Proof by Contraposition
This proof proves the statement p→q using its contrapositive, which is ~q→~p.

Example: Show that if x + y ≥ 2 (where x & y ∈ R), then x ≥ 1 or y ≥ 1.

Looking at the question, you know you won’t get anywhere if you try to manipulate the first statement, x + y ≥ 2. So what you do is assume that ~q is true, which means x < 1 and y < 1. Remember the fact that the contrapositive of a conditional statement has the same truth value as the statement itself! So here we have

Suppose x < 1 and y < 1.
Then x + y < 1 + 1, so x + y < 2, which is the negation of x + y ≥ 2.
Therefore, the statement is proven valid.

2. Proof by Contradiction (Reductio ad absurdum)
What this proof does is to first assume that p is true and ~q is true (that is, p ∧ ~q). You will eventually work from ~q and find out that ~p is also true.
Now you have p ∧ ~p, which is a contradiction (a statement which is always false)! From here, you conclude that p→q.

Example: Prove that the sum of an irrational number and a rational number is irrational.

Putting it in a p→q form, we rewrite the statement as ‘if m & n are irrational and rational respectively, then their sum is not rational’. Now let’s try solving it:

Suppose r is rational and i is irrational, and r + i = s is rational. [p is true, ~q is true]
Then s could be written as p/q for some integers p and q which have no common factors. r could also be written in the form t/u, where t and u are integers with no common factors too.
Using some algebra, i = s - r = p/q - t/u = (pu - qt)/(qu), which is rational. This is a contradiction, since i is irrational.
So it follows that the sum of an irrational number and a rational number is irrational.

3. Vacuous Proof
All you need to do in this proof is to show that p in the p→q statement is false. (Note that when p is false, the statement will definitely be true!) This is used to prove statements like ‘if 2 + 2 = 5, then it will snow in Malaysia’. Won’t come out in STPM, I assume…

4. Trivial Proof
This proof is similar to the above, but here you must show that q is true in the p→q statement. (Once again recall, when q is true, whether p is true or false, the statement still holds!) Take the statement ‘if you give me RM1, then the sun will rise in the east’. I don’t need to explain, right?

5. Proof of Equivalence
This proof is for proving statements of the form p↔q, where you need to prove p→q and its converse, q→p. You can use one of the above methods (direct proof, proof by contraposition or contradiction) to solve the p→q and q→p parts. Basically this proof is a combination of proofs, and I don’t think I need to elaborate much on this.

6. Proof by Cases
In this proof, you need to prove something on a case by case basis. For example, Prove that |x| + 2 > 0. I won’t show you the answer.
All you need to do is go case by case: case 1 where x > 0, case 2 where x < 0, and case 3 where x = 0. Work out |x| in each case, then show that the statement is true.

7. Exhaustive Proof
This proof requires you to use up all the possible numbers in the domain, substituting each of them into the statement to show that it is valid. E.g., Prove that n² + 1 ≥ 2ⁿ, where n is an integer with 0 < n < 5. Just use up all the values n = 1, 2, 3, 4 and substitute them into the inequality to prove it. Very straightforward for a small domain of n.

8. Existence Proof
The questions for this kind of proof normally start with something like ‘show that there exists a…’. There are 2 kinds of existence proof: constructive and non-constructive. A constructive one will make you find the exact answer to the question, while the non-constructive approach will prove the statement true even without finding a solution. I’ll show you the examples:

Show that there is an integer that can be written as the sum of cubes of two integers in two different ways.
This is true as 1729 = 10³ + 9³ = 12³ + 1³. [constructive]

Show that the equation x³ + x - 1 = 0 has a solution.
Let P(x) = x³ + x - 1. Then P(0) = -1 and P(1) = 1. Thus (by the intermediate value theorem), the equation P(x) = 0 has a solution which is between 0 and 1. [non-constructive. Interesting isn’t it?]

9. Uniqueness Proof
This proof is an extension of the previous proof. First you show that there is a solution x for the statement P(x). Then you find a value c for which P(c) is true. Lastly, you show that if x ≠ c, then P(x) is false.

Example: Show that if a, b ∈ R with a ≠ 0, there is a unique r ∈ R such that ar - b = 0.
Certainly r = b/a satisfies ar - b = 0. [1st & 2nd part]
Next, suppose s and t both satisfy the equation, so as - b = 0 and at - b = 0. Then as - b = at - b and so as = at. Since a ≠ 0, we have s = t, which means that the solution is unique. [3rd part]

10. Proof by giving a Counterexample
A counterexample is used to disprove something.
This is super easy, for example:

Prove or disprove that the product of 2 irrational numbers is irrational.
Using a counterexample, √2 × √2 = 2, which is rational. So the statement is invalid.

All you need to disprove something is to find a counterexample. That’s it.

MATHEMATICAL INDUCTION
Mathematical induction is the most common proof that you will use and see in your exams. There is a lot of information on this proof in A-level books, so I don’t need to give you too many examples.

What mathematical induction does is prove that an equation is true for a particular value, which we call the basis. Then we go on to prove that the equation is valid for any value greater than that basis. To sum up, mathematical induction involves 2 steps:

1. The basis step: you prove that the equation is true for n = 0, n = 1 or whatever initial value they give you in the question.
2. The inductive step: you now assume that the equation is true for n = k. Then you write out the equation for n = k + 1, and show that the relationship holds too.

From here you can conclude that by mathematical induction, the equation is true in that domain. Let me show you an example:

Use proof by induction to prove that
[Equation: the formula to be proved]
[Equation: LHS and RHS evaluated at n = 1]
Since LHS = RHS, we see that the formula is true for n = 1.
We assume that the formula is true for n = k, then we have
[Equation: the formula with n = k]
and letting n = k + 1, we have
[Equation: working for n = k + 1]
Hence the formula is true for all n ≥ 1. [proven]

I will show you some tips on how to solve different kinds of questions. Remember to do A LOT of exercises on Mathematical Induction!

* Questions involving summations
Try to make use of the fact that Σ(r = 1 to k + 1) uᵣ = Σ(r = 1 to k) uᵣ + uₖ₊₁.

* Questions involving matrices
Try to make use of the fact that A^(k+1) = A^k × A, where A is a matrix.

* Questions involving differentials
Try to make use of the fact that d^(k+1)y/dx^(k+1) = d/dx (d^k y/dx^k).

* Questions involving recurring terms
Try to make use of the fact that if uₙ + uₙ₊₁ is divisible by a, then either uₙ & uₙ₊₁ are both divisible by a, or both not divisible by a.

CHAPTER 2
2. Complex Numbers
2.1 Polar Form (geometrical effects, exponential form)

The basics of this chapter on complex numbers should have been covered in Mathematics T. So while I explain stuff over here, I assume that you already know how to do the following with complex numbers:
a. understand what the real part, imaginary part, and conjugate of a complex number are
b. find the modulus and argument of a complex number
c. represent complex numbers geometrically by means of an Argand diagram
d. use the condition for the equality of two complex numbers
e. carry out elementary operations on complex numbers expressed in Cartesian form

So in this post, I’ll be introducing another 2 forms in which you can represent a complex number. This will be a short one.

You still remember that the modulus of a complex number z = a + bi is denoted by |z|, which has the value √(a² + b²). And then, you also remember that the argument of a complex number, denoted by arg z, has the value of tan⁻¹(b / a), taking into account which quadrant of the Argand diagram the complex number is in. So with the modulus and argument, you can represent a complex number in polar form, or (r, θ) form, which is

|z| (cos (arg z) + i sin (arg z))

For example, the modulus of the complex number -3 + 3i is 3√2 and its argument is 3π/4, so its polar form is 3√2 [ cos (3π/4) + i sin (3π/4) ]. Using this form will give us a lot of advantages, especially when we learn de Moivre’s theorem in the next post.

Another way of expressing complex numbers using the argument and modulus is the exponential form. The term cos θ + i sin θ can be written as e^(iθ), where e is the base of the natural logarithm. So by multiplying by the modulus in front, and substituting the argument for θ, you get another form of complex numbers! Using the example above, the exponential form of -3 + 3i is 3√2 e^(i3π/4). This is something like a compressed form of complex numbers.

By now you will be puzzled as to how cosine and sine are related to exponents.
Well, I’ll show the derivation, but only in the chapter on Power Series. As for now, just take it as it is. (Extra info: go google for “Euler’s Equation”. You will be surprised by the equation e^(iπ) + 1 = 0 !)

Now that we know 2 extra forms of complex numbers, we want to know how multiplication and division of complex numbers can be done easily in these 2 forms. The general rule is this:

When you multiply 2 complex numbers, you multiply their moduli and add up their arguments.
But when you divide 2 complex numbers, you divide their moduli and subtract their arguments.

Let’s give it a try. Suppose we multiply 2 complex numbers, -3 + 3i and -1 - i. You will have:

3√2 [ cos (3π/4) + i sin (3π/4) ] × √2 [ cos (3π/4) - i sin (3π/4) ] = 6

Did you catch my working? Try using some trigonometry formulas, and you will eventually get the answer. Take note that 3π/4 - 3π/4 = 0! There’s also another way of going about it. Let’s try using the exponential form:

3√2 e^(i3π/4) × √2 e^(-i3π/4) = 6

This is multiplication. It will be the same for division, so go ahead and give it a try. There’s no shortcut for addition and subtraction though, so it’s better to use the Cartesian form for them.

Now we will learn the geometrical effects of basic operations on complex numbers. We will be using the Argand diagram (which I assume you already know, along with its axes).

1. Conjugation
Basically what conjugation does is reflect a complex number across the real axis. Try visualizing these complex numbers as vectors, and you will understand more.

2. Addition & Subtraction
[Figure: Argand diagram showing the addition of 1 + i and 1 - i]
The diagram above shows the addition of 1 + i and 1 - i. As you can see, it is merely a vector addition: you add up the two complex numbers as if they were vectors. You should be able to deduce the result for subtraction too, just by using the fact that the added minus sign switches the direction of the arrow. The answer will be 2i, lying on the imaginary axis. Quite straightforward right?
3. Multiplication & Division
[Figure: Argand diagram showing 1 + i, 1 - 2i and their product 3 - i]
Possibly you couldn’t catch what the diagram meant. The green arrow, 3 - i, is the result of the multiplication of 1 + i and 1 - 2i. What happens here is exactly what we learnt above: the moduli multiply (the lengths do not add up, but multiply as in √2 × √5 = √10) and the arguments add up (0.25π rad - 0.35π rad = -0.1π rad). You can try expressing the complex numbers in polar form, which will help you to see it more clearly.

This is really simple stuff compared with what’s coming: your head will start to spin when the loci of complex numbers come in.

2.2 de Moivre’s Theorem

de Moivre’s theorem states that: for all integer values of n,

(cos θ + i sin θ)ⁿ = cos nθ + i sin nθ.

This is a very important relationship we need to know about complex numbers. Before we start using it, let’s try to prove it first.

PROOF
When n = 1,
(cos θ + i sin θ)¹ = cos θ + i sin θ
and so the theorem holds for n = 1.

Now, we assume that the theorem is true for n = k, so
(cos θ + i sin θ)^k = cos kθ + i sin kθ
If the equation is true for n = k, we want to show that it is also true for n = k + 1:
(cos θ + i sin θ)^(k+1) = (cos kθ + i sin kθ)(cos θ + i sin θ)
= cos kθ cos θ - sin kθ sin θ + i (sin kθ cos θ + cos kθ sin θ)
= cos (k + 1)θ + i sin (k + 1)θ
which is the theorem for n = k + 1.
.·. by mathematical induction, de Moivre’s Theorem is true for all integers n > 0.

Let’s try proving it for negative integers too. Let n = -p, where p is a positive integer. Then
(cos θ + i sin θ)ⁿ = (cos θ + i sin θ)^(-p) = 1 / (cos pθ + i sin pθ) = cos pθ - i sin pθ
and since p = -n,
cos (-n)θ - i sin (-n)θ = cos nθ + i sin nθ.
.·. once again, this theorem is proven.

So we see that de Moivre’s Theorem is true for all values of n, where n is any integer. We can also show that it is true for fractions, but this is beyond what we can learn. However, one thing to note is that if n is not an integer, cos nθ + i sin nθ is only one of the possible values. I will elaborate more in the next post, in the section on roots of unity.

The most important thing for this section is that you need to remember how to prove this theorem, and know how to use it.
You will be able to simplify a lot of complex number equations by changing the exponents into just multiplication of numbers. Another thing you should note is the relations for negative angles:

cos (-θ) = cos θ
sin (-θ) = - sin θ

You will be dealing with these a lot. It is good to memorize them, and be careful not to make mistakes.

APPLICATIONS
1. I’ll show you an example of how de Moivre’s Theorem helps you in proving trigonometric identities.

Express sin 3A in terms of sin A.

sin 3A = Im (cos 3A + i sin 3A) [here, “Im” means imaginary, while “Re” means real.]
= Im (cos A + i sin A)³
= Im ( cos³ A + 3i cos² A sin A - 3 cos A sin² A - i sin³ A )
= 3 cos² A sin A - sin³ A
= 3 (1 - sin² A) sin A - sin³ A
= 3 sin A - 4 sin³ A

Okay, I need to explain this. Here, we are trying to express the term sin 3A in terms of a complex number, which can be dealt with using de Moivre’s Theorem. So sin 3A is actually the imaginary part of cos 3A + i sin 3A, and we put the “Im” there because sin 3A belongs to the imaginary part (this means that if our question was cos 3A, we would have to put “Re” in front of it instead). We evaluate it, and when we remove the “Im” sign, we remove all the real parts (terms without the ‘i’), leaving the imaginary part without the ‘i’ in it. Try using this method to solve cos 3A, and you will understand more by then.

2. If we set z = cos θ + i sin θ, then
zⁿ = cos nθ + i sin nθ and 1/zⁿ = cos nθ - i sin nθ.
From here, you can further deduce that
zⁿ + 1/zⁿ = 2 cos nθ and zⁿ - 1/zⁿ = 2i sin nθ.
With all these, we can do the above example backwards.

Express sin³ A in terms of sines of multiple angles.

With z = cos A + i sin A,
(2i sin A)³ = (z - 1/z)³ = (z³ - 1/z³) - 3 (z - 1/z)
-8i sin³ A = 2i sin 3A - 6i sin A
sin³ A = (3 sin A - sin 3A) / 4

2.3 Equations (roots of unity, loci, transformation)

ROOTS OF UNITY
When you take the square root of 1, you will get two answers: 1 and -1. But what happens when you take the cube root of 1? All this while, you have studied that there’s only 1 cube root, which is 1 itself. But the actual fact is that it has 3 roots: -1/2 + (√3/2) i, -1/2 - (√3/2) i, and 1. From here, you start guessing that probably the 4th root of 1 will yield 4 roots, the 5th root of 1 will yield 5 roots and so on! You are correct.
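As a rough numerical check of that guess, here is a small Python sketch using the standard `cmath` module. The function name `roots_of_unity` is made up for illustration, and it relies on the standard formula e^(2πik/n) for k = 0, 1, …, n - 1 rather than on any particular derivation.

```python
import cmath

def roots_of_unity(n):
    # Each root has modulus 1 and argument 2*pi*k/n, for k = 0, 1, ..., n-1.
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

cube_roots = roots_of_unity(3)
for w in cube_roots:
    # Raising each root to the 3rd power should give 1, up to rounding error.
    error = abs(w**3 - 1)
    print(f"root = {w.real:+.4f} {w.imag:+.4f}i   |w^3 - 1| = {error:.1e}")

# The same helper gives 4 roots for n = 4, 5 roots for n = 5, and so on.
print(len(roots_of_unity(4)), len(roots_of_unity(5)))
```

Floating-point arithmetic is only approximate, so the check compares against a small tolerance instead of testing exact equality.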
So in this section, I will teach you how to find the roots of unity. Here unity means ‘one’, not ‘bersatu padu’ (being united). You will see this term very often in higher level physics.

Let us try finding the cube roots of 1. If you use ∛1, you won’t get anywhere. We need to use de Moivre’s theorem. So expressing 1 in polar form, we get |z| = 1, arg z = 2π (It is actually 0, but I recommend using this instead) and we have 1 = cos 2π + i sin 2π. Let us continue from here:

∛1 = (cos 2π + i sin 2π)^(1/3)

Does it look familiar? Now you might want to try to use de Moivre’s theorem to solve it. But wait! de Moivre’s theorem is true for integers only, and if n is not an integer, it gives only one of the possible solutions. Which means, there are other solutions to be found. Let’s use de Moivre’s theorem first:

(cos 2π + i sin 2π)^(1/3) = cos (2/3)π + i sin (2/3)π

Now I will teach you how to find the remaining roots: you need to add 2nπ to the angle to get another root. You do this all the way until you reach the angle 2π (or, not exceeding 2π). n here denotes the exponent that you use. Here it is 1/3, so you need to add 2π/3 to the angles. This means that we have

cos (2/3)π + i sin (2/3)π, cos (4/3)π + i sin (4/3)π, and cos 2π + i sin 2π = 1

So we have found the 3 cube roots of unity. By the way, it could have been faster to write if we used the exponential form, so ∛1 = e^((2/3)πi), e^((-2/3)πi) and e^(2πi) (which is actually 1).

Did you notice that cos (4/3)π + i sin (4/3)π is actually cos (2/3)π - i sin (2/3)π, and it is actually the conjugate of cos (2/3)π + i sin (2/3)π? So it seems there is a faster way of finding the roots. Keeping in mind that 1 is always a root of itself (and -1 is also a root ONLY if n is even, for example the square root, or the 4th root), you just need to add 2π/3 to the angles up to and not exceeding π, and you already know the rest of the roots, because every root’s conjugate is also a root!

You can try to solve the roots of unity for the 4th root, 5th root, all the way up to the 9th root or beyond. Let me tell you some applications of this:

1. If you were to plot the roots of unity on the Argand diagram, they lie on the circle |z| = 1, and they are equally spaced from each other! Here is the Argand diagram for the cube roots of unity:

[Figure: Argand diagram of the cube roots of unity]

This actually applies to the roots of any number, just that |z| might equal something else.

2. Using this information for roots of unity, you can find all the real and complex roots of any other number. For example, ⁴√16 = ⁴√1 × 2 or ⁶√(-64) = ⁶√1 × 2i. The terms 2 and 2i are one particular root of 16 and -64 respectively. Using de Moivre’s theorem to find the 4th roots and 6th roots of 1, you can just multiply the roots of unity by 2 and 2i respectively to get the required roots. If you remember that the 4th roots of unity are 1, -1, i and -i, you get ⁴√16 = 2, -2, 2i, -2i. Try solving the other one on your own. There are many questions of this kind, for example, solving for z in z⁵ = -8 + 8i, or (z + 2i)² = 4. This needs some practice.

3. Now you will probably face polynomial questions with complex coefficients, like 2ix² + 3x + 3 + 4i = 0. So be ready to solve them.

4. Try adding up all the roots of unity. You will be surprised that for every n > 1, all the nth roots of unity add up to 0! Try answering the following question using this piece of information:

By considering the ninth roots of unity, show that
[Equation: identity to be shown]

Hint: Find all the ninth roots of unity, sum them up (the sum equals 0), and you’ll prove the equation.

Also, remember what you learned in high school Maths: the 2 complex roots of a quadratic equation with real coefficients are conjugates of one another. This means that for whatever nth roots of unity, other than 1 (or -1), the rest of the complex roots are paired up such that one root is sure to be the conjugate of another.

5. For the complex cube roots of unity, the square of one root is actually its own conjugate, which is just the other complex root! So if w is a cube root of 1, then 1 + w + w² might have 2 answers.
If w is the real root, then it will be 3, but if it is either of the complex roots, the answer will be 0. Try verifying this.

COMPLEX INTEGRATION
I hope you have learned the chapter on Integration in Maths T already.

[Equation: the integral under discussion]

You solved the integral by using integration by parts, carrying terms over from one side to the other, and so on. But you could actually solve such an integral by complex integration. Let me show you how:

[Equation: worked example of complex integration]

Do you think this is easier compared to integration by parts? I thought so, just don’t be careless.

LOCI IN THE COMPLEX PLANE
Still remember learning loci in form 3 Mathematics? It’s read as ‘lo-sai’ by the way, not ‘lo-kee’ or ‘lo-chee’. A locus is actually just a representation of an equation in a graph or diagram. In a Cartesian plane, you should be familiar with the equations for circles, straight lines, or ellipses (please study Coordinate Geometry in Maths T before starting this section). Here, the representation of complex loci in the Argand diagram is similar to the one in polar coordinates (which you do not learn in STPM anyhow). Anyway, I’ll start by introducing you to 6 types of common loci. Throughout this section, z is a variable, while z₁ and z₂ are fixed complex numbers, and k, r, c and θ are just constants.

1. |z - z₁| = r
This represents a circle with centre z₁, where r is the radius of the circle. For example, taking |z + 2 + 2i| = 2√2, we have
[Figure: circle with centre -2 - 2i and radius 2√2]

2. arg (z - z₁) = θ
This is a half line, starting from the point z₁, making an angle of θ with the positive real axis, going all the way to infinity. Taking arg (z + 1 + i) = π/4, we have
[Figure: half line from -1 - i at an angle of π/4 to the positive real axis]

3. |z - z₁| = |z - z₂|
This is the perpendicular bisector of the line segment joining the two complex numbers z₁ and z₂. To find the distance between z₁ and z₂, just find |z₁ - z₂|, and to find the mid-point, use (z₁ + z₂)/2. Taking |z - 1 - i| = |z + 2 + 2i|, we have
[Figure: perpendicular bisector of the segment joining 1 + i and -2 - 2i]

4. |z - z₁| = k |z - z₂|
Adding a constant ‘k’ (with k ≠ 1) really changes this locus compared to the one before. This locus turns into a circle instead!
The equation fixes the ratio of the distances from z to the 2 complex numbers z₁ and z₂. Probably this is the hardest one, because you need to find out the centre of the circle before you can draw it. Taking |z - 2| = 3 |z + 2i|, we can’t easily draw it straight away. Using the fact that z = x + iy and using the definition of modulus, you can change the equation into

√[ (x - 2)² + y² ] = 3 √[ x² + (y + 2)² ]

and you can further simplify this into the equation of a circle, which is

(x + 1/4)² + (y + 9/4)² = 9/8

and now we plot the locus onto the Argand diagram, which is
[Figure: circle with centre (-1/4, -9/4) and radius 3√2/4]

5. arg [(z - z₁) / (z - z₂)] = θ
This one is hard to explain in terms of Cartesian coordinates. It is actually an arc of a circle, traced out by the meeting point of 2 lines drawn from the 2 fixed points so that they meet at a fixed angle. Using the equation

arg [(z - 2i) / z] = π/4

the 2 points are (0, 2) and (0, 0) respectively. We draw 2 lines out of these 2 points, such that they intersect at some point making exactly the angle π/4. This is shown below:
[Figure: two lines from (0, 2) and (0, 0) meeting at an angle of π/4]
The locus is not the blue line. It is the arc of a circle traced out by the tip of the arrow shown below.
[Figure: arc traced out by the intersection point]

6. |z - z₁| + |z - z₂| = k
This one plots out an ellipse, where z₁ and z₂ are the foci of the ellipse, and k is the sum of the distances from z to z₁ and from z to z₂. To plot this one, you also need to change the equation into Cartesian coordinates as in the previous example. I leave this to you as an exercise. A general ellipse looks like this:
[Figure: a general ellipse]

Now that you know all the possible types of loci, you need to practice plotting them when given a locus equation. Then, you should also know how to convert them into equations in Cartesian coordinates. Besides, you should also know how to transform loci. I give you a few general rules:
1. w = z² transforms a line into another line, and a circle into another circle.
2. w = (z₁ + z) / (z₂ - z) transforms a circle into a bisector.
3. w = 1/(z₁ - z) transforms a circle into a line.
4. w = kz - c/z transforms a circle into an ellipse, where k and c are constants.
5. w = z̅ also transforms a circle into a circle.
I don’t think this is in the syllabus, but I will give you an example. For the transformation w² = z, find the locus of w for |z| = 5.

z = 5 (cos θ + i sin θ)
w = √z = √5 [cos (θ/2) + i sin (θ/2)]
∴ the locus of w is a circle, with radius √5 and centre at 0.

3. Matrices

3.1 Row & Column Operations (properties of determinants)

This section is mostly covered in Maths T, so I will only discuss the properties of determinants. You should know what a determinant is by now, at least for 2 × 2 and 3 × 3 matrices. In Maths T, you are told to evaluate the determinant of a 3 × 3 matrix directly, just as it is given. Here, I’m going to teach you some shortcut operations so that you can calculate the determinant faster. Sometimes, we are required to change the appearance of the determinant to ease our calculations. There are several ways to change a determinant while keeping track of its value:

1. add / subtract any row to any other row
adding the second row to the first row yields:
2. add / subtract any column to any other column
subtracting the first column from the third column yields:
3. add / subtract any multiple of any row / column to any other row / column
adding 3 times the third row to the first row yields:
4. interchange 2 rows / columns and change its sign (+ / –)
interchanging column 1 and column 2 yields:
5. factor out a constant k from any row / column
factoring out 3 from all 3 rows (which puts a factor of 3³ = 27 in front of the determinant) yields:
Note the difference here between a matrix and a determinant. You only factor out one ‘3’ if it were a matrix. Don’t make mistakes.
6. transpose the determinant
I think you understand this without illustration, right?

One more interesting fact about determinants is, whenever 2 rows / columns are equal, the value of the determinant is zero. Knowing how you can simplify determinants gives you an advantage when you calculate inverse matrices. You can now calculate the value of the determinant faster!
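The rules listed above can be checked numerically. The following is a minimal sketch in plain Python; the matrix `M` and the helper `det3` are made up for illustration and do not come from the examples in this post.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical matrix, chosen only for illustration.
M = [[2, 1, 3],
     [0, 4, 1],
     [5, 2, 6]]
base = det3(M)

# Rule 3: adding 3 times row 3 to row 1 leaves the determinant unchanged.
added = [[M[0][j] + 3 * M[2][j] for j in range(3)], M[1], M[2]]

# Rule 4: interchanging columns 1 and 2 flips the sign.
swapped = [[row[1], row[0], row[2]] for row in M]

# Rule 5: multiplying every row by 3 multiplies the determinant by 3**3.
scaled = [[3 * x for x in row] for row in M]

# Rule 6: transposing leaves the determinant unchanged.
transposed = [list(col) for col in zip(*M)]

# Two equal rows force the determinant to zero.
repeated = [M[0], M[0], M[2]]

print(base, det3(added), det3(swapped), det3(scaled), det3(transposed), det3(repeated))
```

Trying this with a few matrices of your own is a quick way to convince yourself which operations keep the value, which flip the sign, and which introduce a factor.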
Besides, it can be useful for situations, just like the one below: Factorize the following determinant: So, the answer will be: 3.2 System of Linear Equations (consistency, uniqueness, Gaussian elimination, Cramer’s rule) Are you aware that not all simultaneous equations have unique answers (that means, you can find the unknowns x, y or z absolutely)? Let’s take a look at the equation below: -3x – y + z = -1 If I give you just one equation up there, it is certain that you will not be able to find an absolute answer for x, y and z. In fact, you could substitute any value for x and y, and you get a value for z!. From here you could have guessed that equal amount of equations and unknowns are required for you to solve the problem. This means, if you have 2 unknowns, you need 2 equations to solve for x and y. If you have 3 unknowns x, y and z, the minimum amount of equations you need to solve the problem is 3, and etc. So now let me give you another 2 equations below: -3x – y + z = -1 x + 4y – z = 3 -5x + 2y + z = 2 Are you able to solve these 3 equations simultaneously? Alright, you only have learnt 2 variables, so let me give you another example below: 2x + y = 1 4x + 2y = 3 If you tried solving it, you find out that it is not solvable. A certain amount of unknowns, expressed in the same amount of equations is what we called as a system of equations. It is linear when all the variables, x, y, z, w or so on are to the power of one. So this section, I’m going to teach you how to solve systems of linear equations with 3 variables, and also know how to identify whether a system of equations have unique solutions. CONSISTENCY When we talk about consistency of equations, we are actually trying to determine whether this system of linear equations have a unique solution, infinitely many solutions, or no solutions. Every system of linear equation (I’ll be using 3 variables only in this section) will have either one of the outcomes. 
Having a unique solution means that there is a definite value for x, y and z, and when x, y and z are not those values, the 3 equations will contradict one another. For example (call this the red example),

2x + z = 0
2x + y + 4z = 1
3x + y + 8z = 1

will be consistent with each other if and only if x, y and z are 3, 3 and –1 respectively. Any other value will be wrong.

Now, if a system of linear equations doesn’t have a unique solution, it has either infinitely many solutions, or no solution at all. Having infinitely many solutions means that x, y and z are all dependent on another variable, let’s call it t, and t can be ANY VALUE. For example (call this the green example),

3x + y – 2z = –4
x + 2y + 3z = 11
3x – 4y – 13z = –41

has the answers x = 7t – 19/5, y = 37/5 – 11t and z = 5t respectively, and t can be any number. As for the blue example above (the system –3x – y + z = –1, x + 4y – z = 3, –5x + 2y + z = 2), it has no solution, because no values of x, y and z can satisfy all 3 equations at once.

To determine whether a system of equations has a unique solution, infinitely many solutions or no solution, we first need to represent it in the form of a matrix. Taking the blue example above, we have AX = B, where A is the 3 × 3 matrix with rows (–3, –1, 1), (1, 4, –1) and (–5, 2, 1), X is the column (x, y, z) and B is the column (–1, 3, 2). From here, we will determine the determinant of the 3 × 3 matrix. If the determinant gives a non-zero value, then there IS a unique solution for x, y and z. But if the determinant equals zero, then we have a non-unique solution, in which case we have to find out whether the equations are linearly dependent on each other, or there just isn’t a solution for x, y and z. Try calculating the determinants of the blue, red and green examples to verify whether it is so.

It can be quite hard to identify whether a system of equations is linearly dependent (infinitely many solutions) or inconsistent (no solution). Basically, when you effectively only have 2 or 1 equations available for a system of 3 unknowns, that system of equations is definitely linearly dependent. Try checking out the green example. Take the 1st equation, subtract 3 times the 2nd equation, you get –5y – 11z = –37.
Taking the 3rd equation, subtracting 3 times the 2nd equation, and factoring out 2, you get the same equation –5y – 11z = –37! This means that you actually only have 2 equations for 3 unknowns, which tells you that there are infinitely many solutions. To solve this system of equations, you simply let any variable be t, as long as it doesn’t give you trouble. For this case, I’ll use z = 5t. Substituting it back into any 2 equations, you will get the final answer (x, y, z) = (7t – 19/5, 37/5 – 11t, 5t).

For a system with no solution, you need to find 2 equations which contradict one another. Using the orange example (2x + y = 1 and 4x + 2y = 3), after multiplying the 1st equation by 2, you get

4x + 2y = 2
4x + 2y = 3

and you are about to conclude that 2 = 3, which is a contradiction! This shows that the equations in the system are inconsistent with one another. Equations can get very complicated, and it is not easy to identify contradicting or identical equations within a system of equations. But there is a faster way.

GAUSSIAN ELIMINATION

Before I introduce this method, I need to tell you that a system of 3 linear equations can be represented by an augmented matrix as below:

[ –3  –1   1 | –1 ]
[  1   4  –1 |  3 ]
[ –5   2   1 |  2 ]

The line separates the 3 × 3 matrix and the constants on the right hand side. This is the blue example I took from above. The aim of Gaussian Elimination is to represent the augmented matrix in the row-echelon form before we solve the equations. The row-echelon form of an augmented matrix has the following characteristics:

1. The 1st non-zero entry of each row is 1 (which is called the “leading 1”).
2. Below each “leading 1” are all zeroes.
3. Each “leading 1” is placed one position to the right of the “leading 1” in the row above.
4. Any row consisting entirely of zeroes (if there is one) will be placed at the bottom of the matrix.

Confused? Look at the matrix below, where * stands for any number:

[ 1  *  * | * ]
[ 0  1  * | * ]
[ 0  0  1 | * ]

In human language, as long as you have a diagonal of 1’s slanting from left to right downwards, and there are 3 zeroes in the bottom left corner, then that is a row-echelon form.
So if the last row has all zeroes, so be it, as long as that row is not placed in the middle or the top. In order to transform the blue example above into the row-echelon form, we need to learn some elementary row operations. Let i, j, and k be the labels for the 1st, 2nd and 3rd row (I hope you are able to distinguish a ROW and a COLUMN by now), and c be a constant. There are 3 things we can do to the augmented matrix without altering the final outcome: 1. interchange rows i & j, denoted by Rij. 2. multiply a row with a number c, denoted by cRi. 3. add c times of row j to row i, denoted by cRj + Ri. So to show you how to go about, let’s use the blue example: First, I switched the first and 2nd row, so I’ve got a “leading 1” for my first row. Next, I try to add 3 times the 1st row to the 2nd row to get a zero in the front of the 2nd row, then divide by 11 so that I get the 2nd “leading 1”. Since the 1st and 2nd row are done, I now solve for the 3rd row, and so happen that the 3rd row has 3 zeroes on the left (which has to be at the bottom, remember?). Converting this augmented matrix back to the system of equations, I have: which has a contradiction! Therefore there is no solution for this system of equations. Take note that there is a difference between no unique solution and no solution, and both mean very different things! Now, try using Gaussian elimination on the red and green example. For the red one, you will get the last row z = –1. By using back substitution, you will solve for x and y easily. For the green one, you will eventually get the last row filled with all zeroes. Please do lots of exercises on Gaussian Elimination, as you don’t want to make stupid mistakes in exam. CRAMER’S RULE Consider a system of n equations in the n variable x1, x2, ..., xn, expressible in matrix form as AX = B, where A is an invertible matrix. 
Let A_i be the matrix obtained by replacing the ith column of A with the n × 1 matrix B. Then the solution to the system is given by x_i = det(A_i) / det(A), for i = 1, 2, …, n.

I suppose you don’t like definitions in alien language. Cramer’s rule is applicable only for systems of linear equations which have unique solutions (and you will probably only want it when you do not have your calculator with you). In human language, Cramer’s rule states that when you have 3 linear equations

a1x + b1y + c1z = d1
a2x + b2y + c2z = d2
a3x + b3y + c3z = d3

your expressions for x, y and z are

x = det[d1 b1 c1; d2 b2 c2; d3 b3 c3] / det[a1 b1 c1; a2 b2 c2; a3 b3 c3]
y = det[a1 d1 c1; a2 d2 c2; a3 d3 c3] / det[a1 b1 c1; a2 b2 c2; a3 b3 c3]
z = det[a1 b1 d1; a2 b2 d2; a3 b3 d3] / det[a1 b1 c1; a2 b2 c2; a3 b3 c3]

where each det[…] lists the rows of a 3 × 3 determinant, separated by semicolons. The expressions for x, y and z are just fractions of 2 determinants. The bottom determinant is the same for all 3 of x, y and z, and it is the one you use to determine whether a system of linear equations has a unique solution (notice that if the bottom determinant is 0, you don’t get an answer!). The determinant on the top differs by substituting the coefficients of that variable with the d’s. For example, the coefficients of x are the a’s, so just substitute them with all the d’s. Now, use the red example above and try solving it with Cramer’s rule. If you feel like freaking your Maths T teacher out, use this method in the exam, and risk your marks. :P But as I said earlier, you can actually solve simultaneous equations with 3 variables using your calculator. So probably you will only use this method when asked.

3.3 Eigenvalues & Eigenvectors (diagonalization, Cayley-Hamilton theorem)

Before I continue, I need to teach you some basics of transformation. You will learn the further details in Chapter 11. You probably still remember the transformations you learnt in form 4 Mathematics. There are translations, enlargements, rotations, reflections and so on. Basically, every kind of transformation, whether in 2 dimensions or 3 dimensions, can be represented by a matrix. I won’t be teaching you how, but you need to know this before I continue.
So it simply means that a vector (x, y), after being transformed by the transformation T (which can be represented by a matrix) will change its coordinates to (x1, y1). It can be written as Now, in some special cases, you will find that a particular line after undergoing transformation T, will remain unchanged. A good example is the reflection across the line y=x. You can reflect the line y = 2x and it turns into the line 2y = x. However, you find that the line y = x, after the reflection, is still y = x! This line is called an invariant line. An eigenvector is a vector pointing in the direction of an invariant line under a particular transformation. An eigenvector is not unique, for example, the eigenvector for the line y = x could be (1, 1), (2, 2) or etc., but they both mean the same thing. An eigenvector, after the transformation T, will still fall in the same line (same direction, or rather, same ratio between x and y), but not necessarily the same position. So using matrices, we can represent the transformation T as Notice the right hand side. Since I said earlier that the vector (x, y) after being transformed, will still be in the same line (same x:y ratio), it means that it will just transform to another vector (x, y) multiplied by a constant λ. λ is what we called as an eigenvalue. The aim of this section is simple, you just need to know how to identify the eigenvalues and eigenvectors. I’ll be focusing on 3D vectors in this section (you will learn them in detail in Chapter 12). So let’s say we let the transformation M be And then we have As I said, our aim is to find what is the eigenvector and eigenvalue for this transformation. Let’s try to find λ. Doing a little algebra, Now let’s get back to determinants. Looking at the situation, we know that the determinant of the big chunk matrix over there must be zero, because since the eigenvectors are not unique and are non-zero. 
Besides, there’s no particular x, y or z that we are finding; it is an invariant line, which is dependent on a parameter t (recall your coordinate geometry in Maths T). So we are almost there! By writing down det(M – λI) = 0, we can find λ by forming a cubic equation, then we can substitute it back into the initial equation to find its eigenvector. Let me show you an example:

Find the eigenvalues and their respective eigenvectors for the matrix

[  1  1  2 ]
[  0  2  2 ]
[ –1  1  3 ]

So let’s start by finding the eigenvalues.

(1 – λ)[(2 – λ)(3 – λ) – 2] – 1(2) + 2(2 – λ) = 0
λ³ – 6λ² + 11λ – 6 = 0 [the left-hand side is called the characteristic polynomial]
λ = 1, 2, 3

So by simplifying the equation and using your calculator, you find that this matrix has 3 eigenvalues. Note that a 3 × 3 matrix may also have only 2 or 1 distinct real eigenvalues. Now that we have found the eigenvalues, let’s try to find the eigenvectors. We need to substitute each value of λ into the equation (M – λI)v = 0, where v = (x, y, z).

When λ = 1, we get a system of linear equations,

y + 2z = 0
y + 2z = 0
–x + y + 2z = 0

So you immediately notice that this is actually a system of linear equations which are linearly dependent. Letting z = t, you have (x, y, z) = (0, –2t, t). However, your answer is not (0, –2t, t), as you need to substitute a value for t into it. So your answer should be (0, –2, 1). Note that you can also put (0, –4, 2) or (0, –2.666, 1.333), as any non-zero scalar multiple of an eigenvector is still the same eigenvector. But we chose the first one for simplicity. Hey, the working is not done yet!

When λ = 2, … … the eigenvector is (1, 1, 0)
When λ = 3, … … the eigenvector is (2, 2, 1)

Try doing the working yourself. In the end, you find yourself with 3 eigenvalues and their respective 3 eigenvectors. The finding process may be hard if you are a careless person. So please do a lot of practice on this section.
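If you want to check your working for the example above, the sketch below multiplies the matrix by each eigenvector and compares the result with λ times the eigenvector. It assumes the matrix has rows (1, 1, 2), (0, 2, 2) and (–1, 1, 3); that is an assumption inferred from the characteristic equation and the eigenvectors quoted in the working, so treat it as an illustration only. The helper names are made up.

```python
# Matrix assumed for the worked example (inferred from the working, not quoted).
M = [[1, 1, 2],
     [0, 2, 2],
     [-1, 1, 3]]

def mat_vec(m, v):
    """Multiply a 3 x 3 matrix by a 3-component vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def char_poly(lam):
    """Characteristic polynomial from the text: lambda^3 - 6 lambda^2 + 11 lambda - 6."""
    return lam**3 - 6 * lam**2 + 11 * lam - 6

# Eigenvalue -> eigenvector pairs as stated in the text.
pairs = {1: [0, -2, 1], 2: [1, 1, 0], 3: [2, 2, 1]}

for lam, v in pairs.items():
    is_root = char_poly(lam) == 0
    # M v should equal lambda times v for a genuine eigenpair.
    is_eigenpair = mat_vec(M, v) == [lam * x for x in v]
    print(lam, v, is_root, is_eigenpair)
```

The same check works for any scalar multiple of an eigenvector, which is one way to see why (0, –2, 1) and (0, –4, 2) count as the same answer.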
CAYLEY-HAMILTON THEOREM

If the characteristic polynomial p(λ) of an n × n matrix A is written
p(λ) = (–1)^n (λ^n + b_{n–1}λ^{n–1} + … + b_1λ + b_0),
then
A^n + b_{n–1}A^{n–1} + … + b_1A + b_0I = 0

Basically what this theorem means is that the λ in the characteristic equation of the matrix can be substituted with the whole matrix itself (with the constant term multiplied by the identity matrix I). Taking the characteristic equation example above,
λ³ – 6λ² + 11λ – 6 = 0
tells us that actually
M³ – 6M² + 11M – 6I = 0
where M is the matrix itself. From here, you could actually find the inverse matrix M⁻¹ in a fast way. Post-multiplying the equation by M⁻¹, we get
M³M⁻¹ – 6M²M⁻¹ + 11MM⁻¹ – 6IM⁻¹ = 0
M² – 6M + 11I – 6M⁻¹ = 0
and from here we will get
M⁻¹ = (M² – 6M + 11I) / 6
That is the only thing I think the Cayley-Hamilton Theorem is used for. This chapter should end here. However, I think it is better that you know some applications of eigenvalues and eigenvectors. One such application is diagonalization.

BAB 4

4. Recurrence Relations

4.1 Recurrence Relations (problem models)

I suppose that you have learnt the chapter Sequences & Series in Maths T before you arrive at this chapter. A recursive definition of a sequence specifies one or more initial terms and a rule for determining subsequent terms from those that precede them. A recurrence relation for the sequence {a_n} is an equation that expresses a_n in terms of one or more of the previous terms of the sequence, namely, a_0, a_1, …, a_{n–1}, for all integers n with n ≥ n_0, where n_0 is a non-negative integer.

That was the formal definition of recurrence relations. When you say that something is recursive, it means that there is a repetition. So a recurrence relation is basically just an equation which relates a term with the terms before it. Let’s take the arithmetic sequence 1, 2, 3, 4, 5, … till infinity. So the term ‘2’ is derived from the term ‘1’, by adding 1 to it. Similarly, there is the same relationship for all the terms, which is to add 1 to the previous term. We shall denote ‘1’ as a_0, which is the initial term.
Then, we find that the term a_1 is related to the initial term by the equation
a_1 = a_0 + 1
So after generalizing, we can conclude that the arithmetic sequence can be represented by the recurrence relation
a_n = a_{n–1} + 1
where n ≥ 1 (n is a positive integer). Using this equation, and given the initial condition a_0, you can write down the rest of the terms by slowly adding all the way up (just imagine if I asked you to find the term a_109!). So now you know that a recurrence relation is just an equation which has a_n and at least one other term a_{n–x}. Examples of recurrence relations are
a_n = 6a_{n–2}
a_n = 5a_{n+4} – 2a_{n+3} + n

We say that a recurrence relation is homogeneous when it only contains the terms a_{n–x}. For example, a_n = 6a_{n–2} is homogeneous, while a_n = 5a_{n–1} – 2a_{n–2} + 3 is not, as 3 is not an a_{n–x} term.

We say that a recurrence relation is linear when the maximum power of the a_{n–x} terms is 1. For example, a_n = 6a_{n–2} is linear, but a_n = 6(a_{n–2})² is not, as its maximum power is 2.

The order / degree of a recurrence relation tells us how many terms back the furthest term that a_n depends on is. For example, a_n = 6a_{n–1} is a first order recurrence relation, while a_n = 6a_{n–1} + a_{n–3} is a third order recurrence relation. Any recurrence relation of the k-th order requires k initial conditions to be solved. For example, we see that the equation a_n = 8a_{n–1} + 9a_{n–2} needs 2 initial conditions, a_0 and a_1, to be defined.

In STPM, you will only be dealing with linear, 2nd order recurrence relations, both homogeneous and non-homogeneous. Now that you know what a recurrence relation is, I will guide you through some basic modelling. You need to learn how to use recurrence relations in a given situation, or question. Let me start with 2 very famous examples, the Fibonacci Numbers and the Tower of Hanoi.

RABBITS, AND THE FIBONACCI NUMBERS

Leonardo Pisano, also known as Fibonacci, came up with this problem in the 13th century.
Suppose a young pair of rabbits (one male and one female) is placed on an island. A pair of rabbits does not breed until they are 2 months old. After they are 2 months old, each pair of rabbits produces another pair each month. He wanted to find a recurrence relation for the number of pairs of rabbits on the island after n months, assuming that no rabbits ever die.

Let’s try counting. In the beginning, there were only 2 rabbits. Then in the first month, there are still 2 rabbits on the island, because they are still not old enough to breed. But in the second month, the pair of rabbits starts to breed, and they produce another 2 rabbits on the island, making it 4 rabbits. In the third month, there will be 6, because the old rabbits reproduce, but not the young rabbits. Counting by pairs, we find that the rabbits grow according to the sequence 1, 1, 2, 3, 5, 8, 13, … and so on. Take a look at the bunny diagram below.

Now, here is the hard part. To solve this problem, you know that there are 2 initial conditions, a_0 and a_1, which are both 1 (a_0 is the start, which I will call month 0, and a_1 is for the first month). As we step into month 2, the number of pairs of rabbits will be the number of pairs in the previous month (month 1) plus the newborn pairs, one for each pair that was already alive in month 0. The process goes on, and every time we reach a new month, we add up the number of pairs of rabbits in the previous month and the number of pairs of rabbits in the month before the previous month. So in the end, we come up with the famous Fibonacci Sequence, which is represented by the recurrence relation
f_n = f_{n–1} + f_{n–2}
I bet you got lost somewhere, but this is the best explanation I could come up with. You can try reading the textbooks, and you might not even understand it at all. We see that the Fibonacci sequence is a 2nd order homogeneous linear recurrence relation.
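The counting argument above can be replayed in a few lines of Python. This is only a sketch: the function name `rabbit_pairs` is made up, and it assumes the initial conditions f_0 = f_1 = 1 used in the text.

```python
def rabbit_pairs(months):
    """Return [f_0, ..., f_months] using f_n = f_(n-1) + f_(n-2), with f_0 = f_1 = 1."""
    pairs = [1, 1]
    for n in range(2, months + 1):
        # Pairs alive last month, plus one newborn pair for each pair alive two months ago.
        pairs.append(pairs[n - 1] + pairs[n - 2])
    return pairs[:months + 1]

print(rabbit_pairs(6))  # [1, 1, 2, 3, 5, 8, 13]
```

Notice that the loop has to walk through every month in turn, which is exactly why a formula in terms of n becomes attractive for large n.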
This chapter really needs you to think a lot. Do you know that Fibonacci numbers also exist in sunflower patterns, pinecones, and spiral seashells? Get to know more about Fibonacci Numbers in Nature.

THE TOWER OF HANOI

Have you played this game before? You are given a stack of disks of different sizes on the left pole. Your objective is to transfer all the disks from the left pole to the right pole, only moving one disk at a time, and never stacking a bigger disk onto a smaller disk. At every move, only one disk can be in your hand, and the disk can only be placed on one of the 3 poles. Take 5 textbooks of different sizes to represent the disks, and play this game with your classmates in school. I did that last time… Interesting?

A myth created to accompany the puzzle tells of a tower in Hanoi where monks are transferring 64 gold disks from one peg to another, according to the rules of the puzzle. The myth says that the world will end when they finish the puzzle. Detailed calculations show that if they move one disk per second, it will take them more than 500 billion years to complete!

Anyway, enough of the fun stuff. Our goal here is to find a recurrence relation for the minimum number of moves required to move n disks from the left to the right. Let’s start from scratch. If there was one disk, you only need one move to solve the problem. If there were 2 disks, you need to take the top disk to the middle peg, transfer the bottom disk to the 3rd peg, and transfer the top disk back on top of the bottom disk on peg 3. So if we have n disks, we can see that we need to move n–1 disks to the middle peg, move the bottom disk to the right, and then move the n–1 disks to the last peg, on top of the bottom disk. The bottom disk only requires one move, but the stack of n–1 disks has to be transferred twice: once to the middle peg, and once more to the 3rd peg.
So here, we can deduce that the recurrence relation can be represented by
H_n = 2H_{n–1} + 1
where H_n represents the minimum number of moves required to transfer n disks from the left to the right pole. The initial condition is H_1 = 1 move. I suppose you are terribly confused by now. These are only 2 examples! The hard part of this chapter is to model recurrence relations. The solving part (which will be dealt with in sections 4.2 & 4.3) is actually much easier. Spend more time thinking and try to figure out some of my examples below.

1. The number of fish in a pond, which starts with a_0 fish, doubles every month. So after n months, the number of fish can be represented by the relation a_n = 2a_{n–1}.
2. In the first month, you date 1 girl, in the second month 2 girls, and in the nth month you date n girls. So the recurrence relation a_n = n + a_{n–1} gives the total number of girls you have dated in the first n months. How nasty of you…
3. You have a loan of RM a_0 from Along Bukit Beruntung. You now pay RM100 every month to him, and he charges you interest of 10% every month. So the balance you owe the loan shark in the nth month can be represented by the relation a_n = (1 + 0.1)a_{n–1} – 100.
4. The cash deposit machine in CIMB bank only accepts RM1 coins (if they exist), RM1 notes and RM5 notes. If the order of deposit matters, the number of ways you can deposit RM n into the machine can be represented by the relation a_n = 2a_{n–1} + a_{n–5}. [5th order recurrence relation!]
5. If you can climb up a flight of stairs by taking either one step or two steps at a time, the number of ways to climb n stairs can be represented by the recurrence relation a_n = a_{n–1} + a_{n–2}.
6. You are laying tiles on a walkway in a single line. You can only lay red, green or blue tiles, such that no 2 red tiles are adjacent to each other, and tiles of the same color are considered indistinguishable. The recurrence relation for the number of ways to lay out a walkway with n tiles is a_n = 2a_{n–1} + 2a_{n–2}.
[Go think about it. This is hard…]

4.2 Homogeneous Linear Recurrence Relations (2nd order, constant coefficients)

Recall that you learnt in the previous section how to model a situation using recurrence relations. The equations are helpful; however, they don’t really help much if you are searching for a term far along the sequence. For example, for the relation a_n = 2a_{n–1} with the initial condition a_0 = 1, finding the term a_109 will be tiring, as it will take you forever to get there. When we say that we solve a recurrence relation, it means that we are trying to convert the relation into an equation in terms of n instead of a_{n–1}, which obviously makes it easier for you to calculate the nth term. In this section, I’ll be showing you how to solve 2nd order homogeneous linear recurrence relations. The non-homogeneous part follows in the next section.

2 DISTINCT ROOTS

Given a recurrence relation a_n = 5a_{n–1} – 6a_{n–2}, with initial conditions a_0 = 1, a_1 = 0. To start off with, we let a_n = r^n. This is a smart guess which we will eventually find to be correct. We can then further deduce that a_{n–1} = r^{n–1}, and a_{n–2} = r^{n–2}. Substituting everything back into the equation, we have
r^n = 5r^{n–1} – 6r^{n–2}
Dividing the equation by r^{n–2} (which is the smallest power), we get
r² = 5r – 6
r² – 5r + 6 = 0
which is a quadratic equation! This equation is called the characteristic equation, and r is called the characteristic root. Solving the equation, we get r = 2, 3. Again, using a smart guess, we deduce that the term a_n can be represented by the equation
a_n = c1(2^n) + c2(3^n)
So you notice that the 2^n and 3^n must have come from the characteristic roots earlier on. This is the general solution of the recurrence relation. The terms c1 and c2 are just 2 constants, which we will find by using the initial conditions.
When a_0 = 1, a_0 = c1 + c2 = 1 (1)
When a_1 = 0, a_1 = 2c1 + 3c2 = 0, so 2c1 = –3c2 (2)
Now you have 2 simultaneous equations. Using the calculator, you can easily find that c1 = 3, c2 = –2.
Substituting the constants back into the equation, you get
a_n = 3(2^n) – 2(3^n)
which is what we call the particular solution. This is the final answer that we are looking for. Now you can substitute n = 109 and get the answer for a_109 straight away! Now that you have found the answer, try finding the first 5 or 6 terms, using both the recurrence relation a_n = 5a_{n–1} – 6a_{n–2} and the equation a_n = 3(2^n) – 2(3^n). Do they contradict one another? Congratulations, you just learnt how to solve homogeneous recurrence relations!

2 EQUAL ROOTS

However, the above method only works for 2 distinct roots in the characteristic equation. Take another example, a_n = –4a_{n–1} – 4a_{n–2}, a_0 = 0, a_1 = 1. You get a characteristic equation r² + 4r + 4 = 0, r = –2. If you take the general solution as a_n = c1(–2)^n, then you are totally wrong. The correct answer should be a_n = c1(–2)^n + c2 n(–2)^n. Notice the extra multiplied n in the second term. To summarize:

1. If the characteristic roots r1 and r2 are distinct, represent the solution as a_n = c1(r1)^n + c2(r2)^n.
2. If the characteristic roots are equal (both equal to r), represent the solution as a_n = c1 r^n + c2 n r^n.

Distinct roots could be either real or complex. The method for both is the same.

4.3 Non-homogeneous Linear Recurrence Relations (2nd order, constant coefficients)

Consider the following non-homogeneous linear recurrence relation:
a_n = { a_{n–1} + a_{n–2} } + { 3^n + n(3^n) + n² + n + 3 }
where the first bracket is part (1) and the second bracket is part (2). Part (1) is the homogeneous part of the recurrence relation, which we now call the associated linear homogeneous recurrence relation. Part (2) is of interest to us in this section; it is the non-homogeneous part. Solving this kind of question is simple: you just need to solve the associated recurrence relation (just like how you did in the previous section), then solve the non-homogeneous part to find its particular solution. These two parts are solved separately, and we will combine the results in the end.
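Before moving on to the examples, here is one way to carry out the check suggested in section 4.2, comparing the terms produced by a recurrence relation with the terms produced by its solved formula. It is a sketch only: the helper `iterate` is made up, and the constants c1 = 0 and c2 = –1/2 for the equal-roots example are worked out here from a_0 = 0 and a_1 = 1 rather than quoted from the text.

```python
def iterate(step, initial, count):
    """First `count` terms of a 2nd order recurrence a_n = step(a_(n-1), a_(n-2))."""
    terms = list(initial)
    while len(terms) < count:
        terms.append(step(terms[-1], terms[-2]))
    return terms[:count]

# Distinct roots: a_n = 5a_(n-1) - 6a_(n-2), a_0 = 1, a_1 = 0.
distinct = iterate(lambda p1, p2: 5 * p1 - 6 * p2, [1, 0], 8)
closed_distinct = [3 * 2**n - 2 * 3**n for n in range(8)]

# Equal roots: a_n = -4a_(n-1) - 4a_(n-2), a_0 = 0, a_1 = 1.
# With c1 = 0 and c2 = -1/2 (derived here), a_n = -n(-2)^n / 2.
# n(-2)^n is always even, so integer division is exact.
equal = iterate(lambda p1, p2: -4 * p1 - 4 * p2, [0, 1], 8)
closed_equal = [-n * (-2)**n // 2 for n in range(8)]

print(distinct == closed_distinct, equal == closed_equal)
```

If you drop the extra n from the equal-roots formula, there is no choice of constants that matches both a_0 and a_1, which is a concrete way to see why the extra term is needed.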
Example 1 (terms of the form k^n):
a_n = 3a_{n–1} + 2^n
We first proceed to solve the associated linear recurrence relation (a.l.r.r.), which is
a_n = 3a_{n–1}
The characteristic equation gives us r = 3, and therefore
a_n = c1(3^n)
Now that the associated part is solved, we proceed to solve the non-homogeneous part. Using a smart guess, we let
a_n = c2(2^n)
From here, we then deduce that a_{n–1} = c2(2^{n–1}). Putting these 2 equations back into the initial recurrence relation a_n = 3a_{n–1} + 2^n, we have
c2(2^n) = 3c2(2^{n–1}) + 2^n
(c2 – 1)(2^n) = 3c2(2^{n–1})
2(c2 – 1) = 3c2
And so we have c2 = –2, which then gives us a_n = –2(2^n) = –2^{n+1}. Combining both the answers for the associated and non-homogeneous parts, we have our general solution
a_n = c1(3^n) – 2^{n+1}
If we were given the initial condition a_0 = 2, then our particular solution will be
a_n = 4(3^n) – 2^{n+1}

This is the general rule that we follow: For any number of terms of the form k^n, we let a_n be k^n multiplied by a constant. So if the non-homogeneous part is 5^n + 78^n, then we let the answer be a_n = c1(5^n) + c2(78^n), in which c1 and c2 are constants to be found. The same goes for the form nk^n, in which you let a_n = (c1 n + c2)k^n.

However, there is an exception, when the root r is of the same form as k^n. For example,
a_n = 2a_{n–1} + 2^n
You get r = 2, which gives you an a.l.r.r. solution of a_n = c1(2^n), which has the same form as the non-homogeneous part! In this case, you need to multiply your non-homogeneous part by n. Which means, you let
a_n = c1 n(2^n) and a_{n–1} = c1(n – 1)(2^{n–1})
And using the same method, you put it back into the initial equation,
c1 n(2^n) = 2c1(n – 1)(2^{n–1}) + 2^n
and you find c1 from here. Similarly, if
a_n = 2a_{n–1} + 3(2^n) + 5n(2^n)
you let your non-homogeneous part be
a_n = c1 n(2^n) + c2 n²(2^n)
and if
a_n = 4a_{n–1} – 4a_{n–2} + 3(2^n) + 5n(2^n)
which has a double root r = 2, then you will have a non-homogeneous part of
a_n = c1 n²(2^n) + c2 n³(2^n)
As long as the k^n or nk^n term is already found in the a.l.r.r.
once, multiply all the terms by n, and multiply by n² if it is found twice. If you are curious why it is so, you could actually try without following this rule. You will find that you can’t get the correct answer.

Example 2 (polynomial terms, n² + n + c etc.)
a_n = 3a_{n–1} + n² + 5n + 3
It is the same for the a.l.r.r., a_n = c1(3^n). But for the non-homogeneous part, we let
a_n = c2n² + c3n + c4 (1)
a_{n–1} = c2(n – 1)² + c3(n – 1) + c4 (2)
I think you might have got the pattern by now. Note that if the equation was
a_n = 3a_{n–1} + n² + 3
a_n = 3a_{n–1} + n² + 5n
or a_n = 3a_{n–1} + n²
we still need to use the above, a_n = c2n² + c3n + c4. This is because we need to account for the possibly missing terms which might arise in the particular solution. So, just like example 1, substitute both the equations (1) and (2) back into the initial recurrence relation, then find c2 to c4, and combine with the a.l.r.r. to find c1 with the given initial condition, say a_0 = 1.

However, there is also an exception for this case, which is when one or two of the characteristic roots are r = 1. For example,
a_n = 2a_{n–1} – a_{n–2} + n² + 5n + 3
You obtain a double root r = 1 for your a.l.r.r. Since 1^n = 1, your a.l.r.r. solution will be of the form
a_n = c1 + c2n
which will clash with your equation for the non-homogeneous part if you use the same equation as above, a_n = c2n² + c3n + c4. Instead, you should use a_n = c2n⁴ + c3n³ + c4n², which is the same equation multiplied by n². Similarly, if it were a first order recurrence relation with one root r = 1, then you multiply by n, and if it were a third order recurrence relation with a triple root r = 1, then you multiply by n³ (notice the similarity with example 1). Again, you can try doing it without following the rules, which will result in you not getting the required answer.

5. Functions

5.1 Inverse Trigonometric Functions (graphs, identities)

This chapter will have fewer words, but more formulas. What you need to do in this chapter is:
1.
memorize the useful graphs, identities and formulas.
2. spend your time trying to derive all the identities.

With these 2 points done, you are sure to score for this chapter. STPM questions will be about proving them, sketching graphs, or differentiating and integrating them (which will be covered in the next chapter).

You have learnt about trigonometric functions throughout your secondary school years. Now, we let $\sin y = x$. An inverse trigonometric function inverts the trigonometric function, and is denoted as $y = \sin^{-1} x$. Note that there is a difference between $\sin^{-1} x$ (the inverse function) and $(\sin x)^{-1}$ (the reciprocal, $1/\sin x$). This is only one of the 6 inverse trigonometric functions; the rest of them are $\cos^{-1} x$, $\tan^{-1} x$, $\sec^{-1} x$, $\csc^{-1} x$ and $\cot^{-1} x$.

Following are the graphs of the 6 inverse trigonometric functions:

The domain and the range of the functions are as follows:

Now that you know the details of these inverse trigonometric functions, it’s on to the formulas and identities. Try to remember as many as you can. In fact, make sure you know how to derive every single one of them.

Prove the first one by letting $x = \cos y$; the rest follow.

Inverse-Forward Identities

Forward-Inverse Identities

Proving this one is not hard either. Let $x = \cos y$, and make use of the identity $\cos^2 y + \sin^2 y = 1$. The rest follow too. Only the $\tan(\cos^{-1} x)$ one will probably be harder. Give it a try.

Inverse Sum Identities

Prove the first one by letting $x = \cos(\pi/2 - y) = \sin y$. Try figuring out the rest yourself.

$\sin^{-1}(-x) = -\sin^{-1} x$
$\csc^{-1}(-x) = -\csc^{-1} x$
$\cos^{-1}(-x) = \pi - \cos^{-1} x$
$\sec^{-1}(-x) = \pi - \sec^{-1} x$
$\tan^{-1}(-x) = -\tan^{-1} x$
$\cot^{-1}(-x) = -\cot^{-1} x$

This one is proven by letting $\sin y = x$, so that $\sin(-y) = -x$. The rest follow.

I don’t think this one will come out in exams. However, the proof requires you to learn the inverse hyperbolic functions in the next section first. I’ll leave this proof to you to try.

This one is the hardest to prove. Try proving it using the formula You probably don’t even know that this formula exists.
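Before attempting the proofs, it can help to spot-check a few of the identities above numerically. The sketch below is only an illustration, not a proof: it uses Python’s standard `math` module, so it covers just $\sin^{-1}$, $\cos^{-1}$ and $\tan^{-1}$, and the function name and sample points are my own choices.

```python
import math

def check_inverse_trig_identities(x, tol=1e-12):
    """Return True if the sampled identities hold at x, for -1 <= x <= 1."""
    checks = [
        # Negative-argument identities
        (math.asin(-x), -math.asin(x)),
        (math.acos(-x), math.pi - math.acos(x)),
        (math.atan(-x), -math.atan(x)),
        # Inverse sum identity: asin(x) + acos(x) = pi/2
        (math.asin(x) + math.acos(x), math.pi / 2),
        # Forward-inverse identity: sin(acos(x)) = sqrt(1 - x^2)
        (math.sin(math.acos(x)), math.sqrt(1 - x * x)),
    ]
    return all(abs(lhs - rhs) <= tol for lhs, rhs in checks)

samples = [-1.0, -0.5, 0.0, 0.3, 0.9, 1.0]
all_hold = all(check_inverse_trig_identities(x) for x in samples)
print(all_hold)
```

A numerical check like this only tells you an identity is plausible at the sampled points; the exam still wants the derivation.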
5.2 Hyperbolic Functions (graphs, identities, Osborn’s rule)

The hyperbolic functions, of which there are six, are so named because they are related to the parametric equations for a hyperbola. The 2 main hyperbolic functions are $\sinh x$ and $\cosh x$ (and so now you know what the ‘hyp’ button on your calculator is for). The hyperbolic functions are actually functions of the natural exponential $e^x$ through the following equations:

$\sinh x = \frac{e^x - e^{-x}}{2}, \quad \cosh x = \frac{e^x + e^{-x}}{2}$

We now relate the hyperbolic functions to the hyperbola. The equation for the hyperbola is

$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$

We let
$x = a \cosh u$
$y = b \sinh u$

Substituting, we find that $\cosh^2 u - \sinh^2 u = 1$, which is true (this can be proven by substituting the $e^x$ forms into the equation). Now that we have 2 hyperbolic functions, we use them to derive a few other functions, following the same convention that the trigonometric functions use:

$\tanh x = \frac{\sinh x}{\cosh x}, \quad \text{sech}\, x = \frac{1}{\cosh x}, \quad \text{csch}\, x = \frac{1}{\sinh x}, \quad \coth x = \frac{\cosh x}{\sinh x}$

All 6 hyperbolic functions have their special pronunciation. sinh is read as ‘shine’, cosh as ‘cosh’, tanh as ‘than’, sech as ‘sheck’, csch as ‘co-sheck’ and coth as ‘cough’.

Now we shall see the graphs of the 6 hyperbolic functions. Note that they are all derived from the exponential function:

cosh x, sinh x, tanh x, sech x, csch x, coth x

Their domains and ranges are as follows:

Now that you know the basic information about these functions, it’s time to memorize formulas. But before you start, I need to introduce a special rule which makes the memorizing easier.

Osborn’s rule states that to change a standard trigonometric identity into the equivalent hyperbolic identity, you change the sign of any term which is the product of two sines, and substitute the corresponding hyperbolic functions. For example, $\cos^2 x + \sin^2 x = 1$ becomes $\cosh^2 x - \sinh^2 x = 1$. This means that if you remember all the trigonometric identities, you can remember the hyperbolic identities.

Please note that the trigonometric formulas which rely on periodicity (for example, the R formula and the phase shifts) do not apply to hyperbolic functions, as they are not periodic.
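To see Osborn’s rule in action, here is a small numerical spot-check in Python. It is a sketch only: the three identity pairs are standard ones I picked for illustration, and the function name and sample values are hypothetical.

```python
import math

def osborn_checks(x, y, tol=1e-9):
    """Check hyperbolic identities obtained from trig ones via Osborn's rule."""
    sh, ch = math.sinh, math.cosh
    return {
        # cos^2 x + sin^2 x = 1  ->  cosh^2 x - sinh^2 x = 1
        "pythagorean": abs(ch(x) ** 2 - sh(x) ** 2 - 1) <= tol,
        # cos 2x = 1 - 2 sin^2 x  ->  cosh 2x = 1 + 2 sinh^2 x
        "double_angle": abs(ch(2 * x) - (1 + 2 * sh(x) ** 2)) <= tol,
        # cos(x+y) = cos x cos y - sin x sin y
        #   ->  cosh(x+y) = cosh x cosh y + sinh x sinh y
        "addition": abs(ch(x + y) - (ch(x) * ch(y) + sh(x) * sh(y))) <= tol,
        # sin(x+y) has no product of two sines, so its form is unchanged
        "sine_addition": abs(sh(x + y) - (sh(x) * ch(y) + ch(x) * sh(y))) <= tol,
    }

results = osborn_checks(0.7, -1.3)
print(results)
```

In each pair, the only sign that flips is the one attached to a product of two sines, which is exactly what the rule says.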
For each case, you should be able to derive them. Proving them is simple: just plug in the $e^x$ relations and you are sure to get it. The formulas and identities are as follows:

Double-Angle Formula

Besides all these formulas, you should also know the relations between hyperbolic functions and trigonometric functions, $\sin(ix) = i \sinh x$ and $\cos(ix) = \cosh x$. Use them to derive those for $\tanh x$, $\text{sech}\, x$, $\text{csch}\, x$ and $\coth x$ too. Bear in mind that $i \times i = -1$.

5.3 Inverse Hyperbolic Functions (graphs, identities, logarithmic form)

Inverse hyperbolic functions are obtained in the same way as the inverse trigonometric functions. I think I don’t need to explain much, so I’ll show you the graphs straight away:

cosh⁻¹ x, sinh⁻¹ x, tanh⁻¹ x, sech⁻¹ x, csch⁻¹ x, coth⁻¹ x

Note that due to the definition of a function, we only take the positive y values of $\cosh^{-1} x$ and $\text{sech}^{-1} x$. The domains and ranges are as follows:

There are not many formulas and identities for this section. But there is one very important thing that you are supposed to learn how to prove, which is the logarithmic form of the inverse hyperbolic functions. I’ll show you the proof for $\sinh^{-1} x$:

Let $y = \sinh^{-1} x$, so that $x = \sinh y = \frac{e^y - e^{-y}}{2}$.
Multiplying through by $2e^y$ gives $e^{2y} - 2xe^y - 1 = 0$, a quadratic in $e^y$.
Hence $e^y = x \pm \sqrt{x^2 + 1}$. Since $e^y > 0$, we take the positive root.
Therefore $\sinh^{-1} x = \ln\left(x + \sqrt{x^2 + 1}\right)$.

Please promise me that you will learn how to prove the rest; this is super important.

Here are some identities to remember. Note that they are quite similar to the inverse trigonometric ones:

For all the above identities, please try to prove all of them. Refer to the section on inverse trigonometric functions for some hints on the proofs.

CHAPTER 6

6. Differentiation & Integration

6.1 Differentiability of a Function (continuity)

In Maths T, you already learnt how to prove whether a function is continuous. Now you need to know the relationship between continuity and differentiability. A differentiable function has to be continuous, but a continuous function need not be differentiable. Using logical propositions, it means that if f(x) is differentiable, then it is continuous, but not conversely.
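A quick way to see why the converse fails is to look at the slopes of chords on either side of a sharp corner. The sketch below uses $f(x) = |x|$ at $x = 0$, which is continuous there; the helper name `difference_quotient` and the step sizes are my own choices for illustration.

```python
def difference_quotient(f, a, h):
    """Slope of the chord of f between a and a + h."""
    return (f(a + h) - f(a)) / h

rows = []
for h in (0.1, 0.01, 0.001):
    right = difference_quotient(abs, 0.0, h)    # approaching 0 from the right
    left = difference_quotient(abs, 0.0, -h)    # approaching 0 from the left
    rows.append((h, right, left))
    print(h, right, left)
```

The slopes from the right and from the left never move towards a common value however small the step gets, so there is no single gradient at the corner even though the graph has no break there.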
Normally, non-differentiability occurs in graphs at
1. a corner
2. a vertical tangent line
3. a discontinuity
4. end points

For piecewise-defined functions, it is easy to see whether a function is differentiable at the joints. If the sub-functions have different gradients at a joint, then the function is definitely not differentiable there.

However, there should be a formal definition for differentiability. For a number a in the domain of the function f, we say that f is differentiable at a, or that the derivative of f exists at a, if

$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$ or $\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$

exists. You can go on to prove that both formulas are actually the same thing (put $x = a + h$). Of course, differentiability is not restricted to single points. We could also say that a function is differentiable on an interval $(a, b)$ or differentiable everywhere, $(-\infty, +\infty)$. I’ll give you one example:

Prove that $f(x) = |x|$ is not differentiable at $x = 0$.

From the right, $\lim_{h \to 0^+} \frac{|0 + h| - |0|}{h} = \lim_{h \to 0^+} \frac{h}{h} = 1$.
From the left, $\lim_{h \to 0^-} \frac{|0 + h| - |0|}{h} = \lim_{h \to 0^-} \frac{-h}{h} = -1$.
The two one-sided limits are different, so the limit does not exist.

So, $f(x) = |x|$ is not differentiable at $x = 0$. [proven]

6.2 Derivatives of a Function Defined Implicitly or Parametrically (2nd derivatives)

You have probably learnt how to differentiate functions defined implicitly and parametrically, but only up to the first order. Here, we will be learning how to continue on to the 2nd order. It is actually very easy and straightforward, so there is nothing too important in this section.

IMPLICITLY

I think I don’t need to tell you how to do it. Differentiating a function implicitly for the 2nd order is just the same as for the 1st order. I’ll show you an example:

Find the 2nd order derivative of the function $x^2 + y^2 = 2$.

Differentiating once, $2x + 2y\frac{dy}{dx} = 0$, so $\frac{dy}{dx} = -\frac{x}{y}$.
Differentiating again, $2 + 2\left(\frac{dy}{dx}\right)^2 + 2y\frac{d^2y}{dx^2} = 0$.
Hence $\frac{d^2y}{dx^2} = -\frac{1 + (dy/dx)^2}{y} = -\frac{x^2 + y^2}{y^3} = -\frac{2}{y^3}$.

Note the use of the product rule on the term $2y\frac{dy}{dx}$ in this question. Just do more exercises, then you will get used to these kinds of questions.

PARAMETRICALLY

Probably there’s something new in this section. Again, I’ll show you an example:

Consider the parametric equations $x = t + 1$ and $y = t^3$. Differentiating each with respect to t gives

$\frac{dx}{dt} = 1, \quad \frac{dy}{dt} = 3t^2, \quad \text{so} \quad \frac{dy}{dx} = 3t^2$

To find $\frac{d^2y}{dx^2}$, we need

$\frac{d^2y}{dx^2} = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d}{dx}\left(3t^2\right)$

But we cannot differentiate $3t^2$ directly with respect to x.
Therefore, using the chain rule,

$\frac{d}{dx}\left(3t^2\right) = \frac{d}{dt}\left(3t^2\right) \times \frac{dt}{dx} = 6t \times 1 = 6t$

To summarize, the 2nd order derivative for parametric equations x and y is found by the equation:

$\frac{d^2y}{dx^2} = \frac{d}{dt}\left(\frac{dy}{dx}\right) \div \frac{dx}{dt}$

6.3 Derivatives & Integrals of Trigonometric & Inverse Trigonometric Functions

The derivatives and integrals of trigonometric functions are covered in Maths T. So in this section, I’ll only teach you how to differentiate inverse trigonometric functions. A warning here: you must study the chapter on Integration (especially the part on integration by parts) in Maths T before you come to this section, or you will get really confused.

To find the derivative of $\sin^{-1} x$, we need to make use of our knowledge of differentiating a function implicitly. We let $y = \sin^{-1} x$, so $x = \sin y$. Differentiating implicitly, we have

$1 = \cos y \frac{dy}{dx}$

So as a result, we get

$\frac{dy}{dx} = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - \sin^2 y}} = \frac{1}{\sqrt{1 - x^2}}$

From here, you can deduce that the derivatives of the other inverse trigonometric functions follow the same rule, i.e., differentiate the function implicitly, then make use of the trigonometric identities. The list of derivatives of all the inverse trigonometric functions is as follows:

where a is a constant. You should try to prove each and every one of them as an exercise. You should further try to differentiate these functions with more complicated arguments, using all the differentiation rules you have learnt. For example, while

Take note that once you differentiate an inverse trigonometric function, it becomes an algebraic expression with no trigonometric function left in it. Do not worry about the anti-derivatives of these ‘inverse polynomial’ functions now, as I will give you a summary table in the section on Reduction Formulae. However, I want to discuss the anti-derivative of the inverse trigonometric function itself. For example, I want to find

$\int \sin^{-1} x \, dx$

To do this, you need to make use of integration by parts. In its standard form, it would be

$\int u \frac{dv}{dx} \, dx = uv - \int v \frac{du}{dx} \, dx$

However, I suggest that you use this formula, which is easier to remember:

$\int uv \, dx = u \int v \, dx - \int \left(\frac{du}{dx} \int v \, dx\right) dx$

Before I continue, let me explain this formula.
Normally, you only use integration by parts when you are trying to integrate a product of 2 functions, which are most likely logarithmic, exponential, polynomial and trigonometric functions. So in any case, you let one function be u, and the other function be v. Notice that v has to be a function that is easy to integrate, while u has to be the other one, which is hard to integrate / easy to differentiate. In words, this formula can be read as

“Integration of u × v = [ u × integrate v ] – integration of (differentiate u × integrate v)”

Never mind if you don’t get it, as long as you have your own version of I by P. So continuing with integrating $\sin^{-1} x$, we let $u = \sin^{-1} x$ and $v = 1$. We have

$\int \sin^{-1} x \, dx = x \sin^{-1} x - \int \frac{x}{\sqrt{1 - x^2}} \, dx = x \sin^{-1} x + \sqrt{1 - x^2} + C$

Get it? So the important tip for this question is to put $v = 1$ (you might recall that this is the method you use to integrate $\ln x$). The rest of the functions, after integration, give

Try to derive all of them as an exercise. Note that the term $\ln\left[x + \sqrt{x^2 - 1}\right]$ is actually $\cosh^{-1} x$.

6.4 Derivatives & Integrals of Hyperbolic & Inverse Hyperbolic Functions

The derivatives and integrals of hyperbolic functions and inverse hyperbolic functions are very similar to those of trigonometric and inverse trigonometric functions, just with a difference of a negative sign somewhere within the formulas. There is no simple rule that tells us where the minus sign changes, so this section requires a lot of memory work.

HYPERBOLIC FUNCTIONS

The derivatives of hyperbolic functions can be derived easily by converting the functions into their exponential form. I’ll leave it for you as an exercise to derive all of them. The list of derivatives is as follows:

As you can see, the derivative of $\sinh x$ is $\cosh x$, and vice versa, which differs from the trigonometric ones by a minus sign. The functions whose derivatives have minus signs are the secondary hyperbolic functions, $\text{csch}\, x$, $\text{sech}\, x$ and $\coth x$.

The integrals, again, are very similar to trigonometric integration.
The integrals for $\text{sech}\, x$ and $\text{csch}\, x$ may look a little weird. You should try to differentiate the right hand side and see whether you get the expression on the left. Again, you should do some homework to derive all of them.

INVERSE HYPERBOLIC

Again, the inverse hyperbolic functions have similar derivatives to those of the inverse trigonometric functions, and it is just a matter of a minus sign, outside or within the square roots. Deriving them is similar: differentiate implicitly and make use of the hyperbolic identities (do not confuse them with the trigonometric ones. Remember Osborn’s rule). Here you go

The integrals, as usual, are harder to do. You need to use integration by parts, as I said in the previous section. Try doing them the way you did for the previous section.

As a matter of fact, the huge ‘ln’ terms in the integrals of $\text{csch}^{-1} x$ and $\text{sech}^{-1} x$ are just logarithmic forms of $\cosh^{-1} x$ and $\sinh^{-1} x$.

6.5 Reduction Formulae

SUMMARY OF PREVIOUS SECTION

Before I start, let me just give you some results from combining all of the derivatives and integrals of trigonometric, inverse trigonometric, hyperbolic and inverse hyperbolic functions. This will give you a clearer picture of what you have learnt in the past 2 sections:

1. The Integrals of the Inverse Polynomials

Here I reorganize the tables of integrals for your reference:

As you can see, there is a pattern that you can easily memorize. It’s either of the form $a^2 - x^2$, $x^2 - a^2$ or $a^2 + x^2$, with or without the square root. You also see that they are all quadratic expressions, so you can use the method of completing the square to solve similar cases. For example,

Also, make sure that the coefficient of x is always 1. Another example,

Notice that if you didn’t, you would have got a different answer.

2. Trigonometric & Hyperbolic Substitution

Examples of integration like can’t be solved by normal ways. You might have learnt one trigonometric substitution to solve this kind of questions in Maths T.
But now that you have learnt hyperbolic functions, your vocabulary of substitutions increases to 3 of them. Whenever you face integrals of this kind, you will:

3. Some extra tips on integration

These are just some short notes that I jotted down while I was studying for this chapter a few years ago. I thought I might share them with you all:

a. This kind of integration makes use of the half angle formula. This applies to hyperbolics as well.

b. From here, you do integration by parts, with $t^2$ as u and the term in the bracket as v.

c. Notice that it must be $e^{2x}$. Here you use the substitution $e^x = \sinh u$. Similarly, if the term in the square root was $e^{2x} - 1$ or $1 - e^{2x}$, you substitute $e^x$ as $\cosh u$ or $\sin u$ respectively. Try and see whether it works.

d. You might want to try proving this before you use it. This will be useful for the next section.

e. I actually learnt this in University. You should commit this to memory; it might come in useful.

Alright, let’s get into the topic:

REDUCTION FORMULAE

A reduction formula is an expression of a definite integral in terms of n, relating the integral to a similar form of itself. For example, which can be represented as

Notice that firstly, it is a definite integral, which means that it has upper and lower limits. Then, it relates to itself, with a decrease of power or so. These formulae can be very helpful, especially when you calculate high powers of these functions. So if you want to find You can use the reduction formula to get which is easily solvable.

Solving is easy, but the harder part is the proof. It can be very complicated and tedious if you are doing this for the first time. It is not easy to identify straight away how to integrate (as in who is the ‘u’ and who is the ‘v’ if you’re using integration by parts), and sometimes you take hours to solve just a simple question. I’ll show you the proof for the above example so you’ll know what I mean.
Using my famous colour coded integration by parts formula, we have handing over the $\sin^n x$ term from the right to the left, we get

Complicated? Unfortunately, most exam questions on Reduction Formulae are all about proving them. Since you need A LOT of exercises (seriously, I bold it because this is no joke), I’ll give you some examples for you to prove.

Not enough? There’s more:

and more…

Not hard enough? Try 2 variables then:

Hope you haven’t started to freak out yet. I seriously haven’t tried proving all these Reduction Formulae, so if you have done so, I salute you. I can give you some tips here though:

1. Break down $\cos^n x = \cos x \cos^{n-1} x$ and $\tan^n x = \tan^2 x \tan^{n-2} x$.
2. Try checking out the expressions on the right. When there’s an $n - 1$, you know that the term with the power of n needs to be differentiated once, and with $n - 2$, it will be differentiated twice. $m + 1$ means that term will be integrated.
3. For those which are related to polynomials and roots, you will find formula d. above very useful.

6.6 Applications of Integration (length of arc, surface area of revolution)

You have probably learned how to find the area enclosed between the function f(x) and the axes, or between 2 functions. You have also learned the volume of revolution for a function f(x) with the x or y-axis as the axis of rotation. In this section, you’ll be learning 2 new applications, which are the arc length and the surface area of revolution.

ARC LENGTH

Consider 2 points, P and Q, on a curve. P is the point $(x, y)$ and Q is the point $(x + \delta x, y + \delta y)$. Let s be the length of the arc from a point on the y-axis, and $\delta s$ the length of the arc PQ. Since $\delta s$ is very small, we can approximate the arc PQ by a straight line. Hence, using Pythagoras’ theorem, we have

$(\delta s)^2 = (\delta y)^2 + (\delta x)^2$

Dividing by $(\delta x)^2$, we obtain

$\left(\frac{\delta s}{\delta x}\right)^2 = 1 + \left(\frac{\delta y}{\delta x}\right)^2$

As $\delta x \to 0$, this gives

$\left(\frac{ds}{dx}\right)^2 = 1 + \left(\frac{dy}{dx}\right)^2$

and after square-rooting both sides and integrating between $x = a$ and $x = b$, we end up with

$s = \int_a^b \sqrt{1 + \left(\frac{dy}{dx}\right)^2} \, dx$

The parametric form of s can be obtained by dividing the equation $(\delta s)^2 = (\delta y)^2 + (\delta x)^2$ by $(\delta t)^2$.
The polar form, meanwhile, is probably not in your syllabus, so don’t worry too much. To find the arc length of a particular function, just differentiate it with respect to x, then substitute it into the formula above.

SURFACE AREA OF REVOLUTION

Let A be the area of the surface formed by rotating the curve $y = f(x)$, between the lines $x = a$ and $x = b$, about the x-axis. Let the curved surface area of the blue ring shown be $\delta A$. Treating the strip as being bounded by 2 cylinders, we have

$2\pi y \, \delta s \le \delta A \le 2\pi (y + \delta y) \, \delta s$

As $\delta x \to 0$, $\delta y \to 0$ and $\delta s \to 0$, so we have

$\frac{dA}{ds} = 2\pi y$

which gives us the formula

$A = \int_a^b 2\pi y \sqrt{1 + \left(\frac{dy}{dx}\right)^2} \, dx$

Again, differentiate the function, and substitute it into the formula to find the surface area of revolution.

CHAPTER 7

7. Power Series

7.1 Taylor Polynomial (remainder theorem)

A power series is an expression of a function as an infinite sum of power terms. A sufficiently smooth function f(x) can often be approximated by such a series, so that

$f(x) = a + b(x - x_0) + c(x - x_0)^2 + d(x - x_0)^3 + e(x - x_0)^4 + \dots$

when x is close to $x_0$, and where a, b, c, d, e, … are constants. If you remember the Binomial Expansion for real powers, the function $(1 + x)^r$ can be represented by the series

$(1 + x)^r = 1 + rx + \frac{r(r-1)}{2!}x^2 + \frac{r(r-1)(r-2)}{3!}x^3 + \dots$

Compare the Binomial Series above with the formula for f(x). You see that it is just a special case of the above, in which $x_0$ is zero and the constants follow a special pattern. Our question is this: since we can represent the bracketed function above as an infinite series, is it possible to represent other functions, like $\sin x$, $\ln x$, $e^x$ or anything else? If it is doable, how do we determine the constants a, b, c and so on in the function f(x) above?

Before we get into our topic, Taylor polynomials, let me introduce to you Taylor’s Theorem with Remainder. The theorem states that if a certain function f(x) is (n+1)-times differentiable, then

$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x - a)^n + R_n(x)$

Let me explain this a little. The term a is used when we measure the f(x) close to it.
For example, when $a = 0$, we substitute it into the series, and the resulting expression will be quite accurate for estimating values of x which are close to a (of course, for certain functions, the series is accurate for every x, whatever the value of a. We’ll discuss this in a later section). This means we vary a to approximate the same function near different values. Then, the terms $f'(a)$, $f''(a)$ are the 1st and 2nd derivatives of the function f(x), evaluated at a. Note that in the term $f^{(n)}(x)$, the ‘n’ has a bracket, to tell us that it is not the ‘nth power of f’, but the nth derivative of f.

The entire series is what we call the Taylor series. All those terms between the equal sign and the $R_n$ are called the Taylor polynomial, and sometimes we denote this whole chunk of polynomial as $p_n(x)$. Writing the whole equation in another form, we have

$f(x) = p_n(x) + R_n(x)$

Now, the term $R_n(x)$ is what we call the remainder term. Since the Taylor series is an infinite series, we can’t possibly write down all the terms of the series. So sometimes we just set our limits; for example, we want the series correct up to the 6th order. In this case, $R_n(x)$ is the difference between f(x) and the sum of the terms up to the 6th order. The remainder term could also be written as

I’ll try to give you an illustration to help you understand how this Taylor series thing works. By the way, we are not required to prove the formula for the Taylor series.

For an example, take the function

$f(x) = \frac{x}{1 - x}$

Using Taylor’s Theorem, we find the Taylor series expanded at $x = 0$ (which means $a = 0$) for this function. By the way, there is a special name for the Taylor series expanded at $x = 0$: the Maclaurin series. We find $f'(x)$, $f''(x)$ and so on, and substituting them into the formula, we get

$f(x) = x + x^2 + x^3 + x^4 + \dots$

Notice that this function could also be expanded by binomial expansion, which is faster. Now look at the graph below. Notice that the blue line sketches the exact graph of the function f(x).
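If you do not have the graph in front of you, the same comparison can be made with numbers. The sketch below assumes the function is $f(x) = x/(1 - x)$, which is the closed form of the series $x + x^2 + x^3 + \dots$ for $|x| < 1$; the helper names and the sample points are my own.

```python
def f(x):
    # Closed form of x + x^2 + x^3 + ... (valid for |x| < 1); inferred from the series
    return x / (1 - x)

def taylor_poly(x, degree):
    """Maclaurin polynomial of f up to the given degree: x + x^2 + ... + x^degree."""
    return sum(x ** k for k in range(1, degree + 1))

table = []
for x in (0.1, 0.5):
    for degree in (1, 2, 5, 10):
        error = abs(f(x) - taylor_poly(x, degree))
        table.append((x, degree, error))
        print(f"x={x}, degree={degree}, error={error:.6f}")
```

Two things should stand out in the output: the error shrinks as the degree goes up, and for a fixed degree it is much smaller at $x = 0.1$ than at $x = 0.5$. For this particular function the series only converges for $|x| < 1$, so no degree will rescue the estimate outside that range.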
As I said earlier, the Taylor series is only an estimation. This means that the more Taylor polynomial terms we keep, the more accurately the series estimates the function f(x). Look and see that the graphs of degree 1 and degree 2 are actually quite far off from representing f(x) overall, but are quite accurate for values of x near 0. As the degree of the polynomial increases, the graph of the Taylor polynomial gets closer and closer to the actual function f(x), within the range of x where the series converges.

So now, we want to learn how to find the series for some functions that we know of. Let’s try $e^x$. Since there can be an infinite number of Taylor series expanded at any a, we shall focus on deriving the Maclaurin series of functions. Recalling the formula,

$f(x) = f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \dots$

We find that $e^x$ is still itself after any number of derivatives, and $e^0 = 1$. So plugging in what we have, we get the Maclaurin series

$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$

Try finding the Maclaurin expansion for other functions: $\ln(1 - x)$, $\sinh x$, and any other functions you can think of. Note that not all Maclaurin series of functions have such a beautiful pattern. Some might end up with coefficients that follow no obvious order. Below is a list of common Maclaurin expansions:

I want you to note a few things:
1. There is no Maclaurin expansion for $\ln x$, because $\ln 0$ is not defined.
2. Notice the similarities between the Maclaurin expansions of the trigonometric and hyperbolic functions. From here you are able to prove the hyperbolic-trigonometric identities, which relate both kinds of functions.
3. Some expansions have only odd or only even powers. In other cases, a power might be missing as well, so it is normal for a function not to have all the powers of x.

REMAINDER ESTIMATION THEOREM

If a function f(x) can be differentiated n + 1 times on an interval I containing a, and if M is an upper bound for $|f^{(n+1)}(x)|$ on I, i.e., $|f^{(n+1)}(x)| \le M$, then

$|R_n(x)| \le \frac{M}{(n+1)!}|x - a|^{n+1}$

Ignore the alien language first. Continuing from the previous part, the remainder of the series is actually quite significant.
When you use a Taylor series to estimate something, you are interested in knowing the error of your estimate, that is, the difference between your estimate and the actual value. If you remember from the previous section, the remainder is given by the formula

The formula gives the exact error when f(x) is approximated by the nth Taylor sum. The problem is that it is too difficult to evaluate it this way, so we are going to find an overestimate of the remainder instead. We look at the magnitude of the (n + 1)th derivative of f(t) as t varies between a and x, and overestimate that by a single number M (known as the upper bound, as stated above). So here, we are saying that the remainder is definitely smaller than or equal to this bound, and thus the formula above,

$|R_n(x)| \le \frac{M}{(n+1)!}|x - a|^{n+1}$

This information is important, as we will use it to
1. Estimate the error between the function and the series
2. Approximate a function to n decimal places

I understand that this might be hard for you to catch, so I will give you 2 examples here.

EXAMPLE 1 (ESTIMATE ERROR)

Find the Taylor series of the function $\ln x$ expanded at $x = 1$, to get a cubic approximation, and estimate the error for $\ln 2$.

Here is how to find a Taylor series for a function. We first write the function in terms of what we are looking for. In this case, since it is expanded at $x = 1$, the terms are powers of $(x - 1)$. (They would be powers of $x$ or $(x + 5)$ if it were expanded at $x = 0$ or $x = -5$ respectively.) So

$\ln x = a + b(x - 1) + c(x - 1)^2 + d(x - 1)^3$

Now, we need to find the constants a, b, c and d. You can find all of them by substituting $x = 1$, and by differentiating the left and right sides repeatedly. Which means,

$\frac{1}{x} = b + 2c(x - 1) + 3d(x - 1)^2, \quad -\frac{1}{x^2} = 2c + 6d(x - 1), \quad \frac{2}{x^3} = 6d$

which, at $x = 1$, gives you

$a = 0, \quad b = 1, \quad c = -\frac{1}{2}, \quad d = \frac{1}{3}$

and then

$\ln x \approx (x - 1) - \frac{(x - 1)^2}{2} + \frac{(x - 1)^3}{3}$

so that $\ln 2 \approx 1 - \frac{1}{2} + \frac{1}{3} = \frac{5}{6}$.

To go on, we need to use the formula above. To find M, we first need $f^{(n+1)}(x)$ with $n = 3$, which is $-6x^{-4}$. Remember the part above which says $|f^{(n+1)}(x)| \le M$: the maximum value of $|-6x^{-4}|$ is 6 (at $x = 1$) if we use values $1 \le x \le 2$ (the interval I containing a), so we have

$|R_3(2)| \le \frac{6}{4!}(2 - 1)^4 = \frac{1}{4}$

Thus, ln 2 = 5/6 within ± 1/4.
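The numbers in Example 1 are easy to confirm with a few lines of Python. This is just a sketch of the same calculation, with variable names of my own choosing; the bound uses the remainder estimate $|R_n(x)| \le \frac{M}{(n+1)!}|x - a|^{n+1}$ with $n = 3$, $a = 1$, $x = 2$ and $M = 6$.

```python
import math

def ln_cubic(x):
    """Cubic Taylor polynomial of ln x expanded at x = 1."""
    u = x - 1
    return u - u ** 2 / 2 + u ** 3 / 3

n = 3
M = 6.0  # maximum of |f''''(x)| = 6 / x^4 on [1, 2], attained at x = 1
approx = ln_cubic(2.0)
bound = M * abs(2 - 1) ** (n + 1) / math.factorial(n + 1)
actual_error = abs(math.log(2.0) - approx)

print("approximation:", approx)
print("error bound:  ", bound)
print("actual error: ", actual_error)
```

The actual error comes out noticeably smaller than the bound, which is what you should expect: M is an overestimate, so the theorem promises a ceiling on the error, not its exact size.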
EXAMPLE 2 (APPROXIMATING DECIMAL PLACES)

Use an nth Maclaurin polynomial for $e^x$ to approximate e to 5 decimal places accuracy. Find n.

(Note that if $f^{(n+1)}(x)$ turns out to be a sine or a cosine, then you can simply take M = 1. Useful information.)

Now, the different thing here compared to the previous example is that we don’t know n, so we can’t substitute any value for n (in fact, we are looking for n!). But we do have another piece of information, which is ‘to 5 decimal places’. We take that decimal place, give it a ± 50%, and now we know that the remainder must be smaller than 0.000005. So we have

$|R_n(1)| \le \frac{M}{(n+1)!} < 0.000005$

where M is an upper bound for $e^x$ on $[0, 1]$ (for instance $M = 3$, since $e < 3$). By trial and error, we find that the inequality first holds when $n = 9$. Therefore,

$e \approx 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \dots + \frac{1}{9!} \approx 2.71828$

To summarize, this is your checklist when you are dealing with such questions:

STEP 1: Write down the series f(x), and f(c) (the function substituted with the value you want).
STEP 2: Find the interval [a, x] (a is what the series is expanded at, and c is within this interval).
STEP 3: Find M (the upper bound), which is $|f^{(n+1)}(x)| \le$ [something].
STEP 4: Write down the inequality $|R_n(x)| \le$ [the expression above].
STEP 5: If required, write down [the expression above] ≤ [number of decimal places ± 50%].
STEP 6: Continue on with the estimation.

7.2 Taylor Series (Maclaurin series, limits)

Generally, Taylor series have a lot of uses. We can use them to do any of the following:

A. DERIVE A GIVEN FUNCTION

You were given a list of Maclaurin series in the last section. Now I show them to you again below:

These are not all though. You can still find and derive the Taylor or Maclaurin series of other functions like $\sin^{-1} x$, $\coth^{-1} x$ or $\lg x^2$. The method is the same: list down the Taylor or Maclaurin series of the function with unknown coefficients. For example,

$\sin^{-1} x = a + bx + cx^2 + dx^3 + ex^4 + \dots$

and you substitute $x = 0$ to get a. To get b, you differentiate once and substitute x = 0, and for c, differentiate twice (remember the factor of 2 that appears), and so on.
The coefficients a, b, c and so on might not follow a neat pattern like the functions listed above, but at least you have a reasonable polynomial to estimate the function in the absence of a calculator.

Besides, you could also combine 2 or more functions to find a new Taylor series for them. For example, $(1 + x)^2 \cos x$ can be derived from

$(1 + 2x + x^2)\left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots\right)$

Adding and subtracting functions (like $\sin x + \cos x$) or even substituting variables (like $e^{8x}$ or $\sin x^2$) can be handled easily too.

B. DIFFERENTIATE AND INTEGRATE THE SERIES TO GET OTHER RESULTS

Did you notice that power series also obey the rules of calculus? Taking $\cos x$ for an example,

$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots$

differentiating both sides gives

$-\sin x = -x + \frac{x^3}{3!} - \frac{x^5}{5!} + \dots$

This is very useful information. You can speed up the calculations if you are asked to derive the series of a function which is related to one of the known functions above.

By the way, once you are able to list out the terms of the series, you will want to learn how to write the derived series in summation notation as well. Read through your Maths T Sequence & Series, and try to make use of the knowledge you learn there.

C. FINDING LIMITS OF FUNCTIONS

When you are asked to find the limit of a complicated function as x → 0, you can actually make use of the Maclaurin series of the function. For example,

To help you, you might want to learn L’Hôpital’s rule as well. This rule comes in really handy in this situation. It states that if $f(a) = 0$, $g(a) = 0$, and $g'(a) \ne 0$, then

$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{f'(a)}{g'(a)}$

Use this rule when you get a 0/0 result. Remember that this rule only holds if the condition $f(a) = g(a) = 0$ is true.

D. SOLVING DIFFERENTIAL EQUATIONS NUMERICALLY

I believe you already know what differential equations are, just that you only know how to solve a few types of them. So here, we represent the solution of a differential equation as a Taylor series, and thus estimate the function for values of x close to a, when expanded at x = a.
I’ll show you an example:

Find the Taylor’s series solution for y up to and including terms in $x^4$ for the differential equation Hence, find y correct to 9 d.p. when x = 0.01.

CHAPTER 8

8. Differential Equations

8.1 1st Order Linear Differential Equations (integrating factor)

In Maths T, you learnt how to solve 2 types of differential equations, namely the separable variable and the homogeneous differential equations. In FMT, you will learn how to solve linear differential equations. A differential equation is linear if it is of the form

$\frac{dy}{dx} + ay = b$

where a and b are functions of x. It can be solved by introducing an Integrating Factor, $e^{\int a \, dx}$. This term is multiplied to the left and right of the equation, and then we will get

$\frac{d}{dx}\left(y e^{\int a \, dx}\right) = b e^{\int a \, dx}$

Integrating both sides, we get

$y e^{\int a \, dx} = \int b e^{\int a \, dx} \, dx$

which is an expression of y in terms of x.
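To make the method concrete, here is a small made-up example (not one from these notes) checked in Python. Take $\frac{dy}{dx} + y = x$. The integrating factor is $e^{\int 1 \, dx} = e^x$, so $ye^x = \int xe^x \, dx = (x - 1)e^x + C$, giving $y = x - 1 + Ce^{-x}$. The sketch below substitutes this back into the equation numerically; the function names, the step size and the test point are my own choices.

```python
import math

def y(x, C):
    # General solution of dy/dx + y = x obtained with the integrating factor e^x
    return x - 1 + C * math.exp(-x)

def residual(x, C, h=1e-6):
    """Left side minus right side of dy/dx + y = x, using a central difference for dy/dx."""
    dydx = (y(x + h, C) - y(x - h, C)) / (2 * h)
    return dydx + y(x, C) - x

for C in (-2.0, 0.0, 2.5):
    print(C, residual(0.7, C))
```

The residual is zero up to rounding error for every value of C, which is what it means for $y = x - 1 + Ce^{-x}$ to be the general solution.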
This method is very simple; let me give you an example:

Find the general solution of the differential equation

We start by expressing it in the form Which is

Now that we know the a, we can find the integrating factor,

Note that the integration in the integrating factor doesn’t need a constant, because it will eventually cancel out later. So multiplying both sides by it,

8.2 2nd Order Linear Differential Equations (complementary function, particular integral, general & particular solution, problem models)

In this section, we will be learning how to solve second order linear differential equations, both homogeneous and non-homogeneous.

HOMOGENEOUS CASE

A second order homogeneous linear differential equation has the form

$a\frac{d^2y}{dx^2} + b\frac{dy}{dx} + cy = 0$

where a, b and c are constants. We first make a smart guess (ansatz) that the solution has the form $y = Ae^{nx}$, where A and n are constants. Differentiating it yields

$\frac{dy}{dx} = nAe^{nx}, \quad \frac{d^2y}{dx^2} = n^2Ae^{nx}$

and once we substitute these into the differential equation and cancel $Ae^{nx}$, we get a quadratic equation of the form

$an^2 + bn + c = 0$

which we call the auxiliary equation. From here we can see that $y = Ae^{nx}$ is indeed a solution of the 2nd order differential equation, provided that the value of n satisfies this equation. Once we find the values of n, we can write down the general solution of the differential equation. However, the equation will give you one of 3 outcomes: 2 distinct real roots, 2 equal roots or 2 complex roots.

Case 1: 2 Distinct Roots

In this case, suppose the auxiliary equation gives you 2 roots $n_1$ and $n_2$. Your answer for y will be in the form of

$y = Ae^{n_1 x} + Be^{n_2 x}$

Remember that your initial guessed solution for the differential equation was $y = Ae^{nx}$? Notice that if $y = Ae^{nx}$ and $y = Be^{mx}$ are both solutions of the differential equation, then the sum of both solutions, $y = Ae^{nx} + Be^{mx}$, is also a solution of the differential equation. That is why our solution for y is the sum of both solutions. You may want to prove it.
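If you would like to see the ‘sum of solutions is also a solution’ claim at work before proving it, the sketch below checks it for one equation. I have picked $\frac{d^2y}{dx^2} + 3\frac{dy}{dx} + 2y = 0$, whose auxiliary equation $n^2 + 3n + 2 = 0$ has roots $-1$ and $-2$; the function name and the sample constants are arbitrary.

```python
import math

def residual(A, B, x):
    """Value of y'' + 3y' + 2y for y = A e^{-x} + B e^{-2x}, using exact derivatives."""
    y = A * math.exp(-x) + B * math.exp(-2 * x)
    dy = -A * math.exp(-x) - 2 * B * math.exp(-2 * x)
    d2y = A * math.exp(-x) + 4 * B * math.exp(-2 * x)
    return d2y + 3 * dy + 2 * y

cases = {
    "first root only (B = 0)": residual(2.0, 0.0, 0.4),
    "second root only (A = 0)": residual(0.0, -3.0, 0.4),
    "sum of both": residual(2.0, -3.0, 0.4),
}
for name, value in cases.items():
    print(name, value)
```

All three residuals vanish up to rounding error. The reason is that the equation is linear: the left side applied to a sum is the sum of the left side applied to each piece, and each piece gives zero on its own.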
Given the differential equation

d^2y/dx^2 + 3 dy/dx + 2y = 0

you find that the auxiliary equation has the roots n = –1 and n = –2. Do try substituting y = Ae^(-x), y = Be^(-2x) and y = Ae^(-x) + Be^(-2x) into the equation. All of them are consistent, aren’t they?

Case 2: 2 Equal Roots

Suppose your auxiliary equation gives you only one value of n. Your answer will be in the form of

y = (A + Bx)e^(nx)

When there is a repeated root, you multiply the second term by x. Try recalling the connection of this chapter with what you learnt in the chapter Recurrence Relations.

Case 3: Complex Roots

Suppose you get 2 complex roots, m + in and m – in. Your answer will then be in the form of

y = Ae^((m+in)x) + Be^((m–in)x)
y = e^(mx)(C cos nx + D sin nx)

Notice the second line of the equation. Remember the fact that e^((m+in)x) = e^(mx)(cos nx + i sin nx), and you get y = e^(mx)[(A + B)cos nx + i(A – B)sin nx], in which you represent the terms (A + B) and i(A – B) as C and D respectively. You may be surprised that D is actually a real constant, so somewhere along the way, A and B must have been complex.

As I said, these are the forms of general solutions that you can get. To get a particular solution, you need initial conditions (two of them for a second order equation), something like y = 1 when x = 0. The particular solution replaces the arbitrary constants A, B, C and D with actual real numbers.

NON-HOMOGENEOUS CASE

A second order non-homogeneous linear differential equation has the form

a d^2y/dx^2 + b dy/dx + cy = f(x)

Again, a, b and c are constants, and f(x) is a function of x, which is either a polynomial, a constant, an exponential function, a cosine or sine function, or a combination of any 2. Functions like tan x, sinh x or ln x are outside your syllabus; differential equations with those kinds of f(x) typically call for the Method of Variation of Parameters. Try googling it if you want to know more.

The solving method is easy. First you separate the differential equation into 2 parts.
You let the first part = 0, and this is solved just as above, by finding the auxiliary equation and then writing the answer in a form such as y = g(x) = Ae^(n1 x) + Be^(n2 x). This solution is called the complementary function (CF). The other part, f(x), gives the solution y = h(x), which is called the particular integral (PI). The sum of the CF and the PI is also a solution, so our final answer will be

y = g(x) + h(x)

Since you already know what to do with the CF, we will introduce methods to find the PI below, which depend on what f(x) is.

Case 1: f(x) is a Polynomial Function

You should just take the PI to be a polynomial function. For example,

d^2y/dx^2 + 3 dy/dx + 2y = x^2 + 4x – 3

You already know the CF from above, which is y = Ae^(-x) + Be^(-2x). Then to find the PI, you let y = Ax^2 + Bx + C, according to the degree of the polynomial. Differentiating, you get

dy/dx = 2Ax + B, d^2y/dx^2 = 2A

Substituting back, we get 2A + 3(2Ax + B) + 2(Ax^2 + Bx + C) = x^2 + 4x – 3. Comparing coefficients gives 2A = 1, 6A + 2B = 4 and 2A + 3B + 2C = –3, so A = 1/2, B = 1/2, C = –11/4. So in the end, our PI is

y = x^2/2 + x/2 – 11/4

and the general solution, being the sum of the CF and the PI, will be

y = Ae^(-x) + Be^(-2x) + x^2/2 + x/2 – 11/4

Try not to get confused between the constants of the CF and the PI; here I have 2 A’s and 2 B’s. I would suggest that you name the constants for the PI C, D and E instead.

This rule applies to a polynomial of any degree. However, there is an exception, when your auxiliary equation has a root n = 0. Since Ae^0 = A, you already have a constant term in the CF. So for your PI, you need to multiply your trial solution by an extra x. So if your f(x) is 4x + 3, your PI should be Bx^2 + Cx instead of Bx + C. Similarly, you can guess that if the auxiliary equation has a double root n = 0, you will then multiply your PI by x^2. Try relating this information with the chapter on Recurrence Relations.

Case 2: f(x) is an Exponential Function

This is easy. If f(x) = 5e^(2x), our PI will be just y = Ce^(2x). Just differentiate y to get dy/dx and d^2y/dx^2, substitute them into the equation, and find C.
Again in this case, there are exceptions. If your CF already has a term Ae^(2x), then like the above, you multiply the PI by x to give y = Cxe^(2x). If your CF is y = Ae^(2x) + Bxe^(2x), then your PI will be y = Cx^2 e^(2x), multiplying by x^2 this time. Not hard, I think. If you are given the same left-hand side as before with an f(x) made up of an e^x term and an e^(-2x) term, your CF is the same, y = Ae^(-x) + Be^(-2x). Your PI will be y = Ce^x + Dxe^(-2x), and you should solve the rest of the equation yourself.

Case 3: f(x) is a Cosine or Sine Function

If f(x) = 5sin 2x, or f(x) = 4cos 2x, or f(x) = 6sin 2x + 7cos 2x, your PI will be the same, which is y = Ccos 2x + Dsin 2x. Notice that whether you have only sines or only cosines, you still have to include both cosines and sines in your PI. The reason is simple: differentiating a sine gives a cosine and vice versa, so if you only use one of them, you generally cannot match the coefficients on both sides.

Again, there is an exception, which is when your auxiliary equation has purely imaginary roots, which give your CF a sine or cosine function of the same form. As usual, just multiply your PI by x. For example, if the auxiliary equation gives n = ±4i, the CF is y = A cos 4x + B sin 4x. So for an f(x) in sin 4x or cos 4x, your PI should be in the form of y = Cx cos 4x + Dx sin 4x. Differentiate it (might be complicated), substitute it, find the constants C and D, and give the general solution by adding the PI and CF. Should be straightforward.

Combinations of functions, like f(x) = x cos 3x, f(x) = xe^(4x), f(x) = e^(4x) sin 3x shouldn’t be hard for you to solve. The basic rule is that if your CF already has a term with the same form as f(x), then just multiply that term of the PI by x. If it doesn’t work, multiply by x^2 then.

SUBSTITUTION

If you recall what you learned in Maths T, you have already learned how to use the substitutions v = ax + by and y = vx to transform a complicated-looking differential equation into one that is solvable. You can apply those skills in 2nd order differential equations too.
Other kinds of substitution include x = u^0.5 and u = xy, but I want your attention on solving differential equations of the form You need to use the substitution From here, find dy/dx and d^2y/dx^2 by using the chain rule. Which in the end, gives you a differential equation of the form which is solvable.

PROBLEM MODELLING

Seriously, I have looked through many books, but none of them really teaches modelling for 2nd order differential equations. You should be familiar with modelling of 1st order differential equations though. So here, I have no choice but to introduce you to some university level stuff.

1. LRC Circuits

The potential differences across an inductor, a resistor and a capacitor are given by

V_L = L dI/dt = L d^2Q/dt^2, V_R = IR = R dQ/dt, V_C = Q/C

So this means that the total voltage across the 3 elements put in series equals

V(t) = L d^2Q/dt^2 + R dQ/dt + Q/C

I assume you know that L, R, C and Q mean inductance, resistance, capacitance and charge respectively. Here we see that the voltage V is a function of time, which makes this a non-homogeneous 2nd order linear differential equation. Solving the differential equation means finding an equation which relates the charge to time.

2. Oscillators

Remember from physics that a simple harmonic oscillator has the equation

mẍ + kx = 0

where m is the mass, and k is the spring constant. Notice that this is a 2nd order differential equation! Solving it gives you x in terms of t. A damped oscillator has an extra term in it,

mẍ + bẋ + kx = 0

where b is the drag constant. A forced oscillator, in turn, would be

mẍ + kx = F(t)

where the force F is a function of time, probably a sine or cosine function. You could have guessed that a forced damped oscillator would be

mẍ + bẋ + kx = F(t)

With this information, you are able to model a second order differential equation once you know all the factors m, b, k and F.
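The unforced oscillator equations above are homogeneous, so the three auxiliary-equation cases decide the shape of x(t). Here is a minimal sketch of that classification. The function names and the sample values m = 1, b = 2, k = 5 are my own assumptions for illustration. I write the complex roots as p ± iq so that they do not clash with the mass m.

```python
import cmath

def auxiliary_roots(a, b, c):
    """Roots of the auxiliary equation a*n^2 + b*n + c = 0 (a != 0)."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

def solution_form(a, b, c):
    """Name which of the three homogeneous cases applies."""
    d = b * b - 4 * a * c
    if d > 0:
        return "two distinct real roots: x = A e^(n1 t) + B e^(n2 t)"
    if d == 0:
        return "repeated root: x = (A + B t) e^(n t)"
    return "complex roots p ± iq: x = e^(p t) (C cos qt + D sin qt)"

# Hypothetical damped oscillator with m = 1, b = 2, k = 5
print(auxiliary_roots(1, 2, 5))
print(solution_form(1, 2, 5))
```

With floating-point coefficients the test d == 0 is fragile, so treat the repeated-root branch as reliable only for exact (integer) inputs.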
There are a whole lot more physics equations which require differential equations, like the famous Schrödinger’s Equation and other higher level stuff, which requires higher level physics. I had better stop here before I turn this into a physics lecture instead.

9. Number Theory

9.1 Divisibility (prime & composite numbers, unique factorisation, gcd & lcm, Euclid’s algorithm)

Number Theory is considered one of the hardest sections in Mathematics. It is the study of the very fundamentals of numbers, yet it can be very complicated. Information on this chapter for this level of study is very rare, so I hope you will appreciate everything that I have for you over here.

We have been learning division since standard 2. But today, we will look at it in a different manner. If a and b are integers with a ≠ 0, then we say that a divides b if there is an integer c such that b = ac. When a divides b we say that a is a factor of b and that b is a multiple of a. The notation a | b denotes that a divides b (which means there is no remainder). We write a ∤ b when a doesn’t divide b. For example, 2 | 4, but 4 ∤ 2. Take note that the notations 2 | 4 and 2/4 are 2 different things. The former is the notation for divisibility, while the latter is simply a fraction.

There are certain rules of divisibility that you should know. These are:

1. If a | b, b | c, then a | c.
You should know how to prove this. As above, a | b and b | c can be written as ak = b and bl = c, and therefore akl = bl = c. Here, k and l are integers.
2. If a | b, a | c, then a | (b + c) and a | (mb + nc) for any integers m and n.
3. If a | b, then a | bc.
The above 2 can also be proven with similar notation as 1.

Not every integer divides every other integer. For example, 2 does not divide 7, as it leaves a remainder of 1. Here we represent the above in an equation, which is

7 = 2 • 3 + 1

Here, 3 is the quotient. We denote the quotient as a div b, which in this case gives 7 div 2 = 3.
1 is the remainder, which we denote as a mod b, and here we have 7 mod 2 = 1. Note that a remainder cannot be negative. For example, –7 = 2 • (–3) – 1 is wrong, because it then gives us –7 div 2 = –3 and –7 mod 2 = –1, a negative remainder. It should be –7 = 2 • (–4) + 1, which in turn gives –7 div 2 = –4 and –7 mod 2 = 1. Try doing 7 mod –2 and 7 div –2, and see whether the answers are different.

A prime number is an integer greater than 1 that is only divisible by 1 and by the number itself. An integer greater than 1 which is not prime is called a composite number. The smallest prime number is 2, and it goes on as 3, 5, 7, 11, 13, 17, 19… and so on. The interesting thing about prime numbers is that there is no known simple formula that generates the sequence of prime numbers. So if we want to find a very huge prime number, we need to slowly divide the number by almost every possible number before we can say that it is prime. One very famous method used in the past is the sieve of Eratosthenes, which can be used to find all the primes below 100. It is done by first listing down all the numbers from 1 to 100. Then cross out 1, and slowly cross out the multiples of 2, 3, 5, 7 and so on (but not these numbers themselves), until you have nothing left to cross out. The numbers that remain are the primes! Another related result is the Prime Number Theorem. You might want to google it.

So how do you know whether a relatively small number is prime? There is a way to find out, at least a little faster than trying to divide the number by every number smaller than itself. If a number is not divisible by any prime less than or equal to its square root, then it is a prime number. This can be proven. If we have a composite number n such that ab = n, then if a > √n and b > √n, we would have ab > √n • √n = n, which is a contradiction. So at least one of the factors is no bigger than √n, and that factor has a prime divisor no bigger than √n too. Although this does speed up the process of finding primes, it is still quite a slow method.

Prime numbers are the building blocks of all numbers.
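The square-root test and the sieve described above can be sketched in a few lines. This is only an illustration under my own naming (is_prime and sieve are not from any syllabus text). For simplicity, the trial division below tries every integer up to the square root, not just the primes, which gives the same answer with a little extra work.

```python
import math

def is_prime(n):
    """Trial division by every integer from 2 up to floor(sqrt(n))."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def sieve(limit):
    """Sieve of Eratosthenes: list of all primes <= limit."""
    flags = [True] * (limit + 1)
    flags[0:2] = [False, False]  # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            # cross out the multiples of p, starting from p*p
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return [n for n, keep in enumerate(flags) if keep]

print(sieve(100))
print(is_prime(641))
```

The sieve starts crossing out at p • p because any smaller multiple of p has a smaller prime factor and has already been crossed out.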
The Fundamental Theorem of Arithmetic states that: Every positive integer > 1 can be written uniquely as a prime or as the product of 2 or more primes where the prime factors are written in order of non-decreasing size. This is what we call prime factorisation. For example, 4 = 2^2, 100 = 2^2 • 5^2, 641 = 641 and so on. We can write down any positive integer as a product of prime powers, a = 2^x • 3^y • 5^z • 7^w … and so on.

There’s a lot to talk about with prime numbers. One famous argument proves that there are infinitely many primes. Suppose you label every prime number as p1, p2, p3 and so on, and suppose you have found the greatest prime number in the world, called pn. Now write down the number a = p1p2p3…pn + 1. Dividing a by any of the primes p1 to pn leaves a remainder of 1, so none of them divides a. Then either a itself is prime, or it has a prime factor that is not in our list. Either way there is a prime greater than pn. This contradicts what we said earlier about finding the greatest prime number, and therefore proves that there are indeed infinitely many primes. Another 2 interesting things about prime numbers are Goldbach’s Conjecture and the Twin Prime Conjecture. Go look them up if you are free.

Now, let’s move on to the gcd and lcm. See whether this sounds familiar from your Form 1 Mathematics. gcd is the greatest common divisor (you are probably more familiar with the name highest common factor, or HCF), while lcm is the lowest common multiple. Here we write k = gcd (a, b) to mean “k is the greatest common divisor of the integers a and b”. Similarly, k = lcm (a, b) means “k is the lowest common multiple of the integers a and b”. For example, gcd (4, 6) = 2 and lcm (5, 6) = 30. Relating this back to prime numbers, for any 2 integers a and b, if gcd (a, b) = 1, we say that they are relatively prime. For example, 5 and 6 are relatively prime.

Do you still remember the method to find your lcm and gcd in Form 1? You had to draw out something like a ladder or so.
But here, we will use another method, which has something to do with the prime factorization. For example,

Find gcd (120, 500) and lcm (120, 500).

We first start by representing the numbers 120 and 500 in terms of primes.

120 = 2^3 • 3 • 5
500 = 2^2 • 5^3

Now, the formulas to find the gcd and lcm are easy. If a = p1^a1 • p2^a2 • … • pn^an and b = p1^b1 • p2^b2 • … • pn^bn, then

gcd (a, b) = p1^min(a1,b1) • p2^min(a2,b2) • p3^min(a3,b3) • … • pn^min(an,bn)
lcm (a, b) = p1^max(a1,b1) • p2^max(a2,b2) • p3^max(a3,b3) • … • pn^max(an,bn)

You first compare the primes present in the 2 numbers 120 and 500. p1^max(a1,b1) means p1 raised to the larger of the powers of that particular prime p1 in the 2 numbers a and b, while p1^min(a1,b1) uses the smaller power. So plugging in the numbers, we have

gcd (120, 500) = 2^min(3,2) • 3^min(1,0) • 5^min(1,3) = 2^2 • 3^0 • 5^1 = 20
lcm (120, 500) = 2^max(3,2) • 3^max(1,0) • 5^max(1,3) = 2^3 • 3^1 • 5^3 = 3000

From here, we obtain a new formula, as we can see that

ab = gcd (a, b) • lcm (a, b)

The method described for computing the greatest common divisor of 2 integers, using the prime factorizations of these integers, is inefficient. The reason is that it is time consuming to find prime factorizations. Now I will teach you a more efficient method of finding the gcd, called the Euclidean Algorithm (also Euclid’s Algorithm). It is named after the ancient Greek mathematician Euclid, who included a description of this algorithm in his book The Elements. Let’s start with an example.

Find gcd (91, 287).

First, we divide the bigger number by the smaller number. Then, we take the divisor and the remainder of that equation and repeat the process with them, until there is no more remainder. The last non-zero remainder is the gcd that we are looking for. So we have

287 = 91 • 3 + 14
91 = 14 • 6 + 7
14 = 7 • 2
∴ gcd (91, 287) = 7

You might be puzzled as to how this method works. Basically, this method comes from the result

if a = bq + r, then gcd (a, b) = gcd (b, r)

From a = bq + r I know that if some integer k divides both b and r, it must divide a as well.
Now I turn the equation around:

a – bq = r

If some integer divides both a and b, then it must divide r. So the common divisors of a and b are exactly the common divisors of b and r, and the biggest of them must be the same integer, which is gcd (a, b), and also gcd (b, r). Therefore, the Euclidean Algorithm is valid.

9.2 Modular Arithmetic (linear congruences, Chinese Remainder Theorem)

You’ll terribly ‘love’ this section. Consider how you read the time on a clock. Every time the short hand goes one round, 12 hours have passed. So when the short hand goes past another hour, it will be 13 hours, and the time might be called 13 o’clock. We know, however, that 13 o’clock is actually 1 o’clock. The same goes for 25 o’clock; it still means the same thing. We say that the clock follows a modular system.

Modular Arithmetic is the calculation of numbers in a modular system. The clock’s system is of modulo 12. When two numbers a and b are congruent to each other in the same modulo, we denote it by

a ≡ b (mod m)

This equation is read as ‘a is congruent to b modulo m’. For example, 13 ≡ 1 (mod 12) means that 13 is the same as 1 in a modulo 12 system. Note that the main equation is the part on the left hand side, 13 ≡ 1, while the right hand side, (mod 12), tells you that this equation is valid only in modulo 12.

This modulo system also has another explanation. a ≡ b (mod m) means that a and b give the same remainder when divided by m. Notice that 13 divided by 12 gives remainder 1, while 1 divided by 12 also gives the remainder 1. Or using the mod terminology, we say that

a mod m = b mod m

Take note that a ≡ b (mod m) and a = b mod m carry different meanings. The latter says that ‘a is the remainder when b is divided by m’.

Now, bringing divisibility in, we say that

a ≡ b (mod m) if and only if m | (a – b)

Can you see that m divides the difference a – b? If that is the case, a and b differ by a multiple of m. So this means that 49 ≡ 37 ≡ 25 ≡ 13 ≡ 1 (mod 12).
You just add 12 to the number, and you get another number which is congruent to it modulo 12. If I convert the notation a ≡ b (mod m) into algebra, it can be written as a = b + km, where k is an integer (try verifying this with the divisibility notation above).

So to summarize things: When a ≡ b (mod m), then
a mod m = b mod m
m | (a – b)
a = b + km

Before we go into solving linear congruences, we need to know some basic rules of modular arithmetic. You can prove the rules below yourself, so try doing it. If a ≡ b (mod m) and c ≡ d (mod m), then

1. a + c ≡ b + d (mod m)
2. a – c ≡ b – d (mod m)
When in the same modulo m, the addition and subtraction rules work as usual. This will be useful when you are solving simultaneous congruences. They can be proven by using the algebraic form, a = b + km, c = d + lm.
3. ac ≡ bd (mod m)
This is also important, and is proven with the same method as above.
4. a^k ≡ b^k (mod m)
where k is a positive integer. I’ll prove this one here for you: When a – b = jm for some integer j, then a^k – b^k = (a – b)(a^(k-1) + a^(k-2) b + a^(k-3) b^2 + … + ab^(k-2) + b^(k-1)), which is a multiple of (a – b). Therefore, a^k – b^k = lm, where l is an integer, and therefore
a^k – b^k ≡ 0 (mod m)
a^k ≡ b^k (mod m)
5. ak ≡ bk (mod m)
The congruence holds even when both sides are multiplied by an integer k. Same proof as 1, 2 and 3.

Next, try proving both the equations below (make use of the fact that a ≡ (a mod m) (mod m)):
6. (a + b) mod m ≡ [(a mod m) + (b mod m)] (mod m)
7. ab mod m ≡ [(a mod m)(b mod m)] (mod m)

8. The Simplification Law
If c | a, c | b, c | m, and a ≡ b (mod m), then
a/c ≡ b/c (mod m/c)
To summarize this rule, a constant c can only be divided out from a, b and m if it divides all of them. Provable too.

Here’s another one, not to be confused with the former: the cancellation law. If gcd (c, m) = 1, then
9. ac ≡ bc (mod m) ⇒ a ≡ b (mod m)
You can prove this too. Suppose ac – bc = (a – b)c = km.
Since gcd (c, m) = 1, c and m have no common divisors other than 1, and therefore c | k. Since c divides this integer k, c can be cancelled out, and thus a – b = nm for some integer n. Here we see that a ≡ b (mod m), which was to be shown.

FINDING THE INVERSE

In ordinary arithmetic, b, the multiplicative inverse of a number a, is such that ab = 1. Here, b is actually the reciprocal of the number a. In modular arithmetic, we are going to look for an inverse b of a such that

ab ≡ 1 (mod m)

Let us recall the Euclidean Algorithm. We learnt that we could find gcd (a, m) by dividing the bigger number by the smaller number, then dividing the smaller number by the remainder, and so on until there is no remainder. Indeed, we can make use of this to write the gcd as a linear combination of these 2 integers, such that

gcd (a, m) = m • n + a • b

where n and b are integers. If gcd (a, m) = 1, then an inverse of a exists, and the integer b happens to be the inverse of a. We will see why this is true in the following example:

Find gcd (123, 2347) and write it as a linear combination of these integers, and further find the inverse of 123 modulo 2347.

2347 = 123 • 19 + 10
123 = 10 • 12 + 3
10 = 3 • 3 + 1
3 = 1 • 3
∴ gcd (123, 2347) = 1

Now, to get the linear combination thingy, we have to reverse all of the above equations. Let me rewrite them:

10 = 2347 – 123 • 19 (1)
3 = 123 – 10 • 12 (2)
1 = 10 – 3 • 3 (3)

Now we will do some back substitution. We want an equation for gcd (123, 2347) (which is 1) in terms of 123 and 2347. We start with equation (3), and substituting equation (2), we have

1 = 10 – 3 • (123 – 10 • 12)
  = 10 – 3 • 123 + 10 • 36
  = 10 • 37 – 3 • 123

Repeating the process with equation (1),

1 = (2347 – 123 • 19) • 37 – 3 • 123
1 = 2347 • 37 – 123 • 706

We have now written gcd (123, 2347) as a linear combination of the two numbers. This is what we call the extended Euclidean Algorithm.
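The back substitution above can be done mechanically by carrying the coefficients along with the remainders. Here is a minimal sketch of that idea (the function name extended_gcd and the variable names are my own). It is one common way to organise the extended Euclidean Algorithm, not necessarily how you would lay it out in an exam.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and g = a*s + b*t."""
    old_r, r = a, b      # remainders
    old_s, s = 1, 0      # coefficients of a
    old_t, t = 0, 1      # coefficients of b
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

g, s, t = extended_gcd(2347, 123)
print(g, s, t)       # the coefficients satisfy 2347*s + 123*t == g
print(t % 2347)      # the coefficient of 123, reduced into the range 0..2346
```

Each pass of the loop is one line of the division table above, so the quotients it uses are 19, 12, 3 and 3.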
Here, we find that the inverse of 123 modulo 2347 is –706. We see that

–706 • 123 ≡ –86838 ≡ 1 (mod 2347)

Note that every integer congruent to –706 modulo 2347 is also an inverse of 123, so it is best to represent the inverse of 123 as 1641, a positive integer less than 2347.

I haven’t told you why this works. Since gcd (a, m) = 1, and we know that it can be represented as a linear combination 1 = m • n + a • b, we can show that

m • n + a • b ≡ 1 (mod m)

You should understand this equation. If 1 = 3 – 2, then 1 ≡ 3 – 2 in whatever modulo, and that makes sense. Here, m • n ≡ 0 (mod m) for whatever n, which is obvious since m divides m • n completely. So in the end, we have a • b ≡ 1 (mod m), which was what we used just now.

Note that not all integers have inverses in a particular modulo. It is only in the case where gcd (a, m) = 1 that there will be an inverse.

By the way, for small moduli the inverse can also be found by trial and error. For example, take the inverse of 2 modulo 3. Try multiplying 2 by the numbers from 1 to 3, and you find that 2 • 2 ≡ 4 ≡ 1 (mod 3). Thus, 2 is the inverse of 2 modulo 3.

SOLVING LINEAR CONGRUENCES

A linear congruence has the form

ax ≡ b (mod m)

in which we want to find x. Relating this to the section above, it has exactly one solution modulo m if gcd (a, m) = 1, and it can then be solved by finding the inverse of a. Let’s try an example:

Solve the linear congruence 3x ≡ 4 (mod 7).

We check that gcd (3, 7) = 1, so an inverse of 3 exists, and thus the solution exists. Using the extended Euclidean Algorithm, we get the inverse of 3 as –2. So multiplying both sides by –2,

–2 • 3x ≡ –2 • 4 (mod 7)

We know that –2 • 3 ≡ 1 (mod 7), and therefore

x ≡ –8 ≡ 6 (mod 7)

Substituting 6 back into x, you can check that the answer is correct. Besides, any integer which is congruent to 6 modulo 7, like 13, 20, –8, –1 and so on, is also a solution of the linear congruence.
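The inverse method above translates almost directly into code. This is a minimal sketch for the case gcd (a, m) = 1 only; the function name is my own, and it relies on Python’s built-in pow(a, -1, m), which computes a modular inverse in Python 3.8 and later.

```python
from math import gcd

def solve_linear_congruence(a, b, m):
    """Solve a*x ≡ b (mod m) when gcd(a, m) == 1; returns x in 0..m-1."""
    if gcd(a, m) != 1:
        raise ValueError("this sketch only handles gcd(a, m) = 1")
    inverse = pow(a, -1, m)   # modular inverse of a, Python 3.8+
    return (inverse * b) % m

x = solve_linear_congruence(3, 4, 7)
print(x, (3 * x) % 7)   # the second number should equal 4
```

The built-in returns the inverse already reduced into the range 0 to m – 1, so for 3 modulo 7 it gives 5 rather than –2; both represent the same inverse.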
In cases where gcd (a, m) ≠ 1, there are solutions too, but only if gcd (a, m) | b, and there are then gcd (a, m) solutions modulo m. For example, 2x ≡ 6 (mod 8) has gcd (2, 8) = 2 solutions, but 2x ≡ 5 (mod 8) has no solution.

Let’s try to solve the linear congruence 2x ≡ 6 (mod 8). You can solve it as follows: Using the simplification law, you see that 2 divides 2, 6 and 8 and therefore

x ≡ 3 (mod 4)

which is in another modulo system. If you want the solution to be in the same modulo system, then you need to do some modification. By looking at the equation, you know that x ≡ 3 (mod 8) is one solution. The other solution is found by adding the new modulus you got above, which is 4, to 3. You get another solution,

x ≡ 7 (mod 8)

So your solution for the linear congruence 2x ≡ 6 (mod 8) is x ≡ 3 (mod 8), x ≡ 7 (mod 8). This same method applies in general: When there are 10 solutions, you keep on adding the new modulus to the existing answer, until you get 10 solutions.

Let’s try another one, 2x ≡ 6 (mod 9). Using rule number 9 above, you can quickly see that gcd (2, 9) = 1, and therefore x ≡ 3 (mod 9). Try not to confuse this one with the one above.

SIMULTANEOUS LINEAR CONGRUENCES

Similar to the one above, now you have 2 congruences with 2 unknowns, under the same modulo. Let’s consider a system of linear congruences with 2 unknowns:

ax + by ≡ k (mod m)
cx + dy ≡ n (mod m)

We first write this in matrix form:

[[a, b], [c, d]] [x, y]^T ≡ [k, n]^T (mod m)

For this system of congruences to have a unique solution, there must be an inverse for the matrix modulo m. This means that ad – bc must not be congruent to zero, and its inverse modulo m must exist. Let’s multiply the left and right hand sides by the adjoint matrix:

[[d, –b], [–c, a]] [[a, b], [c, d]] [x, y]^T ≡ [[d, –b], [–c, a]] [k, n]^T (mod m)

Now we get 2 linear congruences,

(ad – bc) x ≡ (dk – bn) (mod m)
(ad – bc) y ≡ (an – ck) (mod m)

and for each of these linear congruences to have a unique solution, again we must make sure that gcd (ad – bc, m) = 1 holds. With that you can solve the above 2 linear congruences for x and y. This kind of question came out in STPM 2009, my year.
Try solving it with the method I just showed you.

QUADRATIC RESIDUE MODULO M

A quadratic congruence modulo m has the form

x^2 ≡ q (mod m)

and when it has a solution, q is called a quadratic residue modulo m. You are supposed to solve the congruence for x. I don’t know of any short cut to solve such a problem, but one way is to list out all the possible values, draw a table, and find the answer. For example,

Solve the quadratic congruence x^2 ≡ 2 (mod 7).

We proceed to draw a table:

x            0  1  2  3  4  5  6
x^2 mod 7    0  1  4  2  2  4  1

Therefore, we conclude that x ≡ 3 (mod 7) and x ≡ 4 (mod 7). If you have noticed, we could actually solve linear congruences with the above trial & error method too.

If m is big but can be factorised into relatively prime factors, we can break the modulo system up. For example,

x^2 ≡ 14 (mod 35)

We can make it into 2 congruences, namely x^2 ≡ 14 ≡ 0 (mod 7) and x^2 ≡ 14 ≡ 4 (mod 5). Tabulating as before,

x^2 ≡ 0 (mod 7) has the solution x ≡ 0 (mod 7).
x^2 ≡ 4 (mod 5) has the solutions x ≡ 2 (mod 5) and x ≡ 3 (mod 5).

x ≡ 0 (mod 7) means that x ≡ 0, 7, 14, 21, 28 (mod 35) are candidates modulo 35.
x ≡ 2 (mod 5) means that x ≡ 2, 7, 12, 17, 22, 27, 32 (mod 35) are candidates modulo 35.
x ≡ 3 (mod 5) means that x ≡ 3, 8, 13, 18, 23, 28, 33 (mod 35) are candidates modulo 35.

We find the intersection of x ≡ 0 (mod 7) and x ≡ 2 (mod 5), and we get x ≡ 7 (mod 35).
We find the intersection of x ≡ 0 (mod 7) and x ≡ 3 (mod 5), and we get x ≡ 28 (mod 35).
So our final solution is x ≡ 7, 28 (mod 35).

MODULAR EXPONENTIATION

I don’t think this is in the syllabus, but it is good for you to know. Modular exponentiations are of the form a^n mod m. You are normally asked to compute one with a very big value of n. For example,

Find 3^101 mod 100.

First, do you still remember what binary numbers are? Express n in binary form by repeatedly dividing the number by 2, writing the remainder by the side.
Recall your Form 4 Maths: So we get

101 = (1100101)_2 = 2^6 + 2^5 + 2^2 + 2^0 = 64 + 32 + 4 + 1

Substituting it back into the expression,

3^(64+32+4+1) mod 100 = 3^64 • 3^32 • 3^4 • 3^1 mod 100

Now, we need to tabulate the values congruent to 3^64, 3^32, 3^4 and 3^1 by squaring repeatedly (all modulo 100):

3^2 ≡ 9
3^4 ≡ 9^2 ≡ 81
3^8 ≡ 81^2 ≡ 61
3^16 ≡ 61^2 ≡ 21
3^32 ≡ 21^2 ≡ 41
3^64 ≡ 41^2 ≡ 81

Now that you know all the values, substitute them back into the expression,

3^64 • 3^32 • 3^4 • 3^1 ≡ 81 • 41 • 81 • 3 ≡ 3 (mod 100)

Spend some time understanding my calculations. If not, just pray that it won’t come out in exams.

CHINESE REMAINDER THEOREM

In the 1st century, the Chinese Mathematician Sun-Tsu asked: There are certain things whose number is unknown. When divided by 3, the remainder is 2; when divided by 5, the remainder is 3; and when divided by 7, the remainder is 2. What will be the number of things? This puzzle can be translated into the following question: What are the solutions of the system of congruences

x ≡ 2 (mod 3)
x ≡ 3 (mod 5)
x ≡ 2 (mod 7) ?

The Chinese Remainder Theorem, named after the Chinese heritage of problems involving systems of linear congruences, states that when the moduli of a system of linear congruences are pairwise relatively prime, there is a unique solution of the system modulo the product of the moduli. I will omit the proof, because I don’t understand it either. Here are the steps to solve this kind of problem:

Firstly, for a system of linear congruences with different moduli,

x ≡ a (mod m)
x ≡ b (mod n)
x ≡ c (mod o)

we construct a number M, the product of the moduli,

M = m • n • o

Then, we construct the numbers Mm, Mn and Mo, each being the product of all the moduli in the system other than its own. Which means,

Mm = M/m = n • o
Mn = M/n = m • o
Mo = M/o = m • n

Then, find the inverses M̅m, M̅n and M̅o of Mm, Mn and Mo respectively:

Mm • M̅m ≡ 1 (mod m)
Mn • M̅n ≡ 1 (mod n)
Mo • M̅o ≡ 1 (mod o)

And finally, your answer will be:

x ≡ a • Mm • M̅m + b • Mn • M̅n + c • Mo • M̅o (mod M)

Let’s try to solve Sun-Tsu’s problem.

x ≡ 2 (mod 3)
x ≡ 3 (mod 5)
x ≡ 2 (mod 7)

M = 105
M3 = 35 ≡ 2 (mod 3), inverse M̅3 is 2.
M5 = 21 ≡ 1 (mod 5), inverse M̅5 is 1.
M7 = 15 ≡ 1 (mod 7), inverse M̅7 is 1.
∴ x ≡ (2 • 2 • 35) + (1 • 3 • 21) + (1 • 2 • 15) ≡ 233 ≡ 23 (mod 105)

Note that you cannot let M3 = 2, M5 = 1 and M7 = 1, as you will get a totally different answer. However, each inverse can be replaced by any other number congruent to it in its particular modulo.

FERMAT’S LITTLE THEOREM

If p is a prime number and a is an integer not divisible by p, then

a^(p-1) ≡ 1 (mod p)

Furthermore, for every integer a,

a^p ≡ a (mod p)

This theorem is here to help you identify whether a congruence can be solved easily. Similarly, I won’t prove it, so just keep this theorem in mind and use it if needed.

10. Graph Theory

10.1 Graphs (simple, complete, bipartite)

In mathematics and computer science, graph theory is the study of graphs, mathematical structures used to model pairwise relations between objects from a certain collection. A graph, G = (V, E), consists of V, a nonempty set of vertices / nodes, and E, a set of edges. In other words, a graph is a discrete structure consisting of vertices, and edges that connect these vertices. Each edge has either one or two vertices associated with it (its endpoints). An edge is said to connect its endpoints. A graph looks something like this: As you can see, a and b are vertices, while e and f are edges. The edge g is called a loop. The vertex set V = {a, b}.

In this section, there will be many terminologies which you should remember, and whose definitions you should be able to write down in your exam. Here we will be learning the different kinds of graphs and their names:

An infinite graph is a graph with an infinite vertex set (or rather, an infinite number of vertices). The definition of a finite graph is just the opposite. Throughout this section, we will only be learning about graphs with a finite number of edges and vertices.

A simple graph is a graph in which each edge connects two different vertices and where no two edges connect the same pair of vertices.
A multigraph is a graph that may have multiple edges connecting the same pair of vertices, while a pseudograph is a graph that may include loops as well as multiple edges connecting the same pair of vertices. The 3 pictures below illustrate a simple graph, a multigraph and a pseudograph: Notice for the multigraph, there are 2 edges connecting both a to b and a to c, and 3 edges connecting e to f. As for the pseudograph, there are loops at the vertices e and f.

The complement of a graph G, written G̅, has the same vertices as G, but wherever there is an edge between vertices a and b in G, there is no edge in G̅, and wherever there isn’t an edge between vertices a and b in G, an edge is added in G̅. This only applies to simple graphs. For example, below is a graph and its complement:

All the above graphs are undirected, which means that one can traverse an edge in both directions. A directed graph (or digraph) consists of a nonempty set of vertices and a set of directed edges (or arcs). Each directed edge is associated with an ordered pair of vertices. The directed edge associated with the ordered pair (u, v) is said to start at u and end at v. In other words, we say that u is adjacent to v, while v is adjacent from u. Notice the different uses of { } and ( ) brackets for undirected and directed graphs. Below is a directed graph:

For the ordered pair of vertices (u, v), we say that u and v are adjacent, and we say that the edge is incident with / connects u and v. u is known as the initial vertex and v as the terminal vertex. Using a similar naming convention, we can describe a simple directed graph as a directed graph in which each edge connects two different vertices and where no two edges are associated with the same ordered pair of vertices. Then similarly, a directed multigraph can be defined.

An underlying undirected graph is the undirected graph that results from ignoring the directions of the edges. It is just the same graph without the arrows.
A mixed graph is a graph with both directed and undirected edges. The converse of a directed graph is the graph obtained by reversing all of its arrows. For every graph, we can come up with subgraphs, which are graphs whose vertices and edges are subsets of those of the initial graph. For example, the graph can be broken down into the 11 subgraphs below: An exercise for you: try to figure out whether you can determine the total number of subgraphs, given the number of vertices and edges. A bipartite graph is a simple graph whose vertex set V can be partitioned into 2 disjoint sets V1 and V2 such that every edge in the graph connects a vertex in V1 to a vertex in V2. Consider the bipartite graph below: Notice that I coloured the vertices with 2 colours, red and blue. No blue vertex is connected to another blue vertex, and no red vertex is connected to another red vertex. The vertices are partitioned into two sets, or parties, with every edge running between the two. Identifying a bipartite graph is simple: as long as you can colour the vertices with only 2 colours so that adjacent vertices get different colours, it is a bipartite graph. For example, you colour the first vertex blue. The vertices adjacent to the first vertex must be coloured red, and if you can carry on to colour all the vertices with 2 colours such that no two adjacent vertices have the same colour, then it is a bipartite graph. Notice also that a graph is bipartite if and only if it has no odd cycles (cycles of odd length). We will learn about cycles in the next section. Now there are 5 types of special simple graphs I want to introduce:
1. Complete Graph K_n
This graph is a simple graph that contains exactly 1 edge between each pair of distinct vertices. In other words, this graph has the maximum number of edges it can have, and adding another edge between any 2 vertices will turn it into a multigraph. The graphs look as follows: For the K_4 graph, it has 4 vertices, and every vertex is connected to the other 3 vertices.
By simple calculations, a K_n graph has n vertices and n(n − 1)/2 edges.
2. Cycle Graph C_n
This graph, where n ≥ 3, consists of n vertices and n edges. Strictly speaking, C_2 is not a cycle graph, as n < 3. Notice that every vertex is connected to exactly two other vertices. It looks like a regular polygon with n sides.
3. Wheel Graph W_n
This graph looks like a wheel with n sides. We obtain the wheel when we add an additional vertex to the cycle C_n, for n ≥ 3, and connect this new vertex to each of the n vertices in C_n by new edges. A W_n graph has n + 1 vertices and 2n edges.
4. n-Dimensional Hypercube, Q_n
This graph, also known as the n-cube, is the graph whose vertices represent the 2^n bit strings of length n. Two vertices are adjacent if and only if the bit strings that they represent differ in exactly one bit position. I don’t think this graph is in the syllabus, but I think it will be good for you to know: This graph has 2^n vertices and n·2^(n−1) edges. Try proving this if you are free.
5. Complete Bipartite Graph, K_{m,n}
This graph is a bipartite graph with |V1| = m and |V2| = n in which every vertex in V1 is joined to every vertex in V2 by exactly 1 edge. Note that the number of edges is |E| = mn, and there are m + n vertices.
Now that we know about the structure of graphs, we shall get into a little calculation. The degree of a vertex is the number of edges incident with it, except that a loop at a vertex contributes 2 to the degree of that vertex. The degree of a vertex v is denoted by deg (v). When deg (v) = 0, we say that the vertex is isolated, and when deg (v) = 1, we say that the vertex is pendant. We now want to find the relationship between the sum of the degrees of the vertices and the number of edges. The Handshaking Theorem states that the sum of the degrees of the vertices is double the number of edges. In equation form, we have
$\sum_{v \in V} \deg(v) = 2|E|$
This theorem has many implications. One of them is that a graph cannot exist if the sum of the degrees of its vertices is odd.
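The vertex and edge counts of the special graphs, and the Handshaking Theorem, can be checked numerically. Below is a minimal Python sketch; the function names and the small pseudograph are made up for illustration and are not from the syllabus. Graphs are stored as plain edge lists, and a loop is written as an edge (v, v).

```python
from collections import Counter
from itertools import combinations


def degrees(edges):
    """Degree of each vertex from an edge list; a loop (v, v) adds 2 to deg(v)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # for a loop u == v, so that vertex gains 2 in total
    return deg


def complete_graph(n):
    """Edges of K_n on the vertices 0..n-1."""
    return list(combinations(range(n), 2))


def wheel_graph(n):
    """Edges of W_n: the cycle C_n on 0..n-1 plus a hub vertex labelled n."""
    rim = [(i, (i + 1) % n) for i in range(n)]
    spokes = [(n, i) for i in range(n)]
    return rim + spokes


def hypercube(n):
    """Edges of Q_n: vertices are the integers 0..2^n - 1 read as bit strings."""
    return [(v, v ^ (1 << b)) for v in range(2 ** n) for b in range(n)
            if v < v ^ (1 << b)]  # flip one bit; keep each edge only once


def complete_bipartite(m, n):
    """Edges of K_{m,n}: every vertex of V1 joined to every vertex of V2."""
    return [(("V1", i), ("V2", j)) for i in range(m) for j in range(n)]


# Edge counts: n(n-1)/2, 2n, n * 2^(n-1) and mn respectively.
print(len(complete_graph(5)), len(wheel_graph(4)),
      len(hypercube(3)), len(complete_bipartite(2, 3)))  # 10 8 12 6

# A made-up pseudograph: a loop at "a" and a double edge between "a" and "b".
pseudo = [("a", "a"), ("a", "b"), ("a", "b"), ("b", "c")]
deg = degrees(pseudo)
print(dict(deg))                           # {'a': 4, 'b': 3, 'c': 1}
print(sum(deg.values()), 2 * len(pseudo))  # 8 8
```

The last line prints the two sides of the Handshaking Theorem, which agree even when loops and multiple edges are present.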
In the case of directed graphs, we denote by deg+ (v) the out-degree, meaning the number of arcs pointing away from the vertex, while the in-degree is denoted by deg- (v), which is the number of arcs pointing towards the vertex. Modifying the handshaking theorem, we have
$\sum_{v \in V} \deg^{-}(v) = \sum_{v \in V} \deg^{+}(v) = |E|$

10.2 Paths & Cycles (walk, trail, circuit, cycle, Eulerian, Hamiltonian)
We have already learnt the different types of graphs. Now we are going to learn the properties of these graphs. This section, again, will be full of terminology to be remembered. Some terms have quite similar meanings, so take care not to confuse them.
1. A walk is an alternating sequence of vertices and edges of a graph. A closed walk is defined to be a walk which starts and ends with the same vertex.
2. A path is a sequence of edges that begins at a vertex of a graph and travels from vertex to vertex along the edges of the graph. A simple path is then a path which doesn’t contain the same edge more than once. The length of a path is the number of edges that the path contains.
3. A trail is a walk that has no repeated edges. In books that use the word trail, a path usually means a walk that has no repeated vertices. Take note of the double meaning of the word path.
4. A circuit is a path that begins and ends at the same vertex in an undirected graph. A simple circuit is then a circuit with no repeated edges.
5. A cycle is a path that begins and ends at the same vertex in a directed graph.
6. The degree sequence lists all the vertices not by name, but by degree. It is listed in non-increasing order (which means decreasing lah, with ties allowed).
To illustrate these terms, I will show you an example below:
* You can construct many walks in this graph. An example of a walk here is (v1, e1, v2, e6, v3, e7, v4). Remember that a walk starts and ends with a vertex.
* An example of a path is e1, e6, e7 or e2, e5, e7, which both have length 3.
* A trail can be just the walk above, (v1, e1, v2, e6, v3, e7, v4), as long as there are no repeated edges.
* A circuit here can be something like e1, e6, e5, e2 or e3, e6, e2. It would be called a cycle if this were a directed graph. Note that for a cycle you can only follow the directions of the arrows.
* The degree sequence for v1, v2, v3, v4 is 6, 4, 3, 1. Remember that the sum of the degrees of the vertices should be an even number.

CONNECTEDNESS
A graph is connected when there’s a path between every pair of distinct vertices of the graph. This means that, if I start walking from any vertex, I am able to reach any other vertex by traversing the edges (by the way, you pass through a vertex, but traverse an edge). In the case of a directed graph, we say that it is strongly connected when there is a path a → b and a path b → a whenever a and b are vertices of the graph. It is weakly connected when there is a path between every 2 vertices in the underlying undirected graph, that is, when we are allowed to ignore the directions of the arrows. A connected component is a maximal connected subgraph, meaning a connected piece of the graph that cannot be enlarged without losing connectedness. For example, the graph has 3 connected components. For a directed graph, we can further say that a subgraph is a strongly connected component / strong component if it is a maximal strongly connected subgraph. Cut vertices (articulation points) are vertices whose removal, together with all the edges incident with them, produces more connected components than in the original graph. Cut edges (bridges) are edges whose removal produces a graph with more connected components than in the original graph.

ISOMORPHIC GRAPHS
Two graphs are isomorphic if the vertices of one can be matched one-to-one with the vertices of the other so that two vertices are adjacent in the first graph exactly when the matching vertices are adjacent in the second. To check this in practice:
1. Check that they have equal numbers of vertices and edges, the same degree sequence and the same lengths of simple circuits (these properties are known as graph invariants). If any invariant differs, the graphs are not isomorphic; if they all agree, the graphs may still fail to be isomorphic.
2. Follow paths that go through all vertices so that the corresponding vertices of the 2 graphs have the same degree, and check that this matching preserves adjacency.
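The ideas of connected components and graph invariants can be sketched in Python. This is only a rough sketch under stated assumptions: the function names and example graphs are made up, graphs are undirected and given as a pair (vertex list, edge list), and may_be_isomorphic compares invariants only, so it can rule isomorphism out but can never prove it.

```python
def adjacency(vertices, edges):
    """Adjacency sets of an undirected graph."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected_components(vertices, edges):
    """List of vertex sets, one per connected component (depth-first search)."""
    adj = adjacency(vertices, edges)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)  # unvisited neighbours
        seen |= comp
        components.append(comp)
    return components


def degree_sequence(vertices, edges):
    """Degrees in non-increasing order; a loop counts 2."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values(), reverse=True)


def may_be_isomorphic(g, h):
    """Compare invariants: False rules isomorphism out, True does not prove it."""
    (vg, eg), (vh, eh) = g, h
    return (len(vg) == len(vh) and len(eg) == len(eh)
            and degree_sequence(vg, eg) == degree_sequence(vh, eh)
            and len(connected_components(vg, eg))
            == len(connected_components(vh, eh)))


# Made-up examples: a path and a star, both with 4 vertices and 3 edges.
path = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
star = (["p", "q", "r", "s"], [("p", "q"), ("p", "r"), ("p", "s")])
print(degree_sequence(*path), degree_sequence(*star))  # [2, 2, 1, 1] [3, 1, 1, 1]
print(may_be_isomorphic(path, star))                   # False

# A graph with 3 connected components: {1, 2, 3}, {4, 5} and the isolated 6.
parts = connected_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)])
print(len(parts))                                      # 3
```

The path and the star agree on the number of vertices and edges but differ in degree sequence, which is enough to conclude that they are not isomorphic.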
For example, the graphs below: You notice something? A and R are isomorphic graphs. Also, F and T, K and X, and M, S, V and Z are isomorphic graphs. It is easy to identify isomorphic graphs with a small number of vertices, but remember to use the above rules if the graphs are really complicated.

EULERIAN TRAILS & CIRCUITS
The town of Königsberg, Prussia was divided into 4 sections by the branches of the Pregel River. These 4 sections included the 2 regions on the banks of the Pregel (A & B), Kneiphof Island (C), and the region between the 2 branches of the Pregel (D). In the 18th century, 7 bridges connected these regions. On Sundays, the residents took long walks through the town. They wondered whether it was possible to start at some location in the town, travel across every bridge without crossing any bridge twice, and return to the starting point. Do you want to try and see whether you can find a simple circuit for them? We’ll come back to this later. An Eulerian Trail (or Euler Path) is a simple trail that contains every edge in the graph, while an Eulerian Circuit (or Euler Circuit) is a simple circuit containing every edge in the graph. For example, an Euler Circuit in the graph is e3, e1, e2, e6, e7, e5, e4, e9, e10, e8. Take note that an Euler path doesn’t necessarily return to the same vertex, but an Euler circuit has to. By the way, the word ‘Euler’ is read as ‘Oil-lerr’, not ‘you-lerr’. It is found that there is a condition for these Eulerian properties to exist: A connected graph has an Eulerian trail (which is not a circuit) if and only if it has exactly 2 vertices of odd degree. A connected graph has an Eulerian circuit if and only if every vertex has even degree. Take a look at the graph above, and you will see that every vertex has degree 4. That explains why an Eulerian circuit exists. Now, let’s rephrase our previous question. Are we able to find an ‘Eulerian circuit’ for the Königsberg bridges? First, we draw the bridges as edges, and the river banks and islands as vertices.
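The modelling step just described, together with the degree conditions for Eulerian trails and circuits, can be sketched in Python. The bridge list below is an assumption: it follows the standard layout of the seven bridges (two from each bank to Kneiphof, one from each bank to D, and one between C and D). The function name euler_status is made up, and the check assumes the graph is connected.

```python
from collections import Counter

# Königsberg as a multigraph: one entry per bridge (assumed standard layout).
bridges = [("A", "C"), ("A", "C"), ("B", "C"), ("B", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]


def euler_status(edges):
    """Classify a connected multigraph by its vertices of odd degree.

    Returns ("circuit" | "trail" | "neither", sorted list of odd vertices).
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1  # a loop adds 2, so it never changes the parity
    odd = sorted(v for v, d in deg.items() if d % 2 == 1)
    if not odd:
        return "circuit", odd   # every degree even
    if len(odd) == 2:
        return "trail", odd     # a trail must start and end at the odd vertices
    return "neither", odd


print(euler_status(bridges))  # ('neither', ['A', 'B', 'C', 'D'])
```

A triangle such as [("a", "b"), ("b", "c"), ("c", "a")] gives "circuit", while removing one of its edges gives "trail" with the two end vertices reported as odd.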
We get a graph like this: Counting the degrees of the vertices, we find that all 4 vertices have odd degree. Therefore, we can conclude that it is impossible to cross all 7 bridges exactly once and come back to the same spot, and it is not even possible to cross every bridge exactly once without returning to the start. Although we didn’t find a solution, we proved that a solution can’t be found.

HAMILTONIAN PATHS & CYCLES
Just now we did this for edges; now we do the same for vertices. A Hamiltonian path (or Hamilton path) is a simple path in a graph that passes through every vertex exactly once, whereas a Hamiltonian cycle (or Hamilton circuit) is a simple cycle in a graph that passes through every vertex exactly once. Look at the graph below: The red lines show a Hamiltonian cycle. Note that the word cycle here does not necessarily mean that it has to be a directed graph. A Hamiltonian path doesn’t need to start and end at the same vertex. Surprisingly, there is no known simple necessary and sufficient criterion for the existence of Hamiltonian cycles. However, there are 2 theorems here which give sufficient conditions. I don’t think this is examinable:
1. Dirac’s Theorem: A simple graph G with n vertices, where n ≥ 3, has a Hamiltonian cycle if the degree of every vertex is at least n/2. Note that this theorem doesn’t say anything about graphs in which some vertex has degree less than half the number of vertices. Such a graph might or might not have a Hamiltonian cycle, and you have to check it yourself.
2. Ore’s Theorem: A simple graph G with n vertices, where n ≥ 3, has a Hamiltonian cycle if for every pair of nonadjacent vertices u and v, deg (u) + deg (v) ≥ n.
I believe Hamiltonian cycle questions in STPM won’t be too hard, so don’t worry too much about it for now. Questions on Euler and Hamilton paths and circuits will involve identifying them, or focusing on what you can do to make an Euler or Hamilton circuit exist.

OTHER EXTRA INFORMATION
These are probably out of the syllabus, but I treat them as extra information for you:
1.
Planar Graphs
A planar graph is a graph that can be drawn in the plane without any edges crossing. In this case, a region is an area bounded by a circuit in the graph. There are quite a few results for planar graphs, for example,
* If G is a connected planar simple graph, then G has a vertex of degree not exceeding 5.
* If G is a connected planar simple graph with e edges and v vertices, where v ≥ 3, then 3v − 6 ≥ e.
* If a connected planar simple graph has e edges and v vertices with v ≥ 3, and has no circuits of length three, then 2v − 4 ≥ e.
* Kuratowski’s Theorem states that a graph is nonplanar iff it contains a subgraph homeomorphic to K_{3,3} or K_5. 2 graphs are homeomorphic if they can be obtained from the same graph by a sequence of elementary subdivisions, an elementary subdivision being the action of putting a vertex in the middle of an edge.
The crossing number is the minimum number of crossings that can occur when a graph is drawn in the plane, where no 3 edges are permitted to cross at the same point. The thickness of a graph G is the smallest number of planar subgraphs of G that have G as their union.
2. Euler’s Formula
For a connected planar simple graph, the number of regions can be represented by the equation r = e − v + 2, where the unbounded region outside the graph is counted as well.
3. Chromatic Number
We define the chromatic number χ(G) to be the least number of colours needed for a colouring of a graph. A colouring is defined to be an assignment of a colour to each vertex of the graph so that no 2 adjacent vertices are assigned the same colour. Recall what you learnt about bipartite graphs in the previous section. The Four Colour Theorem states that the chromatic number of a planar graph is never greater than 4. We say that a graph G is chromatically k-critical if the chromatic number of G is k, but deleting any single edge e from G leaves a graph with chromatic number k − 1. We could modify the definition of colouring to describe edges too.
Edge colouring is an assignment of colours to edges so that edges incident with a common vertex are assigned different colours. The edge chromatic number is the smallest number of colours that can be used in an edge colouring of a graph. The edge chromatic number is at least the largest vertex degree Δ in the graph, and for a simple graph it is always either Δ or Δ + 1 (Vizing’s Theorem).
4. Vertex Basis
A vertex basis is a set of vertices where there’s a path to every vertex outside this set from vertices of this set, and there’s no path from any vertex in the set to another vertex in the set. In particular, any vertex in a directed graph which only points outwards but not inwards must belong to the vertex basis.

10.3 Matrix Representation (adjacency & incidence, problem models)
In many situations, we need to represent a graph in a mathematical form, as this helps us analyse the graph more easily. Before we learn how to use a matrix to represent a graph, we’ll first consider representing a graph as a list. We call it an adjacency list. By the way, the word ‘adjacent’ means ‘next to’. This list shows us the vertices and their adjacent vertices. For example, the graph below is represented by its adjacency list on the right. Notice that some adjacent vertices are counted twice due to multiple edges. You should be able to create an adjacency list from a given graph, and also sketch the graph from a given adjacency list. The above adjacency list was for an undirected graph. An adjacency list for a directed graph looks like the one below: Sometimes it really doesn’t matter whether you have dots or circles. Notice that this adjacency list has its top row labelled with the initial and terminal vertices, which differs from the previous one. As you know, anything that can be represented in a table can be represented in a matrix. There are 2 kinds of matrices that we can represent a graph with:
1. Adjacency Matrix
Consider the graph above.
Once we have labelled all the vertices, we can represent the graph as an adjacency matrix, with both the rows and the columns being the vertices. When there is an edge connecting 2 vertices v1 and v2, the corresponding slot in the matrix will show the number 1; if there are 2 edges, it will be 2, and so on, with 0 when there is no edge. Note that a loop at a vertex only counts as 1 edge here, unlike in the degree of a vertex, where we counted it as 2. The adjacency matrix of the graph above is like the one below. You don’t really need to label the vertices in the matrix; I put them there for clarity. An adjacency matrix for a directed graph is slightly different. The rows represent the initial vertices, while the columns represent the terminal vertices. Take a look at the graph and its adjacency matrix below: This matrix is the more useful one. It helps us to find the number of paths of a certain length between a pair of vertices. The power n in M^n corresponds to the length of the path: the entry in row i and column j of M^n is the number of paths of length n from vertex i to vertex j. So squaring the matrix above gives us a new matrix which shows us the number of paths of length 2 from one vertex to another. It means there is 1 path of length 2 from a to a, 3 paths of length 2 from b to a, and so on. When I find M^3, it will count paths of length 3, and so on. With this, we are able to find the number of paths of a particular length from one vertex to another, and also the length of the shortest path from one vertex to another, which is the smallest n for which the corresponding entry of M^n is not zero. In cases where you have a graph with more than 4 or 5 vertices, you may want to multiply only the particular row and column needed for the required answer, as evaluating the whole matrix will waste your time. One more thing to note is that the sum of a row is not the degree of the vertex for a pseudograph, since the count of loops will be wrong. You need to count the loops twice in order to get the correct degree.
2. Incidence Matrix
To create an incidence matrix, you need to label all the edges as well.
For this matrix, the rows are the vertices, but the columns are the edges. Each slot in the matrix is 1 if its vertex is an endpoint of that edge and 0 otherwise. See the graph above, and its incidence matrix below: Multiplying this matrix with itself won’t get you anywhere. Notice that every column adds up to 2, except when the edge is a loop. This is because every edge that is not a loop is connected to 2 vertices. Again, as with the adjacency matrix, adding up a row will not give you the degree of the vertex if the graph is a pseudograph.

PROBLEM MODELLING
Graph theory has many uses, as it can help us solve some complicated problems as well as some simple daily ones. In planning a flight route, you could construct a graph where the vertices are places and an edge exists when there is a flight between two places. In assigning a team to a particular project, we can use graphs to decide who does what, once we know the abilities and talents of the individuals. Some examples of graph models:
Acquaintance graphs: Vertices represent people. ab is an edge if a and b know each other.
Influence graphs: In studies of group behaviour, it is observed that certain people can influence the thinking of others. A digraph can be used to model this: uv is a directed edge if u can influence v.
Call graphs: Directed multigraphs can be used to model telephone calls in a network. Here telephones are represented as vertices and each call from a to b is represented by a directed edge ab. From this graph, we can actually deduce who has changed his phone number, by looking at whom a new phone line contacts and which phone has not been used for a long while.
Web graphs: The World Wide Web can be modelled as a digraph where each webpage is represented by a vertex. There is a directed edge ab if there is a link on a pointing to b.
Precedence graph: Computer programs can be executed more rapidly by executing certain statements concurrently.
However, certain statements depend on the results of other statements. Thus we create a precedence graph. Here each vertex represents a statement. A directed edge ab means that a must be executed before b.
Dual graph: The graph that results from a map, where each region of the map is represented by a vertex. For example, the red graph G’ is the dual graph of the existing graph G. The red dots represent the regions, and an edge exists between two vertices when the regions are adjacent to each other.
A more interesting one is the weighted graph, which is a graph that has a number assigned to each edge. These numbers could represent the distance between 2 cities or the airfare between 2 cities, and we could actually come up with the shortest / cheapest path from one place to another. Let me give you an example: Let’s say the letters A to G each represent the name of a town, and the numbers represent the time (in minutes) to get from one town to the next by car. We want to find the path which takes the shortest time to get from town A to town G. To do this, we will make use of Dijkstra’s Algorithm. I won’t explain this algorithm in words (as even I don’t understand what the textbook talks about), but I’ll briefly show you how it’s done. Starting from A, you find the shortest edge to an adjacent town. It is D, which takes 2 minutes. Then find the shortest time to reach B, and we see that it takes 7 minutes. Now that all the adjacent towns of A are done, we proceed to the towns adjacent to D and B, which are C, E and F. From A, you get to F by passing either B or D, don’t you? And obviously, you make use of the shortest times already found to those two towns. Comparing those 2 routes, we find that the shortest time to F is 11 minutes (ADF). We further find that the shortest times to E and C are 14 minutes (ABE) and 15 minutes (ABC) respectively. Using these 3 routes, we find that the shortest time to get to G is 22 minutes, by the ADFG route.
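The procedure above can be sketched in Python using a priority queue. Please treat this as a rough sketch and not the textbook’s wording of Dijkstra’s Algorithm. The edge weights below are hypothetical: the original figure is not available, so they were chosen only to agree with the times quoted in the text (AD = 2, AB = 7, ADF = 11, ABE = 14, ABC = 15, ADFG = 22). The function names dijkstra and route are also made up for illustration.

```python
import heapq

# Hypothetical weights in minutes, consistent with the times quoted in the text.
graph = {
    "A": {"B": 7, "D": 2},
    "B": {"A": 7, "C": 8, "E": 7, "F": 6},
    "C": {"B": 8, "G": 9},
    "D": {"A": 2, "F": 9},
    "E": {"B": 7, "G": 10},
    "F": {"B": 6, "D": 9, "G": 11},
    "G": {"C": 9, "E": 10, "F": 11},
}


def dijkstra(graph, source):
    """Shortest time from source to every reachable town, plus predecessors."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]   # (time so far, town)
    done = set()           # towns whose shortest time is final
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue       # stale queue entry
        done.add(u)
        for v, w in graph[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd     # found a quicker way to reach v
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def route(prev, source, target):
    """Rebuild the route by walking the predecessors back from the target."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


dist, prev = dijkstra(graph, "A")
print(dist["F"], dist["E"], dist["C"], dist["G"])  # 11 14 15 22
print(route(prev, "A", "G"))                       # ['A', 'D', 'F', 'G']
```

Towns are finalised in order of increasing time from A (A, D, B, F, E, C, G with these weights), which mirrors the hand method of always settling the nearest unfinished town first.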
Try googling for the Travelling Salesman Problem. It is related to the issue discussed here.

BAB 11
11. Transformation Geometry
11.1 Transformation (isometries, similarity transformation, stretch & shears)
A transformation is a correspondence between 2 sets of points in a plane. A transformation T is described as a linear transformation of n-dimensional space when it has the properties T(λx) = λT(x) and T(λx + μy) = λT(x) + μT(y), where λ and μ are arbitrary constants. Recalling your Form 4 Mathematics, you learned how to find the image of points on the Cartesian plane under a certain transformation. Here you will further learn how to use matrices and some simple linear algebra to represent transformations, in 2 dimensions only. An equation of a transformation looks like this:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = M \begin{pmatrix} x \\ y \end{pmatrix}$$
where M is the matrix of transformation. The matrix M determines how the point (x, y) transforms into its image (x’, y’). The matrix M is easy to compose. Basically, the first column of M is the image of (1, 0) and the second column of M is the image of (0, 1), where (1, 0) and (0, 1) are the unit vectors in the x and y directions respectively (or rather, you can treat these 2 vectors as points on the x-y plane). For example, if I want to transform the point (1, 0) to (2, 0), and the point (0, 1) to (0, 2), then my matrix of transformation will be
$$M = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$
So if you want to find the image of the unit box (0, 0), (1, 0), (0, 1) and (1, 1), just pre-multiply the points by this matrix, and you will get the image under the transformation. An example will be given in the next section. Knowing how a transformation matrix works, we now want to learn how to represent a few types of linear transformation with a 2 × 2 matrix. We learned the 3 isometries (translation, rotation and reflection) in Form 4. Now we will go through them again, and then we will learn some new ones too. By the way, an isometry is a distance-preserving map between metric spaces. Geometric figures which can be related by an isometry are called congruent.
This means that, after an isometric transformation, the area remains unchanged.
1. Translation
Translation is just the moving of an object from one position to another, without altering its size, shape or orientation. Note that a translation moves the origin, so it cannot be written as multiplication by a 2 × 2 matrix; instead we add a column vector:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a \\ b \end{pmatrix}$$
where a and b are the amounts by which the object is shifted in the x and y directions. For example, (1, 2) will translate the point (x, y) one step to the right and 2 steps upward.
2. Rotation
Given an angle, a point is rotated about the origin either clockwise or anticlockwise. Once the angle θ is known, an anticlockwise rotation can be represented by the matrix
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
(for a clockwise rotation, replace θ by −θ). Note that this is restricted to rotation about the origin only. We will discuss later what to do if the centre of rotation is not the origin. The area and the shape of the object are unchanged, and once rotated through 360°, the object gets back to its initial position.
3. Reflection
For a reflection, you need a line which acts like a “mirror”, such that the whole image is reflected to the other side of the line, with each point and its image equidistant from the line and joined by a segment perpendicular to it. This line, in this case, must pass through the origin. Again, the shape of the object doesn’t change, and neither does the area. A few common reflection matrices are as follows:
$$\text{in the } x\text{-axis: } \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \qquad \text{in the } y\text{-axis: } \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \qquad \text{in the line } y = x\text{: } \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$
It is actually a little tedious to find the matrix of reflection when you are only given a line in the form y = mx. First, you find the normal line, y = −x/m + c. Substitute the points (1, 0) and (0, 1) to find the two parallel normal lines which pass through these 2 points. Next, you find the intersection points of these 2 lines with the line of reflection. Taking each intersection point as a midpoint, you can figure out where the reflected points of (1, 0) and (0, 1) are, thus completing your matrix. But there is a faster way. Let the line of reflection y = mx be written in the form y = (tan θ)x. We see that the gradient m = tan θ.
With this information, we find θ, and the reflection matrix is just represented by
$$\begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix}$$
You can try figuring out why this is true. It has something to do with the angle that the point makes at the origin, the angle of the line, and the uses of cosine and sine. To find cos 2θ and sin 2θ, you could either calculate θ, or you might want to make use of some trigonometric identities.
4. Scaling
Scaling does not preserve the size, but it preserves the proportions of the object. This scaling is centred at the origin. Scaling can be represented by the matrix
$$\begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}$$
where a is a constant. If |a| > 1, then it is an enlargement. If |a| < 1, then it is a contraction, which means the size decreases. A negative value of a makes the object enlarge or contract in the opposite direction. In the case of the red box above, it will enlarge into the 3rd quadrant instead of the 1st. a also represents the scale factor of the enlargement: a = 2 means that every length in the image is twice the corresponding length in the object, and so on.
5. Stretch
A stretch looks similar to an enlargement, but this time the ratio of the sides and the shape are not preserved. It can be a stretch along the x-axis, along the y-axis, or a stretch along both axes with different factors. A stretch is represented as below:
$$\text{along the } x\text{-axis: } \begin{pmatrix} a & 0 \\ 0 & 1 \end{pmatrix} \qquad \text{along the } y\text{-axis: } \begin{pmatrix} 1 & 0 \\ 0 & a \end{pmatrix}$$
You probably could have guessed that values of |a| < 1 turn the stretch into a compression, while a negative value of a stretches the object the other way. For a stretch, the shape and size of the image are the same whether it stretches from the origin or from some other point; only the position of the image differs.
6. Shear
A shear deforms a shape a little. It turns a square into a parallelogram, as shown above. It looks as if we are pushing something over sideways. The shear can be represented by the matrices below:
$$\text{parallel to the } x\text{-axis: } \begin{pmatrix} 1 & \tan\theta \\ 0 & 1 \end{pmatrix} \qquad \text{parallel to the } y\text{-axis: } \begin{pmatrix} 1 & 0 \\ \tan\theta & 1 \end{pmatrix} \qquad \text{2-way shear at different angles: } \begin{pmatrix} 1 & \tan\theta \\ \tan\phi & 1 \end{pmatrix}$$
The angle θ is calculated from the opposite axis.
For example, the box above undergoes a shear parallel to the x-axis, and the angle is calculated clockwise from the positive y-axis. If the angle is 45°, we say that it is a shear of 45° parallel to the x-axis. Similarly, there can be a shear of x° parallel to the y-axis, which looks like the one below: The shear depends on the origin too.

WHEN THE REFERENCE POINT IS NOT THE ORIGIN
As I said earlier, these transformations transform with respect to the origin. Rotations, reflections, scalings and shears all have their reference points at the origin. In order to transform about a point other than the origin, we need to translate the point of reference to the origin (translating the coordinates of the object together with it), do the transformation, then translate the coordinates back again. I don’t know what the terminology for this is, since this is something I figured out myself. If the point of rotation / scaling / shear is (a, b), with M as the transformation matrix, then (x, y) is transformed as follows:
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = M \left[ \begin{pmatrix} x \\ y \end{pmatrix} - \begin{pmatrix} a \\ b \end{pmatrix} \right] + \begin{pmatrix} a \\ b \end{pmatrix}$$
In the case of a reflection, as I said earlier, the reflection matrix above applies only to lines passing through the origin, y = mx. If we want to find the reflection of an object in the line y = mx + c, we take (0, c) as the point of reference to be subtracted and added. The transformation will become
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = M \left[ \begin{pmatrix} x \\ y \end{pmatrix} - \begin{pmatrix} 0 \\ c \end{pmatrix} \right] + \begin{pmatrix} 0 \\ c \end{pmatrix}$$
You can try it out and see whether this is true. You will find that using any point (a, b) on the line as the reference will be correct, as long as the line is translated so that it passes through the origin.

SIMILARITY TRANSFORMATION
Two square matrices A and B that are related by $A = P^{-1}BP$, where P is a square non-singular matrix, are said to be similar. A transformation of the form $P^{-1}BP$ is called a similarity transformation, or conjugation by P. Try recalling what you learnt about similar triangles in Maths T.
Similarity transformation simply means that the 2 transformations A and B are similar to each other: they represent the same transformation, just described with a different basis or coordinate system. I don’t have much information on this, so I won’t elaborate much here (please share with me if you have good information on this, and I will add it in here some day). However, if you are asked to find whether 2 matrices A and B are similar, just make use of the formula above: write P with unknown entries and solve PA = BP. If the equations are consistent and give a non-singular P, then the matrices are similar; if not, they are not.

11.2 Matrix Representation (images, scale-factor, operations)
Knowing all the different types of transformation, we shall now do the algebra of transformations. Let’s begin with a simple example: Find and describe the image of the triangle ΔABC where A(1, 0), B(2, 0) and C(2, 3) under the transformation matrix $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$. Multiplying each point by the matrix gives A’(1, 0), B’(2, 0) and C’(2, −3). Plotting the new coordinates OA’, OB’ and OC’, we find that the transformation is a reflection in the x-axis (or reflection in Ox). A singular transformation in 2 dimensions maps all shapes into either a line or a point, and some lines are transformed into a single point. In other words, the area of the object is destroyed. Consider the matrices below: The first one maps all shapes to the line y = x. The second matrix maps all points to the x-axis, while the last one maps everything to the origin. You will know that a matrix M is a singular matrix when | M | = 0. There is a way to tell whether a matrix maps to a line or to a point. Consider a singular matrix with column vectors (a, b) and (c, d). Since the matrix is singular, the two columns are parallel, (a, b) // (c, d). If both columns are the zero vector, then the matrix maps all shapes to a point, the origin. Otherwise, the matrix maps all shapes to the line through the origin in the direction of a non-zero column.

AREA SCALE-FACTOR AND THE DETERMINANT
Throughout our discussion on transformations, we haven’t discussed how a transformation affects the area of an object. We want to know whether a certain transformation makes a certain object enlarged or diminished.
It turns out that the determinant of the matrix of transformation tells us how the area changes. With the matrix of transformation M, we see that
Area of object × | det (M) | = Area of image
(a negative determinant means that the orientation of the object is reversed, as in a reflection). In the case when | M | = 0, the transformation maps shapes to a line or a point and the area is destroyed, which agrees with the earlier part. Invariant points are points which map to themselves after the transformation. This means that
$$M \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}$$
As you might have noticed, this reminds you of the chapter on eigenvalues and eigenvectors; in this situation, the eigenvalue is one. To find the invariant points for the transformation M, for example
$$M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
you substitute it into the equation above, and you get
x = x
y = −y
So this tells us that the invariant points of this transformation are the points (x, 0), or simply the points on the x-axis. Verify this yourself to see whether it is true. An invariant line is a line that maps to the same line, though not necessarily with every point mapping to itself. In our study, we mainly look for invariant lines that pass through the origin; if there is an invariant line that does not pass through the origin, it must be parallel (has the same gradient) to an invariant line which passes through the origin. To find the invariant lines under a certain transformation, we make use of the parametric form of the line, x = t, y = mt. We substitute the variable t into x and y and we have
$$M \begin{pmatrix} t \\ mt \end{pmatrix} = \begin{pmatrix} T \\ mT \end{pmatrix}$$
or, to make life easier, we rather put
$$M \begin{pmatrix} x \\ mx \end{pmatrix} = \begin{pmatrix} X \\ mX \end{pmatrix}$$
Note that the variable x maps to another variable X, not to itself. I’ll show you an example: Find the invariant lines of the transformation
$$M = \begin{pmatrix} 0 & 1 \\ 5 & -4 \end{pmatrix}$$
So we have two equations
mx = X
x(5 − 4m) = mX
Dividing the second equation by the first, we get (5 − 4m)/m = m, which is the quadratic equation
m² + 4m − 5 = 0, m = 1, −5
We have the lines y = x and y = −5x. You might want to test whether the lines y = x + c or y = −5x + c are invariant too.
Substitute y = mx + c back into the transformation. For m = 1, we have x + c = X and 5x – 4x – 4c = X + c. Eliminating X gives x – 4c = x + 2c, so c = 0; ∴ the lines y = x + c with c ≠ 0 are not invariant. For m = –5, we have –5x + c = X and 25x = –5X + 5c. Both are really the same equation, so c is not restricted, and thus y = –5x + c are invariant lines for every c. ∴ The invariant lines are y = x and y = –5x + c, where c is an arbitrary constant.

TRANSFORMING LINES

Knowing how to transform points, we shall now learn how to transform lines. As in the part on invariant lines, we substitute the equation of the line for x and y, then we work out the result in terms of X and Y, using the equation $M\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} X \\ Y \end{pmatrix}$. Example: Find the image of the line y = 2 – 2x under the transformation $\begin{pmatrix} 2 & 1 \\ 4 & 2 \end{pmatrix}$. We first substitute the line into the transformation:

X = 2x + (2 – 2x) = 2
Y = 4x + (4 – 4x) = 4

∴ The line transforms into the point (X, Y) = (2, 4). Notice that in this case the line is transformed into a point, because the matrix is singular. In other cases, if it transforms into another line, remember to find an equation that relates X with Y. You should be aware that this is the very same method you will use to find the transformation of circles, parabolas, hyperbolas, ellipses or other curves. Make use of their parametric equations and substitute them into the equation. Recall the parametric forms of these curves.

INVERSE TRANSFORMATION

I think I don’t need to elaborate too much on this. An inverse transformation helps us to find the object if the image is given. You find the inverse of the matrix of transformation, and the equation becomes $\begin{pmatrix} x \\ y \end{pmatrix} = M^{-1}\begin{pmatrix} X \\ Y \end{pmatrix}$. From here you should recall that a singular transformation has no inverse. In other words, you can’t find a matrix that transforms a single point to 4 other points, or transforms a line into a pentagon.

ADDITION, SUBTRACTION, SCALAR MULTIPLICATION, COMPOSITION

The addition and the subtraction of transformations M and N are defined by

M(x) + N(x) = (M + N) (x)
M(x) – N(x) = (M – N) (x)

Although M + N is defined in this way, it has no geometrical meaning.
For example, adding a matrix of rotation through 45 degrees to a matrix of reflection in the line y = x gives you some awkward transformation which doesn’t really relate to either. But the scalar multiplication of a matrix does mean something, (cM) (x) = c(M(x)), as it has the effect of scaling. I assume you already know how to do both these operations, as they are covered in the chapter Matrices in Maths T.

We are more interested in the composition of transformations. Given two transformations M and N, if an object undergoes transformation M and then transformation N, it can be written as N(M(x)), or we could also write it as (N ∘ M) (x) = NMx. You probably remember from Form 4 that the transformation NM means “transform by M first, then by N”. This is quite straightforward, I think. In exams, you will be asked to find the matrix of the combined transformation of 2 or more transformations. Otherwise, you will be given the points of the object and image together with one of the transformations, then asked to find the missing transformation and to describe it. Just make use of what you learnt about Matrices.

CHAPTER 12

12. Coordinate Geometry

12.1 3D Vectors (scalar & vector product, properties)

This chapter is a continuation and combination of what you learnt from the chapters Coordinate Geometry and Vectors. As we come into 3 dimensions, we make use of vectors as they make our analysis much easier. Here, we introduce the coordinate system for three-dimensional space ℝ³. The study of 3-dimensional space leads us to the setting for the calculus of functions of two and three variables later in university. We set up the 3D coordinate system by fixing a point O in space (called the origin) and taking three lines passing through O that are perpendicular to each other. These lines are labelled the x-axis, y-axis and z-axis respectively.
The direction of the z-axis is determined by the right-hand rule, which I think you should be familiar with from Physics. When your fingers point in the direction of the x-axis and curl towards the y-axis, your thumb points along the z-axis. Try to get used to this setting: the z-axis pointing upwards, x on the left, y on the right. A point P in space can be represented by an ordered triple (a, b, c), where a, b and c are the coordinates of the projections of the point P onto the x-, y- and z-axis respectively. Three-dimensional space is also called the xyz-space.

You probably know how we represent a vector in 3D. Using the same convention of unit vectors i and j, we just add one more, k, to represent the unit vector in the z direction (e.g., 2i + 3j – 5k). Everything about a vector in 2D works about the same in 3D. The length of the position vector of P(a, b, c) follows the Pythagorean relation $|\overrightarrow{OP}| = \sqrt{a^2 + b^2 + c^2}$. Similarly, the distance between 2 points A(x1, y1, z1) and B(x2, y2, z2) can be found by the equation $|\overrightarrow{AB}| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$.

Let’s do a little revision on the properties of vectors: scalar multiplication, addition, subtraction and so on. We let a, b and c be 3 vectors, and k and h be 2 constants. Then we have
(1) a + b = b + a
(2) a + (b + c) = (a + b) + c
(3) a + 0 = a
(4) a + (–a) = 0
(5) k(a + b) = ka + kb
(6) (k + h)a = ka + ha
(7) (kh)a = k(ha)
(8) 1a = a

SCALAR PRODUCT

The scalar product, also known as the dot product, is a multiplication of 2 vectors (a, b, c) and (d, e, f) such that (a, b, c) • (d, e, f) = ad + be + cf. The scalar product yields an answer in the form of a scalar, which is a value instead of a vector. In trigonometry, it can be represented by the equation a • b = |a||b| cos θ. I believe none of this is new to you, as you have studied it in Maths T. However, in this section we will go into quite some detail on the algebra of vectors, unlike in Maths T where you focused more on the applications, namely resultant force / velocity and relative velocity. Let us look at the properties of scalar products.
Given that a, b and c are vectors and d is a constant, we have
(i) a • b = b • a (commutativity)
(ii) a • (b + c) = a • b + a • c (distributive law)
(iii) (da) • b = d(a • b) = a • (db)
(iv) 0 • a = 0
(v) a • a = |a|²

We say that two vectors are orthogonal to each other when they are perpendicular to each other. Two vectors a and b are orthogonal if and only if a • b = 0. In 3D, we say that a vector a is orthogonal to vectors b and c if a is perpendicular to both b and c.

The component of b onto a (or scalar projection) is the resolved part of b in the direction of a. This means that when we have 2 vectors a and b pointing in 2 different directions, with the tails of their arrows joined, the component of b onto a is the length of the orthogonal projection of b onto a. We write comp_a b for the component of b onto a, and mathematically it has the value $\mathrm{comp}_{\mathbf{a}}\mathbf{b} = \dfrac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|}$; according to the picture above, it is the length of PS. The vector projection of b onto a is just the vector PS itself. It has the formula $\mathrm{proj}_{\mathbf{a}}\mathbf{b} = \dfrac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|^2}\,\mathbf{a}$, where we write proj_a b for the projection of b onto a. Remember that the answer is a VECTOR, not just a VALUE.

For a vector a = (a_i, a_j, a_k), the direction ratio is written as a_i : a_j : a_k, and your answer can be given in the simplest form (divided by the highest common factor). The direction cosines of the vector a are a_i/|a|, a_j/|a| and a_k/|a| respectively. The angle γ between the vector and the z-axis can be found using the equation cos γ = a_k/|a|, and in the same way you can deduce the angles between the vector and the x-axis and y-axis. Recalling the dot product of 2 vectors, a • b = |a||b| cos θ, we can easily find the angle between 2 vectors: $\cos\theta = \dfrac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}$.

VECTOR PRODUCT

Also known as the cross product, the vector product is something new for you, as it cannot exist in a 2D plane. We define the vector product of 2 vectors (a, b, c) and (d, e, f) to be

$(a, b, c) \times (d, e, f) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a & b & c \\ d & e & f \end{vmatrix} = (bf - ce,\; cd - af,\; ae - bd)$

The cross product yields a vector (it has a magnitude and a direction), which is orthogonal to both the original vectors.
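Before going further with the cross product, the dot-product formulas above (component, projection, angle and direction cosines) can be checked with a short Python sketch. The function names and the two sample vectors are made up for illustration; nothing here comes from the syllabus.

```python
import math


def dot(u, v):
    """Scalar (dot) product of two vectors given as tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    """Length of a vector, from a . a = |a|^2."""
    return math.sqrt(dot(u, u))


def comp(a, b):
    """Scalar projection (component) of b onto a: (a . b) / |a|."""
    return dot(a, b) / norm(a)


def proj(a, b):
    """Vector projection of b onto a: ((a . b) / |a|^2) a."""
    k = dot(a, b) / dot(a, a)
    return tuple(k * ai for ai in a)


def angle_deg(a, b):
    """Angle between a and b in degrees, from cos(theta) = a . b / (|a||b|)."""
    return math.degrees(math.acos(dot(a, b) / (norm(a) * norm(b))))


def direction_cosines(a):
    """Cosines of the angles between a and the x-, y- and z-axis."""
    n = norm(a)
    return tuple(ai / n for ai in a)


# Illustrative vectors only.
a = (2, 0, 0)
b = (3, 4, 0)
print(comp(a, b))            # component of b onto a (a number)
print(proj(a, b))            # projection of b onto a (a vector)
print(angle_deg(a, b))       # angle between a and b
print(direction_cosines(b))  # their squares add up to 1
```

Note how `comp` returns a single number while `proj` returns a tuple, which mirrors the VALUE versus VECTOR distinction made above.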
In trigonometry, the magnitude of the cross product is |a × b| = |a||b| sin θ. You can use the right-hand rule to determine the direction of the cross product. Point your fingers in the direction of a, curl them towards the direction of b, then your thumb points in the direction of a × b. This information is very important when we come to the section on planes. Different from the dot product, any vector crossed with itself yields zero:

i × i = 0, j × j = 0, k × k = 0

Or in other words, the cross product of 2 parallel vectors is zero. You can use the right-hand rule to verify this. For the unit vectors, you also get the following results:

i × j = k, j × k = i, k × i = j

We shall now see the properties of the cross product. If a, b and c are vectors and d is a scalar, then
(i) a × b = –b × a
(ii) (da) × b = d(a × b) = a × (db)
(iii) a × (b + c) = a × b + a × c
(iv) (a + b) × c = a × c + b × c
(v) a • (b × c) = (a × b) • c
(vi) a × (b × c) = (a • c)b – (a • b)c
(vii) (a × b) • a = 0

Probably (vi) is hard to remember. (vii) holds because a × b is orthogonal to a, and the dot product of 2 orthogonal vectors equals zero. Also take note that the cross product is not commutative: swapping a and b results in an extra minus sign.

The cross product has many applications, especially in physics. You use the cross product to find torque, magnetic force and so on. In geometry, the area of a triangle whose vertices have position vectors a, b and c is $\tfrac{1}{2}\,|(\mathbf{b} - \mathbf{a}) \times (\mathbf{c} - \mathbf{a})|$.

A scalar triple product of vectors a, b and c is a • b × c. As you might have noticed, you have to do the cross product first, before the dot product. If you did the dot product first, you would get a scalar crossed with a vector, which by definition does not exist. Note also that a • b × c = a × b • c. We can evaluate a • b × c using a determinant,

$\mathbf{a} \cdot \mathbf{b} \times \mathbf{c} = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$

where a = (a1, a2, a3), b = (b1, b2, b3) and c = (c1, c2, c3) respectively. We use the scalar triple product to find the volumes of various solids.
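The cross product rules and the scalar triple product can be spot-checked numerically too. The Python sketch below uses three arbitrary integer vectors chosen only for illustration, and the helper names are hypothetical. It is a check on particular numbers, not a proof of the identities.

```python
def dot(u, v):
    """Scalar product of two 3D vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    """Vector product of two 3D vectors, expanded from the determinant."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def triple(a, b, c):
    """Scalar triple product a . (b x c): cross product first, then dot."""
    return dot(a, cross(b, c))


i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i, j), cross(j, k), cross(k, i))  # unit vector results

# Arbitrary illustrative vectors.
a, b, c = (1, 2, 3), (4, 5, 6), (7, 8, 10)
print(cross(a, a))                  # a vector crossed with itself
print(dot(cross(a, b), a))          # property (vii)
print(triple(a, b, c), dot(cross(a, b), c))  # property (v)
print(abs(triple(a, b, c)))         # volume of the parallelepiped on a, b, c
```

The absolute value in the last line is there because a • b × c can come out negative, depending on the order of the vectors, while a volume cannot.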
Since |b × c| is the area of the base parallelogram of a solid, dotting b × c with a third vector a multiplies that area by the perpendicular height (the length of a times the cosine of the angle between a and the normal to the base). So the formulas for different solids are as below:
1. volume of a parallelepiped (a cuboid is a special case): |a • b × c|
2. volume of a tetrahedron: (1/6) |a • b × c|
3. volume of a triangular prism: (1/2) |a • b × c|
4. volume of a pyramid (with a parallelogram base): (1/3) |a • b × c|

12.2 Straight Lines (equation, skew, parallel, intersect)

Straight lines in 3 dimensions aren’t as easy as they are in 2 dimensions. When we want to construct a straight line in space, it must point in a specific direction, and you must give at least one point that it passes through.

EQUATION OF A LINE

Let r be the position vector of a general point on a line in xyz-space, let a and b be 2 vectors and let t be a parameter. The vector equation of the line is r = a + tb. The vector a = (x0, y0, z0) is a position vector: it is a point in space through which the line passes. The vector b is a direction vector: it determines the direction of the line. The parameter t is there because any scalar multiple of the direction vector points along the same line. Summing up, you actually get this:

(x, y, z) = (x0, y0, z0) + t(p, q, r)

You need some visualization here. Look at the diagram below. The green line L first needs a point a in space. Then you need a direction vector b to tell you where the line extends to. So if you analyse carefully, the equation of a line is not unique. You can put in infinitely many different position vectors, or use infinitely many direction vectors of the same ratio, to construct different line equations which actually refer to the same line. This is unlike lines in 2D, where y = mx + c gives a line only one representation.

You might have also noticed that the vector equation of a line is actually a parametric equation of a line. If you break it down,

x = x0 + pt, y = y0 + qt, z = z0 + rt

where (x0, y0, z0) is the position vector a, and (p, q, r) is the direction vector b. Probably now you can figure out why the equation of the line is not unique, since parametric equations are not unique.
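To see the non-uniqueness concretely, here is a small Python sketch that tests whether two different vector equations describe the same line. The function names and the sample numbers are hypothetical, chosen only to illustrate the idea: the two direction vectors must be parallel, and the vector joining the two given points must also be parallel to the direction.

```python
def cross(u, v):
    """Vector product, used here as a test for parallel vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def sub(u, v):
    """Componentwise difference u - v."""
    return tuple(ui - vi for ui, vi in zip(u, v))


def point_at(a, b, t):
    """Point with position vector r = a + t*b."""
    return tuple(ai + t * bi for ai, bi in zip(a, b))


def same_line(a1, b1, a2, b2):
    """True if r = a1 + t*b1 and r = a2 + s*b2 describe the same line."""
    zero = (0, 0, 0)
    return cross(b1, b2) == zero and cross(sub(a2, a1), b1) == zero


# Illustrative line: through (1, 2, 3) with direction (2, -1, 4).
a1, b1 = (1, 2, 3), (2, -1, 4)
# Another point on it (t = 3) and a scaled, reversed direction vector.
a2, b2 = point_at(a1, b1, 3), (-4, 2, -8)

print(a2)
print(same_line(a1, b1, a2, b2))          # same line, different equation
print(same_line(a1, b1, (1, 2, 4), b1))   # parallel but a different line
```

The second call shows the other case: the same direction vector but a point off the line gives a parallel line, not the same one.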
By the way, we can also write the vector equation as r = x0i + y0j + z0k + t(pi + qj + rk). I don’t like this method as we waste too much time writing the ijk’s and +/- signs. Now if we rearrange each of the 3 parametric equations so that t is the subject, we get the cartesian equation, as below:

$\dfrac{x - x_0}{p} = \dfrac{y - y_0}{q} = \dfrac{z - z_0}{r} = t$

We normally write this whole chunk of equalities without the ‘= t’; I only show it here for clarity. A line in 3D space has 2 equal signs. So what if p, or q, or both are 0? Examples of such lines are

$x = x_0,\; \dfrac{y - y_0}{q} = \dfrac{z - z_0}{r}$ (when p = 0), and x = x0, y = y0 (when p = q = 0).

You might want to substitute back into the vector equation to check this. You have probably guessed why we prefer to use the vector equation instead of the cartesian equation. With all this information, you should know how to construct the equation of a line given only 2 points that it passes through.

SKEW, PARALLEL, INTERSECT?

In 2D, lines are either parallel to each other, or they intersect. However, in 3D there exists another relationship between 2 lines, in which they do not intersect and are not parallel to each other. These lines are called skew lines. Our question is this: how do we show whether 2 lines are parallel, intersect one another, or are skew?

To show that 2 lines are parallel, we show that their direction vectors are parallel (one is a scalar multiple of the other). The 2 lines below are parallel, because they have the same direction vector. You can further check whether the lines coincide (that is, whether they are actually the same line). To do this, we take the point (1, 2, 3) and substitute it into (x, y, z) in the second equation. Doing some algebra, we find that the values of s from the 3 parametric equations are not consistent. Therefore, the lines do not coincide, and they are parallel lines. This method also tells us whether a particular point lies on a line. So here we see that the point (1, 2, 3) does not lie on the second line.

To show that 2 lines intersect, we let line 1 equal line 2. We get 3 equations.
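That procedure can be sketched in Python as follows: solve two of the three component equations for t and s, then check whether the remaining one agrees. The function name `intersect` is hypothetical, and the two sample lines are an assumption, read off from the three component equations of the worked example that follows. Exact fractions are used so that the final comparison is not upset by rounding.

```python
from fractions import Fraction
from itertools import combinations


def intersect(a, b, c, d):
    """Intersection point of r = a + t*b and r = c + s*d, or None.

    Solves t*b - s*d = c - a using two components (Cramer's rule),
    then checks that all three components agree.
    """
    for i, j in combinations(range(3), 2):
        det = b[i] * (-d[j]) - (-d[i]) * b[j]
        if det != 0:
            break
    else:
        return None  # direction vectors are parallel: no single point

    ri, rj = c[i] - a[i], c[j] - a[j]
    t = Fraction(ri * (-d[j]) - (-d[i]) * rj, det)
    s = Fraction(b[i] * rj - b[j] * ri, det)
    p = tuple(a[k] + t * b[k] for k in range(3))
    q = tuple(c[k] + s * d[k] for k in range(3))
    return p if p == q else None  # None here means the lines are skew


# Assumed lines, read off from the component equations of the example below.
a, b = (-3, -5, -4), (4, 3, 1)
c, d = (0, -9, 13), (1, 2, -3)
point = intersect(a, b, c, d)
print(None if point is None else tuple(int(x) for x in point))

# Moving the second line slightly (illustrative change) makes the pair skew.
print(intersect(a, b, (0, -9, 12), d))
```

Returning `None` for both the parallel and the skew case keeps the sketch short; a fuller version would distinguish them, for example by testing the direction vectors first.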
Consider the two lines below:

r1 = (–3, –5, –4) + t(4, 3, 1)
r2 = (0, –9, 13) + s(1, 2, –3)

We have
–3 + 4t = s
–5 + 3t = –9 + 2s
–4 + t = 13 – 3s

If we can find values of s and t that satisfy all 3 equations, the lines intersect. If the values of s and t contradict one another (and the lines are not parallel), then the lines are skew. Here the first two equations give t = 2 and s = 5, and these also satisfy the third. We can further find the point of intersection: substituting the values of s and t back into the initial equations, we get the intersection point. In this case, the point of intersection is (5, 1, –2).

DISTANCE FROM POINT TO LINE

Given a line r1 = a + tb and a point with position vector r2, to find the distance from the point to the line we make use of the sine of the angle θ between the line and the vector (r2 – a). Look at the diagram below. The distance is |r2 – a| sin θ, so recalling that |a × b| = |a||b| sin θ, the distance between the line and the point r2 is

$\dfrac{|(\mathbf{r}_2 - \mathbf{a}) \times \mathbf{b}|}{|\mathbf{b}|}$

DISTANCE BETWEEN 2 LINES

To find the distance between 2 lines, we have 2 situations:

1. the lines are parallel. Given the two lines r1 = a + tb and r2 = c + sb, we can make use of what we learnt from the part above (take c as the point), and find that the distance between these 2 lines is just $\dfrac{|(\mathbf{c} - \mathbf{a}) \times \mathbf{b}|}{|\mathbf{b}|}$.

2. the lines are skew. Given 2 lines r1 = a + tb and r2 = c + sd, the shortest distance between them can be found through the equation r2 – r1 = k(b × d), where k is a constant. Let me explain this a little. The shortest line segment joining the two lines is r2 – r1. It is parallel to the common normal vector (b × d), and that is why we multiply b × d by k. So after setting it up, we get the equation c + sd – a – tb = k(b × d), which is actually 3 equations in the 3 variables t, s and k. From here, we solve for s, t and k, and the shortest distance between the 2 skew lines is |k| times the magnitude of b × d.

ANGLE BETWEEN 2 LINES

Recall the formula you learnt in the previous section, $\cos\theta = \dfrac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}$. You use this formula to find the angle between two lines, by taking a and b to be the direction vectors of the two lines. Shouldn’t be a problem for you, I think.

12.3 Planes (equation, intersection, distance, angle)

A plane is simply a flat surface in space.
We first start by introducing the vector equation of a plane, r = a + sb + tc, where a is a position vector, b and c are 2 non-parallel vectors, and s and t are 2 arbitrary constants. Consider the diagram below. We need at least 2 direction vectors to show the direction of the plane, and then a point to know where exactly the plane lies. We multiply the 2 direction vectors by different constants to show that any combination sb + tc is also a direction lying in the plane. Similarly, this form of the plane equation is not unique. Again, this form can be written in the ijk form, which looks ugly and long.

There is another vector equation of the plane. Though it is not named properly, I call it the ‘normal’ form. We first find the normal vector of the plane, i.e., a vector which is normal to both the direction vectors. You obtain the normal vector by taking the cross product of b and c. Supposing that the normal vector is n = (a, b, c), the normal form of the equation will be r • n = d, that is, (x, y, z) • (a, b, c) = d, where d is a constant which determines the position of the plane. d has a significant meaning. If the normal vector (a, b, c) is a unit vector (magnitude = 1), then |d| is the perpendicular distance from the origin to the plane. For 2 planes with the same normal vector, if their values of d have opposite signs, it means that they are on opposite sides of the origin. Finding the value of d is simple: just plug a point lying in the plane into x, y, z, and you get it.

If we evaluate the dot product above, we get the cartesian form, ax + by + cz = d. This cartesian form is unique (up to multiplying the whole equation by a non-zero constant), unlike the other forms. This is the most commonly used form of the equation of a plane. You can see that this equation is linear, and that the equations y = mx + c and x = a are all equations of planes in 3-dimensional space.

So to sum up, to construct a plane equation, you need one of these pieces of information:
1. 3 points lying on the plane (not all on one line).
2. 2 points lying on the plane, and 1 direction vector.
3. 2 distinct lines lying on the plane.
4.
a point lying on the plane, and the normal vector of the plane.

There is a fast way to get the equation of the plane when 3 points (x1, y1, z1), (x2, y2, z2) and (x3, y3, z3) are given. I haven’t tried this before, but you could make use of the determinant below to find your equation:

$\begin{vmatrix} x - x_1 & y - y_1 & z - z_1 \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{vmatrix} = 0$

LINE LIES IN / PARALLEL / INTERSECT A PLANE

We shall now discuss how to determine whether a line lies in / is parallel to / intersects a plane. Let the equations of the line and plane be r1 = a + tb and r2 • n = d. We first find whether the direction vector of the line is parallel to the plane. In other words, we want to know whether the direction vector of the line is perpendicular to the normal vector of the plane. We take b • n, and if the answer is zero, then the line is parallel to the plane. We might then want to know whether the parallel line actually lies in the plane. We can do this by substituting the position vector a of the line into r2: if LHS = RHS (that is, a • n = d), then the line does lie in the plane, and it does not if the equality doesn’t hold.

If b • n ≠ 0, the line definitely intersects the plane. The point of intersection can be found by letting r1 = r2, that is, (a + tb) • n = d. This is a single equation which you should be able to solve for t. Then finally, to find the point of intersection, we substitute t back into the line equation to find (x, y, z).

PLANE PARALLEL TO / INTERSECTS ANOTHER PLANE

Since the cartesian equation is unique up to a constant multiple, 2 planes coincide only if their equations are multiples of one another. 2 planes are parallel only if their normal vectors are parallel, which is also easy to check. Planes that are not parallel have to intersect somewhere, and we can determine the line of intersection. Consider the 2 plane equations r • n1 = d1 and r • n2 = d2. We first find the common direction by using n1 × n2; this will be the direction vector of the line of intersection. To find a position vector of the line, we make use of the cartesian equations of both planes,

a1x + b1y + c1z = d1
a2x + b2y + c2z = d2

We need to solve this system of linear equations to find x, y and z.
Recall from the chapter on Matrices that this system of equations (2 equations, 3 unknowns) has infinitely many solutions. As usual, let one of the variables be t, solve for x, y and z in terms of t, and then just substitute a value for t to get a position vector. The line equation is thus found.

DISTANCE FROM A POINT / PARALLEL LINE TO A PLANE

I think I won’t prove this one, as it is similar to the proof in 2D. To find the distance D from a point (x0, y0, z0) to a plane ax + by + cz = d, make use of the equation in your Data Booklet:

$D = \dfrac{|ax_0 + by_0 + cz_0 - d|}{\sqrt{a^2 + b^2 + c^2}}$

Notice that there is something different in my equation. It is ‘–d’ instead of ‘+d’ because I made use of the cartesian equation ax + by + cz = d instead of ax + by + cz + d = 0. Please DO NOT CONFUSE THEM. If you want to find the distance from a parallel line to the plane (note that the line has to be ‘parallel’ to have a ‘distance’…), you substitute the position vector of the line for (x0, y0, z0) in the above equation, and you get it.

DISTANCE BETWEEN 2 PLANES

Given 2 parallel planes r • n = d and r • n = e, we can find the distance between them by finding $\dfrac{|d - e|}{|\mathbf{n}|}$. I will explain why this makes sense. Firstly, you should recall that the values d/|n| and e/|n| are the signed perpendicular distances from the origin to the planes. The signs tell you whether the planes lie on the same side of the origin (same sign) or on opposite sides (different signs), and subtracting takes care of both cases. You subtract them, then take the modulus because distance is never negative.

ANGLE BETWEEN LINE AND PLANE

Consider a line with direction vector a and a plane with normal vector n. The angle θ between the line and the plane can be found by using the equation $\sin\theta = \dfrac{|\mathbf{a} \cdot \mathbf{n}|}{|\mathbf{a}||\mathbf{n}|}$. Note that if you used cos θ, you would have got the angle between the line and the normal vector instead.

ANGLE BETWEEN PLANES

The angle between 2 planes is actually the same as the angle between the 2 normal vectors.
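Before moving on to the formula for the angle between planes, the distance and line-plane angle formulas above can be tried out in a short Python sketch. The plane 2x – 2y + z = 3 and the other numbers are made up for illustration, as are the function names; the plane is written in the form n • r = d, matching the ‘–d’ convention used above.

```python
import math


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def norm(u):
    """Length of a vector."""
    return math.sqrt(dot(u, u))


def point_plane_distance(p, n, d):
    """Distance from point p to the plane n . r = d."""
    return abs(dot(n, p) - d) / norm(n)


def parallel_planes_distance(n, d, e):
    """Distance between the parallel planes n . r = d and n . r = e."""
    return abs(d - e) / norm(n)


def line_plane_angle_deg(a, n):
    """Angle between a line with direction a and a plane with normal n."""
    return math.degrees(math.asin(abs(dot(a, n)) / (norm(a) * norm(n))))


# Illustrative plane 2x - 2y + z = 3, so n = (2, -2, 1) and |n| = 3.
n, d = (2, -2, 1), 3
print(point_plane_distance((1, 1, 6), n, d))
print(parallel_planes_distance(n, d, -6))
print(line_plane_angle_deg((0, 0, 1), n))  # a line along the z-axis
print(line_plane_angle_deg((1, 1, 0), n))  # direction lying in the plane
```

A direction vector that is perpendicular to n gives an angle of zero, which is the same b • n = 0 test used earlier to decide whether a line is parallel to a plane.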
So given 2 planes with normal vectors m and n respectively, we can find the angle between the 2 planes by using the dot product, $\cos\theta = \dfrac{\mathbf{m} \cdot \mathbf{n}}{|\mathbf{m}||\mathbf{n}|}$. Recall that this is the same formula used to find the angle between 2 lines.

Now that you know how to construct planes, you might be curious about how 3D shapes are constructed. Again, you could make use of the applet I shared with you in the previous post: from the drop-down menu of new graph, choose z = f(x, y) surfaces. Fiddle around with it and have fun creating awkward shapes. This is obviously out of your syllabus, but let me just give you the equations for some very common shapes in 3D:
cylinder, x² + y² = r²
elliptic paraboloid, z = x² + y²
hyperbolic paraboloid, z = x² – y²
ellipsoid, ax² + by² + cz² = 1
elliptic cone, x² + y² – z² = 0
hyperboloid, x² + y² – z² = 1

CHAPTER 13

13. Sampling & Estimation

13.1 Random Samples (population, parameter, statistic)

In statistics, we are always interested in getting information about a particular group, be it people, animals, or even non-living things. This group of interest is what we call a population. A population is a particular group which we need information about in a statistical enquiry. A population can be very big, for example, the number of hairs growing on one’s head, or the number of people in a country. So sometimes we can only gather information from a sample of the population. A simple random sample of size n is a sample chosen in such a way that all possible samples of size n are equally likely to be selected. So here, we differentiate the terms population and sample, the sample being a subset of the population.

A parameter is a numerical characteristic of a population (known or unknown), such as the mean μ and the standard deviation σ. A statistic is a value computed from a sample, such as the mean x̅ and the standard deviation s. Notice that the symbols for the two cases are different, and we will make use of this convention.
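To make the distinction concrete, here is a small Python sketch that computes the parameters of a population and the statistics of one sample drawn from it. The population is invented (10 000 simulated values) purely for illustration, and the variable names simply follow the symbols above.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# A made-up population of 10 000 measurements.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Parameters: computed from the WHOLE population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # population standard deviation

# Statistics: computed from ONE simple random sample of size 30.
sample = random.sample(population, 30)  # sampling without replacement
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)            # sample standard deviation

print(f"parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Drawing a different sample (change the seed) changes x̅ and s but not μ and σ, which is exactly why the two sets of symbols are kept apart.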
So here we can conclude that the parameter is the actual value for a population, while the statistic is a value obtained from a sample, which is supposed to be quite close in value to the parameter. In order to get the information required, we need to do surveys. There are 2 main kinds of surveys:

1. Census
A census surveys every single member of a population. A country needs to do a census to count how many people there are in it. Or in a class, we need everyone to submit their health report in order to know which blood type each student belongs to. However, there are situations in which a census can’t be used. With infinite populations, for example, we have an infinite number of stars, and we can’t measure the brightness of every star to find the mean brightness or distance from the earth. Another example is testing the durability of light bulbs. To test the average lifespan of light bulbs, you can’t test every light bulb, otherwise you’ll destroy the population!

2. Sample Survey
A sample survey is done by interviewing / collecting data from only a small group of members within the population, which is the sample. A sample is always less than 100% of the population. For example, we do a survey on 100 residents in Petaling Jaya, to see whether they would like it if we replaced the McDonald’s outlet in SS2 with an A&W outlet.

Both the census and the sample survey have their advantages and disadvantages. To sum up, a census is good for a small population, and a sample survey is more suitable for a big population. Look at the table below:

Before you start sampling, you need to do a few things. First, you need to identify the target population, as in where and who you want to interview. Next, you determine the sampling units, the people / items to be sampled. If your population is all the primary schools in Malaysia, is your sampling unit the student, the teacher, or the canteen waiter? You have to make it clear. Then, you need a sampling frame.
You need a list in which the sampling units within a population are individually named or numbered. Of course, the list may not be complete, or sometimes it just cannot be generated, as the units will change and move in and out, or, if they are fish in a pond, they can’t be listed at all! Once you are done, you can start your survey.

Knowing that we can start surveying, we need to know the possible sampling methods. We shall not focus on the census in this chapter (the title says it). Now we shall look into a few types of sampling methods:

1. Random Sampling
I believe you are familiar with the term ‘random’. It means that you do not choose a sample on purpose, you just simply pick one. There are 3 kinds of random sampling:

Simple Random Sample
As its name suggests, it is ‘simple’: you don’t need to do any homework to get the sample. You could draw lots, or use random numbers to choose which units take the survey. You can make use of a random number table to choose your units. It acts like a large die, and looks something like the one below: You can use the numbers from left to right, following the order given. Or you could also close your eyes and use a pencil to point at a number on the table. For example, in a group of students numbered 1 to 100, you want to choose 5 random students. You can take 2-digit numbers starting from the left of the table, namely 82, 03, 14, 58 and 21, to be the students you want (treating 00 as student 100). You could also use your calculator as a random number generator. On your CASIO fx570MS, press shift - Ran#, and you will get a random number with 3 decimal places between 0.000 and 0.999. You can use multiplication or division to manipulate the random number into the range you want. Note that there exist 2 kinds of simple random samples, one with replacement and one without replacement.

Systematic Random Sample
In systematic sampling, you make use of a certain pattern, a certain sequence, to find your samples.
For example, in a list of 1000 people, you ask every kth person to take the survey, where k depends on your sample size (for a sample of 50 people, k = 1000 / 50 = 20).

Stratified Random Sample
In a stratified sample, the population has many distinguishable layers (strata). For example, a population of people has different age groups, different occupations and so on. We take a few units from each of the different age groups, and combine them into one sample in the end.

2. Non-Random Sampling
I think I don’t need to elaborate much on this. It is not random, and therefore you choose a unit for a solid and particular reason. There are 2 kinds over here:

Clusters
Clusters are like natural sub-groups of a population. For example, in a primary school, there are 6 classes in standard 1, with all the kids having the same status. Note that this differs from a stratified random sample, since strata are different from one another, while clusters are alike. You choose to study one cluster, which means that you don’t randomly pick students from every class in the school. You save a lot of effort, time and money, as you don’t need to collect the survey forms from every class.

Quotas
Quota sampling is widely used in market research, where the population is divided into groups in terms of age, sex, income level and so on. Then when you are about to survey, you already have your plans in mind: I want to survey one person who has a high income and a big family, another one with a low income and a small family, and so on. You have already set specific requirements for the members of the population that you are about to interview or collect data from.

All these sampling methods have their pros and cons. I summarize them in the table below:

In every survey, there are sure to be some sources of bias. Obviously, when you are collecting data from a population, you want it to be as accurate as possible, and thus you should eliminate any bias in the process of sampling.
These biases will cause the survey or data collection to be very inaccurate, and give a wrong picture of what the population really is. Examples of sources of bias are:

1. lack of a good sampling frame
It’s like using a list of friends generated from your Twitter account. You will miss out those friends who don’t use Twitter. You need a good sampling frame so that everyone has an equal chance of being sampled.

2. wrong choice of sampling unit
In surveying who has a car at home, ‘people’ would be the wrong sampling unit; a better sampling unit would be ‘household’, since children don’t drive.

3. no response from some chosen units
Some people just choose not to answer your survey questions, for God-knows-what reason. Also, your questionnaire might have some questions for which they don’t have a suitable answer to choose. For example, they don’t respond to the question “Do you like Subway sandwiches? Yes / No” when they don’t even know that such an outlet exists.

4. bias introduced by the person conducting the survey
The person conducting the survey might already have a conclusion in mind, and try to make the survey results suit his mindset. For example, on the question “Which party will do a better job in the next General Elections?”, if the surveyor is a Pakatan Rakyat supporter, he might influence the person taking the survey to agree with his stand.

SIMULATING RANDOM SAMPLES

There are many ways to get random samples, just like what we did above: we used a random number table, or the random number generator on the calculator. But now, we want to simulate random samples from a given distribution. There are 2 kinds of distributions from which we can obtain a simulated random sample:

1. Frequency Distribution
A frequency distribution looks something like this: It has a value x and a frequency. Let’s say I would like to generate a sample of size 6 from this population.
For data like this, we cannot simply use a calculator to randomly get the numbers 1 to 4 as our sample. Each value has a frequency, or rather a weightage, which governs how often it should be chosen. So what we can do is draw up a table making use of the cumulative frequency. Using this table, we can finally generate the random sample. For example, suppose we have the random digits 04938581365399. Taking them two at a time, we get the numbers 4, 93, 85, 81, 36, 53, which correspond to the values of x being 1, 4, 3, 3, 2, 3 respectively. We have finally got our random sample from the frequency distribution.

2. Probability Distribution
The method is the same as the above, except that we build up cumulative probabilities (which run up to 1 rather than up to the total frequency), then use the generated random numbers to find the random sample. There are a few kinds of probability distributions:

Discrete probability distribution
This one is not hard. We find the cumulative probabilities, then proceed as above.

Binomial distribution X ~ B (n, p)
Hope you still remember the formula, P(X = x) = nCx p^x q^(n–x), where q = 1 – p. For example, we take X ~ B (3, 0.4), then we have P(X = 0) = 0.216, P(X = 1) = 0.432, P(X = 2) = 0.288 and P(X = 3) = 0.064, with cumulative probabilities 0.216, 0.648, 0.936 and 1.

Poisson distribution X ~ Po (λ)
The formula is $P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$. We tabulate the table for X ~ Po (4).

Probability density function
It can be something like f(x) on some interval. We should find its cumulative distribution function F(x). From here, we let the randomly generated number u, 0 ≤ u ≤ 1, equal F(x), and solve for x inversely.

Normal distribution X ~ N (μ, σ²)
Making reference to the formula $Z = \dfrac{X - \mu}{\sigma}$, we let the randomly generated number u, 0 ≤ u ≤ 1, equal the cumulative probability of the normal distribution. Then by using normal tables (or your calculator), you can find z, and therefore x.

13.2 Sampling Distributions (sample proportion & mean, central limit theorem)

When we are in the process of finding sample means or standard deviations, we might also want to know how these values are distributed.
So, following the few distributions that we have learnt (Binomial, Poisson and Normal), we are learning a new one here: the sampling distribution of means.

SAMPLING DISTRIBUTIONS OF A SAMPLE MEAN
Before we start, we need to recall some expectation algebra. Remember that in a population, the expected value E(X) is the mean itself, μ, while the variance Var(X) is the variance of the population itself, σ². So now, we are going to find the expected value of a sample mean, E(X̅). The mean of a sample of size n can be represented by the equation $\bar{X} = \dfrac{X_1 + X_2 + \dots + X_n}{n}$, where X1, X2, etc. are independent observations from the population. So the expected value of the sample mean is $E(\bar{X}) = \dfrac{1}{n}\left[E(X_1) + \dots + E(X_n)\right] = \dfrac{n\mu}{n} = \mu$, which is the same value as the population mean. What this means is that, on average, the sample mean takes the value of the population mean. The variance of the sample mean, however, is different from the population variance. Using the fact that $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$ and that the variances of independent variables add up, we find $\mathrm{Var}(\bar{X}) = \dfrac{1}{n^2}\left(n\sigma^2\right) = \dfrac{\sigma^2}{n}$. So the standard deviation of the sampling distribution is $\dfrac{\sigma}{\sqrt{n}}$, which we call the standard error of the mean. However, remember that this standard error is for samples taken with replacement. For samples taken without replacement, the variance would be $\mathrm{Var}(\bar{X}) = \dfrac{\sigma^2}{n}\cdot\dfrac{N - n}{N - 1}$, where N is the size of the finite population and n is the sample size. I do not know how to derive this, and I don’t think it will appear in exams. I put it here for your reference.

So now, every time we have a normal distribution X ~ N(μ, σ²), we have a sampling distribution of $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$. Consider the distribution X ~ N(100, 64): for instance, a sample of size 4 gives X̅ ~ N(100, 16), while a sample of size 16 gives X̅ ~ N(100, 4). Notice that the sample size affects the sampling distribution. So now, to answer questions, unlike Maths T, you have to be very particular about whether the question is talking about a population or a sample. Let me give you an example:

The volume of wine in bottles is normally distributed with a mean of 758ml and a standard deviation of 12ml.
A random sample of 10 bottles is taken and the mean volume found. Calculate the probability that the sample mean is less than 750ml.

Let X be the volume of wine in a bottle, X ~ N(758, 12²).
Since X is normally distributed, the sampling distribution with n = 10 is X̅ ~ N(758, 12² / 10), that is, X̅ ~ N(758, 14.4).
P(X̅ < 750) = P[Z < (750 – 758) / √14.4] = P(Z < –2.108) = 0.0175

I assume that you have fully studied the chapters Discrete Probability Distributions & Continuous Probability Distributions in Maths T. Now that you know the difference between samples and populations, note that the final answer will be different if you use the wrong distribution.

We were assuming that the sample was taken from a population which follows the normal distribution. So what if it isn’t? Maybe the sample was taken from a Binomial, Poisson or even a Uniform distribution? Let’s do a little experiment. Suppose you have an unfair coin, such that every time you toss it, it has a 25% chance of landing heads. So if you toss it 10 times, the number of heads follows a binomial distribution, X ~ B(10, 0.25). We plot the probability graph below. The red bars are the Binomial probabilities, while the blue line is the normal approximation. So now, we look at the sampling distribution of X̅. That means we repeat the experiment a number of times, take the mean, and tabulate the means as a distribution. If we do it 30 times (sample size of 30) we get a graph like below: then 50 times, we get It gets closer to a normal distribution, doesn’t it?

Now we try a Poisson distribution. Say the average number of monkeys seen along the road every day is 4, then X ~ Po(4). So the probability of seeing n monkeys in a day can be tabulated as follows: Again, we get into serious investigation to see how many monkeys appear every day, take the mean over 30 days, and find the sampling distribution of X̅ to be as follows: Once again it is close to the normal blue curve. Remember that the y-axis stands for probability.
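The coin experiment can also be imitated on a computer. What follows is only a rough Python sketch, not the applet used for the graphs: the function names, the seed and the 5000 repetitions are my own choices for illustration. It tosses the unfair coin 10 times to get one value of X, averages 30 such values to get one sample mean, and repeats this many times. If the theory holds, the sample means should centre on μ = np = 2.5 with a variance close to σ²/n = npq/30 = 0.0625.

```python
import random
import statistics


def heads_in_tosses(rng, tosses=10, p_head=0.25):
    """One observation of X ~ B(10, 0.25): number of heads in 10 tosses."""
    return sum(1 for _ in range(tosses) if rng.random() < p_head)


def sample_mean(rng, sample_size=30):
    """Mean of `sample_size` independent observations of X."""
    total = sum(heads_in_tosses(rng) for _ in range(sample_size))
    return total / sample_size


rng = random.Random(2024)  # arbitrary fixed seed, so the run is repeatable
means = [sample_mean(rng) for _ in range(5000)]  # 5000 repetitions (assumed)

mean_of_means = statistics.mean(means)
variance_of_means = statistics.variance(means)

print("mean of sample means:    ", round(mean_of_means, 3))      # theory: 2.5
print("variance of sample means:", round(variance_of_means, 4))  # theory: 0.0625
```

Changing `sample_size` to 50 should shrink the variance towards 1.875 / 50, which is the same effect the graphs show.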
So this sampling distribution of the monkey means simply tells us “the probability of the mean number of monkeys seen on the road daily, with a sample size of n”.

We try now for a uniform distribution. A uniform distribution X ~ R(a, b) means that X is uniformly distributed over the range a ≤ x ≤ b. It has the following expectation and variance: $E(X) = \dfrac{a + b}{2}$, $\mathrm{Var}(X) = \dfrac{(b - a)^2}{12}$. Assume X ~ R(0, 27), so that every number between 0 and 27 is equally likely to come up in a lucky draw. We can plot its distribution, then again we find the sampling distribution of X̅. We take samples of size 30, and we find that it actually looks like a normal distribution! All these graphs are done with this applet.

So after doing all these, we find that even when samples are taken from distributions that are not normal, the sampling distribution of the mean takes the normal shape as the sample size increases. In other words, for large sample size n, it is approximately normal. And here, we introduce the central limit theorem:

When samples are taken from a non-normal population with mean μ and known variance σ², then for large values of n, the distribution of X̅ is approximately normal, such that $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$.

In statistics, we take a large sample to mean n ≥ 30. You will be using this convention for the rest of the chapters. Let me show you an example of the use of the central limit theorem:

The average number of telephone calls made in an evening to a counselling service is 4.5 calls. 30 random observations are taken. Find the probability that the sample mean exceeds 5.

X ~ Po(4.5)
Since n ≥ 30, by the central limit theorem, X̅ is approximately normal, so X̅ ~ N(4.5, 4.5 / 30), that is, X̅ ~ N(4.5, 0.15)
P(X̅ > 5) = P[Z > (5 – 4.5) / √0.15] = P(Z > 1.291) = 0.098

SAMPLING DISTRIBUTIONS OF SAMPLE PROPORTIONS
Suppose a random sample of n observations is taken from a population in which the proportion of successes is p and the proportion of failures is q = 1 – p. If X is the number of successes in the sample, then X follows a binomial distribution, X ~ B(n, p). You should recall that E(X) = np and Var(X) = npq.
Using the same method by which we found the expectation of the sample mean X̅, we now find the expectation of the sample proportion Ps. We know that $P_s = \dfrac{X}{n}$. So finding E(Ps) and Var(Ps), we get $E(P_s) = \dfrac{np}{n} = p$ and $\mathrm{Var}(P_s) = \dfrac{npq}{n^2} = \dfrac{pq}{n}$. This in turn gives us the distribution of the sample proportion, $P_s \sim N\!\left(p, \dfrac{pq}{n}\right)$ for large n, and we define the term $\sqrt{\dfrac{pq}{n}}$ as the standard error of proportion. When using a distribution of sample proportions, we need to take continuity corrections into account (try recalling what you learned in Maths T). For this case, the continuity correction is $\pm\dfrac{1}{2n}$. I’ll show you an example:

It is known that 3% of frozen pies delivered to a canteen are broken. What is the probability that, on a morning when 500 pies are delivered, 5% or more are broken?

Let p be the probability that a pie is broken, p = 0.03. Let Ps be the proportion of pies in the sample that are broken.
q = 0.97, n = 500, so pq / n = 0.0000582 and the continuity correction is 1 / (2 × 500) = 0.001. We have Ps ~ N(0.03, 0.0000582)
P(Ps ≥ 0.05) = P(Ps > 0.05 – 0.001) [continuity correction, as calculated] = P(Ps > 0.049) = P(Z > 2.491) = 0.0064

If you noticed, there is another way of solving this question, just by using the Binomial distribution alone.
Let X be the number of broken pies in the sample. X ~ B(500, 0.03)
Since n ≥ 30 and np, nq > 5, it is approximately normal. X ~ N(np, npq), that is, X ~ N(15, 14.55)
500 × 5% = 25
P(X ≥ 25) = P(X > 24.5) = P(Z > 2.491) = 0.0064

If I were you, I would choose to do the second method. However, in exam questions, if you are asked to work with the proportion, then you had better use the first method to avoid deduction of marks. Note that in either case, this sampling distribution of the proportion can only be used for a large sample size n.

13.3 Point Estimates (unbiased estimates, t-distribution, standard error)
To define a certain distribution, be it Binomial, Poisson or Normal, you need to know its population parameters. And of course, if you don’t know a parameter beforehand, you would want to use sampling to estimate it.
This estimate is unbiased if the average (or expectation) of a large number of values taken in the same way is the true value of the parameter. Among unbiased estimates, the best one is the one with the smallest variance. So here in this section, we are focusing on point estimates: we estimate each parameter by a single value calculated from the sample. Look at the 3 equations below: $\hat{p} = p_s$, $\hat{\mu} = \bar{x}$, $\hat{\sigma}^2 = \dfrac{1}{n - 1}\sum (x - \bar{x})^2$. We denote an unbiased estimate with a cap. So the unbiased estimates of the proportion of successes in the population, the population mean and the population variance are denoted by p̂, μ̂ and σ̂² respectively. It is found that the best unbiased estimates for the population proportion and the population mean are the sample proportion ps and the sample mean x̅ themselves. However, the best unbiased estimate for the population variance is different from the sample variance s², and is given by the formula above. The formula for the unbiased estimate of the variance can also take the following forms: $\hat{\sigma}^2 = \dfrac{n}{n - 1}s^2 = \dfrac{1}{n - 1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$. That is all you need to know about this section. Let me give you a short example:

The concentrations, in milligrams per litre, of a trace element in 7 randomly chosen samples of water from a spring were
240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5
Determine the unbiased estimates of the mean and the variance of the concentration of the trace element per litre of water from the spring.

To answer this question, we need to make use of our calculator. Set your CASIO 570MS to SD mode, and key in the data one value at a time, pressing the DT button after each value, until you have entered everything. Next, press shift + S-VAR. It gives you the options x̅, xσn and xσn-1. The first one gives you the unbiased estimate of the mean, while the last one gives you the unbiased estimate of the standard deviation.
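If you don’t have the calculator at hand, the same two estimates can be cross-checked with a few lines of Python. This is just a sketch using the standard `statistics` module (my choice of tool, not part of the syllabus); the variable names are made up for illustration. Note that `statistics.variance` divides by n – 1, so it plays the role of the xσn-1 key squared.

```python
import statistics

# Concentrations in mg per litre, copied from the question
concentrations = [240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5]

mean_hat = statistics.mean(concentrations)          # unbiased estimate of the mean
variance_hat = statistics.variance(concentrations)  # divides by n - 1
sd_hat = statistics.stdev(concentrations)           # square root of variance_hat

print(round(mean_hat, 2), round(variance_hat, 2), round(sd_hat, 3))
```

The three printed values should agree with x̅, (xσn-1)² and xσn-1 from the calculator, namely about 236.0, 7.58 and 2.753.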
Just show them a little working even though you know the answers straight away: μ̂ = x̅ = 1652.0 / 7 = 236.0 and $\hat{\sigma}^2 = \dfrac{1}{6}\left(389917.48 - \dfrac{1652.0^2}{7}\right) = 7.58$.

13.4 Interval Estimates (confidence intervals, large & small samples, sample size)
Point estimates might not be accurate. There is always a possibility that the unbiased estimate of the population mean is far away from the actual mean. Another way of estimating this value is to construct an interval, known as a confidence interval. This confidence interval tells us that there is a certain probability that the population mean will lie within it. We usually write this interval in the form (a, b), where a and b are the confidence limits, or end-values of the interval. Consider a normal curve: We define a confidence interval in terms of percentage. For example, a 95% confidence interval, like the one above, means that there is a 95% probability that the interval contains the population mean. Here we shall learn how to construct a confidence interval for a population proportion and a population mean.

CONFIDENCE INTERVAL FOR POPULATION PROPORTION
Here you want to find p, the proportion of successes in a particular population. You take a sample of size n, and then find the best unbiased estimate p̂. You need to recall quite a lot of information from the last 2 parts, keeping in mind that when we are dealing with population proportions, whether the data come from a normal or non-normal distribution, it must be done with a large sample (n ≥ 30). Recalling that the sampling distribution of the sample proportion is $P_s \sim N\!\left(p, \dfrac{pq}{n}\right)$, the confidence limits will be $p_s \pm a\sqrt{\dfrac{p_s q_s}{n}}$ and the confidence interval will be $\left(p_s - a\sqrt{\dfrac{p_s q_s}{n}},\; p_s + a\sqrt{\dfrac{p_s q_s}{n}}\right)$. Okay, I need to explain this a little. If you observe closely, the confidence interval is constructed from the unbiased estimate of the population proportion, ± a times the standard error. The multiplier a determines the percentage of the interval you want. This value a can be obtained from the normal tables (or the Buku Sifir given in STPM).
It looks something like this: I’ll teach you how to read this table in the example below:

In order to assess the probability of a successful outcome, an experiment was performed 200 times. The number of successful outcomes was 72. Find a 95% confidence interval for p, the population proportion of success.

We start by listing down the important values: ps, qs and n, and the distribution.
ps = 72 ÷ 200 = 0.36, qs = 0.64, n = 200
Ps ~ N(0.36, 0.001152) [since 0.36 × 0.64 ÷ 200 = 0.001152]
To find a, we refer to the table. Note that the table is written for the lower tail probability P(Z ≤ a), but we are looking for P(–a ≤ Z ≤ a). So a central 95% of the distribution should have an upper and a lower tail of 2.5% each. This diagram might help to explain a little: The diagram on the left shows the lower tail probability, which is what the table in your Buku Sifir gives. We want the one on the right and, by looking at the position of the red lines, you can see that the two are definitely different. So here, the value of a comes from the column 0.975, which is 1.960. So your confidence interval shall be
(ps – 1.96√0.001152, ps + 1.96√0.001152) = (0.293, 0.427)
You have probably noticed that the continuity correction is omitted. Yes, this is indeed the case.

You need to get used to reading the table to prevent yourself from using the wrong value of a. A 90% interval means that it has a lower tail probability of 95%, an 80% interval means that it has a lower tail probability of 90%, etc. To make things faster, I suggest you memorize the 4 most common percentage intervals:
90% confidence level: 1.645
95% confidence level: 1.960
98% confidence level: 2.326
99% confidence level: 2.576

CONFIDENCE INTERVALS FOR POPULATION MEAN
This section is not so straightforward. Although it shares a lot of similarities with the part above, the construction of confidence intervals for the population mean depends on the variance (known or unknown), the distribution (normal or non-normal) and the sample size.
So in this section, there are 5 cases:

1. Normal with known variance σ²
The sampling distribution will be $\bar{X} \sim N\!\left(\mu, \dfrac{\sigma^2}{n}\right)$. Using the best unbiased estimate of the population mean, μ̂ = x̅, the confidence interval is $\left(\bar{x} - a\dfrac{\sigma}{\sqrt{n}},\; \bar{x} + a\dfrac{\sigma}{\sqrt{n}}\right)$.

2. Non-normal with known variance σ² (n ≥ 30)
In this case, the sample may be taken from a Binomial or Poisson distribution. Since the sample size is large, according to the central limit theorem, we approximate with a normal distribution.
X ~ B(n, p) becomes X ~ N(np, npq)
X ~ Po(λ) becomes X ~ N(λ, λ)
X ~ R(a, b) becomes X ~ N(½(a + b), 1/12 (b – a)²), etc.
From here, after finding the sampling distribution of X̅, again using the best unbiased estimate of the population mean, μ̂ = x̅, the confidence interval is the same as above, $\left(\bar{x} - a\dfrac{\sigma}{\sqrt{n}},\; \bar{x} + a\dfrac{\sigma}{\sqrt{n}}\right)$.

3. Normal with unknown variance σ² (n ≥ 30)
The method of solving this is just the same as method 1, but here we do not know the population variance. Using the unbiased estimate of the population mean, μ̂ = x̅, and the unbiased estimate of the population variance, $\hat{\sigma}^2 = \dfrac{n}{n - 1}s^2$, our confidence interval will be $\left(\bar{x} - a\dfrac{\hat{\sigma}}{\sqrt{n}},\; \bar{x} + a\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$, or we could rewrite it in terms of s, $\left(\bar{x} - a\dfrac{s}{\sqrt{n - 1}},\; \bar{x} + a\dfrac{s}{\sqrt{n - 1}}\right)$.

4. Non-normal with unknown variance σ² (n ≥ 30)
Similar to method 2, we approximate with a normal distribution, and after finding the sampling distribution of X̅, we use the unbiased estimates μ̂ and σ̂ in the same confidence interval equation as method 3.

5. Normal with unknown variance σ² (n < 30)
It is interesting to note that when the sample size is small, the sampling distribution does not follow a normal distribution. Instead, it follows a t-distribution. The distribution of T is a member of the family of t-distributions. All t-distributions are symmetric about zero and have a single parameter ν (pronounced ‘new’), which is a positive integer. ν is known as the number of degrees of freedom of the distribution and if, for example, T has a t-distribution with 5 degrees of freedom, you would write T ~ t(5). For a sample of size n, it can be shown that T follows a t-distribution with (n – 1) degrees of freedom.
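For reference, the statistic T mentioned here is usually written as below. This is the standard textbook form, which I am assuming is the one intended, with σ̂² being the unbiased estimate of the population variance from section 13.3:

```latex
T = \frac{\bar{X} - \mu}{\hat{\sigma} / \sqrt{n}} \sim t(n - 1),
\qquad
\hat{\sigma}^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```

It has the same shape as the standardised Z for a sample mean, except that the unknown σ is replaced by its estimate σ̂, and that replacement is what makes the tails heavier for small n.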
Take a look at the t-distribution curves below. Notice that we only use the t-distribution when the sample size is small: as ν tends to infinity, the curve looks more and more like a normal curve. In other words, nothing much has changed, we are just using a new distribution for small sample sizes. After knowing that our sample size is small, we use the t-distribution with (n – 1) degrees of freedom, use the unbiased estimates for both the mean and the variance, and our new formula will be $\left(\bar{x} - t\dfrac{\hat{\sigma}}{\sqrt{n}},\; \bar{x} + t\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$, where t can be obtained from the t-distribution tables. It looks something like this. The way you use it is exactly the same as the critical values for the normal distribution; it’s just that there is a column for the degrees of freedom.

HOW TO SOLVE EXAM QUESTIONS
It isn’t hard. All you need to do is identify the quantities stated in the question, and then decide which one of the 5 methods you should use. I’ll put a few example questions here, and show you how to analyse them:

A plant produces steel sheets whose weights are known to be normally distributed with a standard deviation of 2.4kg. A random sample of 36 sheets had a mean weight of 31.4kg. Find the 99% confidence interval for the population mean.
It is normally distributed, variance = 2.4² kg² (known), sample size = 36, sample mean = 31.4kg. Use method 1.

The heights of men in a particular district are distributed with mean μ cm and standard deviation σ cm. On the basis of results obtained from a random sample of 100 men from the district, the 95% confidence interval for μ was calculated and found to be (177.22cm, 179.18cm). Find the values of σ and x̅.
Unknown distribution, variance known, but sample size large. Approximate normal, method 2. You need to work backwards using the confidence interval formulas, get 2 simultaneous equations, and solve for σ and x̅. Give it a try.

The fuel consumption of a new model of car is being tested.
In one trial, 50 cars, chosen at random, were driven under identical conditions and the distances, x km, covered on 1 litre of petrol were recorded. The results gave the following totals:
Σx = 525, Σx² = 5625
Calculate a 95% confidence interval for the mean petrol consumption, in km/l, of cars of this type.
Unknown distribution, variance unknown, big sample. Approximate normal, use the unbiased estimate of the population variance (you have to calculate it this time), use method 4.

A sample of 8 independent observations of a normally distributed variable gave the following values: 3.6, 3.9, 4.5, 3.8, 4.4, 4.9, 4.2, 3.8. Determine a 99% confidence interval for the population mean μ.
Normal distribution, unknown variance, and small sample. Method 5. In your answer, you need to write these sentences very clearly:
since n < 30, a t(n – 1) distribution is used. T ~ t(7)
Then you continue to find the confidence interval. Not hard, is it?

Here are a few short notes you might want to take note of as well:

1. Interval width
The width of a confidence interval can be obtained from the expression $2a\dfrac{\sigma}{\sqrt{n}}$ (with σ̂ or t in place of σ or a where the method requires it). Also remember, the width decreases when either
a. the sample size n increases,
b. the confidence level decreases, or
c. the variance decreases.

2. Assumption
Many times you might be asked, “state the assumptions you made”. You probably only have one assumption, which is: we assume that it is a random sample.

To summarize this section, I made a chart for you to remember things more easily.

BAB 14 14. Hypothesis Testing

14.1 Hypotheses (null & alternative hypotheses, test statistic, significance level)
Let’s imagine this story. One day in town, you met this awkward-looking Mathematics tuition teacher. He brags that 95% of his pupils get A’s for their Mathematics T in STPM every year. Since you love Mathematics so much, you thought that maybe you might want to take his tuition class.
But being sceptical in nature, you were wondering whether 95% of his students getting A’s is a little too much. So you decided to put this teacher to a test. You managed to get some information from 15 of his ex-students, and found out that 11 of them got an A for Maths T in the previous year. Now your question is: is the Maths tuition teacher’s claim a little bit overboard? Is 11 out of 15 the same as 95%? Obviously it isn’t, but since you are only taking a sample, you can’t be sure that you are right. What if 13 or 14 students had got A’s? You know that if only 2 or 3 students got A’s, he is definitely lying. Then how about 10 students? 8 or 9 students? There must be a cut-off point, such that you are VERY SURE whether he is lying or not, mustn’t there?

Or let’s think of another story. Suppose you are an athlete, participating in the MSSM 400m race. You find that your running speed follows a normal distribution with a mean of 40km/h. Bored of running every day, you decided to test whether drinking 2 cups of milk every morning helps improve your running. So after drinking milk for 5 days, you find your mean speed turned out to be 40.9km/h. Again you question yourself: did you really “improve”? Well, it might so happen that you ran a little faster this time, and it has nothing to do with the milk. You might also be wondering how much increase in speed counts as an ‘improvement’. You need a cut-off point, again.

NULL AND ALTERNATIVE HYPOTHESIS
If you didn’t notice, you were actually making hypotheses and carrying out a significance test. You were trying to test a hypothesis, to determine whether you can conclude something. You were testing whether the claim that 95% of the students get A’s, and the ‘improvement’ in running, are true. The initial assumption is what we call the null hypothesis, H0. It is very important as it provides the model for the calculations. The null hypothesis for the first case is “95% of the students get A’s for Maths T”.
If your results are consistent with 95% of the students getting A’s in Maths T, then you keep your null hypothesis. The case is this: you can’t reject his claim if you don’t have enough evidence to do so. If after your test you have enough evidence to reject his claim, then you need an alternative hypothesis, H1. The alternative hypothesis for this case is “less than 95% of the students get A’s in Maths T”. This is a binomial problem, so in mathematical terms, we have
H0: p = 0.95
H1: p < 0.95
Notice that you are only interested in whether the probability is less than 95% or not, so this means that we are interested in the left-hand end of the distribution. This is known as the lower tail. In the second case, we are interested in the upper tail, as in whether you have improved or not. There are cases where you want to know whether there is any change in the values, e.g. whether there is a change in the number of supporters for Barisan Nasional, Pakatan Rakyat, etc. For such cases, we use a two-tailed test.

14.2 Critical Regions
You previously learnt how to formulate a null and an alternative hypothesis, and determine your test statistic and test value. This information alone is still not enough. We shall now proceed to setting the significance level, and determining the critical region. When making a hypothesis test, you have to make a decision about the significance level, which is the value of the probability that is considered to imply an unlikely or rare event. As a guide, events that have a probability of 5% or less are regarded as unlikely and events having a probability of 1% or less are regarded as very unlikely. Other significance levels used are 10% and 2%. Try not to confuse this with what you learned in the previous chapter, which was confidence intervals, in terms of 90+%.

Let’s say, in a test of 10 true-false questions which were written in Hindi, your friend got 6 questions correct, and you want to know whether he was guessing, or he really studied Hindi.
You formulate the hypotheses as below:
H0: Your friend is guessing. He makes use of the 50% luck.
H1: Your friend seriously studied Hindi before. He scores more than 50%.
Mathematically, this is a binomial problem again, X ~ B(10, 0.50).
H0: p = 0.50
H1: p > 0.50
Notice that the expression for H0 always has an ‘=’ sign, while H1 should have either a <, > or ≠ sign.

To start our test, we need to define our significance level. We can say, for example, that we want to test at the 5% level whether he could have obtained this score by guessing all the answers. We can also choose to test at the 1% level or the 10% level, and obviously, you might get different results. So from here you can see that in the last section, you can’t get any answer if you don’t set a significance level. You can’t say how much you have improved in your running, unless you state that “an increase of 5% is significant”, or “if I run faster by 10%, then there is significant improvement”. Only with this significance level can our hypothesis test be carried out.

For the example above, say, we want to test it at the 5% level. We first need to find the probabilities of how many questions he gets correct. We plot a cumulative binomial distribution for X ~ B(10, 0.50). Notice that it is a decreasing cumulative Binomial plot. This curve tells us the probability that he gets ≥ n questions correct. So we see that there is a 99.9% probability that he gets at least one question correct, a 62.3% probability that he gets at least 5 questions correct, etc. Even if your friend gets 8 questions correct, there is a 5.5% probability of scoring at least that well by pure guessing, which is still above our required significance level. But if he gets 9 questions correct, it must be a really rare event, as there is only a 1.1% probability of getting at least this score if he was guessing. We say that the numbers 9 and 10 lie in the critical region, which is the group of observations that are considered to be unusual or unlikely (rare) events.
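The percentages read off the cumulative plot can be reproduced directly from the Binomial formula. Here is a small Python sketch of that calculation; the helper names `upper_tail` and `critical_value` are invented for this illustration and are not standard functions.

```python
from math import comb


def upper_tail(n, p, k):
    """P(X >= k) for X ~ B(n, p), summed term by term from the binomial formula."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))


def critical_value(n, p, alpha):
    """Smallest k whose upper-tail probability P(X >= k) falls below alpha."""
    return min(k for k in range(n + 1) if upper_tail(n, p, k) < alpha)


# Table of P(X >= k) for the 10-question test when guessing (p = 0.5)
for k in range(11):
    print(k, round(upper_tail(10, 0.5, k), 3))

cv_5 = critical_value(10, 0.5, 0.05)    # cut-off at the 5% level
cv_10 = critical_value(10, 0.5, 0.10)   # cut-off at the 10% level
print("cut-off at 5%:", cv_5, "| cut-off at 10%:", cv_10)
```

The cut-off moves when the significance level is changed, which is why the level has to be fixed before looking at the result.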
We also say that the number 9 is the critical value, or cut-off point, since any score from it upwards is considered a rare event. So what can we conclude from here? We can see that if your friend got 0 to 8 questions correct, we have no evidence that he studied Hindi, as these are not rare events (their probability is > 5%). We say that the null hypothesis H0 is not rejected. But if he gets 9 or 10 questions correct, we say that there is evidence, at the 5% significance level, that your friend did study Hindi. In other words, the null hypothesis H0 is rejected in favour of the alternative hypothesis H1. Notice that if we did the test at the 10% significance level, the number 8 would now lie in the critical region! So this is actually very subjective, and it really depends on you (or the question in your test paper) to determine what is considered significant and what is not.

Let me sum up what you understand about hypothesis testing by now:
1. To test something, you need to first define your null hypothesis H0, something that is claimed, or happening.
2. Then you define your alternative hypothesis H1, in case H0 is not true.
3. Find your test statistic and test value.
4. Try to identify what kind of distribution it comes from.
5. Decide on a significance level at which to reject or keep the claim.
6. Plot or use a given cumulative distribution to find the critical region.
7. Determine whether the test value lies in the critical region. If yes, then H0 is rejected. If not, H0 is not rejected.

14.3 Tests of Significance (population proportion & mean, Type I & Type II errors)
A hypothesis test is a test of significance. In this section, we will be looking at all the possible types of hypothesis tests that can be set in STPM. Before we start, note that every hypothesis test follows a general rule. You need to state these 7 steps (or workings) in your answer sheet:
1. Define the variable X. Let X be …, X ~ B(n, p) / X ~ N(μ, σ²) / X ~ Po(λ)
2. Define H0 and H1. H0: p / λ / μ / μ1 – μ2 = ?
H1: p / λ / μ / μ1 – μ2 <, >, ≠ ?
3. Write down the case if H0 is true. If H0 is true, then p / λ / μ / μ1 – μ2 = ? and X ~ B(n, ?) / X ~ N(?, σ²) / X ~ Po(?)
4. Define your type of test and significance level. Use an upper / lower / two-tailed test, at ?% level.
5. Set the criteria to reject H0. Reject H0 if P(X ≥ x) < ? / P(X ≤ x) < ? / z < ? / z > ? / |z| > ? / T < ?
6. Do the calculations. P(X ≥ ?) = ? / P(X ≤ ?) = ? / z = ? / T = ?
7. Conclude your results. Since P(X ≥ x) = ? / P(X ≤ x) = ? / Z = ? / T = ?, x lies / doesn’t lie in the critical region. H0 is rejected in favour of H1 / not rejected. We conclude that ………. at ?% level.
If you have all these 7 steps on your answer sheet, then you will probably get 90% of the marks. Don’t make calculation mistakes though.

TYPES OF SIGNIFICANCE TESTS
In this part, there are 12 kinds of significance tests that you might face, be it lower tail, two-tailed or upper tail tests. I will go through this section with an example for each one.

1. Binomial Proportion p (n < 30)
A certain type of seed has a germination rate of 70%. The seeds undergo a new treatment, after which 9 seeds germinate in a packet of 10 seeds. Test, at the 5% level, whether this is evidence of an increase in the germination rate.

Let X be the number of seeds that germinate in a packet of 10, X ~ B(10, p)
H0: p = 0.7 [the germination rate is 70%]
H1: p > 0.7 [the germination rate increases]
If H0 is true, then p = 0.7, and X ~ B(10, 0.7)
Use an upper tail test, at 5% level.
Reject H0 if P(X ≥ x) < 0.05 [0.05 stands for 5%]
P(X ≥ 9) = P(X = 9) + P(X = 10) = 0.1211 + 0.0282 = 0.1493 = 14.93%
Since P(X ≥ 9) = 14.93% > 5%, x doesn’t lie in the critical region. H0 is not rejected. We conclude that there is no evidence of an increase in the germination rate, at 5% level.

A binomial proportion with a small sample isn’t hard. The thing that bothers you might probably be the calculation of P(X ≥ 9).
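If you want to check that calculation without the tables, the two Binomial terms can be added up in Python. This is only a quick sketch; the variable names are my own and the numbers are the ones from the seed example.

```python
from math import comb

n, p0, observed, alpha = 10, 0.7, 9, 0.05  # values taken from the seed example

# P(X >= 9) under H0: add the binomial probabilities for x = 9 and x = 10
p_value = sum(
    comb(n, x) * p0**x * (1 - p0)**(n - x)
    for x in range(observed, n + 1)
)
reject_h0 = p_value < alpha  # upper tail test at the 5% level

print(round(p_value, 4), reject_h0)
```

The probability comes out at about 0.1493, well above 0.05, so `reject_h0` is `False`, which matches the conclusion in the working.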
Remember the formula for the Binomial distribution, $P(X = x) = \binom{n}{x} p^x q^{n-x}$.

2. Binomial Proportion p (n ≥ 30)
For this case, an approximation to the normal distribution is used. Remember that a continuity correction is needed when moving from the discrete Binomial to the continuous normal distribution.

A manufacturer claims that 8 out of 10 dogs prefer its brand of dog food to any others. In a random sample of 120 dogs, it was found that 85 appeared to prefer that brand. Test, at the 5% level, whether you would accept the manufacturer’s claim.

Let X be the number of dogs which prefer the manufacturer’s brand of dog food, X ~ B(120, p)
H0: p = 0.8
H1: p ≠ 0.8 [notice that we are using the ≠ sign. This is because we are testing whether the claim is exactly correct. That means the claim is wrong if more than 8 in 10 dogs like the brand, and also if fewer than 8 in 10 dogs like the brand.]
If H0 is true, then p = 0.8 and X ~ B(120, 0.8)
Since np > 5 and nq > 5, X is approximately normal, X ~ N(np, npq), which is X ~ N(96, 19.2).
Use a two-tailed test, at 5% level.
Reject H0 if |z| > 1.960 [Still remember how to get this value 1.960? Remember that a two-tailed test at 5% means that both ends of the bell curve have 2.5% each. Refer to the critical values for the normal distribution at the end of this post.]
$z = \dfrac{85.5 - 96}{\sqrt{19.2}} = -2.396$ [85.5 comes from the continuity correction: P(X ≤ 85) for the Binomial becomes P(X < 85.5) for the normal approximation.]
Since z = –2.396 and |z| > 1.960, z lies in the critical region. H0 is rejected in favour of H1. There is evidence that the proportion is lower, and therefore the manufacturer’s claim is not accepted, at 5% level.

3. Poisson Mean λ
The number of white corpuscles on a slide has a Poisson distribution with mean 3.5. After treatment, a sample was taken and the number of white corpuscles was found to be 8. Test, at the 5% level of significance, whether the number of white corpuscles has increased.

Let X be the number of white corpuscles on a slide, X ~ Po(λ).
H0: λ = 3.5
H1: λ > 3.5
If H0 is true, then λ = 3.5, and X ~ Po(3.5).
Use an upper tail test, at 5% level.
Reject H0 if P(X ≥ x) < 0.05.
P(X ≥ 8) = 1 – P(X ≤ 7) = 1 – 0.9733 = 0.0267 = 2.7% [I hope you remember the Poisson formula. In some formula booklets, there are Poisson cumulative probability tables, and they help too.]
Since P(X ≥ 8) = 2.7% < 5%, x lies in the critical region. H0 is rejected in favour of H1. There is evidence, at 5% level, that the number of white corpuscles has increased.

Not a hard one, I suppose. Remember that if λ > 5, you can actually make an approximation to the Normal distribution, X ~ N(λ, λ).

4. Population Mean μ (Normal, σ² known)
A machine fills cans with soft drinks so that the volume of liquid in the cans follows a normal distribution with mean 335ml and standard deviation 3ml. A setting on the machine is altered, following which the operator suspects that the mean volume of liquid discharged by the machine into the cans has decreased. He takes a random sample of 50 cans and finds that the mean volume of liquid in these cans is 334.6ml. Does this confirm his suspicion? Perform a significance test at the 5% level and assume that the standard deviation remains unchanged.

Let X be the volume of liquid in a can, X ~ N(μ, 3²)
H0: μ = 335
H1: μ < 335
The sample size is 50, so X̅ ~ N(μ, 3²/50) [recall what you learned in the previous chapter]
If H0 is true, then μ = 335, and X̅ ~ N(335, 9/50)
Use a lower tail test, at 5% level.
Reject H0 if z < –1.645
$z = \dfrac{334.6 - 335}{\sqrt{9/50}} = -0.9428$
Since z = –0.9428 > –1.645, z doesn’t lie in the critical region. H0 is not rejected. There is no evidence to confirm the suspicion of the operator, at 5% level.

For hypothesis types 4 to 8, you might want to recall what you learned in the previous chapter. Remember when to use the t-distribution, when to approximate with the normal distribution, etc. These few types make use of the sampling distribution.

5. Population Mean μ (Non-normal, σ² known)
I think I don’t need to show you an example on this one.
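Since this type reuses the type 4 calculation, here is a rough Python sketch of that z-test for the soft drink cans, in case you want to check your table lookups. It uses `NormalDist` from the standard `statistics` module in place of the Buku Sifir; the variable names are mine, and the sketch assumes a lower tail test as in the example.

```python
from math import sqrt
from statistics import NormalDist

# Values from the soft drink example (type 4)
mu_0, sigma, n, x_bar, alpha = 335.0, 3.0, 50, 334.6, 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))   # standardised sample mean under H0
z_critical = NormalDist().inv_cdf(alpha)  # lower tail cut-off, about -1.645
reject_h0 = z < z_critical                # lower tail test

print(round(z, 4), round(z_critical, 3), reject_h0)
```

For a non-normal population with a large sample, the same lines apply once the central limit theorem has been quoted.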
It is similar to number 4. For a large sample, the central limit theorem lets you treat the sample mean of a non-normal (or sometimes unnamed, or unknown) distribution as approximately normal, and you follow the exact same steps as type 4.

6. Population Mean μ (Normal, σ² unknown, n ≥ 30)

When the variance is unknown, you make use of the best unbiased estimate of the population variance, $\hat{\sigma}^2 = \dfrac{1}{n-1}\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)$, and the rest of the steps follow.

7. Population Mean μ (Non-normal, σ² unknown, n ≥ 30)

Similar to type 6, you make use of the best unbiased estimate of the population variance. The following example illustrates both type 6 and 7:

A random sample of 75 11-year-olds performed a simple task and the time taken, x minutes, was noted for each. The results were summarized as follows: Σx = 1215, Σx² = 21708. Test, at the 1% level, whether there is evidence that the mean time taken to perform the task is greater than 15 minutes.

Let X be the time taken to perform a simple task by the 11-year-olds.
H0: μ = 15
H1: μ > 15
The distribution is unknown. But since n = 75 is large, by the central limit theorem, X̅ is approximately normally distributed, X̅ ~ N(μ, σ̂²/75), with the variance estimated from the sample:
$\bar{x} = \dfrac{1215}{75} = 16.2$, $\hat{\sigma}^2 = \dfrac{1}{74}\left(21708 - \dfrac{1215^2}{75}\right) = \dfrac{2025}{74} \approx 27.36$
If H0 is true, then μ = 15, and X̅ ~ N(15, σ̂²/75)
Use an upper tail test, at 1% level. Reject H0 if z > 2.326
$z = \dfrac{16.2 - 15}{\sqrt{27.36/75}} \approx 1.987$
Since z < 2.326, z doesn't lie in the critical region. H0 is not rejected. There is no evidence, at 1% level, that the mean time is greater than 15 minutes.

8. Population Mean μ (Normal, σ² unknown, n < 30)

You have probably guessed it already. You should use the t-distribution to do this kind of significance test.

Family packs of bacon slices are sold in 1.5kg packs. A sample of 12 packs was selected at random and their masses, measured in kilograms, noted. The following results were obtained: Σx = 17.81, Σx² = 26.4357. Assuming that the masses of the packs follow a normal distribution with variance σ² unknown, test at the 1% level whether the packs are underweight.
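Types 6 to 8 all start from the same summary statistics, so the arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `statistic_from_sums` is my own, and the critical values are simply the table values quoted in these notes. It is run on the 11-year-olds example above and on the bacon question just posed.

```python
from math import sqrt

def statistic_from_sums(n, sum_x, sum_x2, mu0):
    """Return (sample mean, unbiased variance estimate, standardised statistic).

    The statistic is (mean - mu0) / sqrt(var_hat / n). Read it as z for a
    large sample, or as t with n - 1 degrees of freedom for a small
    sample from a normal population.
    """
    mean = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    statistic = (mean - mu0) / sqrt(var_hat / n)
    return mean, var_hat, statistic

# Type 6/7: 75 children, H0: mu = 15, upper tail at 1% (reject if z > 2.326)
mean_task, var_task, z_task = statistic_from_sums(75, 1215, 21708, 15)
reject_task = z_task > 2.326

# Type 8: 12 bacon packs, H0: mu = 1.5, lower tail at 1% (reject if t < -2.718)
mean_bacon, var_bacon, t_bacon = statistic_from_sums(12, 17.81, 26.4357, 1.5)
reject_bacon = t_bacon < -2.718

print(f"task:  z = {z_task:.3f}, reject H0: {reject_task}")
print(f"bacon: t = {t_bacon:.3f}, reject H0: {reject_bacon}")
```

The statistic comes out at about 1.99 for the task (not rejected) and about –3.51 for the bacon packs (rejected). The code does not decide between z and t for you; that still depends on the sample size and on whether the population is normal.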
Let X be the mass of a pack of bacon slices, X ~ N(μ, σ²)
H0: μ = 1.5
H1: μ < 1.5
Since σ² is unknown, and n < 30, a t-distribution is used, T ~ t(n – 1)
If H0 is true, then μ = 1.5, T ~ t(11), where $T = \dfrac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$
Use a lower tail test, at 1% level. Reject H0 if t < –2.718 [refer to the t-distribution tables]
x̅ = 17.81/12 = 1.484 and $\hat{\sigma}^2 = \dfrac{1}{11}\left(26.4357 - \dfrac{17.81^2}{12}\right) \approx 0.000245$, so T = –3.506 < –2.718
t lies in the critical region. H0 is rejected in favour of H1. There is evidence that the packs are underweight, at 1% level.

9. Difference between Means μ1 – μ2 (different variances σ1² & σ2² known)

This is something new. Types 9, 10 and 11 are only for 2 Normal populations, X1 and X2, with unknown means μ1 and μ2. So here you have 2 samples, with the new test statistic X̅1 – X̅2, and we need to consider its sampling distribution. Doing some expectation algebra,
$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$, $\operatorname{Var}(\bar{X}_1 - \bar{X}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
and so our sampling distribution of the difference of means will be
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$
and therefore, in standardised form,
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

Let's try one example:

Due to differences in the environment, the masses of a certain species of small animal are believed to be greater in Region A than in Region B. It is known that the masses in both regions are normally distributed, with masses in Region A having a standard deviation of 0.04kg and masses in Region B having a standard deviation of 0.09kg. To test this theory, random samples are taken: 60 animals from Region A had a mean mass of 3.03kg and 50 animals from Region B had a mean mass of 3.00kg. Does this provide evidence, at the 1% level, that the animals of this species in Region A have a greater mass than those in Region B?

Let X1 be the mass (kg) of an animal in Region A, with population mean μ1. X1 ~ N(μ1, 0.04²)
Let X2 be the mass (kg) of an animal in Region B, with population mean μ2. X2 ~ N(μ2, 0.09²)
H0: μ1 – μ2 = 0 [there is no difference in the masses between the regions]
H1: μ1 – μ2 > 0 [the animals in Region A have greater mass]
Consider the distribution of the difference between the means X̅1 – X̅2, with n1 = 60, n2 = 50.
If H0 is true, then μ1 – μ2 = 0 [there can be cases where the hypothesised difference is not 0 too], and $\bar{X}_1 - \bar{X}_2 \sim N\left(0,\ \dfrac{0.04^2}{60} + \dfrac{0.09^2}{50}\right)$
Use an upper tail test, at 1% level. Reject H0 if z > 2.326
$z = \dfrac{3.03 - 3.00}{\sqrt{\dfrac{0.04^2}{60} + \dfrac{0.09^2}{50}}} = 2.184$
Since z = 2.184 < 2.326, z doesn't lie in the critical region. H0 is not rejected. There is no evidence, at the 1% level, that the animals in Region A have a greater mass than those in Region B.

10. Difference between Means μ1 – μ2 (common σ² known)

This one is not much different from the one above. Here the 2 populations share one known variance σ². The distribution will then be represented by
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$
and the test statistic,
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
By the way, you can also create confidence intervals for situations like this. Try it out yourself. There can be 2-tail, upper tail and lower tail tests as well.

11. Difference between Means μ1 – μ2 (common σ² unknown)

This type does not cover populations whose variances are different. But if both populations have a common unknown variance, the unbiased estimate σ̂², also known as a pooled two-sample estimate, has the formula
$\hat{\sigma}^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$
where n1 and n2 are the sample sizes and s1² and s2² are the variances of the 2 samples respectively (calculated with divisor n). The distribution will be
$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \hat{\sigma}^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)\right)$
and the test statistic,
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
This is, however, not always the case. When both the samples are small, we should use the t-distribution instead. The test statistic will now be
$T = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
where T ~ t(n1 + n2 – 2). We should only use the t-distribution when n1 + n2 – 2 < 30.

A large group of sunflowers is growing in the shady side of a garden. A random sample of 36 of these sunflowers is measured. The sample mean height is found to be 2.86m, and the sample standard deviation is found to be 0.60m. A second group of sunflowers is growing in the sunny side of the garden. A random sample of 26 of these sunflowers is measured. The sample mean height is found to be 3.29m and the sample standard deviation is found to be 0.9m.
Treating the samples as large samples from normal distributions having the same variance but possibly different means, obtain a pooled estimate of the variance and test whether the results provide significant evidence at the 5% level that the sunny-side sunflowers grow taller, on average, than the shady-side sunflowers.

Let X1 be the height of sunflowers in the shady side, X1 ~ N(μ1, σ²)
Let X2 be the height of sunflowers in the sunny side, X2 ~ N(μ2, σ²), where σ² is unknown.
H0: μ2 – μ1 = 0
H1: μ2 – μ1 > 0 [the sunny-side sunflowers are taller]
Consider the distribution of the difference between the means X̅2 – X̅1, with n1 = 36, n2 = 26. The pooled estimate of the variance is
$\hat{\sigma}^2 = \dfrac{36(0.60^2) + 26(0.9^2)}{36 + 26 - 2} = 0.567$
If H0 is true, then μ2 – μ1 = 0 and therefore $\bar{X}_2 - \bar{X}_1 \sim N\left(0,\ 0.567\left(\dfrac{1}{36} + \dfrac{1}{26}\right)\right)$
Use an upper tail test, at 5% level. Reject H0 if z > 1.645
$z = \dfrac{3.29 - 2.86}{\sqrt{0.567\left(\dfrac{1}{36} + \dfrac{1}{26}\right)}} \approx 2.219$
Since z > 1.645, z lies in the critical region. H0 is rejected in favour of H1. There is evidence, at the 5% level, that the sunny-side sunflowers grow taller than the shady-side sunflowers.

When you perform a significance test, you can make errors. If H0 is correct and you accept it, or if H0 is false and you reject it, then you've made a correct decision. However, there are 2 kinds of errors that you can make:
1. A Type I Error, which is made when you reject H0 when it is true.
2. A Type II Error, which is made when you accept H0 when it is false.
Questions usually ask for the probability of making these errors. The first one is easy, P(Type I error) = level of significance. For the Type II error, things are not so straightforward. A specific value for H1 has to be stated in order to find the probability of this error. I'll show you an example below:

A random observation is taken from a binomial distribution X ~ B(20, p) and used to test the null hypothesis p = 0.8 against the alternative hypothesis p > 0.8. The significance level of the test is 7%. Find the probability of making a Type I error. Find also the probability of making a Type II error if in fact p = 0.85.

The probability of making a Type I error is 7%.
[same as the level of significance. Strictly speaking, because X is discrete, the exact probability of landing in the critical region X ≥ 19 found below is 6.9%, just under the nominal 7%.]
You make a Type II error if you accept H0 when p is the value specified in H1. For the Type II error,
H0: p = 0.80
H1: p = 0.85
Under H0,
P(X = 20) = 0.012 = 1.2%
P(X ≥ 19) = 0.069 = 6.9%
P(X ≥ 18) = 0.206 = 20.6%
So the critical region is X ≥ 19, the largest region with probability below 7%.
So P(Type II error) = P(accept H0 when H1 is true) = P(X < 19 when p = 0.85) = P(X ≤ 18 when X ~ B(20, 0.85))
P(X ≤ 18) = 1 – P(X = 20) – P(X = 19) = 0.824 = 82.4%
[Note that in this part of the calculations, you are using p = 0.85, and not 0.80 as when you were finding the critical region above.]
∴ The probability of making a Type II error is 82.4%.

Let me summarize how you find the probability of a Type II error:
1. Define your new H1
2. Find the critical region (using the value in H0)
3. Find the probability, using the new value in H1, that the observation lies outside the critical region you found.
By the way, the expression 1 – P(Type II error) is known as the Power of the Test.

BAB 15
15. χ² Tests
15.1 χ² Distribution

The χ² Distribution (read as 'kai-squared', written as 'chi-squared') is just another new distribution that we will be learning today. This distribution mainly helps us to see, or analyse, whether a particular population fits a certain distribution (Binomial, Poisson, Normal etc). For example, if you have a list of the heights of the students in your school, you may want to know whether the list fits a normal distribution. You conduct a test of goodness-of-fit. Then you probably also want to know whether the weather during a football match affects the results of the football team. You conduct a test of independence. These tests follow a format similar to that of a hypothesis test, but before I go into it, let me introduce this distribution and its attributes.

The χ² Distribution has one parameter, ν, pronounced 'nu', known as the number of degrees of freedom. The shape of the distribution is different for different values of ν.
Take a look at the graph below: As ν increases, the curve looks rounder, and tends to start from zero. For ν > 2 the curve rises to a single peak and is positively skewed, and when ν is large, the distribution is approximately normal. If we are using a chi-squared distribution with 5 degrees of freedom, we say that we are using a χ²(5) distribution, or X² ~ χ²(5).

The table for the critical values of the χ² distribution looks something like this: Unlike hypothesis testing and confidence intervals, where you might need 2-tailed, upper tail or lower tail tests, you are only required to look for the upper tail critical value for the χ² distribution. However, the similarity with hypothesis testing is the significance levels, which are 5%, 10%, 1% or so. The critical region, as before, is the upper blue region shown in the graph, and the boundary of the critical region is called the critical value. For a 5% level of significance, the critical value is written as $\chi^2_{5\%}(\nu)$, where the value depends on ν. So looking at the table, $\chi^2_{5\%}(5)$ has the value of 11.070.

Before I introduce the test statistic, I'll give you an illustration of how it comes about. Suppose that I give you the table below. I tell you that these are random digits generated by a calculator. However, you doubt whether they are truly random. When we say that the values are random, it means they have equal probability of appearing. So this means the frequencies should follow a uniform distribution, with every digit appearing with a frequency of 10. So the observed frequency O is your experimental result, while the expected frequency E is what you think it should be. So your test statistic X² is
$X^2 = \sum \dfrac{(O - E)^2}{E}$
Let's try to understand this expression. The term (O – E)² will become very large if the expected and observed frequencies are very far apart (like for the digit 0). Dividing by E scales each squared difference relative to the size of the expected frequency.
So this tells us that, if X² is very big, there is strong evidence that it is not the correct distribution (in this case, that the digits are not uniformly distributed). But if X² is close to 0, then the data are consistent with that distribution. This formula for X² is used for any value of ν, except when ν = 1. Then we need to use the Yates Correction,
$X^2 = \sum \dfrac{(|O - E| - 0.5)^2}{E}$
Now you might be wondering how we determine the value of ν. ν can be found through the formula
ν = number of classes – number of restrictions
The number of classes is the number of columns you have in your table. For the one above, it has 10 classes. The number of restrictions depends on whether the mean or variance is known and whether the sum of the observed frequencies is known. In a uniform distribution, there is only 1 restriction, that is ΣO = 100. We will get into the details in the next post.

15.2 Tests for Goodness of Fit

A χ² Goodness-of-Fit test is used when you have some practical data and you want to know how well a particular statistical distribution, such as a binomial or a normal, models that data. The null hypothesis H0 is that the particular distribution does provide a model for the data; the alternative hypothesis H1 is that it doesn't. Just like Hypothesis Tests, Goodness-of-fit Tests also follow a general guideline. You need to write all these 6 steps in your answer sheet:
1. State the null & alternative hypotheses
H0: x is uniformly, B, Po, N distributed / distributed in a ratio of ?
H1: x is not distributed this way
2. Calculate the expected frequency E in the table
3. State the degrees of freedom
There are ? classes and ? restrictions
Consider a χ²(n – ?) distribution
4. State the significance level
Perform at ?% level
From the tables, χ²(?%)(ν) = ?, so reject H0 if X² > ?
5. Calculate X² using the tables
6. Make your conclusion
Since X² > / < ?, H0 is rejected in favour of H1 / not rejected. There is evidence, at ?% level, that __________ .
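The six steps can be sketched in Python. This is a minimal illustration and not a statistics library: `goodness_of_fit` is a name I made up, the critical value has to be read from the tables and passed in by hand, and the figures are those of the 1 : 2 : 1 ratio question worked as example 2 below.

```python
def goodness_of_fit(observed, expected, n_restrictions, critical_value, yates=False):
    """Return (X2, degrees of freedom, reject H0?) for a goodness-of-fit test.

    Set yates=True to apply the Yates correction, which these notes
    use only when there is a single degree of freedom.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    x2 = 0.0
    for o, e in zip(observed, expected):
        gap = abs(o - e) - 0.5 if yates else abs(o - e)
        x2 += gap ** 2 / e
    nu = len(observed) - n_restrictions
    return x2, nu, x2 > critical_value

# Step 2: expected frequencies for outcomes in the ratio 1 : 2 : 1
observed = [36, 115, 49]
ratio = [1, 2, 1]
total = sum(observed)
expected = [total * part / sum(ratio) for part in ratio]

# Steps 3 to 6: one restriction (the total), 5% critical value for nu = 2 is 5.991
x2, nu, reject = goodness_of_fit(observed, expected, 1, 5.991)
print(expected, round(x2, 2), nu, reject)
```

Here the expected frequencies are 50, 100 and 50, X² works out to 6.19 with ν = 2, and H0 is rejected, which agrees with the conclusion of example 2.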
Now we shall proceed to learn how to solve 5 kinds of χ² tests through examples.

1. Uniform Distribution (Random)

A tetrahedral die is thrown 120 times and the number on which it lands is noted. Test at the 5% level whether the die is fair.

H0: The die is fair [I can also write, "the die follows a uniform distribution". But this is better.]
H1: The die is not fair
There are 4 classes and 1 restriction (ΣE = 120) [Remember that ΣE = ? is always one of the restrictions]
Consider a χ²(3) distribution, perform at 5% level. From the tables, χ²(5%)(3) = 7.815, so reject H0 if X² > 7.815.
Since X² < 7.815, H0 is not rejected. There is insufficient evidence, at 5% level, that the die is not fair.

2. Distributed in Given Ratio

The outcomes A, B & C of a certain experiment are thought to occur in the ratio 1 : 2 : 1. The experiment is performed 200 times and the observed frequencies of A, B & C are 36, 115 & 49 respectively. Is the difference between the observed and expected results significant? Test at the 5% level.

H0: The outcomes A, B & C are in the ratio 1 : 2 : 1
H1: The outcomes A, B & C are not in the ratio 1 : 2 : 1
There are 3 classes and 1 restriction (ΣE = 200)
Consider a χ²(2) distribution, perform at 5% level. From the tables, χ²(5%)(2) = 5.991, so reject H0 if X² > 5.991
[To save time, you could just construct one table instead of 2. You find the E and the test statistic in one table.]
The expected frequencies are 50, 100 and 50, which gives X² = 6.19. Since X² > 5.991, H0 is rejected in favour of H1. The difference between the observed & expected results is significant, at 5% level.

3. Binomial Distribution

Not much is different here from the above two, just that you need more laborious calculations to find your E. Once again, remember your binomial and Poisson formulas, and combine classes whose expected frequencies are less than 5. You do that because the χ² approximation is unreliable when expected frequencies are small, and of course, a different number of degrees of freedom will then be used.
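As a rough sketch of that expected-frequency step, the Python below computes the expected frequencies for a B(5, 0.3) model with 100 observations (the model used in the example that follows) and merges the upper-tail classes until no expected frequency is below 5. The helper names `binomial_expected` and `merge_upper_tail` are my own, and the merge only handles small classes at the top end of the table.

```python
from math import comb

def binomial_expected(n_trials, p, total):
    """Expected frequency of each x = 0..n_trials under B(n_trials, p)."""
    return [total * comb(n_trials, x) * p ** x * (1 - p) ** (n_trials - x)
            for x in range(n_trials + 1)]

def merge_upper_tail(expected, minimum=5.0):
    """Fold the last class into its neighbour while its frequency is below minimum."""
    merged = list(expected)
    while len(merged) > 1 and merged[-1] < minimum:
        tail = merged.pop()
        merged[-1] += tail
    return merged

expected = binomial_expected(5, 0.3, 100)
merged = merge_upper_tail(expected)
nu = len(merged) - 1  # one restriction: the total frequency

print([round(e, 3) for e in expected])
print([round(e, 3) for e in merged], "degrees of freedom:", nu)
```

The classes x = 3, 4 and 5 end up combined, leaving 4 classes and ν = 3. Remember that the observed frequencies must be combined in exactly the same way before X² is calculated.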
Perform a χ² test to investigate whether the following is drawn from a binomial distribution with p = 0.3. Use a 5% level of significance.

H0: X ~ B(5, 0.3) [Writing the short form is good enough.]
H1: X is not distributed this way.
The expected frequency for a Binomial distribution, $E = P(X = x) \times 100 = \binom{5}{x}\, 0.3^x\, 0.7^{5-x} \times 100$, where ΣO = 100. We tabulate the table below: Since the expected frequencies of x = 4, 5 are < 5, the last 3 classes are combined. [please take note of this piece of information.]
There are now 4 classes and 1 restriction (ΣE = 100)
Consider a χ²(3) distribution, perform at 5% level. From the tables, χ²(5%)(3) = 7.815, so reject H0 if X² > 7.815
Since X² < 7.815, H0 is not rejected.
∴ There is no evidence, at 5% level, that X is not binomially distributed.

Notice that the number of restrictions can increase, if the population proportion is not known. You use x̅ = np to find the value of p. For example, a random sample of size 50 is taken, and you are given this table. You don't know the mean, but you know that $\bar{x} = \dfrac{\sum fx}{\sum f}$. You can find the value of p by using the equation x̅ = np, where n = 50. That gives the question 2 restrictions, so the number of degrees of freedom is the number of classes minus 2.

4. Poisson Distribution

This one is very similar to the Binomial one. If the Poisson population mean λ is unknown, the number of restrictions goes up by 1, and you estimate λ by the sample mean x̅. Just take a look at the example.

A local council has records of the number of children and the number of households in its area. It is therefore known that the average number of children per household is 1.4. It's suggested that the number of children per household can be modelled by a Poisson distribution with parameter 1.40. In order to test this, a random sample of 1000 households is taken, giving the following data. Carry out a χ² test, at the 5% level of significance, to determine whether or not the proposed model should be accepted.

Let X be the number of children per household. [notice that in this case, I define X properly.
You should do it when you know what X is.]
H0: X ~ Po(1.4)
H1: X is not distributed this way.
There are 6 classes and 1 restriction (ΣE = 1000). Consider a χ²(5) distribution, perform at 5% level. From the tables, χ²(5%)(5) = 11.070, so reject H0 if X² > 11.070.
Since X² > 11.070, H0 is rejected in favour of H1. The proposed model shouldn't be accepted; there is evidence that X doesn't follow a Poisson distribution with mean 1.4.

5. Normal Distribution

As for the normal distribution, either you know both the population mean μ and population variance σ², or you know neither of them. So you have degrees of freedom of either n – 1 or n – 3, where n is the number of classes. See the example below:

The following data gives the heights in cm of 100 male students. Find the expected frequencies of a normal distribution having the same mean and variance as the data given, and test the goodness of fit, using a 5% level of significance.

To start, we need to find the values of μ and σ² first. Let X be the height (cm) of a male student.
H0: X ~ N(171.54, 50.56)
H1: X is not distributed this way.
Now this one needs a lot of calculations. The expected frequency of each class can be found by using
$E = 100 \times P(a < X < b) = 100 \times \left[\Phi\left(\dfrac{b - \mu}{\sigma}\right) - \Phi\left(\dfrac{a - \mu}{\sigma}\right)\right]$
where a and b are the lower and upper boundaries of each class (remember to ±0.5). The work for a continuous variable takes some time. Remember that the bell curve goes all the way to infinity in both directions, so the first and last classes are open-ended. I believe you know that your calculator can help you do tricks, right? Remember to combine the small classes.
There are 5 classes and 3 restrictions (ΣE = 100, μ and σ² estimated from the sample). Consider a χ²(2) distribution, perform at 5% level. From the tables, χ²(5%)(2) = 5.991, so reject H0 if X² > 5.991.
Since X² < 5.991, H0 is not rejected. There is no evidence, at 5% level, against X being normally distributed, X ~ N(171.54, 50.56).
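The expected-frequency step for the normal case can be sketched with Python's standard library. The mean 171.54 and variance 50.56 come from the example above, but the class boundaries in the code are hypothetical (the original table is not reproduced here), so the printed frequencies are only illustrative. The function name `normal_expected` is also my own.

```python
from statistics import NormalDist

def normal_expected(boundaries, mu, variance, total):
    """Expected frequency of each class under N(mu, variance).

    `boundaries` are the interior class boundaries in increasing order.
    The first and last classes are open-ended, because the bell curve
    extends to infinity in both directions.
    """
    dist = NormalDist(mu, variance ** 0.5)
    cdf_values = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]
    return [total * (upper - lower)
            for lower, upper in zip(cdf_values, cdf_values[1:])]

# Hypothetical boundaries (cm), already adjusted by 0.5; NOT the original data
boundaries = [164.5, 169.5, 174.5, 179.5]
expected = normal_expected(boundaries, 171.54, 50.56, 100)

# Three restrictions: the total, plus mu and sigma^2 estimated from the sample
nu = len(expected) - 3

print([round(e, 2) for e in expected], "degrees of freedom:", nu)
```

Because the end classes are open-ended, the expected frequencies always add up to the total of 100, which is a quick check on your own table.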
Before I end this section, let me give you a summary of the degrees of freedom used throughout this post:

15.3 Tests for Independence (contingency table)

Sometimes situations arise when data are displayed in a contingency table, which is a table displaying data classified according to 2 different factors / attributes. For example, the table below: This is a 2 by 3 table, which shows the different schools and their different performance in an exam. We use a χ² test to determine whether the two factors are independent, or whether there is an association between them. According to the table above, we want to know whether the school affects exam performance. In other words, since the numbers of students in school A and school B are different (80 and 70 respectively), we compare proportions: if both schools have the same ratios of credit, pass and fail, then the school doesn't affect the grades. This kind of test is known as the test for independence.

As usual, we shall find the expected frequencies, find the degrees of freedom ν and find the test statistic X², which has the same formula as in the previous section. Let's take the above example. The degrees of freedom for an h × k contingency table can be found using the formula
ν = (h – 1)(k – 1)
and so, the above table has the value of ν = 2. The expected frequency E can be found through the formula
$E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
To find this, we first need to find the total of each row and column. We modify the table above, colour it a little, then we get: The black numbers in the middle are known as the observed frequencies. To proceed to find the expected frequencies, we construct another table, but clear off all the data in the middle. Next, we use the above formula to fill in the expected frequencies. For the top left cell, we have
90 × 80 ÷ 150 = 48.0
We proceed to fill up the rest: From here, we proceed to find X² by making use of the 6 values of O and E that we just calculated.
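Here is a small Python sketch of the same procedure. The row totals (80 and 70), the first column total (90) and the grand total (150) match the school example, but the individual cell counts are invented for illustration because the original table is not reproduced here. The function names are my own.

```python
def expected_frequencies(observed):
    """E = row total x column total / grand total, for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared_independence(observed):
    """Return (X2, degrees of freedom, expected table), without Yates correction."""
    expected = expected_frequencies(observed)
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
    nu = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, nu, expected

# Invented cell counts: rows are schools A and B, columns are credit, pass, fail
observed = [[50, 20, 10],
            [40, 20, 10]]

x2, nu, expected = chi_squared_independence(observed)
print(expected[0][0], nu, round(x2, 3))
```

The top left expected frequency comes out as 48.0, matching the hand calculation, and ν = 2 for the 2 by 3 table. For a 2 by 2 table, where ν = 1, these notes apply the Yates correction instead, which this sketch leaves out.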
Now let me give you an example:

A research worker studying the ages of adults and the number of credit cards they possess obtained the results shown in the table. Use the χ² statistic and a significance test at the 5% level to decide whether or not there's an association between age and number of credit cards possessed.

H0: There's no association between age and number of credit cards possessed.
H1: There's an association between age and number of credit cards possessed.
Expected frequency, $E = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
ν = (2 – 1)(2 – 1) = 1, so the Yates' Correction is used.
Use the χ²(1) distribution, perform the test at 5% level. Since χ²(5%)(1) = 3.841, reject H0 if X² > 3.841.
Since X² > 3.841, H0 is rejected. There is evidence, at 5% level, of an association between age and number of credit cards possessed.

16. Correlation & Regression
16.1 Scatter Diagrams

A scatter diagram is a diagram produced when pairs of values are plotted, to determine the relationship between 2 variables. Usually a scatter diagram contains bivariate data, which is data connecting 2 variables, x and y. Using the usual convention, x is the independent variable (explanatory variable), which is controlled by the person analysing the situation. y, on the other hand, is the dependent variable (response variable), the variable that is influenced by the previous one. I believe you learned this in your form 1 Science already. In a scatter diagram, the independent variable is represented by the x-axis, while the dependent variable is on the y-axis. Basically, a scatter diagram is just a normal graph, with lots of dots on it.

Suppose you want to analyse the relationship between the temperature of a chemical mixture and its yield of a new compound. You start the experiment at various temperatures, and after a fixed time, you measure the yield of the new compound (precipitate). And you plot them in a graph like the one below.
Having drawn a scatter diagram, you can then look for a mathematical relationship between the variables x and y. This relation y = f(x) is known as the regression function. The scatter diagram above shows a positive linear relationship between the data, but with a large dispersion. You can also find a line of best fit, or regression line, to make things clearer. Other kinds of relationship between 2 variables are: For the data in diagrams (a) and (b), we say that there is linear correlation between the data. Diagram (d) shows no correlation between the data, which suggests that x and y are unrelated.

Mathematically, there may appear to be a relationship between two variables, but sometimes in reality, there isn't any relationship. For example, you want to prove that the ears of a spider are on its legs. So you experiment by putting it on the table, shouting at it and measuring its reaction time. Then you repeat your experiment, cutting off its legs one by one. When all the legs are cut off, it doesn't move when you shout, so you wrongly conclude that its ears are on its legs!

16.2 Pearson Correlation Coefficient

Before we start, let us revise a little bit on standard deviation. We all know that the standard deviation s is given by the formula
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$
In this chapter, we will be dealing with 2 variables, and thus we need to specify whether the standard deviation is for the values of x or y. To tell them apart, we put a subscript x or y to indicate which variable it refers to. So over here, we have the standard deviations for x and y respectively,
$s_x = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$, $s_y = \sqrt{\dfrac{\sum y^2}{n} - \bar{y}^2}$
We denote the variances of variables x and y as
$s_{xx} = s_x^2 = \dfrac{\sum (x - \bar{x})^2}{n}$, $s_{yy} = s_y^2 = \dfrac{\sum (y - \bar{y})^2}{n}$
Note that $s_{xx}$ and $s_x^2$ mean the same thing; it is just a different notation used in some books.
With this information in mind, we shall now introduce the covariance, which is defined by the formula
$s_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n} = \dfrac{\sum xy}{n} - \bar{x}\bar{y}$

PEARSON'S PRODUCT-MOMENT CORRELATION COEFFICIENT

The correlation coefficient is a statistic which tells us how strong the relationship between 2 variables is. Pearson's product-moment correlation coefficient, also known as the Pearson correlation coefficient or product-moment correlation coefficient, is a numerical value between –1 and 1 inclusive, which indicates the degree of linear association in the scatter. It is represented by the formula
$r = \dfrac{s_{xy}}{s_x s_y}$
where –1 ≤ r ≤ 1. When r → 1, it indicates strong positive correlation, which means the regression line has a positive gradient, or y increases as x increases. Similarly, as r → –1, it indicates the presence of strong negative correlation. If r = 1 or r = –1, the points lie exactly on a straight line, and we say that they have perfect positive / negative correlation.

However, when r = 0, it does not necessarily mean that there is no relationship. It might indicate that the variables x and y are independent of each other. But it might also indicate that the variables x and y have a non-linear relationship. Take a look at the diagram below: Sorry but the dots are ugly. This diagram represents a quadratic function. The variables do have a quadratic relationship, yet the correlation coefficient is r = 0. This is just an example of how r = 0 can fail to explain anything.

On the other hand, having r close to 1 only suggests that the data are positively linearly correlated. Take a look at the diagram below. This diagram has a very high r, about 0.7 to 0.8. However, it doesn't mean that the data are highly positively linearly correlated. It might mean that there isn't a linear relationship after all.

r is independent of the units used, and is very useful in determining the correlation of 2 variables. Evaluating r can be tedious if you make use of the definitions of sx and sy.
So here is the best way to calculate r:
$r = \dfrac{\dfrac{\sum xy}{n} - \bar{x}\bar{y}}{\sqrt{\left(\dfrac{\sum x^2}{n} - \bar{x}^2\right)\left(\dfrac{\sum y^2}{n} - \bar{y}^2\right)}}$
Some other common formulas to find r are:
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$, $r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
Besides, there is also this Big S format, whereby
$S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n}$, $S_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n}$, $S_{xy} = \sum xy - \dfrac{\sum x \sum y}{n}$
and using this convention, the formula for r is
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
I would suggest that you keep to the 'small s format'.

In order to teach you how to find r efficiently using the calculator, consider the example below.

Calculate the value of the product-moment correlation coefficient for the data in the following table. Comment on your answer.

Let's make use of the calculator's functions. Using your CASIO fx-570MS, press the mode button, and select REG mode. There will be many kinds of REG mode, so you press '1' for Lin mode (which means 'linear'). Now, to input the data, you press [x-value] [, button] [y-value] [DT button]. So you should type in 5, 4.3 and the DT button for the first reading. Now the screen should display n = 1. Continue typing in all the data, and press the AC button when you are done. Now you press SHIFT + S-SUM. You will be able to get lots of data from here: Σx², Σx, n, Σy², Σy and Σxy. This is the information you need for your r (you need these to show your workings). But there's a better one: press SHIFT + S-VAR. You get to find the values of x̅, xσn (sx), y̅, yσn (sy), and in fact, r itself! The only thing you can't get is sxy (what a pity). So using your calculator, you find that the answer is r = 0.93, which is a strong positive correlation.

16.3 Linear Regression Lines (method of least squares, correlation & regression coefficient, coefficient of determination)

Regression analysis is a statistical technique which can be used to obtain the equation relating 2 variables. A regression line is used to estimate one of the variables when the corresponding value of the other variable is known. In this section, we are going to learn how to draw regression lines (lines of best fit). There are actually 3 methods that I know of:

1. By eye method
You look at the bunch of dots, estimate using your eye, and start drawing the line.
Not a good idea though. You probably used this method for your STPM Physics paper 3.

2. L & R method
We first start by finding the average values of x and y. We draw a horizontal and a vertical line through this mid-point. Then, we proceed to find the mid-points of the data on the left and on the right of the vertical line, and we connect these 3 mid-points to obtain a line.

3. Least squares regression line
This is probably the best method of all, and we will be learning how to do it below.

METHOD OF LEAST SQUARES

The term 'least squares' tells us that the sum of the squares of the distances between the points and the line is minimized. For a least squares regression line of y on x, the distance taken into account is the vertical distance. This line always passes through the mean point of the data, (x̅, y̅). Take a look at the graph below. The red dots are the data points, while the blue line is the least squares regression line. The line is drawn in such a way that the sum of squares of the vertical distances between the red dots and the blue line (the green lines) is minimized.

So for least squares regression, we have 2 equations of lines, namely
y = a + bx
x = c + dy
The line y = a + bx is known as the regression line of y on x, while the line x = c + dy is the regression line of x on y. Note that they are 2 different lines, and are not rearrangements of each other. The line of y on x is used when x is the independent variable, and y the dependent one. However, the line of x on y is used only under 2 conditions:
1. when neither variable is controlled and you want to estimate x for a given value of y.
2. when y is the independent variable, and x the dependent variable.
The line of x on y, when its equation is rearranged into $y = -\dfrac{c}{d} + \dfrac{1}{d}x$, has gradient $\dfrac{1}{d}$ and y-intercept $-\dfrac{c}{d}$.

Notice another thing. In this chapter, the lines are not written as y = mx + c. The gradient is b, and by the usual convention it is put behind the constant a, so y = a + bx, and not y = bx + a.
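To see that the two regression lines really are different lines, here is a short Python sketch. It uses the standard least-squares results b = Sxy/Sxx and d = Sxy/Syy (in the Big S notation of section 16.2) and a tiny data set that I invented purely for illustration. The function name `regression_lines` is my own.

```python
def regression_lines(xs, ys):
    """Return ((a, b), (c, d), (mean_x, mean_y)) for y = a + bx and x = c + dy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx            # regression coefficient of y on x
    a = mean_y - b * mean_x    # forces the line through (mean_x, mean_y)
    d = s_xy / s_yy            # regression coefficient of x on y
    c = mean_x - d * mean_y
    return (a, b), (c, d), (mean_x, mean_y)

# Invented data, not taken from any example in these notes
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

(a, b), (c, d), (mean_x, mean_y) = regression_lines(xs, ys)

print(f"y on x: y = {a:.2f} + {b:.2f}x")
print(f"x on y: x = {c:.2f} + {d:.2f}y, gradient on the same axes = {1 / d:.2f}")
print(f"both pass through ({mean_x}, {mean_y})")
```

For this data the line of y on x has gradient 0.6, while the line of x on y, drawn on the same axes, has gradient 1/d = 1. The two lines only coincide when the points lie exactly on a straight line.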
The constant b is known as the regression coefficient of y on x, and d is the regression coefficient of x on y. They are both calculated using the formulas
$b = \dfrac{s_{xy}}{s_{xx}}$, $d = \dfrac{s_{xy}}{s_{yy}}$
which in the end, you find b to be
$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
If you look closely,
$bd = \dfrac{s_{xy}^2}{s_{xx} s_{yy}} = r^2$
where r is the product-moment correlation coefficient you learned in the previous section. The term r² has a name too: the coefficient of determination. r² tells you the proportion of the variation in y that can be explained by the linear relationship with x. Or in other words,
$r^2 = \dfrac{\text{explained variation}}{\text{total variation}}$
Or mathematically,
$r^2 = \dfrac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$
You don't really need to understand what it means, but memorize it just in case they ask you to define it in exams. Take note that 0 ≤ r² ≤ 1.

Coming back to the relationship between the correlation coefficient and the regression coefficients. We can see that if
* b and d are positive, then r is positive.
* b and d are negative, then r is negative.

Finding b is not enough to plot the regression line of y on x. The equation of the line, in the end, will be
$y - \bar{y} = b(x - \bar{x})$
and from there, a = y̅ – bx̅ can be found. Note that x̅ and y̅ can be replaced by the coordinates of any other point known to lie on the line, and you get the same line. The data points themselves generally don't lie on it.

By the way, sometimes the lines are not that straightforward. You might be asked to make use of coding, in the form Y = a + bX, to transform variables which are not linearly related into a linear form that can be analysed using regression lines. Common examples are power laws such as y = axⁿ and exponentials such as y = abˣ, both of which become linear after taking logarithms.

Most statistical questions on this chapter mainly ask you to do these few things:

1. Plot scatter diagrams, and draw a regression line on them
All you need to do is use the table of data given, plot the scatter diagram (on graph paper), and use your calculator to get the values of a and b.

2. Make predictions and estimations
Sometimes you are asked to extrapolate the line, to find a particular value of y, given x, and tell whether the result is sensible. Remember: extrapolation of a regression line is unreliable. You are to understand that there are uncertainties in such predictions.
In the case of a graph of age against running speed, you know that it doesn’t mean the older you are, the faster you run!

3. Calculator estimation
Within the range of the scatter data, sometimes you are given a value of x and asked to find the value of y using the regression line you formulated. The estimated value of y is denoted by ŷ. It is not hard: with your regression line in hand, just substitute the value of x into it, and you get the value of y. On the calculator, you can press [number] [x̂] [=] to find x̂, and [number] [ŷ] [=] to find ŷ. However, do take note that you find x̂ using the equation x = c + dy, and you find ŷ using the equation y = a + bx. Remember which variable is dependent and which is independent; it makes a lot of difference.

4. Find the correlation / regression coefficient or the coefficient of determination
This is quite obvious. That was why we learned them in the first place.

Before I end this chapter, let us take a look at an example, and we will learn how to use your calculator to find the regression line too.

The following table shows the marks (x) obtained in a mid-year examination and the marks (y) obtained in the year-end exam by a group of 9 students.
a) Plot the scatter diagram.
b) Find the equations of the estimated least squares regression lines of y on x and of x on y, and plot them.
c) A 10th student obtained a mark of 70 in the mid-year exam but was absent from the year-end exam. Estimate the mark that this student would have obtained in the year-end exam.

I think you shouldn’t have any problem plotting the diagram, right? It looks something like this:

So now, we are to find the regression lines. Firstly, key all your data into your calculator. Remember to clear your previous data by pressing SHIFT + CLR, then ‘1’, then ‘=’ (refer to the previous post on how to key in data in REG mode). Now press SHIFT + S-VAR. Press the right button until you see A B r. Guess what, the A and B given are the coefficients a and b of the line that you wanted.
So you immediately find the regression line of y on x,
y = 15.83 + 0.72x
Remember to show your workings though. You need to show how you calculate sxx, sxy, syy, x̅ and y̅. For the equation x = c + dy, there’s no shortcut, so you have to calculate it yourself, which gives you
x = 22.63 + 0.66y
We shall plot them on the graph, with the red line being y = a + bx and the green line being x = c + dy. Remember to label them in exams though.

As for the estimate, you can use your calculator again. Using the SHIFT + S-VAR function and keying in 70 [ŷ] [=] as described above, you should get 66.38.
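If you want to sanity-check these answers without the calculator, here is a small Python sketch that uses only the rounded coefficients quoted above. It substitutes x = 70 into the line of y on x, and it uses the standard identity r² = bd to get the coefficient of determination from the two regression coefficients. The variable names are my own. Note that the substitution gives a value slightly below 66.38; the gap is most likely because the calculator works with the unrounded values of a and b, while we only have them to 2 decimal places here.

```python
import math

# Coefficients as quoted in the worked example, rounded to 2 decimal places.
a, b = 15.83, 0.72  # regression line of y on x: y = a + bx
c, d = 22.63, 0.66  # regression line of x on y: x = c + dy

# Part (c): estimate the year-end mark for a mid-year mark of 70.
x_new = 70
y_hat = a + b * x_new
print("estimated y:", round(y_hat, 2))

# Coefficient of determination from the two regression coefficients.
r_squared = b * d
# Take the positive root because b and d are both positive.
r = math.sqrt(r_squared)
print("r squared:", round(r_squared, 4))
print("r:", round(r, 2))
```

Because r² here is built from rounded coefficients, treat the value of r as approximate and quote the calculator's own r in an exam answer.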
