Bab 1-bab 16 - E

FURTHER MATHS T
CHAPTER 1
1. Logic & Proof
1.1 Logic (propositions, quantifiers)
A proposition is a declarative sentence that is either true or false, but not both. Normally, we
will use the letters p, q and r to represent propositions. The negation of a proposition is its
opposite. For example, if p: 2 + 3 = 5, then its negation is 2 + 3 ≠ 5. The negation of p can be
represented by ~p, p̅ or ¬p. It is recommended that you stick to one, so I will be using ~p
throughout this post.
Given 2 propositions p and q,
A conjunction has the meaning of ‘p and q’. Symbolically, it can be written as p∧q.
A disjunction has the meaning of ‘p or q’. Symbolically, it can be written as p∨q. There are
some other connectives which we don’t normally use, like the ‘exclusive or’ (p⊕q), ‘not and’
(nand, p | q) and ‘nor’ (p ↓ q).
An implication is a conditional statement, where p is the antecedent (hypothesis or premise)
while q is the consequent (or conclusion). It has the meaning ‘if p, then q’. Symbolically, it can
be written as p→q. In English, there are many ways to express this other than ‘if p, then
q’. Examples are
if p, q.
q unless not p.
p implies q.
p is sufficient for q.
p only if q.
q when / if p.
q whenever p.
a necessary condition of p is q.
a sufficient condition of q is p.
q is necessary for p.
q follows from p.
A bi-implication is a bi-conditional statement, which means that the implication and its converse
are equivalent. It has the meaning ‘p if and only if q’. Symbolically, it can be written as p↔q.
Other ways of saying it are
p is necessary and sufficient for q.
if p then q, and conversely.
p iff q.
Now using a conditional statement p → q,
The converse of the statement is q → p.
The inverse of the statement is ~p → ~q.
The contrapositive of the statement is ~q → ~p.
To further elaborate the meanings of all the stuff above, I’ll continue by introducing truth
tables.
TRUTH TABLES
A truth table is a table which states the truth values of various statements. Here, ‘T’ means
‘true’, while ‘F’ means ‘false’. So the truth table for negation is
p | ~p
T | F
F | T
which means that whenever p is true, ~p is false, and whenever p is false, ~p is true.
Let’s go on for conjunctions, disjunctions, implications and bi-implications:
p | q | p∧q | p∨q | p→q | p↔q
T | T | T | T | T | T
T | F | F | T | F | F
F | T | F | T | T | F
F | F | F | F | T | T
Did you notice something? ‘p and q’ is true only if both p and q are true. ‘p or q’ is true if at
least one of p or q is true. p ↔ q is true if both p and q are true, or both p and q are false.
p → q is a little tricky. It is false only if p is true and q is false. I’ll illustrate this a little:
Let’s say, Pakatan Rakyat, the opposition party of Malaysia gives a conditional statement: “If we
win the next elections, we will immediately reduce the price of petrol by 20%.” So according
to their statement, there will be 2 situations.
Situation 1: If they won the elections
In the event that they really did reduce the petrol price, then the statement is true. But if they
break their promise and the petrol price isn’t reduced, then the statement is false.
Situation 2: If they lost in the elections
Since they didn’t promise anything about what they will do if they lost, whatever happens,
be it the price of petrol increases, stays the same or decreases, doesn’t make the statement
false. Therefore, in every such case, the statement is true.
Now we’ll go on to see the truth tables for converse, inverse and contrapositive:
p | q | p→q | q→p | ~p→~q | ~q→~p
T | T | T | T | T | T
T | F | F | T | T | F
F | T | T | F | F | T
F | F | T | T | T | T
Did you see something? You notice that the converse is equivalent to the inverse! And besides,
the statement itself is equivalent to its contrapositive. This is one very important piece of
information for you.
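If you would rather let a computer fill in the table, here is a minimal Python sketch. The helper name `implies` and the dictionary keys are my own labels, not standard notation. It runs through all four combinations of p and q and compares the columns.

```python
from itertools import product

def implies(a, b):
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

rows = []
for p, q in product([True, False], repeat=2):
    rows.append({
        "p": p,
        "q": q,
        "statement": implies(p, q),               # p -> q
        "converse": implies(q, p),                # q -> p
        "inverse": implies(not p, not q),         # ~p -> ~q
        "contrapositive": implies(not q, not p),  # ~q -> ~p
    })

# Two columns are logically equivalent when they agree on every row.
statement_equals_contrapositive = all(r["statement"] == r["contrapositive"] for r in rows)
converse_equals_inverse = all(r["converse"] == r["inverse"] for r in rows)
statement_equals_converse = all(r["statement"] == r["converse"] for r in rows)
```

The first two flags come out true and the last one false, which is exactly the point: a statement is not equivalent to its own converse.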
PRECEDENCE
In problem solving, you can combine all the logical operators to make a very complicated
statement. For example, you can have
[ (~r ∧ ~p) → q ] ∧ [ q → (~r ∨ ~p) ]
In order to understand the statement, you need to have a precedence of logical operators, which
means, which symbol comes first and which comes second and so on. The precedence of logical
operators, from first to last is as follows: ~, ∧,∨,→,↔.
If 2 statements have the same truth values in every case, we say that the statements are
logically equivalent. If p and q are logically equivalent, we write p ≡ q. Note that it is an
equal sign with 3 strokes instead of 2.
A tautology is a compound proposition that is always true, no matter what the truth values of the
propositions that occur in it. For example, ~p → (p → q) is a tautology. You can check by looking
at its truth table below:
p | q | ~p | p→q | ~p→(p→q)
T | T | F | T | T
T | F | F | F | T
F | T | T | T | T
F | F | T | T | T
The opposite of a tautology is a contradiction, a compound proposition that is always false. A
contingency is neither a tautology nor a contradiction.
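The three labels can be decided mechanically by listing every row of the truth table. Below is a small Python sketch of that idea; `classify` and `implies` are names I made up for illustration.

```python
from itertools import product

def implies(a, b):
    """Material implication: false only when a is true and b is false."""
    return (not a) or b

def classify(formula, num_vars):
    """Label a formula by evaluating it on every row of its truth table."""
    values = [formula(*row) for row in product([True, False], repeat=num_vars)]
    if all(values):
        return "tautology"
    if not any(values):
        return "contradiction"
    return "contingency"

example_tautology = classify(lambda p, q: implies(not p, implies(p, q)), 2)
example_contradiction = classify(lambda p: p and not p, 1)
example_contingency = classify(lambda p, q: implies(p, q), 2)
```

This brute-force check doubles the work for every extra proposition, so it is only practical for a handful of variables, which is all you will meet here.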
Here are some common laws of logic and some identities that you should memorize:
True-False Laws
Some laws for implications:
Some laws for bi-conditional statements:
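The original lists did not survive in this copy, so what follows is only my selection of standard equivalences that usually appear under these three headings, not necessarily the exact list intended. Here T and F stand for a statement that is always true and always false respectively.

```latex
\begin{aligned}
&\text{True-False laws:} && p \land \mathrm{T} \equiv p, \quad p \lor \mathrm{F} \equiv p, \quad
   p \lor \mathrm{T} \equiv \mathrm{T}, \quad p \land \mathrm{F} \equiv \mathrm{F}, \quad
   p \lor \lnot p \equiv \mathrm{T}, \quad p \land \lnot p \equiv \mathrm{F} \\
&\text{De Morgan's laws:} && \lnot(p \land q) \equiv \lnot p \lor \lnot q, \quad
   \lnot(p \lor q) \equiv \lnot p \land \lnot q \\
&\text{Implications:} && p \to q \equiv \lnot p \lor q \equiv \lnot q \to \lnot p, \quad
   \lnot(p \to q) \equiv p \land \lnot q \\
&\text{Bi-conditionals:} && p \leftrightarrow q \equiv (p \to q) \land (q \to p)
   \equiv \lnot p \leftrightarrow \lnot q
\end{aligned}
```

Each of these can be confirmed with a truth table of at most four rows.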
PREDICATION & QUANTIFICATION
Sometimes, we can use predicates to represent a logic statement, which depends on an
unknown, or various unknowns. A predicate is normally denoted by P(x), or any capital letter
followed by a bracket with an unknown in it. For example, C(x) may represent “x is a comedian”
while F(x) may represent “x is funny”.
A quantifier is used to generalise or specialize a particular predicate, and is placed in front of it.
There are 2 kinds of quantifiers:
1. The universal quantifier, denoted by ∀xP(x) which means ‘for all x, P(x)’. It also could
mean ‘for every’, ‘all of’, ‘for each’, ‘given any’, ‘for arbitrary’ or ‘for any’. When the domain
is a finite list x1, x2, …, xn, it can be represented by the logical statement
P(x1) ∧ P(x2) ∧ … ∧ P(xn)
which is the conjunction of P over every value in the domain. Using the example above, if the
domain of x is ‘people’, then ∀xC(x) means ‘everyone is a comedian’.
2. The existential quantifier, denoted by ∃xP(x) which means ‘for some x, P(x)’. It also could
mean ‘at least one’, ‘there is a…’ or ‘there exists’. When the domain is a finite list x1, x2, …, xn,
it can be represented by the logical statement
P(x1) ∨ P(x2) ∨ … ∨ P(xn)
which is the disjunction of P over every value in the domain. Again, using the
example above, ∃xC(x) means ‘some people are comedians’. Sometimes, you might also see the
expression ∃!xC(x). Instead of ‘for some x’, this means ‘there is exactly one x’. We call it the
uniqueness quantifier.
We use these quantifiers in the same way as we used those p’s and q’s earlier on: you can add
the negation sign (~) or the conditional arrows (↔ and →) to them. So what are the truth values
of these quantified statements?
The statement ∀xP(x) is TRUE if P(x) is true for all x, where x belongs to a particular
domain (people, animals, students, etc.). It is FALSE if we can find an x for which P(x) is
false. Using the same example again, ∀xC(x) is true if every human being is a comedian, but is
false if you can find one person (yes, you only need one person) who isn’t a comedian.
The statement ∃xP(x) is TRUE if there exists at least one x for which P(x) is true, and is FALSE
if P(x) is false for every x. Using the same example, ∃xC(x) is true if at least one human being
on earth is a comedian, but is false if you can’t find a single human being who is a comedian.
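On a finite domain, these truth conditions are exactly what Python’s built-in `all` and `any` compute. Here is a tiny sketch; the three names and who counts as a comedian are made-up data, purely for illustration.

```python
# Hypothetical domain: three people, with C(x) meaning "x is a comedian".
is_comedian = {"Aina": True, "Ben": False, "Chong": True}
domain = list(is_comedian)

def C(x):
    return is_comedian[x]

for_all = all(C(x) for x in domain)           # truth value of ∀x C(x)
there_exists = any(C(x) for x in domain)      # truth value of ∃x C(x)
exactly_one = sum(C(x) for x in domain) == 1  # truth value of ∃!x C(x)

# One counterexample is enough to make the universal statement false.
counterexamples = [x for x in domain if not C(x)]
```

With this data the universal statement fails because of a single person, while the existential one holds.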
Simple? Let me give you some common rules for quantifiers:
∀x(P(x) ∧ Q(x)) ≡ ∀xP(x) ∧ ∀xQ(x)
∃x(P(x) ∨ Q(x)) ≡ ∃xP(x) ∨ ∃xQ(x)
This one tells you how you can bring the quantifier into the brackets. Beware though, you can’t
bring a universal quantifier in if you use a disjunction, and the same applies to the existential
quantifier and conjunctions.
The negations of quantifiers:
~∀xP(x) ≡ ∃x~P(x)
~∃xP(x) ≡ ∀x~P(x)
This is quite straightforward.
A nested quantifier is formed when you use 2 or more quantifiers in 1 predicate. Examples are
∀x∃yP(x,y) and ∃x∀yP(x,y)
Notice that the two statements mean different things. The first one says ‘for all x, there exists
a y such that P(x,y) is true’, while the second one says ‘there exists an x such that P(x,y) is
true for all y’. Let me give you a detailed example:
Let P(x,y) be the statement “x has sent an SMS to y”, where the domain of x and y is
‘students’. We can see that
1. ∃x∃yP(x,y) means “There is some student who sent an SMS to some student.”
2. ∃x∀yP(x,y) means “There is a student who sent an SMS to every student.”
3. ∀x∃yP(x,y) means “Every student sent an SMS to at least one student.”
4. ∃y∀xP(x,y) means “There is a student who received an SMS from every student.”
5. ∀y∃xP(x,y) means “Every student has been sent an SMS by at least one student.”
6. ∀x∀yP(x,y) means “All students have sent an SMS to all students.”
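Notice that ∃x∃y and ∃y∃x do mean the same thing, and this applies to ∀x∀y and ∀y∀x too, but
not to a mixture of both. Now the problem is: how do you know the truth values for nested
quantifiers? I’ll show you in a table below:
Statement | True when | False when
∀x∀yP(x,y) | P(x,y) is true for every pair x, y | there is a pair x, y for which P(x,y) is false
∀x∃yP(x,y) | for every x there is a y for which P(x,y) is true | there is an x for which P(x,y) is false for every y
∃x∀yP(x,y) | there is an x for which P(x,y) is true for every y | for every x there is a y for which P(x,y) is false
∃x∃yP(x,y) | there is a pair x, y for which P(x,y) is true | P(x,y) is false for every pair x, y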
You can actually work it out yourself by using the negation rules stated above. It’s simple: when
a negation sign passes through a universal quantifier, it turns into an existential quantifier,
and vice versa. Another important remark is this:
“if ∃x∀yP(x,y) is true, then ∀y∃xP(x,y) is true, but not conversely.”
That’s all for logic. You should practice more on translating sentences into logical statements,
because it can be very confusing sometimes. Also, practice how to prove the equivalence of
logical statements using the given laws. For complicated equivalences (more than 2
propositions), you could use a truth table. It will be faster. We’ll start doing proofs in the next
section.
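Before moving on, the remark about swapping an existential and a universal quantifier can be checked on a small made-up example. In this Python sketch the student names and the two SMS records are hypothetical data, and the function names are my own.

```python
students = ["A", "B", "C"]

def exists_x_forall_y(P, domain):
    """Some x is related to every y."""
    return any(all(P(x, y) for y in domain) for x in domain)

def forall_y_exists_x(P, domain):
    """Every y has at least one x related to it."""
    return all(any(P(x, y) for x in domain) for y in domain)

# Hypothetical record 1: student A sent an SMS to everyone (A included).
sent_broadcast = {("A", "A"), ("A", "B"), ("A", "C")}
# Hypothetical record 2: a ring, where each student sent exactly one SMS.
sent_ring = {("A", "B"), ("B", "C"), ("C", "A")}

def broadcast(x, y):
    return (x, y) in sent_broadcast

def ring(x, y):
    return (x, y) in sent_ring

broadcast_results = (exists_x_forall_y(broadcast, students),
                     forall_y_exists_x(broadcast, students))
ring_results = (exists_x_forall_y(ring, students),
                forall_y_exists_x(ring, students))
```

In the ring, every student receives an SMS from someone, yet nobody has sent one to everybody, so the second statement can be true while the first is false.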
1.2 Proof (direct, indirect, induction)
A theorem is a statement that can be shown to be true.
A proof is a valid argument that establishes the truth of a theorem.
In this section, I will be showing you 3 main kinds of proof (with tonnes of sub-proofs).
DIRECT PROOF
This proof is very straightforward. A direct proof of a conditional statement p→q is done by
assuming that p is true, then showing that q must be true as well. Basically what it means is
that you start from something that is true, and show that some other thing follows to be true
as well. Let me give you an example:
Show that the sum of 2 odd integers is even.
Did you notice that this is actually a conditional statement? We let p be ‘m & n are odd integers’
and q be ‘the sum of m & n is even’. Putting p→q, we have ‘If m & n are odd, then their sum
is even’. So the proof is:
Let m = 2j + 1, and n = 2k + 1, where j and k are integers. Here we can see that m & n are odd.
Then m + n = (2j + 1) + (2k + 1) = 2(j+k+1), which is even.
Therefore, the statement is proven.
Try to get used to the definitions of odd and even integers, being 2k + 1 and 2k respectively,
where k is an arbitrary integer (arbitrary means ‘anything’). Actually, when you are asked to
prove odd and even integer stuff, it will usually be a direct proof: just substitute these
definitions in, and you will find the answer. Another case is in proving theorems related to
perfect squares, where you use the definition n = k². One more case is in proving statements
about rational and irrational numbers, where you define a rational number n = j/k, with j and k
integers and k ≠ 0. All these definitions will help you a lot, and you may use them in the later
chapters, like Number Theory…
INDIRECT PROOF
Literally, an indirect proof is a proof that is not direct, i.e. you don’t prove straight using p→q.
There are many kinds of indirect proofs:
1. Proof by Contraposition
This proof basically proves the statement p→q using its contrapositive, which is ~q→~p.
Example:
Show that if x + y ≥ 2 (where x, y ∈ R), then x ≥ 1 or y ≥ 1.
Looking at the question, you know you won’t get anywhere if you try to manipulate the
first statement, x + y ≥ 2. So what you do is assume that ~q is true, which means x
< 1 and y < 1. Remember that the contrapositive of a conditional statement has
the same truth value as the statement itself! So here we have
Suppose x < 1 and y < 1. Then x + y < 1 + 1, so x + y < 2, which is the negation of x + y ≥
2. Therefore, the statement is proven.
2. Proof by Contradiction (Reductio ad absurdum)
What this proof does is to first assume that p is true and q is false, i.e. p ∧ ~q. You then
work from there until you reach something impossible, for example that ~p is also true. Now you
have p ∧ ~p, which is a contradiction (a statement which is always false)! From here, you
conclude that the assumption was wrong, so p→q holds.
Example:
Prove that the sum of an irrational number and a rational number is irrational.
Putting it in a p→q form, we rewrite the statement as ‘if r is rational and i is irrational,
then their sum is not rational’. Now let’s try solving it:
Suppose r is rational and i is irrational, and r + i = s is rational. [p is true, ~q is true]
Then s could be written as p/q for some integers p and q (q ≠ 0) which have no common factors.
r could also be written in the form t/u, where t and u (u ≠ 0) are integers with no common
factors too. Using some algebra,
i = s – r = p/q – t/u = (pu – tq)/(qu)
which is a ratio of two integers, so i would be rational. This is a contradiction, since i is
irrational. So it follows that the sum of an irrational number and a rational number is irrational.
3. Vacuous Proof
All you need to do in this proof is to show that p in the p→q statement is false. (Note that when
p is false, the statement p→q is automatically true!) This is used to prove statements like
‘if 2 + 2 = 5, then it will snow in Malaysia’. Won’t come out in STPM, I assume…
4. Trivial Proof
This proof is similar to the above, but here you must show that q is true in the p→q statement.
(Once again recall, when q is true, whether p is true or false, the statement still holds!) Take
the statement ‘if you give me RM1, then the sun will rise in the east’. I don’t need to explain, right?
5. Proof of Equivalence
This proof is for proving statements of the form p↔q, where you need to prove p→q and its
converse, q→p. You can use one of the above methods (direct proof, proof by contraposition or
contradiction) to solve the p→q and q→p part. Basically this proof is a combination of proofs,
and I don’t think I need to elaborate much on this.
6. Proof by Cases
In this proof, you need to prove something on a case-by-case basis. For example, prove that |x|
+ 2 > 0. I won’t show you the answer. All you need to do is go case by case: case 1 where x > 0,
case 2 where x < 0, and case 3 where x = 0. In each case, replace |x| by its value (x, –x or 0)
and show that the inequality is true.
7. Exhaustive Proof
This proof requires you to use up all the possible numbers in that domain, substituting them into
the statement to show that it is valid. E.g., prove that n² + 1 ≥ 2^n, where n is an integer with
0 < n < 5. (With a strict ‘>’ the claim would already fail at n = 1, where both sides equal 2.)
Just use up all the values n = 1, 2, 3, 4 and substitute them in to prove it. Very
straightforward for a small domain of n.
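Here is a short Python sketch of an exhaustive check. I am reading the inequality as n² + 1 ≥ 2^n for integers 0 < n < 5, which is my assumption about the intended statement; the variable names are my own.

```python
# Exhaustive proof: test every integer n with 0 < n < 5, one by one.
domain = range(1, 5)

# For each n, store (left-hand side, right-hand side).
checks = {n: (n**2 + 1, 2**n) for n in domain}

holds_everywhere = all(lhs >= rhs for lhs, rhs in checks.values())

# The claim really is special to this small domain: it breaks at n = 5.
fails_at_five = (5**2 + 1) < 2**5
```

Because the domain has only four values, checking all of them is a complete proof, not just evidence.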
8. Existence Proof
The questions for this kind of proof normally start with something like ‘show that there exists
a…’. There are 2 kinds of existence proof: it can either be constructive or non-constructive. A
constructive one will make you find the exact answer to the question, while the non-constructive
approach will prove the statement true even without finding a solution. I’ll show you the
examples:
Show that there is an integer that can be written as the sum of cubes of two integers in two
different ways.
This is true as 1729 = 10³ + 9³ = 12³ + 1³. [constructive]
Show that the equation x³ + x – 1 = 0 has a solution.
Let P(x) = x³ + x – 1, which is continuous. Then P(0) = –1 and P(1) = 1. Thus (by the
intermediate value theorem), the equation P(x) = 0 has a solution which is between 0 and 1.
[non-constructive. Interesting isn’t it?]
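Both examples can be explored numerically. The Python sketch below uses two helper functions of my own naming. The first one searches for the constructive witness by brute force. The second one uses bisection, which is not part of the proof above: it simply keeps applying the sign-change idea of the intermediate value theorem to squeeze in on the root.

```python
def two_cube_sums(limit):
    """Totals a^3 + b^3 (1 <= a <= b <= limit) that arise in two or more ways."""
    sums = {}
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            sums.setdefault(a**3 + b**3, []).append((a, b))
    return {total: pairs for total, pairs in sums.items() if len(pairs) >= 2}

def bisect_root(f, lo, hi, tol=1e-10):
    """Shrink [lo, hi] around a sign change of a continuous function f."""
    assert f(lo) * f(hi) < 0, "need a sign change on [lo, hi]"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid   # the sign change is in the left half
        else:
            lo = mid   # the sign change is in the right half
    return (lo + hi) / 2

witnesses = two_cube_sums(12)
root = bisect_root(lambda x: x**3 + x - 1, 0.0, 1.0)
```

The search over cubes up to 12 turns up 1729 with both decompositions, and the bisection lands on a root of x³ + x – 1 strictly between 0 and 1.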
9. Uniqueness Proof
This proof is an extension of the previous proof. First you show that a solution exists,
typically by finding a particular value c for which P(c) is true. Then you show that it is the
only one: if x ≠ c, then P(x) is false (or equivalently, if P(s) and P(t) are both true, then
s = t). Example:
Show that if a, b ∈ R with a ≠ 0, there is a unique r ∈ R such that ar – b = 0.
Certainly r = b/a satisfies ar – b = 0. [existence]
Next, suppose s and t both satisfy the equation, so as – b = 0 and at – b = 0. Then
as – b = at – b and so as = at. Since a ≠ 0, we have s = t, which means that the solution is
unique. [uniqueness]
10. Proof by giving a Counterexample
A counterexample is used to disprove something. This is super easy, for example:
Prove or disprove that the product of 2 irrational numbers is irrational.
Using a counterexample, √2 × √2 = 2, which is rational. So the statement is false.
All you need to disprove something is to find a counterexample. That’s it.
MATHEMATICAL INDUCTION
Mathematical induction is the most common proof that you will use and see in your exams.
There is a lot of information on this proof in A-level books, so I don’t need to give you too
many examples. What mathematical induction does is that it first proves that a statement is true
for a particular starting value, which we call the basis. Then we go on to prove that whenever
the statement is true for one value, it is also true for the next one. To sum up, mathematical
induction involves 2 steps:
1. The basis step: you prove that the equation is true for n = 0, n = 1 or whatever initial value
they give you in the question.
2. The inductive step: you now assume that the equation is true for n = k. Then you write out
the equation for n = k + 1, and show that it holds too. From here you can
conclude that by mathematical induction, the equation is true for every n in that domain.
Let me show you an example:
Use proof by induction to prove that
Since
LHS = RHS, we see that the formula is true for n = 1. We assume that the formula is true for n =
k, then we have
and letting n = k+1, we have
Hence
is true for all n ≥ 1. [proven]
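The formulas of the worked example are not readable in this copy. As a stand-in, here is the same two-step pattern applied to a formula of my own choosing (the sum of the first n positive integers), which may or may not be the one originally used.

```latex
\textbf{Claim (illustrative):}\quad \sum_{r=1}^{n} r = \frac{n(n+1)}{2}
  \quad\text{for all integers } n \ge 1.

\textbf{Basis step } (n = 1):\quad
  \text{LHS} = 1, \qquad \text{RHS} = \frac{1(1+1)}{2} = 1 .

\textbf{Inductive step:}\quad \text{assume } \sum_{r=1}^{k} r = \frac{k(k+1)}{2}.
  \text{ Then}

\sum_{r=1}^{k+1} r \;=\; \sum_{r=1}^{k} r + (k+1)
  \;=\; \frac{k(k+1)}{2} + (k+1)
  \;=\; \frac{(k+1)(k+2)}{2},

\text{which is the claimed formula with } n = k+1 .
```

Notice where the assumption for n = k is used: it replaces the first k terms of the sum in one go.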
I will show you some tips on how to solve different kinds of questions. Remember to do A LOT
of exercises on Mathematical Induction!
* Questions involving summations
Try to make use of the fact
Σ(r = 1 to k+1) f(r) = Σ(r = 1 to k) f(r) + f(k+1)
* Questions involving matrices
Try to make use of the fact
A^(k+1) = A^k × A, where A is a square matrix.
* Questions involving differentials
Try to make use of the fact
d^(k+1)y/dx^(k+1) = d/dx (d^k y/dx^k)
* Questions involving recurring terms
Try to make use of the fact
if u_n + u_(n+1) is divisible by a, then either u_n & u_(n+1) are both divisible by a, or both
not divisible by a.
CHAPTER 2
2. Complex Numbers
2.1 Polar Form (geometrical effects, exponential form)
The basics of this chapter on complex numbers should have been covered in Mathematics T. So
while I explain stuff over here, I assume that you already know the following about complex
numbers:
a. understand what is the real part, imaginary part, and conjugate of a complex number
b. find the modulus and argument of a complex number
c. represent complex numbers geometrically by means of an Argand diagram
d. use the condition for the equality of two complex numbers
e. carry out elementary operations on complex numbers expressed in Cartesian form
So in this post, I’ll be introducing another 2 forms in which you can represent a complex
number. This will be a short one.
You still remember that the modulus of a complex number z = a + bi is denoted by |z|, which has
the value √(a² + b²). And then, you also remember that the argument of a complex number,
denoted by arg z, has the value of tan⁻¹(b / a), taking into account which quadrant the complex
number lies in on the Argand diagram. So with the modulus and argument, you can represent a
complex number in polar form, or (r, θ) form, which is
|z| (cos (arg z) + i sin (arg z))
For example, the modulus of the complex number –3 + 3i is 3√2, its argument is 3π/4, so its
polar form is 3√2 [ cos (3π/4) + i sin (3π/4)]. Using this form will give us a lot of advantages,
especially when we learn de Moivre’s theorem in the next post.
Another way of expressing complex numbers using the argument and modulus is the
exponential form. The term cos θ + i sin θ can be written as e^(iθ), where e is the base of the
natural logarithm. So by multiplying by the modulus in front, and substituting the argument
for θ, you get another form of complex numbers! Using the example above, the exponential
form of –3 + 3i is 3√2 e^(i(3π/4)). This is something like a compressed form of complex numbers.
By now you may be puzzled as to how cosine and sine relate to exponents. Well, I’ll show
the derivation only in the chapter on Power Series. As for now, just take it as it is. (Extra info:
go google for “Euler’s Equation”. You will be surprised by the equation e^(iπ) + 1 = 0 !)
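If you have Python at hand, its standard `cmath` module can convert between these forms, which is a handy way to check your own working. This is a minimal sketch using the example number –3 + 3i; the variable names are my own.

```python
import cmath
import math

z = -3 + 3j

# cmath.polar returns (|z|, arg z), with the argument in (-π, π].
modulus, argument = cmath.polar(z)

# Rebuild z from the polar form and from the exponential form.
from_polar = modulus * (math.cos(argument) + 1j * math.sin(argument))
from_exponential = modulus * cmath.exp(1j * argument)

# e^(iπ) + 1, which should be 0 apart from a tiny rounding error.
euler_identity = cmath.exp(1j * math.pi) + 1
```

The modulus and argument agree with 3√2 and 3π/4 up to floating-point rounding, and both rebuilt values match the original number.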
Now that we know 2 extra forms of complex numbers, we want to know how multiplication and
division of complex numbers can be done easily in these 2 forms. The general rule is this:
When you multiply 2 complex numbers, you multiply their moduli and add up their
arguments. But when you divide 2 complex numbers, you divide their moduli and
subtract their arguments.
Let’s give it a try. Suppose we multiply 2 complex numbers, –3 + 3i and –1 – i. You will have:
3√2 [ cos (3π/4) + i sin (3π/4)] × √2 [ cos (3π/4) - i sin (3π/4)] = 6
Did you catch my working? The second number has argument –3π/4, which is why the sign in front
of its sine is negative. Try using some trigonometric formulas and you will eventually get the
answer. Take note that 3π/4 - 3π/4 = 0!
There’s also another way of going about it. Let’s try using the exponential form:
3√2 e^(i(3π/4)) × √2 e^(i(-3π/4)) = 6 e^(i0) = 6
This is multiplication. Division works the same way, go ahead and give it a try. There’s no
shortcut for addition and subtraction though, so it’s better to use the Cartesian form for them.
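The multiply-and-add rule is easy to test numerically. In the Python sketch below (standard `cmath` module, variable names of my own), the product and the quotient are built only from moduli and arguments, so you can compare them with ordinary Cartesian arithmetic.

```python
import cmath

z1 = -3 + 3j
z2 = -1 - 1j
r1, theta1 = cmath.polar(z1)
r2, theta2 = cmath.polar(z2)

# Multiplication: moduli multiply, arguments add.
product = cmath.rect(r1 * r2, theta1 + theta2)

# Division: moduli divide, arguments subtract.
quotient = cmath.rect(r1 / r2, theta1 - theta2)
```

The product comes out as 6, and the quotient agrees with `z1 / z2` computed directly.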
Now we will learn the geometrical effects of basic operations on complex numbers. We will be
using the Argand diagram (which I assume you already know what it is, and what are its axes).
1. Conjugation
Basically what conjugation does is that it reflects a complex number across the real axis. Try
visualizing these complex numbers as vectors, and you will understand more.
2. Addition & Subtraction
The diagram above shows the addition of 1 + 1i with 1 – 1i. As you can see, it is merely a vector
addition, you add up the two of the complex numbers as if they were vectors. You should be able
to deduce that for subtraction, just by using the fact that the added minus sign switches the
direction of the arrow, you will be able to get the answer, which will be 2i, lying on the
imaginary axis. Quite straightforward right?
3. Multiplication & Division
Possibly you couldn’t catch what the diagram meant. The green arrow, 3 – i, is the result of the
multiplication of 1 + i and 1 – 2i. What happens here is exactly what we learnt above: the
moduli multiply (the lengths do not add up, but multiply, as in √2 × √5 = √10) and the
arguments add up (0.25π rad – 0.35π rad = –0.1π rad, approximately). You can try expressing the
complex numbers in polar form, which will help you see it more clearly. These are really simple
stuff, but your head will start to spin when the loci of complex numbers come in.
2.2 de Moivre’s Theorem
de Moivre’s theorem states that:
for all integer values of n,
(cos θ + i sin θ)^n = cos nθ + i sin nθ.
This is a very important relationship we need to know about complex numbers. Before we start
using it, let’s try to prove it first.
PROOF
When n = 1,
(cos θ + i sin θ)^1 = cos θ + i sin θ
and so the theorem holds for n = 1.
Now, we assume that the theorem is true for n = k, so
(cos θ + i sin θ)^k = cos kθ + i sin kθ
We now show that it is then also true for n = k + 1:
(cos θ + i sin θ)^(k+1) = (cos kθ + i sin kθ)(cos θ + i sin θ)
= cos kθ cos θ – sin kθ sin θ + i (sin kθ cos θ + cos kθ sin θ)
= cos (k + 1)θ + i sin (k + 1)θ
which is the theorem for n = k + 1.
.·. by mathematical induction, de Moivre’s Theorem is true for all integers n > 0.
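Let’s try proving for negative integers too.
Let n = –p, where p is a positive integer. Then
(cos θ + i sin θ)^n = 1 / (cos θ + i sin θ)^p = 1 / (cos pθ + i sin pθ) = cos pθ – i sin pθ
(multiply the top and bottom by cos pθ – i sin pθ).
Since p = –n, cos (-n)θ – i sin (-n)θ = cos nθ + i sin nθ.
.·. once again, this theorem is proven.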
So we see that de Moivre’s Theorem is true for any integer n (n = 0 is trivial, since both sides
equal 1). A version of it also works for fractions, but that is beyond what we need to learn.
However, one thing to note is that if n is not an integer, cos nθ + i sin nθ is only one of the
possible values. I will elaborate more in the next post, in the section on roots of unity.
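A numerical spot check is no substitute for the proof, but it is reassuring. This Python sketch (the function name `de_moivre_gap` and the test angle 0.7 are arbitrary choices of mine) measures the difference between the two sides for a range of positive and negative integers, and then illustrates the warning about non-integer exponents.

```python
import cmath
import math

def de_moivre_gap(theta, n):
    """Size of (cos θ + i sin θ)^n - (cos nθ + i sin nθ) for an integer n."""
    lhs = complex(math.cos(theta), math.sin(theta)) ** n
    rhs = complex(math.cos(n * theta), math.sin(n * theta))
    return abs(lhs - rhs)

# Try every integer exponent from -6 to 6 at one arbitrary angle.
largest_gap = max(de_moivre_gap(0.7, n) for n in range(-6, 7))

# For the exponent 1/2 applied to 1 = cos 2π + i sin 2π, the formula gives
# cos π + i sin π = -1, but +1 is an equally valid square root of 1.
candidates = [cmath.rect(1, math.pi), cmath.rect(1, 2 * math.pi)]
both_square_to_one = all(abs(c * c - 1) < 1e-12 for c in candidates)
```

The largest gap is at the level of rounding error, while the last line shows two different numbers that both deserve to be called 1^(1/2).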
The most important thing for this section, is that you need to remember how to prove this
theorem, and know how to use it. You will be able to simplify a lot of complex number equations
by changing the exponents into just multiplication of numbers.
Another thing you should note is the relations of negative angles.
cos (-θ) = cos θ
sin (-θ) = - sin θ
You will be dealing with all these a lot. It is good to memorize it, and be careful not to make
mistakes.
APPLICATIONS
1. I’ll show you an example of how de Moivre’s Theorem helps you in proving trigonometric
identities.
Express sin 3A in terms of sin A.
sin 3A = Im (cos 3A + i sin 3A) [here, “Im” means imaginary part, while “Re” means real part.]
= Im (cos A + i sin A)³
= Im (cos³ A + 3i cos² A sin A – 3 cos A sin² A – i sin³ A)
= 3 cos² A sin A – sin³ A
= 3 (1 – sin² A) sin A – sin³ A
= 3 sin A – 4 sin³ A
Okay, I need to explain this. Here, we are rewriting sin 3A as part of a complex number, which
can be dealt with using de Moivre’s Theorem. So sin 3A is actually the
imaginary part of cos 3A + i sin 3A, and we put the “Im” there because sin 3A belongs to the
imaginary part (this means that if our question was cos 3A, we have to put “Re” in front of it
instead). We expand it, and when we remove the “Im” sign, we throw away all the real parts (terms
without the ‘i’), and keep the imaginary part without the ‘i’ in it. Try using this method on
cos 3A, you will understand more by then.
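For reference, here is how the suggested cos 3A exercise works out when you take the real part of the same expansion instead. This is my own working, following the same steps as the sin 3A example.

```latex
\begin{aligned}
\cos 3A &= \operatorname{Re}\,(\cos A + i\sin A)^3 \\
        &= \operatorname{Re}\,\bigl(\cos^3 A + 3i\cos^2 A\,\sin A - 3\cos A\,\sin^2 A - i\sin^3 A\bigr) \\
        &= \cos^3 A - 3\cos A\,\sin^2 A \\
        &= \cos^3 A - 3\cos A\,(1 - \cos^2 A) \\
        &= 4\cos^3 A - 3\cos A
\end{aligned}
```

The only new ingredient compared with the sine case is replacing sin² A by 1 – cos² A at the end.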
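2. If we set z = cos θ + i sin θ, then
z^n = cos nθ + i sin nθ and 1/z^n = cos nθ – i sin nθ
From here, you can further deduce that
z^n + 1/z^n = 2 cos nθ and z^n – 1/z^n = 2i sin nθ
With all these, we can do the above example backwards.
Express sin³ A in terms of sines of multiple angles.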
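The working for this example is missing from this copy, so here is one way to do it. This is my own derivation, with z = cos A + i sin A, so that z – 1/z = 2i sin A and z³ – 1/z³ = 2i sin 3A by de Moivre’s theorem.

```latex
\begin{aligned}
(2i\sin A)^3 &= \left(z - \frac{1}{z}\right)^{3}
              = z^3 - 3z + \frac{3}{z} - \frac{1}{z^3} \\
             &= \left(z^3 - \frac{1}{z^3}\right) - 3\left(z - \frac{1}{z}\right) \\
-8i\,\sin^3 A &= 2i\,\sin 3A - 6i\,\sin A \\
\sin^3 A &= \tfrac{1}{4}\,\bigl(3\sin A - \sin 3A\bigr)
\end{aligned}
```

You can check the result against sin 3A = 3 sin A – 4 sin³ A: rearranging that identity gives exactly the same thing.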
2.3 Equations (roots of unity, loci, transformation)
ROOTS OF UNITY
When you take the square root of 1, you will get two answers: 1 and –1. But what happens when
you take the cube root of 1? All this while, you studied that there’s only 1 cube root, which is 1
itself. But the actual fact is that it has 3 roots: –1/2 + (√3/2)i, –1/2 – (√3/2)i, and 1.
From here, you start guessing that probably the 4th root of 1 will yield 4 roots, the 5th root of
1 will yield 5 roots and so on! You are correct. So in this section, I will teach you how to find
the roots of unity. Here unity means ‘one’, not ‘bersatu padu’ (Malay for ‘united’). You will see
this term very often in higher level physics.
Let us try finding the cube root of 1. If you just write ³√1, you won’t get anywhere. We need to
use de Moivre’s theorem. So expressing 1 in polar form, we get |z| = 1, arg z = 2π (It is actually
0, but I recommend using this instead) and we have 1 = cos 2π + i sin 2π. Let us continue from
here:
z³ = 1, so z = 1^(1/3) = (cos 2π + i sin 2π)^(1/3)
Does it look familiar? Now you might want to use de Moivre’s theorem to solve it. But
wait! de Moivre’s theorem is true for integers only, and if n is not an integer, it gives only one
of the possible solutions. Which means, there are other solutions to be found. Let’s use de
Moivre’s theorem first:
z = cos (2π/3) + i sin (2π/3)
Now I will teach you how to find the remaining roots: you need to add 2nπ to the angle to get
another root. You do this all the way until you reach the angle 2π (without exceeding 2π). n here
denotes the exponent that you use. Here it is 1/3, so you need to add 2π/3 to the angle each
time. This means that we have
z = cos (2π/3) + i sin (2π/3), cos (4π/3) + i sin (4π/3), and cos 2π + i sin 2π = 1
So we have found the 3 cube roots of unity. By the way, it could have been faster to
write if we used the exponential form, so ³√1 = e^((2/3)πi), e^((-2/3)πi) and e^(2πi) (which is
actually 1). Did you notice that cos (4/3)π + i sin (4/3)π is actually cos (2/3)π – i sin (2/3)π,
and it is the conjugate of cos (2/3)π + i sin (2/3)π? So it seems there is a faster way of finding
the roots. Bearing in mind that 1 is always a root of itself (and –1 is also a root ONLY if n is
even, for example the square root, or the 4th root), you just need to add 2π/n to the angles
(2π/3 for cube roots) up to and not exceeding π, and you already know the rest of the roots,
because every root’s conjugate is also a root!
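The recipe of stepping the angle by 2π/n until you reach 2π is short enough to write as a program. Here is a minimal Python sketch; `roots_of_unity` is a name I made up, and `cmath.rect(r, θ)` is the standard-library call that builds r(cos θ + i sin θ).

```python
import cmath
import math

def roots_of_unity(n):
    """The n complex solutions of z^n = 1, at angles 2πk/n for k = 1..n."""
    return [cmath.rect(1, 2 * math.pi * k / n) for k in range(1, n + 1)]

cube_roots = roots_of_unity(3)

# Print each root as "real imaginary" to 4 decimal places.
for w in cube_roots:
    print(f"{w.real:+.4f} {w.imag:+.4f}i")
```

Because of floating-point rounding, the last root is printed as 1 with an imaginary part that is zero only to about 16 decimal places. Calling `roots_of_unity(4)` or `roots_of_unity(9)` gives the other cases you might want to try by hand first.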
You can try to solve the roots of unity for the 4th root, 5th root all the way up to the 9th root or
beyond. Let me tell you some applications of this:
1. If you were to plot the roots of unity on the Argand diagram, they all lie on the circle
|z| = 1, and they are equally spaced from each other! Here is the Argand diagram for the cube
roots of unity:
This actually applies to the roots of any number, just that |z| might equal something else.
2. Using this information for roots of unity, you can find all the real and complex roots of any
other number. For example, ⁴√16 = ⁴√1 × 2 or ⁶√(–64) = ⁶√1 × 2i. The terms 2 and 2i are one
particular 4th root of 16 and one particular 6th root of –64 respectively. Using de Moivre’s
theorem to find the 4th roots and 6th roots of 1, you can just multiply the roots of unity by 2
and 2i respectively to get the required roots. If you remember that the 4th roots of unity are
1, –1, i and –i, you get ⁴√16 = 2, –2, 2i, –2i. Try solving the other one on your own. There are
many questions of this kind, for example, solving for z in z⁵ = –8 + 8i, or (z + 2i)² = 4. This
needs some practice.
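The same idea, one root multiplied by each root of unity, gives all the nth roots of any complex number. Below is a Python sketch under that approach; `nth_roots` is my own helper, and it takes the ‘one root’ from the polar form rather than spotting it by inspection.

```python
import cmath
import math

def nth_roots(c, n):
    """All n solutions of z^n = c: one root times each nth root of unity."""
    r, theta = cmath.polar(c)
    one_root = cmath.rect(r ** (1 / n), theta / n)
    return [one_root * cmath.rect(1, 2 * math.pi * k / n) for k in range(n)]

fourth_roots_of_16 = nth_roots(16, 4)
sixth_roots_of_minus_64 = nth_roots(-64, 6)
fifth_roots = nth_roots(-8 + 8j, 5)   # solutions of z^5 = -8 + 8i
```

For an equation like (z + 2i)² = 4, you would find the square roots of 4 this way and then subtract 2i from each.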
3. Now you will probably face polynomial questions with complex coefficients, like
2ix² + 3x + 3 + 4i = 0. So be ready to know how to solve them.
4. Try adding up all the roots of unity. You will be surprised that for every n ≥ 2, all
the nth roots of unity add up to 0! Try answering the following question using this piece of
information:
By considering the ninth roots of unity, show that
Hint: Find all the ninth roots of unity, sum them up (the sum equals 0), and you’ll prove the
equation. Also, remember what you learned in high school Maths: the 2 complex roots of a
quadratic equation with real coefficients are conjugates of one another. Similarly, for whatever
nth roots of unity, other than 1 (or –1), the rest of the complex roots are paired up such that
each root is the conjugate of another.
5. For the cube roots of unity, the square of a complex root is its own conjugate, which is just
the other complex root! So if w is a cube root of 1, then 1 + w + w² might have 2 answers. If w is
the real root, then it will be 3, but if it is either of the complex roots, the answer will be 0.
Try verifying this.
COMPLEX INTEGRATION
I hope you have learned the chapter on Integration in Maths T already. You solved the integral
by using integration by parts, moving terms from one side of the equation to the other and so on.
But you could actually solve such an integral by complex integration. Let me show you how:
Do you think this is easier compared to the integration by parts? I thought so, just don’t be
careless.
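The integral and its working are missing from this copy. To show the idea, here is the method applied to an integral of my own choosing, ∫ eˣ cos x dx, which is the usual kind of problem that needs integration by parts twice. Treat it as an illustration, not as the original example.

```latex
\begin{aligned}
\int e^{x}\cos x \,dx
  &= \operatorname{Re}\int e^{x}\,e^{ix}\,dx
   = \operatorname{Re}\int e^{(1+i)x}\,dx \\
  &= \operatorname{Re}\left[\frac{e^{(1+i)x}}{1+i}\right] + C
   = \operatorname{Re}\left[\frac{1-i}{2}\,e^{x}(\cos x + i\sin x)\right] + C \\
  &= \frac{e^{x}}{2}\,(\cos x + \sin x) + C
\end{aligned}
```

Taking the imaginary part of the same bracket instead gives ∫ eˣ sin x dx = (eˣ/2)(sin x – cos x) + C for free.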
LOCI IN THE COMPLEX PLANE
Still remember learning loci in form 3 Mathematics? It’s read as ‘lo-sai’ by the way, not
‘lo-kee’ or ‘lo-chee’. A locus is actually just a representation of an equation in a graph or
diagram. In a Cartesian plane, you should be familiar with the equations for circles,
straight lines, or ellipses (please study Coordinate Geometry in Maths T before starting
this section). Here, the representation of complex loci in the Argand diagram is similar to
the one in polar coordinates (which you do not learn in STPM anyhow). Anyway, I’ll start
by introducing 6 types of common loci. Throughout this section, z is a variable,
while z1 and z2 are fixed complex numbers, and k, r, c and θ are just constants.
1. |z - z1| = r
This basically represents a circle, with centre z1 and radius r.
For example, |z + 2 + 2i| = 2√2 is the circle with centre –2 – 2i and radius 2√2 (it passes
through the origin):
2. arg (z – z1) = θ
This is a half line, starting from the point z1 (but not including z1 itself), making an angle
of θ with the positive real axis, going all the way to infinity. Taking arg (z + 1 + i) = π/4,
the half line starts from –1 – i, and we have
3. |z – z1| = |z – z2|
This is the perpendicular bisector of the line joining the two complex numbers z1 and z2. To find
the distance between z1 and z2, just find |z1 – z2|, and to find the mid-point, use (z1 + z2)/2.
Taking |z – 1 – i| = |z + 2 + 2i|, we have
4. |z – z1| = k |z – z2|
Adding a constant ‘k’ (with k ≠ 1) really changes this locus compared to the one before. This
locus turns into a circle instead! It is the set of points whose distances from z1 and z2 are in
the fixed ratio k. Probably this is the hardest one, because you need to find out the centre of
the circle before you can draw it. Taking |z – 2| = 3 |z + 2i|, we can’t easily draw it straight
away. Using the fact that z = x + iy and using the definition of modulus, you can change the
equation into
(x – 2)² + y² = 9 [x² + (y + 2)²]
and you can further simplify it into the equation of a circle, which is
(x + 1/4)² + (y + 9/4)² = 9/8
and now we plot the locus onto the Argand diagram, which is a circle with centre –1/4 – (9/4)i
and radius 3√2/4:
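If you want to double-check the algebra, here is a small Python sketch. It assumes the circle works out to centre –1/4 – (9/4)i with radius² = 9/8 (my own working from |z – 2| = 3 |z + 2i|), then walks around that circle and measures how badly each point fails the original equation.

```python
import cmath
import math

# Assumed result of completing the square (see the lead-in above).
centre = complex(-1 / 4, -9 / 4)
radius = math.sqrt(9 / 8)

# Sample one point per degree on the circle and record the worst
# mismatch between |z - 2| and 3|z + 2i|.
worst_gap = 0.0
for step in range(360):
    z = centre + cmath.rect(radius, math.radians(step))
    gap = abs(abs(z - 2) - 3 * abs(z + 2j))
    worst_gap = max(worst_gap, gap)
```

A worst gap at the level of rounding error means every sampled point of the circle satisfies the locus equation.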
5. arg [(z – z1) / (z – z2)] = θ
This one is hard to explain in terms of Cartesian coordinates. It is actually an arc of a circle,
traced out by the meeting point of 2 lines drawn from the 2 fixed complex numbers at a fixed
angle to each other. Using the equation
arg [(z – 2i) / z] = π/4
the 2 points are (0, 2) and (0, 0) respectively. We draw 2 lines out of these 2 points,
such that they intersect at some point where the angle between them is exactly π/4. So this is
shown below:
The locus is not the blue line. It is the arc of a circle traced out by the tip of the arrow
shown below.
6. $|z - z_1| + |z - z_2| = k$
This one plots out an ellipse, where $z_1$ and $z_2$ are the foci of the ellipse, and $k$ is the constant sum
of the distances from $z$ to $z_1$ and from $z$ to $z_2$ (so you need $k > |z_1 - z_2|$). To plot this one, you also need to change the
equation into Cartesian coordinates as in the previous example. I leave this to you as an
exercise. A general ellipse looks like this:
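If you want to check a locus of type 4 numerically, here is a minimal Python sketch for the example |z – 2| = 3|z + 2i|. The centre –1/4 – (9/4)i and radius 3√2/4 used below come from my own expansion with z = x + iy, so treat them as working to be verified rather than a quoted answer; the helper name `on_locus` is made up for this illustration.

```python
import cmath
import math

def on_locus(z: complex) -> bool:
    """True when z satisfies |z - 2| = 3|z + 2i|, up to rounding error."""
    return math.isclose(abs(z - 2), 3 * abs(z + 2j), rel_tol=1e-9)

# Candidate circle obtained by expanding the equation with z = x + iy.
centre = complex(-0.25, -2.25)
radius = 3 * math.sqrt(2) / 4

# Twelve evenly spaced points on the candidate circle.
samples = [centre + radius * cmath.exp(2j * math.pi * k / 12) for k in range(12)]
all_on_locus = all(on_locus(z) for z in samples)
print(all_on_locus)
```

Every sampled point on the circle should satisfy the original modulus equation, while a point off the circle (such as the origin) should not.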
Now that you know all the possible types of loci, you need to practice plotting them
when given the equation of a locus. Then, you should also know how to convert them into
equations in Cartesian coordinates. Besides, you should also know how to transform
loci. I give you a few general rules:
1. $w = z^2$ transforms a line through the origin into a half line from the origin, and a circle centred at the origin into another circle centred at the origin.
2. $w = (z_1 + z)/(z_2 - z)$ transforms a circle passing through $z_2$ into a straight line (such as a bisector).
3. $w = 1/(z_1 - z)$ transforms a circle passing through $z_1$ into a line.
4. $w = kz - c/z$ transforms a circle centred at the origin into an ellipse, where $k$ and $c$ are real constants.
5. $w = \bar{z}$ also transforms a circle into a circle.
I don’t think this is in the syllabus, but I will give you an example.
For the transformation $w^2 = z$, find the locus of $w$ for $|z| = 5$.
$z = 5(\cos\theta + i\sin\theta)$
$w = \sqrt{z} = \sqrt{5}\,[\cos(\theta/2) + i\sin(\theta/2)]$
∴ the locus of $w$ is a circle, with radius $\sqrt{5}$ and centre at 0.
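A quick numerical check of this example, as a rough Python sketch: it takes a few points on |z| = 5, applies the principal square root from the standard `cmath` module, and looks at the moduli. Using only the principal root is my simplification; the other root –w has the same modulus.

```python
import cmath
import math

# Eight evenly spaced points on the circle |z| = 5.
points_z = [5 * cmath.exp(2j * math.pi * k / 8) for k in range(8)]

# Principal square roots, so that w**2 = z.
points_w = [cmath.sqrt(z) for z in points_z]

moduli = [abs(w) for w in points_w]
print(moduli)  # each value should be very close to sqrt(5)
```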
3. Matrices
3.1 Row & Column Operations (properties of determinants)
This section is mostly covered in Maths T, so I will only discuss the properties of
determinants.
You should know what a determinant is by now, at least for 2 × 2 and 3 × 3 matrices. In
Maths T, you are told to evaluate the determinant of a 3 × 3 matrix just as it is. Here, I’m
going to teach you some shortcuts so that you can calculate the determinant
faster.
Sometimes, we need to change the appearance of a determinant to ease our
calculations. There are several ways to do this. Operations 1 to 3 and 6 below leave the value unchanged, while operations 4 and 5 change it in a known way:
1. add / subtract any row to any other row
adding the second row to the first row yields:
2. add / subtract any column to any other column
subtracting the first column from the third column yields:
3. add / subtract any multiple of any row / column to any other row / column
adding 3 times of the third row to the first row yields:
4. interchange 2 rows / columns and change its sign (+ / –)
interchanging column 1 and column 2 yields:
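5. factor out a constant k from any row / column
factoring out 3 from each of the 3 rows yields a factor of 3 × 3 × 3 = 27:
Note the difference here between a matrix and a determinant. If it were a matrix, a common factor of 3 in every entry
comes out only once, whereas in a determinant one 3 comes out of each row. Don’t mix them up.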
6. transpose the determinant
I think you understand this without illustration, right?
One more interesting fact about determinants is, whenever 2 rows / columns are equal, the value
of the determinant is zero.
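The properties above can be tried out on any matrix you like. Below is a small Python sketch; the matrix `A` is an arbitrary example I made up (it is not one of the examples from this post), and `det3` is just a first-row expansion written for this illustration.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by expanding along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[2, 1, 3],
     [0, 4, 1],
     [5, 2, 6]]  # arbitrary example matrix

row_added = [[A[0][j] + A[1][j] for j in range(3)], A[1], A[2]]  # row 2 added to row 1
swapped = [A[1], A[0], A[2]]                                     # rows 1 and 2 interchanged
tripled = [[3 * v for v in row] for row in A]                    # every row multiplied by 3
transposed = [list(col) for col in zip(*A)]                      # rows become columns
equal_rows = [A[0], A[0], A[2]]                                  # two identical rows

print(det3(A), det3(row_added), det3(swapped),
      det3(tripled), det3(transposed), det3(equal_rows))
```

Adding a row and transposing should leave the value alone, swapping should flip the sign, tripling every row should multiply the value by 27, and two equal rows should give zero.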
Knowing how you can simplify the determinants, this gives you the advantage when you
calculate inverse matrices. You can now calculate the value of the determinant faster! Besides, it
can be useful for situations, just like the one below:
Factorize the following determinant:
So, the answer will be:
3.2 System of Linear Equations (consistency, uniqueness, Gaussian elimination,
Cramer’s rule)
Are you aware that not all simultaneous equations have unique answers (that means,
you can pin down the unknowns x, y or z exactly)?
Let’s take a look at the equation below:
-3x – y + z = -1
If I give you just this one equation, it is certain that you will not be able to find a
definite answer for x, y and z. In fact, you could substitute any values for x and y, and
you get a value for z! From here you could have guessed that you need at least as many
equations as unknowns to pin down the answer. This means, if you
have 2 unknowns x and y, you need 2 equations to solve for them. If you have 3 unknowns x,
y and z, the minimum number of equations you need to solve the problem is 3, and so on.
So now let me give you another 2 equations below:
-3x – y + z = -1
x + 4y – z = 3
-5x + 2y + z = 2
Are you able to solve these 3 equations simultaneously? Alright, you only have learnt 2
variables, so let me give you another example below:
2x + y = 1
4x + 2y = 3
If you try solving it, you will find that it is not solvable. A set of equations in a number of
unknowns (here, as many equations as unknowns) is what we call a system of
equations. It is linear when all the variables x, y, z, w and so on appear only to the power of
one. So in this section, I’m going to teach you how to solve systems of linear equations
with 3 variables, and also how to identify whether a system of equations has a
unique solution.
CONSISTENCY
When we talk about consistency of equations, we are actually trying to determine
whether a system of linear equations has a unique solution, infinitely many
solutions, or no solution (a system with at least one solution is called consistent, and one with none is called inconsistent). Every system of linear equations (I’ll be using 3 variables
only in this section) will have exactly one of these outcomes. Having a unique solution
means that there is a definite value for x, y and z, and when x, y and z are not those
values, at least one of the 3 equations is not satisfied. For example,
2x + z = 0
2x + y + 4z = 1
3x + y + 8z = 1
is satisfied if and only if x, y and z are 0, 1 and 0 respectively.
Any other values will be wrong. Now, if a system of linear equations doesn’t have a unique
solution, it has either infinitely many solutions, or no solution at all. Having infinitely
many solutions means that x, y and z all depend on another
variable, let’s call it t, and t can be ANY VALUE. For example,
3x + y – 2z = –4
x +2y + 3z = 11
3x – 4y –13z = –41
has solutions x = 7t – 19/5, y = 37/5 – 11t and z = 5t, where t can be any
number. As for the blue example above (the system starting with –3x – y + z = –1), it has no solution, because no
values of x, y and z satisfy all 3 equations at once.
To determine whether a system of equations has a unique solution, infinitely many solutions or no solution,
we first need to represent it in the form of a matrix. Taking the blue example above (the system starting with –3x – y + z = –1),
we have
$\begin{pmatrix} -3 & -1 & 1 \\ 1 & 4 & -1 \\ -5 & 2 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix}$
From here, we evaluate the determinant of the 3 × 3 matrix. If the determinant
gives a non-zero value, then there IS a unique solution for x, y and z. But if the
determinant equals zero, then we have a non-unique solution, in which case we have to
find out whether x, y and z are linearly dependent on each other, or there just isn’t a
solution for x, y and z. Try calculating the determinants of the blue (–3x – y + z = –1, …), red (2x + z = 0, …) and green (3x + y – 2z = –4, …)
examples to verify whether it is so.
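If you would like to check your hand calculations, here is a short Python sketch that evaluates the three coefficient determinants. The colour names are only labels for the three 3-variable systems written out above, and `det3` is a helper written for this illustration.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix by expanding along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Coefficient matrices of the three systems in this section.
systems = {
    "blue":  [[-3, -1, 1], [1, 4, -1], [-5, 2, 1]],
    "red":   [[2, 0, 1], [2, 1, 4], [3, 1, 8]],
    "green": [[3, 1, -2], [1, 2, 3], [3, -4, -13]],
}

determinants = {name: det3(m) for name, m in systems.items()}
for name, value in determinants.items():
    verdict = "unique solution" if value != 0 else "no unique solution"
    print(name, value, verdict)
```

A zero determinant only tells you the solution is not unique; it does not say whether there are infinitely many solutions or none.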
It can be quite hard to identify whether a system of equations is linearly dependent
(infinitely many solutions) or inconsistent (no solution). Basically, when the 3 equations
boil down to only 2 or 1 distinct equations that don’t contradict one another, that system is
definitely linearly dependent. Try checking out the green example (the system starting with 3x + y – 2z = –4). Take the 1st
equation, subtract 3 times the 2nd equation, and you get –5y – 11z = –37. Take the 3rd
equation, subtract 3 times the 2nd equation, divide by 2, and you get the same
equation –5y – 11z = –37! This means that you actually only have 2 equations for 3
unknowns, which tells you that there are infinitely many solutions. To solve this system of
equations, you simply let any variable be t, as long as it doesn't give you trouble. For
this case, I'll use z = 5t. Then –5y – 55t = –37 gives y = 37/5 – 11t, and substituting back into the 2nd equation gives x = 7t – 19/5, so the final
answer is (x, y, z) = (7t – 19/5, 37/5 – 11t, 5t).
For a system with no solution, you need to find 2 equations which contradict
one another. Using the orange example (2x + y = 1, 4x + 2y = 3), after multiplying the 1st equation by 2, you get
4x + 2y = 2
4x + 2y = 3
and you are about to conclude that 2 = 3, which is a contradiction! This shows that the
equations in the system are inconsistent with one another.
Equations can get very complicated, and it is not easy to identify contradicting or same
equations within the system of equations. But there is a faster way.
GAUSSIAN ELIMINATION
Before I introduce this method, I need to tell you that a system of 3 linear equations
can be represented by an augmented matrix, as below:
$\left(\begin{array}{ccc|c} -3 & -1 & 1 & -1 \\ 1 & 4 & -1 & 3 \\ -5 & 2 & 1 & 2 \end{array}\right)$
The line separates the 3 × 3 matrix from the constants on the right hand side. This is the
blue example (–3x – y + z = –1, x + 4y – z = 3, –5x + 2y + z = 2) from above. The aim of Gaussian Elimination is to bring the
augmented matrix into row-echelon form before we solve the equations. The row-echelon form of an augmented matrix has the following characteristics:
1. The 1st non-zero entry of each row is 1 (which is called the “leading 1”).
2. Below each “leading 1” are all zeroes.
3. Each “leading 1” is placed to the right of the “leading 1” in the row above.
4. Any row consisting entirely of zeroes (if there is one) is placed at the bottom of the
matrix.
Confused? Look at the matrix below:
In human language, as long as you have a diagonal of 1’s slanting from left to right
downwards, and there are 3 zeroes on the bottom left corner, then that is a row-echelon
form. So if the last row has all zeroes, so be it, as long as that row is not placed in the
middle or the top.
In order to transform the blue example above into row-echelon form, we need to
learn some elementary row operations. Let i and j be the labels of any 2 of the rows
(I hope you are able to distinguish a ROW and a COLUMN by now), and c be a
constant. There are 3 things we can do to the augmented matrix without altering the
final outcome:
1. interchange rows i & j, denoted by $R_{ij}$.
2. multiply a row by a non-zero number c, denoted by $cR_i$.
3. add c times row j to row i, denoted by $cR_j + R_i$.
So to show you how to go about, let’s use the blue example:
First, I switched the 1st and 2nd rows, so I’ve got a “leading 1” for my first row. Next, I
add 3 times the 1st row to the 2nd row to get a zero at the front of the 2nd row, then
divide by 11 so that I get the 2nd “leading 1”. Since the 1st and 2nd rows are done, I now
work on the 3rd row: adding 5 times the 1st row and then subtracting 22 times the new 2nd row
leaves 3 zeroes on the left (a row like this has to be at the bottom, remember?). Converting this augmented matrix back to the
system of equations, I have:
x + 4y – z = 3
y – (2/11)z = 8/11
0 = 1
which has a contradiction! Therefore there is no solution for this system of equations.
Take note that there is a difference between no unique solution and no solution, and
the two mean very different things!
Now, try using Gaussian elimination on the red and green examples. For the red one (2x + z = 0, …),
you will get the last row z = 0. By using back substitution, you will solve for x and y
easily. For the green one (3x + y – 2z = –4, …), you will eventually get the last row filled with all zeroes.
Please do lots of exercises on Gaussian Elimination, as you don’t want to make stupid
mistakes in exam.
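To check your row reductions, here is a rough Python sketch of Gaussian elimination using exact fractions. The function names `row_echelon` and `classify` are my own, and the pivoting order is the simplest one I could write, so the intermediate rows may differ from the ones you get by hand (for instance, it does not swap rows just to get a nicer "leading 1"). The final verdict should agree all the same.

```python
from fractions import Fraction

def row_echelon(aug):
    """Return a row-echelon form of an augmented matrix, using exact fractions."""
    rows = [[Fraction(v) for v in row] for row in aug]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols - 1):  # the last column holds the constants
        # Look for a row at or below pivot_row with a non-zero entry in this column.
        pivot = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]  # interchange rows
        lead = rows[pivot_row][col]
        rows[pivot_row] = [v / lead for v in rows[pivot_row]]        # make the leading 1
        for r in range(pivot_row + 1, n_rows):                       # clear entries below
            factor = rows[r][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return rows

def classify(aug):
    """Say whether the system has a unique solution, infinitely many, or none."""
    ref = row_echelon(aug)
    n_unknowns = len(aug[0]) - 1
    if any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in ref):
        return "no solution"
    rank = sum(any(v != 0 for v in row[:-1]) for row in ref)
    return "unique solution" if rank == n_unknowns else "infinitely many solutions"

blue = [[-3, -1, 1, -1], [1, 4, -1, 3], [-5, 2, 1, 2]]
red = [[2, 0, 1, 0], [2, 1, 4, 1], [3, 1, 8, 1]]
green = [[3, 1, -2, -4], [1, 2, 3, 11], [3, -4, -13, -41]]

for name, system in [("blue", blue), ("red", red), ("green", green)]:
    print(name, classify(system))
```

Fractions are used instead of floats so that a row of "almost zero" entries cannot be mistaken for a genuine zero row.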
CRAMER’S RULE
Consider a system of n equations in the n variables $x_1, x_2, \ldots, x_n$, expressible in matrix
form as $AX = B$, where $A$ is an invertible matrix. Let $A_i$ be the matrix obtained by
replacing the ith column of $A$ with the n × 1 matrix $B$. Then the solution to the system
is given by
$x_i = \dfrac{\det A_i}{\det A}, \quad i = 1, 2, \ldots, n$
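I suppose you don’t like definitions in alien language. Cramer’s rule is applicable for
systems of linear equations which have unique solutions only (probably when you are
not having your calculator with you). In human language, Cramer’s rule states that when
you have 3 linear equations
$a_1x + b_1y + c_1z = d_1$
$a_2x + b_2y + c_2z = d_2$
$a_3x + b_3y + c_3z = d_3$
your expressions for x, y and z are
$x = \dfrac{\begin{vmatrix} d_1 & b_1 & c_1 \\ d_2 & b_2 & c_2 \\ d_3 & b_3 & c_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}}, \quad y = \dfrac{\begin{vmatrix} a_1 & d_1 & c_1 \\ a_2 & d_2 & c_2 \\ a_3 & d_3 & c_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}}, \quad z = \dfrac{\begin{vmatrix} a_1 & b_1 & d_1 \\ a_2 & b_2 & d_2 \\ a_3 & b_3 & d_3 \end{vmatrix}}{\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \end{vmatrix}}$
The expressions for x, y and z are just fractions of 2 determinants. The bottom
determinant is the same for all 3 of x, y and z, and it is the one you use to
determine whether a system of linear equations has a unique solution (notice that if the
bottom determinant is 0, you don’t get an answer!). The determinant on the top is obtained by
replacing the coefficients of that variable with the d’s. For example, the coefficients of x
are the a’s, so just replace them with the d’s.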
Now, use the red example above and try solving it with Cramer’s rule. If you feel like
freaking your Maths T teacher out, use this method in exam, and risk your marks. :P But
as I said earlier, you can actually solve simultaneous equations with 3 variables using
your calculator. So probably you will only use this method when asked.
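Here is a minimal Python sketch of Cramer's rule for 3 equations, applied to the system 2x + z = 0, 2x + y + 4z = 1, 3x + y + 8z = 1 as printed above. The function names `det3` and `cramer3` are made up for this illustration, and exact fractions are used so the answers don't pick up rounding errors.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3 x 3 matrix by expanding along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer3(coeffs, consts):
    """Solve a 3 x 3 linear system by Cramer's rule; needs a non-zero determinant."""
    bottom = det3(coeffs)
    if bottom == 0:
        raise ValueError("determinant is zero, so there is no unique solution")
    solution = []
    for col in range(3):
        # Replace one column of coefficients with the constants on the right hand side.
        top = [row[:col] + [consts[r]] + row[col + 1:] for r, row in enumerate(coeffs)]
        solution.append(Fraction(det3(top), bottom))
    return tuple(solution)

coeffs = [[2, 0, 1], [2, 1, 4], [3, 1, 8]]
consts = [0, 1, 1]
x, y, z = cramer3(coeffs, consts)
print(x, y, z)
```

Substituting the printed values back into the 3 equations is a quick way to confirm them.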
3.3 Eigenvalues & Eigenvectors (diagonalization, Cayley-Hamilton theorem)
Before I continue, I need to teach you some basics of transformations. You will learn further
details in Chapter 11.
You probably still remember the transformations you learnt in form 4 Mathematics: translation,
enlargement, rotation, reflection and so on. Basically, transformations such as enlargement, rotation about the origin and reflection in a line through the origin,
whether in 2 dimensions or 3 dimensions, can be represented by a matrix. I won’t be teaching
you how, but you need to know this before I continue. So it simply means that a vector (x, y),
after being transformed by the transformation T (which can be represented by a matrix), will
change its coordinates to $(x_1, y_1)$. It can be written as
$\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = T \begin{pmatrix} x \\ y \end{pmatrix}$
Now, in some special cases, you will find that a particular line after undergoing transformation
T, will remain unchanged. A good example is the reflection across the line y=x. You can reflect
the line y = 2x and it turns into the line 2y = x. However, you find that the line y = x, after the
reflection, is still y = x! This line is called an invariant line.
An eigenvector is a vector pointing in the direction of an invariant line under a particular
transformation. An eigenvector is not unique: for example, the eigenvector for the line y = x
could be (1, 1), (2, 2) and so on, but they all point along the same line. An eigenvector, after the
transformation T, will still fall on the same line (or rather, keep the same ratio between x
and y), but not necessarily at the same position. So using matrices, we can represent the
transformation T as
$T \begin{pmatrix} x \\ y \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \end{pmatrix}$
Notice the right hand side. Since I said earlier that the vector (x, y), after being transformed, will
still be on the same line (same x:y ratio), it means that it will just transform to the vector (x,
y) multiplied by a constant λ. λ is what we call an eigenvalue.
The aim of this section is simple, you just need to know how to identify the eigenvalues and
eigenvectors. I’ll be focusing on 3D vectors in this section (you will learn them in detail in
Chapter 12). So let’s say we let the transformation M be a 3 × 3 matrix.
And then we have
$M \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \lambda \begin{pmatrix} x \\ y \\ z \end{pmatrix}$
As I said, our aim is to find the eigenvectors and eigenvalues for this transformation. Let’s
try to find λ. Doing a little algebra,
$(M - \lambda I) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$
Now let’s get back to determinants. Looking at the situation, we know that the determinant of the
big chunk matrix $M - \lambda I$ must be zero, because the equation has non-zero solutions (the eigenvectors) and they are
not unique. Besides, there’s no particular x, y or z that we are finding: it is an invariant line, which
is dependent on a parameter t (recall your coordinate geometry in Maths T). So we are almost
there! By writing down
$\det(M - \lambda I) = 0$
we can find λ by forming a cubic equation in λ, then we can substitute it back into the initial equation
to find its eigenvector. Let me show you an example:
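Find the eigenvalues and their respective eigenvectors for the matrix
$M = \begin{pmatrix} 1 & 1 & 2 \\ 0 & 2 & 2 \\ -1 & 1 & 3 \end{pmatrix}$
So let’s start by finding the eigenvalues.
$\begin{vmatrix} 1 - \lambda & 1 & 2 \\ 0 & 2 - \lambda & 2 \\ -1 & 1 & 3 - \lambda \end{vmatrix} = 0$
$(1 - \lambda)[(2 - \lambda)(3 - \lambda) - 2] - 1(2) + 2(2 - \lambda) = 0$
$\lambda^3 - 6\lambda^2 + 11\lambda - 6 = 0$ [this equation is called the characteristic equation]
$\lambda = 1, 2, 3$
So by simplifying the equation and using your calculator, you find that this matrix has 3
eigenvalues. Note that it is possible for a 3 × 3 matrix to have only 2 or 1 distinct real eigenvalues (a cubic always has at least one real root). Now that
we have found the eigenvalues, let’s try to find the eigenvectors. We need to substitute each value of
λ into the equation
$(M - \lambda I) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$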
When λ = 1,
$\begin{pmatrix} 0 & 1 & 2 \\ 0 & 1 & 2 \\ -1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$
We get a system of linear equations,
y + 2z = 0
y + 2z = 0
-x + y + 2z = 0
So you immediately notice that this is a system of linear equations which are linearly
dependent. Letting z = t, you have (x, y, z) = (0, –2t, t). However, you should not leave your answer in terms of t,
as you need to substitute a non-zero value for t into it. So your answer should be (0, –2, 1). Note that you
can also put (0, –4, 2) or (0, –2.666, 1.333), as any non-zero scalar multiple of an eigenvector is still
an eigenvector for the same eigenvalue. But we chose the first one for simplicity.
Hey, the working is not done yet!
When λ = 2,
…
…
the eigenvector is (1, 1, 0)
When λ = 3,
…
…
the eigenvector is (2, 2, 1)
Try doing the working yourself.
In the end, you find yourself with 3 eigenvalues and their respective 3 eigenvectors. The finding
process may be hard if you are a careless person. So please do a lot of practice on this section.
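A simple way to check an eigenvector is to multiply it by the matrix and see whether you get λ times the same vector. The Python sketch below does that for the 3 pairs found above. The matrix `M` here is my reading of the example, inferred from the determinant expansion (1 – λ)[(2 – λ)(3 – λ) – 2] – 1(2) + 2(2 – λ) = 0, so treat it as an assumption.

```python
# Matrix inferred from the determinant expansion in the worked example (an assumption).
M = [[1, 1, 2],
     [0, 2, 2],
     [-1, 1, 3]]

def mat_vec(m, v):
    """Multiply a matrix by a column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Eigenvalue -> eigenvector, as found in the working above.
pairs = {1: [0, -2, 1], 2: [1, 1, 0], 3: [2, 2, 1]}

checks = {lam: mat_vec(M, v) == [lam * c for c in v] for lam, v in pairs.items()}
print(checks)
```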
CAYLEY-HAMILTON THEOREM
If the characteristic polynomial $p(\lambda)$ of an n × n matrix $A$ is written
$p(\lambda) = (-1)^n(\lambda^n + b_{n-1}\lambda^{n-1} + \cdots + b_1\lambda + b_0)$, then
$A^n + b_{n-1}A^{n-1} + \cdots + b_1A + b_0I = 0$
Basically what this theorem means is that the λ in the characteristic equation of the matrix can be
substituted with the whole matrix itself. Taking the characteristic equation example above,
$\lambda^3 - 6\lambda^2 + 11\lambda - 6 = 0$
tells us that actually
$M^3 - 6M^2 + 11M - 6I = 0$
where M is the matrix itself.
From here, you can find the inverse matrix $M^{-1}$ in a fast way. Post-multiplying the
equation by $M^{-1}$, we get
$M^3M^{-1} - 6M^2M^{-1} + 11MM^{-1} - 6IM^{-1} = 0$
$M^2 - 6M + 11I - 6M^{-1} = 0$
and from here we will get
$M^{-1} = \tfrac{1}{6}(M^2 - 6M + 11I)$
That is the only use I can think of for the Cayley-Hamilton Theorem.
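The Python sketch below checks both claims numerically: that the matrix satisfies its own characteristic equation, and that (M² – 6M + 11I)/6 really behaves like the inverse. As before, `M` is the matrix I inferred from the determinant expansion in the eigenvalue example, so it is an assumption; `mat_mul` and `combine` are helper names made up here.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Product of two 3 x 3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def combine(terms):
    """Add up coefficient * matrix over a list of (coefficient, matrix) pairs."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(3)] for i in range(3)]

M = [[1, 1, 2], [0, 2, 2], [-1, 1, 3]]  # inferred example matrix (an assumption)
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
M2 = mat_mul(M, M)
M3 = mat_mul(M2, M)

# M^3 - 6M^2 + 11M - 6I, which should be the zero matrix.
cayley_hamilton = combine([(1, M3), (-6, M2), (11, M), (-6, I)])

# (M^2 - 6M + 11I) / 6, written term by term with exact fractions.
M_inv = combine([(Fraction(1, 6), M2), (Fraction(-1), M), (Fraction(11, 6), I)])

print(cayley_hamilton)
print(mat_mul(M, M_inv))
```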
This chapter should end here. However, I think it is better that you know some
applications of eigenvalues and eigenvectors. One such application is diagonalization.
BAB 4
4. Recurrence Relations
4.1 Recurrence Relations (problem models)
I suppose that you have learnt the chapter Sequences & Series in Maths T before you arrive at
this chapter.
A recursive definition of a sequence specifies one or more initial terms and a rule for
determining subsequent terms from those that precede them. A recurrence relation
for the sequence $\{a_n\}$ is an equation that expresses $a_n$ in terms of one or more of the
previous terms of the sequence, namely, $a_0, a_1, \ldots, a_{n-1}$, for all integers n with $n \geq n_0$,
where $n_0$ is a non-negative integer.
That was the formal definition of recurrence relations. When you say that something is
recursive, it means that there is a repetition. So a recurrence relation is basically just an
equation which relates a term to the terms before it. Let’s take the arithmetic sequence 1, 2, 3,
4, 5, … till infinity. The term ‘2’ is derived from the term ‘1’ by adding 1 to it. Similarly,
every term is obtained from the one before it in the same way, by adding 1. We shall denote ‘1’ as $a_0$,
which is the initial term. Then, we find that the term $a_1$ is related to the initial term by the
equation
$a_1 = a_0 + 1$
So after generalizing, we can conclude that the arithmetic sequence can be
represented by the recurrence relation
$a_n = a_{n-1} + 1$
where $n \geq 1$ (a positive integer). Using this equation, and given the initial condition $a_0$, you
can write down the rest of the terms by slowly adding all the way up (just imagine if I asked you
to find the term $a_{109}$!). So now you know that a recurrence relation is just an equation which has
$a_n$ and at least one other term $a_{n-x}$. Examples of recurrence relations are
$a_n = 6a_{n-2}$
$a_n = 5a_{n+4} - 2a_{n+3} + n$
We say that a recurrence relation is homogeneous when it only contains the terms $a_{n-x}$. For
example, $a_n = 6a_{n-2}$ is homogeneous, while $a_n = 5a_{n-1} - 2a_{n-2} + 3$ is not, as 3 is not an $a_{n-x}$ term.
We say that a recurrence relation is linear when the maximum power of the $a_{n-x}$ terms is 1. For
example, $a_n = 6a_{n-2}$ is linear, but $a_n = 6(a_{n-2})^2$ is not, as its maximum power is 2.
The order / degree of a recurrence relation tells us how many terms back the furthest term
that $a_n$ depends on is. For example, $a_n = 6a_{n-1}$ is a first order recurrence relation, while
$a_n = 6a_{n-1} + a_{n-3}$ is a third order recurrence relation. Any recurrence relation of the k-th order
requires k initial conditions to be solved. For example, we see that the equation
$a_n = 8a_{n-1} + 9a_{n-2}$ needs 2 initial conditions, $a_0$ and $a_1$, to be defined.
In STPM, you will only be dealing with linear, 2nd order recurrence relations, both
homogeneous and non-homogeneous.
Now that you know what a recurrence relation is, I will guide you with some basic modelling.
You need to learn how to use recurrence relations in a given situation, or question. Let me start
with 2 very famous examples, the Fibonacci Numbers and the Tower of Hanoi.
RABBITS, AND THE FIBONACCI NUMBERS
Leonardo Pisano, also known as Fibonacci, came up with this problem in the 13th century.
Suppose a young pair of rabbits (one male and one female) is placed on an island. A pair of
rabbits does not breed until they are 2 months old. After they are 2 months old, each pair of
rabbits produces another pair each month. He wanted to find a recurrence relation for the number
of pairs of rabbits on the island after n months, assuming that no rabbits ever die.
Let’s try counting. In the beginning, there were only 2 rabbits. In the first month, there
are still 2 rabbits on the island, because they are still not old enough to breed. But in the second
month, the pair of rabbits starts to breed, and they produce another 2 rabbits on the island,
making it 4 rabbits. In the third month, there will be 6, because the old rabbits reproduce, but not
the young rabbits. Counting by pairs, we find that the rabbits grow according to the sequence
1, 1, 2, 3, 5, 8, 13, … and so on. Take a look at the bunny diagram below.
Now, here is the hard part. To solve this problem, you know that there are 2 initial conditions, $a_0$
and $a_1$, which are both 1 ($a_0$ is the start, which I will call month 0, and $a_1$ is for the first
month). As we step into month 2, the number of pairs of rabbits will be the number of pairs of
rabbits in the previous month (month 1) plus the newborn pairs (one for every pair that was
already there in month 0). The process goes on, and every time we reach a new
month, we add up the number of pairs of rabbits in the previous month and the number of
pairs of rabbits in the month before the previous month. So in the end, we come up with the
famous Fibonacci Sequence, which is represented by the recurrence relation
$f_n = f_{n-1} + f_{n-2}$
I bet you got lost somewhere, but this is the best explanation I could come up with. You can try
reading the textbooks, and you might not even understand it at all. We see that the Fibonacci
sequence is a 2nd order homogeneous linear recurrence relation. This chapter really needs you to
think a lot.
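If the counting argument is hard to follow, letting a computer do the adding may help. This is a minimal Python sketch of the relation above with both initial terms equal to 1; the function name `rabbit_pairs` is just something I made up.

```python
def rabbit_pairs(months):
    """Number of rabbit pairs for months 0, 1, ..., months, with f_0 = f_1 = 1."""
    pairs = [1, 1]
    for n in range(2, months + 1):
        # New total = pairs last month + pairs born to those alive two months ago.
        pairs.append(pairs[n - 1] + pairs[n - 2])
    return pairs[:months + 1]

print(rabbit_pairs(6))  # → [1, 1, 2, 3, 5, 8, 13]
```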
Do you know that Fibonacci numbers also exist in sunflower patterns, pinecones, and spiral
seashells? Get to know more about Fibonacci Numbers in Nature.
THE TOWER OF HANOI
Have you played this game before?
You are given a stack of disks of different sizes on the left pole. Your objective is to transfer all the
disks from the left pole to the right pole, moving only one disk at a time, and never stacking a
bigger disk onto a smaller disk. At every move, only one disk can be in your hand, and the disk
can be placed on any of the 3 poles.
Take 5 textbooks of different sizes to represent the disks, and play this game with your classmates in school. I did that last
time…
Interesting? A myth created to accompany the puzzle tells of a tower in Hanoi where monks are
transferring 64 gold disks from one peg to another, according to the rules of the puzzle. The
myth says that the world will end when they finish the puzzle. Detailed calculations show that if
they move one disk per second, it will take them more than 500 billion years to complete!
Anyway, enough of fun stuff. Our goal here is to find a recurrence relation for the minimum
number of moves required to move n disks from the left to the right.
Let’s start from scratch. If there was one disk, you only need one move to solve the problem. If
there were 2 disks, you need to move the top disk to the middle peg, transfer the bottom disk to the
3rd peg, and then move the top disk back on top of the bottom disk on peg 3. So if we have n disks,
we can see that we need to move the top n – 1 disks to the middle peg, move the bottom disk to the right,
and then move the n – 1 disks to the last peg, on top of the bottom disk. The bottom disk only
requires one move, but you need to transfer the n – 1 disks twice: once to the middle
peg, then once more to the 3rd peg. So here, we can deduce that the recurrence relation can be
represented by
$H_n = 2H_{n-1} + 1$
where $H_n$ represents the minimum number of moves required to transfer n disks from the left to
the right pole. The initial condition is $H_1 = 1$ move.
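Here is a small Python sketch that iterates the Tower of Hanoi relation, starting from the fact that one disk needs one move, and then estimates how long 64 disks would take at one move per second. The function name `hanoi_moves` and the 365.25-day year are my own choices for this illustration.

```python
def hanoi_moves(n):
    """Minimum moves for n >= 1 disks, by iterating H_n = 2 * H_(n-1) + 1."""
    moves = 1  # one disk needs exactly one move
    for _ in range(n - 1):
        moves = 2 * moves + 1
    return moves

first_few = [hanoi_moves(n) for n in range(1, 6)]
print(first_few)  # → [1, 3, 7, 15, 31]

seconds_per_year = 365.25 * 24 * 3600
years_for_64 = hanoi_moves(64) / seconds_per_year
print(years_for_64)  # roughly 5.8e11 years
```

The first few values suggest the pattern "one less than a power of 2", which is what solving the relation properly should also give.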
I suppose you are terribly confused by now. These are only 2 examples! The hard part of this
chapter is to model recurrence relations. The solving part (which will be dealt with in sections 4.2 & 4.3) is
actually much easier. Spend more time thinking and try to figure out some of my examples
below.
1. A pond starts with $a_0$ fish, and the number of fish doubles every month. So after n months, the number of fish
can be represented by the relation $a_n = 2a_{n-1}$.
2. In the first month, you date 1 girl, the second month 2 girls, and in the nth month you date n
girls. So the recurrence relation $a_n = n + a_{n-1}$ gives the total number of girls you have dated in
the first n months. How nasty of you…
3. You have a loan of RM $a_0$ from Along Bukit Beruntung. You now pay RM100 every month to
him, and he charges you interest of 10% every month. So the balance you owe the loan
shark in the nth month can be represented by the relation $a_n = (1 + 0.1)a_{n-1} - 100$.
4. The cash deposit machine in CIMB bank only accepts RM1 coins (if they exist), RM1 notes
and RM5 notes. If the order of the deposits matters, the number of ways you can deposit RM n into
the machine can be represented by the relation $a_n = 2a_{n-1} + a_{n-5}$. [5th order recurrence relation!]
5. If you can climb up a flight of stairs by taking either one step or two steps at a time, the
recurrence relation for the number of ways to climb n stairs can be represented by the equation
$a_n = a_{n-1} + a_{n-2}$.
6. You are laying tiles on a walkway in a single line. You can only lay red, green or blue
tiles, such that no 2 red tiles are adjacent to each other, and tiles of the same color are
considered indistinguishable. The recurrence relation for the number of ways to lay out a
walkway with n tiles is $a_n = 2a_{n-1} + 2a_{n-2}$. [Go think about it. This is hard…]
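When a model like the tile example feels suspicious, you can test it by brute force for small n. The Python sketch below lists every possible row of red, green and blue tiles, throws away the rows with 2 adjacent reds, and compares the counts with the recurrence. It is only a check for small n, not a proof, and `count_walkways` is a name I made up.

```python
from itertools import product

def count_walkways(n):
    """Count rows of n tiles (R, G, B) that contain no two adjacent red tiles."""
    return sum(1 for tiles in product("RGB", repeat=n) if "RR" not in "".join(tiles))

counts = [count_walkways(n) for n in range(8)]
print(counts[:4])  # → [1, 3, 8, 22]

# Compare with a_n = 2a_(n-1) + 2a_(n-2) for n = 2, ..., 7.
recurrence_holds = all(counts[n] == 2 * counts[n - 1] + 2 * counts[n - 2]
                       for n in range(2, 8))
print(recurrence_holds)
```

The idea behind the relation: a valid row either ends in a green or blue tile (2 choices after any valid row of n – 1 tiles), or ends in a red tile that must follow a green or blue one (2 choices after any valid row of n – 2 tiles).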
4.2 Homogeneous Linear Recurrence Relations (2nd order, constant coefficients)
Recall that you learnt in the previous section how to model a situation using recurrence relations.
The equations are helpful, but they don’t really help much if you are searching for a term far down
the sequence. For example, with the relation $a_n = 2a_{n-1}$ and the initial condition $a_0 = 1$, finding the term $a_{109}$
will be tiring, as it will take you forever to get there. When we say that we solve a recurrence
relation, it means that we are trying to convert the relation into an equation in terms of n instead
of earlier terms like $a_{n-1}$, which obviously makes it easier for you to calculate the nth term.
In this section, I’ll be showing you how to solve 2nd order homogeneous linear recurrence
relations. The non-homogeneous part follows in the next section.
2 DISTINCT ROOTS
Given a recurrence relation $a_n = 5a_{n-1} - 6a_{n-2}$, with initial conditions $a_0 = 1$, $a_1 = 0$. To start off
with, we let $a_n = r^n$. This is a smart guess which we will eventually find to be correct. We can
then further deduce that $a_{n-1} = r^{n-1}$ and $a_{n-2} = r^{n-2}$. Substituting everything back into the equation,
we have
$r^n = 5r^{n-1} - 6r^{n-2}$
dividing the equation by $r^{n-2}$ (which is the smallest power), we get
$r^2 = 5r - 6$
$r^2 - 5r + 6 = 0$
which is a quadratic equation! This equation is called the characteristic equation, and r is
called the characteristic root. Solving the equation, we get r = 2, 3. Again, using a smart guess,
we deduce that the term $a_n$ can be represented by the equation
$a_n = c_1 2^n + c_2 3^n$
So you notice that the $2^n$ and $3^n$ must have come from the characteristic roots earlier on. This is
the general solution of the recurrence relation. The terms $c_1$ and $c_2$ are just 2 constants, which
we will find by using the initial conditions.
When $a_0 = 1$,
$a_0 = c_1 + c_2 = 1$   (1)
When $a_1 = 0$,
$a_1 = 2c_1 + 3c_2 = 0$
$2c_1 = -3c_2$   (2)
Now you have 2 simultaneous equations. Using the calculator, you can easily find that $c_1 = 3$, $c_2 = -2$.
Substituting the constants back into the equation, you get
$a_n = 3(2^n) - 2(3^n)$
which is what we call the particular solution. This is the final answer that we are looking
for. Now if you substitute n = 109, you can get the answer for $a_n$ straight away! Now that you
have found the answer, try finding the first 5 or 6 terms, using both the recurrence relation
$a_n = 5a_{n-1} - 6a_{n-2}$ and the equation $a_n = 3(2^n) - 2(3^n)$. Do they contradict one another? Congratulations, you
just learnt how to solve homogeneous recurrence relations!
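The comparison suggested above is easy to automate. This Python sketch builds the first few terms from the recurrence relation and compares them with the formula; the function names are made up for this illustration.

```python
def by_recurrence(n_terms):
    """First n_terms terms of a_n = 5a_(n-1) - 6a_(n-2), with a_0 = 1 and a_1 = 0."""
    terms = [1, 0]
    while len(terms) < n_terms:
        terms.append(5 * terms[-1] - 6 * terms[-2])
    return terms[:n_terms]

def closed_form(n):
    """The solved form a_n = 3(2^n) - 2(3^n)."""
    return 3 * 2 ** n - 2 * 3 ** n

from_relation = by_recurrence(8)
from_formula = [closed_form(n) for n in range(8)]
print(from_relation[:4])  # → [1, 0, -6, -30]
print(from_relation == from_formula)
```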
2 EQUAL ROOTS
However, the above method is only true for 2 distinct roots in the characteristic equation. Take
another example, $a_n = -4a_{n-1} - 4a_{n-2}$, $a_0 = 0$, $a_1 = 1$. You get a characteristic equation $r^2 + 4r + 4 = 0$,
so $r = -2$. If you take the general solution as $a_n = c_1(-2)^n$, then you are totally wrong. The
correct answer should be $a_n = c_1(-2)^n + nc_2(-2)^n$. Notice the extra multiplied n in the second term.
To summarize:
1. If the characteristic roots $r_1$ and $r_2$ are distinct, represent them as $a_n = c_1r_1^n + c_2r_2^n$.
2. If the characteristic roots are equal (both equal to r), represent them as $a_n = c_1r^n + nc_2r^n$.
Distinct roots could be either real or complex. The method for both is the same.
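To see the equal-roots form in action, here is a Python sketch for the example with r = –2, a₀ = 0 and a₁ = 1. The constants are found from the 2 initial conditions (a₀ = c₁ and a₁ = c₁r + c₂r), which is my own working rather than something stated above, and `solve_equal_root` is a made-up helper name.

```python
from fractions import Fraction

def solve_equal_root(r, a0, a1):
    """Find c1, c2 in a_n = c1 * r^n + n * c2 * r^n from a_0 and a_1 (r non-zero)."""
    c1 = Fraction(a0)            # n = 0 gives a_0 = c1
    c2 = Fraction(a1, r) - c1    # n = 1 gives a_1 = (c1 + c2) * r
    return c1, c2

r = -2
c1, c2 = solve_equal_root(r, a0=0, a1=1)

def closed_form(n):
    return (c1 + n * c2) * r ** n

# Terms from the relation a_n = -4a_(n-1) - 4a_(n-2), for comparison.
terms = [0, 1]
while len(terms) < 8:
    terms.append(-4 * terms[-1] - 4 * terms[-2])

print(c1, c2)
print(terms == [closed_form(n) for n in range(8)])
```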
4.3 Non-homogeneous Linear Recurrence Relations (2nd order, constant coefficients)
Consider the following non-homogeneous linear recurrence relation:
$a_n = \underbrace{a_{n-1} + a_{n-2}}_{(1)} + \underbrace{3^n + n3^n + n^2 + n + 3}_{(2)}$
Part (1) is the homogeneous part of the recurrence relation, which we now call the
associated linear homogeneous recurrence relation. Part (2) is what interests us in this section: it
is the non-homogeneous part. Solving this kind of question is simple: you just need to solve
the associated recurrence relation (just like how you did in the previous section), then deal with the
non-homogeneous part to find its particular solution. These two parts are solved separately,
and we combine the results together in the end.
Example 1 (terms of the form $k^n$):
$a_n = 3a_{n-1} + 2^n$
We first proceed to solve the associated linear recurrence relation (a.l.r.r.), which is
$a_n = 3a_{n-1}$
The characteristic equation gives us r = 3, and therefore
$a_n = c_1(3^n)$
Now that the associated part is solved, we proceed to solve the non-homogeneous part. Using a
smart guess, we let
$a_n = c_2 2^n$
From here, we then deduce that $a_{n-1} = c_2 2^{n-1}$. Putting these 2 equations back into the initial
recurrence relation $a_n = 3a_{n-1} + 2^n$, we have
$c_2 2^n = 3c_2 2^{n-1} + 2^n$
$(c_2 - 1)2^n = 3c_2 2^{n-1}$
$2(c_2 - 1) = 3c_2$
And so we have $c_2 = -2$, which then gives us $a_n = -2(2^n) = -2^{n+1}$. Combining both the answers
for the associated and non-homogeneous parts, we have our general solution
$a_n = c_1(3^n) - 2^{n+1}$
If we were given the initial condition $a_0 = 2$, then our particular solution will be
$a_n = 4(3^n) - 2^{n+1}$
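As with the homogeneous case, you can compare the solved formula against the relation itself. A minimal Python sketch for this example, with a₀ = 2 (function names are my own):

```python
def by_recurrence(n_terms, a0=2):
    """First n_terms terms of a_n = 3a_(n-1) + 2^n, starting from a_0."""
    terms = [a0]
    for n in range(1, n_terms):
        terms.append(3 * terms[-1] + 2 ** n)
    return terms

def closed_form(n):
    """The solved form a_n = 4(3^n) - 2^(n+1)."""
    return 4 * 3 ** n - 2 ** (n + 1)

from_relation = by_recurrence(8)
print(from_relation[:4])  # → [2, 8, 28, 92]
print(from_relation == [closed_form(n) for n in range(8)])
```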
This is the general rule that we follow: for any number of terms of the form $k^n$, we
let $a_n$ be $k^n$ multiplied by a constant. So if the non-homogeneous part is $5^n + 78^n$, then we let
the answer be $a_n = c_1 5^n + c_2 78^n$, in which $c_1$ and $c_2$ are constants to be found. The same goes for
the form $nk^n$, in which you let $a_n = (c_1n + c_2)k^n$ (the extra constant $c_2$ is needed in general). However, there is an exception, when the root r is
equal to k, so that the a.l.r.r. solution has the same form as $k^n$. For example,
$a_n = 2a_{n-1} + 2^n$
You get r = 2, so the a.l.r.r. has the solution $a_n = c_1 2^n$, which has the same form as the non-homogeneous part! In this case, you need to multiply your guess for the non-homogeneous part by n. Which
means, you let
$a_n = nc_1 2^n$ and $a_{n-1} = (n - 1)c_1 2^{n-1}$
And using the same method, you put it back into the initial equation,
$nc_1 2^n = 2(n - 1)c_1 2^{n-1} + 2^n$
and you find $c_1$ from here (it works out to $c_1 = 1$).
Similarly, if
a_n = 2a_{n-1} + 3(2^n) + 5n(2^n)
you let your non-homogeneous part be
a_n = c_1 n(2^n) + c_2 n^2(2^n)
and if
a_n = 4a_{n-1} – 4a_{n-2} + 3(2^n) + 5n(2^n)
which has a double root r = 2, then you will have a non-homogeneous part of
a_n = c_1 n^2(2^n) + c_2 n^3(2^n)
As long as the k^n or nk^n term is already found in the a.l.r.r. solution once, multiply all the
terms by n, and multiply by n^2 if it is found twice. If you are curious why this is so, try
solving without following this rule. You will find that you can’t get the correct answer.
Example 2 (polynomial terms, n^2 + n + c etc.)
a_n = 3a_{n-1} + n^2 + 5n + 3
The a.l.r.r. solution is the same, a_n = c_1(3^n). But for the non-homogeneous part, we let
a_n = c_2 n^2 + c_3 n + c_4   (1)
a_{n-1} = c_2(n – 1)^2 + c_3(n – 1) + c_4   (2)
I think you might have got the pattern by now. Note that if the equation was
a_n = 3a_{n-1} + n^2 + 3,
a_n = 3a_{n-1} + n^2 + 5n or
a_n = 3a_{n-1} + n^2
we still need to use the above, a_n = c_2 n^2 + c_3 n + c_4. This is because we need to account for the
possibly missing terms which might arise in the particular solution.
So, just like example 1, substitute both equations (1) and (2) back into the initial recurrence
relation, then find c_2 to c_4, and combine with the a.l.r.r. solution to find c_1 from the given initial
condition, say a_0 = 1. However, there is also an exception for this case, which is when one or
more of the characteristic roots is r = 1. For example,
a_n = 2a_{n-1} – a_{n-2} + n^2 + 5n + 3
You obtain a double root r = 1 for your a.l.r.r. Since 1^n = 1, the a.l.r.r. solution will be of the form
a_n = c_1 + nc_2
which will clash with your trial solution for the non-homogeneous part if you use the same form
as above, a_n = c_2 n^2 + c_3 n + c_4. Instead, you should use a_n = c_2 n^4 + c_3 n^3 + c_4 n^2, which is
the same form multiplied by n^2. Similarly, if it were a first order recurrence relation with one root r = 1,
then you multiply by n, and if it were a third order recurrence relation with a triple root
r = 1, then you multiply by n^3 (notice the similarity with example 1). Again, you can try doing
it without following the rules, which will result in you not getting the required answer.
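Example 2 was left for you to finish, so here is a sketch of how you could check your own working for a_n = 3a_{n-1} + n^2 + 5n + 3 with a_0 = 1. The values of A, B and C below come from my own comparison of coefficients (they are not given in the notes), and exact fractions are used so that the comparison is not spoiled by rounding.

```python
from fractions import Fraction

# Coefficients of the trial solution p(n) = A*n**2 + B*n + C, obtained by
# substituting into a_n = 3*a_{n-1} + n**2 + 5*n + 3 and comparing powers of n.
# (Worked out for this sketch, not quoted from the notes.)
A = Fraction(-1, 2)
B = Fraction(-4)
C = Fraction(-27, 4)


def particular(n):
    return A * n * n + B * n + C


# Assumed initial condition a_0 = 1, so c1 + C = 1.
c1 = 1 - C


def closed_form(n):
    return c1 * 3 ** n + particular(n)


def by_recurrence(n_max, a0=1):
    terms = [a0]
    for n in range(1, n_max + 1):
        terms.append(3 * terms[-1] + n * n + 5 * n + 3)
    return terms


print(c1)
print(all(closed_form(n) == a for n, a in enumerate(by_recurrence(12))))
```

If your own c_2, c_3 and c_4 differ from A, B and C here, substitute them in and see whether the last line still prints True.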
5. Functions
5.1 Inverse Trigonometric Functions (graphs, identities)
This chapter will have fewer words, but more formulas. What you need to do in this chapter is:
1. memorize the useful graphs, identities and formulas.
2. spend your time trying to derive all the identities.
With these 2 points done, you are sure to score in this chapter. STPM questions will be about
proving them, sketching graphs, or differentiating and integrating them (which will be covered in
the next chapter).
You have learnt about trigonometric functions throughout your secondary school years. Now,
we let sin y = x. An inverse trigonometric function reverses the trigonometric function, and is
denoted as y = sin^{-1} x.
Note that there is a difference between sin^{-1} x and (sin x)^{-1}. This is only one of the 6 inverse
trigonometric functions; the rest of them are cos^{-1} x, tan^{-1} x, sec^{-1} x, csc^{-1} x, and cot^{-1} x.
Following are the graphs of the 6 inverse trigonometric functions:
The domain and the range of the functions are as follows:
Now that you know the details about these 3 inverse trigonometric functions, it’ll be formulas and
identities. Try to remember as many as you can. In fact, make sure you know how to derive
every single one of them.
Prove the first one by letting x = cos y, the rest follows.
Inverse-Forward Identities
Forward-Inverse Identities
Proving this one is not hard either. Let x = cos y, and make use of the identity cos^2 y + sin^2 y = 1.
The rest follow too. The tan(cos^{-1} x) one will probably be harder. Give it a try.
Inverse Sum Identities
Prove the first one by letting x = cos(π/2 – y) = sin y. Try figuring out the rest yourself.
sin^{-1}(–x) = –sin^{-1} x
csc^{-1}(–x) = –csc^{-1} x
cos^{-1}(–x) = π – cos^{-1} x
sec^{-1}(–x) = π – sec^{-1} x
tan^{-1}(–x) = –tan^{-1} x
cot^{-1}(–x) = –cot^{-1} x
This one is proven by letting sin y = x, so that sin(–y) = –x. The rest follow.
I don’t think this one will come out in exams. However, the proof requires you to learn the
inverse hyperbolic functions in the next section first.
I’ll leave this proof to you to try.
This one is the hardest to prove. Try proving it using the formula
You probably don’t even know that this formula exists.
5.2 Hyperbolic Functions (graphs, identities, Osborn’s rule)
The hyperbolic functions, of which there are six, are so named because they are related to the
parametric equations for a hyperbola.
The 2 main hyperbolic functions are sinh x and cosh x (and so now you know what the ‘hyp’
button on your calculator is for). The hyperbolic functions are actually functions of the natural
exponential e^x through the following equations:
sinh x = (e^x – e^{-x})/2
cosh x = (e^x + e^{-x})/2
We now relate the hyperbolic functions with the hyperbola. The equation for the hyperbola is
x^2/a^2 – y^2/b^2 = 1
We let
x = a cosh u
y = b sinh u
Substituting, we find that cosh^2 u – sinh^2 u = 1, which is true (this can be proven by substituting the e^x forms into
the equation). Now that we have 2 hyperbolic functions, we use them to derive a few other
functions, following a convention similar to the one the trigonometric functions use:
tanh x = sinh x / cosh x
sech x = 1 / cosh x
csch x = 1 / sinh x
coth x = cosh x / sinh x
All these 6 hyperbolic functions have their special pronunciation. sinh is read as ‘shine’, cosh as
‘cosh’, tanh as ‘than’, sech as ‘sheck’, csch as ‘co-sheck’ and coth as ‘cough’.
Now we shall see the graphs of the 6 hyperbolic functions. Note that they are all derived from
the exponential function:
[Graphs of cosh x, sinh x, tanh x, sech x, csch x and coth x]
Their domains and ranges are as follows:
Now that you know the basic information of these functions, it’s time to memorize formulas. But
before you start, I need to introduce a special rule which makes the memorizing easier.
Osborn’s rule states that to change a standard trigonometric identity into the
equivalent hyperbolic identity, you change the sign of any term which is the product of two
sines, and substitute the corresponding hyperbolic functions. This means that if you remember all
the trigonometric identities, you can recover the hyperbolic identities. Please note that the
trigonometric formulas which depend on periodicity (for example, the R formula and
the phase shifts) do not apply to hyperbolic functions, as they are not periodic.
You should be able to derive each identity. Proving them is simple: just plug in the e^x
definitions and you are sure to get it.
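To see Osborn's rule in action, you can test a few identities numerically using nothing but the exponential definitions. The sketch below is only a sanity check, not a proof; the three identities chosen and the helper names are my own picks for illustration.

```python
import math


def sinh(x):
    # Exponential definition of sinh.
    return (math.exp(x) - math.exp(-x)) / 2


def cosh(x):
    # Exponential definition of cosh.
    return (math.exp(x) + math.exp(-x)) / 2


def check_identities(x):
    """Return the residuals (ideally zero) of three hyperbolic identities at x."""
    # cos^2 + sin^2 = 1        ->  cosh^2 - sinh^2 = 1   (product of two sines: sign flips)
    pythagorean = cosh(x) ** 2 - sinh(x) ** 2 - 1
    # cos 2x = cos^2 - sin^2   ->  cosh 2x = cosh^2 + sinh^2
    double_cosh = cosh(2 * x) - (cosh(x) ** 2 + sinh(x) ** 2)
    # sin 2x = 2 sin cos       ->  sinh 2x = 2 sinh cosh  (only one sine: no sign change)
    double_sinh = sinh(2 * x) - 2 * sinh(x) * cosh(x)
    return pythagorean, double_cosh, double_sinh


for x in (-1.5, 0.0, 0.3, 2.0):
    print(x, check_identities(x))
```

Each residual should come out as zero up to floating-point rounding.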
The formulas and identities are as follows:
Double-Angle Formula
Besides all these formulas, you should also know the relations between hyperbolic functions and
trigonometric functions. Use the following to derive those for tanh x, sech x, csch x and coth x
too. Bear in mind that i × i = –1.
5.3 Inverse Hyperbolic Functions (graphs, identities, logarithmic form)
Inverse hyperbolic functions are obtained in the same way as the inverse trigonometric
functions. I don’t think I need to explain much, so I’ll show you the graphs straight away:
[Graphs of cosh^{-1} x, sinh^{-1} x, tanh^{-1} x, sech^{-1} x, csch^{-1} x and coth^{-1} x]
Note that due to the definition of a function, we only take the non-negative y values of
cosh^{-1} x and sech^{-1} x. The domains and ranges are as follows:
There are not many formulas and identities for this section. But there is one very important thing
that you are supposed to learn how to prove, which is the logarithmic form of the inverse hyperbolic
functions.
I’ll show you the proof for sinh^{-1} x:
Please promise me that you will learn how to prove the rest; this is super important.
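For reference, the usual argument for sinh^{-1} x can be sketched as follows, starting from the exponential definition of sinh. This is the standard textbook route, offered as a sketch and not necessarily the exact working that was originally shown here.

```latex
\begin{align*}
y = \sinh^{-1} x \;&\Longrightarrow\; x = \sinh y = \frac{e^{y} - e^{-y}}{2} \\
&\Longrightarrow\; e^{2y} - 2x\,e^{y} - 1 = 0 \\
&\Longrightarrow\; e^{y} = x \pm \sqrt{x^{2} + 1}.
\end{align*}
Since $e^{y} > 0$ and $x - \sqrt{x^{2}+1} < 0$, only the positive root is allowed, so
\[
\sinh^{-1} x = \ln\!\left(x + \sqrt{x^{2} + 1}\right).
\]
```

The same idea (write the function in exponentials, then solve a quadratic in e^y) works for the other inverse hyperbolic functions.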
Here are some identities to remember. Note that they are quite similar to the inverse
trigonometric ones:
Please try to prove all of the above identities. Refer to the section on inverse
trigonometric functions for some hints on the proofs.
BAB 6
6. Differentiation & Integration
6.1 Differentiability of a Function (continuity)
In Maths T, you already learnt how to prove whether a function is continuous. Now you need to
know the relationship between continuity and differentiability.
A differentiable function has to be continuous, but a continuous function
need not be differentiable. Using logical propositions, this means that if f(x) is differentiable, then it is
continuous, but not conversely. Normally, non-differentiability occurs at
1. a corner
2. a vertical tangent line
3. a discontinuity
4. an end point
For piece-wise defined functions, it is easy to see whether a function is differentiable at the
joints. If the sub-functions have different gradients at a joint, then the function is definitely not
differentiable there. However, there should be a formal definition for differentiability. For a number a
in the domain of the function f, we say that f is differentiable at a, or that the derivative of f
exists at a, if
lim_{h→0} [f(a + h) – f(a)] / h
or
lim_{x→a} [f(x) – f(a)] / (x – a)
exists.
You can go on to prove that both formulas are actually the same thing (put x = a + h). Of course,
differentiability is not restricted to single points. We could also say that a function is
differentiable on an interval (a, b) or differentiable everywhere, on (–∞, +∞). I’ll give you one
example:
Prove that f(x) = |x| is not differentiable at x=0.
So, f(x) = |x| is not differentiable at x = 0. [proven]
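For reference, the one-sided limits behind this conclusion can be written out as follows, using the limit definition of the derivative with a = 0. This is a sketch of the standard argument.

```latex
\[
\lim_{h \to 0^{+}} \frac{|0 + h| - |0|}{h} = \lim_{h \to 0^{+}} \frac{h}{h} = 1,
\qquad
\lim_{h \to 0^{-}} \frac{|0 + h| - |0|}{h} = \lim_{h \to 0^{-}} \frac{-h}{h} = -1.
\]
The two one-sided limits differ, so $\displaystyle\lim_{h \to 0} \frac{f(0+h) - f(0)}{h}$ does not exist.
```

This is exactly the "corner" case from the list above: the graph of |x| is continuous at 0, but the gradient jumps from –1 to 1.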
6.2 Derivatives of a Function Defined Implicitly or Parametrically (2nd derivatives)
You have probably learnt how to differentiate functions defined implicitly and
parametrically, but only up to the first order. Here, we will learn how to continue on to the
2nd order. It is actually very easy and straightforward, so there is nothing too difficult in this
section.
IMPLICITLY
I don’t think I need to tell you how to do it. Differentiating a function implicitly to the 2nd order is
just the same as for the 1st order. I’ll show you an example:
Find the 2nd order derivative of the function x^2 + y^2 = 2.
Note the use of the product rule in this question. Just do more exercises, and you will get used
to this kind of question.
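For the circle x^2 + y^2 = 2 in the example, the working can be sketched as follows. This is my own write-up of the standard method, with the final simplification using the original equation.

```latex
\begin{align*}
x^{2} + y^{2} = 2
&\;\Longrightarrow\; 2x + 2y\frac{dy}{dx} = 0
\;\Longrightarrow\; \frac{dy}{dx} = -\frac{x}{y}, \\
\text{differentiating again:}\quad
& 2 + 2\left(\frac{dy}{dx}\right)^{2} + 2y\frac{d^{2}y}{dx^{2}} = 0
\qquad\text{(product rule on } 2y\tfrac{dy}{dx}\text{)} \\
&\;\Longrightarrow\; \frac{d^{2}y}{dx^{2}}
= -\frac{1 + (dy/dx)^{2}}{y}
= -\frac{x^{2} + y^{2}}{y^{3}}
= -\frac{2}{y^{3}}.
\end{align*}
```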
PARAMETRICALLY
There is probably something new in this section. Again, I’ll show you an example:
Consider the parametric equations x = t + 1 and y = t^3.
Differentiating each with respect to t gives
dx/dt = 1 and dy/dt = 3t^2, so dy/dx = (dy/dt) / (dx/dt) = 3t^2
To find d^2y/dx^2, we need
d^2y/dx^2 = d/dx (3t^2)
But we cannot differentiate 3t^2 with respect to x directly. Therefore, using the chain rule,
d^2y/dx^2 = d/dt (3t^2) × dt/dx = 6t × 1 = 6t
To summarize, the 2nd order derivative for parametric equations x and y is found from the
equation:
d^2y/dx^2 = [d/dt (dy/dx)] / (dx/dt)
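If you want to check an answer of this type without redoing the algebra, a rough numerical estimate is enough. The sketch below approximates every derivative with central differences and then applies the rule "differentiate dy/dx with respect to t, then divide by dx/dt"; the function name and the step size h are arbitrary choices for this illustration.

```python
def second_derivative_parametric(x, y, t, h=1e-4):
    """Estimate d2y/dx2 at parameter t via [d/dt (dy/dx)] / (dx/dt)."""

    def d(f, s):
        # Central difference approximation of df/dt at s.
        return (f(s + h) - f(s - h)) / (2 * h)

    def dy_dx(s):
        return d(y, s) / d(x, s)

    return d(dy_dx, t) / d(x, t)


# The example from the text: x = t + 1, y = t^3, where d2y/dx2 = 6t.
estimate = second_derivative_parametric(lambda t: t + 1, lambda t: t ** 3, 2.0)
print(estimate, 6 * 2.0)
```

The estimate is only approximate, so compare it with your exact answer to a few decimal places rather than expecting perfect agreement.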
6.3 Derivatives & Integrals of Trigonometric & Inverse Trigonometric Functions
The derivatives and integrals of trigonometric functions are covered in Maths T. So in this
section, I’ll only teach you how to differentiate inverse trigonometric functions. A warning
here: you must study the chapter on Integration (especially the part on integration by parts)
in Maths T before you come to this section, or you will get really confused.
To find the derivative of sin^{-1} x, we need to make use of our knowledge of differentiating a
function implicitly. We let y = sin^{-1} x, so that x = sin y. Differentiating implicitly with respect to x, we have
1 = cos y × dy/dx
So as a result, using cos y = √(1 – sin^2 y) = √(1 – x^2), we get
d/dx (sin^{-1} x) = 1/√(1 – x^2)
From here, you can deduce that the derivatives of the other inverse trigonometric
functions follow the same recipe, i.e., differentiate implicitly, then make
use of the trigonometric identities. The list of derivatives of all the inverse trigonometric
functions is as follows:
where a is a constant. You should try to prove each and every one of them as an exercise.
You should also try differentiating these functions with more complicated arguments, using all the
differentiation rules you learnt. For example,
while
Take note that once you differentiate an inverse trigonometric function, it becomes an algebraic
fraction with no inverse trigonometric function left in it. Do not worry about the anti-derivatives of these fractions now,
as I will give you a summary table in the section on Reduction Formulae.
However, I want to discuss the anti-derivative of the inverse trigonometric function itself. For
example, I want to find
∫ sin^{-1} x dx
To do this, you need to make use of integration by parts. If you followed the formula in the
Maths T formula sheet, it would be
∫ u (dv/dx) dx = uv – ∫ v (du/dx) dx
However, I suggest that you use this form, which is easier to remember:
∫ uv dx = u ∫ v dx – ∫ [ (du/dx) ∫ v dx ] dx
Before I continue, let me explain this formula. Normally, you only use integration by parts when
you are trying to integrate a product of 2 functions, which are most likely logarithmic,
exponential, polynomial or trigonometric functions. So in any case, you let one function be
u, and the other function be v. Notice that v has to be a function that is easy to integrate, while u
has to be the other one which is hard to integrate / easy to differentiate. In words, this formula
can be read as
“Integration of u × v = [ u × integrate v ] – integration of (differentiate u × integrate v)”
Never mind if you don’t get it, as long as you have your own version of integration by parts. So continuing with
integrating sin^{-1} x, we let u = sin^{-1} x, and v = 1. We have
∫ sin^{-1} x dx = x sin^{-1} x – ∫ x/√(1 – x^2) dx = x sin^{-1} x + √(1 – x^2) + C
Get it? So the important tip for this question is to put v = 1 (you might recall that this is the
method you use to integrate ln x). The rest of the functions, after integration, give
Try to derive all of them as an exercise. Note that the term ln [x + √(x^2 – 1)] is actually the logarithmic form of
cosh^{-1} x.
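A result from integration by parts is easy to get wrong by a sign, so it is worth checking against a numerical integral. The sketch below does this for the standard result ∫ sin^{-1} x dx = x sin^{-1} x + √(1 – x^2) + C on the interval [0, 0.5]; the interval and the use of Simpson's rule are my own choices for illustration.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def antiderivative(x):
    # Integration by parts with u = sin^-1 x and v = 1.
    return x * math.asin(x) + math.sqrt(1 - x * x)


numeric = simpson(math.asin, 0.0, 0.5)
exact = antiderivative(0.5) - antiderivative(0.0)
print(numeric, exact)
```

You can reuse the same check for the other inverse trigonometric integrals by swapping in the function and your own antiderivative.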
6.4 Derivatives & Integrals of Hyperbolic & Inverse Hyperbolic Functions
The derivatives and integrals of hyperbolic functions and inverse hyperbolic functions are
very similar to those of trigonometric and inverse trigonometric functions, differing only by
a negative sign somewhere within the formulas. There is no simple rule that tells us
where the minus sign changes, so this section requires a lot of memory work.
HYPERBOLIC FUNCTIONS
The derivatives of hyperbolic functions can be derived easily by converting the functions into
their exponential form. I’ll leave it to you as an exercise to derive all of them. The list of
derivatives is as follows:
As you can see, the derivative of sinh x is cosh x, and vice versa, which differs from the
trigonometric case by a minus sign. The functions whose derivatives have minus signs are the
secondary hyperbolic functions, csch x, sech x and coth x.
The integrals, again, are very similar to the trigonometric ones.
The integrals for sech x and csch x may look a little weird. You should try to differentiate the
right hand side and see whether you get the expression on the left. Again, you should do some
homework to derive all of them.
INVERSE HYPERBOLIC
Again, the inverse hyperbolic functions have derivatives similar to those of the inverse trigonometric
functions, and it is just a matter of a minus sign, outside or within the square roots. Deriving them is
similar: differentiate implicitly and make use of the hyperbolic identities (do not confuse them with the
trigonometric ones. Remember Osborn’s rule). Here you go
The integrals, as usual, are harder to do. You need to use integration by parts, as I said in the
previous section. Try doing them the way you did in the previous section. As a matter of fact, the
huge ‘ln’ terms in the integrals of csch^{-1} x and sech^{-1} x are just logarithmic forms of cosh^{-1} x and
sinh^{-1} x.
6.5 Reduction Formulae
SUMMARY OF PREVIOUS SECTION
Before I start, let me just give you some results that combine the derivatives and integrals
of trigonometric, inverse trigonometric, hyperbolic and inverse hyperbolic functions. This will give
you a clearer picture of what you have learnt in the past 2 sections:
1. The Integrals of the Inverse Polynomials
Here I reorganize the tables of integrals for your reference:
As you can see, there is a pattern that you can easily memorize. The expression is of the form a^2 – x^2,
x^2 – a^2 or a^2 + x^2, with or without the square root. You also see that they are all quadratic
expressions, so you can use the method of completing the square to solve similar
cases. For example,
Also, make sure that the coefficient of x is always 1. Another example,
Notice that if you didn’t, you would have got a different answer.
2. Trigonometric & Hyperbolic Substitution
Integrals like
can’t be solved by the usual methods. You might have learnt one trigonometric substitution to solve
this kind of question in Maths T. But now that you have learnt hyperbolic functions, your
vocabulary of substitutions increases to 3. Whenever you face integrals of this kind,
you will:
3. Some extra tips on integration
These are just some short notes that I jotted down while I was studying this chapter a few years
ago. I thought I would share them with you all:
a.
This kind of integration makes use of the half angle formula. This applies to hyperbolics as well.
b.
From here, you do integration by parts, with t2 as u and the term in the bracket as v.
c.
Notice that it must be e^{2x}. Here you use the substitution e^x = sinh u. Similarly, if the term in the
square root was e^{2x} – 1 or 1 – e^{2x}, you substitute e^x as cosh u or sin u respectively. Try and see
whether it works.
d.
You might want to try proving this before you use it. This will be useful for the next section.
e.
I actually learnt this in university. You should commit this to memory; it might come in useful.
Alright, let’s get into the topic:
REDUCTION FORMULAE
A reduction formula is an expression of a definite integral in terms of n, relating the integral to
a similar form of itself. For example,
which can be represented as
Notice that firstly, it is a definite integral, which means that it has upper and lower limits. Then,
it relates to itself, but with a lower power. These formulae can be very helpful, especially
when you calculate high powers of these functions. So if you want to find
You can use the reduction formula to get
which is easily solvable.
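As a concrete illustration, take I_n = ∫ sin^n x dx between 0 and π/2, for which the reduction formula is I_n = ((n – 1)/n) I_{n-2}. The limits 0 and π/2 are an assumption made for this sketch. The code applies the formula repeatedly until it reaches I_0 = π/2 or I_1 = 1, then compares the answer with a plain numerical integral.

```python
import math
from fractions import Fraction


def sine_power_coefficient(n):
    """Return (c, base) with I_n = c * I_base, where base is 0 or 1.

    Applies I_n = (n - 1)/n * I_{n-2} repeatedly, for
    I_n = integral of sin(x)**n over [0, pi/2].
    """
    c = Fraction(1)
    while n >= 2:
        c *= Fraction(n - 1, n)
        n -= 2
    return c, n


def sine_power_integral(n):
    c, base = sine_power_coefficient(n)
    # I_0 = pi/2 and I_1 = 1 are the two base cases.
    return float(c) * (math.pi / 2 if base == 0 else 1.0)


def midpoint_rule(f, a, b, steps=20000):
    """Plain midpoint rule, used only as an independent numerical check."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


print(sine_power_coefficient(6))
print(sine_power_integral(6))
print(midpoint_rule(lambda x: math.sin(x) ** 6, 0.0, math.pi / 2))
```

The point is that a sixth power, which would be painful to integrate directly, reduces to three multiplications and one known base case.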
Solving is easy, but the harder part is the proof. It can be very complicated and tedious if
you are doing this for the first time. It is not easy to identify straight away how to integrate (as in
which is the ‘u’ and which is the ‘v’ if you’re using integration by parts), and sometimes you take
hours to solve just a simple question. I’ll show you the proof for the above example so you’ll
know what I mean. Using my famous colour coded integration by parts formula,
we have
Moving the sin^n x term from the right to the left, we get
Complicated? Unfortunately, most exam questions on Reduction Formulae are on proving
them. Since you need A LOT of exercises (seriously, I bold it because this is no joke), I’ll give
you some examples to prove.
Not enough? There’s more:
and more…
Not hard enough? Try 2 variables then:
I hope you haven’t started to freak out yet. I seriously haven’t tried proving all these Reduction
Formulae, so if you have done so, I salute you. I can give you some tips here though:
1. Break down cos^n x = cos x cos^{n-1} x and tan^n x = tan^2 x tan^{n-2} x.
2. Try checking out the expressions on the right. When there’s an n – 1, you know that the term
with the power of n needs to be differentiated once, and with n – 2, it will be differentiated twice. m + 1
means that term will be integrated.
3. For those which are related to polynomials and roots, you will find formula d. above very
useful.
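To illustrate tip 1, here is a sketch of the standard proof for I_n = ∫ cos^n x dx between 0 and π/2. The limits 0 and π/2 are an assumption chosen for this illustration, not taken from the examples above.

```latex
\begin{align*}
I_n &= \int_0^{\pi/2} \cos^{n-1}x \,\cos x \, dx \\
    &= \Big[\cos^{n-1}x \,\sin x\Big]_0^{\pi/2}
       + (n-1)\int_0^{\pi/2} \cos^{n-2}x \,\sin^{2}x \, dx \\
    &= 0 + (n-1)\int_0^{\pi/2} \cos^{n-2}x \,(1-\cos^{2}x)\, dx \\
    &= (n-1)\,I_{n-2} - (n-1)\,I_n .
\end{align*}
Collecting the $I_n$ terms on the left gives $n I_n = (n-1) I_{n-2}$, i.e.
\[
I_n = \frac{n-1}{n}\, I_{n-2}, \qquad n \ge 2 .
\]
```

Notice the pattern: integrate by parts once, use an identity to bring I_n back on the right, then collect the I_n terms on one side.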
6.6 Applications of Integration (length of arc, surface area of revolution)
You have probably learned how to find the area enclosed between the function f(x) and the axes,
or between 2 functions. You have also learned the volume of revolution for a function f(x) with
the x or y-axis as the axis of rotation. In this section, you’ll be learning 2 new applications,
which are the arc length and the surface area of revolution.
ARC LENGTH
Consider 2 points, P and Q, on a curve. P is the point (x, y) and Q is the point (x + δx, y + δy).
Let s be the length of the arc from a point on the y-axis, and δs the length of the arc PQ. Since δs
is very small, we can approximate the arc PQ by a straight line. Hence, using Pythagoras’
theorem, we have
(δs)^2 = (δy)^2 + (δx)^2
Dividing by (δx)^2, we obtain
(δs/δx)^2 = 1 + (δy/δx)^2
As δx → 0, this gives
(ds/dx)^2 = 1 + (dy/dx)^2
and after square-rooting both sides, we end up with
ds/dx = √(1 + (dy/dx)^2), that is, s = ∫_a^b √(1 + (dy/dx)^2) dx
The parametric form of s can be obtained by dividing the equation (δs)^2 = (δy)^2 + (δx)^2 by
(δt)^2. The polar form is probably not in your syllabus, so don’t worry too much about it. To find the
arc length of a particular function, just differentiate it with respect to x, then substitute it into the
formula above.
SURFACE AREA OF REVOLUTION
Let A be the area of the surface formed by rotating the curve y = f(x), between the lines x = a
and x = b, about the x-axis. Let the curved surface area of the blue ring shown be δA. Treating the
strip as being bounded by 2 cylinders, we have
2πy δs ≤ δA ≤ 2π(y + δy) δs
As δx → 0, δy → 0 and δs → 0, so we have
dA/ds = 2πy
which gives us the formula
A = ∫ 2πy ds = 2π ∫_a^b y √(1 + (dy/dx)^2) dx
Again, differentiate the function, and substitute it into the formula to find the surface area of
revolution.
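Both formulas are easy to try out numerically. The sketch below uses Simpson's rule on two curves whose answers are known exactly, so you can see the formulas giving the right numbers; the curves y = cosh x and y = x are my own choices for illustration.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def arc_length(dy_dx, a, b):
    """s = integral from a to b of sqrt(1 + (dy/dx)^2) dx."""
    return simpson(lambda x: math.sqrt(1 + dy_dx(x) ** 2), a, b)


def surface_area(y, dy_dx, a, b):
    """A = 2*pi * integral of y * sqrt(1 + (dy/dx)^2) dx, rotating about the x-axis."""
    return 2 * math.pi * simpson(lambda x: y(x) * math.sqrt(1 + dy_dx(x) ** 2), a, b)


# y = cosh x on [0, 1]: dy/dx = sinh x, and the exact arc length is sinh 1.
print(arc_length(math.sinh, 0.0, 1.0), math.sinh(1.0))

# y = x on [0, 1] rotated about the x-axis is a cone with slant height sqrt(2),
# whose curved surface area is pi * sqrt(2).
print(surface_area(lambda x: x, lambda x: 1.0, 0.0, 1.0), math.pi * math.sqrt(2))
```

In an exam you would of course integrate by hand, but a check like this tells you quickly whether your derivative and your substitution into the formula were right.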
BAB 7
7. Power Series
7.1 Taylor Polynomial (remainder theorem)
A power series is an expression of a function as an infinite sum of powers of x. Many
differentiable functions f(x) can be approximated by such a polynomial, so that
f(x) = a + b(x – x_0) + c(x – x_0)^2 + d(x – x_0)^3 + e(x – x_0)^4 + … + f(x – x_0)^n
when x is close to x_0, and where a to f are constants. If you remember the Binomial
Expansion for real powers, the function (1 + x)^r can be represented by the series
(1 + x)^r = 1 + rx + [r(r – 1)/2!] x^2 + [r(r – 1)(r – 2)/3!] x^3 + …
Compare the Binomial Series above with the formula for f(x). You see that it is just a special
case of the above function, such that x_0 is zero, and the constants are defined by a special relation.
Our question is this: since we can represent the above bracketed function as an
infinite series, is it possible to represent other functions, like sin x, ln x,
e^x or anything else, in the same way? If it is doable, how do we determine the constants a, b, c and so on in the
function f(x) above?
Before we get into our topic Taylor polynomials, let me introduce to you Taylor’s Theorem
with Remainder. The theorem states that if a certain function f(x) is (n+1)-times differentiable,
then
f(x) = f(a) + f'(a)(x – a) + [f''(a)/2!](x – a)^2 + … + [f^(n)(a)/n!](x – a)^n + R_n(x)
Let me explain this a little. The number a is the point that we expand about: the series
is accurate for values of x which are close to a. For example,
when a = 0, the resulting expression will be quite
accurate for estimating the function at values of x close to 0 (of course, for certain functions, the series
is accurate for any x, whatever the value of a. We’ll discuss this in a later section). This means we vary a to
approximate the same function near different values.
Then, the terms f'(a), f''(a) are the 1st and 2nd derivatives of the function f(x), evaluated at a. Note that in the term
f^(n)(x), the ‘n’ has a bracket to tell us that it is not the ‘nth power of f’, but the nth derivative of
f. The entire series is what we call the Taylor series. All the terms between the equal sign
and the R_n are called the Taylor polynomial, and sometimes we denote this whole chunk of
polynomial as p_n(x). Writing the whole equation in another form, we have
f(x) = p_n(x) + R_n(x)
Now, the term R_n(x) is what we call the remainder term. Since the Taylor series is an infinite
series, we can’t possibly write down all the terms of the series. So sometimes we just set a
limit, for example, we want the series correct up to the 6th order. In this case,
R_n(x) is the difference between f(x) and its Taylor polynomial of degree 6.
The remainder term could also be written as
I’ll try to give you an illustration to help you understand how this Taylor series thing works.
By the way, we are not required to prove the formula for Taylor series. For an example, take the
function
Using Taylor’s Theorem, we find the Taylor series expanded at x = 0 (which means a = 0) for
this function. By the way, there is a special name for the Taylor series expanded at x = 0:
the Maclaurin series. We find f'(x), f''(x) and so on, and substituting them into the formula,
we get
f(x) = x + x^2 + x^3 + x^4 + …
Notice that this function could also be expanded by binomial expansion, which is faster. Now look at
the graph below.
Notice that the blue line sketches the exact graph of the function f(x). As I said earlier, the
Taylor series is only an estimate. The more Taylor polynomial terms we keep,
the more accurately the series estimates the function f(x). Look and see that the graphs of
degree 1 and degree 2 are actually quite far off from representing f(x), but are quite accurate for
values of x near 0. As the degree of the polynomial increases, the graph of the Taylor polynomial
gets closer and closer to the actual function f(x) (within the range of x where the series converges).
So now, we want to learn how to find the series for some functions that we know of. Let’s try e^x.
Since a Taylor series can be expanded at any a, there are infinitely many of them, so we shall focus on
deriving the Maclaurin series of functions.
Recalling the formula,
f(x) = f(0) + f'(0)x + [f''(0)/2!]x^2 + [f'''(0)/3!]x^3 + …
We find that e^x is still itself after any number of derivatives, and e^0 = 1. So plugging these
in, we get the Maclaurin series
e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + …
Try finding the Maclaurin expansion for other functions: ln (1 – x), sinh x, and any other
functions you can think of. Note that not all Maclaurin series have such
beautiful patterns. Some might end up with coefficients that follow no obvious pattern.
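Taking ln (1 – x) as the exercise, repeated differentiation gives the standard series ln (1 – x) = –x – x^2/2 – x^3/3 – …. The sketch below adds up the first few terms at x = 0.5 and prints the error against the calculator value, so you can watch the approximation improve as more terms are kept. The choice of x = 0.5 is arbitrary.

```python
import math


def maclaurin_ln_one_minus_x(x, terms):
    """Partial sum of ln(1 - x) = -x - x^2/2 - x^3/3 - ... (valid for -1 <= x < 1)."""
    return -sum(x ** k / k for k in range(1, terms + 1))


x = 0.5
for terms in (1, 2, 5, 10, 30):
    approx = maclaurin_ln_one_minus_x(x, terms)
    print(terms, approx, abs(approx - math.log(1 - x)))
```

Try a value of x closer to 1 and you will see that many more terms are needed for the same accuracy.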
Below is a list of common Maclaurin expansions:
I want you to note a few things:
1. There is no Maclaurin expansion for ln x, because ln 0 is not defined.
2. Notice the similarities between the Maclaurin expansions for trigonometric and hyperbolic functions.
From these you are able to prove the hyperbolic-trigonometric identities, which relate the two families of
functions.
3. Some expansions contain only odd or only even powers. In other cases, a single power might be missing,
so it is normal for a function not to have all the powers of x.
REMAINDER ESTIMATION THEOREM
If a function f(x) can be differentiated n + 1 times on an interval I containing a, and if M
is an upper bound for |f^(n+1)(x)| on I, i.e., |f^(n+1)(x)| ≤ M,
then
|R_n(x)| ≤ M |x – a|^(n+1) / (n + 1)!
Ignore the alien language first. Continuing from the previous part, the remainder of the series is
actually quite significant. When you use a Taylor series to estimate something, you are interested
in knowing the error of your estimate, that is, the difference between your estimate and the actual value.
If you remember from the previous section, the remainder is given by the formula
The formula gives the exact error when f(x) is approximated by the nth Taylor polynomial. The problem
is that it is too difficult to evaluate it this way, so we are going to find an overestimate of the
remainder instead. We look at the magnitude of the (n + 1)th derivative of f(t) as t varies
between a and x, and overestimate that by a single number M (known as the upper bound, as stated
above). So here, we are saying that the remainder is definitely smaller than or equal to the bound
built from M, and thus the formula above,
|R_n(x)| ≤ M |x – a|^(n+1) / (n + 1)!
This information is important, as we will use it to
1. Estimate the error between the function and the series
2. Approximate a function to n decimal places
I understand that this might be hard for you to catch, so I will give you 2 examples here.
EXAMPLE 1 (ESTIMATE ERROR)
Find the Taylor series of the function ln x expanded at x = 1, to get a cubic approximation,
and estimate the error for ln 2.
Have I taught you how to find a Taylor series for a function?
We first write the function in terms of what we are looking for. In this case, since it is expanded at
x = 1, the terms are powers of (x – 1). (They would be powers of x or (x + 5) if it were expanded at x =
0 or x = –5 respectively.) So
ln x = a + b(x – 1) + c(x – 1)^2 + d(x – 1)^3
Now, we need to find the constants a, b, c and d. You can find all of them by substituting x = 1,
and by differentiating the left and right sides of the equation. Which means,
a = ln 1 = 0, b = f'(1) = 1, 2c = f''(1) = –1, 6d = f'''(1) = 2
which gives you
ln x ≈ (x – 1) – (x – 1)^2/2 + (x – 1)^3/3
and then
ln 2 ≈ 1 – 1/2 + 1/3 = 5/6
To go on, we need to use the formula above. To find M, we need to first find f^(n+1)(x) with n = 3, which is
–6x^{-4}. Remember the part above which says |f^(n+1)(x)| ≤ M: we find that the maximum value of
|–6x^{-4}| is 6 for values 1 ≤ x ≤ 2 (the interval I containing a), so we have
|R_3(2)| ≤ 6(2 – 1)^4 / 4! = 1/4
Thus, ln 2 = 5/6 within ± 1/4.
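You can confirm Example 1 on a computer by comparing the cubic estimate with the true value of ln 2. The sketch below uses the cubic polynomial (x – 1) – (x – 1)^2/2 + (x – 1)^3/3 and the bound M = 6 from the working; the function names are made up for this illustration.

```python
import math


def cubic_ln(x):
    """Cubic Taylor polynomial of ln x about x = 1."""
    u = x - 1
    return u - u ** 2 / 2 + u ** 3 / 3


def remainder_bound(x, a=1.0, n=3, M=6.0):
    """Remainder estimate M * |x - a|^(n+1) / (n+1)!."""
    return M * abs(x - a) ** (n + 1) / math.factorial(n + 1)


estimate = cubic_ln(2.0)
actual_error = abs(math.log(2.0) - estimate)
print(estimate, actual_error, remainder_bound(2.0))
```

The actual error comes out smaller than the bound, which is what you should expect: the theorem gives an overestimate, not the exact error.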
EXAMPLE 2 (Approximating decimal places)
Use an nth Maclaurin polynomial for e^x to approximate e to 5 decimal places of accuracy. Find
n.
(Note that if f^(n+1)(x) is cos x or sin x, then you can take M = 1 instead. Useful information.)
Now, the difference here compared to the previous example is that we don’t know n, so we
can’t substitute any value for n (in fact, n is what we are looking for). But we do have another piece of
information, which is ‘to 5 decimal places’. We take half a unit of that decimal place, and now
we know that the remainder must be smaller than 0.000005. So we have
By trial and error, we find that the inequality holds when n = 9. Therefore,
e ≈ 1 + 1 + 1/2! + 1/3! + … + 1/9! ≈ 2.71828
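The trial and error in Example 2 is exactly the kind of thing a short loop can do for you. The sketch below assumes the upper bound M = 3 (since e^x ≤ e < 3 on the interval from 0 to 1); that value of M is my assumption, as the bound used in the original working is not shown.

```python
import math


def smallest_n(tolerance=0.000005, M=3.0):
    """Smallest n with M * 1^(n+1) / (n+1)! < tolerance (expanded at a = 0, x = 1)."""
    n = 0
    while M / math.factorial(n + 1) >= tolerance:
        n += 1
    return n


def maclaurin_e(n):
    """nth Maclaurin polynomial of e^x evaluated at x = 1."""
    return sum(1 / math.factorial(k) for k in range(n + 1))


n = smallest_n()
approx = maclaurin_e(n)
print(n, approx, abs(math.e - approx))
```

Changing the tolerance lets you answer the same question for any number of decimal places.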
To summarize, this is your checklist when you are dealing with such
questions:
STEP 1: Write down the series for f(x), and f(c) (the function evaluated at the value c you want).
STEP 2: Find the interval [a, x] (a is where the series is expanded, and c lies within this interval).
STEP 3: Find M (the upper bound), such that |f^(n+1)(x)| ≤ M on that interval.
STEP 4: Write down the inequality |R_n(x)| ≤ [the bound from the theorem above]
STEP 5: If required, write down [the bound above] ≤ [half a unit in the last required decimal place]
STEP 6: Continue with the estimation.
7.2 Taylor Series (Maclaurin series, limits)
Generally, Taylor series have a lot of uses. We can use them to do any of the following:
A. DERIVE A GIVEN FUNCTION
You were given a list of Maclaurin series in the last section. Now I show them to you again
below:
These are not all though. You can still find and derive the Taylor or Maclaurin series of other
functions like sin^{-1} x, coth^{-1} x or lg x^2. The method is the same: write down the general Taylor or
Maclaurin series of the function. For example,
sin^{-1} x = a + bx + cx^2 + dx^3 + ex^4 + …
and you substitute x = 0 to get a. To get b, you differentiate once and substitute x = 0; to get c,
differentiate twice (which gives 2c), and so on. The coefficients a, b, c and so on might not follow a neat pattern like
the functions listed above, but at least you have a reasonable polynomial to estimate the function
in the absence of a calculator.
Besides, you could also combine 2 or more functions to find a new Taylor series for them. For
example, (1 + x)^2 cos x can be derived from
Sums and differences of functions (like sin x + cos x) or even substitutions of variables (like e^{8x}
or sin x^2) can be easily derived too.
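For the product (1 + x)^2 cos x, the idea is simply to multiply the two known series and keep terms up to the power you need. The sketch below does this with exact fractions up to x^4; storing each series as a list of coefficients is my own choice for this illustration.

```python
from fractions import Fraction
from math import factorial


def multiply_series(p, q, degree):
    """Multiply two coefficient lists (index = power of x), truncated at `degree`."""
    result = [Fraction(0)] * (degree + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= degree:
                result[i + j] += a * b
    return result


DEGREE = 4
# (1 + x)^2 = 1 + 2x + x^2
binomial = [Fraction(1), Fraction(2), Fraction(1)]
# cos x = 1 - x^2/2! + x^4/4! - ...  (only even powers appear)
cosine = [
    Fraction((-1) ** (k // 2), factorial(k)) if k % 2 == 0 else Fraction(0)
    for k in range(DEGREE + 1)
]

print(multiply_series(binomial, cosine, DEGREE))
```

Reading off the list, the product is 1 + 2x + x^2/2 – x^3 – (11/24)x^4 + …, which you can confirm by multiplying out by hand.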
B. DIFFERENTIATE AND INTEGRATE THE SERIES TO GET OTHER RESULTS
Did you notice that power series also obey the rules of calculus? Taking cos x as an
example, differentiating both sides gives
–sin x = –x + x^3/3! – x^5/5! + …
This is very useful information. You can speed up the calculations if you are asked to derive
the series of a function which is related to one of the known functions above. By the way, once you
are able to list the terms of the series, you will want to learn how to find the
summation notation of the derived series as well. Read through your Maths T Sequence &
Series, and try to make use of the knowledge you learnt there.
C. FINDING LIMITS OF FUNCTIONS
When you are asked to find the limit of a complicated function as x → 0, you can actually make
use of the Maclaurin series of the function. For example,
To help you, you might want to learn L’Hôpital’s rule as well. This rule comes in really handy in
this situation. It states that if f(a) = 0, g(a) = 0, and g'(a) ≠ 0, then
lim_{x→a} f(x)/g(x) = f'(a)/g'(a)
Use this rule when you get a 0/0 result. Remember that this rule only holds if the condition f(a) = g(a) = 0
is true.
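As an illustration of both approaches, here is a limit of my own choosing (not one from the notes), worked once with Maclaurin series and once with the rule as stated.

```latex
\[
\lim_{x \to 0} \frac{e^{x} - 1}{\sin x}
= \lim_{x \to 0} \frac{x + \frac{x^{2}}{2!} + \cdots}{x - \frac{x^{3}}{3!} + \cdots}
= \lim_{x \to 0} \frac{1 + \frac{x}{2!} + \cdots}{1 - \frac{x^{2}}{3!} + \cdots}
= 1 .
\]
With $f(x) = e^{x} - 1$ and $g(x) = \sin x$ we have $f(0) = g(0) = 0$ and $g'(0) = \cos 0 = 1 \neq 0$, so
\[
\lim_{x \to 0} \frac{f(x)}{g(x)} = \frac{f'(0)}{g'(0)} = \frac{e^{0}}{\cos 0} = 1 .
\]
```

Both routes agree, which is a useful way to check your answer in an exam if you have time.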
D. SOLVING DIFFERENTIAL EQUATIONS NUMERICALLY
I believe you already know what differential equations are, just that you only know how to
solve a few types of them. So here, we try to represent the solution of a differential
equation as a Taylor series, and thus estimate the function for values of x close to a, when
expanded at x = a. I’ll show you an example:
Find the Taylor series solution for y up to and including the term in x^4 for the differential
equation
Hence, find y correct to 9 d.p. when x = 0.01.
BAB 8
8. Differential Equations
8.1 1st Order Linear Differential Equations (integrating factor)
In Maths T, you learnt how to solve 2 types of differential equations, namely the separable
variable and the homogeneous differential equations. In FMT, you will learn how to solve
linear differential equations.
A differential equation is linear if it is of the form
where a is a function of x. It can be solved by introducing an integrating factor, e^(∫ a dx). Both
sides of the equation are multiplied by this term, and then we will get
Integrating both sides, we get
which is an expression of y in terms of x. This method is very simple. Let me give you an
example:
Find the general solution of the differential equation
We start by expressing it in the form
which is
Now that we know a, we can find the integrating factor,
Note that the integration in the integrating factor doesn’t need a constant, because it will
eventually cancel out later. So multiplying both sides by it,
8.2 2nd Order Linear Differential Equations (complementary function, particular integral,
general & particular solution, problem models)
In this section, we will be learning how to solve second order linear differential equations,
both homogeneous and non-homogeneous.
HOMOGENEOUS CASE
A second order homogeneous linear differential equation has the form
a d^2y/dx^2 + b dy/dx + cy = 0
where a, b and c are constants. We first make a smart guess (ansatz) that the solution has the form
y = Ae^(nx), where A and n are constants. Differentiating it yields
dy/dx = Ane^(nx), d^2y/dx^2 = An^2 e^(nx)
and once we substitute these into the differential equation and eliminate Ae^(nx), we get a
quadratic equation of the form
an^2 + bn + c = 0
which we call the auxiliary equation. From here we can see that y = Ae^(nx) is indeed a solution
of the 2nd order differential equation, provided that the value of n satisfies this equation. Once
we find the values of n, we can write down the general solution of the differential equation.
However, the equation will give you one of 3 outcomes: 2 distinct real roots, 2 equal
roots or 2 complex roots.
Case 1: 2 Distinct Roots
In this case, suppose the auxiliary equation gives you 2 roots n1 and n2. Your answer for y will be
in the form of
y = Ae^(n1 x) + Be^(n2 x)
Remember that your initial guessed solution for the differential equation was y = Ae^(nx)? Notice
that if y = Ae^(nx) and y = Be^(mx) are both solutions of the differential equation, then the sum of
both solutions, y = Ae^(nx) + Be^(mx), is also a solution of the differential equation. That is why
our solution for y is the sum of both solutions. You may want to prove it. Given the differential
equation
d^2y/dx^2 + 3 dy/dx + 2y = 0
you find that the auxiliary equation has the roots n = –1 and n = –2. Do try substituting y
= Ae^(-x), y = Ae^(-2x) and y = Ae^(-x) + Be^(-2x) into the equation. All of them are consistent, aren’t they?
Case 2: 2 Equal Roots
Suppose your auxiliary equation gives you only one value of n. Your answer will be in the form
of
y = (A + Bx)e^(nx)
When there is a repeated root, you multiply the second term by x. Try recalling the connection of this chapter
with what you learnt in the chapter Recurrence Relations.
Case 3: Complex Roots
Suppose you get 2 complex roots, m + in and m – in. Your answer will then be in the form of
y = Ae^((m+in)x) + Be^((m–in)x)
  = e^(mx)(C cos nx + D sin nx)
Notice the second line of the equation. Remember the fact that
e^((m+in)x) = e^(mx)(cos nx + i sin nx), and you get y = e^(mx)[ (A + B)cos nx + i(A – B)sin nx ], in which
you represent the terms (A + B) and i(A – B) as C and D respectively. You may be surprised that
D is actually a real constant, so somewhere along the way, A and B must have been complex.
As I said, these are the forms of general solutions that you can get. To get a particular solution,
you need initial conditions, something like y = 1 when x = 0. The particular
solution replaces the arbitrary constants A, B, C and D with actual numbers.
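To see the 3 cases side by side, here is a minimal Python sketch. It assumes the equation is written as a y'' + b y' + c y = 0 with real constants and a ≠ 0, and the function name auxiliary_roots is my own. It only tells you which case you are in and what the roots are; writing down the general solution is still up to you.

```python
def auxiliary_roots(a, b, c):
    """Classify the roots of the auxiliary equation a*n^2 + b*n + c = 0 (a != 0)."""
    disc = b * b - 4 * a * c
    if disc > 0:
        # Case 1: y = A e^(n1 x) + B e^(n2 x)
        r = disc ** 0.5
        return "distinct", ((-b + r) / (2 * a), (-b - r) / (2 * a))
    if disc == 0:
        # Case 2: y = (A + B x) e^(n x)
        return "equal", (-b / (2 * a),)
    # Case 3: roots m ± i n, so y = e^(m x) (C cos n x + D sin n x)
    return "complex", (-b / (2 * a), (-disc) ** 0.5 / abs(2 * a))

print(auxiliary_roots(1, 3, 2))   # n^2 + 3n + 2 = 0
print(auxiliary_roots(1, 2, 1))   # n^2 + 2n + 1 = 0
print(auxiliary_roots(1, 0, 16))  # n^2 + 16 = 0
```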
NON-HOMOGENEOUS CASE
A second order non-homogeneous linear differential equation has the form
a d^2y/dx^2 + b dy/dx + cy = f(x)
Again, a, b and c are constants, and f(x) is a function of x, which is either a polynomial, a
constant, an exponential function, a cosine or sine function, or a combination of any 2 of these.
Functions like tan x, sinh x or ln x are out of your syllabus, as solving these
kinds of differential equations requires the Method of Variation of Parameters. Try searching
for it if you want to know more.
The solving method is easy. First you separate the differential equation into 2 parts. You let the
first part = 0,
a d^2y/dx^2 + b dy/dx + cy = 0
and this is solved just as above, by finding the auxiliary equation and then writing the
answer in the form y = g(x) = Ae^(nx) + Be^(mx) (or whichever of the 3 forms applies). This solution is called the complementary
function (CF). Next you find any one function y = h(x) which satisfies the full equation with f(x)
on the right hand side. This is called the particular integral (PI). Adding the CF to the PI
still leaves f(x) on the right hand side, so our final
answer will be
y = g(x) + h(x)
Since you already know what to do with the CF, we will introduce methods to find the PI
below, which depend on what f(x) is.
Case 1: f(x) is a Polynomial Function
You should just take the PI to be a polynomial function. For example,
d^2y/dx^2 + 3 dy/dx + 2y = x^2 + 4x – 3
You already know the CF from above, which is y = Ae^(-x) + Be^(-2x). Then to find the PI, you let
y = Ax^2 + Bx + C, according to the degree of the polynomial. Differentiating, you get
dy/dx = 2Ax + B, d^2y/dx^2 = 2A
Substituting it back, we get 2A + 3(2Ax + B) + 2(Ax^2 + Bx + C) = x^2 + 4x – 3. Comparing
coefficients gives 2A = 1, 6A + 2B = 4 and 2A + 3B + 2C = –3. Solving for A, B and C,
you get A = 1/2, B = 1/2, C = –11/4. So in the end, our PI is
y = x^2/2 + x/2 – 11/4
and the general solution, being the sum of the CF and the PI, will be
y = Ae^(-x) + Be^(-2x) + x^2/2 + x/2 – 11/4
Try not to get confused between the constants of the CF and the PI, as here I have 2 A’s and 2
B’s. I would suggest that you name the constants for the PI as C, D and E instead.
This rule applies for a polynomial of any degree. However, there is an exception, when your
auxiliary equation has a root n = 0. Since Ae^0 = A, you already have a constant term in the CF.
So for your PI, you need to multiply your trial solution by an extra x. So if your
f(x) is 4x + 3, your PI should be Bx^2 + Cx instead of Bx + C. Similarly, you can guess that if the
auxiliary equation has a double root n = 0, you will then multiply your PI by x^2. Try relating this information
with the chapter on Recurrence Relations.
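It is easy to slip on the arithmetic when comparing coefficients, so here is a minimal Python sketch for checking a polynomial PI by substitution. It is written only for the left hand side y'' + 3y' + 2y used in the polynomial example of Case 1, and the helper names (derivative, add, scale, lhs) are my own. A polynomial is stored as a list of coefficients [c0, c1, c2, …], meaning c0 + c1 x + c2 x^2 + …

```python
from fractions import Fraction as F

def derivative(p):
    """Derivative of the polynomial c0 + c1 x + c2 x^2 + ... given as [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [F(0)]

def add(p, q):
    """Add two polynomials, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    return [x + y for x, y in zip(p, q)]

def scale(k, p):
    """Multiply every coefficient by the constant k."""
    return [k * c for c in p]

def lhs(y):
    """Evaluate y'' + 3y' + 2y for a polynomial y."""
    d1 = derivative(y)
    d2 = derivative(d1)
    return add(add(d2, scale(3, d1)), scale(2, y))

# Trial PI: y = x^2/2 + x/2 - 11/4, stored from the constant term upwards
pi = [F(-11, 4), F(1, 2), F(1, 2)]
print(lhs(pi))  # coefficients of the result, constant term first
```

If the list that comes out matches the coefficients of f(x) = x^2 + 4x – 3 (that is, –3, 4, 1), the PI is correct.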
Case 2: f(x) is an Exponential Function
This is easy. If f(x) = 5e^(2x), our PI will be just y = Ce^(2x). Just differentiate y to get dy/dx and
d^2y/dx^2, substitute them into the equation, and find C. Again in this case, there are exceptions. If
your CF already has a term Ae^(2x), then like the above, you multiply the PI by x to give
you y = Cxe^(2x). If your CF is y = Ae^(2x) + Bxe^(2x), then your PI will be y = Cx^2 e^(2x), multiplying by x^2
this time. Not hard, I think. If you are given
Your CF is the same, y = Ae^(-x) + Be^(-2x). Your PI will be y = Ce^x + Dxe^(-2x), and you should
solve the rest of the equation yourself.
Case 3: f(x) is a Cosine or Sine Function
If f(x) = 5sin 2x, or f(x) = 4cos 2x, or f(x) = 6sin 2x + 7cos 2x, your PI will be the same, which
is y = C cos 2x + D sin 2x. Notice that whether you have only sines or only cosines, you still have
to include both cosines and sines in your PI. The reason is simple: differentiating a sine gives a
cosine and vice versa, so if you only include one of them, you will in general not be able to match
the coefficients on both sides. Again, there is an exception, which is when your
auxiliary equation has purely imaginary roots which give your CF a sine or
cosine function of the same form. As usual, just multiply your PI by x. For example,
You get auxiliary equation roots n = ±4i, and a CF of y = A cos 4x + B sin 4x. So, your PI should be in
the form y = Cx cos 4x + Dx sin 4x. Differentiate it (might be complicated), substitute it, find
the constants C and D, and give the general solution by adding the PI and CF. Should be
straightforward.
Combinations of functions, like f(x) = x cos 3x, f(x) = xe^(4x), f(x) = e^(4x) sin 3x shouldn’t be hard for
you to solve. The basic rule is: if your CF already has a term with the same form as f(x), then
just multiply the PI by x. If that still doesn’t work, multiply by x^2 instead.
SUBSTITUTION
If you recall what you learnt in Maths T, you have already learnt how to use the
substitutions v = ax + by and y = vx to transform a complicated-looking differential equation
into one that is solvable. You can apply those skills to 2nd order differential equations too. Other
kinds of substitution include x = u^0.5 and u = xy, but I want your attention on solving differential
equations of the form
You need to use the substitution
From here, find dy/dx and d^2y/dx^2 by using the chain rule.
In the end, this gives you a differential equation of the form
which is solvable.
PROBLEM MODELLING
Seriously, I have looked through many books, but none of them really teaches modelling
with 2nd order differential equations. You should be familiar with modelling with 1st order
differential equations though. So here, I have no choice but to introduce you to some university
level stuff.
1. LRC Circuits
The potential differences across an inductor, a resistor and a capacitor are given by
V_L = L dI/dt, V_R = IR, V_C = Q/C
Since the current is I = dQ/dt, the total voltage across the 3 elements put in series is equal to
L d^2Q/dt^2 + R dQ/dt + Q/C = V(t)
I assume you know that L, R, C and Q mean inductance, resistance, capacitance and charge
respectively. Here we see that the voltage V is a function of time, which makes it a non-homogeneous 2nd order linear differential equation. Solving the differential equation means
finding an equation which relates the charge to time.
2. Oscillators
Remember from physics that a simple harmonic oscillator has the equation
mẍ + kx = 0
where m is the mass, and k is the spring constant. Notice that this is a 2nd order differential
equation! Solving it gives you x in terms of t. A damped oscillator has an extra term in
it,
mẍ + bẋ + kx = 0
where b is the drag constant. A forced oscillator, in turn, would be
mẍ + kx = F(t)
where the force F is a function of time, probably a sine or cosine function. You could have
guessed that a forced damped oscillator would be
mẍ + bẋ + kx = F(t)
With this information, you are able to model a second order differential equation once you
know all the factors m, b, k and F.
There are a whole lot more physics equations which require differential equations, like the
famous Schrödinger equation and other higher level stuff, which require higher level
physics. I better stop here before I turn this into a physics lecture instead.
9. Number Theory
9.1 Divisibility (prime & composite numbers, unique factorisation, gcd & lcm, Euclid’s
algorithm)
Number Theory is considered one of the hardest sections in Mathematics. It is the study of the
very fundamentals of numbers, yet it can be very complicated. Information on this chapter for such
a level of study is very rare, so I hope you will appreciate everything that I have for you over
here.
We have been learning division since standard 2. But today, we will look at it in a different
manner. If a and b are integers with a ≠ 0, then we say that a divides b if there is an integer c
such that b = ac. When a divides b we say that a is a factor of b and that b is a multiple of a.
The notation a | b denotes that a divides b (which means there is no remainder). We write a ∤ b
when a doesn’t divide b. For example, 2 | 4, but 4 ∤ 2. Take note that the notations 2 | 4 and 2/4 are
2 different things. The former is the notation for divisibility, while the latter is simply a fraction.
There are certain rules of divisibility that you should know. These are:
1. If a | b, b | c, then a | c.
You should know how to prove this. As above, a | b and b | c can be written as ak = b and bl = c,
and therefore akl = bl = c. Here, k and l are integers.
2. If a | b, a | c, then a | (b + c) and a | (mb + nc) for any integers m and n.
3. If a | b, then a | bc for any integer c.
The above 2 can also be proven with notation similar to that in 1.
Not every integer divides every other integer. For example, 2 does not divide 7, as it leaves a
remainder of 1. Here we represent the above in an equation, which is
7 = 2•3 + 1
Here, 3 is the quotient. We denote the quotient when a is divided by b as a div b, which in this case
gives 7 div 2 = 3. 1 is the remainder, which we denote as a mod b, and here we have 7 mod 2 = 1.
Note that a remainder must not be negative. For example, –7 = 2•(–3) – 1 is wrong, because it then
gives us –7 div 2 = –3 and –7 mod 2 = –1, a negative remainder. It should be –7 = 2•(–4) + 1, which
in turn gives –7 div 2 = –4 and –7 mod 2 = 1. Try doing 7 div (–2) and 7 mod (–2), and see whether
the answers are different.
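If you want to check your div and mod answers, here is a minimal Python sketch. The function name div_mod is my own. Python's built-in divmod gives the remainder the same sign as the divisor, so the sketch adjusts the result whenever the remainder comes out negative, to follow the rule that a remainder must not be negative.

```python
def div_mod(a, b):
    """Return (q, r) with a = b*q + r and 0 <= r < |b|, for b != 0."""
    q, r = divmod(a, b)  # Python's remainder takes the sign of b
    if r < 0:            # can only happen when b is negative
        q, r = q + 1, r - b
    return q, r

print(div_mod(7, 2))
print(div_mod(-7, 2))
print(div_mod(7, -2))
```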
A prime number is an integer greater than 1 that is only divisible by 1 and by the number itself. An
integer greater than 1 which is not prime is called a composite number. The smallest prime number is 2, and it
goes on as 3, 5, 7, 11, 13, 17, 19… and so on. The interesting thing about prime numbers is that
there is no simple formula which generates the sequence of prime numbers. So
if we want to check whether a very large number is prime, we need to slowly divide the number by
many possible divisors before we can say that it is prime. One very famous method used in
the past is the sieve of Eratosthenes, which can be used to find all the primes below 100. It is done
by first listing down all the numbers from 1 to 100. Then, cross out 1, and slowly cross out the
multiples of 2 (other than 2 itself), then those of 3, 5, 7 and so on, until you have nothing left to
cross out. The rest of the numbers are primes! A related result, about how the primes are
distributed, is the Prime Number Theorem. You might want to google it.
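The sieve is easy to try on a computer. Here is a minimal Python sketch of it. The function name sieve and the choice to start crossing out from p^2 are my own (smaller multiples of p have already been crossed out by smaller primes).

```python
def sieve(limit):
    """Return all primes up to and including limit (limit >= 1)."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]   # 0 and 1 are not prime
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Cross out the multiples of p, starting from p*p
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n in range(limit + 1) if is_prime[n]]

print(sieve(100))
```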
So how do you know whether a relatively small number is prime? There is a way
to find out, at least a little faster than trying to divide the number by every number smaller than
itself. It is found that if a number greater than 1 is not divisible by any prime less than or equal to
its square root, then it is a prime number. This can be proven. If we have a composite number n such
that ab = n with a, b > 1, then if a > √n and b > √n, we have ab > √n • √n = n, which is a
contradiction. So one of the factors is at most √n, and that factor has a prime divisor. Although this does
speed up the process of finding primes, it is still quite a slow method.
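Here is a minimal Python sketch of the square root test. The function name is_prime is my own, and for simplicity it tries every integer up to √n rather than only the primes, which is slower but gives the same answer.

```python
def is_prime(n):
    """Trial division: n > 1 is prime if no d with d*d <= n divides it."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False   # found a divisor, so n is composite
        d += 1
    return True

print([n for n in range(1, 30) if is_prime(n)])
```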
Prime numbers are the building blocks of all numbers. The Fundamental Theorem of
Arithmetic states that:
Every positive integer > 1 can be written uniquely as a prime or as the product of 2 or
more primes where the prime factors are written in order of non-decreasing size.
This is what we call prime factorisation. For example, 4 = 2^2, 100 = 2^2 • 5^2, 641 = 641 and
so on. We can write down any positive integer in terms of products of primes, a = 2^x • 3^y • 5^z • 7^w… and so on.
There’s a lot to talk about prime numbers. One famous argument proves that there are
infinitely many primes. Suppose there are only finitely many primes, and you label them p1, p2, p3
and so on, up to the greatest prime number in the world, called pn. Now if we write a particular number a
such that a = p1p2p3…pn + 1, then a leaves a remainder of 1 when divided by any of p1, p2, …, pn,
so none of them divides a. So a is either a prime itself, or it is divisible by some prime which is not
in our list. Either way, this contradicts what we said earlier about
having found the greatest prime number, and therefore proves that there are indeed infinitely many
primes.
Another 2 interesting topics on prime numbers are Goldbach’s Conjecture and the Twin
Prime Conjecture. Go look them up if you are free.
Now, let’s move on to the gcd and lcm. See whether this sounds familiar from your Form
1 Mathematics. gcd is the greatest common divisor (you are probably more familiar with the
name highest common factor, or HCF), while lcm is the lowest common multiple. Here we
denote k = gcd (a, b) to have the meaning of “k is the greatest common divisor of the integers a
and b”. Similarly, k = lcm (a, b) means “k is the lowest common multiple of the integers a and
b”. For example, gcd (4, 6) = 2 and lcm (5, 6) = 30.
Relating this back to prime numbers, for any 2 integers a and b, if gcd (a, b) = 1, we say that
they are relatively prime. For example, 5 and 6 are relatively prime.
Do you still remember the method to find your lcm and gcd in Form 1? You had to draw out
something like a ladder or so. But here, we will use another method, which has something to do
with the prime factorization. For example,
Find gcd (120, 500) and lcm (120, 500).
We first start by representing the numbers 120 and 500 in terms of primes.
120 = 2^3 • 3 • 5
500 = 2^2 • 5^3
Now, the formulas to find the gcd and lcm are easy. If a = p1^a1 • p2^a2 • … • pn^an and
b = p1^b1 • p2^b2 • … • pn^bn, then
gcd (a, b) = p1^min(a1,b1) • p2^min(a2,b2) • p3^min(a3,b3) • … • pn^min(an,bn)
lcm (a, b) = p1^max(a1,b1) • p2^max(a2,b2) • p3^max(a3,b3) • … • pn^max(an,bn)
You first compare the primes present in the 2 numbers 120 and 500. p1^max(a1,b1) means p1 raised to
the larger of the powers of that particular prime p1 in the 2 numbers a and b, while p1^min(a1,b1)
uses the smaller. A prime which is missing from one of the numbers has power 0 there. So plugging in the numbers, we have
gcd (120, 500) = 2^min(3,2) • 3^min(1,0) • 5^min(1,3) = 2^2 • 3^0 • 5^1 = 20
lcm (120, 500) = 2^max(3,2) • 3^max(1,0) • 5^max(1,3) = 2^3 • 3^1 • 5^3 = 3000
From here, we obtain a new formula, as we can see that
ab = gcd (a,b) • lcm (a,b)
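The min and max formulas can be sketched in Python as follows. The function names prime_factors and gcd_lcm are my own, and the factorisation is done by simple trial division, so this is only meant for small positive integers.

```python
from collections import Counter

def prime_factors(n):
    """Return a Counter mapping each prime factor of n (n >= 1) to its power."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1   # whatever is left over is itself prime
    return factors

def gcd_lcm(a, b):
    """gcd uses the smaller power of each prime, lcm uses the larger power."""
    fa, fb = prime_factors(a), prime_factors(b)
    g = l = 1
    for p in set(fa) | set(fb):
        g *= p ** min(fa[p], fb[p])   # a missing prime counts as power 0
        l *= p ** max(fa[p], fb[p])
    return g, l

g, l = gcd_lcm(120, 500)
print(g, l, g * l == 120 * 500)
```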
The method described for computing the greatest common divisor of 2 integers, using the prime
factorisations of these integers, is inefficient. The reason is that it is time consuming to find
prime factorisations. Now I will teach you a more efficient method of finding the gcd, called the
Euclidean Algorithm (also Euclid’s Algorithm). It is named after the ancient Greek
mathematician Euclid, who included a description of this algorithm in his book The Elements.
Let’s start with an example.
Find gcd (91, 287).
First, we divide the bigger number by the smaller number. Then, we take the divisor and the
remainder, and repeat the process with them until there is no more remainder. The last
non-zero remainder is the gcd that we are looking for. So we have
287 = 91 • 3 + 14
91 = 14 • 6 + 7
14 = 7 • 2
∴ gcd (91, 287) = 7
You might be puzzled as to how this method works. Basically, this method is derived from
the result
if a = bq + r, then gcd (a, b) = gcd (b, r)
From
a = bq + r
I know that if some integer k divides both b and r, it must divide a as well. Now I turn the equation
around
a – bq = r
If some integer divides both a and b, then it must divide r. So the common divisors of a and b are
exactly the common divisors of b and r, and the biggest of them must be the same integer, which is
gcd (a, b), and also gcd (b, r). So therefore,
the Euclidean Algorithm is valid.
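The repeated division can be sketched in a few lines of Python. The function name gcd_steps is my own. It returns the gcd together with the list of division steps, so you can compare them with your own working.

```python
def gcd_steps(a, b):
    """Euclidean Algorithm: repeatedly replace (a, b) by (b, a mod b)."""
    steps = []
    while b != 0:
        q, r = divmod(a, b)
        steps.append(f"{a} = {b} * {q} + {r}")
        a, b = b, r       # gcd(a, b) = gcd(b, r)
    return a, steps

g, steps = gcd_steps(287, 91)
for line in steps:
    print(line)
print("gcd =", g)
```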
9.2 Modular Arithmetic (linear congruences, Chinese Remainder Theorem)
You’ll terribly ‘love’ this section.
Consider how you read the time on a clock. Every time the short hand goes one round, 12 hours
have passed. So when the short hand goes past another hour, it will be 13 hours, and the time
might be called 13 o’clock. We know, however, that 13 o’clock is actually 1 o’clock. The same goes
for 25 o’clock, which still means the same thing. We say that the clock follows a modular system.
Modular Arithmetic is the calculation of numbers in a modular system. The clock’s system
is of modulo 12. When two numbers a and b are congruent to each other in the same modulo,
we denote it by
a ≡ b (mod m)
This equation is read as ‘a is congruent to b modulo m’. For example, 13 ≡ 1 (mod 12), this
means that 13 is the same as 1 in a modulo 12 system. Note that the main equation is the part on
the left hand side, 13 ≡ 1, while the right hand side, (mod 12), tells you that this equation is valid
only in modulo 12. This modulo system also has another explanation for it. a ≡ b (mod m)
means that a and b give the same remainder when divided by m. Notice that 13 divided by 12
gives remainder 1, while 1 divided by 12 also gives the remainder 1. Or using the mod
terminology, we say that
a mod m = b mod m
Take note that a ≡ b (mod m) and a = b mod m have different meanings. The latter says
that ‘a is the remainder when b is divided by m’.
Now, bringing divisibility in, we say that
a ≡ b (mod m) if and only if m | (a – b)
Can you see that m divides the difference a – b? If that is the case, a and b differ by
a multiple of m. So this means that 49 ≡ 37 ≡ 25 ≡ 13 ≡ 1 (mod 12). You just add 12 to a
number, and you get another number which is congruent to it modulo 12.
If I convert this notation a ≡ b (mod m) into algebra, it can be written as a = b + km, where k is
an integer (try verifying this with the divisibility notation above). So to summarize things:
When a ≡ b (mod m), then
a mod m = b mod m
m | (a – b)
a = b + km
Before we go into solving linear congruences, we need to know some basic rules of modular
arithmetic. The rules below can be proven by yourself, so try doing it.
If a ≡ b (mod m) and c ≡ d (mod m), then
1. a + c ≡ b + d (mod m)
2. a – c ≡ b – d (mod m)
When in the same modulo m, the addition and subtraction rules work as usual. This will be
useful when you are solving simultaneous modular arithmetic equations. This can be proven by
using the algebraic form, a = b + km, c = d + lm.
3. ac ≡ bd (mod m)
This is also important, and can be proven using the same method as above.
4. a^k ≡ b^k (mod m)
where k is a positive integer. I’ll prove this one here for you:
When a – b = jm for some integer j, then a^k – b^k = (a – b)(a^(k-1) + a^(k-2)b + a^(k-3)b^2 + … + ab^(k-2) + b^(k-1)), which is a
multiple of (a – b). Therefore, a^k – b^k = lm, where l is an integer, and therefore
a^k – b^k ≡ 0 (mod m)
a^k ≡ b^k (mod m)
5. ka ≡ kb (mod m)
The congruence holds even when both sides of the equation are multiplied by an integer k. Same
proof as 1, 2 and 3.
Next, try proving both the equations below (make use of the fact that
a ≡ (a mod m) (mod m)):
6. (a + b) mod m ≡ [(a mod m) + (b mod m)] (mod m)
7. ab mod m ≡ [(a mod m)(b mod m)] (mod m)
8. The Simplification Law
If c | a, c | b, c | m, and a ≡ b (mod m), then
a/c ≡ b/c (mod m/c)
To summarize this rule, a common factor c can be divided out from a, b and m only if it
divides all of them. Provable too.
Here’s another one not to be confused with the former, the cancellation law. If gcd (c, m) = 1,
then
9. ac ≡ bc (mod m) ⇒ a ≡ b (mod m)
You can prove this too. Suppose ac – bc = (a – b)c = km. Since gcd (c, m) = 1, c and m have no
common divisors other than 1, and therefore c | k. Since c divides this integer k, c can be cancelled out, and
thus a – b = nm for some integer n. Here we see that a ≡ b (mod m), which was to be shown.
FINDING THE INVERSE
In ordinary arithmetic, the multiplicative inverse b of a number a is such that ab = 1. There, b is
simply the reciprocal of the number a. In modular arithmetic, we are going to look for an
inverse b of a, such that
ab ≡ 1 (mod m)
Let us recall the Euclidean Algorithm. We learnt that we could find gcd (a, m) by dividing the
bigger number by the smaller number, and continuing to divide the smaller number by the
remainder, and so on until there is no remainder. Indeed, we can make use of this information
to write the gcd as a linear combination of these 2 integers, such that
gcd (a, m) = m • n + a • b
where n and b are integers. If gcd (a, m) = 1, then an inverse of a exists, and the integer b
happens to be the inverse of a. We will see why this is true in the following example:
Find gcd (123, 2347) and write it as a linear combination of these integers, and further find
the inverse of 123 modulo 2347.
2347 = 123 • 19 + 10
123 = 10 • 12 + 3
10 = 3 • 3 + 1
3 = 1 • 3
∴ gcd (123, 2347) = 1
Now, to get the linear combination thingy, we have to reverse all of the above equations. Let me
rewrite them again:
10 = 2347 – 123 • 19 (1)
3 = 123 – 10 • 12 (2)
1 = 10 – 3 • 3 (3)
Now we will do some back substitution. We want an equation of gcd (123, 2347) (which is 1) to
be in terms of 123 and 2347. We start with equation (3), and substitute equation (2), we have
1 = 10 – 3 • (123 – 10 • 12)
= 10 – 3 • 123 + 10 • 36
= 10 • 37 – 3 • 123
Repeating the process with equation (1),
1 = (2347 – 123 • 19) • 37 – 3 • 123
1 = 2347 • 37 – 123 • 706
We have now written gcd (123, 2347) as a linear combination of the 2 numbers. This is
what we call the extended Euclidean Algorithm. Here, we find that the inverse of 123
modulo 2347 is –706. We see that
–706 • 123 ≡ –86838 ≡ 1 (mod 2347)
Note that every integer congruent to –706 modulo 2347 is also an inverse of 123, so we find
it best to represent the inverse of 123 as 1641, a positive integer less than 2347.
I haven’t told you why this works. Since gcd (a, m) = 1, and we know that it can be represented
as a linear combination 1 = m • n + a • b, we can write
m • n + a • b ≡ 1 (mod m)
You should understand this equation. If 1 = 3 – 2, then 1 ≡ 3 – 2 in whatever modulo, and that
makes sense. Now, m • n ≡ 0 (mod m) for any given n, which is obvious since m divides m • n
completely. So in the end, we have a • b ≡ 1 (mod m), which was what we used just
now. Note that not all integers have inverses in a particular modulo. It is only in the case where
gcd (a, m) = 1 that there will be an inverse.
By the way, the inverse can also be found by trial and error for small moduli. For
example, take the inverse of 2 modulo 3. Try multiplying each of the numbers 1 and 2 by the number 2, and you find
that 2 • 2 ≡ 4 ≡ 1 (mod 3). And thus, 2 is the inverse of 2 modulo 3.
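The back substitution can be tedious by hand, so here is a minimal Python sketch of the extended Euclidean Algorithm. The function names extended_gcd and inverse are my own. Instead of substituting backwards at the end, it keeps track of the coefficients of the linear combination while it divides, which gives the same result.

```python
def extended_gcd(a, b):
    """Return (g, s, t) such that g = gcd(a, b) = a*s + b*t."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        # Each remainder is kept as a combination of the original a and b
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t

def inverse(a, m):
    """Inverse of a modulo m, as an integer from 0 to m - 1."""
    g, s, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("no inverse: gcd(a, m) is not 1")
    return s % m

print(extended_gcd(123, 2347))
print(inverse(123, 2347))
```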
SOLVING LINEAR CONGRUENCES
A linear congruence has the form
ax ≡ b (mod m)
in which we want to find x. If you relate this to the section above, it has exactly one solution
modulo m if gcd (a, m) = 1. It can be solved by finding the inverse of a. Let’s try an example:
Solve the linear congruence 3x ≡ 4 (mod 7).
We check that gcd (3, 7) = 1, so an inverse of 3 exists, and thus the solution exists.
Using the extended Euclidean Algorithm, we get the inverse of 3 as –2. So multiplying
both sides by –2,
–2 • 3x ≡ –2 • 4 (mod 7)
We know that –2 • 3 ≡ 1 (mod 7), and therefore
x ≡ –8 ≡ 6 (mod 7)
Substituting x = 6 back into the congruence, you can check that the answer is correct. Besides, any integer which is
congruent to 6 modulo 7, like 13, 20, –8 and –1, is also a solution of the linear congruence.
In cases where gcd (a, m) ≠ 1, there are solutions only if gcd (a, m) | b, and then there are gcd (a,
m) solutions modulo m. For example,
2x ≡ 6 (mod 8) has gcd (2, 8) = 2 solutions, but 2x ≡ 5 (mod 8) has no solution. Let’s try to
solve the linear congruence 2x ≡ 6 (mod 8). You can solve it as follows:
Using the simplification law, you see that 2 divides 2, 6 and 8 and therefore
x ≡ 3 (mod 4)
which is in another modulo system. If you want the solution to be in the original modulo system,
then you need to do some modification. By looking at the equation, you know that
x ≡ 3 (mod 8)
is one solution. The other solution is found by adding the new modulus you got above,
which is 4, to 3. You get another solution,
x ≡ 7 (mod 8)
So your solution for the linear congruence 2x ≡ 6 (mod 8) is x ≡ 3 (mod 8), x ≡ 7 (mod 8).
The same method applies in general: when there are 10 solutions, you keep on adding the new
modulus to the previous answer until you get 10 solutions.
Let’s try another one, 2x ≡ 6 (mod 9). Using rule number 9 above, you can quickly see that gcd
(2, 9) = 1, and therefore x ≡ 3 (mod 9). Try not to confuse this one with the one above.
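Putting the cases together, here is a minimal Python sketch that solves ax ≡ b (mod m) and lists every solution from 0 to m – 1. The function name solve_linear_congruence is my own, and it relies on Python's built-in pow(a, -1, m) for the modular inverse, which I am assuming is available (Python 3.8 or later).

```python
from math import gcd

def solve_linear_congruence(a, b, m):
    """Return all x in 0..m-1 with a*x ≡ b (mod m), or [] if there are none."""
    g = gcd(a, m)
    if b % g != 0:
        return []                      # no solution unless gcd(a, m) divides b
    # Simplification law: divide a, b and m by g
    a, b, m_new = a // g, b // g, m // g
    x0 = (pow(a, -1, m_new) * b) % m_new
    # Keep adding the new modulus to get all g solutions modulo m
    return [x0 + k * m_new for k in range(g)]

print(solve_linear_congruence(3, 4, 7))
print(solve_linear_congruence(2, 6, 8))
print(solve_linear_congruence(2, 5, 8))
print(solve_linear_congruence(2, 6, 9))
```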
SIMULTANEOUS LINEAR CONGRUENCES
Similar to the one above, now you have 2 congruences with 2 unknowns, under the same
modulo. Let’s consider a system of linear congruences with 2 unknowns:
ax + by ≡ k (mod m)
cx + dy ≡ n (mod m)
We first write this in matrix form (rows are separated by semicolons):
[a b; c d] [x; y] ≡ [k; n] (mod m)
For this system of congruences to have a unique solution, the matrix must have an inverse modulo m. This
means that the determinant ad – bc must have an inverse modulo m. Let’s multiply the left and right hand sides
by the adjoint matrix:
[d –b; –c a] [a b; c d] [x; y] ≡ [d –b; –c a] [k; n] (mod m)
Now we get 2 linear congruences,
(ad – bc) x ≡ (dk – bn) (mod m)
(ad – bc) y ≡ (an – ck) (mod m)
and for these linear congruences to have a unique solution, again we must make sure that gcd
(ad – bc, m) = 1 holds. With that you can solve the above 2 linear congruences for x and y. This
kind of question came out in STPM 2009, my year. Try solving it with the method I just showed
you.
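Here is a minimal Python sketch of this method. The function name solve_system and the example numbers are my own, made up for illustration (they are not the STPM question). It assumes gcd (ad – bc, m) = 1 and uses Python's pow(det, -1, m) for the inverse of the determinant (Python 3.8 or later).

```python
from math import gcd

def solve_system(a, b, c, d, k, n, m):
    """Solve ax + by ≡ k and cx + dy ≡ n (mod m) when gcd(ad - bc, m) = 1."""
    det = (a * d - b * c) % m
    if gcd(det, m) != 1:
        raise ValueError("ad - bc has no inverse modulo m")
    det_inv = pow(det, -1, m)
    x = det_inv * (d * k - b * n) % m   # from (ad - bc) x ≡ dk - bn
    y = det_inv * (a * n - c * k) % m   # from (ad - bc) y ≡ an - ck
    return x, y

# Hypothetical example: 2x + 3y ≡ 1 (mod 7), x + 4y ≡ 2 (mod 7)
x, y = solve_system(2, 3, 1, 4, 1, 2, 7)
print(x, y, (2 * x + 3 * y) % 7, (x + 4 * y) % 7)
```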
QUADRATIC RESIDUE MODULO M
A quadratic congruence modulo m has the form
x^2 ≡ q (mod m)
If it has a solution, q is called a quadratic residue modulo m. You are supposed to solve the equation for x. I don’t know of any short cut to solve such
a problem, but one way is to list out all the possible values, draw a table, and find the answer.
For example,
Solve the quadratic congruence x^2 ≡ 2 (mod 7).
We proceed to draw a table:
x         | 0 | 1 | 2 | 3 | 4 | 5 | 6
x^2 mod 7 | 0 | 1 | 4 | 2 | 2 | 4 | 1
Therefore, we conclude that x ≡ 3 (mod 7) and x ≡ 4 (mod 7).
If you have noticed, we could actually solve linear congruences with the above trial & error
method too.
If m is very big but composite, we could break the modulo system up. For example,
x^2 ≡ 14 (mod 35)
We can make it into 2 equations, namely x^2 ≡ 14 ≡ 0 (mod 7) and x^2 ≡ 14 ≡ 4 (mod 5).
Tabulating the values,
x^2 ≡ 0 (mod 7) has the solution x ≡ 0 (mod 7).
x^2 ≡ 4 (mod 5) has solutions x ≡ 2 (mod 5) and x ≡ 3 (mod 5)
x ≡ 0 (mod 7) means that
x ≡ 0, 7, 14, 21, 28 (mod 35) are candidates in modulo 35.
x ≡ 2 (mod 5) means that
x ≡ 2, 7, 12, 17, 22, 27, 32 (mod 35) are candidates in modulo 35.
x ≡ 3 (mod 5) means that
x ≡ 3, 8, 13, 18, 23, 28, 33 (mod 35) are candidates in modulo 35.
We find the intersection of x ≡ 0 (mod 7) and x ≡ 2 (mod 5), and we get
x ≡ 7 (mod 35)
And we find the intersection of x ≡ 0 (mod 7) and x ≡ 3 (mod 5), and we get
x ≡ 28 (mod 35)
And our final solution is
x ≡ 7, 28 (mod 35)
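The table method is just trial and error over every residue, so it is easy to sketch in Python. The function name solve_quadratic_congruence is my own, and it simply tries every x from 0 to m – 1, which is fine for small m.

```python
def solve_quadratic_congruence(q, m):
    """Return every x in 0..m-1 with x^2 ≡ q (mod m), by trying them all."""
    return [x for x in range(m) if (x * x - q) % m == 0]

# The table of x^2 mod 7 for x = 0, 1, ..., 6
print([x * x % 7 for x in range(7)])
print(solve_quadratic_congruence(2, 7))
print(solve_quadratic_congruence(14, 35))
```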
MODULAR EXPONENTIATION
I don’t think this is in the syllabus, but it is good for you to know. Modular exponentiations are
of the form a^n mod m. You are normally asked to compute it with a very big value of n. For
example,
Find 3^101 mod 100.
First, do you still remember what binary numbers are? Express the exponent n in binary form, by
repeatedly dividing the number by 2 and writing the remainder by the side. Recall your Form 4
Maths:
So we get 101 = (1100101)_2 = 2^6 + 2^5 + 2^2 + 2^0 = 64 + 32 + 4 + 1
Substituting it back into the expression,
3^(64+32+4+1) mod 100 = 3^64 • 3^32 • 3^4 • 3^1 mod 100
Now, we need to tabulate the values congruent to 3^64, 3^32, 3^4 and 3^1, working modulo 100 and squaring each time:
3^2 ≡ 9
3^4 ≡ 9^2 ≡ 81
3^8 ≡ 81^2 ≡ 61
3^16 ≡ 61^2 ≡ 21
3^32 ≡ 21^2 ≡ 41
3^64 ≡ 41^2 ≡ 81
Now that you know all the values, substitute them back into the expression,
3^64 • 3^32 • 3^4 • 3^1 ≡ 81 • 41 • 81 • 3 ≡ 3 (mod 100)
Spend some time understanding my calculations. If not, just pray that it won’t come out in
exams.
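The binary trick above can be sketched in Python like this. The function name mod_pow is my own. It reads the binary digits of the exponent from the right, keeps squaring the base modulo m, and multiplies in the squares that correspond to a binary digit 1. Python's built-in pow(base, exponent, modulus) does the same job, so you can use it to check.

```python
def mod_pow(base, exponent, modulus):
    """Compute base^exponent mod modulus by repeated squaring (exponent >= 0)."""
    result = 1
    square = base % modulus           # base^(2^0)
    while exponent > 0:
        if exponent & 1:              # this binary digit is 1
            result = result * square % modulus
        square = square * square % modulus   # move on to the next power of 2
        exponent >>= 1
    return result

print(mod_pow(3, 101, 100), pow(3, 101, 100))
```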
CHINESE REMAINDER THEOREM
In the 1st century, the Chinese Mathematician Sun-Tsu asked:
There are certain things whose number is unknown. When divided by 3, the remainder
is 2; when divided by 5, the remainder is 3; and when divided by 7, the remainder is 2.
What will be the number of things?
This puzzle can be translated into the following question: What are the solutions of the systems
of congruences
x ≡ 2 (mod 3)
x ≡ 3 (mod 5)
x ≡ 2 (mod 7) ?
The Chinese Remainder Theorem, named after the Chinese heritage of problems involving
systems of linear congruences, states that when the moduli of a system of linear congruences are
pairwise relatively prime, there is a unique solution of the system modulo the product of the
moduli.
I will omit the proof, because I don’t understand it either. Here are the steps to solve this kind of
problem:
Firstly, for a system of linear congruences with different moduli,
x ≡ a (mod m)
x ≡ b (mod n)
x ≡ c (mod o)
we construct a number M, being the product of the moduli,
M = m • n • o
Then, we construct the numbers M_m, M_n and M_o such that each is the product of all the moduli
in the system other than its own. Which means,
M_m = M/m = n • o, M_n = M/n = m • o, M_o = M/o = m • n
Then, find the inverses M̅_m, M̅_n and M̅_o of M_m, M_n and M_o in their respective moduli:
M_m • M̅_m ≡ 1 (mod m)
M_n • M̅_n ≡ 1 (mod n)
M_o • M̅_o ≡ 1 (mod o)
And finally, your answer will be:
x ≡ a • M_m • M̅_m + b • M_n • M̅_n + c • M_o • M̅_o (mod M)
Let’s try to solve Sun Tsu’s problem.
x ≡ 2 (mod 3)
x ≡ 3 (mod 5)
x ≡ 2 (mod 7)
M = 3 • 5 • 7 = 105,
M_3 = 35 ≡ 2 (mod 3), inverse M̅_3 is 2.
M_5 = 21 ≡ 1 (mod 5), inverse M̅_5 is 1.
M_7 = 15 ≡ 1 (mod 7), inverse M̅_7 is 1.
∴ x ≡ (2 • 2 • 35) + (1 • 3 • 21) + (1 • 2 • 15) ≡ 233 ≡ 23 (mod 105)
Note that you cannot replace M_3, M_5 and M_7 in the final sum by their residues 2, 1 and 1, as you
will get a totally different answer. However, each inverse can be replaced by any other number
congruent to it in its own modulus.
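The steps above can be sketched in Python as follows. This is a minimal sketch, assuming the moduli are pairwise relatively prime; the function name solve_crt is made up for illustration. It relies on the built-in pow(x, -1, m), which returns a modular inverse in Python 3.8 or later.

```python
from math import prod


def solve_crt(remainders, moduli):
    """Solve x ≡ r_i (mod m_i) for pairwise coprime moduli; returns x mod M."""
    M = prod(moduli)                      # product of all the moduli
    total = 0
    for r, m in zip(remainders, moduli):
        M_i = M // m                      # product of all the other moduli
        inverse = pow(M_i, -1, m)         # M_i * inverse ≡ 1 (mod m)
        total += r * M_i * inverse
    return total % M


print(solve_crt([2, 3, 2], [3, 5, 7]))    # Sun-Tsu's puzzle
```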
FERMAT’S LITTLE THEOREM
If p is a prime number and a is an integer not divisible by p, then
a^(p−1) ≡ 1 (mod p)
a^p ≡ a (mod p)
This theorem is here for you to identify if a congruence can be solved easily. Similarly, I won’t
prove it, so just keep this theorem in mind and use it if needed.
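One way the theorem makes a congruence easier: since a^(p−1) ≡ 1 (mod p), a large exponent can first be reduced modulo p − 1. The sketch below is only an illustration under the theorem's assumptions (p prime, a not divisible by p); the function name fermat_reduce is made up.

```python
def fermat_reduce(a, exponent, p):
    """a**exponent mod p, for prime p not dividing a, via Fermat's Little Theorem."""
    small_exponent = exponent % (p - 1)   # because a^(p-1) ≡ 1 (mod p)
    return pow(a, small_exponent, p)


# 7^222 mod 11: the exponent 222 reduces to 222 mod 10 = 2, so we only need 7^2 mod 11.
print(fermat_reduce(7, 222, 11))
```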
10. Graph Theory
10.1 Graphs (simple, complete, bipartite)
In mathematics and computer science, graph theory is the study of graphs, mathematical
structures used to model pairwise relations between objects from a certain collection. A graph,
G = (V, E) consists of V, a nonempty set of vertices / nodes and E, a set of edges. In other
words, a graph is a discrete structure consisting of vertices, and edges that connect these
vertices. Each edge has either one or two vertices associated with it (endpoints). An edge is said
to connect its endpoints. A graph looks something like this:
As you can see, a and b are vertices, while e and f are edges. The edge g is called a loop. The
vertex set V = {a, b}.
In this section, there will be many terms which you should remember, and you should be able
to write down their definitions in your exam. Here we will be learning the different kinds of
graphs and their names:
An infinite graph is a graph with an infinite vertex set (or rather, an infinite number of vertices).
A finite graph is just the opposite: its vertex set is finite. Throughout this section, we will only be
learning about graphs with a finite number of edges and vertices.
A simple graph is a graph in which each edge connects two different vertices and where no
two edges connect the same pair of vertices. A multigraph is a graph that may have multiple edges
connecting the same pair of vertices, while a pseudograph is a graph that may include loops as
well as multiple edges connecting the same pair of vertices. The 3 pictures below illustrate a simple
graph, a multigraph and a pseudograph:
Notice for the multigraph, there are 2 edges connecting a to b, 2 edges connecting a to c, and 3 edges
connecting e to f. As for the pseudograph, there exist loops at the vertices e and f.
The complement of a simple graph G, written G̅, has the same vertices as G, but two distinct
vertices a and b are joined by an edge in G̅ exactly when they are not joined by an edge in G.
This only applies to simple graphs. For
example, below is a graph and its complement:
All the above graphs are undirected, which means that one can traverse an edge in both directions.
A directed graph (or digraph) consists of a nonempty set of vertices and a set of directed
edges (or arcs). Each directed edge is associated with an ordered pair of vertices. The directed
edge associated with the ordered pair (u, v) is said to start at u and end at v. In other words,
we say that u is adjacent to v, while v is adjacent from u. Notice the different uses of { } and ( )
brackets for undirected and directed graphs. Below is a directed graph:
For the ordered pair of vertices (u, v), we say that u and v are adjacent, and we say that the edge
is incident with (connects) u and v. u is known as the initial vertex and v as the terminal vertex.
Using a similar naming convention, we can describe a simple directed graph as a directed
graph in which each edge connects two different vertices and where no two edges connect the
same ordered pair of vertices. Then similarly, a directed multigraph can be defined.
An underlying undirected graph is the undirected graph that results from ignoring the directions of
edges. It is just the same graph without the arrows. A mixed graph is a graph with both directed
and undirected edges. The converse of a directed graph is the graph in which all its arrows are
reversed.
For every graph, we could come up with subgraphs, which are graphs whose vertex set and edge
set are subsets of those of the initial graph. For example, the graph
can be broken down into 11 subgraphs below:
An exercise for you here is to try to figure out whether you can determine the total
number of subgraphs, given V and E.
A bipartite graph is a simple graph whose vertex set V can be partitioned into 2 disjoint
sets V_1 and V_2 such that every edge in the graph connects a vertex in V_1 to a vertex in V_2.
Consider the bipartite graph below:
Notice that I coloured the vertices with 2 colours, red and blue. The blue vertices will not
connect to any other blue vertex, and the red vertices too, they don’t connect to any other red
vertex. The graph is partitioned such that there are two sets or parties of vertices which can be
grouped together. Identifying a bipartite graph is simple: as long as you can colour the vertices
with only 2 colours such that no two adjacent vertices have the same colour, it is a bipartite graph.
For example, you colour the first vertex blue. The vertices adjacent to the first vertex must be
coloured red, their neighbours blue again, and so on. If you can finish without ever giving two
adjacent vertices the same colour, then it is a
bipartite graph. Notice also that a graph is bipartite if and only if it has no odd cycles. We will
learn about cycles in the next section.
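The 2-colouring test described above can be sketched in Python. This is a minimal illustration, assuming the graph is given as a dictionary mapping each vertex to a list of its neighbours; the function name two_colour and the two small example graphs are made up.

```python
from collections import deque


def two_colour(adjacency):
    """Return a dict vertex -> 0 or 1 if the graph is bipartite, otherwise None."""
    colour = {}
    for start in adjacency:                 # handle every connected component
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]   # neighbours get the other colour
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None                 # two adjacent vertices clash
    return colour


square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}   # a 4-cycle
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}                  # a 3-cycle

print(two_colour(square))     # a valid colouring, so the 4-cycle is bipartite
print(two_colour(triangle))   # None, because a triangle is an odd cycle
```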
Now there are 5 types of special simple graphs I want to introduce:
1. Complete Graph K_n
This graph is a simple graph that contains exactly 1 edge between each pair of distinct vertices.
In other words, this graph has the maximum number of edges it can have, and adding any edge
between any 2 vertices will turn it into a multigraph. The graphs look as follows:
For the K_4 graph, it has 4 vertices, and every vertex is connected to the other 3 vertices. By
simple calculations, a K_n graph has n vertices and n(n−1)/2 edges.
2. Cycle Graph C_n
This graph, where n ≥ 3, consists of n vertices and n edges joined in a single closed loop.
Strictly speaking, C_2 is not a cycle graph, as n < 3. Notice that every vertex is connected to
exactly two other vertices. It looks like a regular polygon with n sides.
3. Wheel Graph W_n
This graph looks like a wheel with n sides. We obtain the wheel when we add an additional
vertex to the cycle C_n, for n ≥ 3, and connect this new vertex to each of the n vertices in C_n by
new edges.
A W_n graph has n + 1 vertices and 2n edges.
4. n-Dimensional Hypercube, Q_n
This graph, also known as the n-cube, is the graph whose vertices represent the 2^n bit strings of
length n. Two vertices are adjacent if and only if the bit strings that they represent differ in exactly one
bit position. I don’t think this graph is in the syllabus, but I think it will be good for you to know:
This graph has 2^n vertices and n • 2^(n−1) edges. Try proving this if you are free.
5. Complete Bipartite Graph, K_{m,n}
This graph is just a bipartite graph, with m vertices in V_1 and n vertices in V_2, in which there is
exactly 1 edge between every vertex of V_1 and every vertex of V_2. Note that the number of edges
is |E| = mn, and there are m + n
vertices.
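The vertex and edge counts for these special graphs can be checked by actually building the edge lists. The sketch below is only an illustration; the function names and the way vertices are labelled (integers, with the hub of the wheel as vertex n) are choices made for this example.

```python
from itertools import combinations


def complete_graph_edges(n):
    """K_n: one edge for every pair of distinct vertices 0..n-1."""
    return list(combinations(range(n), 2))


def wheel_edges(n):
    """W_n (n >= 3): the cycle C_n on vertices 0..n-1 plus a hub, labelled n."""
    rim = [(i, (i + 1) % n) for i in range(n)]
    spokes = [(i, n) for i in range(n)]
    return rim + spokes


def hypercube_edges(n):
    """Q_n: vertices are 0..2^n - 1 read as bit strings; flipping one bit gives a neighbour."""
    edges = []
    for v in range(2 ** n):
        for bit in range(n):
            w = v ^ (1 << bit)            # flip exactly one bit
            if v < w:                     # record each edge once
                edges.append((v, w))
    return edges


def complete_bipartite_edges(m, n):
    """K_{m,n}: every vertex of the first set is joined to every vertex of the second."""
    return [(("V1", i), ("V2", j)) for i in range(m) for j in range(n)]


print(len(complete_graph_edges(5)), len(wheel_edges(5)),
      len(hypercube_edges(3)), len(complete_bipartite_edges(2, 3)))
```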
Now that we know about the structure of graphs, we shall get into a few
calculations. The degree of a vertex is the number of edges incident with it, except that a loop at a
vertex contributes twice to the degree of that vertex. The degree of a vertex v is denoted by
deg(v). When deg(v) = 0, we say that the vertex is isolated, and when deg(v) = 1, we say that the
vertex is pendant.
We now want to find the relationship between the sum of the degrees of the vertices and the number
of edges. The Handshaking Theorem states that the sum of the degrees of the vertices is double the
number of edges. In equation form, for an undirected graph G = (V, E), we have
Σ_{v∈V} deg(v) = 2|E|
This theorem has many implications. One of them is that a graph cannot exist if the
sum of the degrees of its vertices is odd.
In the case of directed graphs, we denote by deg+(v) the out-degree, meaning the number of
arcs pointing away from the vertex, while the in-degree is denoted by deg-(v), which is the
number of arcs pointing towards the vertex. Modifying the handshaking theorem, we have
Σ_{v∈V} deg-(v) = Σ_{v∈V} deg+(v) = |E|
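A quick way to convince yourself of the Handshaking Theorem is to count degrees from an edge list. The sketch below is a minimal illustration; the function name degrees and the small pseudograph (with a double edge and a loop) are made up for this example.

```python
def degrees(vertices, edges):
    """Degree of every vertex of an undirected graph given as a list of (u, v) edges."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1          # for a loop (u == v) this adds 2 in total, as required
    return deg


vertices = ["a", "b", "c"]
edges = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "c")]   # double edge a-b, loop at c

deg = degrees(vertices, edges)
print(deg, sum(deg.values()), 2 * len(edges))
```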
10.2 Paths & Cycles (walk, trail, circuit, cycle, Eulerian, Hamiltonian)
We have already learnt the different types of graphs. Now we are going to learn the properties of
these graphs. This section, again, will be full of terms to be remembered. Some have
quite similar meanings, so take note not to confuse them.
1. A walk is an alternating sequence of vertices and edges of a graph. A closed walk is
defined to be a walk which starts and ends with the same vertex.
2. A path is a sequence of edges that begins at a vertex of a graph and travels from vertex
to vertex along the edges of the graph. A simple path is then a path which doesn’t
contain the same edge more than once. The length of a path is the amount of edges that
the path contains.
3. A trail is a walk that has no repeated edges. In some cases when the word trail is used, a
path could mean a walk that has no repeated vertices. Take note of the double meaning of
the word path.
4. A circuit is a path that begins and ends at the same vertex in an undirected graph. A
simple circuit is then a circuit with no repeated edges.
5. A cycle is a path that begins and ends at the same vertex in a directed graph.
6. The degree sequence is the list of the degrees of all the vertices of the graph. It is
listed in non-increasing (which means, decreasing lah…) order.
To illustrate these terms, I will show you an example below:
* You can construct many walks in this graph. An example of a walk here is (v1, e1, v2, e6, v3, e7,
v4). Remember that a walk starts and ends with a vertex.
* An example of a path is e1, e6, e7 or e2, e5, e7, which both have length 3.
* A trail can be just as the walk above, (v1, e1, v2, e6, v3, e7, v4), as long as there are no repeated
edges.
* A circuit here can be something like e1, e6, e5, e2 or e3, e6, e2. It would be called a cycle if it
were a directed graph. Note that you can only follow the directions of the arrows for a cycle.
* The degree sequence for v1, v2, v3, v4 is 6, 4, 3, 1. Remember that the sum of the degree of
vertex should be an even number.
CONNECTEDNESS
A graph is connected when there’s a path between every pair of distinct vertices of the graph.
This means that, if I start walking from any vertex, I am able to reach any other vertex by
traversing the edges (by the way, you pass through a vertex, but traverse an edge).
In the case of a directed graph, we say that it is strongly connected when there is a path a → b
and a path b → a whenever a and b are vertices of the graph. It is weakly connected when
there is a path between every 2 vertices in the underlying undirected graph. Having a path a → b
or b → a for every pair of vertices is enough to make a digraph weakly connected, but it is not
required.
A connected component is a maximal connected subgraph, that is, a connected subgraph which is
not contained in any larger connected subgraph. For example, the graph
has 3 connected components. Similarly, in a directed graph, a strongly connected
component / strong component is a maximal strongly connected subgraph.
Cut vertices (articulation points) are vertices whose removal, together with all the edges incident
with them, produces more connected components than the original graph has.
Cut edges (bridges) are edges whose removal produces a graph with more
connected components than the original graph.
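Counting connected components and testing for a cut vertex can be sketched in Python. This is a minimal illustration, assuming an undirected graph stored as a dictionary of neighbour lists; the function names and the example graph are made up.

```python
def count_components(adjacency):
    """Count the connected components of an undirected graph {vertex: [neighbours]}."""
    seen = set()
    count = 0
    for start in adjacency:
        if start in seen:
            continue
        count += 1                        # start lies in a component not visited yet
        stack = [start]
        while stack:                      # depth-first search from start
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(adjacency[u])
    return count


def is_cut_vertex(adjacency, v):
    """True if deleting v and its incident edges increases the number of components."""
    reduced = {u: [w for w in nbrs if w != v]
               for u, nbrs in adjacency.items() if u != v}
    return count_components(reduced) > count_components(adjacency)


# Path a-b-c, a separate edge d-e, and an isolated vertex f.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"], "f": []}

print(count_components(graph), is_cut_vertex(graph, "b"), is_cut_vertex(graph, "a"))
```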
ISOMORPHIC GRAPHS
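We say that 2 graphs are isomorphic if there is a one-to-one correspondence between their
vertices such that 2 vertices are adjacent in one graph exactly when the corresponding vertices are
adjacent in the other. To check this in practice:
1. Compare the number of vertices, the number of edges, the degree sequence and the lengths of
the simple circuits (these properties are known as graph invariants). If any of them differ, the
graphs are not isomorphic.
2. If they all agree, follow paths that go through all vertices so that the corresponding vertices of
the 2 graphs have the same degree, and check that this matching preserves every edge.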
For example, the graphs below:
You notice something? A and R are isomorphic graphs. Also, F and T, K and X, and M, S, V
and Z are isomorphic graphs. It is easy to identify isomorphic graphs for small amount of
vertices, but remember to use the above rule if the graphs are really complicated.
EULERIAN TRAILS & CIRCUITS
The town of Königsberg, Prussia was divided into 4 sections by the branches of the Pregel
River. These 4 sections included the 2 regions on the banks of the Pregel (A & B), Kneiphof
Island (C), and the region between the 2 branches of the Pregel (D). In the 18th century, 7
bridges connected these regions.
On Sundays, the residents took long walks through the town. They wondered whether it was
possible to start at some location in the town, travel across every bridge without crossing any
bridge twice and return to the starting point. Do you want to try and see whether you can find a
simple circuit for them?
We’ll come back later. An Eulerian Trail (or Euler Path) is a simple trail that contains every
edge in the graph, while an Eulerian Circuit (or Euler Circuit) is a simple circuit containing
every edge in the graph. For example,
An Euler Circuit in the graph is e3, e1, e2, e6, e7, e5, e4, e9, e10, e8. Take note that an Euler path
doesn’t necessarily return to the same vertex, but an Euler circuit has to. By the way, the word
‘Euler’ is read as ‘Oil-lerr’, not ‘you-lerr’.
It is found that there are conditions for these Eulerian properties to exist in a connected graph:
An Eulerian trail (which is not a circuit) exists if and only if there are exactly 2 vertices with odd degree.
An Eulerian circuit exists if and only if every vertex in the graph has even degree.
Take a look at the graph above, you will see that every vertex has degree 4. That explains why an
Eulerian circuit exists.
Now, let’s rephrase our previous question. So are we able to find an ‘Eulerian circuit’ for the
Königsberg bridges? First, we draw the bridges as edges, the river banks and islands as vertices.
We get a graph like this:
Counting the degrees of the vertices, we find that all 4 vertices have odd degree. Therefore, we
can conclude that it is impossible to cross all 7 bridges exactly once and come back to the same
spot. Neither is it possible to cross every bridge exactly once without returning, since more than 2
vertices have odd degree. Although we didn’t find a solution, we
proved that a solution can’t be found.
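The degree-counting argument can be sketched in Python. This is a minimal illustration that assumes the graph is connected (it does not check this); the function name euler_status is made up, and the edge list encodes the 7 bridges between the regions A, B, C and D described above.

```python
from collections import Counter


def euler_status(edges):
    """Classify a connected multigraph, given as (u, v) edges, by its odd-degree vertices."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = [v for v in degree if degree[v] % 2 == 1]
    if len(odd) == 0:
        return "circuit"      # every vertex has even degree
    if len(odd) == 2:
        return "trail"        # exactly 2 vertices have odd degree
    return "neither"


# The 7 bridges: two from each bank (A, B) to the island C, one from each bank to D, one C-D.
konigsberg = [("A", "C"), ("A", "C"), ("B", "C"), ("B", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]

print(euler_status(konigsberg))
```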
HAMILTONIAN PATHS & CYCLES
Just now we did for edges, now we do the same for vertices. A Hamiltonian path (or Hamilton
path) is a simple path in a graph that passes through every vertex exactly once, whereas a
Hamiltonian cycle (or Hamilton circuit) is a simple cycle in a graph that passes through every
vertex exactly once. Look at the graph below:
The red lines show a Hamiltonian cycle. Note that the word cycle here does not necessarily
mean that it has to be a directed graph. A Hamiltonian path doesn’t need to start and end at the
same vertex.
Surprisingly, there is no known simple necessary and sufficient criterion for the existence of
Hamiltonian cycles. However, there are 2 theorems here which give sufficient conditions. I don’t
think this is examinable:
1. Dirac’s Theorem: A simple graph G with n vertices, n ≥ 3, has a Hamiltonian cycle if the degree
of every vertex is at least n/2, half of the number of vertices.
Note that this theorem doesn’t say anything about graphs in which some vertex has degree less than
half of the number of vertices. Such a graph might or might not have a Hamiltonian cycle, and you
have to check it through.
2. Ore’s Theorem: A simple graph G with n vertices, n ≥ 3, has a Hamiltonian cycle if for every pair of nonadjacent vertices u and v, deg (u) + deg (v) ≥ n.
I believe Hamiltonian cycle questions in STPM won’t be too hard, so don’t worry too much
about it for now. Questions on Euler and Hamilton paths and circuits will involve identifying
them, or focusing on what you can do to make an Euler or Hamilton circuit exist.
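The two sufficient conditions can be checked mechanically. The sketch below is only an illustration, assuming a simple graph stored as a dictionary of neighbour sets; the function names are made up. The cycle C_5 is included to show that a graph can fail both tests and still have a Hamiltonian cycle (the cycle itself).

```python
from itertools import combinations


def satisfies_dirac(adjacency):
    """Dirac's condition: n >= 3 and every degree is at least n/2."""
    n = len(adjacency)
    return n >= 3 and all(2 * len(nbrs) >= n for nbrs in adjacency.values())


def satisfies_ore(adjacency):
    """Ore's condition: n >= 3 and deg(u) + deg(v) >= n for all nonadjacent u, v."""
    n = len(adjacency)
    return n >= 3 and all(
        len(adjacency[u]) + len(adjacency[v]) >= n
        for u, v in combinations(adjacency, 2)
        if v not in adjacency[u]
    )


k4 = {v: {w for w in "abcd" if w != v} for v in "abcd"}     # complete graph on 4 vertices
c5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}      # cycle on 5 vertices

print(satisfies_dirac(k4), satisfies_ore(k4))
print(satisfies_dirac(c5), satisfies_ore(c5))
```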
OTHER EXTRA INFORMATION
These are probably out of the syllabus, but I treat it as extra information for you:
1. Planar Graphs
A planar graph is a graph that can be drawn in the plane without any edges crossing. In this case,
a region is the area bounded by a circuit in the graph. There are quite a few corollaries for planar
graphs, for example,
* If G is a connected planar simple graph, then G has a vertex of degree not exceeding 5.
* If G is a connected planar simple graph with e edges and v vertices, where v ≥ 3, then
3v – 6 ≥ e
* If a connected planar simple graph has e edges and v vertices with v ≥ 3, and has no circuits of
length three, then 2v – 4 ≥ e.
* Kuratowski’s Theorem states that a graph is nonplanar iff it contains a subgraph
homeomorphic to K3,3 or K5. 2 graphs are homeomorphic if they can be obtained from the same
graph by a sequence of elementary subdivision, which is the action of putting a vertex in the
middle of an edge.
The crossing number is the minimum number of crossings that can occur when a graph is drawn
in the plane where no 3 edges are permitted to cross at the same point. The thickness of a graph G is
the smallest number of planar subgraphs of G that have G as their union.
2. Euler’s Formula
For a connected planar simple graph, the number of regions can be represented by the equation r
= e – v + 2.
3. Chromatic Number
We define the chromatic number χ(G) to be the least number of colours needed for a colouring
of a graph. Colouring is defined to be the assignment of a colour to each vertex of the graph so
that no 2 adjacent vertices are assigned the same colour. Recall what you learnt about bipartite
graphs in the previous section. The Four Colour Theorem states that the chromatic number of a
planar graph is never greater than 4. We say that a graph G is chromatically k-critical if the
chromatic number of G is k, but deleting any edge e from G gives a graph with chromatic number
k – 1.
We could modify the definition of colouring to describe edges too. Edge colouring is an
assignment of colours to edges so that edges incident with a common vertex are assigned
different colours. The edge chromatic number is the smallest number of colours that can be
used in an edge colouring of a graph. The edge chromatic number is at least the largest vertex
degree in the graph, and for a simple graph it is either that number or one more (Vizing’s theorem).
4. Vertex Basis
A vertex basis is a set of vertices where there’s a path to every vertex outside this set from
vertices of this set, and there’s no path from any vertex in the set to another vertex in the set. In
other words, any vertex in a directed graph which only points outwards but not inwards
belongs to the vertex basis.
10.3 Matrix Representation (adjacency & incidence, problem models)
In many cases and situations, we need to represent a graph in the form of mathematical equations
or formulas, as this will help us analyse the graph more easily. Before we learn how to use a matrix to
represent a graph, we’ll first consider representing a graph in a list. We call it an adjacency list.
By the way, the word ‘adjacent’ means ‘next to’. This list shows us the vertices, and its adjacent
vertices and how they are related. For example, the graph below is represented by its adjacency
list on the right.
Notice that some adjacent vertices are double counted due to multiple edges. You should be able
to create an adjacency list from a given graph, and also sketch the graph with a given adjacency
list.
The above adjacency list was for an undirected graph. An adjacency list for a directed graph
looks like the one below:
Sometimes it really doesn’t matter whether you have dots or circles. Notice that this adjacency
list has its top row labelled with the initial and terminal vertices, which differs from the previous
one.
As you know, anything that can be represented in a table can be represented in a matrix. There
are 2 kinds of matrices that we can represent a graph with:
1. Adjacency Matrix
Consider the graph above. Once we label all the vertices, we can represent it as an
adjacency matrix, with the rows and columns being the vertices. When there is an edge
connecting 2 vertices v1 and v2, the corresponding slot in the matrix will show the number 1; if
there are 2 edges, then it will be 2, and so on; if there is no edge, it will be 0. Note that if there
exists a loop at a vertex, it only counts as 1 edge, unlike in the degree of the vertex, where we
counted it as 2. The adjacency matrix of
the graph above is like the one below.
You don’t really need to label the vertices in the matrix, I put it there for clarity. An adjacency
matrix for a directed graph is slightly different. The rows represent the initial vertices, while the
columns represent the terminal vertices. Take a look at the graph and its adjacency matrix
below:
This matrix is the more useful one. It helps us to find the number of paths of a certain length
between a pair of vertices (here a path is allowed to repeat vertices and edges). The exponent n in
M^n is the length of the paths being counted. So if we square the matrix above, we get M^2,
a new matrix which shows us the number of paths of length 2 from one vertex to another.
It means, there is 1 path of length 2 from a to a, 3 paths of length 2 from b to a, etc. When I
find M^3, it will be paths of length 3, and so on. With this, we are able to find the length of the
shortest path from one vertex to another (the smallest n for which the entry of M^n is not zero) and
also the number of paths of a particular length
from one vertex to another. In cases where you have a graph with more than 4 or 5 vertices, you may
want to multiply only the particular row and column needed to find the required answer, as
evaluating the whole matrix will waste your time.
One more thing to note, is that the sum of a row is not the degree of vertex for a pseudograph,
since the count of loops will be wrong. You need to double count the loops in order to get the
correct degree of vertex.
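Powers of an adjacency matrix can be computed with a few lines of Python. The sketch below is only an illustration: the digraph on a, b, c (arcs a→b, a→c, b→c, c→a) is a made-up example, not the graph from the figure, and the helper names mat_mul and mat_pow are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, power):
    """A multiplied by itself `power` times (power >= 0)."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]   # identity matrix
    for _ in range(power):
        result = mat_mul(result, A)
    return result


# Rows are initial vertices, columns are terminal vertices, in the order a, b, c.
M = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]

M2 = mat_pow(M, 2)
print(M2)          # entry in row a, column c counts the paths a -> ? -> c of length 2
```

In this example the entry in row a, column a of M2 is 1, coming from the single path a → c → a.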
2. Incidence Matrix
To create an incidence matrix, you need to label all the edges as well. For this matrix, the rows are
the vertices, but the columns are the edges. So for every slot in the matrix, it will be 1 if its
vertex is an endpoint of that edge and 0 otherwise. See the graph above, and its incidence
matrix below:
Multiplying this matrix with itself won’t get you anywhere. Notice that every column adds up to
2, except if it is a loop. This is because every edge is connected to 2 vertices. Again, similar to
the adjacency matrix, adding up the rows will not give you the degree of a vertex if the graph is a
pseudograph.
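Building an incidence matrix from a labelled edge list can be sketched as follows. This is a minimal illustration; the function name incidence_matrix and the small pseudograph (a triangle with a loop at c) are made up for this example.

```python
def incidence_matrix(vertices, edges):
    """Rows are vertices, columns are edges; an entry is 1 if the vertex is an endpoint."""
    row = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(edges) for _ in vertices]
    for j, (u, v) in enumerate(edges):
        matrix[row[u]][j] = 1
        matrix[row[v]][j] = 1     # for a loop (u == v) the column contains a single 1
    return matrix


vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "c")]   # the last edge is a loop

matrix = incidence_matrix(vertices, edges)
column_sums = [sum(r[j] for r in matrix) for j in range(len(edges))]
print(matrix, column_sums)
```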
PROBLEM MODELLING
Graph theory has many uses, as it can help us solve some complicated, as well as some simple
daily problems. In planning for a flight route, you could construct a graph where the vertices are
places, edges exist when there is a flight between the two places. In an assignment of a team to
do a particular project, we can use graphs to assign who to do what, after knowing the abilities
and talents of the individuals.
Some examples of graph models:
Acquaintance graphs:
Vertices represent people. ab is an edge if a and b know each other.
Influence graphs:
In studies of group behaviour, it is observed that certain people can influence the thinking of
others. A digraph can be used to model this: uv is a directed edge if u can influence v.
Call graphs:
Directed multigraphs can be used to model telephone calls in a network. Here telephones are
represented as vertices and each call from a to b is represented by ab. From this graph, we can
actually deduce who has changed his phone number by viewing who the new phone line
contacts, and who has not used his phone for a long while.
Web graphs: The World Wide Web can be modelled as a digraph where each webpage is
represented by a vertex. There is a directed edge ab if there is a link on a pointing to b.
Precedence graph:
Computer programs can be executed more rapidly by executing certain statements concurrently.
However, certain statements depend on the results from other statements. Thus we create a
precedence graph. Here each vertex represents a statement. A directed edge ab means that a
must be executed before b.
Dual graph:
The resulting graph of a map. You represent a map with a graph. For example,
The red graph G’ is the dual graph of the existing graph G. The red dots represent a region, and
an edge exists between two vertices when the regions are adjacent to each other.
A more interesting one is the weighted graph, which is a graph that has a number assigned to
each edge. These numbers could represent the distance between 2 cities, the airfare between 2
cities, and we could actually come out with a shortest / cheapest path from a place to the other.
Let me give you an example:
Let’s say, the letters A to G each represent the name of a town, and the numbers represent the
time (minutes) to get from one town to the next by car. We want to find the path which takes the
shortest time to get from town A to town G. To do this, we will make use of Dijkstra’s Algorithm. I won’t
explain this algorithm in words (as even I don’t understand what the textbook talks about), but
I’ll briefly show you how it’s done.
Starting from A, you find the shortest edge to its adjacent town. It is D, which takes 2 minutes.
Then find the shortest time to reach B, and we see that it takes 7 minutes. Now, all the adjacent
towns of A are done, we shall proceed to the adjacent towns to D and B, which are C, E and F.
From A, the shortest distance to F, you get there either by passing B or D, don’t you? And
obviously, you make use of the route with the shortest time to both of those towns. So make use
of those 2 routes, we find that the shortest time to F is 11 minutes (ADF). We further find that
the shortest time to E and C are 14 minutes (ABE) and 15 minutes (ABC) respectively. Using
these 3 routes, we find the shortest time to get to G, which is obviously, the ADFG route, 22
minutes.
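Dijkstra's Algorithm can be sketched in Python with a priority queue. Since the figure is not reproduced here, the edge weights below are hypothetical: they were chosen only so that they agree with the times quoted above (D at 2, B at 7, F at 11, E at 14, C at 15 and G at 22 minutes). The function names dijkstra and route are also made up for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest times from source to every reachable vertex, plus the predecessor of each."""
    distance = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        dist_u, u = heapq.heappop(queue)       # closest vertex not yet finalised
        if dist_u > distance[u]:
            continue                           # stale queue entry, skip it
        for v, weight in graph[u].items():
            candidate = dist_u + weight
            if candidate < distance.get(v, float("inf")):
                distance[v] = candidate        # found a quicker way to reach v
                previous[v] = u
                heapq.heappush(queue, (candidate, v))
    return distance, previous


def route(previous, target):
    """Rebuild the path to target by walking back through the predecessors."""
    path = [target]
    while path[-1] in previous:
        path.append(previous[path[-1]])
    return "".join(reversed(path))


# Hypothetical travel times in minutes (not taken from the original figure).
roads = [("A", "D", 2), ("A", "B", 7), ("B", "D", 6), ("D", "F", 9), ("B", "F", 5),
         ("B", "E", 7), ("B", "C", 8), ("C", "E", 3), ("E", "F", 4),
         ("E", "G", 10), ("C", "G", 9), ("F", "G", 11)]
graph = {}
for u, v, minutes in roads:                    # roads can be driven in both directions
    graph.setdefault(u, {})[v] = minutes
    graph.setdefault(v, {})[u] = minutes

distance, previous = dijkstra(graph, "A")
print(distance["G"], route(previous, "G"))
```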
Try googling for the Travelling Salesman Problem. It has something related to this issue over
here.
BAB 11
11. Transformation Geometry
11.1 Transformation (isometries, similarity transformation, stretch & shears)
A transformation is a correspondence between 2 sets of points in a plane. A transformation T
is described as a linear transformation of n-dimensional space when it has the properties
T(λx) = λT(x), and
T(λx + μy) = λT(x) + μT(y)
where λ and μ are arbitrary constants.
Recalling your Form 4 Mathematics, you learned how to find the image of points on the
Cartesian plane under a certain transformation. Here you will further learn how to use matrices
and some simple linear algebra to represent transformations in 2 dimensions only.
An equation of a transformation looks like this:
\begin{pmatrix} x' \\ y' \end{pmatrix} = M \begin{pmatrix} x \\ y \end{pmatrix}
where M is a matrix of transformation. The matrix M,
M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
will determine how the point (x, y) will transform into its image (x’, y’). The matrix M is easy to
compose. Basically, the first column (a, c) is the image of (1, 0) and the second column (b, d) is
the image of (0, 1),
where (1, 0) and (0, 1) are the unit vectors of directions x and y respectively (or rather, you can
treat these 2 vectors as points on the x and y plane). For example, if I want to transform the point
(1, 0) to (2, 0), and the point (0, 1) to (0, 2), then my matrix of transformation will be
M = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}
So if you want to find the transformation of a unit box, (0, 0), (1, 0), (0, 1) and (1, 1), just use
this matrix and pre-multiply with the points, then you will get the image of the transformation.
An example will be given in the next section.
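Composing M from the images of (1, 0) and (0, 1), and then applying it to the unit box, can be sketched in Python. This is only an illustration; the function names matrix_from_images and apply are made up, and a matrix is stored as a pair of rows.

```python
def matrix_from_images(image_of_i, image_of_j):
    """The columns of M are the images of (1, 0) and (0, 1)."""
    return ((image_of_i[0], image_of_j[0]),
            (image_of_i[1], image_of_j[1]))


def apply(matrix, point):
    """Pre-multiply the point (as a column vector) by the 2 x 2 matrix."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


M = matrix_from_images((2, 0), (0, 2))          # (1, 0) -> (2, 0) and (0, 1) -> (0, 2)
unit_box = [(0, 0), (1, 0), (0, 1), (1, 1)]
images = [apply(M, p) for p in unit_box]
print(M, images)
```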
Knowing how a transformation matrix works, we now want to learn how to represent a few types
of linear transformation with 2 × 2 matrix. We learned the 3 isometries: translation, rotation and
reflection in Form 4. Now we will go through them again, and then we will learn some new ones
too. By the way, an isometry is a distance-preserving map between metric spaces. Geometric
figures which can be related by an isometry are called congruent. This means that, after an
isometric transformation, the area remains unchanged.
1. Translation
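Translation is just the moving of coordinates, moving an object from one point to another
without altering its size, shape and orientation. The equation below represents a translation,
\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a \\ b \end{pmatrix}
where a and b are the amounts of shift of the object in the x and y directions. For example, the
vector (1, 2) will translate the point (x, y) one step right and 2 steps upward. Note that a
translation is added to the point, not multiplied like a 2 × 2 matrix.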
2. Rotation
Given an angle θ, a point is rotated about the origin either clockwise or anticlockwise. An
anticlockwise rotation through θ can be represented by the matrix
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
(use a negative θ for a clockwise rotation). Note that this matrix is restricted to rotation about the
origin only. We will discuss later what to do
if the centre of rotation is not the origin. The area and the shape of the object are unchanged, and once
rotated through 360°, the object gets back to its initial position.
3. Reflection
For a reflection, you need a line which acts like a “mirror”, such that every point is reflected to
the other side of the line, equidistant from and perpendicular to that line. This line, in this case,
must pass through the origin. Again, the shape of the object doesn’t change, and neither does the
area. A few common reflection matrices are as follows:
along x-axis: \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
along y-axis: \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}
along the line y = x: \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
It is actually a little tedious to find the matrix of reflection given only a line in the form
y = mx. First, you find the normal line, y = –(1/m)x + c. Substitute the points (1, 0) and (0, 1) to
find the two parallel normal lines which pass through these 2 points. Next, you find the
intersection points of these 2 lines with the line of reflection. Taking each intersection point as
the mid point, you probably know how to figure out where the reflected points of (1, 0) and (0, 1)
are, thus completing your matrix.
But there is a faster way. Let the line of reflection y = mx be written in the form y = (tan θ)x.
We see that the gradient m = tan θ. With this information, we find θ, and the reflection matrix is
just represented by
\begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix}
You can try figuring out why this is true. This has something to do with the angles subtended
from the point to the origin, then the angle of the line, the uses of cosine and sine, etc. To find
cos 2θ and sin 2θ, you could either calculate θ, or make use of the trigonometric identities
cos 2θ = (1 – m^2)/(1 + m^2) and sin 2θ = 2m/(1 + m^2).
4. Scaling
Scaling does not preserve the size, but it preserves the proportions of the object. This scaling starts
from the origin. Scaling can be represented by the matrix
\begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}
where a is a constant. If |a| > 1, then it is an enlargement. If |a| < 1, then it is a contraction, that
means the size decreases. A negative value of a makes the object enlarge or contract in the opposite
direction. In the case of the red box above, it will enlarge into the 3rd quadrant instead of the 1st. a
is also the scale factor of the enlargement. a = 2 means that the lengths in the image will be twice
those of the object, and so on.
5. Stretch
A stretch looks similar to an enlargement, but this time, the ratio of the sides and the shape are not
preserved. It can be a stretch along the x-axis, along the y-axis, or a stretch along both axes with
different proportions. A stretch is represented as below:
along x-axis: \begin{pmatrix} a & 0 \\ 0 & 1 \end{pmatrix}
along y-axis: \begin{pmatrix} 1 & 0 \\ 0 & a \end{pmatrix}
You probably could have guessed that values of |a| < 1 turn the stretch into a compression,
while a negative value of a stretches the object the other way. For a stretch, it really doesn’t
matter whether it stretches from the origin or some other point, as they are the same anyway.
6. Shear
A shear deforms a shape a little. It turns a square into a parallelogram, as shown above. It looks
as if we are flattening something sideways. The shear can be represented by the matrices below:
parallel to x-axis: \begin{pmatrix} 1 & \tan\theta \\ 0 & 1 \end{pmatrix}
parallel to y-axis: \begin{pmatrix} 1 & 0 \\ \tan\theta & 1 \end{pmatrix}
2-way shear at different angles: \begin{pmatrix} 1 & \tan\theta \\ \tan\phi & 1 \end{pmatrix}
The angle θ is calculated from the opposite axis. For example, the box above undergoes a shear
parallel to the x-axis, and the angle is calculated clockwise from the positive y-axis. If the angle
was 45°, we say that it is a shear of 45° parallel to the x-axis. Conversely, it can be a shear of x°
parallel to the y-axis, which looks like the one below:
The shear depends on the origin too.
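A few of the standard matrices from this list (anticlockwise rotation, reflection in y = mx and shear parallel to the x-axis) can be sketched in Python. This is only an illustration: the function names are made up, matrices are stored as pairs of rows, and the shear is assumed to have the form with tan θ in the top-right corner. The reflection uses the identities cos 2θ = (1 – m²)/(1 + m²) and sin 2θ = 2m/(1 + m²), so θ never needs to be calculated.

```python
import math


def rotation(theta):
    """Anticlockwise rotation through theta (in radians) about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def reflection_in_line(m):
    """Reflection in the line y = m x through the origin."""
    c = (1 - m * m) / (1 + m * m)     # cos 2θ
    s = 2 * m / (1 + m * m)           # sin 2θ
    return ((c, s), (s, -c))


def shear_parallel_to_x(theta):
    """Shear parallel to the x-axis, angle theta measured from the y-axis."""
    return ((1, math.tan(theta)), (0, 1))


def apply(matrix, point):
    """Pre-multiply the point (as a column vector) by the 2 x 2 matrix."""
    (a, b), (c, d) = matrix
    x, y = point
    return (a * x + b * y, c * x + d * y)


print(apply(rotation(math.pi / 2), (1, 0)))              # approximately (0, 1)
print(apply(reflection_in_line(1), (2, 5)))              # reflection in y = x swaps x and y
print(apply(shear_parallel_to_x(math.pi / 4), (0, 1)))   # approximately (1, 1)
```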
WHEN THE REFERENCE POINT IS NOT THE ORIGIN
As I said earlier, these transformations transform with respect to the origin. Rotations, reflections,
scaling and shears all have their reference points at the origin. To transform about some other
point, we need to translate the point of reference to the origin (translating the coordinates of the
object together with it), do the transformation, then translate the coordinates back again. I don’t
know the terminology for this, since this is something I figured out myself. If the point of
rotation / scaling / shear is (a, b), with M as the transformation matrix, then (x, y) is transformed
as follows:
$$\begin{pmatrix} X \\ Y \end{pmatrix} = M \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \begin{pmatrix} a \\ b \end{pmatrix}$$
In the case of a reflection, as I said earlier, the reflection matrix above applies only for lines
passing through the origin, y = mx. Now that we want to find the reflection of an object across
the line y = mx + c, we take (0, c) as the point of reference to be subtracted and added in this
case. The transformation will become
$$\begin{pmatrix} X \\ Y \end{pmatrix} = M \begin{pmatrix} x \\ y - c \end{pmatrix} + \begin{pmatrix} 0 \\ c \end{pmatrix}$$
You can try it out and see whether this is true. You will find that any point (a, b) on the line can be
used as the reference point, since translating it to the origin makes the line pass through the origin.
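The translate, transform, translate-back recipe can be tried out numerically. Below is a minimal Python sketch; the helper names `apply` and `transform_about` are my own, and the two matrices (a 90° anticlockwise rotation and the reflection in y = x) are just convenient test cases.

```python
def apply(M, v):
    """Multiply a 2x2 matrix M (nested tuples) by the column vector v = (x, y)."""
    (p, q), (r, s) = M
    x, y = v
    return (p * x + q * y, r * x + s * y)


def transform_about(M, point, ref):
    """Apply M to `point`, using ref = (a, b) as the reference point."""
    a, b = ref
    x, y = point
    X, Y = apply(M, (x - a, y - b))   # translate the reference point to the origin, transform
    return (X + a, Y + b)             # translate back


ROT90 = ((0, -1), (1, 0))            # rotation of 90 degrees anticlockwise
REFLECT_Y_EQ_X = ((0, 1), (1, 0))    # reflection in y = x, i.e. y = mx with m = 1

# Rotate (2, 1) by 90 degrees about the point (1, 1).
print(transform_about(ROT90, (2, 1), (1, 1)))
# Reflect the origin in y = x + 1, taking (0, c) = (0, 1) as the reference point.
print(transform_about(REFLECT_Y_EQ_X, (0, 0), (0, 1)))
```

Any other point on the line y = x + 1, for example (-1, 0), can be passed as `ref` in the second call and gives the same image.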
SIMILARITY TRANSFORMATION
Two square matrices A and B that are related by A = P⁻¹BP, where P is a non-singular square
matrix, are said to be similar. A transformation of the form P⁻¹BP is called a similarity
transformation, or conjugation by P. Try recalling what you learnt about similar triangles in
Maths T. Similarity transformation simply means that the 2 transformations A and B are similar to
each other, just that they are described in a different basis or coordinate system.
I don’t have much information on this, so I won’t elaborate much here (please share
with me if you have good information on this, I will add it in here some day). However, if you
are asked to find whether 2 matrices A and B are similar, just make use of the formula above:
rewrite it as PA = BP with the entries of P unknown, and if the resulting equations are consistent
with a non-singular P, then the matrices are similar; if not, they are not.
11.2 Matrix Representation (images, scale-factor, operations)
Knowing all the different types of transformation, we shall now get to do the algebra of
transformations. Let’s begin with a simple example:
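Find and describe the image of the triangle ΔABC where A(1, 0), B(2, 0) and C(2, 3) under the
transformation matrix
$$M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$
Multiplying M by the matrix whose columns are the position vectors of A, B and C,
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 2 \\ 0 & 0 & -3 \end{pmatrix}$$
so A’(1, 0), B’(2, 0) and C’(2, –3).
Plotting the new coordinates OA’, OB’ and OC’, we find that the transformation is a reflection
in the x-axis (or reflection in Ox).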
A singular transformation in 2 dimensions maps every shape onto either a line or a single point,
and it may map a line onto a single point. In other words, the area of the object is
destroyed. Consider, for example, the matrices below:
$$\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
The first one maps all shapes to the line y = x. The second matrix maps all points to the x-axis,
while the last one maps everything to the origin. You will know that a matrix M is a singular
matrix when | M | = 0. There is a way to tell whether a matrix maps to a line or to a point.
Consider a singular matrix
$$\begin{pmatrix} a & c \\ b & d \end{pmatrix}$$
Because the matrix is singular, its column vectors (a, b) and (c, d) are parallel. If both column
vectors are zero, then the matrix maps all shapes to a point (the origin). Otherwise, the matrix maps
all shapes to the line through the origin in the direction of the non-zero column vector.
AREA SCALE-FACTOR AND THE DETERMINANT
Throughout our discussion on transformations, we haven’t discussed how the transformation
affects the area of an object. We want to know whether a certain transformation makes a certain
object enlarged or diminished. It turns out that the determinant of the matrix of transformation
tells us how the area would be in the end. With the matrix of transformation M,
we see that
Area of object × |det (M)| = Area of image
(a negative determinant only means that the orientation of the object is reversed, as in a reflection).
In the case when | M | = 0, the transformation maps shapes to a line or a point, and the area is
destroyed, which agrees with the part earlier on.
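To see the area rule in action, here is a small Python sketch that transforms the triangle A(1, 0), B(2, 0), C(2, 3) from the example above and compares the image area with |det M| × object area. The function names and the three test matrices (a reflection in the x-axis, an arbitrary non-singular matrix and a singular one) are my own choices for illustration.

```python
def apply(M, v):
    """Multiply a 2x2 matrix M by the column vector v = (x, y)."""
    (p, q), (r, s) = M
    x, y = v
    return (p * x + q * y, r * x + s * y)


def det(M):
    (p, q), (r, s) = M
    return p * s - q * r


def triangle_area(A, B, C):
    """Area of a triangle from its vertices (shoelace formula)."""
    return abs((B[0] - A[0]) * (C[1] - A[1]) - (C[0] - A[0]) * (B[1] - A[1])) / 2


triangle = [(1, 0), (2, 0), (2, 3)]
matrices = [
    ((1, 0), (0, -1)),   # reflection in the x-axis, det = -1
    ((2, 1), (0, 3)),    # arbitrary non-singular matrix, det = 6
    ((1, 1), (1, 1)),    # singular matrix, det = 0
]

for M in matrices:
    image = [apply(M, P) for P in triangle]
    print(image, triangle_area(*image), abs(det(M)) * triangle_area(*triangle))
```

For each matrix the last two printed numbers agree, and the singular matrix squashes the triangle onto the line y = x with zero area.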
Invariant points are points which map to themselves after the transformation. This means that
$$M \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}$$
As you might have noticed, this reminds you of the chapter about eigenvalues and eigenvectors;
in this situation, the eigenvalue is one. To find the invariant points for the transformation
M, for example
$$M = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
you substitute it into the equation above, then you get
x = x
y = –y
The second equation forces y = 0. So this tells us that the invariant points of this transformation
are any points (x, 0), or simply just the points on the x-axis. Verify yourself to see whether this is true.
An invariant line is a line which maps to the same line, but not necessarily with every point mapping
to itself. In our study, the invariant lines we look for pass through the origin, and even if there
are invariant lines that do not pass through the origin, each must be parallel (has the same gradient)
to another invariant line which passes through the origin. To find the invariant lines under a certain
transformation, we make use of the parametric form of the line, x = t, y = at. We substitute the
variable t into x and y and we have
$$M \begin{pmatrix} t \\ at \end{pmatrix} = \begin{pmatrix} T \\ aT \end{pmatrix}$$
or to make life easier, we rather put
$$M \begin{pmatrix} x \\ mx \end{pmatrix} = \begin{pmatrix} X \\ mX \end{pmatrix}$$
Note that the variable x maps to another variable X, not necessarily to itself. I’ll show you an example:
Find the invariant lines of the transformation
$$\begin{pmatrix} 0 & 1 \\ 5 & -4 \end{pmatrix}$$
So we have two equations
mx = X
x(5 – 4m) = mX
Dividing the second equation by the first, we get a quadratic equation
m² + 4m – 5 = 0, m = 1, –5
We have the lines y = x, y = –5x.
You might want to test whether the lines y = x + c or y = –5x + c are invariant too. Substitute them
back into the equation,
For m = 1,
x + c = X
5x – 4x – 4c = X + c
Substituting the first equation into the second gives x – 4c = x + 2c, so c = 0,
∴ the lines y = x + c with c ≠ 0 are not invariant.
For m = –5,
–5x + c = X
25x = –5X + 5c
The second equation is just the first one multiplied by –5, so both hold for every value of c, and
thus y = –5x + c are invariant lines.
∴ The invariant lines are y = x, y = –5x + c, where c is an arbitrary constant.
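The working above can be checked with a short Python sketch. It assumes the matrix of the example is the one implied by the two equations mx = X and x(5 – 4m) = mX, namely rows (0, 1) and (5, –4). The function names are my own, and the quadratic solver only handles the case where the top-right entry of the matrix is non-zero.

```python
import math


def invariant_gradients(M):
    """Gradients m of invariant lines y = mx for M = ((a, b), (c, d)).

    The point (x, mx) maps onto y = mx exactly when
    b*m**2 + (a - d)*m - c = 0. Assumes b != 0, so a vertical
    invariant line is not detected by this sketch.
    """
    (a, b), (c, d) = M
    A, B, C = b, a - d, -c
    disc = B * B - 4 * A * C
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(-B - root) / (2 * A), (-B + root) / (2 * A)})


def maps_line_to_itself(M, m, c, xs=(-2, 0, 1, 3)):
    """Check on a few sample points that y = mx + c is mapped onto itself."""
    (p, q), (r, s) = M
    for x in xs:
        y = m * x + c
        X, Y = p * x + q * y, r * x + s * y
        if abs(Y - (m * X + c)) > 1e-9:
            return False
    return True


M = ((0, 1), (5, -4))
print(invariant_gradients(M))          # gradients of invariant lines through the origin
print(maps_line_to_itself(M, 1, 2))    # is y = x + 2 invariant?
print(maps_line_to_itself(M, -5, 2))   # is y = -5x + 2 invariant?
```

The sample-point check is only a numerical sanity test, not a proof; the algebra in the example is what shows invariance for every c.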
TRANSFORMING LINES
Knowing how to transform points, we shall now learn how to transform lines. As in the part on
invariant lines, we substitute the parametric equation of x and y, then we solve the equation in
terms of X & Y, as the equation below
$$\begin{pmatrix} X \\ Y \end{pmatrix} = M \begin{pmatrix} x \\ y \end{pmatrix}$$
Example,
Find the image of the line y = 2 – 2x under the transformation
$$\begin{pmatrix} 2 & 1 \\ 4 & 2 \end{pmatrix}$$
We first substitute the line into the transformation,
2x + 2 – 2x = X = 2
4x + 4 – 4x = Y = 4
∴ The line transforms into the point (X, Y) = (2, 4).
Notice that in this case, the line is transformed into a point. In other cases if it transforms into
another line, remember to find an equation that relates X with Y. You should be aware that this is
the very same method you will do if you were to find the transformation of circles, parabolas,
hyperbolas, ellipses or other curves. Make use of their parametric equations and substitute them
into the equation. Recall the parametric forms of these curves.
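As a rough check of this method, the Python sketch below maps two points of a line y = mx + c and works out what they land on. It assumes the matrix of the example is the one implied by the working (rows (2, 1) and (4, 2)); the function name and the return format are my own.

```python
from fractions import Fraction


def image_of_line(M, m, c):
    """Image of the line y = m*x + c under M = ((p, q), (r, s)).

    Returns ('point', (X, Y)), ('line', gradient, intercept)
    or ('vertical line', X).
    """
    (p, q), (r, s) = M

    def image(x):
        y = m * x + c
        return (p * x + q * y, r * x + s * y)

    # A linear map sends a line to a line or a point, so two points suffice.
    (X0, Y0), (X1, Y1) = image(Fraction(0)), image(Fraction(1))
    if (X0, Y0) == (X1, Y1):
        return ('point', (X0, Y0))
    if X0 == X1:
        return ('vertical line', X0)
    gradient = (Y1 - Y0) / (X1 - X0)
    return ('line', gradient, Y0 - gradient * X0)


print(image_of_line(((2, 1), (4, 2)), -2, 2))    # the singular example above
print(image_of_line(((1, 0), (0, -1)), -2, 2))   # reflection in the x-axis
```

For curves such as circles or parabolas two points are not enough, so there you would substitute the parametric form by hand as described above.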
INVERSE TRANSFORMATION
I think I don’t need to elaborate too much on this. An inverse transformation helps us to find
the object if the image is given. You find the inverse of the matrix of transformation, and the
equation will become
$$\begin{pmatrix} x \\ y \end{pmatrix} = M^{-1} \begin{pmatrix} X \\ Y \end{pmatrix}$$
From here you should recall that a singular transformation has no inverse. In other words, you
can’t find a matrix that transforms a single point into 4 other points, or transforms a line into a
pentagon.
ADDITION, SUBTRACTION, SCALAR MULTIPLICATION, COMPOSITION
The addition and the subtraction of transformations M and N,
M(x) + N(x) = (M + N)(x)
M(x) – N(x) = (M – N)(x)
although defined, have no simple geometrical meaning. For example, adding a matrix of rotation of
45 degrees to a matrix of reflection in the line y = x gives you some awkward
transformation, which doesn’t really have a relation to either. But the scalar multiplication of a
matrix does mean something,
(cM)(x) = c(M(x))
as it has the effect of scaling. I assume you already know how to do both these operations, as
this is covered in the chapter Matrices in Maths T. We are more interested in the composition of
transformations. Given two transformations M and N, if an object undergoes transformation
M, then transformation N, it can be written as
$$\mathbf{x}' = N(M\mathbf{x}) = (NM)\mathbf{x}$$
Or we could also write it as (N ∘ M)(x) = N(M(x)).
I think you probably remember from Form 4 that the transformation NM means “transform M
first, then transform N”. This is quite straightforward, I think. In exams, you will be asked to find
the matrix of the combined transformation of 2 or more transformations. Otherwise, you will be given
the points of the object and image, with half of the transformation, and asked to find the other
missing transformation, as well as to describe it. Just make use of what you learnt about Matrices.
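The order in NM matters, and a quick Python sketch makes this visible. The names `matmul` and `apply` are my own helpers; R (rotation of 90° anticlockwise) and F (reflection in the x-axis) are just sample transformations.

```python
def matmul(N, M):
    """2x2 matrix product NM, i.e. 'transform M first, then transform N'."""
    return tuple(
        tuple(sum(N[i][k] * M[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def apply(M, v):
    """Multiply a 2x2 matrix M by the column vector v = (x, y)."""
    (p, q), (r, s) = M
    x, y = v
    return (p * x + q * y, r * x + s * y)


R = ((0, -1), (1, 0))    # rotation of 90 degrees anticlockwise about the origin
F = ((1, 0), (0, -1))    # reflection in the x-axis
P = (2, 1)

print(apply(F, apply(R, P)))     # rotate first, then reflect
print(apply(matmul(F, R), P))    # same result using the single matrix FR
print(matmul(F, R))              # FR: the reflection in y = -x
print(matmul(R, F))              # RF: the reflection in y = x, a different map
```

Finding a missing transformation works the same way in reverse: if NM and M are known and M is non-singular, N is (NM) multiplied on the right by the inverse of M.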
BAB 12
12. Coordinate Geometry
12.1 3D Vectors (scalar & vector product, properties)
This chapter will be a continuation and combination of what you learnt from the chapters
Coordinate Geometry and Vectors. As we come into 3 dimensions, we make use of vectors as it
makes our analysis much easier. Here, we introduce the coordinate system for three-dimensional
space ℝ³. The study of 3-dimensional space leads us to the setting for our study of
calculus of functions of two and three variables later in University.
We set up the 3D coordinate system by fixing a point O in space (called the origin) and taking
three lines passing through O that are perpendicular to each other. These lines are labelled as the
x-axis, y-axis and z-axis respectively. The direction of the z-axis is determined by the right-hand
rule:
I think you should be familiar with this rule from Physics. When your fingers point in the direction
of the x-axis and you curl them towards the y-axis, your thumb will be pointing along the z-axis.
Try to get used to this setting: with the z-axis pointing upwards, x on the left, y on the right.
A point P in space can be represented by an ordered triple (a, b, c) where a, b and c are
projections of the point P onto the x-, y- and z-axis respectively. The three dimensional space is
also called the xyz-space.
You probably should know how we represent a vector in 3D. Using the same conventions of unit
vectors i and j, we just add one more k to represent the unit vector in the z direction (e.g., 2i + 3j
– 5k). Everything about a vector in 2D works about the same in 3D. The length of the position
vector of a point P(a, b, c) follows the Pythagorean relation
$$|\overrightarrow{OP}| = \sqrt{a^2 + b^2 + c^2}$$
And similarly, the distance between 2 points A(x₁, y₁, z₁) and B(x₂, y₂, z₂) can be found by the equation
$$|\overrightarrow{AB}| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$
Let’s do a little revision on the properties of vectors: scalar multiplication, addition, subtraction
and so on. We let a, b and c be 3 vectors, k and h be 2 constants, then we have
(1) a + b = b + a
(2) a + (b + c) = (a + b) + c
(3) a + 0 = a
(4) a + (–a) = 0
(5) k(a + b) = ka + kb
(6) (k + h)a = ka + ha
(7) (kh)a = k(ha)
(8) 1a = a
SCALAR PRODUCT
Scalar product, also known as the dot product, is a multiplication of 2 vectors (a, b, c) and (d, e,
f) such that
$$(a, b, c) \cdot (d, e, f) = ad + be + cf$$
The scalar product yields an answer in the form of a scalar, which is a value instead of a vector.
In trigonometry, it can be represented by the equation
a • b = |a||b| cos θ
I believe all these are not new to you, as you have studied it in Maths T. However, in this section,
we will be going into quite some detail on the algebra of vectors, unlike in Maths T where you focused
more on the applications, namely the resultant force / velocity and relative velocity. Let us
look at the properties of scalar products. Given that a, b and c are vectors and d is a constant, we
have
(i) a • b = b • a (commutativity)
(ii) a • (b + c) = a • b + a • c (distributive law)
(iii) (da) • b = d(a • b) = a • (db)
(iv) 0 • a = 0
(v) a • a = |a|²
We say that two vectors are orthogonal to each other when they are perpendicular to each other.
Two vectors a and b are orthogonal if and only if a • b = 0. In 3D, we say that a vector a is
orthogonal to vectors b and c if a is perpendicular to both b and c.
The component of b onto a (or scalar projection) is the resolved part of b in the direction of a.
This means that when we have 2 vectors a and b pointing in 2 different directions, with the tails
of the arrows connected to each other, the component of b onto a is the length of the orthogonal
projection of b onto a.
We write the notation comp_a b to represent the component of b onto a, and mathematically, it
has the value
$$\operatorname{comp}_{\mathbf{a}} \mathbf{b} = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|}$$
and according to the picture above, it is the length of PS.
The vector projection of b onto a is just the vector PS itself. It has the formula
$$\operatorname{proj}_{\mathbf{a}} \mathbf{b} = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|^2}\, \mathbf{a}$$
We write the notation proj_a b to represent the projection of b onto a. Remember that the answer
is a VECTOR, not just a VALUE.
For a vector a = (aᵢ, aⱼ, aₖ), the direction ratio is written as aᵢ : aⱼ : aₖ, where your answer should
be in the simplest form (divided by the highest common factor). The direction cosines of the
vector a are
$$\frac{a_i}{|\mathbf{a}|}, \quad \frac{a_j}{|\mathbf{a}|}, \quad \frac{a_k}{|\mathbf{a}|}$$
respectively.
The angle γ between the vector and the z-axis can be found using the equation
$$\cos\gamma = \frac{a_k}{|\mathbf{a}|}$$
and therefore you can deduce the angle between the vector and the x-axis & y-axis respectively.
Recalling that the dot product of 2 vectors, a • b = |a||b| cos θ, we can easily find the angle
between 2 vectors,
$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}$$
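The dot product formulas of this section can be collected in a few lines of Python. This is only a sketch: the function names are mine, vectors are plain tuples, and the two sample vectors are arbitrary.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def comp(a, b):
    """Component (scalar projection) of b onto a."""
    return dot(a, b) / norm(a)


def proj(a, b):
    """Vector projection of b onto a."""
    k = dot(a, b) / dot(a, a)
    return tuple(k * x for x in a)


def direction_cosines(a):
    return tuple(x / norm(a) for x in a)


def angle_deg(a, b):
    """Angle between a and b in degrees, from a.b = |a||b| cos(theta)."""
    return math.degrees(math.acos(dot(a, b) / (norm(a) * norm(b))))


a, b = (1, 2, 2), (3, 0, 4)
print(dot(a, b), comp(a, b), proj(a, b))
print(direction_cosines(a))
print(angle_deg(a, b))
```

Note that `comp` returns a number while `proj` returns a vector, which mirrors the VALUE versus VECTOR distinction above.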
VECTOR PRODUCT
Also known as the cross product, the vector product is something new for you, as it cannot exist in
a 2D plane. We define the vector product of 2 vectors (a, b, c) and (d, e, f) to be
$$(a, b, c) \times (d, e, f) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a & b & c \\ d & e & f \end{vmatrix} = (bf - ce,\; cd - af,\; ae - bd)$$
The cross product yields a vector (it has a magnitude and a direction), which is orthogonal to
both the original vectors. In trigonometry, its magnitude is |a × b| = |a||b| sin θ.
You can use the right hand rule to determine the direction of the cross product. Point your fingers
in the direction of a, curl them towards the direction of b, then your thumb points in the direction of
a × b. This information is very important when we come to the section on planes.
Different from the dot product, any vector crossed with itself yields the zero vector.
ḭ × ḭ = 0, j̰ × j̰ = 0, k̰ × k̰ = 0
Or in other words, the cross product of 2 parallel vectors is zero. You can use your right hand
rule to verify this. For the unit vectors, you could also get the following results:
ḭ × j̰ = k̰, j̰ × k̰ = ḭ, k̰ × ḭ = j̰
We shall now see the properties of the cross product. If a, b and c are vectors and d is a scalar,
then
(i) a × b = –b × a
(ii) (da) × b = d(a × b) = a × (db)
(iii) a × (b + c) = a × b + a × c
(iv) (a + b) × c = a × c + b × c
(v) a • (b × c) = (a × b) • c
(vi) a × (b × c) = (a • c)b – (a • b)c
(vii) (a × b) • a = 0
Probably (vi) is hard to remember. (vii) follows from the fact that a × b is orthogonal to a, and the
dot product of 2 orthogonal vectors equals zero. Also take note that the cross product is not
commutative. Swapping a and b results in an extra minus sign.
The cross product has many applications, especially in physics. You use the cross product to find
the torque, magnetic force, etc. In geometry, we see that the area of a triangle whose sides are the
vectors a, b and c is
$$\tfrac{1}{2}|\mathbf{a} \times \mathbf{b}| = \tfrac{1}{2}|\mathbf{b} \times \mathbf{c}| = \tfrac{1}{2}|\mathbf{c} \times \mathbf{a}|$$
A scalar triple product of vectors a, b and c is a • b × c. As you might have noticed, you have to
do the cross product first before the dot product. If you did the dot product first, then you get a
scalar crossing a vector, which by definition does not exist. Note also that a • b × c = a × b •
c. We could evaluate a • b × c using the determinant
$$\mathbf{a} \cdot \mathbf{b} \times \mathbf{c} = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix}$$
where a = (a1, a2, a3), b = (b1, b2, b3), and c = (c1, c2, c3) respectively. We use the scalar triple
product to find the volumes of various solids. Since |b × c| is the base area of a solid, dotting
b × c with another vector a multiplies that area by |a| cos θ, the perpendicular height, where θ is the
angle between a and b × c. So the formulas for different solids are as below:
1. volume of parallelepiped (a cuboid is a special case):
|a • b × c|
2. volume of tetrahedron:
(1/6) |a • b × c|
3. volume of triangular prism:
(1/2) |a • b × c|
4. volume of pyramid (with a parallelogram base):
(1/3) |a • b × c|
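Here is a small Python sketch of the cross product and the scalar triple product, using the standard component formula for a × b. The function names and the sample vectors are my own; the volume fractions used are the standard ones (1 for a parallelepiped, 1/6 for a tetrahedron, 1/2 for a triangular prism, 1/3 for a pyramid on a parallelogram base).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Vector product a x b for 3D vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def triple(a, b, c):
    """Scalar triple product a . (b x c)."""
    return dot(a, cross(b, c))


a, b, c = (2, 0, 0), (0, 3, 0), (1, 1, 4)
box = abs(triple(a, b, c))

print(cross((1, 0, 0), (0, 1, 0)))   # i x j
print(cross(a, b), cross(b, a))      # swapping the order flips the sign
print(box)                           # parallelepiped
print(box / 6, box / 2, box / 3)     # tetrahedron, triangular prism, pyramid
```

Property (v), a • (b × c) = (a × b) • c, can be checked by comparing `triple(a, b, c)` with `dot(cross(a, b), c)`.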
12.2 Straight Lines (equation, skew, parallel, intersect)
Straight lines in 3 dimensions aren’t as easy as they are in 2 dimensions. When we want to construct a
straight line in space, it must be pointing in a specific direction, and you must give at least one
point that it passes through.
EQUATION OF A LINE
Let r be the position vector of a general point on a line in xyz-space; we let a and b be 2 vectors
and t be an arbitrary constant. The vector equation of a line can be represented by the equation
$$\mathbf{r} = \mathbf{a} + t\mathbf{b}$$
The vector a = (x0, y0, z0) is a position vector. It is a point in space through which the line passes.
Then the vector b is a direction vector. This vector determines the direction of the line.
The constant t is there because any scalar multiple of the direction vector is also a
direction vector of the same line. Summing it up, you actually get this:
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix} + t \begin{pmatrix} p \\ q \\ r \end{pmatrix}$$
You need some visualization here. Look at the diagram below. The green line L first needs a
point a in space. Then you need a direction vector b to tell you where the line extends to. So if
you analyse carefully, the equation of a line is not unique. You can put in an infinite number of
different position vectors, or use an infinite number of direction vectors of the same ratio to
construct different line equations, which actually refer to the same line. This is unlike lines in
2D, where the form y = mx + c gives a line only one representation.
You might have also noticed that the vector equation of a line is actually a parametric equation
of a line. If you break it down,
$$x = x_0 + pt, \quad y = y_0 + qt, \quad z = z_0 + rt$$
This is where (x0, y0, z0) is the position vector a, and (p, q, r) is the direction vector b. Probably
now you figure out why the line equation is not unique, since parametric equations are not unique. By the
way, we can also write the vector equation as r = ai + bj + ck + t(pi + qj + rk), where (a, b, c) is
the position vector. I don’t like this method as we waste too much time writing the ijk’s and +/- signs.
Now if we rearrange the 3 parametric equations so that each gives t in terms of something else,
we get the cartesian equation, as below:
$$\frac{x - x_0}{p} = \frac{y - y_0}{q} = \frac{z - z_0}{r} = t$$
We normally write this whole chunk of equalities without the ‘= t’, I only show it here for clarity.
A line in 3D space has 2 equal signs. So what if p, or q, or both are 0? Examples of such lines
are
$$x = x_0,\ \frac{y - y_0}{q} = \frac{z - z_0}{r} \quad (p = 0); \qquad x = x_0,\ y = y_0 \quad (p = q = 0)$$
You might want to substitute it back into the vector equation to check this out. You probably
could have guessed why we prefer to use the vector equation instead of the cartesian equation.
With all this information, you should be able to construct a line equation, given
only 2 points it passes through.
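Constructing a line from 2 points just means taking one point as the position vector and the difference of the two points as the direction vector. A minimal Python sketch follows (the function names and the sample points are mine):

```python
def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def line_through(P, Q):
    """Return (a, b) such that r = a + t*b passes through P (t = 0) and Q (t = 1)."""
    return tuple(P), tuple(q - p for p, q in zip(P, Q))


def point_at(a, b, t):
    return tuple(ai + t * bi for ai, bi in zip(a, b))


def on_line(a, b, X):
    """X lies on the line exactly when X - a is parallel to b, i.e. (X - a) x b = 0."""
    return cross(tuple(x - ai for x, ai in zip(X, a)), b) == (0, 0, 0)


a, b = line_through((1, 2, 3), (3, 1, 5))
print(a, b)                       # position vector and direction vector
print(point_at(a, b, 2))          # another point on the line
print(on_line(a, b, (5, 0, 7)))   # lies on the line
print(on_line(a, b, (0, 0, 0)))   # does not lie on the line
```

Swapping P and Q, or scaling b, gives a different-looking equation for the same line, which is the non-uniqueness mentioned above.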
SKEW, PARALLEL, INTERSECT?
In 2D, lines are either parallel to each other, or they intersect. However in 3D, there exists another
relationship between 2 lines, in which they do not intersect and are not parallel to each other.
These lines are called skew lines.
Our question is this: how do we show whether 2 lines are parallel, intersect one another, or
are skew?
To show that 2 lines are parallel, we show that their direction vectors are the same or are scalar
multiples of each other. The 2 lines below
are parallel, because they have the same direction vector. You can further check whether the
lines coincide (that is, whether they are both the same line). To do this, we take the point (1, 2,
3) and substitute it into (x, y, z) in the second equation. Doing some algebra, we find that the values
of s from the 3 parametric equations are not consistent. Therefore, the lines do not coincide, and they
are distinct parallel lines. This method also tells us whether a particular point lies on a line. So here
we see that the point (1, 2, 3) does not lie on the second line.
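To show that 2 lines intersect, we let line 1 equal line 2. We get 3 equations. Consider the two
lines below:
$$\mathbf{r}_1 = \begin{pmatrix} -3 \\ -5 \\ -4 \end{pmatrix} + t \begin{pmatrix} 4 \\ 3 \\ 1 \end{pmatrix}, \qquad \mathbf{r}_2 = \begin{pmatrix} 0 \\ -9 \\ 13 \end{pmatrix} + s \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix}$$
We have
-3 + 4t = s
-5 + 3t = -9 + 2s
-4 + t = 13 – 3s
If we can find values of s and t that satisfy all 3 equations, the lines intersect. If
the values of s and t contradict one another (and the lines are not parallel), then the lines are skew.
Here the first two equations give t = 2 and s = 5, which also satisfy the third. We can further find the
point of intersection. By using the values of s and t, substituting them back into the initial
equations, we get the intersection point. In this case, the point of intersection is
(5, 1, –2).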
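The same steps can be sketched in Python: solve two of the three component equations for t and s, then check the remaining one. The two lines used below are the ones implied by the three equations of the example (position vectors (-3, -5, -4) and (0, -9, 13), direction vectors (4, 3, 1) and (1, 2, -3)). The function name is mine and integer inputs are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def intersect_lines(a, b, c, d):
    """Intersection point of r = a + t*b and r = c + s*d, or None.

    None means the lines are parallel or skew. Integer components assumed.
    """
    for i, j in ((0, 1), (0, 2), (1, 2)):
        # Components i and j give: t*b[i] - s*d[i] = c[i] - a[i] (same for j).
        det = -b[i] * d[j] + d[i] * b[j]
        if det != 0:
            ri, rj = c[i] - a[i], c[j] - a[j]
            t = Fraction(-ri * d[j] + d[i] * rj, det)   # Cramer's rule
            s = Fraction(b[i] * rj - b[j] * ri, det)
            break
    else:
        return None   # direction vectors are parallel

    P = tuple(a[k] + t * b[k] for k in range(3))
    Q = tuple(c[k] + s * d[k] for k in range(3))
    return P if P == Q else None   # the remaining equation must agree as well


print(intersect_lines((-3, -5, -4), (4, 3, 1), (0, -9, 13), (1, 2, -3)))
print(intersect_lines((0, 0, 0), (1, 0, 0), (0, 1, 1), (0, 1, 0)))   # a skew pair
```

A result of `None` does not distinguish parallel from skew lines; comparing the direction vectors first settles that.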
DISTANCE FROM POINT TO LINE
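Given a line r1 and point r2,
$$\mathbf{r}_1 = \mathbf{a} + t\mathbf{b}, \qquad \mathbf{r}_2 = (x_2, y_2, z_2)$$
to find the distance from the point to the line, we want to make use of the sine of the angle
between the line r1 and the vector (r2 – a). Look at the diagram below.
Recalling that |a × b| = |a||b| sin θ, the distance between the line and the point r2 is
$$d = |\mathbf{r}_2 - \mathbf{a}| \sin\theta = \frac{|(\mathbf{r}_2 - \mathbf{a}) \times \mathbf{b}|}{|\mathbf{b}|}$$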
DISTANCE BETWEEN 2 LINES
To find the distance between 2 lines, we have 2 situations:
1. the lines are parallel
Given the two lines r1 = a + tb and r2 = c + sb, we can make use of what we learnt from the part
above, and find that the distance between these 2 lines is just
$$d = \frac{|(\mathbf{c} - \mathbf{a}) \times \mathbf{b}|}{|\mathbf{b}|}$$
2. the lines are skew
Given 2 lines r1 = a + tb and r2 = c + sd, the shortest distance between 2 skew lines can be found
through the equation
$$\mathbf{r}_2 - \mathbf{r}_1 = k(\mathbf{b} \times \mathbf{d})$$
where k is a constant. Let me explain this a little. The vector joining the two lines is r2 – r1. At
the shortest distance it is parallel to the common normal vector (b × d), and that is why we write it as
k times that vector. So after setting up the equation, we get c + sd – a – tb = k(b × d), which is
actually 3 equations in terms of 3 variables t, s and k. From here, we solve for s, t and k, and we
multiply |k| by the magnitude of b × d,
$$d = |k|\,|\mathbf{b} \times \mathbf{d}|$$
and thus you get the shortest distance between 2 skew lines.
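Both distances can be sketched in a few lines of Python. For skew lines the sketch uses a shortcut rather than solving for s, t and k: dotting both sides of c + sd – a – tb = k(b × d) with b × d removes s and t, so |k||b × d| = |(c – a) • (b × d)| / |b × d|. Function names and sample lines are my own.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def norm(a):
    return math.sqrt(dot(a, a))


def dist_point_line(P, a, b):
    """Distance from the point P to the line r = a + t*b."""
    return norm(cross(sub(P, a), b)) / norm(b)


def dist_skew_lines(a, b, c, d):
    """Shortest distance between r = a + t*b and r = c + s*d (not parallel)."""
    n = cross(b, d)                      # common normal direction
    return abs(dot(sub(c, a), n)) / norm(n)


print(dist_point_line((0, 3, 4), (0, 0, 0), (1, 0, 0)))
print(dist_skew_lines((0, 0, 0), (1, 0, 0), (0, 1, 1), (0, 1, 0)))
```

For 2 parallel lines, `dist_point_line(c, a, b)` with c any point on the second line gives the distance, as in situation 1.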
ANGLE BETWEEN 2 LINES
Recalling the formula you learnt in the previous section,
$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}||\mathbf{b}|}$$
You use this formula to find the angle between two lines, by taking a and b to be the direction
vectors of both lines. Shouldn’t be a problem for you, I think.
12.3 Planes (equation, intersection, distance, angle)
A plane is simply a flat surface in space. We first start by introducing the vector equation
of a plane,
$$\mathbf{r} = \mathbf{a} + s\mathbf{b} + t\mathbf{c}$$
where a is a position vector, and b and c are 2 non-parallel vectors, s and t being 2 arbitrary
constants. Consider the diagram below,
We need to have at least 2 direction vectors to show the direction of the plane, and then a point
to know where exactly the plane lies. We multiply the 2 direction vectors by different
constants, to show that any combination of multiples of them is also a direction vector of the plane.
Similarly, this form of the plane equation is not unique. Again, this form can be written in the ijk
form, which looks ugly and long.
There is another vector equation of the plane. Though not named properly, I call it the ‘normal’
form. We first find the normal vector of a plane, i.e., a vector which is normal to both the
direction vectors. You obtain the normal vector by getting the cross product of b and c. Suppose
that the normal vector is (a, b, c), the normal form of the equation will be
$$\mathbf{r} \cdot \begin{pmatrix} a \\ b \\ c \end{pmatrix} = d$$
where d is a constant which determines the position of the plane. d has a significant meaning. If
the normal vector (a, b, c) is a unit vector (magnitude = 1), then |d| is the perpendicular distance
from the plane to the origin. For 2 parallel planes written with the same normal vector, if their values
of d have opposite signs, it means that they are on opposite sides of the origin. Finding the value d is
simple: just plug a point lying in the plane into x, y, z, then you get it.
If we evaluate the dot product above with r = (x, y, z), we get the cartesian form,
$$ax + by + cz = d$$
This cartesian form is unique (up to multiplying the whole equation by a constant), unlike the other
forms. This is the most common form of the equation of planes used. You can see that this equation is
linear, and that the equations y = mx + c, or x = a are all equations of planes in 3 dimensional space.
So to sum up, to construct a plane equation, you need one of these sets of information:
1. 3 points lying on the plane (not all on one line).
2. 2 points lying on the plane, and 1 direction vector.
3. 2 lines lying on the plane.
4. a point lying on the plane, and the normal vector of the plane.
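There is a fast way to get the equation of the plane when 3 points (x1, y1, z1), (x2, y2, z2) and
(x3, y3, z3) are given. I haven’t tried this before, but you could make use of a determinant such as
the one below to find your equation:
$$\begin{vmatrix} x - x_1 & y - y_1 & z - z_1 \\ x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\ x_3 - x_1 & y_3 - y_1 & z_3 - z_1 \end{vmatrix} = 0$$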
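For the 3-point case, the usual route is: form 2 direction vectors from the points, cross them to get the normal, then plug in a point to get d. A minimal Python sketch follows; the function name `plane_through` and the sample points are my own.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def plane_through(P, Q, R):
    """Return (n, d) for the plane r . n = d through the points P, Q and R."""
    n = cross(sub(Q, P), sub(R, P))   # normal = cross product of 2 direction vectors
    if n == (0, 0, 0):
        raise ValueError("the 3 points lie on one line, so the plane is not determined")
    return n, dot(n, P)               # plug in a point to get d


n, d = plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(n, d)   # coefficients (a, b, c) and the constant d in ax + by + cz = d
```

Expanding the determinant form along its first row gives the same normal vector, since the cofactors of that row are exactly the components of the cross product.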
LINE LIES IN / PARALLEL / INTERSECT A PLANE
We shall now discuss how to determine whether a line lies in / is parallel to / intersects a plane.
Given the equations of the line and plane to be
$$\mathbf{r}_1 = \mathbf{a} + t\mathbf{b}, \qquad \mathbf{r}_2 \cdot \mathbf{n} = d$$
We first find whether the direction vector of the line is parallel to the plane. In other words, we
want to know whether the direction vector of the line is perpendicular to the normal vector of
the plane. By taking b • n, if the answer is zero, then the line is parallel to the plane. We might
want to know whether the parallel line actually lies in the plane. We can do this by substituting
the position vector a of the line into r2, and if LHS = RHS, then indeed the line lies in the plane;
if the equality doesn’t hold, it does not.
So if b • n ≠ 0, this means that the line definitely intersects the plane. The point of intersection
can be found by letting r1 = r2, that is,
$$(\mathbf{a} + t\mathbf{b}) \cdot \mathbf{n} = d$$
This is a single linear equation, which you should be able to solve for t. Then finally, to
find the point of intersection, we substitute t back into the line equation to find (x, y, z).
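The decision procedure above translates almost line by line into Python. In this sketch the function name and the return values are my own, and integer inputs are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def line_and_plane(a, b, n, d):
    """Relation between the line r = a + t*b and the plane r . n = d."""
    bn = dot(b, n)
    if bn == 0:                                  # direction is perpendicular to the normal
        return 'lies in plane' if dot(a, n) == d else 'parallel'
    t = Fraction(d - dot(a, n), bn)              # solve (a + t*b) . n = d for t
    return tuple(ai + t * bi for ai, bi in zip(a, b))


print(line_and_plane((1, 2, 3), (1, 1, 1), (1, 1, 1), 12))   # intersects at a point
print(line_and_plane((0, 0, 1), (1, 0, 0), (0, 0, 1), 1))    # lies in the plane z = 1
print(line_and_plane((0, 0, 1), (1, 0, 0), (0, 0, 1), 5))    # parallel to the plane z = 5
```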
PLANE PARALLEL TO / INTERSECTS ANOTHER PLANE
Since the cartesian equation is unique (up to a constant multiple), 2 planes can only coincide if
they have the same plane equation. 2 planes are parallel only if their normal vectors are parallel,
which is also easy to check. Planes that are not parallel have to intersect somewhere, and we can
determine the line of intersection. Consider 2 plane equations below:
$$\mathbf{r} \cdot \mathbf{n}_1 = d_1, \qquad \mathbf{r} \cdot \mathbf{n}_2 = d_2$$
We first find the common direction by using
$$\mathbf{n}_1 \times \mathbf{n}_2$$
this will be the direction vector of the line of intersection. To find a position vector of the line, we
make use of the cartesian equations of both planes,
$$a_1 x + b_1 y + c_1 z = d_1, \qquad a_2 x + b_2 y + c_2 z = d_2$$
We need to solve this system of linear equations to find x, y and z. Recall from the chapter on
Matrices that this system of equations has infinitely many solutions. As usual, let one of the
variables be t, solve for the others in terms of t, and then just substitute a value for t to get a
position vector. The line equation is thus found.
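A Python sketch of this procedure is below. Instead of carrying the parameter t around, it sets one coordinate to 0 (which is the same as substituting t = 0 for that variable) and solves the remaining 2 × 2 system by Cramer's rule. The function name and the sample planes are mine, and integer coefficients are assumed.

```python
from fractions import Fraction


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def plane_intersection(n1, d1, n2, d2):
    """Line of intersection of r . n1 = d1 and r . n2 = d2.

    Returns (point, direction), or None if the planes are parallel.
    """
    direction = cross(n1, n2)
    if direction == (0, 0, 0):
        return None
    # Set coordinate k to 0, choosing k so the remaining 2x2 system is solvable.
    k = next(i for i in range(3) if direction[i] != 0)
    i, j = (k + 1) % 3, (k + 2) % 3
    det = direction[k]                 # equals n1[i]*n2[j] - n1[j]*n2[i]
    point = [Fraction(0)] * 3
    point[i] = Fraction(d1 * n2[j] - d2 * n1[j], det)
    point[j] = Fraction(n1[i] * d2 - n2[i] * d1, det)
    return tuple(point), direction


# Planes x + y + z = 6 and x - y = 0.
print(plane_intersection((1, 1, 1), 6, (1, -1, 0), 0))
```

Any other point on the line would do equally well as the position vector, so the answer can look different from one obtained by hand.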
DISTANCE FROM A POINT / PARALLEL LINE TO A PLANE
I think I won’t prove this one, as it is similar to the proof in 2D. To find the distance from a
point (x0, y0, z0) to a plane ax + by + cz = d, make use of the equation in your Data Booklet:
$$D = \frac{|ax_0 + by_0 + cz_0 - d|}{\sqrt{a^2 + b^2 + c^2}}$$
Notice that there is something different in my equation. It is ‘-d’ instead of ‘+d’ because I made
use of the cartesian equation ax + by + cz = d instead of ax + by + cz + d = 0. Please DO NOT
CONFUSE THEM.
If you want to find the distance from a parallel line to the plane (note that the line has to be
‘parallel’ to have a ‘distance’…), you substitute the position vector of the line into the
above equation as (x0, y0, z0), and you get it.
DISTANCE BETWEEN 2 PLANES
Given 2 parallel planes,
$$\mathbf{r} \cdot \mathbf{n} = d, \qquad \mathbf{r} \cdot \mathbf{n} = e$$
We can find the distance between them by finding
$$\frac{|d - e|}{|\mathbf{n}|}$$
I will explain why this makes sense. Firstly, you should recall that the values d/|n| and e/|n| are
the (signed) perpendicular distances from the origin to the planes. Also remember that the distance
really depends on whether both the planes lie on the same side of the origin, or on opposite sides (same
sign or different sign). You subtract them, then take the modulus because distance is never
negative.
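Both distance formulas, written for the form ax + by + cz = d used here, fit in a short Python sketch. The function names and sample planes are my own.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def dist_point_plane(P, n, d):
    """Distance from the point P to the plane r . n = d (note the '- d')."""
    return abs(dot(n, P) - d) / norm(n)


def dist_parallel_planes(n, d, e):
    """Distance between r . n = d and r . n = e (same normal vector n)."""
    return abs(d - e) / norm(n)


n = (2, 2, 1)                               # |n| = 3
print(dist_point_plane((0, 0, 0), n, 9))    # origin to 2x + 2y + z = 9
print(dist_parallel_planes(n, 9, -3))       # planes on opposite sides of the origin
print(dist_parallel_planes(n, 9, 3))        # planes on the same side of the origin
```

If the 2 planes are given with different (but parallel) normal vectors, scale one equation first so that both use the same n.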
ANGLE BETWEEN LINE AND PLANE
Consider a line with direction vector a and a plane with normal vector n. The angle θ between the
line and the plane can be found by using the equation
$$\sin\theta = \frac{|\mathbf{a} \cdot \mathbf{n}|}{|\mathbf{a}||\mathbf{n}|}$$
Note that if you used cos θ, you would have gotten the angle between the line and the normal
vector instead.
ANGLE BETWEEN PLANES
The angle between 2 planes is actually the same as the angle between the 2 normal vectors. So given 2
planes with normal vectors m and n respectively, we can find the angle between 2 planes by
using the dot product,
$$\cos\theta = \frac{\mathbf{m} \cdot \mathbf{n}}{|\mathbf{m}||\mathbf{n}|}$$
Recall that this is the same formula used to find the angle between 2 lines.
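The two angle formulas differ only in using sine or cosine, which is easy to mix up, so here is a small Python sketch of both. The function names and sample vectors are mine, and the `min(1.0, ...)` and `max(-1.0, ...)` clamps are only there to guard against floating-point rounding.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def angle_line_plane(a, n):
    """Angle in degrees between a line (direction a) and a plane (normal n)."""
    s = abs(dot(a, n)) / (norm(a) * norm(n))
    return math.degrees(math.asin(min(1.0, s)))


def angle_between_planes(m, n):
    """Angle in degrees between 2 planes with normal vectors m and n."""
    c = dot(m, n) / (norm(m) * norm(n))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))


print(angle_line_plane((1, 0, 1), (0, 0, 1)))       # line against the plane z = 0
print(angle_line_plane((1, 0, 0), (0, 0, 1)))       # line parallel to the plane
print(angle_between_planes((1, 0, 0), (1, 1, 0)))   # planes x = 0 and x + y = 0
```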
Now that you know how to construct planes, you might be curious about how 3D shapes are
constructed. Again, you could make use of the applet I shared with you in the previous post:
from the drop down menu of new graph, choose z = f(x, y) surfaces. Fiddle around with it and have
fun creating awkward shapes. This is obviously out of your syllabus, but let me just give you
some equations for some very common shapes in 3D:
cylinder,
x² + y² = r²
elliptic paraboloid,
z = x² + y²
hyperbolic paraboloid,
z = x² – y²
ellipsoid,
ax² + by² + cz² = 1
elliptic cone,
x² + y² – z² = 0
hyperboloid,
x² + y² – z² = 1
BAB 13
13. Sampling & Estimation
13.1 Random Samples (population, parameter, statistic)
In statistics, we are always interested in getting information about a particular group, be it people,
animals, or even non-living things. This group of interest is what we call a population. A
population is a particular group which we need information about in a statistical enquiry. A
population can be very big, for example, the number of hairs growing on one’s head, or the
number of people in a country. So sometimes, we can only gather information from a sample
of the population. A simple random sample is a sample of size n chosen such that all possible samples
of size n are equally likely to be selected. So here, we differentiate the terms population and sample,
the sample being a subset of the population.
A parameter is a numerical characteristic of a population (known or unknown), such as the
mean μ and the standard deviation σ. A statistic is a value computed from a sample, such as the
mean x̅ and standard deviation s. Notice the symbols for both cases are different, and we will
make use of this convention. So here we can conclude that the parameter is the actual value for a
population, while the statistic is a value obtained from a sample, which is supposed to be quite
close in value to the parameter.
In order to get the information required, we need to do surveys. There are 2 main kinds of
surveys:
1. Census
A census surveys every single member of a population. For a country, they need to
do a census to count how many people there are in it. Or in a class, we need everyone to submit
their health report, in order to know which blood type each student belongs to. However, there
are situations where a census can’t be used. With infinite populations, for example, we have a
practically infinite number of stars, and we can’t measure the brightness of every star to find the mean
brightness or distance from the earth. Another example is testing the durability of light bulbs. To test
the average lifespan of light bulbs, you can’t test every light bulb, otherwise you’ll destroy the
population!
2. Sample Survey
A sample survey is done by interviewing / collecting data from only a small group of members
within the population, which is the sample. A sample is always less than 100% of the population. For
example, we do a survey on 100 residents in Petaling Jaya, to see whether they would like it if we
replaced the McDonald’s outlet in SS2 with an A&W outlet.
Both the census and the sample survey have their advantages and disadvantages. To sum up, a
census is good for a small population, and a sample survey is more suitable for a big population.
Look at the table below:
Before you start sampling, you need to do a few things. First, you need to identify the target
population, as in where and who you want to interview. Next, you determine the sampling
units, the people / items to be sampled. If your population is all the primary schools in Malaysia,
is your sampling unit the student, the teacher, or the canteen waiter? You have to make it clear.
Then, you need a sampling frame: a list in which the sampling units within a
population are individually named or numbered. Of course the list may not be complete, or
sometimes just can’t be generated, as the units will change and move in and out, or maybe if
they are fish in a pond, they can’t be listed down!
Once you are done, you can start your survey.
Knowing that we can start surveying, we need to know the possible sampling methods. We
shall not focus on the census in this chapter (the title says it). Now we shall look into a few types of
sampling methods:
1. Random Sampling
I believe you are familiar with the term ‘random’. It means that you do not choose a sample on
purpose, you just simply pick one. There are 3 kinds of random sampling:
Simple Random Sample
As its name suggests, it is ‘simple’: you don’t need to do any homework to get that sample. You
could draw lots, or use random numbers to choose which units take the survey. You can
make use of a random number table to choose your units. It acts like a large die, and looks
something like the one below:
You can use numbers from left to right, following the numbers given. Or you could also close
your eyes, and use a pencil to point at a number on the table. For example, in a group of students
numbered 1 to 100, you want to choose 5 random students. You can take 2-digit numbers starting
from the left of the table, namely 82, 03, 14, 58 and 21, to be the students you want.
You could actually use your calculator as a random number generator. On your CASIO fx-570MS,
press shift - Ran#, then you will get a random number, to 3 decimal places, between 0.000
and 0.999. You can use multiplication or division to manipulate the random number to the range
you want.
Note that there exist 2 kinds of simple random samples, one with replacement, one without
replacement.
Systematic Random Sample
In systematic sampling, you follow a fixed pattern or sequence to find your
sample. For example, in a list of 1000 people, you take every kth person to take the survey,
where k depends on your sample size (for a sample of 50 people, k = 1000 ÷ 50 = 20).
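A minimal sketch of systematic sampling, assuming a list of 1000 people and a sample of 50 (so every 20th person is taken after a random start). The function name `systematic_sample` is made up for this illustration.

```python
import random

def systematic_sample(units, sample_size, rng=random):
    """Pick every k-th unit after a random start, with k = len(units) // sample_size."""
    k = len(units) // sample_size
    start = rng.randrange(k)              # random start among the first k units
    return units[start::k][:sample_size]

people = list(range(1, 1001))             # a list of 1000 people
sample = systematic_sample(people, 50)    # k = 20, so every 20th person
print(sample[:5])
```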
Stratified Random Sample
In a stratified sample, the population is divided into distinguishable layers (strata). For example, in a population of
people, there are different age groups, different occupations and so on. We take a few
units at random from each age group, and combine them into one sample in the end.
2. Non-Random Sampling
I think I don’t need to elaborate much on this. It is not random, and therefore you choose a unit
with a solid and particular reason. There are 2 kinds over here:
Clusters
Clusters are like natural sub-groups of a population. For example, in a primary school, there are
6 classes in standard 1, with all the kids having the same status. Note that this differs from
a stratified random sample, since strata are different from one another, while clusters are alike. You choose to study
one cluster, which means that you don’t randomly pick students from every class in the school.
You save a lot of effort, time and money, as you don’t need to collect the survey forms from every
class.
Quotas
Quota sampling is widely used in market research, where the population is divided into groups
in terms of age, sex, income level and so on. Then when you are about to survey, you already have
your plans in mind: I want to survey one person who has a high income and a big family,
another one with a low income and a small family, and so on. You have already set specific requirements
for the members of the population that you are about to interview or collect data from.
All these sampling methods have their pros and cons. I summarize them in the table below:
In every survey, there will surely be some sources of bias. Obviously, when you are collecting
data from a population, you want it to be as accurate as possible, and thus you should eliminate any
bias in the process of sampling. These biases can make the survey or data collection very
inaccurate, and give a wrong picture of what the population really is. Examples of sources of bias
are:
1. lack of good sampling frame
It’s like using a list of friends generated from your Twitter account. You will miss out those
friends who don’t use Twitter. You need a good sampling frame in order that everyone has an
equal chance of being sampled.
2. wrong choice of sampling unit
In surveying on who has a car at home, you chose the wrong sampling unit ‘people’, since a
better sampling unit would be ‘household’, since children don’t drive.
3. no response by some chosen units
Some people just choose not to answer your survey questions, for God-knows-what reason. Also,
your questionnaire might have some questions that don’t offer them a suitable answer.
For example, they don’t respond to the question “do you like Subway Sandwiches? Yes / No”
when they don’t even know that such an outlet exists.
4. introduced by the person conducting the survey
The person conducting the survey might already have a conclusion in mind, and try to make the
survey results suit his mindset. For example, on the question “Which party will do a better job
in the next General Elections?”, if the surveyor is a Pakatan Rakyat supporter, he might influence
the person taking the survey to agree with his stand.
SIMULATING RANDOM SAMPLES
There are many ways to get random samples, just like what we did above. We used a random
number table, or the random number generator on the calculator. But now, we want to
simulate random samples from a given distribution. There are 2 kinds of distributions from which we
can obtain a simulated random sample:
1. Frequency Distribution
A frequency distribution looks something like this:
It has a value x and a frequency. Let’s say I would like to generate a sample of size 6 from this
population. For data like this, we cannot simply use a calculator to randomly get the
numbers 1 to 4 as our sample. Each value has a frequency, which acts as a weight for how often we should
choose it. So what we can do is draw up a table, making use of the
cumulative frequency.
Using this table, we can finally generate the random sample. For example, suppose we have the
random digits 04938581365399. Reading them in pairs, we get the numbers 4, 93, 85, 81, 36, 53, which
correspond to the values of x being 1, 4, 3, 3, 2, 3 respectively. We have finally got our random
sample from the frequency distribution.
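The lookup can be sketched in Python. The frequency table itself is not reproduced in this text, so the frequencies below are hypothetical, chosen only so that the mapping agrees with the values quoted above; the helper `value_for` is likewise made up for the illustration, and treating 00 as 100 is an assumption.

```python
from itertools import accumulate

# Hypothetical frequency table (the original table is not reproduced here).
values = [1, 2, 3, 4]
frequencies = [10, 30, 45, 15]               # total 100
cumulative = list(accumulate(frequencies))   # [10, 40, 85, 100]

def value_for(number):
    """Map a two-digit random number to x; 00 is treated as 100 (an assumption)."""
    number = 100 if number == 0 else number
    for x, upper in zip(values, cumulative):
        if number <= upper:
            return x
    raise ValueError("number outside 1..100")

digits = "04938581365399"
pairs = [int(digits[i:i + 2]) for i in range(0, 12, 2)]  # first six pairs
sample = [value_for(n) for n in pairs]
print(pairs)   # [4, 93, 85, 81, 36, 53]
print(sample)  # [1, 4, 3, 3, 2, 3]
```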
2. Probability Distribution
The method is the same as the above, we create a cumulative frequency, and change the base to
be over 1, then use the generated random numbers to find the random samples. There are a few
kinds of probability distributions:
probability distribution
This one is not hard. We find the cumulative frequency, then
Binomial distribution X ~ B (n, p)
Hope you still remember the formula, $P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}$, where q = 1 – p. For example, we take
X ~ B (3, 0.4), then we have
Poisson distribution X ~ Po (λ)
The formula is
$P(X = x) = \dfrac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \dots$
We tabulate the table for X ~ Po (4)
Probability density function
It can be something like
We should find its cumulative distribution function, F(x).
From here, we let a generated random number r, 0 ≤ r ≤ 1, equal F(x), and solve for x
by inverting the function.
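Here is a small sketch of this inversion idea. The density used in the original is not reproduced in this text, so the example assumes a hypothetical density f(x) = 2x on 0 ≤ x ≤ 1, whose cumulative function is F(x) = x², giving x = √r.

```python
import math
import random

# Hypothetical density: f(x) = 2x for 0 <= x <= 1, so F(x) = x**2.
def inverse_cdf(r):
    """Solve r = F(x) = x**2 for x."""
    return math.sqrt(r)

random.seed(2)  # arbitrary seed, for repeatability only
sample = [inverse_cdf(random.random()) for _ in range(10_000)]
mean = sum(sample) / len(sample)
print(round(mean, 2))  # should be close to E(X) = 2/3
```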
Normal distribution X ~ N (μ, σ²)
Making reference to the formula
$Z = \dfrac{X - \mu}{\sigma}$
We let a generated random number r, 0 ≤ r ≤ 1, equal the cumulative probability P(Z ≤ z) of the normal
distribution. Then by using normal tables (or your calculator), you can find z, and therefore x = μ + σz.
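The same steps can be sketched with `statistics.NormalDist` from the Python standard library, which plays the role of the normal tables here. The distribution N(100, 64) and the three "random numbers" are illustrative choices, not values from the notes.

```python
from statistics import NormalDist

mu, sigma = 100, 8                 # X ~ N(100, 64), an illustrative choice
r = [0.975, 0.5, 0.1]              # pretend these came from a random number generator

z = [NormalDist().inv_cdf(p) for p in r]   # what the normal tables would give
x = [mu + sigma * zi for zi in z]          # x = mu + sigma * z

for ri, zi, xi in zip(r, z, x):
    print(ri, round(zi, 3), round(xi, 2))
```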
13.2 Sampling Distributions (sample proportion & mean, central limit theorem)
When we are in the process of finding sample means, or standard deviations, we might also want
to know how the data are distributed. So following the few distributions that we have learnt,
being Binomial, Poisson and Normal, we are learning a new one here: The Sampling
Distribution of means.
SAMPLING DISTRIBUTIONS OF A SAMPLE MEAN
Before we start, we need to recall some information on expectation algebra. We remember that
in a population, the expected value E(X) is the mean itself, μ, while
Var(X) is the variance of the population itself, σ². So now, we are going to find the
expected value of a sample mean, E(X̅).
We all know that the mean of a sample of size n can be represented by the equation
$\bar{X} = \dfrac{X_1 + X_2 + \dots + X_n}{n}$
where X1, X2, …, Xn are independent observations from the population. So we further find that the
expected value of the sample mean is
$E(\bar{X}) = \dfrac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] = \dfrac{n\mu}{n} = \mu$
which is the same value as the population mean. What this means is that, on average, the sample
mean equals the population mean. We will then find that the
variance of the sample mean has a different value from the population variance. Using the fact that
$\mathrm{Var}(aX) = a^{2}\,\mathrm{Var}(X)$
we find the variance of the sample mean to be
$\mathrm{Var}(\bar{X}) = \dfrac{1}{n^{2}}\left[\mathrm{Var}(X_1) + \dots + \mathrm{Var}(X_n)\right] = \dfrac{n\sigma^{2}}{n^{2}} = \dfrac{\sigma^{2}}{n}$
So the standard deviation of the sampling distribution is
$\dfrac{\sigma}{\sqrt{n}}$
which we call the standard error of the mean. However, remember that this standard error is
for samples taken with replacement. For samples taken without replacement, the variance would be
$\mathrm{Var}(\bar{X}) = \dfrac{\sigma^{2}}{n} \cdot \dfrac{N - n}{N - 1}$
where N is the size of the finite population, and n is the sample size. I do not know how to
derive this, and I don’t think it will appear in exams. I put it here for your reference.
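Although the derivation is skipped, the without-replacement variance can at least be checked numerically. The sketch below lists every possible sample of size 2 from a small made-up population of 5 values, and compares the variance of the sample means with (σ²/n)(N – n)/(N – 1). Exact fractions are used so the comparison is not blurred by rounding.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 3, 5, 8, 13]   # a made-up finite population, N = 5
N, n = len(population), 2

mu = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - mu) ** 2 for x in population) / N

# Every possible sample of size n drawn without replacement is equally likely.
means = [Fraction(sum(s), n) for s in combinations(population, n)]
mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mu) ** 2 for m in means) / len(means)

formula = sigma2 / n * Fraction(N - n, N - 1)
print(mean_of_means == mu, var_of_means == formula)  # True True
```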
So now, whenever we have a normal distribution X ~ N(μ, σ²), we have a sampling
distribution of
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^{2}}{n}\right)$
Consider the distribution X ~ N(100, 64)
and consider the following:
Notice that the sample size affects the sampling distribution. So now to answer questions, unlike in
Maths T, you have to be very particular about whether the question is talking about a population or a sample.
Let me give you an example:
The volume of wine in bottles are normally distributed with a mean of 758ml and a standard
deviation of 12ml. A random sample of 10 bottles is taken and the mean volume found.
Calculate the probability that the sample mean is less than 750ml.
Let X be the volume of wine in bottles.
X ~ N(758, 12²)
Since X is normally distributed, the sampling distribution with n = 10 is
X̅ ~ N(758, 12² / 10)
X̅ ~ N(758, 14.4)
P(X̅ < 750) = P(Z < –2.108)
= 0.0175
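The working above can be checked with a few lines of Python, using `statistics.NormalDist` in place of the normal tables. This is only a cross-check of the arithmetic, not a replacement for the written working.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 758, 12, 10
standard_error = sigma / sqrt(n)                    # sqrt(14.4)
sample_mean_dist = NormalDist(mu, standard_error)   # X-bar ~ N(758, 14.4)

z = (750 - mu) / standard_error
p = sample_mean_dist.cdf(750)
print(round(z, 3), round(p, 4))  # -2.108 0.0175
```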
I assume that you have fully studied the chapters Discrete Probability Distributions &
Continuous Probability Distributions in Maths T. Now that you know the difference between
samples and populations, remember that the final answer will be different if you use the wrong distribution.
We were assuming that the sample was taken from a population which follows the normal
distribution. So what if it isn’t? Maybe, the sample was taken from a Binomial, Poisson or even a
Uniform distribution?
Let’s do a little experiment. Suppose you have an unfair coin, such that every time you toss it, it
has a 25% chance of showing a head. So if you toss it 10 times, the number of heads follows a binomial distribution, X ~
B(10, 0.25). We plot the probability graph below. The red bars are the Binomial plots, while the
blue line is the normal approximation.
So now, we do the sampling distribution of X̅. That means, we do the experiment various times,
get different means, and tabulate them as a distribution. If we do it 30 times (sample size of 30)
we get a graph like below:
then 50 times, we get
It gets closer to a normal distribution, doesn’t it?
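If you don’t have the graphs in front of you, the same experiment can be simulated. The sketch below is an assumption about how such an experiment might be set up, not the applet used for the original graphs: it repeatedly takes a sample of 30 values from B(10, 0.25), records each sample mean, and compares the mean and variance of those sample means with the theoretical values np = 2.5 and npq/30 = 0.0625.

```python
import random

random.seed(4)  # arbitrary seed, for repeatability only

def binomial_draw(n=10, p=0.25):
    """Number of heads in n tosses of a coin with P(head) = p."""
    return sum(random.random() < p for _ in range(n))

def sample_mean(sample_size):
    """Mean of one sample of binomial observations."""
    return sum(binomial_draw() for _ in range(sample_size)) / sample_size

means = [sample_mean(30) for _ in range(5000)]
centre = sum(means) / len(means)
spread = sum((m - centre) ** 2 for m in means) / len(means)

print(round(centre, 2))   # theory: np = 2.5
print(round(spread, 3))   # theory: npq / 30 = 0.0625
```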
Now we try a Poisson distribution. Say the average number of monkeys seen along the road
every day is 4, then X ~ Po(4). So the probability of seeing n monkeys a day can be tabulated as
follows:
Again, we do a serious investigation to see how many monkeys appear every day. We collect
the means 30 times, and we find the sampling distribution of X̅ to be as follows:
Once again it is close to the normal blue curve. Remember that the y-axis stands for probability.
So this sampling distribution simply tells us “the probability of the mean number of monkeys seen on the
road daily, with a sample size of n”.
We try now for a uniform distribution. A uniform distribution X ~ R(a, b) means that X is
uniformly distributed over the range a ≤ x ≤ b. It has the following expectation and variance:
$E(X) = \dfrac{a + b}{2}, \qquad \mathrm{Var}(X) = \dfrac{(b - a)^{2}}{12}$
Assume X ~ R(0, 27), meaning that every number between 0 and 27 is equally likely to come up in a
lucky draw. We can plot its distribution as
then again, we find the sampling distribution of X̅. We take 30 samples, and we find that actually, it
looks like a normal distribution!
All these graphs are done with this applet. So after doing all these, we find that when samples
are taken from distributions that are not normal, the sampling distribution of the mean still takes
the normal shape as the sample size increases. In other words, for large sample size n, it is approximately
normal. And here, we introduce the central limit theorem:
When samples are taken from a non-normal population with mean μ and known variance σ², then
for large values of n, the distribution of X̅ is approximately normal such that
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^{2}}{n}\right)$
In statistics, we define a large sample to be n ≥ 30. You will be using this convention for the rest
of the chapters. Let me show you an example of the use of central limit theorem:
The average number of telephone calls made in an evening to a counselling service is 4.5
calls. If 30 random observations are taken, find the probability that the sample mean
exceeds 5.
X ~ Po(4.5)
Since n ≥ 30, by central limit theorem, X̅ is approximately normal, so
X̅ ~ N(4.5, 0.15)
P(X̅ > 5) = P [Z > (5 – 4.5) / √0.15] = P (Z > 1.291) = 0.098
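As a quick check of this working, here is the same calculation in Python, with `statistics.NormalDist` standing in for the normal tables. For a Poisson distribution, both the mean and the variance are λ, so the variance of the sample mean is λ/n.

```python
from math import sqrt
from statistics import NormalDist

lam, n = 4.5, 30
standard_error = sqrt(lam / n)        # sqrt(0.15)

z = (5 - lam) / standard_error
p = 1 - NormalDist().cdf(z)
print(round(z, 3), round(p, 3))       # 1.291 0.098
```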
SAMPLING DISTRIBUTIONS OF SAMPLE PROPORTIONS
Suppose a random sample of n observations is taken from a population in which the proportion
of successes is p and the proportion of failures is q = 1 – p.
If X is the number of successes in the sample, then X follows a binomial distribution,
X ~ B(n, p). You should recall that E(X) = np and Var(X) = npq. Using the same method by which
we found the expectation of the sample mean X̅, we now find the expectation of the sample
proportion Ps .
We know that
$P_s = \dfrac{X}{n}$
So finding E(Ps) and Var(Ps), we get
$E(P_s) = \dfrac{E(X)}{n} = \dfrac{np}{n} = p, \qquad \mathrm{Var}(P_s) = \dfrac{\mathrm{Var}(X)}{n^{2}} = \dfrac{npq}{n^{2}} = \dfrac{pq}{n}$
This in turn gives us the distribution of the sample proportion for large n,
$P_s \sim N\left(p, \dfrac{pq}{n}\right)$
and we define the term
$\sqrt{\dfrac{pq}{n}}$
as the standard error of proportion.
When using a distribution of sample proportions, we need to take continuity corrections into
account (try recalling what you learned in Maths T). For this case, the continuity correction is
$\pm \dfrac{1}{2n}$
I’ll show you an example:
It is known that 3% of frozen pies delivered to a canteen are broken. What is the
probability that, on a morning when 500 pies are delivered, 5% or more are broken?
Let p be the probability that a pie is broken, p = 0.03.
Let Ps be the proportion of pies in the sample that are broken.
q = 0.97, n = 500, we have
Ps ~ N(0.03, 0.0000582)
P(Ps ≥ 0.05) = P(Ps > 0.05 – 0.001) [continuity correction, 1/(2n) = 0.001]
= P(Ps > 0.049) = P(Z > 2.491) = 0.0064
If you noticed, there is another way of solving this problem, just by using the Binomial
Distribution alone.
Let X be the number of broken pies in the sample.
X ~ B(500, 0.03)
Since n ≥ 30, np, nq > 5, it is approximately normal.
X ~ N(np, npq)
X ~ N(15, 14.55)
500 × 5% = 25
P(X ≥ 25) = P( X > 24.5) = P(Z > 2.491) = 0.0064
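To see that the two methods really are the same calculation, here is a short Python check. It computes the z-value both ways (sample proportion with correction 1/(2n), and number of broken pies with correction 0.5) and then the tail probability, using `statistics.NormalDist` instead of tables.

```python
from math import sqrt
from statistics import NormalDist

n, p = 500, 0.03
q = 1 - p

# Method 1: sample proportion, continuity correction 1 / (2n).
z1 = (0.05 - 1 / (2 * n) - p) / sqrt(p * q / n)

# Method 2: number of broken pies, continuity correction 0.5.
z2 = (0.05 * n - 0.5 - n * p) / sqrt(n * p * q)

prob = 1 - NormalDist().cdf(z1)
print(round(z1, 3), round(z2, 3), round(prob, 4))  # 2.491 2.491 0.0064
```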
If I were you, I would choose to do the second method. However, in exam questions, if you are
asked to find the proportion, then you had better use the first method to avoid deduction of marks.
Note that in either case, the distribution of the sample proportion can only be used for large sample size n.
13.3 Point Estimates (unbiased estimates, t-distribution, standard error)
To define a certain distribution, be it Binomial, Poisson or Normal, you need to know its
population parameters. And of course, if you don’t know the parameters beforehand, you would
want to use sampling to estimate them. An estimate is unbiased if the average (or expectation) of a
large number of values taken in the same way is the true value of the parameter. Among unbiased
estimates, the best one is the one with the smallest variance.
So here in this section, we are focusing on point estimates. We estimate the parameters by
single values (points) calculated from the data that we collected through the samples. Look at the 3 equations below.
$\hat{p} = p_s, \qquad \hat{\mu} = \bar{x}, \qquad \hat{\sigma}^{2} = \dfrac{n}{n - 1}\, s^{2}$
where s² = Σ(x – x̅)² / n is the sample variance.
We denote an unbiased estimate with a cap. So the unbiased estimates of the proportion of
success in the population, the population mean and the population variance are denoted by p̂, μ̂ and
σ̂² respectively. It is found that the best unbiased estimates for the population proportion and
population mean are the sample proportion ps and the sample mean x̅ themselves. However, the
best unbiased estimate for the population variance is not the sample variance itself, but is given by the above formula.
The formula for the estimated variance can also have the following forms:
$\hat{\sigma}^{2} = \dfrac{\sum (x - \bar{x})^{2}}{n - 1} = \dfrac{1}{n - 1}\left(\sum x^{2} - \dfrac{(\sum x)^{2}}{n}\right)$
That is all you need to know about this section. Let me give you a short example:
The concentrations, in milligrams per litre, of a trace element in 7 randomly chosen samples
of water from a spring were
240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5
Determine the unbiased estimates of the mean and the variance of the concentration of the
trace element per litre of water from the spring.
To answer this question, we need to make use of our calculator. Set your CASIO 570MS to SD
mode, and input all the data into it, by pressing the individual numbers in, every time followed
by the DT button, until you finished inputting everything. Next, you press
shift+S-VAR. It gives you the option of x̅, xσn and xσn-1. The first one gives you the unbiased
estimate of the mean, while the last one will give you the unbiased estimate of the standard
deviation. Just show them a little working even though you know the answers straight away:
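If you want to double-check the calculator, the same two estimates can be obtained in Python. Note that `statistics.variance` divides by n – 1, so it gives the unbiased estimate of the variance directly (the square of the calculator’s xσn-1).

```python
from statistics import mean, variance

data = [240.8, 237.3, 236.7, 236.6, 234.2, 233.9, 232.5]

mu_hat = mean(data)           # unbiased estimate of the mean
sigma2_hat = variance(data)   # divides by n - 1: unbiased estimate of the variance
print(round(mu_hat, 2), round(sigma2_hat, 2))  # 236.0 7.58
```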
13.4 Interval Estimates (confidence intervals, large & small samples, sample size)
Point estimates might not be accurate. There is always a possibility that the unbiased estimate of
the population mean is far away from the actual mean. Another approach is to
construct an interval, known as a confidence interval. This confidence interval tells us that there
is a certain probability that the population mean will lie within it. We usually write this
interval in terms of (a, b), where the terms a and b are the confidence limits, or end-values of
the interval. Consider a normal curve:
We define a confidence interval in terms of percentage. For example, a 95% confidence interval,
like the one above means that there is 95% probability that the population mean lies in the
interval. Here we shall learn how to construct a confidence interval for a population proportion
and a population mean.
CONFIDENCE INTERVAL FOR POPULATION PROPORTION
Here you want to find p, the proportion of successes in a particular population. You take a
sample of size n, and then find the best unbiased estimate p̂. You need to recall quite a lot of
information from the last 2 parts, bearing in mind that when we are dealing with population
proportions, whether the sample comes from a normal or non-normal distribution, it must be done with a
large sample (n ≥ 30). Recalling that the sampling distribution of the sample proportion is
$P_s \sim N\left(p, \dfrac{pq}{n}\right)$
The confidence limits will be
$p_s \pm a\sqrt{\dfrac{p_s q_s}{n}}$
and the confidence interval will be
$\left(p_s - a\sqrt{\dfrac{p_s q_s}{n}},\ p_s + a\sqrt{\dfrac{p_s q_s}{n}}\right)$
Okay, I need to explain this a little. If you observe closely, the confidence interval
is constructed from the unbiased estimate of the population proportion, ± a times the standard error. The term a
determines the percentage of the interval you want. This value a can be obtained from the normal
tables (or the Buku Sifir given in STPM). It looks something like this:
I’ll teach you how to read this table, in the example below:
In order to assess the probability of a successful outcome, an experiment was performed 200
times. The number of successful outcomes was 72. Find a 95% confidence interval for p, the
population proportion of success.
We start by listing down the important values: ps, qs and n, and the distribution.
ps = 72 ÷ 200 = 0.36, qs = 0.64, n = 200
Ps ~ N (0.36, 0.001152)
To find a, we refer to the table. Note that the table was written for lower tail probability
P (Z ≤ a), but we are looking for P ( –a ≤ Z ≤ a). So a central 95% of the distribution, should
have an upper and lower tail of 2.5% each. This diagram might help to explain a little:
The diagram on the left shows the lower tail probability, which is what the table in your Buku
Sifir gives. We want to find the one on the right, and from the position of the red
lines you can see that the two are definitely different. So here, the value of a comes from the column
0.975, which is 1.960. So your confidence interval shall be
( ps – 1.96√0.001152, ps + 1.96√0.001152 ) = (0.293, 0.427)
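The interval can be checked with a short Python sketch. Here `NormalDist().inv_cdf` plays the role of the normal tables: for a central 95% interval we look up the lower-tail probability (1 + 0.95)/2 = 0.975.

```python
from math import sqrt
from statistics import NormalDist

successes, n, level = 72, 200, 0.95
p_s = successes / n                            # 0.36
a = NormalDist().inv_cdf((1 + level) / 2)      # lower-tail probability 0.975
half_width = a * sqrt(p_s * (1 - p_s) / n)     # a times the standard error

lower, upper = p_s - half_width, p_s + half_width
print(round(a, 3), round(lower, 3), round(upper, 3))  # 1.96 0.293 0.427
```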
You have probably noticed that the continuity correction is omitted. Yes, this is indeed the
case. You need to get used to reading the table to prevent yourself from using the wrong value of
a. A 90% interval means that it has a lower tail probability of 95%, an 80% interval means that it
has a lower tail probability of 90%, and so on. To make things faster, I suggest you memorize the 4
most common percentage intervals:
90% confidence level 1.645
95% confidence level 1.960
98% confidence level 2.326
99% confidence level 2.576
CONFIDENCE INTERVALS FOR POPULATION MEAN
This section is not so straightforward. Although it shares a lot of similarities with the part above,
the construction of confidence intervals for population mean depends on the variance (known
or unknown), the distribution (normal or non-normal) and its sample size. So in this section,
there are 5 cases:
1. Normal with known variance σ²
The sampling distribution will be
$\bar{X} \sim N\left(\mu, \dfrac{\sigma^{2}}{n}\right)$
Using the best unbiased estimate of the population mean, μ̂ = x̅, the confidence interval is
$\left(\bar{x} - a\,\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + a\,\dfrac{\sigma}{\sqrt{n}}\right)$
2. Non-normal with known variance σ² (n ≥ 30)
In this case, the sample may be taken from a Binomial or Poisson distribution. Since the sample
size is large, according to the central limit theorem, we approximate with a normal distribution.
X ~ B(n, p) becomes X ~ N(np, npq)
X ~ Po(λ) becomes X ~ N(λ, λ)
X ~ R(a, b) becomes X ~ N( ½ (a + b), 1/12 (b – a)² )
and so on. From here, after finding the sampling distribution of X̅, again using the best unbiased
estimate of the population mean, μ̂ = x̅, the confidence interval is the same as above,
$\left(\bar{x} - a\,\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + a\,\dfrac{\sigma}{\sqrt{n}}\right)$
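3. Normal with unknown variance σ² (n ≥ 30)
The method of solving this is just the same as method 1, but here we do not know the population
variance. Using the unbiased estimate of the population mean, μ̂ = x̅, and the unbiased estimate of
the population variance,
$\hat{\sigma}^{2} = \dfrac{n}{n - 1}\, s^{2}$
our confidence interval will be
$\left(\bar{x} - a\,\dfrac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + a\,\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$
or we could rewrite it in terms of s (where s² is the sample variance, with divisor n),
$\left(\bar{x} - a\,\dfrac{s}{\sqrt{n - 1}},\ \bar{x} + a\,\dfrac{s}{\sqrt{n - 1}}\right)$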
4. Non-normal with unknown variance σ² (n ≥ 30)
Similar to method 2, we approximate with a normal distribution. After finding the sampling
distribution of X̅, we use the unbiased estimates μ̂ and σ̂, and the same equation for the confidence
interval as in method 3,
$\left(\bar{x} - a\,\dfrac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + a\,\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$
5. Normal with unknown variance σ² (n < 30)
It is interesting to note that when the sample size is small, the sampling distribution of
$T = \dfrac{\bar{X} - \mu}{\hat{\sigma} / \sqrt{n}}$
does not follow a normal distribution. Instead, it follows a t-distribution.
The distribution of T is a member of the family of t-distributions. All t-distributions are symmetric about zero
and have a single parameter ν (pronounced ‘new’) which is a positive integer. ν is known as the
number of degrees of freedom of the distribution and if, for example, T has a t-distribution
with 5 degrees of freedom, you would write T ~ t(5). For a sample size n, it can be shown that T
follows a t-distribution with (n – 1) degrees of freedom. Take a look at the t-distribution curves
below.
Notice that we only use the t-distribution when the sample size is small; as ν
tends to infinity, the curve looks more and more like a normal curve. In other words, nothing much has changed, we
are just using a new distribution for small sample sizes. After knowing that our sample size is
small, we use the t-distribution with (n – 1) degrees of freedom, use the unbiased estimates for
both the mean and the variance, and our new formula will be
$\left(\bar{x} - t\,\dfrac{\hat{\sigma}}{\sqrt{n}},\ \bar{x} + t\,\dfrac{\hat{\sigma}}{\sqrt{n}}\right)$
where t can be obtained from the t-distribution tables. It looks something like this. The way
you use it is exactly the same as the critical values for the normal distribution, it’s just that there is
a column of degrees of freedom.
HOW TO SOLVE EXAM QUESTIONS
It isn’t hard. All you need to do is to identify the quantities stated in the question, and then
decide which one of the 5 methods you should use to solve it. I’ll put here a
few example questions, and show you how to analyse them:
A plant produces steel sheets whose weights are known to be normally distributed with a
standard deviation of 2.4kg. A random sample of 36 sheets had a mean weight of 31.4kg. Find
the 99% confidence interval for the population mean.
It is normally distributed, variance = 2.4² kg² (known), sample size = 36, sample mean = 31.4kg.
Use method 1.
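For reference, method 1 applied to this question can be sketched in Python as follows, with `NormalDist().inv_cdf` standing in for the table value 2.576.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n, level = 31.4, 2.4, 36, 0.99
a = NormalDist().inv_cdf((1 + level) / 2)      # about 2.576
half_width = a * sigma / sqrt(n)

interval = (x_bar - half_width, x_bar + half_width)
print(tuple(round(v, 2) for v in interval))    # (30.37, 32.43)
```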
The heights of men in a particular district are distributed with mean μ cm and standard
deviation σ cm. On the basis of results obtained from a random sample of 100 men from the
district, the 95% confidence interval for μ was calculated and found to be (177.22cm,
179.18cm). Find the values of σ and x̅.
Unknown distribution, variance known, but sample size large. Approximate normal, method 2.
You need to work backwards using the confidence interval formulas, get 2 simultaneous
equations, and solve for σ and x̅. Give it a try.
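Once you have tried it yourself, you can compare with this sketch. The two simultaneous equations are x̅ – aσ/√n = 177.22 and x̅ + aσ/√n = 179.18, so x̅ is the midpoint of the interval and aσ/√n is half of its width.

```python
lower, upper, n, a = 177.22, 179.18, 100, 1.960

x_bar = (lower + upper) / 2                   # midpoint of the interval
sigma = (upper - lower) / 2 * n ** 0.5 / a    # half-width = a * sigma / sqrt(n)
print(round(x_bar, 2), round(sigma, 2))       # 178.2 5.0
```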
The fuel consumption of a new model of car is being tested. In one trial, 50 cars chosen at
random, were driven under identical conditions and the distances, x km, covered on 1 litre of
petrol were recorded. The results gave the following totals:
Σx = 525, Σx² = 5625
Calculate a 95% confidence interval for the mean petrol consumption, in km/l, of cars of this
type.
Unknown distribution, variance unknown, big sample. Approximate normal, use unbiased
estimate of population variance (you have to calculate it this time), use method 4.
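A sketch of method 4 for this question, working directly from the totals Σx and Σx². The unbiased variance estimate uses the form (Σx² – (Σx)²/n) / (n – 1).

```python
from math import sqrt

n, sum_x, sum_x2, a = 50, 525, 5625, 1.960

x_bar = sum_x / n
sigma2_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # unbiased variance estimate
half_width = a * sqrt(sigma2_hat / n)

print(round(x_bar, 2), round(sigma2_hat, 3))        # 10.5 2.296
print(round(x_bar - half_width, 2), round(x_bar + half_width, 2))  # 10.08 10.92
```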
A sample of 8 independent observations of a normally distributed variable gave the following
values: 3.6, 3.9, 4.5, 3.8, 4.4, 4.9, 4.2, 3.8. Determine a 99% confidence interval for the
population mean μ.
Normal distribution, unknown variance, and small sample. Method 5. In your answer, you need
to write these sentences very clearly:
since n < 30, a t(n-1) distribution is used. T ~ t(7)
Then you continue to find the confidence intervals.
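The remaining arithmetic might look like this in Python. The standard library has no t-distribution, so the critical value 3.499 for t(7) with 0.5% in each tail is simply typed in from the t-tables; that number is the one assumption in this sketch.

```python
from math import sqrt
from statistics import mean, stdev

data = [3.6, 3.9, 4.5, 3.8, 4.4, 4.9, 4.2, 3.8]
n = len(data)
t_crit = 3.499   # t(7), 0.5% in each tail, read from t-tables (not computed here)

x_bar = mean(data)
sigma_hat = stdev(data)                     # square root of the unbiased variance
half_width = t_crit * sigma_hat / sqrt(n)

print(round(x_bar, 4), round(sigma_hat, 4))                          # 4.1375 0.4406
print(round(x_bar - half_width, 2), round(x_bar + half_width, 2))    # 3.59 4.68
```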
Not hard, is it?
Here are a few short notes you might want to remember as well:
1. interval width
The width of a confidence interval can be obtained from the expression
$2a\,\dfrac{\sigma}{\sqrt{n}}$
Also remember, the width decreases when either
a. the sample size n increases,
b. the confidence level decreases, or
c. the variance decreases.
2. Assumption
Many times you might be asked, “state the assumptions you made”. You probably only have one
assumption, which is: we assume that it is a random sample.
To summarize this section, I made a chart for you to remember things easier.
BAB 14
14. Hypothesis Testing
14.1 Hypotheses (null & alternative hypotheses, test statistic, significance level)
Let’s imagine this story.
One day in town, you met this awkward looking Mathematics tuition teacher. He brags that 95%
of his pupils get A’s for their Mathematics T in STPM every year. Since you love Mathematics
so much, you thought that maybe you might want to take his tuition class. But being sceptical in
nature, you were wondering whether 95% of his students getting A’s, is a little too much. So you
decided that you want to put this teacher to a test. You managed to get some information from 15
of his ex-students, and find out that 11 of them got A for Maths T in the previous year.
Now your question is: is the Maths tuition teacher’s claim a little bit overboard? Is 11 out of 15
the same as 95%? Obviously it isn’t, but since you are only taking a sample, you can’t be sure that you are
right. What if 13 or 14 students had got A’s? You know that if only 2 or 3 students got A’s, he
is definitely lying. Then how about 10 students? 8 or 9 students? There must be a cut off point,
such that you are VERY SURE that he is lying, or not. Isn’t it?
Or let’s think of another story. Suppose you are an athlete, participating in the MSSM 400m
race. You find that every time, your running speed follows a normal distribution with a mean of
40km/h. Bored of running every day, you decided to test whether drinking 2 cups of milk in the
morning every day helps improve your running. So after drinking milk for 5 days, you find your
mean speed turned out to be 40.9km/h.
Again you question yourself: did you really “improve”? Well, it might so happen that you ran a
little faster this time, and it has nothing to do with the milk. You might also be wondering, how
much increase in speed is considered an ‘improvement’? You need a cut off point, again.
NULL AND ALTERNATIVE HYPOTHESIS
If you didn’t notice, you were actually carrying out a hypothesis test, also called a significance test. You were
trying to test a hypothesis, to determine whether you can conclude something. You were testing
whether the claim that 95% of students get A’s, and the ‘improvement’ in running, are true. The initial
assumption is what we call the null hypothesis, H0. It is very important as it provides the
model for the calculations. The null hypothesis for the first case is “95% of the students get A’s
for Maths T”. If your results are consistent with 95% of the students getting A’s in Maths T, then you
have no reason to reject it. The case is this: you can’t reject his claim if you don’t have enough evidence
to do so. If, after your test, you have enough evidence to reject his claim, then you conclude in favour of an
alternative hypothesis, H1. The alternative hypothesis for this case is “less than 95% of the
students get A’s in Maths T”. This is a binomial problem, so in Mathematical terms, we have
H0: p = 0.95
H1: p < 0.95
Notice that you are only interested in whether the probability is less than 95% or not, so this
means that we are interested in the left hand end of the distribution. This is known as the lower
tail. In the second case, we are interested in the upper tail, as in whether you have improved or
not. There are cases where you want to know whether there is any change in the values, e.g. whether
there is a change in the number of supporters for Barisan Nasional, Pakatan Rakyat and so on. For this case, we use
a two-tailed test.
14.2 Critical Regions
You previously learnt how to formulate a null and alternative hypothesis, and determine your test
statistic and test value. This information is still not enough. We shall now proceed to
setting the significance level, and determining the critical region.
When making a hypothesis test, you have to make a decision about the significance level, which
is the value of the probability that is considered to imply an unlikely or rare event. As a guide,
events that have a probability of 5% or less are regarded as unlikely and events having a
probability of 1% or less are regarded as very unlikely. Other significance levels used are 10%
and 2%. Try not to confuse this with what you learned in the previous chapter,
which was confidence intervals, in terms of 90+%.
Let’s say, in a test of 10 true-false questions which were written in Hindi, your friend got 6
questions correct, and you want to know whether he was guessing, or he really studied Hindi.
You formulate the hypotheses as below:
H0: Your friend is guessing. He makes use of the 50% luck.
H1: Your friend seriously studied Hindi before. He scores more than 50%.
Mathematically, this is a binomial problem again, X ~ B(10, 0.50).
H0: p = 0.50
H1: p > 0.50
Notice that the expression for H0 always has an ‘=' sign, while H1 should have either <, > or ≠
signs. To start our test, we need to define our significance level. We can say, for example, that
we want to test at the 5% level, that he could have obtained this score by guessing all the
answers. We can also choose to test at 1% level or 10% level, and obviously, you might get
different results.
So from here you can see that in the last section, you can’t get any answer if you don’t set a
significance level. You can’t say whether you have improved in your running unless you first decide
how unlikely a result must be before you call it significant. Only with this significance level
can our hypothesis test be carried out. For the
example above, say, we want to test it at the 5% level. We first need to find the probability of
each possible number of questions he gets correct. We plot a cumulative binomial distribution
X ~ B(10, 0.50).
Notice that it is a decreasing cumulative Binomial plot.
This curve tells us the probability that he gets ≥ n questions correct. So we see that there is a 99.9%
probability that he gets at least one question correct, a 62.3% probability that he gets at least 5
questions correct, etc. If he is guessing, there is a 5.5% probability that he gets 8 or more questions correct,
which is still above our required significance level. But if he gets 9 questions
correct, it must be really a rare event, as he has only a 1.1% probability of getting at least this score if he
was guessing. We say that the numbers 9 and 10 lie in the critical region, which is the group of
observations that are considered to be unusual or unlikely (rare) events. We also say that the number
9 is the critical value, or cut-off point, since a score of 9 or above is considered a rare event.
So what can we conclude from here? We can see that if your friend got 0 to 8 questions correct,
we have no evidence to say that he studied Hindi, as these are not rare events (they have >
5% probability). We say that the null hypothesis H0 is not rejected, which is the case here. But if he
gets 9 or 10 questions correct, we say that there is evidence, at the 5% significance level, that your
friend did study Hindi. In other words, the null hypothesis H0 is rejected in favour of the
alternative hypothesis H1.
Notice that if we did a 10% significance level test, number 8 now lies in the critical region! So
this is actually very subjective, and it really depends on you (or the question in your test paper)
to determine what is considered significant and what is not.
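The percentages quoted above, and the critical values at the 5% and 10% levels, can be reproduced with a short Python sketch. The helper names `upper_tail` and `critical_value` are made up for this illustration; the rejection rule used is P(X ≥ k) < significance level.

```python
from math import comb

def upper_tail(k, n=10, p=0.5):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k, n + 1))

def critical_value(alpha, n=10, p=0.5):
    """Smallest k with P(X >= k) < alpha (upper-tail test)."""
    return next(k for k in range(n + 1) if upper_tail(k, n, p) < alpha)

for k in (1, 5, 8, 9):
    print(k, round(upper_tail(k), 3))   # 0.999, 0.623, 0.055, 0.011

print(critical_value(0.05), critical_value(0.10))  # 9 8
```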
Let me sum up what you understand about hypothesis testing by now:
1. To test something, you need to first define your null hypothesis H0, something that is
claimed, or happening.
2. Then you define your alternative hypothesis H1 just in case H0 is not true.
3. Find your test statistic, test value.
4. Try to identify what kind of distribution it is from.
5. Determine a significance level to reject or accept the claim.
6. Plot or use a given cumulative distribution to find the critical regions.
7. Determine whether the test value lies in the critical region. If yes, then H0 is rejected. If not,
H0 is not rejected.
14.3 Tests of Significance (population proportion & mean, Type I & Type II errors)
A Hypothesis Test is a Test of Significance. In this section, we will be looking at all the
possible types of hypothesis tests that can be set in STPM. Before we start, every hypothesis
test follows a general rule. You need to state these 7 steps (or workings) in your answer sheet:
1. Define the variable X.
Let X be …,
X ~ B(n, p) / X ~ N(μ, σ²) / X ~ Po(λ)
2. Define H0 and H1.
H0: p / λ / μ / μ1 – μ2 = ?
H1: p / λ / μ / μ1 – μ2 <, >, ≠ ?
3. Write down the case if H0 is true.
If H0 is true, then p / λ / μ / μ1 – μ2 = ?
and X ~ B(n, ?) / X ~ N(?, σ²) / X ~ Po(?)
4. Define your type of test and significance level.
Use an upper / lower / two tailed test, at ?% level.
5. Set the criteria to reject H0.
Reject H0 if P(X ≥ x) < ? / P(X ≤ x) < ? / z < ? / z > ? / |z| > ? / T < ?
6. Do the calculations.
P(X ≥ ?) = ? / P(X ≤ ?) = ? / z = ? / T = ?
7. Conclude your results.
Since P(X ≥ x) = ? / P(X ≤ x) = ? / Z = ? / T = ?, x lies / doesn’t lie in the critical region.
H0 is rejected in favour of H1 / not rejected. We conclude that ………. at ?% level.
If you have all these 7 steps on your answer sheet, then you will probably get 90% of
the marks. Don’t make calculation mistakes though.
TYPES OF SIGNIFICANCE TESTS
In this part, there are 11 kinds of significance tests that you might face, be it lower tail,
two-tailed or upper tail tests. I will go through this section with an example for most of them.
Questions are in blue, answers are in red:
1. Binomial Proportion p (n < 30)
A certain type of seed has a germination rate of 70%. The seeds undergo a new treatment after
which 9 germinate in a packet of 10 seeds. Test, at the 5% level, whether this is evidence of an
increase in the germination rate.
Let X be the number of seeds that germinate in a packet of 10, X ~ B(10, p)
H0: p = 0.7 [the germination rate is 70%]
H1: p > 0.7 [the germination rate increases]
If H0 is true, then p = 0.7, and X ~ B(10, 0.7)
Use an upper tail test, at 5% level.
Reject H0 if P(X ≥ x) < 0.05 [0.05 stands for 5%]
P(X ≥ 9) = P(X = 9) + P(X = 10) = 0.1211 + 0.0282 = 0.1493 = 14.93%
Since P(X ≥ 9) = 14.93% > 5%, x doesn’t lie in the critical region.
H0 is not rejected.
We conclude that there is no evidence that there is an increase in germination rate, at 5% level.
A binomial proportion with a small sample isn’t hard. The thing that might bother you is the
calculation of P(X ≥ 9). Remember the formula for the Binomial distribution,
P(X = x) = nCx p^x q^(n–x), where q = 1 – p.
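The calculation of P(X ≥ 9) for the seed example can be sketched as follows. This is only a check of the arithmetic using the Binomial formula; the names binom_pmf and p_value are my own.

```python
from math import comb

def binom_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p), i.e. nCx * p^x * q^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p0, observed, alpha = 10, 0.7, 9, 0.05

# Upper tail test: probability of 9 or more germinating if H0 (p = 0.7) is true
p_value = sum(binom_pmf(n, p0, x) for x in range(observed, n + 1))
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)
```

Since the tail probability is well above 0.05, the sketch agrees with the conclusion that H0 is not rejected.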
2. Binomial Proportion p (n ≥ 30)
For this case, a normal approximation to the binomial distribution is used. Remember to apply the
continuity correction, which moves the observed value half a unit towards the mean under H0.
A manufacturer claims that 8 out of 10 dogs prefer its brand of dog food to any others. In a
random sample of 120 dogs, it was found that 85 appeared to prefer that brand. Test, at the 5%
level whether you would accept the manufacturer’s claim.
Let X be the number of dogs which prefer the manufacturer’s brand of dog food,
X ~ B(120, p)
H0: p = 0.8
H1: p ≠ 0.8 [notice that we are using the ≠ sign. This is because we are testing whether the claim
is exactly correct. That means, the claim is wrong if more than 8 out of 10 dogs like the brand,
and also if fewer than 8 out of 10 dogs like the brand.]
If H0 is true, then p = 0.8 and X ~ B(120, 0.8)
Since np > 5, nq > 5, then X is approximately normal,
X ~ N(np, npq), which is X ~ N(96, 19.2).
Use a two-tailed test, at 5% level.
Reject H0 if |z| > 1.960 [Still remember how to get this value 1.960? Remember that a two-tailed
test at 5% means that both ends of the bell curve have 2.5% each. Refer to the critical values for
the normal distribution at the end of this post.]
z = (85.5 – 96) / √19.2 = –2.396
[85.5 is the observed 85 after the continuity correction. Since 85 is below the mean of 96, we need
P(X ≤ 85), which the normal curve approximates by the area below 85.5. The correction always moves
the observed value half a unit towards the mean.]
Since z = –2.396 and |z| > 1.960, z lies in the critical region.
H0 is rejected in favour of H1. There is evidence, at 5% level, that the proportion is not 0.8 (the
sample suggests it is lower), and therefore the manufacturer’s claim is not accepted.
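Here is a rough sketch of the dog food calculation, with the continuity correction written out explicitly. The critical value 1.960 is taken from the normal tables as in the text; the variable names are my own.

```python
from math import sqrt

n, p0, observed = 120, 0.8, 85

mean = n * p0                    # np = 96
variance = n * p0 * (1 - p0)     # npq = 19.2

# Continuity correction: move the observed count half a unit towards the mean
corrected = observed + 0.5 if observed < mean else observed - 0.5

z = (corrected - mean) / sqrt(variance)
reject_h0 = abs(z) > 1.960       # two-tailed test at the 5% level

print(round(z, 3), reject_h0)
```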
3. Poisson Mean λ
The number of white corpuscles on a slide has a Poisson distribution with mean 3.5. After
treatment, a sample was taken and the number of white corpuscles was found to be 8. Test at the 5%
level of significance, whether the number of white corpuscles has increased.
Let X be the number of white corpuscles on a slide, X ~ Po(λ).
H0: λ = 3.5
H1: λ > 3.5
If H0 is true, then λ = 3.5, and X ~ Po(3.5).
Use an upper tail test, at 5% level.
Reject H0 if P(X ≥ x) < 0.05.
P(X ≥ 8) = 1 – P(X ≤ 7) = 1 – 0.9733 = 0.0267 = 2.7% [I hope you remember the Poisson
formula. In some formula booklets, there are Poisson cumulative probability tables, they help
too.]
Since P(X ≥ 8) = 2.7% < 5%, x lies in the critical region.
H0 is rejected in favour of H1. There is evidence, at 5% level that the number of white corpuscles
increased.
Not a hard one, I suppose. Remember that if λ > 5, you can actually make an approximation to
the Normal distribution, X ~ N(λ, λ), since the mean and the variance of a Poisson distribution
are both λ.
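If you don’t have the cumulative Poisson tables at hand, the tail probability for the white corpuscles example can be checked with a short sketch like this one (poisson_cdf is my own helper name):

```python
from math import exp, factorial

def poisson_cdf(lam, k):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))

lam0, observed, alpha = 3.5, 8, 0.05

# P(X >= 8) = 1 - P(X <= 7), evaluated under H0 (lambda = 3.5)
p_value = 1 - poisson_cdf(lam0, observed - 1)
reject_h0 = p_value < alpha

print(round(p_value, 4), reject_h0)
```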
4. Population Mean μ (Normal, σ2 known)
A machine fills cans with soft drinks so that the volume of liquid in the cans follow a normal
distribution with mean 335ml and standard deviation of 3ml. A setting on the machine is altered,
following which the operator suspects that the mean volume of liquid discharged by the machine
into the cans has decreased. He takes a random sample of 50 cans and finds that the mean
volume of liquid in these cans is 334.6ml. Does this confirm his suspicion? Perform a
significance test at the 5% level and assume that the standard deviation remains unchanged.
Let X be the volume of liquid in the cans, X ~ N(μ, 3²)
H0: μ = 335
H1: μ < 335
The sample size is 50, so X̅ ~ N(μ, 3²/50) [recall what you learned in the previous chapter]
If H0 is true, then μ = 335, and X̅ ~ N(335, 9/50)
Use a lower tail test, at 5% level.
Reject H0 if z < –1.645
z = (334.6 – 335) / √(9/50) = –0.9428
Since z = –0.9428 > –1.645, z doesn’t lie in the critical region.
H0 is not rejected. There is no evidence to confirm the suspicion of the operator, at 5% level.
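The test statistic for the soft drink example is just the sample mean standardised with the known σ. A minimal sketch (the function name z_statistic is my own):

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """Standardise a sample mean when the population sigma is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

z = z_statistic(sample_mean=334.6, mu0=335, sigma=3, n=50)
reject_h0 = z < -1.645           # lower tail test at the 5% level

print(round(z, 4), reject_h0)
```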
For hypothesis types 4 to 8, you might want to recall what you learned in the previous chapter.
Remember when to use the t-distribution, when to use a normal approximation, and so on. These
types make use of the sampling distribution of the mean.
5. Population Mean μ (Non-normal, σ2 known)
I think I don’t need to show you an example on this one. It is similar to number 4. For a large
sample from a non-normal distribution (or sometimes an unnamed or unknown distribution), the
central limit theorem tells you that X̅ is approximately normal, and you follow the exact same
steps as type 4.
6. Population Mean μ (Normal, σ2 unknown, n ≥ 30)
When the variance is unknown, you make use of the best unbiased estimate of the population
variance,
σ̂² = (1 / (n – 1)) × (Σx² – (Σx)²/n)
and the rest of the steps follow.
7. Population Mean μ (Non-normal, σ2 unknown, n ≥ 30)
Similar to type 6, you make use of the best unbiased estimate of population variance. The
following example illustrates both type 6 and 7:
A random sample of 75 11-year-olds performed a simple task and the time taken, x minutes,
noted for each. The results were summarized as follows:
Σx = 1215, Σx² = 21708
Test, at the 1% level, whether there is evidence that the mean time taken to perform the task is
greater than 15 minutes.
Let X be the time taken to perform a simple task by the 11-year-olds.
H0: μ = 15
H1: μ > 15
The distribution is unknown. But since n = 75 is large, by the central limit theorem, X̅ is
approximately normally distributed, X̅ ~ N(μ, σ̂²/75), where the unknown variance is estimated by σ̂².
x̅ = 1215/75 = 16.2, σ̂² = (1/74)(21708 – 1215²/75) = 27.36
If H0 is true, then μ = 15, and X̅ ~ N(15, 27.36/75)
Use an upper tail test, at 1% level.
Reject H0 if z > 2.326
z = (16.2 – 15) / √(27.36/75) = 1.987
Since z = 1.987 < 2.326, z doesn’t lie in the critical region.
H0 is not rejected. There is no evidence, at 1% level, that the mean time is greater than 15
minutes.
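The working for this example, starting from the summary totals Σx and Σx², can be sketched like this. The helper name unbiased_variance is my own, and the critical value 2.326 is read from the normal tables.

```python
from math import sqrt

def unbiased_variance(n, sum_x, sum_x2):
    """Best unbiased estimate of the population variance from summary totals."""
    return (sum_x2 - sum_x**2 / n) / (n - 1)

n, sum_x, sum_x2, mu0 = 75, 1215, 21708, 15

x_bar = sum_x / n
var_hat = unbiased_variance(n, sum_x, sum_x2)
z = (x_bar - mu0) / sqrt(var_hat / n)
reject_h0 = z > 2.326            # upper tail test at the 1% level

print(round(x_bar, 1), round(var_hat, 2), round(z, 3), reject_h0)
```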
8. Population Mean μ (Normal, σ2 unknown, n < 30)
You probably might have guessed correctly. You should use the t-distribution to do this kind of
significance test.
Family packs of bacon slices are sold in 1.5kg packs. A sample of 12 packs was selected at
random and their masses, measured in kilograms, noted. The following results were obtained:
Σx = 17.81, Σx² = 26.4357
Assuming that the masses of the packs follow a normal distribution with variance σ²
unknown, test at the 1% level whether the packs are underweight.
Let X be the mass (kg) of a pack of bacon slices, X ~ N(μ, σ²)
H0: μ = 1.5
H1: μ < 1.5
Since σ² is unknown, and n < 30, a t-distribution is used, T ~ t(n – 1)
If H0 is true, then μ = 1.5, T ~ t(11), where
T = (X̅ – 1.5) / (σ̂/√12)
Use a lower tail test, at 1% level.
Reject H0 if t < –2.718 [refer to the t-distribution tables]
x̅ = 17.81/12 = 1.484, σ̂² = (1/11)(26.4357 – 17.81²/12) = 0.000245, so T = –3.506 < –2.718
t lies in the critical region.
H0 is rejected in favour of H1. There is evidence that the packs are underweight, at 1% level.
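To see where T = –3.506 comes from, here is a small sketch of the bacon example. The critical value –2.718 is the tabulated value for t(11) quoted above, not something the code computes; the variable names are my own.

```python
from math import sqrt

n, sum_x, sum_x2, mu0 = 12, 17.81, 26.4357, 1.5

x_bar = sum_x / n
var_hat = (sum_x2 - sum_x**2 / n) / (n - 1)   # unbiased estimate of sigma^2

# With sigma^2 unknown and n small, T follows t(n - 1)
t_stat = (x_bar - mu0) / sqrt(var_hat / n)
reject_h0 = t_stat < -2.718                    # lower tail test at the 1% level

print(round(x_bar, 3), round(t_stat, 3), reject_h0)
```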
9. Difference between Means μ1 – μ2 (different variance σ1² & σ2² known)
This is something new. Types 9, 10 and 11 are only for 2 Normal populations, X1 and X2, with
unknown means μ1 and μ2. So here you have 2 samples, with the new test statistic
X̅1 – X̅2, and we consider its sampling distribution. Doing some expectation algebra,
E(X̅1 – X̅2) = μ1 – μ2, Var(X̅1 – X̅2) = σ1²/n1 + σ2²/n2
and so our sampling distribution of the difference of means will be
X̅1 – X̅2 ~ N(μ1 – μ2, σ1²/n1 + σ2²/n2)
and therefore, in standardised form,
Z = [(X̅1 – X̅2) – (μ1 – μ2)] / √(σ1²/n1 + σ2²/n2)
Let’s try one example:
Due to differences in the environment, the masses of a certain species of small animal are
believed to be greater in Region A than in Region B. It is known that the masses in both regions
are normally distributed with masses in Region A having a standard deviation of 0.04kg and
masses in region B having a standard deviation of 0.09kg.
To test this theory, random samples are taken: 60 animals from Region A had a mean mass of
3.03kg and 50 animals from Region B had a mean mass of 3.00kg.
Does this provide evidence, at the 1% level that the animals of this species in Region A have a
greater mass than those in Region B?
Let X1 be the mass (kg) of an animal in Region A, with population mean μ1. X1 ~ N(μ1, 0.04²)
Let X2 be the mass (kg) of an animal in Region B, with population mean μ2. X2 ~ N(μ2, 0.09²)
H0: μ1 – μ2 = 0 [there is no difference in the masses between the regions]
H1: μ1 – μ2 > 0 [the animals in Region A have greater mass]
Consider the distribution of the difference between the means X̅1 – X̅2, with n1 = 60, n2 = 50.
If H0 is true, then μ1 – μ2 = 0 [there can be cases where it is not 0 too.]
Use an upper tail test, at 1% level.
Reject H0 if z > 2.326
z = (3.03 – 3.00) / √(0.04²/60 + 0.09²/50) = 2.184
Since z = 2.184 < 2.326, z doesn’t lie in the critical region.
H0 is not rejected. There is no evidence, at the 1% level, that the animals in region A have a
greater mass than those in region B.
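The Region A versus Region B calculation can be sketched as below. The function two_sample_z is my own name for the standardised difference of two sample means with known variances.

```python
from math import sqrt

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, diff0=0.0):
    """Standardised difference of two sample means, both variances known."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - diff0) / standard_error

z = two_sample_z(mean1=3.03, mean2=3.00, sigma1=0.04, sigma2=0.09, n1=60, n2=50)
reject_h0 = z > 2.326            # upper tail test at the 1% level

print(round(z, 3), reject_h0)
```

The diff0 argument is there for the cases where H0 states a difference other than 0.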
10. Difference between Means μ1 – μ2 (common σ2 known)
This one is not much different from the one above. Here the 2 populations have a common
variance σ², which doesn’t change in time. The distribution will then be represented by
X̅1 – X̅2 ~ N(μ1 – μ2, σ²(1/n1 + 1/n2))
and the test statistic,
Z = [(X̅1 – X̅2) – (μ1 – μ2)] / [σ√(1/n1 + 1/n2)]
By the way, you can also create confidence intervals for situations like this. Try it out
yourself. There can be 2 tail, upper tail and lower tail tests as well.
11. Difference between Means μ1 – μ2 (common σ2 unknown)
I won’t cover the case where the unknown variances are different. But if both populations have a
common unknown variance, the unbiased estimate σ̂², also known as a pooled two-sample estimate,
has the formula
σ̂² = (n1 s1² + n2 s2²) / (n1 + n2 – 2)
where n1 and n2 are the sample sizes and s1² and s2² are the variances of the 2 samples
respectively. The distribution will be
X̅1 – X̅2 ~ N(μ1 – μ2, σ̂²(1/n1 + 1/n2))
and the test statistic,
Z = [(X̅1 – X̅2) – (μ1 – μ2)] / [σ̂√(1/n1 + 1/n2)]
This is, however, only suitable for large samples. When both the samples are small, we should use
the t-distribution instead. The test statistic will now be
T = [(X̅1 – X̅2) – (μ1 – μ2)] / [σ̂√(1/n1 + 1/n2)]
where T ~ t(n1 + n2 – 2). We should only use the t-distribution when n1 + n2 – 2 < 30.
A large group of sunflowers is growing in the shady side of a garden. A random sample of 36 of
these sunflowers is measured. The sample mean height is found to be 2.86m, and the sample standard
deviation is found to be 0.60m. A second group of sunflowers is growing in the sunny side of the
garden. A random sample of 26 of these sunflowers is measured. The sample mean height is
found to be 3.29m and the sample standard deviation is found to be 0.9m. Treating the samples
as large samples from normal distributions having the same variance but possibly different
means, obtain a pooled estimate of the variance and test whether the results provide significant
evidence at the 5% level that the sunny-side flowers grow taller, on average, than the shady-side
sunflowers.
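Let X1 be the height of sunflowers in the sunny side, X1 ~ N(μ1, σ²)
Let X2 be the height of sunflowers in the shady side, X2 ~ N(μ2, σ²)
where σ² is unknown.
H0: μ1 – μ2 = 0
H1: μ1 – μ2 > 0 [the sunny-side sunflowers are taller on average]
Consider the distribution of the difference between the means X̅1 – X̅2, with n1 = 26, n2 = 36.
The pooled estimate of the variance is σ̂² = (26 × 0.9² + 36 × 0.60²) / (26 + 36 – 2) = 0.567
If H0 is true, then μ1 – μ2 = 0
and therefore X̅1 – X̅2 ~ N(0, 0.567 × (1/26 + 1/36))
Use an upper tail test, at 5% level.
Reject H0 if z > 1.645
z = (3.29 – 2.86) / √(0.567 × (1/26 + 1/36)) = 2.219
Since z = 2.219 > 1.645, z lies in the critical region.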
H0 is rejected in favour of H1. There is evidence, at the 5% level, that the sunny-side sunflowers
grow taller than the shady-side sunflowers.
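Here is a rough sketch of the sunflower calculation. It assumes that the quoted sample standard deviations use the divisor n, so that the pooled estimate is (n1 s1² + n2 s2²) / (n1 + n2 – 2); if your textbook defines s with the divisor n – 1, the weights become n1 – 1 and n2 – 1 instead. The function name pooled_variance is my own.

```python
from math import sqrt

def pooled_variance(n1, s1, n2, s2):
    """Pooled two-sample estimate, assuming s1 and s2 use the divisor n."""
    return (n1 * s1**2 + n2 * s2**2) / (n1 + n2 - 2)

# Sunny side: n = 26, mean 3.29, sd 0.9; shady side: n = 36, mean 2.86, sd 0.60
n_sunny, mean_sunny, s_sunny = 26, 3.29, 0.9
n_shady, mean_shady, s_shady = 36, 2.86, 0.60

var_hat = pooled_variance(n_sunny, s_sunny, n_shady, s_shady)
z = (mean_sunny - mean_shady) / sqrt(var_hat * (1 / n_sunny + 1 / n_shady))
reject_h0 = z > 1.645            # upper tail test at the 5% level

print(round(var_hat, 3), round(z, 3), reject_h0)
```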
When you perform a significance test, you can make errors. If H0 is true and you accept it,
or if H0 is false and you reject it, then you’ve made a correct decision. However, there are 2
kinds of errors that you can make:
1. A Type I Error, which is made when you reject H0 when it is true.
2. A Type II Error, which is made when you accept H0 when it is false.
Questions usually ask for the probability of making these errors. The first one is
easy, P(Type I error) = level of significance. For the Type II error, things are not so
straightforward. A specific value of the parameter under H1 must be stated in order to find the
probability of this error. I’ll show you an example below:
A random observation is taken from a binomial distribution X ~ B(20, p) and used to test the null
hypothesis p = 0.8 against the alternative hypothesis p> 0.8. The significance level of the test is
7%. Find the probability of making a Type I error. Find also the probability of making a Type II
error if in fact p = 0.85.
The probability of making a Type I error is 7%. [same as the level of significance]
You make a Type II error if you accept H0 when p is the value specified in H1.
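For Type II error,
H0: p = 0.80
H1: p = 0.85
First find the critical region under H0, where X ~ B(20, 0.8):
P(X = 20) = 0.012 = 1.2%
P(X ≥ 19) = 0.069 = 6.9%
P(X ≥ 18) = 0.206 = 20.6%
So the critical region is X ≥ 19, the largest region with probability below 7%. [Because X is
discrete, the actual probability of rejecting a true H0 with this region is 6.9%, just under the
stated 7%.]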
So P(Type II error) = P(accept H0 when H1 is true)
= P(X < 19 when p = 0.85)
= P(X < 19 when X ~ B(20, 0.85))
P(X ≤ 18) = 1 - P(X = 20) - P(X = 19) = 0.824 = 82.4% [Note that in this part of the calculations,
you are using p = 0.85, but not 0.80 as when you were finding the critical region above.]
∴ The probability of making a Type II error is 82.4%.
Let me summarize how you find the probability of a Type II error:
1. Define your new H1
2. Find the critical region
3. Find the probability, using the new value in H1, that the observation falls outside the
critical region you found.
By the way, the expression 1 – P(Type II error) is known as the Power of the Test.
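The three steps above can be sketched in a few lines for the B(20, p) example. The helper upper_tail and the variable names are my own; the numbers n = 20, p = 0.80, p = 0.85 and the 7% level come from the question.

```python
from math import comb

def upper_tail(n, p, k):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

n, p0, p1, alpha = 20, 0.80, 0.85, 0.07

# Step 2: the critical region X >= c is found using the H0 value p0
critical_value = min(k for k in range(n + 1) if upper_tail(n, p0, k) < alpha)

# Step 3: Type II error = P(outside the critical region) using the H1 value p1
type_2_error = 1 - upper_tail(n, p1, critical_value)
power = 1 - type_2_error

# Actual probability of rejecting a true H0 (X is discrete, so it is below alpha)
actual_type_1_error = upper_tail(n, p0, critical_value)

print(critical_value, round(type_2_error, 3), round(power, 3))
```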
BAB 15
15. χ2 Tests
15.1 χ2 Distribution
The χ2 Distribution (read as ‘kai-squared’, written as ‘chi-squared’) is just another new
distribution that we will be learning today. This distribution mainly helps us to see or analyse,
whether a particular population fits into a certain distribution (Binomial, Poisson, Normal etc).
For example, if you have a list of the height of the students in your school, you want to know
whether the list fits a normal distribution. You conduct a test of goodness-of-fit. Then probably
you also want to know whether the weather during the football match affects the results of the
same football team. You conduct a test of independence. These tests follow a format similar to
that of a hypothesis test, but before I go into it, let me introduce this distribution and
its attributes.
The χ2 Distribution has one parameter, ν, pronounced ‘new’, known as the number of degrees
of freedom. The shape of the distribution is different for different values of ν. Take a look at the
graph below:
As ν increases, the curve looks rounder, and tends to start from zero. The curve is positively
skewed for every ν, with a single peak at ν – 2 when ν > 2, and when ν is large, the distribution
is approximately normal. If we are using a chi-squared distribution with 5 degrees of freedom, we
say that we are using a χ²(5) distribution, or X² ~ χ²(5).
The table for the critical values of the χ2 distribution looks something like this:
Unlike hypothesis testing and confidence intervals, where you might look for 2 tailed, upper
tail or lower tail critical values, you are only required to look for the upper tail critical
value of the χ² distribution. However, the significance levels are similar to those in hypothesis
testing: 5%, 10%, 1% or so.
The critical region, as before, is the upper blue region shown in the graph, and the boundary of
the critical region is called the critical value. For a 5% level of significance, the critical value is
written as χ²5%(ν), where the value depends on ν. So looking at the graph, χ²5%(5) has the value
of 11.070.
Before I introduce to you the test statistic, I’ll give you an illustration on how they come about.
Suppose that I give you a table below.
I told you that these were the random numbers generated by a calculator. However, you doubt
whether it is truly random. When we say that the values are random, it means they have equal
probability of appearing. So this means, the frequency should follow a uniform distribution, with
every digit appearing with a frequency of 10 each.
So the observed frequency O is your experimental result, while the expected frequency E is
what you think it should be. So your test statistic X² is
X² = Σ (O – E)² / E
Let’s try to understand this expression. The term (O – E)² will become very large if the expected
and observed frequencies are very far apart (like for the digit 0). Dividing by E scales each
squared difference relative to the frequency that was expected. So this tells us that, if X² is
very big, then there is strong evidence that it is not the correct distribution (in this case, that
it is not uniformly distributed). But if X² is close to 0, then the data are consistent with that
distribution.
This formula for X² is used for any value of ν, except when ν = 1. Then we need to use the Yates
Correction,
X² = Σ (|O – E| – 0.5)² / E
Now you might be wondering how we determine the value of ν. ν can be found through the
formula
ν = number of classes – number of restrictions
The number of classes is the number of columns you have in your table. For the one above, it has
10 classes. The number of restrictions depends on whether the mean or variance is known (each one
that has to be estimated from the data adds a restriction) and on the fixed total of the observed
frequencies. In a uniform distribution, there is only 1 restriction, that is ΣO = 100. We will get
into the details in the next post.
15.2 Tests for Goodness of Fit
A χ2 Goodness-of-Fit test is used when you have some practical data and you want to know how
well a particular statistical distribution, such as a binomial or a normal, models that data. The
null hypothesis H0 is that the particular distribution does provide a model for the data; the
alternative hypothesis H1 is that it doesn’t.
Just like Hypothesis Tests, Goodness-of-fit Tests also follow a general guideline. You need to
write all these 6 steps in your answer sheet:
1. State the null & alternative hypothesis
H0: x is uniformly, B, Po, N distributed / distributed in a ratio of ?
H1: x is not distributed this way
2. Calculate the expected frequency E in the table
3. State the degree of freedom
There are ? classes and ? restrictions
Consider a χ2 (n – ?) distribution
4. State the significance level
Perform at ?% level
From the tables, χ2 (?%) (ν) = ?, so reject H0 if X2 > ?
5. Calculate X2 using the tables
6. Make your conclusion
Since X2 > / < ?, H0 is rejected in favour of H1 / not rejected. There is evidence, at ?% level,
that __________ .
Now we shall proceed to learn how to solve 5 kinds of χ2 tests through examples. Questions are
in blue and answers are in red:
1. Uniform Distribution (Random)
A tetrahedral die is thrown 120 times and the number on which it lands is noted.
Test at the 5% level whether the die is fair.
H0: The die is fair [I can also write, “the die follows a uniform distribution”. But this is better.]
H1: The die is not fair
There are 4 classes and 1 restriction (ΣE = 120) [Remember that ΣE = ? is always one of the
restrictions]
Consider a χ2 (3) distribution, perform at 5% level.
From the tables, χ²5%(3) = 7.815, so reject H0 if X² > 7.815.
Since X² < 7.815, H0 is not rejected. There is no evidence, at 5% level, that the die is unfair.
2. Distributed in Given Ratio
The outcomes A, B & C of a certain experiment are thought to occur in the ratio 1 : 2 : 1. The
experiment is performed 200 times and the observed frequencies of A, B & C are 36, 115 & 49
respectively. Is the difference in the observed and expected results significant? Test at the 5%
level.
H0: The outcomes A, B & C are in the ratio 1 : 2: 1
H1: The outcomes A, B & C are not in the ratio 1 : 2: 1
There are 3 classes and 1 restriction (ΣE = 200)
Consider a χ²(2) distribution, perform at 5% level.
From the tables, χ²5%(2) = 5.991, so reject H0 if X² > 5.991
[To save time, you could just construct one table instead of 2. You find the E and the test statistic
in one table.]
E = 50, 100, 50, so X² = (36 – 50)²/50 + (115 – 100)²/100 + (49 – 50)²/50 = 6.19
Since X² = 6.19 > 5.991, H0 is rejected in favour of H1. The difference between the observed &
expected results is significant, at 5% level.
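The working for the 1 : 2 : 1 example can be checked with a short sketch. The function chi_squared is my own helper; its yates flag applies the Yates Correction, which is only needed when ν = 1, so it is switched off here.

```python
def chi_squared(observed, expected, yates=False):
    """Sum of (O - E)^2 / E, optionally with the Yates Correction."""
    total = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e) - 0.5 if yates else o - e
        total += diff**2 / e
    return total

observed = [36, 115, 49]
ratio = [1, 2, 1]

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]   # share the total in the given ratio

x2 = chi_squared(observed, expected)
dof = len(observed) - 1          # 3 classes, 1 restriction (sum of E fixed)
reject_h0 = x2 > 5.991           # tabulated 5% critical value for 2 degrees of freedom

print(expected, round(x2, 2), dof, reject_h0)
```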
3. Binomial Distribution
Not much is different from the two above, just that you need more laborious calculations to find
your E. Once again, remember your binomial and Poisson formulas, and combine classes whose
expected frequencies are less than 5. You do that because the χ² approximation is unreliable when
expected frequencies are small, and of course, a different degree of freedom will be used.
Perform a χ2 test to investigate whether the following is drawn from a binomial distribution with
p =0.3. Use a 5% level of significance.
H0: X ~ B(5, 0.3) [Writing the short form is good enough.]
H1: X is not distributed this way.
The expected frequency for a Binomial distribution,
E = P(X = x) × 100 = 5Cx (0.3)^x (0.7)^(5–x) × 100
where ΣO = 100. We tabulate the table below:
Since the expected frequencies for x = 4 and x = 5 are < 5, the last 3 classes are combined.
[please take note of this piece of information.]
There are now 4 classes and 1 restriction (ΣE = 100)
Consider a χ²(3) distribution, perform at 5% level.
From the tables, χ²5%(3) = 7.815, so reject H0 if X² > 7.815
Since X² < 7.815, H0 is not rejected. ∴ There is no evidence, at 5% level, that X does not follow
B(5, 0.3).
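Here is a minimal sketch of how the expected frequencies for B(5, 0.3) come about, and why the last 3 classes end up combined. The merging rule shown (merge from the right until the last class reaches 5) is my own simple way of coding the idea; the function names are not from any library.

```python
from math import comb

def binomial_expected(n, p, total):
    """Expected frequencies total * P(X = x) for x = 0..n, X ~ B(n, p)."""
    return [total * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def combine_tail(expected, minimum=5):
    """Merge classes from the right until the last class is at least `minimum`."""
    merged = list(expected)
    while len(merged) > 1 and merged[-1] < minimum:
        last = merged.pop()
        merged[-1] += last
    return merged

expected = binomial_expected(5, 0.3, 100)
merged = combine_tail(expected)
dof = len(merged) - 1            # 1 restriction: the total frequency is fixed

print([round(e, 3) for e in expected])
print([round(e, 3) for e in merged], dof)
```

The observed frequencies must of course be merged in exactly the same way before X² is calculated.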
Notice that the number of restrictions can increase, if the population proportion is not known.
You use x̅ = np to find the value of p. For example, a random sample of size 50 is taken, and you
are given this table
You don’t know the mean, but you know that
x̅ = Σfx / Σf
You can find the value of p by using the equation x̅ = np, where n = 50. That will make the
question have 2 restrictions, and your degree of freedom is (number of classes) – 2.
4. Poisson Distribution
This one is very similar to the Binomial one. If the Poisson population mean λ is unknown, the
number of restrictions increases by 1, and you estimate λ by the sample mean x̅. Just take a look
at the example.
A local council has records of the number of children and the number of households in its area.
It is therefore known that the average number of children per household is 1.4. It’s suggested that
the number of children per household can be modelled by a Poisson distribution with parameter
1.40. In order to test this, a random sample of 1000 households is taken, giving the following
data.
Carry out a χ2 test, at the 5% level of significance, to determine whether or not the proposed
model should be accepted.
Let X be the number of children per household.
[notice that in this case, I define X properly. You should do it when you know what is X.]
H0: X ~ Po(1.4)
H1: X is not distributed this way.
There are 6 classes and 1 restriction (ΣE = 1000).
Consider a χ²(5) distribution, perform at 5% level.
From the tables, χ²5%(5) = 11.070, so reject H0 if X² > 11.070.
I suppose you can work out that
E = P(X = x) × 1000 = e^(–1.4) (1.4)^x / x! × 1000
Since X² > 11.070, H0 is rejected in favour of H1. The proposed model shouldn’t be accepted; X
doesn’t follow a Po(1.4) distribution.
5. Normal Distribution
As for the normal distribution, either you know both the population mean μ and population
variance σ², or you know neither. So you have degrees of freedom of either n – 1 or n – 3, where n
is the number of classes. See the example below:
The following data gives the heights in cm of 100 male students.
Find the expected frequencies of a normal distribution having the same mean and variance as
the data given, and test the goodness of fit, using a 5% level of significance.
To start, we need to estimate the values of μ and σ² from the data first.
Let X be the height (cm) of a male student.
H0: X ~ N(171.54, 50.56)
H1: X is not distributed this way.
Now this one needs a lot of calculations. The expected frequency of each class can be found
by using
E = 100 × P(a < X < b)
where a and b are the lower and upper boundaries of each class (remember to ±0.5). The work
for a continuous variable takes some time. Remember that the bell curve goes all the way to
infinity in both directions, so the first and last classes must be extended to –∞ and +∞. I believe
you know that your calculator can help you do tricks, right?
Remember to combine the small classes.
There are 5 classes and 3 restrictions (ΣE = 100, μ and σ2 estimated from the sample).
Consider a χ2 (2) distribution, perform at 5% level.
From the tables, χ2 (5%) (2) = 5.991, so reject H0 if X2 > 5.991.
Since X² < 5.991, H0 is not rejected. There is no evidence, at 5% level, that X is not normally
distributed, X ~ N(171.54, 50.56).
Before I end this section, let me give you a summary of degrees of freedom used throughout this
post:
15.3 Tests for Independence (contingency table)
Sometimes situations arise when data are displayed in a contingency table, which is a table
displaying data classified according to 2 different factors / attributes. For example, the table
below
This is a 2 by 3 table, which shows the different schools and their different performance in an
exam. We use a χ² test to determine whether the two factors are independent, or whether there is
an association between them. According to the table above, we want to know whether the school
affects exam performance. In other words, although the numbers of students in school A and
school B are different (80 and 70 respectively), if the two schools have the same proportions of
credit, pass and fail, then the school doesn’t affect the grades.
This kind of test is known as the test for independence. As usual, we shall find the expected
frequency, find the degree of freedom ν and find the test statistic X², which has the same formula
as in the previous section.
Let’s take the above example. The degree of freedom for a h × k contingency table can be found
using the formula
ν = (h – 1)(k – 1)
and so, the above table has the value of ν = (2 – 1)(3 – 1) = 2. The expected frequency E can be
found through the formula
E = (row total × column total) / grand total
To find this, we first need to find the total of each row and column. We modify the table above,
colour it a little, then we get
The black numbers in the middle are known as the observed frequency. To proceed to find the
expected frequencies, we construct another table, but clearing off all the data in the middle.
Next, we use the above formula to fill in the expected frequencies. For the top left cell, we have
90 × 80 ÷ 150 = 48.0
We proceed to fill up the rest:
From here, we proceed to find X2 by making use of the 6 values of O and E that we just
calculated. Now let me give you an example:
A research worker studying the ages of adults and the number of credit cards they possess
obtained the results shown in the table.
Use the χ2 statistic and a significance test at the 5% level to decide whether or not there’s an
association between age and number of credit cards possessed.
H0: There’s no association between age and number of credit cards possessed.
H1: There’s an association between age and number of credit cards possessed.
Expected frequency,
ν = (2 – 1)(2 – 1) = 1, so the Yates’ Correction is used.
Use the χ²(1) distribution, perform the test at 5% level.
Since χ²5%(1) = 3.841, reject H0 if X² > 3.841.
Since X² > 3.841, H0 is rejected. There is evidence, at 5% level, of an association between age
and number of credit cards possessed.
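The expected-frequency rule for a contingency table, E = (row total × column total) / grand total, can be sketched as follows. The counts in the table are hypothetical: I only chose them so that the row totals are 80 and 70 and the first column total is 90, matching the school example. The function names are my own, and the Yates Correction is switched on automatically when ν = 1.

```python
def expected_frequencies(table):
    """E = row total * column total / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared_independence(table):
    """Return (X^2, degrees of freedom) for an h x k contingency table."""
    expected = expected_frequencies(table)
    dof = (len(table) - 1) * (len(table[0]) - 1)
    use_yates = dof == 1
    x2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e) - 0.5 if use_yates else o - e
            x2 += diff**2 / e
    return x2, dof

# Hypothetical observed counts (school A, school B) x (credit, pass, fail)
observed = [[50, 20, 10],
            [40, 20, 10]]

expected = expected_frequencies(observed)
x2, dof = chi_squared_independence(observed)
print(expected[0][0], dof)
```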
16. Correlation & Regression
16.1 Scatter Diagrams
A scatter diagram is a diagram produced when pairs of values are plotted, to determine the
relationship between 2 variables. Usually a scatter diagram contains bivariate data, which is
data connecting 2 variables, x and y. Using the usual convention, x is the independent variable
(explanatory variable), which is controlled by the person who is analysing the situation. y, on the
other hand, is the dependent variable (response variable), the variable that is influenced
by the first one. I believe you learned this in your form 1 Science already.
In a scatter diagram, the independent variable is represented by the x-axis, while the dependent
variable is on the y-axis. Basically, a scatter diagram is just a normal graph, with lots of dots on
it. Suppose you want to analyse the relationship between the temperature of a chemical mixture
and its yield of a new compound. You run the experiment at various temperatures, and
after a fixed time, you measure the yield of the new compound (precipitate). Then you plot the
results in a graph like the one below.
Having drawn a scatter diagram, you can then look for a mathematical relationship between the
variables x and y. This relation of y = f(x) is known as the regression function. The scatter
diagram above shows a positive linear relationship between the data, but with a large
dispersion. You can also find a line of best fit, or regression line, to make things clearer. Other
kinds of relationship between 2 variables are:
For the data in diagrams (a) and (b), we say that there is linear correlation between the data.
Diagram (d) shows that there is no correlation between the data, suggesting that x and y are
not related to one another.
Mathematically, there may appear to be a relationship between two variables, but sometimes in
reality, there isn’t any such relationship. For example, you want to prove that the ears of a
spider are on its legs. So you experiment by putting it on the table, shouting at it and measuring
its reaction time. Then you repeat your experiment, cutting off its legs one by one. When all the
legs are cut off, it doesn’t move when you shout, so you wrongly conclude that it can’t hear you
and that its ears are on its legs!
16.2 Pearson Correlation Coefficient
Before we start, let us revise a little bit on standard deviation. We all know that the standard
deviation s is given by the formula
s = √(Σ(x – x̅)² / n)
In this chapter, we will be dealing with 2 variables, and thus, we need to specify whether the
standard deviation is for the values of x or y. To tell them apart, we put a subscript x or y to
indicate which variable it refers to. So over here, we have
sx = √(Σ(x – x̅)² / n), sy = √(Σ(y – y̅)² / n)
the standard deviations for x and y respectively. We denote the variances of variables x and y as
sxx = sx², syy = sy²
Note that sxx and sx² mean the same thing; it is just a different notation in some books. With this
information in mind, we shall now introduce the covariance, which is defined by the formula
sxy = Σ(x – x̅)(y – y̅) / n
PEARSON’S PRODUCT-MOMENT CORRELATION COEFFICIENT
The correlation coefficient is a statistic which provides the information on how strong the
relationship of 2 variables is. Pearson’s product-moment correlation coefficient, also known
as Pearson correlation coefficient or product-moment correlation coefficient, is a numerical
value between –1 and 1 inclusive, which indicates the degree of linear scatter. It is represented
by the formula
r = sxy / (sx sy)
where –1 ≤ r ≤ 1.
When r → 1, it indicates strong positive correlation, which means the regression line has a
positive gradient, or y increases as x increases. Similarly, as r → –1, it indicates the presence of
strong negative correlation. If r = 1 or r = –1, the points lie exactly on a straight line, and we
say that they have perfect positive / negative correlation.
However, when r = 0, it does not necessarily mean that there is no relationship. It might indicate
that the variables x and y are independent of each other, but it might also indicate that the
variables x and y have a non-linear relationship. Take a look at the diagram below:
Sorry but the dots are ugly. This diagram represents a quadratic function. The variables do have a
quadratic relationship, but the correlation coefficient is r = 0. This is just an example of
how r = 0 can fail to explain anything. On the other hand, having r close to 1 only suggests
that the data are positively linearly correlated. Take a look at the diagram below.
This diagram has a fairly high r, about 0.7 to 0.8. However, that doesn’t mean the data are
highly positively linearly correlated. It might mean that there isn’t a relationship after all.
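The quadratic case is easy to check numerically. The sketch below uses made-up points on y = x², symmetric about x = 0 (an assumption chosen so the arithmetic is exact), and shows that r comes out as zero even though y is completely determined by x.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r; the 1/n factors cancel, so plain sums are enough."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical points lying exactly on the parabola y = x^2
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r_quadratic = pearson_r(xs, ys)
print(r_quadratic)  # 0.0
```

The positive and negative products (x − x̅)(y − y̅) cancel exactly, which is why r only measures linear association.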
r is independent of the units used for x and y, and is very useful in determining the correlation
of 2 variables. Evaluating r can be tedious if you make use of the definitions of sx and sy. So
here is the best way to calculate r:
r = (nΣxy − ΣxΣy) / √{ [nΣx² − (Σx)²] [nΣy² − (Σy)²] }
Some other common formulas to find r are:
Besides, there is also this Big S format, whereby
Sxx = Σx² − (Σx)²/n, Syy = Σy² − (Σy)²/n, Sxy = Σxy − (Σx)(Σy)/n
and using this convention, the formula for r is
r = Sxy / √(Sxx Syy)
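As a rough check of the Big S route, the sketch below builds r from the six totals n, Σx, Σy, Σx², Σy² and Σxy, which are the same totals the calculator reports. It assumes the usual convention Sxx = Σx² − (Σx)²/n (and similarly for Syy and Sxy); the function name and data are hypothetical. It also rescales x by 100 to illustrate that r does not depend on the units.

```python
import math

def r_from_sums(xs, ys):
    """Pearson's r via the 'Big S' quantities built from raw totals."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    big_sxx = sum_x2 - sum_x ** 2 / n
    big_syy = sum_y2 - sum_y ** 2 / n
    big_sxy = sum_xy - sum_x * sum_y / n
    return big_sxy / math.sqrt(big_sxx * big_syy)

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = r_from_sums(xs, ys)

# Same x-values in different units (for example metres -> centimetres)
xs_scaled = [100 * x for x in xs]
r_scaled = r_from_sums(xs_scaled, ys)

print(round(r, 4), round(r_scaled, 4))  # 0.7746 0.7746
```

For this data Sxx = 10, Syy = 6 and Sxy = 6, so r = 6 / √60, the same value the definition with sx and sy gives.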
I would suggest that you keep to the ‘small s format’. In order to teach you how to find r
efficiently using the calculator, consider the example below.
Calculate the value of the p-m correlation coefficient for the data in the following table.
Comment on your answers.
Let’s make use of the calculator’s functions. Using your CASIO fx-570MS, press the mode
button, and select REG mode. There will be many kinds of REG mode, so press ‘1’ for Lin
mode (which means ‘linear’).
Now, to input the data, you press [x-value] [, button] [y-value] [DT button]. So you should
type in 5, 4.3 and the DT button for the first reading. Now the screen should display
n= 1
Continue typing in every data pair, and press the AC button when you are done. Now press SHIFT
+ S-SUM. You will be able to get lots of data from here: Σx², Σx, n, Σy², Σy and Σxy. This is
the information you need for your r (you need it to show your workings). But
there’s a better one: press SHIFT + S-VAR. You get the values of x̅, xσn (sx), y̅, yσn (sy),
and in fact, r itself! The only thing you can’t get is sxy (what a pity). So using your calculator,
you find that the answer is
r = 0.93, which is a strong positive correlation.
16.3 Linear Regression Lines (method of least squares, correlation & regression
coefficient, coefficient of determination)
Regression analysis is a statistical technique which can be used to obtain the equation relating 2
variables. A regression line makes estimations on one of the variables when the corresponding
value of another variable is known.
In this section, we are going to learn how to draw regression lines (lines of best fit). There are
actually 3 methods that I know of:
1. By eye method
You look at the bunch of dots, estimate using your eye, and start drawing the line. Not a good
idea though. You probably used this method for your STPM Physics paper 3.
2. L & R method
We first find the average values of x and y, and draw a horizontal and a vertical line through
this mean point. Then, we find the mean point of the data on the left and of the data on the right
of the vertical line, and we connect these 3 points to obtain a line.
3. Least squares regression line
This is probably the best method of all, and we will be learning how to do it below.
METHOD OF LEAST SQUARES
The term ‘least squares’ tells us that the sum of the squares of the distances between the points
and the line is minimized. For a least squares regression line of y on x, the distance taken into
account is the vertical distance. This line will definitely pass through the mean point of the
data, (x̅, y̅). Take a look at the graph below.
The red dots are the data points, while the blue line is the least squares regression line. The
line is drawn in such a way that the sum of squares of the vertical distances between the red dots
and the blue line (green lines) is minimized. For least squares regression lines, we have 2
equations of lines, namely
y = a + bx
x = c + dy
The line y = a + bx is known as the regression line y on x, while the line x = c + dy is the line x
on y. Note that they are 2 different lines, and one is not just a rearrangement of the other. The
line of y on x is used when x is the independent variable and y is the dependent one. However, the
line x on y is used only under 2 conditions:
1. when neither variable is controlled and you want to estimate x for a given value of y.
2. when y is the independent variable, and x the dependent variable.
The line of x on y, when plotted on the usual axes, has its gradient and y-intercept as follows:
gradient = 1/d, y-intercept = −c/d
Notice another thing. In this chapter, the lines are not written as y = mx + c. The gradient is b,
and by the usual convention it is put after the constant a, so we write y = a + bx, not
y = bx + a. The constant b is known as the regression coefficient of y on x, and d is the
regression coefficient of x on y. They are calculated using the formulas
b = sxy / sxx and d = sxy / syy
which in the end, you find b to be
b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)
If you look closely,
bd = r²
where r is the product-moment correlation coefficient you learned in the previous section. The
term r² has a name too, called the coefficient of determination of regression lines.
r² tells you the proportion of the variation in y that can be explained by x. Or in other words,
r² = explained variation / total variation
Or mathematically,
r² = Σ(ŷ − y̅)² / Σ(y − y̅)²
You don’t really need to understand what it means, but memorize it in case they ask you
to define it in exams. Take note that 0 ≤ r² ≤ 1.
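If you would like to see the definition at work, here is a small Python sketch. It assumes the regression coefficients are b = sxy/sxx and d = sxy/syy, fits the line of y on x to some made-up data, and compares "explained variation / total variation" with the product bd. The data and variable names are hypothetical.

```python
# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Sums of squares and products about the means (the 1/n factors cancel)
sum_xx = sum((x - mean_x) ** 2 for x in xs)
sum_yy = sum((y - mean_y) ** 2 for y in ys)
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

b = sum_xy / sum_xx          # regression coefficient of y on x
d = sum_xy / sum_yy          # regression coefficient of x on y
a = mean_y - b * mean_x      # the line passes through the mean point

fitted = [a + b * x for x in xs]                    # the values y-hat
explained = sum((f - mean_y) ** 2 for f in fitted)  # explained variation
total = sum_yy                                      # total variation
r_squared = explained / total

print(round(r_squared, 4), round(b * d, 4))  # 0.6 0.6
```

Both routes give 0.6 here, so about 60% of the variation in y is explained by the line of y on x for this data.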
Coming back to the relationship between the correlation coefficient and the regression
coefficients, we can see that if
* b and d are positive, then r is positive.
* b and d are negative, then r is negative.
Finding b is not enough to plot the regression line of y on x. The equation of the line, in the end,
will be
y − y̅ = b(x − x̅)
and from there, a can be found as a = y̅ − bx̅. Note that the terms x̅ and y̅ can be substituted
with any point (x, y) that is known to lie on the line, and you get the same line.
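The two regression lines can be sketched in a few lines of Python. This is only an illustration under the assumption that the slope is sxy/sxx and the line passes through the mean point; the helper name least_squares_line and the data are made up. Swapping the roles of x and y gives the line of x on y.

```python
def least_squares_line(us, vs):
    """Return (intercept, slope) of the least squares line of v on u."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    sum_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    sum_uu = sum((u - mean_u) ** 2 for u in us)
    slope = sum_uv / sum_uu
    intercept = mean_v - slope * mean_u  # forces the line through the mean point
    return intercept, slope

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)  # y on x: y = a + bx
c, d = least_squares_line(ys, xs)  # x on y: x = c + dy

print(f"y = {a:.2f} + {b:.2f}x")  # y = 2.20 + 0.60x
print(f"x = {c:.2f} + {d:.2f}y")  # x = -1.00 + 1.00y
```

Rearranging y = 2.20 + 0.60x gives x = −3.67 + 1.67y, which is not the line of x on y found above. Both lines do pass through the mean point (3, 4).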
By the way, sometimes the relationship is not that straightforward. You might be asked to make use
of coding, in the form of Y = a + bX, to transform variables which are not linearly related into a
linear form that can be analysed using regression lines. Common examples are
Most statistical questions on this chapter mainly ask you to do these few things:
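One possible illustration of coding, chosen here as an assumption and not taken from the notes, is an exponential model y = p·q^x. Taking logarithms gives ln y = ln p + x ln q, which has the form Y = a + bX with Y = ln y and X = x. The sketch below fits that coded line to made-up data lying exactly on y = 3·2^x and recovers p and q; the helper name and the symbols p and q are hypothetical.

```python
import math

def least_squares_line(us, vs):
    """Return (intercept, slope) of the least squares line of v on u."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    sum_uv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    sum_uu = sum((u - mean_u) ** 2 for u in us)
    slope = sum_uv / sum_uu
    return mean_v - slope * mean_u, slope

# Hypothetical data lying exactly on y = 3 * 2**x
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

# Coding: Y = ln y, X = x, so that Y = ln p + (ln q) X is linear
coded_ys = [math.log(y) for y in ys]
ln_p, ln_q = least_squares_line(xs, coded_ys)

p, q = math.exp(ln_p), math.exp(ln_q)
print(round(p, 4), round(q, 4))  # 3.0 2.0
```

With real data the points would not lie exactly on the curve, and the fitted p and q would only be estimates.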
1. Plot scatter diagrams, and draw a regression line on it
All you need to do is use the table of data given, plot the scatter diagram (on graph paper), and
find the respective values using your calculator to get the values of a and b.
2. Make predictions and estimations
Sometimes you are asked to extrapolate the line, to find a particular value of y, given x, and tell
whether the estimate is sensible. Remember: extrapolation of a regression line is unreliable. You
are to understand that there are uncertainties in such predictions. In the case of a graph of age
against running speed, you know that it doesn’t mean the older you are, the faster you run!
3. Calculator estimation
Within the range of the data, sometimes you are given a value of x and asked to find the value of
y, using the regression line you formulated. The estimated value of y is denoted by ŷ. It is not
hard: with your regression line in hand, just substitute the value of x into it, and you get the
value of ŷ. On the calculator, you can press
[number] [x̂] [=] to find x̂, and
[number] [ŷ] [=] to find ŷ.
However, do take note that you find x̂ using the equation x = c + dy, and you find ŷ by using the
equation y = a + bx. Remember which is the dependent and which is the independent variable; it
makes a lot of difference.
4. Find the correlation / regression coefficient or the coefficient of determination
This is quite obvious. That was why we learned them in the first place.
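To make the point in item 3 concrete, here is a small sketch of the two kinds of estimate. The coefficients are hypothetical (they are not from any example in these notes), and the function names are made up. It estimates ŷ from the line of y on x, x̂ from the line of x on y, and shows that rearranging the y on x line gives a different answer for x̂.

```python
# Hypothetical regression lines fitted to the same data
a, b = 2.2, 0.6    # y on x:  y = 2.2 + 0.6x
c, d = -1.0, 1.0   # x on y:  x = -1 + y

def estimate_y(x):
    """Estimate y for a given x using the line of y on x."""
    return a + b * x

def estimate_x(y):
    """Estimate x for a given y using the line of x on y."""
    return c + d * y

y_hat = estimate_y(3.5)        # 2.2 + 0.6 * 3.5
x_hat = estimate_x(5.0)        # -1 + 5
x_rearranged = (5.0 - a) / b   # rearranging y on x instead

print(round(y_hat, 2), round(x_hat, 2), round(x_rearranged, 2))  # 4.3 4.0 4.67
```

The gap between 4.0 and 4.67 is why it matters which variable you treat as dependent.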
Before I end this chapter, let us take a look at an example, and we will learn how to use your
calculator to find the regression line too.
The following table shows the marks (x) obtained in a mid-year examination and the marks (y)
obtained in the year-end exam by a group of 9 students.
a) Plot the scatter diagram.
b) Find the equation of the estimated least squares regression line of y on x, and x on y, and plot
them.
c) A 10th student obtained a mark of 70 in the mid-year exam but was absent from the year-end
exam. Estimate the mark that this student would have obtained in the year-end exam.
I think you shouldn’t have a problem plotting the diagram, right? It looks something like this:
So now, we are to find the regression lines. Firstly, key in all your data into your calculator.
Remember to clear your previous data by pressing SHIFT + CLR, press ‘1’, then ‘=' (refer to
previous post on how to key data in REG mode). Now press SHIFT + S-VAR. Press the right
button until you see A B r. Guess what, the values of A and B are the coefficients a and b of the
line that you wanted. So you immediately find the regression line of y on x,
y = 15.83 + 0.72x
Remember to show your workings though. You need to show how you calculate sxx, sxy, syy,
x̅ and y̅. For the equation x = c + dy, there’s no shortcut, so you have to calculate it yourself,
which gives you
x = 22.63 + 0.66y
We shall plot them on the graph:
with the red line being y = a + bx, and green line being x = c + dy. Remember to label them in
exams though.
As for the estimate, you can use your calculator again. From the SHIFT + S-VAR function, and
typing the key sequence I posted above with x = 70, you should get 66.38.
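As a quick sanity check on this example, the sketch below substitutes x = 70 into the rounded line y = 15.83 + 0.72x. It also estimates r from the two regression coefficients, assuming the identity r² = bd and taking the positive root because both gradients are positive. Only the four coefficients come from the notes; everything else is illustrative.

```python
import math

# Coefficients as quoted in the example (rounded to 2 decimal places)
a, b = 15.83, 0.72   # y on x
c, d = 22.63, 0.66   # x on y

y_hat = a + b * 70
print(round(y_hat, 2))  # 66.23

# Assumes r^2 = b * d; r is positive because b and d are both positive
r = math.sqrt(b * d)
print(round(r, 2))  # 0.69
```

The hand substitution gives 66.23 instead of 66.38. The small gap is consistent with the calculator working with unrounded values of a and b, so in exams it is safer to keep more decimal places until the final step.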