Course Notes - cda college

CDA COLLEGE (LIMASSOL) 2015 - 2016
MBA 603 QUANTITATIVE METHODS Fall Term
Syllabus
1. Introductory techniques & methods
2. Descriptive Statistics
3. Probability Theory
4. Sampling and Sampling Distributions
5. Hypothesis Testing
6. One Sample t-tests
7. Revision
8. Mid-term Examination
9. Two sample t-tests
10. Random digits – Degrees of Freedom – Contingency Tables
11. Simple Linear Regression
12. Multiple regression
13. Time Series Analysis & Forecasting
Revision for the final exam
Coursework 50%, Final Exam 50%
Pass 60%
Textbook:
Statistics, D. Freedman, R. Pisani, R. Purves, Norton, 2000
Reference books:
1. The Elementary Forms of Statistical Reason, R. P. Cuzzort and James S. Vrettos; St. Martin's (1996)
2. Strategic Management and Business Policy, Glueck/Jauch; McGraw-Hill
3. Management Policy, Stanford M J; Prentice Hall
4. Business Policy, Thomas R E; Philip Allan
WEEK 1: Introductory Techniques and Methods
“A picture is worth a thousand words”
The Problem:
Presenting a set of numerical information is
mostly an art, and the ways of presentation vary
considerably, depending on the skill and
imagination of the performer.
STATISTICS: from the Latin word “Status”, meaning Situation – State
(Κατάσταση – Κράτος)
The use of quantitative techniques in business;
Objective, reliable, accurate
The role of quantitative techniques in business;
Basis for judgment and model building
Variables (qualitative, quantitative, discrete, continuous)
Review of: ∑ notation, Equations; Derivatives - Integration
Sequences and Series
Definition: $\sum_{r=1}^{n} g(r) = g(1) + g(2) + \dots + g(n)$
Maclaurin's theorem: $f(x) = \sum_{r=0}^{\infty} \frac{f^{(r)}(0)}{r!} x^r = f(0) + \frac{f'(0)}{1!} x + \frac{f''(0)}{2!} x^2 + \dots$
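Arithmetic and Geometric series along with Maclaurin’s theorem, above, lead to the following:
$S_1 = \sum_{x=1}^{n} x = \frac{n(n+1)}{2}$; $S_2 = \sum_{x=1}^{n} x^2 = \frac{n(n+1)(2n+1)}{6}$; $S_3 = \sum_{x=1}^{n} x^3 = \frac{[n(n+1)]^2}{4}$
Geometric (finite): $a + ar + ar^2 + \dots + ar^{n-1} = \sum_{k=1}^{n} ar^{k-1} = \frac{a(1 - r^n)}{1 - r}$, provided $r \neq 1$
Geometric (infinite): $1 + x + x^2 + x^3 + \dots + x^n + \dots = \sum_{r=0}^{\infty} x^r = \frac{1}{1 - x}$, provided $-1 < x < 1$
Exponential: $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^n}{n!} + \dots = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, $x \in \mathbb{R}$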
Examples and exercises
Evaluate the following sums
4
10
n
1
(i)  (3k  1) , (ii)  2 k , (iii)  (r 2  2r  1)
(n = 5, 10, 100) ,
2
x
k 1
n
k 0
2
3
r 1
(iv) 1000(.9) x (for n = 1, 10,  )
x 0
2
3
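The exponential function
$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^n}{n!} + \dots$, $x \in \mathbb{R}$
or $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$
or $e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n$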
Review of Logarithms
Fundamental statements: If $y = b^x$, then $x = \log_b y$, with $b > 0$, $b \neq 1$
The natural logarithm is defined from the equation $x = e^y$; then $y = \ln x$
Properties: $\ln 1 = 0$, $\ln e = 1$, $\ln(xy) = \ln x + \ln y$, $\ln(1/x) = -\ln x$,
$\ln(x/y) = \ln x - \ln y$
Differentiation:
$\frac{dy}{dx}$ or $y' = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}$
If $y = ax^n$, then $\frac{dy}{dx} = nax^{n-1}$, for all $n$
(special case: if $y$ = constant, then $y' = 0$)
If $y = \sin x$, then $y' = \cos x$
If $y = \cos x$, then $y' = -\sin x$
If $y = \tan x$, then $y' = \sec^2 x$
If $y = e^{ax}$, then $y' = ae^{ax}$; also if $y = a^x$, then $y' = a^x \ln a$
If $y = \ln x$, then $y' = 1/x$
Note that $\frac{d}{dx}(e^x) = e^x$ and, naturally, $\int e^x \, dx = e^x + c$
Notation: for functions $u$ and $v$ of $x$, $\frac{d(u \pm v)}{dx} = \frac{du}{dx} \pm \frac{dv}{dx}$
Product Rule: $(uv)' = u'v + uv'$
Quotient Rule: $\left(\frac{u}{v}\right)' = \frac{u'v - uv'}{v^2}$
Chain Rule: $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$
Equation of tangent: $y - y_1 = m(x - x_1)$, where $m = \left.\frac{dy}{dx}\right|_{x = x_1}$ is the gradient of the curve at the given point.
Examples and exercises
1. Find dy/dx: (i) $y = 2x^3$, (ii) $y = 4x^4 - x^2 + 1$, (iii) $y = x(x - 2)(x + 1)$, (iv) $y = x^2/(1 - x)$,
(v) $y = 2\cos 3x + 4\tan 5x$
2. Find the second derivative: (i) $y = 7x^3 - 6x^5$, (ii) $y = (x^2 - 3)/x$, (iii) $y = \sin x$
3. Obtain the equation of the tangent to each of the functions in question 1, at x = 1
Integration:
The general principle is that integration is the inverse operation of differentiation, i.e.
1. $\int ax^n \, dx = \frac{1}{n+1} ax^{n+1} + c$, provided $n \neq -1$
2. $\int x^{-1} \, dx = \int \frac{1}{x} \, dx = \ln x + c$
3. All formulae which can be deduced from the known derivatives.
Area under the curve y = f(x) within the lines x = a and x = b and the OX-axis:
$A = \int_a^b f(x) \, dx$ (Sum of rectangles)
Volume generated by revolution of the part of the curve between the lines x = a and x = b about the OX-axis:
$V = \pi \int_a^b y^2 \, dx$ (Sum of cylinders)
Examples and exercises
1.
Evaluate the following integrals
1
(i)
5
 (2 x  3)dx , (ii)  2
0
1
0
x  1dx , (iii)  (t  2)(t  1)dt , (iv)
1
 /2
1
 cos xdx (v) 1( x  2)
0
2. Find the area between the graph of f and the x-axis
(i) f(x) = 2 + x 3 , x  [0,1], (ii) f(x) = x 2 (3+x), x  [0,8], (iii) f(x) = x  1 , x  [3,8]
3. Sketch the region bounded by the curves and find the volume generated if the region is rotated through 360° about OX:
(i) y = x , y = x 2 , (ii) y = 6x - x 2 , y = 2x, (iii) y = x 2 + 2/x, y = 5
2
dx
WEEK 2: Graphical Representation


Pie chart; Histogram; Cumulative Frequency Polygon (Ogive): advantages and disadvantages
General principles of presentation (simple, self-explanatory, accurate, etc.)
The Histogram
Principle: The area of each rectangular block is proportional to the corresponding frequency, so that the Y-axis represents the density = frequency/width
(i) The number of classes chosen from raw data is arbitrary. (Often $\sqrt{n}$ is adequate!)
(ii) Connecting the midpoints of the top sides of the rectangles, we get the frequency polygon.
Stem and leaf plot (An example)
Marks (out of 100) of 30 students:
50, 55, 72, 51, 70, 63, 32, 44, 46, 68
85, 22, 25, 57, 74, 85, 35, 84, 53, 48
35, 53, 72, 61, 44, 64, 65, 45, 55, 53
Stem │ Leaf (4│5 means 45)   Freq.
2 │ 2 5                      2
3 │ 5 2 5                    3
4 │ 4 4 5 6 8                5
5 │ 0 5 3 1 7 3 5 3          8
6 │ 1 3 4 5 8                5
7 │ 2 2 0 4                  4
8 │ 5 5 4                    3
(i) Each point on the leaf represents the last digit of the observation.
(ii) Provides the shape of the histogram if we turn it ninety degrees anticlockwise.
(iii) Most importantly, nothing is lost from the raw data. (As opposed to the more elegant histogram!)
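The construction of the plot can be sketched in a few lines of Python. This is only an illustration: the function name `stem_and_leaf` is our own, and it assumes two-digit marks so that the stem is the tens digit and the leaf is the units digit.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values by tens digit (stem); keep units digits (leaves) in data order."""
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return dict(sorted(rows.items()))

# Marks of the 30 students from the example above
marks = [50, 55, 72, 51, 70, 63, 32, 44, 46, 68,
         85, 22, 25, 57, 74, 85, 35, 84, 53, 48,
         35, 53, 72, 61, 44, 64, 65, 45, 55, 53]

plot = stem_and_leaf(marks)
for stem, leaves in plot.items():
    # Each row: stem, its leaves, and the class frequency in brackets
    print(f"{stem} | {' '.join(map(str, leaves))}   ({len(leaves)})")
```

Since every leaf is kept, the raw data can be rebuilt from the plot, which is the point made in (iii).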
Bar Chart
- Used for a qualitative variable
- Area of block represents frequency
Pie Chart
- For each sector, the corresponding angle is $\theta = 360^\circ \times \frac{f}{N}$, where f is the frequency of the class, and N is the total frequency: $\sum_{i=1}^{k} f_i = N$
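As a quick check of the sector formula, here is a minimal Python sketch. The helper `pie_angles` and the class frequencies are hypothetical, chosen only to illustrate the calculation.

```python
def pie_angles(frequencies):
    """Return the sector angle in degrees for each class: 360 * f / N."""
    total = sum(frequencies.values())  # N, the total frequency
    return {label: 360 * f / total for label, f in frequencies.items()}

# Hypothetical frequencies, for illustration only
counts = {"A": 10, "B": 20, "C": 30}
angles = pie_angles(counts)
print(angles)
```

The angles always add up to 360°, which is a useful check after rounding to the nearest degree.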
Cumulative frequency polygon or curve (ogive)
Each point (X, F) represents the pair:
X: upper boundary of the class.
F: Number of observations ≤ x (Absolute, or relative %)
EXAMPLES – EXERCISES
1. The results of the Euro-elections in May 2014 were the following:
Party         Votes      Colour
(a) ΔΗΣΥ      37.75%     Blue
(b) ΑΚΕΛ      26.98%     Red
(c) ΔΗΚΟ      10.83%     Yellow
(d) ΕΔΕΚ       7.68%     Green
(e) Others    16.76%     Cyan
Total        100.00%
After calculating the corresponding angles (to the nearest degree) construct the pie
chart with the corresponding colours.
2. The results of an Economics examination at a college are shown below. Construct
a stem and leaf diagram to represent these data:
31 54 80 58 73 50 69 65 84 49 67 47 70 78 77
67 55 78 62 59 54 41 69 65 41 96 80 89 54 68
3. The following table shows the time, to the nearest second, recorded for the telephonist to answer the calls received during a particular day.
For the third class, find the class characteristics (Boundaries; Limits; Midpoint; Width; Frequency; Cumulative Frequency).
Represent these data by a histogram and a cumulative frequency polygon.
Time to answer (nearest second)    Number of calls
10-19                              20
20-24                              20
25-29                              15
30                                 14
31-34                              16
35-39                              10
40-59                              10
WEEK 3: Descriptive Statistics
Measures of Location: Minimum, Maximum
Average: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i$, or $\bar{X} = \frac{1}{n} \sum_{j=1}^{k} x_j f_j$
Median: $m = L + \frac{(n+1)/2 - F}{f} \times w$ (for continuous variables)
Where n is the sample size, k is the number of classes, f is the median class frequency, $x_j$ is the midpoint of the class, L is the lower boundary of the class, F is the cumulative frequency of the previous class and w is the class width. Similarly,
Quartiles: $Q_1 = L + \frac{(n+1)/4 - F}{f} \times w$; $Q_3 = L + \frac{3(n+1)/4 - F}{f} \times w$
Mode is the value with the highest frequency
Methods of calculation; Single values, frequencies, discrete, continuous
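The interpolation used for the median and the quartiles of grouped data can be sketched as follows. The function name `grouped_quantile`, the class boundaries and the frequencies are all hypothetical, introduced only to show the arithmetic of L + (position − F)/f × w.

```python
def grouped_quantile(boundaries, freqs, position):
    """Interpolate L + (position - F) / f * w inside the class containing `position`.

    boundaries: list of (lower, upper) class boundaries
    freqs:      class frequencies, in the same order
    position:   e.g. (n + 1) / 2 for the median, (n + 1) / 4 for Q1
    """
    cumulative = 0  # F: cumulative frequency of the previous classes
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= position:
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("position exceeds the total frequency")

# Hypothetical grouped data, for illustration only
boundaries = [(0, 10), (10, 20), (20, 30)]
freqs = [5, 8, 7]
n = sum(freqs)

median = grouped_quantile(boundaries, freqs, (n + 1) / 2)
q1 = grouped_quantile(boundaries, freqs, (n + 1) / 4)
q3 = grouped_quantile(boundaries, freqs, 3 * (n + 1) / 4)
print(q1, median, q3)
```

The same function serves all three measures because only the position changes; the class found for Q1 or Q3 need not be the median class.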
Measures of Dispersion:
Understanding Dispersion: Consider the marks achieved in four Mathematics tests
by two students; Andy and Brian
Test #               Andy   Brian
1                    15     10
2                    16     20
3                    14     18
4                    15     12
Average $\bar{X}$    15     15
Do you see any differences between the performances of the two students?
Andy is more predictable, Brian has potential, but no consistency!
It is precisely this characteristic which is expressed by dispersion.
We quantify this characteristic by the following measures:

Range: $R = X_{\max} - X_{\min}$
Interquartile range: $IQR = Q_3 - Q_1$; Semi-IQR $= (Q_3 - Q_1)/2$
Variance: $\sigma^2$ or $s^2$; Standard Deviation: $\sigma$ or $s$
For a sample of size n:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$ or $s^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right]$ (for single values)
$s^2 = \frac{1}{n-1} \sum_{j=1}^{k} (X_j - \bar{X})^2 f_j$, or $s^2 = \frac{1}{n-1} \left( \sum_{j=1}^{k} X_j^2 f_j - n\bar{X}^2 \right)$ (for frequency table)
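To see the two forms of the sample variance agree, the marks of Andy and Brian can be run through a short Python sketch. It uses only the standard `statistics` module; the variable names are our own.

```python
from statistics import mean, stdev, variance

andy = [15, 16, 14, 15]
brian = [10, 20, 18, 12]

for name, marks in (("Andy", andy), ("Brian", brian)):
    n = len(marks)
    xbar = mean(marks)
    # Definition: s^2 = sum((x - xbar)^2) / (n - 1)
    s2_definition = sum((x - xbar) ** 2 for x in marks) / (n - 1)
    # Shortcut: s^2 = (sum(x^2) - n * xbar^2) / (n - 1)
    s2_shortcut = (sum(x * x for x in marks) - n * xbar ** 2) / (n - 1)
    print(name, xbar, round(s2_definition, 3), round(s2_shortcut, 3),
          round(stdev(marks), 3))
```

Both students have the same average, but Brian's variance is far larger, which is the lack of consistency described above.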
Applications
1. The high blood pressure for a sample of eleven pensioners is given below:
12.4, 14.7, 10.2, 16.3, 13.9, 12.2, 10.7, 11.8, 12.6, 11.9, 11.5
(a) Estimate the median and the quartiles.
(b) Estimate the mean, the standard deviation and the interquartile range.
2. Delegates to a National Congress had their ages recorded in years. The table below summarises these data.
Age      Freq.
18-24    20
25-31    35
32-38    25
39-45    18
46-52    12
53-59    7
60-73    3
(i) Construct the histogram and the cumulative frequency polygon.
(ii) Calculate the appropriate measures, of location and dispersion, justifying your
choices.
3. The following table summarises the birth weights of a random sample of 100 babies born in a clinic over a year.
Weight in kg    Frequency
1.5-1.9         2
2.0-2.4         9
2.5-2.9         12
3.0-3.4         18
3.5-3.9         22
4.0-4.4         17
4.5-4.9         13
5.0+            7
(a) Write down the upper class boundary of the third class.
(b) Calculate estimates of the median and the quartiles of these birth weights.
(c) Calculate an estimate of the mean.
(d) Comment on the skewness of these data.
WEEK 4: Probability Theory
The notion of Probability arises, when we have:
A random experiment, considered as the one:
Which can be performed over and over again, theoretically, under identical
conditions, but we cannot predict the outcome!
For example, tossing a coin, observing the price of a stock on a particular day,
measuring the weight of a randomly selected person, counting the number of bank
customers in a queue at a particular moment of the day, and so on;
For such an experiment, we may talk about:
- Events (Ενδεχόμενα): The term is often used before the experiment;
(Subject to probabilistic analysis)
- Outcomes (Γεγονότα): The term often examined after the experiment;
(Subject to statistical analysis)
Special Events

- A Simple Event can occur in a single way! (e.g. getting the pair [5, 6] in this order, when we toss two dice, one red and one blue, i.e. 5 red, 6 blue)
- A Composite Event can occur in more than one way (e.g. obtaining a sum of 8 with two dice, i.e. 2-6, 6-2, 3-5, 5-3, 4-4)
- The Sample Space Ω (or S) is the set of all simple events.
- Venn Diagram: [diagram of the sample space Ω]
- The Certain event “Ω” is the one which always occurs!
- The Impossible event “Ø” never occurs!
Complement of an event A: “Non-occurrence of A” is denoted by $\bar{A}$ (or $A^c$ or $A'$). In the diagram the complement of A is the set of points outside A, but in the frame Ω, i.e. blue, purple, red and white points!
Recall from set theory:
- Union $A \cup B$ means the set of all elements x belonging to either the set A, or the set B, or both sets A and B, but these elements are taken only once.
- Intersection $A \cap B$ or AB means the set of all elements x belonging to both sets A and B.
Definition of Probability: For any event E, a subset of the sample space Ω,
P(E) = (# of favorable equiprobable simple events in E) ÷ (# of possible equiprobable simple events in Ω) (or $\lim_{n \to \infty} \frac{f}{n}$: limiting relative frequency),
where f is the number of occurrences of that particular event E, out of a total of n trials.
Kolmogorov’s Axioms (basic 3 out of 7)
For any events E, A or B belonging to the sample space Ω ($E \subseteq \Omega$), we have:
(i) $0 \le P(E) \le 1$
(ii) $P(\Omega) = 1$
(iii) If events A and B are mutually exclusive, i.e. $A \cap B = \emptyset$, then
$P(A \cup B) = P(A) + P(B)$ (Addition Law)
Independence:
Events A and B are said to be independent, if the occurrence of one, does not
affect the probability of occurrence of the other, and mathematically we have:
P(A∩B ) = P(AB ) = P(A) P(B)
Some Theorems: (may be proved straight from the axioms)
$P(\emptyset) = 0$
$P(A \cup B) = P(A) + P(B) - P(AB)$, where the notation “AB” is the same as “$A \cap B$”
$P(\bar{A}) = 1 - P(A)$ (Complement rule)
$P(\bar{A} \cap B) = P(B) - P(AB)$, using the Venn Diagram
De Morgan's Laws: $\overline{A \cup B} = \bar{A} \cap \bar{B}$ and $\overline{A \cap B} = \bar{A} \cup \bar{B}$
Some practical ways of calculating probabilities
(i) From the definition of probability and/or counting
(ii) Using the complement
(iii) Applying the theorems
(iv) Drawing the Venn diagram
(v) Tree diagram, with the corresponding probability on each branch!
Example: Consider the random experiment: Tossing two fair dice (Blue; Green)
Define the events:
B: “getting four with Blue die”; G: “getting four with Green die”;
E: “At least a four on either die”;
Find (a) P(Sum 10); (b) P(E):
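The example can be checked by listing the 36 equiprobable outcomes. The sketch below assumes that E means "at least one of the two dice shows a four", i.e. E = B ∪ G; the helper `prob` is our own name.

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 equiprobable ordered pairs (blue, green)
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / possible outcomes."""
    favourable = [w for w in omega if event(w)]
    return Fraction(len(favourable), len(omega))

p_sum10 = prob(lambda w: w[0] + w[1] == 10)
p_B = prob(lambda w: w[0] == 4)                 # four with the Blue die
p_G = prob(lambda w: w[1] == 4)                 # four with the Green die
p_BG = prob(lambda w: w[0] == 4 and w[1] == 4)  # four on both dice
p_E = prob(lambda w: w[0] == 4 or w[1] == 4)    # assumed meaning of E

print(p_sum10, p_E)
print(p_E == p_B + p_G - p_BG)  # addition theorem P(B ∪ G)
print(p_BG == p_B * p_G)        # B and G are independent
```

Counting directly, using the theorem P(B ∪ G) = P(B) + P(G) − P(BG), or using the complement (no four on either die) all give the same value for P(E).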
Conditional probability (dependent events):
Event “B can occur, given event A has occurred”.
Definition: $P(B \mid A) = \frac{P(AB)}{P(A)}$; or $P(A \mid B) = \frac{P(AB)}{P(B)}$
Another way to look at it is that knowing the occurrence of the first event A, we can
restrict the sample space to those points covered by the occurrence of A!
If, of course, A and B are independent, then simply
P(B|A) = P(B) and P(A|B) = P(A).
Use of Tree or Venn diagrams helps a lot!
Multiplication Law (Inverting conditional probability):
P(AB) = P(A) P(B|A) = P(B) P(A|B), or, for three events A, B, C, we have
P(ABC) = P(A) P(B|A) P(C|AB)
Law of Total probability: $P(E) = P(AE) + P(\bar{A}E) = P(A)P(E \mid A) + P(\bar{A})P(E \mid \bar{A})$
“In words, event E can occur either with event A, or without A!”
Random variables:
A r.v. X takes its values according to some Probabilistic Law;
A random variable and its probability density function (p.d.f.) always obey the following two conditions:
Discrete: (i) $P(x) \ge 0$; (ii) $\sum_{\text{all } x} P(x) = 1$
Continuous: (i) $f(x) \ge 0$ for all x; (ii) $\int_{\text{all } x} f(x) \, dx = 1$
The Mean or Expected Value: $\mu = E[X] = \sum_{\text{all } x} x P(x)$ or $\int_{\text{all } x} x f(x) \, dx$
The Variance: $\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - \mu^2$
Useful Note: $E(X^2) = \sigma^2 + \mu^2$
NORMAL DISTRIBUTION
Notation $X \sim N(\mu, \sigma^2)$: $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$
Shape of the distribution: Symmetric – Bell Shape
Standardize: $Z = \frac{X - \mu}{\sigma} \sim N(0,1)$; Tables, $P(Z \le z) = \Phi(z)$
Central Limit Theorem (CLT): the average $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i \sim N(\mu, \sigma^2/n)$
Combinatorial Analysis (Optional!)
Arrangements of n different objects (Ordering): $A_n = n!$, e.g. for three letters A, B, C we get ABC, ACB, BAC, BCA, CAB, CBA (i.e. 3! = 1×2×3 = 6); Cyclical: $A_n = (n-1)!$
With repetitions: $A_n = \frac{n!}{r! \, s! \, t!}$, where r, s, t are the numbers of similar objects. Of course (r + s + t = n)
Permutations: $P^n_x = \frac{n!}{(n-x)!}$ (Choosing and ordering)
Combinations; notation: $C^n_x$ or $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ (Just choosing, with no ordering)
e.g. for four different objects A, B, C, D we have $C^4_2 = \binom{4}{2} = 6$: AB, AC, AD, BC, BD, CD.
For Permutations however, ordering is important, so $P^4_2 = 12$: (AB, BA, AC, CA, AD, DA, BC, CB, BD, DB, CD, and DC)
Note that $P^n_x = C^n_x A_x$ for all $x \le n$, $n \in \mathbb{N}$
Application :
Find the probability of getting at least 1 heart if we draw four cards
out of 52 (there are 13 hearts in a full pack)
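One way to approach this application is through the complement: "at least one heart" is the opposite of "no hearts at all". A minimal Python sketch, assuming the four cards are drawn without replacement from a well-shuffled pack:

```python
from math import comb

# P(no heart): choose all 4 cards from the 39 non-hearts
p_no_heart = comb(39, 4) / comb(52, 4)

# Complement rule: P(at least one heart) = 1 - P(no heart)
p_at_least_one = 1 - p_no_heart
print(round(p_at_least_one, 4))
```

Here `math.comb(n, x)` computes the number of combinations of x objects out of n, i.e. n!/(x!(n−x)!).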
Examples – Exercises
1. A Casino player bets €10 on Red. If it comes up Red, he bets another €10. If the
first number is Black, he bets €20 again on Red. Find the probability that a total of
two red numbers come up. Find also his expected gain. Note that P(Red) = 1/2
2. A committee of size 5 is to be selected at random from 3 women and 5 men.
a) Show that there are 56 ways of choosing the committee.
Let W represent the number of women on the committee.
b) Show that $P(W = 2) = \frac{15}{28}$
c) Find the probability distribution for W.
d) Find E(W).
3. Jam is packed into tins of advertised weight 1 kg. The weight of a randomly selected
tin of jam is normally distributed about a target weight with a standard deviation of
12g.
If the target weight is 1 kg, find the probability that a randomly chosen tin weighs
i) less than 985 g,
ii) between 970 and 1015 g.
4. The random variable X (number of matches in a box) is roughly normally distributed with mean μ = 40 and standard deviation σ = 3. For the sample mean $\bar{X}$ of a random sample of 12 boxes, write down the mean and the standard deviation, and find the probability that the average $\bar{X}$ is between 37 and 43.
WEEK 5: Sampling and Sampling Distributions
“You don’t have to eat the whole soup to realize that it turned sour”
The Problem: One of the fundamental objectives of statistical science
is to make inferences about the whole population,
examining a small portion of it!
Simple Random Sample :
A random sample is a collection of independent identically distributed (i.i.d.) random variables $X_1, X_2, \dots, X_n$ or, we can consider it as a small group of size n, randomly taken from the population of size N. To draw the sample, we can use random numbers, generated by a computer, a calculator, or from statistical tables.
In order to understand some of the important aspects of sampling, we consider a
very simple example:
Consider a “population” of 4 families in a small village with numbers of children:
{ 3, 1, 0, 8}
Population size: N= 4; Variable of interest: X= the number of children in a family.
Note that for this population
$\mu = \frac{1}{N} \sum X = \frac{1}{4}(3 + 1 + 0 + 8) = 3$,
$\sigma^2 = \frac{1}{N} \sum (X - \mu)^2 = \frac{(3-3)^2 + (1-3)^2 + (0-3)^2 + (8-3)^2}{4} = \frac{0 + 4 + 9 + 25}{4} = 9.5$
If we take a sample of size n = 2, without replacement, the number of possible samples is $\binom{4}{2} = 6$, and the sampling distribution is the set of all values of $\bar{X}$, along with the corresponding probabilities, for all possible samples:
Sample    (average) $\bar{X}$    Probability
(3,1)     2                      1/6
(3,0)     1.5                    1/6
(3,8)     5.5                    1/6
(1,0)     0.5                    1/6
(1,8)     4.5                    1/6
(0,8)     4                      1/6
totals    18                     1.0
Note that the median of the population is 2 and the mode does not exist!
Now:
$E(\bar{X}) = \sum_{\text{all } \bar{X}} \bar{X} P(\bar{X}) = \frac{1}{6}(2 + 1.5 + \dots + 4) = 3 \; (= \mu)$
$\mathrm{Var}(\bar{X}) = \frac{1}{6}[(2-3)^2 + (1.5-3)^2 + \dots + (4-3)^2] = 3.167$
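The whole sampling distribution of the village example can be reproduced by enumeration. The sketch below uses only the Python standard library; the comparison with σ²/n × (N − n)/(N − 1), the usual finite population correction for sampling without replacement, is added here as a check and is not part of the example above.

```python
from itertools import combinations
from statistics import pvariance

population = [3, 1, 0, 8]   # children per family
N, n = len(population), 2

# All C(4, 2) = 6 equally likely samples, and the mean of each
samples = list(combinations(population, n))
sample_means = [sum(s) / n for s in samples]

expected_mean = sum(sample_means) / len(sample_means)   # E(X-bar)
var_of_mean = pvariance(sample_means)                   # Var(X-bar)

# Finite population correction: sigma^2 / n * (N - n) / (N - 1)
with_fpc = pvariance(population) / n * (N - n) / (N - 1)

print(sample_means)
print(expected_mean, round(var_of_mean, 3), round(with_fpc, 3))
```

The mean of the sample means equals the population mean, so the sample mean is an unbiased estimator of μ.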
Methods of sampling
(i) Simple random sampling without replacement (SRSWOR). Each sample (subset) of the same size from the population has the same probability of being selected. Consequently each unit has the same chance of being chosen in the sample.
(ii) Stratified with prior knowledge (Often the best scheme, but not fully random in the above sense); Choose from every stratum a SRSWOR! The size of the sample for each stratum should be proportional to the population stratum size, i.e. $n_j = \frac{n}{N} N_j$, $j = 1, 2, \dots, k$ (where k is the number of strata and $n_j$ and $N_j$ are the sizes of the sample and the population respectively, of the corresponding j-th stratum).
(iii) Quota sampling: Selection is based on the characteristics of the population
(gender, geographical areas, age, education level and so on…), so that it is
essentially a small picture of the population. This is a sort of stratified sample,
but not random, although some randomness may be introduced, but it remains
biased!
(This scheme is preferred for practical considerations and significantly lower
cost!)
(iv) Cluster sampling: Choose clusters (regions, classes, villages, etc), either as a
whole or a random sample within each of the chosen cluster. This is a good
scheme when clusters behave similarly, but it is not a fully random sample;
there is however a component of randomness.
(v) Systematic sampling. If the sampling ratio is $k = \frac{N}{n}$, then pick a random number from 01 to k (using tables of random numbers), and then pick every k-th member. For example, if the population size N = 120 and the sample size n = 5, then k = 24.
Choosing a two digit random number from 01 to 24 from the tables, let say
r = 14, we have the sample corresponding to the items with numbers: (14, 38, 62,
86, 110).
This sampling scheme is easy, cheap and satisfactory, provided no pattern in the
variable of interest exists, along the series of observations in the population.
For example, if soldiers in a parade are standing ranked in descending order, with
respect to their height, then, systematic sampling, to estimate their mean height, is
not a good idea! (It is clear that, choice of a random number like, r = 1 or 2, will
tend to grossly overestimate the true mean height, whereas choice of r = 23 or 24 in
the above example would underestimate the mean height of the group of soldiers!)
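Two of the schemes above reduce to simple arithmetic and can be sketched in Python. The function names are our own, the strata sizes are hypothetical, and the systematic sampler assumes that N is an exact multiple of n.

```python
import random

def systematic_sample(N, n, start=None):
    """Pick a start r in 1..k (k = N // n), then every k-th unit after it."""
    k = N // n  # sampling ratio; assumes N is a multiple of n
    r = random.randint(1, k) if start is None else start
    return [r + i * k for i in range(n)]

def proportional_allocation(n, strata_sizes):
    """Stratum sample sizes n_j = n * N_j / N (before any rounding)."""
    N = sum(strata_sizes)
    return [n * N_j / N for N_j in strata_sizes]

# Example from the notes: N = 120, n = 5, random start r = 14
print(systematic_sample(120, 5, start=14))

# Hypothetical strata of sizes 50, 30 and 20, total sample of 10
print(proportional_allocation(10, [50, 30, 20]))
```

In practice the allocated sizes $n_j$ usually need rounding to whole numbers, which the sketch leaves to the user.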
Central Limit Theorem:
For a sample $(X_1, X_2, \dots, X_n)$ of considerable size from any distribution,
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, approximately.
This is why the Normal distribution is the most important distribution in the
theory of Statistics
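A small simulation can make the theorem tangible. The sketch below is illustrative only: the exponential population (μ = 1, σ² = 1), the sample size n = 30 and the number of replications are all our own choices.

```python
import random
from statistics import pvariance

random.seed(0)        # fixed seed so the run is repeatable
n = 30                # sample size
replications = 5000   # number of samples drawn

# Exponential(1) population: mu = 1, sigma^2 = 1, strongly skewed
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(replications)
]

mean_of_means = sum(sample_means) / replications
var_of_means = pvariance(sample_means)

# The CLT suggests values close to mu = 1 and sigma^2 / n = 1/30
print(round(mean_of_means, 3), round(var_of_means, 4))
```

A histogram of `sample_means` would look close to the bell shape even though the population itself is far from symmetric.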
Confidence intervals
Point estimation (estimating a parameter with a single value) gives us some idea about the value of the parameter, but important information about the precision of this estimate is missing! An estimate of the population mean μ is given by $\hat{\mu} = \bar{X}$, for the variance Var(X) = σ², the estimate is $\hat{\sigma}^2 = S^2$, for the proportion π, it is $\hat{\pi} = p$ and for the correlation coefficient ρ, it is $\hat{\rho} = r$.
This procedure is useful, but almost certainly out of target. A more realistic and
“safe” procedure is often entertained through Confidence Intervals.
Given a random sample X1, X2,…, Xn, we define a P% Confidence Interval (CI) about the parameter θ, as a random interval [Τ1, Τ2], such that:
$\Pr(T_1 \le \theta \le T_2) = P\%$
(Central about $\hat{\theta} = T$, even if the distribution is not symmetric),
where Τ1 and Τ2 are the values of the statistic T, obtained from the sample.
As a general principle the C.I. for θ is
$\hat{\theta} \pm z \times \mathrm{S.E.}(\hat{\theta})$
approximately, where z is the appropriate value from N(0,1).
Very commonly used intervals:
for μ: $\bar{X} \pm z \, \sigma/\sqrt{n}$, or $\bar{X} \pm t_{(n-1)} \, s/\sqrt{n}$ (when σ is not known, and sample size n is small)
for π: $p \pm z \sqrt{\frac{p(1-p)}{n}}$
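The z-based intervals above can be sketched in Python with the standard library's `statistics.NormalDist`. The function names and the numbers in the usage lines are hypothetical; the t-based interval is left out because the standard library has no t tables.

```python
from math import sqrt
from statistics import NormalDist

def z_value(confidence):
    """Two-sided critical value from N(0,1), e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def ci_mean_known_sigma(xbar, sigma, n, confidence=0.95):
    """X-bar ± z * sigma / sqrt(n)."""
    half_width = z_value(confidence) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def ci_proportion(p, n, confidence=0.95):
    """p ± z * sqrt(p(1 - p) / n), a large-sample approximation."""
    half_width = z_value(confidence) * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical inputs, for illustration only
low, high = ci_mean_known_sigma(xbar=50.0, sigma=4.0, n=25)
p_low, p_high = ci_proportion(p=0.4, n=100)
print(round(low, 3), round(high, 3))
print(round(p_low, 3), round(p_high, 3))
```

A higher confidence level gives a larger z and therefore a wider interval; a larger n gives a narrower one.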
Applications
1. What is meant by a random sample?
Here is an extract from a table of random numbers:
86 13 84 10 07 30 39 05 97 96 88 07 37 26 04 89 13 48 19 20
60 78 48 12 99 47 09 46 91 33 17 21 03 94 79 00 08 50 40 16
78 48 06 37 82 26 01 06 64 65 94 41 17 26 74 66 61 93 14 97
(a) Starting from the first line and the third column of the table with the number 84,
and reading across the table select and write down 10 random numbers between 01
and 80 from the table.
(b) Explain how you could use these random numbers to select a sample of 10
students from 80 students
2. A large civil engineering firm issues all new employees with a safety helmet. Five
different sizes are available numbered 1 to 5.
A random sample of 90 employees required the following sizes:
2 2 4 2 2 3 4 3 4 3 3 3 2 1 3
2 4 3 2 5 4 3 2 2 2 4 4 3 3 3
5 3 5 5 4 4 4 2 3 4 5 2 5 3 3
2 2 3 4 3 3 3 3 2 4 3 2 4 3 4
4 3 4 2 2 3 2 3 4 4 4 3 4 4 2
2 3 2 3 3 2 2 2 2 4 2 3 3 2 3
Calculate an approximate 90% confidence interval for the proportion of employees
requiring size 2
3. A computer program is designed to take a random sample of size 36 from a
normal population with mean μ and standard deviation σ = 5.1. The sample
mean for one such sample was $\bar{X}$ = 26.3.
Calculate a 95% confidence interval for μ based on this sample.
4. Prior to an election in the state of Texas, USA, the Opinion Research Centre
wishes to take a random sample of voters so large that the probability is at most
0.02 that they will find the proportion supporting the Democratic candidate to be
less than 0.5 when it is actually 0.55. Assuming that a continuity correction is
unnecessary, calculate the size of the sample needed.
WEEKS 6 and 7 : Tests
“Doubt is the root of progress”
The Problem: Why testing? First, there should be a claim for the value
of a parameter (μ, σ, π, λ, and so on) or a model, and then the evidence
from the observed data should be left to decide about the claim!
A test is a statistical procedure which decides with some confidence whether a
statement – hypothesis, about a parameter value, or the whole model, is valid!
The Characteristics or the “ingredients” of a test!
The supposed claims are formulated as follows:
Null Hypothesis: $H_0$: This statement should specify completely the value of the parameter of interest! e.g. μ = 50 vs:
Alternative Hypothesis: $H_1$: This is a complementary statement to $H_0$, in some direction! e.g. μ > 50 (upper tail), or μ < 50 (lower tail), one tailed test, or μ ≠ 50 (two tailed test);
Test Statistic: e.g. $Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} \sim N(0,1)$; This is (used to be called) a statistic from the sample. In fact this is rather a Pivotal Quantity, since by definition a Statistic is a function of the sample only! Essentially it serves as a criterion for decision: Accept or Reject $H_0$. Naturally we could also use as a test statistic: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i \sim N(\mu, \sigma^2/n)$;
Significance level: α. The observed significance level, often called the p-value, is Pr(we observe what we did, under $H_0$, or even worse in the direction of $H_1$). The level α with which it is compared is often predetermined at 5% or at 1%, but it can take any value!
Critical value; Critical region: The value, or rather the set of values, of the test statistic Z or $\bar{X}$ which lead to rejection of $H_0$.
The Decision Rule: This is a rule which is based on a pre-specified value $T^*$ of the test statistic and is of the form:
Reject $H_0$ if $T_{obs} \ge T^*$ (upper tail) or $T_{obs} \le T^*$ (lower tail)
Accept $H_0$ otherwise
Relation with Confidence intervals:
For a two tailed test, if the assumed value of μ lies inside the C.I, then we accept H 0 ;
If, however it is outside the interval, then we reject the null hypothesis in favor of the
alternative.
Example: For the purpose of placing the order of the purchase of shoes for the
incoming soldiers, it has been suggested that the mean length of their foot is 26 cm.
A statistician has been called to decide on the suspicion that during the last ten
years, this length has been increased significantly!
So the statistician decided to take a random sample of n = 50 measurements on this year’s soldiers, and the results were $\bar{X}$ = 26.3 and $s^2$ = 1.44
The test was formulated as $H_0$: μ = 26 vs $H_1$: μ > 26; One tailed test.
The test statistic chosen is $Z = \frac{(\bar{X} - \mu)\sqrt{n}}{\sigma} \sim N(0,1)$
If we prespecify the S.L. to α = 5%, then the critical value would be Z=1.645.
The observed value of the test statistic (taking σ ≈ s, considering this sample size
large), turns out to be: Zobs.= 1.77>1.645, which is significant, so we reject H 0 in
favour of H 1 .
On the other hand, the p-value is (Observed) S.L.= Pr(Z>1.77) = 0.04.
In the light of the above test, the suspicion of increase of “soldiers foot length”,
seems to be justified and the order for new boots, with respect to the sizes, has to
be adjusted accordingly!
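The numbers in this example can be reproduced with a few lines of Python, using `statistics.NormalDist` for the N(0,1) tables. As in the text, σ is replaced by s because the sample is large; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 26.0        # value of mu under H0
n = 50            # sample size
xbar = 26.3       # sample mean
s = sqrt(1.44)    # sample standard deviation, used in place of sigma

# Observed test statistic: Z = (X-bar - mu0) * sqrt(n) / s
z_obs = (xbar - mu0) * sqrt(n) / s

# Upper-tail critical value at alpha = 5%, and the p-value Pr(Z > z_obs)
z_crit = NormalDist().inv_cdf(0.95)
p_value = 1 - NormalDist().cdf(z_obs)

reject_h0 = z_obs > z_crit
print(round(z_obs, 2), round(z_crit, 3), round(p_value, 3), reject_h0)
```

The two routes agree: the observed statistic exceeds the critical value exactly when the p-value falls below α.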
PRINCIPLE: The rationale behind a statistical test is the following;
“If a small value of the significance level α is observed, or
equivalently an extreme value of the test statistic T is realized, then
we are faced with two options:
(a) Either H 0 is true and we just observed, by chance alone, a quite rare event, or
(b) H 0 is false, and that is the reason, we have observed the realized outcome”!
Test statistics:
For μ: $(\bar{X} - \mu)\sqrt{n}/\sigma \sim N(0,1)$ or $(\bar{X} - \mu)\sqrt{n}/s \sim t_{(n-1)}$
For π: $Z = \frac{(p - \pi)\sqrt{n}}{\sqrt{\pi(1 - \pi)}} \sim N(0,1)$ (approximately)
For σ²: $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{(n-1)}$
Two sample problem: (a) Unpaired test
Tests for two samples (sizes n, m): $H_0: \mu_x = \mu_y$
$Z = \frac{(\bar{Y} - \bar{X}) - (\mu_y - \mu_x)}{\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}} \sim N(0,1)$ (known variances), or $T = \frac{(\bar{Y} - \bar{X}) - (\mu_y - \mu_x)}{s\sqrt{\frac{1}{n} + \frac{1}{m}}} \sim t_{(n+m-2)}$,
where $s^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}$ is the pooled estimate of the common, but unknown, population variance σ².
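The pooled two-sample statistic can be sketched as below. The function name `pooled_t` and the two tiny samples are hypothetical; the resulting T still has to be compared with the $t_{(n+m-2)}$ tables, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(x, y, delta0=0.0):
    """T = ((ybar - xbar) - delta0) / (s * sqrt(1/n + 1/m)), df = n + m - 2.

    delta0 is the value of (mu_y - mu_x) under H0, usually 0.
    """
    n, m = len(x), len(y)
    # Pooled estimate of the common variance
    s2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = ((mean(y) - mean(x)) - delta0) / sqrt(s2 * (1 / n + 1 / m))
    return t, n + m - 2

# Hypothetical samples, for illustration only
t_obs, df = pooled_t([1, 2, 3], [2, 4, 6])
print(round(t_obs, 4), df)
```

Pooling is only appropriate when the two population variances can reasonably be taken as equal.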
(b) Paired t test:
When natural pairing exists between the observations in the two samples, the most efficient test is the paired one and the set-up is the following:
$H_0: \mu_x = \mu_y$ or $\delta = \mu_x - \mu_y = 0$
Paired Sample: X1, X2,…, Xn and Y1, Y2,…, Yn
Assumption: $D_i = X_i - Y_i$ (difference) $\sim N(\delta, \sigma^2)$, independent
$\bar{D} = \frac{1}{n} \sum_{i=1}^{n} D_i$, and $s_d^2 = \frac{1}{n-1} \sum_{i=1}^{n} (D_i - \bar{D})^2$
Test statistic: $T = \frac{(\bar{D} - \delta)\sqrt{n}}{s_d} \sim t_{(n-1)}$
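The paired statistic works on the differences only, as this short sketch shows. The function name `paired_t` and the data are hypothetical; the result is to be compared with the $t_{(n-1)}$ tables.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y, delta0=0.0):
    """T = (D-bar - delta0) * sqrt(n) / s_d, with n - 1 degrees of freedom."""
    d = [xi - yi for xi, yi in zip(x, y)]  # differences D_i = X_i - Y_i
    n = len(d)
    t = (mean(d) - delta0) * sqrt(n) / stdev(d)
    return t, n - 1

# Hypothetical paired observations, for illustration only
t_obs, df = paired_t([10, 12, 9, 11], [8, 11, 10, 9])
print(round(t_obs, 4), df)
```

Taking differences removes the variation between pairs, which is why the paired test is the more efficient one when natural pairing exists.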
Applications-revision
1. An experimental kit, used to illustrate basic statistical ideas to students, contains a fair
six sided die and a biased six sided die as well as other equipment. The probability of
a 6 on any toss of the biased die is 0.25. One of these dice is tossed 120 times, and
of these 29 results in a 6. Assuming that these results are independent, test the null
hypothesis that the die is the fair one against the alternative that it is the biased one.
Subsequently, it was established that the die tossed is definitely the fair one.
Comment upon your test result, in the light of this information.
2. The weight, X grams of soup put in a tin by a machine is normally distributed with a
mean of 160 g and a standard deviation of 5 g.
A tin is selected at random.
(a) Find the probability that this tin contains more than 168 g.
The weight stated on the tin is changed to w grams.
(b) Find w such that P(X < w) = 0.01.
3. A firm is to buy a fleet of cars for its salesmen and wishes to choose between two
alternative models, A and B. It places an advertisement in a local paper offering four
gallons of petrol for anyone who has bought a new car of either model in the last
year. The offer is conditional on being willing to answer a questionnaire and to note
how far the car goes, under typical driving conditions, on the free petrol supplied.
The following data were obtained:
Miles driven on four gallons of petrol
Model A 117 136
108
147
Model B
98
124
96
117
115
126
109
91
108
(a) Test at the 5% significance level, whether there is any difference between the
mean petrol consumption of the two models
(b) How can we improve the experimental design?
4. Weather records for Limassol lead the local weatherman to suggest that the high temperature for November 15 is a normal random variable with mean 22 °C and standard deviation 5 °C. Find the probability that on the next November 15 the high temperature will:
(a) be less than 21 °C
(b) be more than 26 °C
(c) lie between 20 and 25 °C.
What temperature would be exceeded with 90% probability?
5. Roastie’s Coffee is sold in packets with a stated weight of 250 g. A supermarket
manager claims that the mean weight of the packets is less than the stated weight.
She weighs a random sample of 90 packets from their stock and finds that their
weights have a mean of 248 g and a standard deviation of 5.4 g.
(a) Using a 5% level of significance, test whether or not the manager’s claim is
justified.
(b) Find the 98% confidence interval for the mean weight of a packet of coffee in the
supermarket’s stock.
6. Manuel is planning to buy a new machine to squeeze oranges in his cafe and he
has two models, at the same price, on trial. The manufacturers of machine B claim
that their machine produces more juice from an orange than machine A. To test this
claim Manuel takes a random sample of 8 oranges, cuts them in half and puts one
half in machine A and the other half in machine B. The amount of juice, in ml,
produced by each machine is given in the table below.
Orange       1    2    3    4    5    6    7    8
Machine A    60   58   55   53   52   51   54   56
Machine B    61   60   58   52   55   50   52   58
Test, at the 10% level of significance, the manufacturer’s claim.
7. A random sample of 10 tomato plants had the following height, in mm, after 4 days
growth.
5.0, 4.5, 4.8, 5.2, 4.3, 5.1, 5.2, 4.9, 5.1, 5.0
Those grown previously had a mean height of μ = 5.1 mm. Using a 5% significance
level, test whether or not the mean height of these plants is less than those grown
previously.(Assume that the heights above are normally distributed).
WEEK 10: Contingency Table
Test for Association or Independence
This is one of the most important and popular tests in Statistical Applications,
since it detects possible associations between factors!
The values in the cells are frequencies. If the values are percentages, then, before
proceeding to the test, we have to transform them to frequencies.
$H_0$: No association between the two factors, i.e. the two classifications A and B
are independent (m rows and n columns)
The basic idea is that the estimated probability for row i is $p_i = r_i / N$ and the
corresponding probability for column j is $p_j = c_j / N$, where $r_i$ and $c_j$ are the
row and column totals and N is the grand total. So the expected frequency
(under $H_0$) in each cell (i-th row, j-th column) is
$e_{ij} = N P_{ij} = N p_i p_j = \frac{r_i c_j}{N}$ = (row total) × (column total) / (grand total).
For the test to be reliable (according to Pearson), each expected frequency should
be at least 5; otherwise we have to combine adjacent or similar columns or rows
until all expected frequencies are at least 5.
Test statistic
$D = \sum_{i,j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2_{(m-1)(n-1)}$
where $o_{ij}$ is the observed frequency in cell (i, j).
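The expected frequencies and the test statistic described above can be sketched in a few lines of Python. This is only an illustration: the 2 × 2 table `observed` is hypothetical data (not taken from these notes), and the function names are chosen here for convenience.

```python
def expected_frequencies(observed):
    """Expected cell counts under H0: (row total) x (column total) / (grand total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed):
    """Return (D, degrees of freedom) for an m x n table of frequencies."""
    expected = expected_frequencies(observed)
    d = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return d, df


# Hypothetical frequencies, used only to illustrate the calculation.
observed = [[10, 20],
            [20, 10]]

expected = expected_frequencies(observed)
# Pearson's rule of thumb: every expected frequency should be at least 5.
rule_ok = all(e >= 5 for row in expected for e in row)

d, df = chi_square_statistic(observed)
print(expected, rule_ok, round(d, 3), df)
```

The value of D is then compared with the tabulated chi-square critical value for the given degrees of freedom (for example 3.841 for 1 degree of freedom at the 5% level); H0 is rejected when D exceeds it.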
Exercises
1. A survey in a college was commissioned to investigate whether or not there was
any association between gender and passing a driving test. A group of 50 male and
50 female students were asked whether they passed or failed their driving test at
the first attempt. All students asked had taken the test. The results were as follows,
         Pass   Fail
Male      23     27
Female    32     18
Stating your hypotheses clearly, test, at the 10% level of significance, whether there
is any evidence of an association between gender and passing a driving test at the
first attempt.
2. Research was carried out to investigate a possible connection between weekly
alcohol consumption and development of Type 2 diabetes.
The results are summarised in the table.
Level of alcohol        Type 2 diabetes developed
consumption (gr)        Yes      No
Less than 5             38       382
Between 5 and 30        12       653
More than 30            35       380
(a) Test, at the 1% level of significance, whether the development of Type 2
diabetes is independent of the average level of weekly alcohol consumption.
Assume that the sample was random.
(b) A medical reviewer for a newspaper read the report and then he stated that
people should increase their weekly alcohol consumption in order to decrease their
chance of developing Type 2 diabetes.
Make two comments on his statement, referring to both the study and the sources
of association, if any, identified when carrying out the test in part (a).
WEEK 11: Correlation - Regression (Simple & Multiple)
The Problem: Given a set of bivariate observations, we attempt to
search for and estimate the “best” algebraic relationship
between X and Y
-We are interested in measuring linear association between:
The response (dependent) Variable: Y, vs
The explanatory (independent) Variable X
-A first indication of the existence of this linear association is revealed, if we draw
the scatter diagram between X and Y
Other possible shapes
Correlation: meaning linear association between X and Y; cause and effect.
A numerical measure of the strength of this linear association between X and Y is
the sample product moment correlation coefficient (pmcc)
$r = \frac{s_{xy}}{s_x s_y}$, where $s_{xy} = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}\right]$ is the sample covariance and
$s_x$, $s_y$ are the sample standard deviations of X and Y. Note that always
$-1 \le r \le 1$, with $r = \pm 1$ for perfect correlation, positive or negative! (See the shapes
above)
Another way to calculate the pmcc: $r = \frac{SXY}{\sqrt{SXX \cdot SYY}}$
(can be obtained easily from an advanced calculator!)
where
$SXY = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} = (n-1)s_{xy}$,
$SXX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = (n-1)s_x^2$,
$SYY = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 = (n-1)s_y^2$
Note that:
(i) The numerical presence of significant correlation does not necessarily
imply a real association between the two variables, unless of course some
natural explanation exists! Spurious (unexplained, nonsense) correlation
sometimes occurs!
(ii) On the other hand, a value of the pmcc close to zero implies no linear
association, but a strong non-linear association may still be present!
(Examples are shown above on the scatter diagrams)
Linear Regression
“It is our opinion of a situation at one stage, but this must change, if
we find, at a later stage, that the facts are against it!”
Glancing at the scatter diagram, we may suspect a rather strong linear dependence of
the response variable Y on the explanatory variable X (a p.m.c.c. close to $r = \pm 1$
reveals that!). In this case, we use regression methods to establish the “suspected”
linear relationship.
For example, for the set of points, given below, we draw the scatter diagram:
(X, Y) points: (1, 49), (3, 51), (4, 52), (6, 52), (6, 53), (7, 53), (8, 54), (11, 56),
(12, 56), (14, 57), (14, 58), (17, 59), (18, 59), (20, 60), (20, 61)
r =0.991
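As a check on the quoted value, the pmcc can be computed directly from the SXY, SXX and SYY formulas. The sketch below uses only the fifteen (X, Y) points listed above; the function name `pmcc` is chosen here for illustration.

```python
from math import sqrt

points = [(1, 49), (3, 51), (4, 52), (6, 52), (6, 53), (7, 53), (8, 54),
          (11, 56), (12, 56), (14, 57), (14, 58), (17, 59), (18, 59),
          (20, 60), (20, 61)]
xs = [p[0] for p in points]
ys = [p[1] for p in points]


def pmcc(x, y):
    """Product moment correlation coefficient r = SXY / sqrt(SXX * SYY)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    sxx = sum(xi ** 2 for xi in x) - n * xbar ** 2
    syy = sum(yi ** 2 for yi in y) - n * ybar ** 2
    return sxy / sqrt(sxx * syy)


r = pmcc(xs, ys)
print(round(r, 3))
```

Rounded to three decimals this agrees with the value r = 0.991 quoted in the notes.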
The line of “best” fit is obtained through the statistical regression model:
$Y_i = \alpha + \beta X_i + \varepsilon_i$,
where $\varepsilon_i$ is the unobservable error, α is the Y-intercept and β is the slope
(gradient) of the straight line.
The vertical random errors satisfy the conditions:
(i) $\varepsilon_i \sim N(0, \sigma^2)$, and
(ii) they are independent (i = 1, 2, …, n)
Variables: Explanatory, Independent: X
Response, Dependent: Y
Principle of regression:
Applying least squares methods, we obtain the estimates of the parameters
α and β, which turn out to be the statistics a and b, by minimizing the sum of
squared vertical errors with respect to α and β:
$SSE = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (Y_i - \alpha - \beta X_i)^2$
Solving the resulting two regression equations (see appendix A4), the “best”
estimates of the unknown parameters α and β turn out to be:
The slope: $\hat{\beta} = b = SXY / SXX$, where
$SXY = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}$,
$SXX = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$
The Y-intercept: $\hat{\alpha} = a = \bar{Y} - b\bar{X}$
The variance of the Model: $\hat{\sigma}^2 = s^2 = \frac{SSE}{n-2}$
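For readers without the appendix to hand, the two regression (normal) equations can be sketched as follows. This is the standard least squares argument, written in the notation of these notes; it is not necessarily the exact presentation used in appendix A4.

$$\frac{\partial SSE}{\partial \alpha} = -2\sum_{i=1}^{n} (Y_i - \alpha - \beta X_i) = 0, \qquad \frac{\partial SSE}{\partial \beta} = -2\sum_{i=1}^{n} X_i (Y_i - \alpha - \beta X_i) = 0$$

$$\sum_{i=1}^{n} Y_i = na + b\sum_{i=1}^{n} X_i, \qquad \sum_{i=1}^{n} X_i Y_i = a\sum_{i=1}^{n} X_i + b\sum_{i=1}^{n} X_i^2$$

The first equation gives $a = \bar{Y} - b\bar{X}$. Substituting this into the second equation gives

$$\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} = b\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right), \qquad \text{so} \qquad b = \frac{SXY}{SXX}.$$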
Natural interpretation of the parameters of the regression Model:
a: Is the estimated value of the response variable Y, when the explanatory variable
X=0
b: Represents the estimated change of the response variable Y, for a unit increase
of the explanatory variable X
Note that prediction may be obtained through the fitted line. However, this is
quite risky, in particular, outside the range of observations (extrapolation), as
in any other hard science!
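How good is the model?
The goodness of the model is measured by the value of the coefficient of
determination:
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
where SST = SYY is the total sum of squares and SSR = SST − SSE is the sum of
squares explained by the regression.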
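The estimates a, b and s², together with the coefficient of determination, can be computed in a few lines. The sketch below applies the formulas of this section to the fifteen (X, Y) points of the scatter-diagram example given earlier in these notes; the variable names are chosen here for illustration.

```python
points = [(1, 49), (3, 51), (4, 52), (6, 52), (6, 53), (7, 53), (8, 54),
          (11, 56), (12, 56), (14, 57), (14, 58), (17, 59), (18, 59),
          (20, 60), (20, 61)]
x = [p[0] for p in points]
y = [p[1] for p in points]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
syy = sum((yi - ybar) ** 2 for yi in y)      # SST, the total sum of squares

b = sxy / sxx                # slope
a = ybar - b * xbar          # Y-intercept

# Sum of squared vertical errors about the fitted line.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)           # estimate of the model variance
r2 = 1 - sse / syy           # coefficient of determination

print(round(a, 2), round(b, 3), round(s2, 3), round(r2, 4))
```

For simple linear regression R² equals the square of the pmcc, so with r = 0.991 for these points the value of R² comes out at roughly 0.98.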
Application
An agricultural researcher collected the following data showing the annual yield of
wheat (in bushels per acre) and the annual rainfall (in inches)
Rainfall (x)         9.7   19.0   8.2   11.1   6.9   13.6   13.0   15.0
Yield of wheat (y)   28.0  35.1   23.8  25.6   20.1  30.2   28.5   33.7
(i) Plot the data on a scatter diagram.
(ii) Evaluate the product moment correlation coefficient between x and y.
(iii) Give an interpretation of the value obtained.
(iv) Find the regression of y on x in the form y = a + bx
(v) Give an interpretation of the value of b.
(vi) Can we predict the yield for rainfall at the level of:
(a) 16 inches? (b) 30 inches?
Multiple Regression
For a multiple regression situation, we entertain the matrix model:
$y = X\beta + \varepsilon$; in matrix form,
$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$
with $\varepsilon \sim MVN(0, \sigma^2 I_n)$, the best solution being:
$b = (X^T X)^{-1} X^T y$, with $E(b) = \beta$ and $Var(b) = (X^T X)^{-1} \sigma^2$
The predicted value for a given explanatory vector value $x_i$ is:
$\hat{y}_i = x_i^T b = x_i^T (X^T X)^{-1} X^T y$, with $Var(\hat{y}_i) = x_i^T (X^T X)^{-1} x_i \sigma^2$
and the estimate of $\sigma^2$ is
$s^2 = \frac{(y - \hat{y})^T (y - \hat{y})}{n - k}$
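Matrix approach to the simple model
The simple model may be formulated in a multiple regression context as follows:
$y = X\boldsymbol{\beta} + \varepsilon$,
where
$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$, $X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$, $\boldsymbol{\beta} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}$, $\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$
In this context, it turns out that
$X^T X = \begin{pmatrix} n & n\bar{x} \\ n\bar{x} & \sum_{i=1}^{n} x_i^2 \end{pmatrix}$, and $(X^T X)^{-1} = \frac{1}{SXX} \begin{pmatrix} \frac{1}{n}\sum_{i=1}^{n} x_i^2 & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix}$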
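As a short worked check (a sketch in the notation of these notes, not part of the original text), the matrix solution $b = (X^T X)^{-1} X^T y$ reproduces the simple-regression estimates. Using $(X^T X)^{-1} = \frac{1}{SXX} \begin{pmatrix} \frac{1}{n}\sum x_i^2 & -\bar{x} \\ -\bar{x} & 1 \end{pmatrix}$ and $X^T y = \begin{pmatrix} n\bar{y} \\ \sum x_i y_i \end{pmatrix}$:

$$b = (X^T X)^{-1} X^T y = \frac{1}{SXX} \begin{pmatrix} \bar{y}\sum x_i^2 - \bar{x}\sum x_i y_i \\ \sum x_i y_i - n\bar{x}\bar{y} \end{pmatrix} = \begin{pmatrix} \bar{y} - \dfrac{SXY}{SXX}\,\bar{x} \\ \dfrac{SXY}{SXX} \end{pmatrix}$$

The second component is the slope SXY/SXX. The first is the intercept, since $\bar{y}\,SXX - \bar{x}\,SXY = \bar{y}\sum x_i^2 - \bar{x}\sum x_i y_i$ once the two $n\bar{x}^2\bar{y}$ terms cancel.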
WEEK 12: Time Series and Forecasting
“Standing on the past, we live the present, hoping for the future”
The Problem: A set of measurements taken at consecutive time points
constitutes a time series. A number of ways to analyze
the series, estimate the parameters, and forecast future
values are considered!
15.1. The Model: $Y_t$ = Trend + Seasonal variation (main components) +
Short term (non random) variation + Random variation
For practical reasons we present some of the popular methods with a particularly
simple example: The data below show the sales (Y) of sandwiches at three shifts
Morning – Day – Evening, for three consecutive days at a particular Store.
Labelling the time as T, we have the following table:
Day     Mon           Tue           Wed
Shift   M    D    E   M    D    E   M    D    E
T       1    2    3   4    5    6   7    8    9
Y       26   53   50  34   64   60  42   73   71
The first thing to do is to plot the values to discover their basic characteristics.
From the plot we observe:
(i) Upward Trend
(ii) Seasonal Variation with period k=3
(iii) Random variation
TECHNIQUES
(i) Regression with dummies:
$Y_t = a + bt + \sum_j c_j X_j + \varepsilon_t$, where the X’s are dummy variables, taking the values 1
or 0 depending on whether we are in the second period, the third, and so on, thus
reflecting any existing seasonal component!
Application for the first model, regression with dummies:
$Y_t = \alpha + \beta t + \gamma X_2 + \delta X_3 + \varepsilon_t$
Time: 1, 2, …, 9
Response Y = [26 53 … 71]
Matrix of Explanatory Variables and parameters:
$X^T = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}$, $\boldsymbol{\beta} = \begin{pmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}$, estimated by $b = (X^T X)^{-1} X^T y$
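Results:
Correlation Matrix for Y, t, and the two X’s:
$C = \begin{pmatrix} 1 & .70 & .50 & .36 \\ .70 & 1 & 0 & .27 \\ .50 & 0 & 1 & -.50 \\ .36 & .27 & -.50 & 1 \end{pmatrix}$
$b = \begin{pmatrix} 21.33 \\ 3.17 \\ 26.17 \\ 20.00 \end{pmatrix}$, $Cov(b) = \begin{pmatrix} .99 & -.12 & -.41 & -.29 \\ & .03 & -.03 & -.06 \\ & & 1.07 & .58 \\ & & & 1.16 \end{pmatrix}$ (symmetric, upper triangle shown)
And finally forecasting:
$\hat{y} = \begin{pmatrix} \hat{y}_{10} \\ \hat{y}_{11} \\ \hat{y}_{12} \end{pmatrix} = F^T b = \begin{pmatrix} 1 & 1 & 1 \\ 10 & 11 & 12 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}^T \begin{pmatrix} 21.333 \\ 3.167 \\ 26.167 \\ 20.000 \end{pmatrix} = \begin{pmatrix} 53.00 \\ 82.33 \\ 79.33 \end{pmatrix}$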
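The regression with dummies can be reproduced without special software. The sketch below builds the design matrix for the sandwich data (columns 1, t, X2, X3, with X2 = 1 for the Day shift and X3 = 1 for the Evening shift) and solves the normal equations (XᵀX) b = Xᵀy. The small Gauss-Jordan solver is an illustrative helper written for this sketch, not something taken from the notes; a statistical package would normally be used instead.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def design_row(t):
    """Row [1, t, X2, X3]: X2 marks the Day shift, X3 the Evening shift."""
    return [1, t, int(t % 3 == 2), int(t % 3 == 0)]


y = [26, 53, 50, 34, 64, 60, 42, 73, 71]
X = [design_row(t) for t in range(1, 10)]

Xt = transpose(X)
XtX = matmul(Xt, X)
Xty = [sum(xi * yi for xi, yi in zip(row, y)) for row in Xt]
b = solve(XtX, Xty)          # least squares estimates

# Forecasts for T = 10, 11, 12 (Morning, Day, Evening of the next day).
forecasts = [sum(f * c for f, c in zip(design_row(t), b)) for t in (10, 11, 12)]

print([round(v, 2) for v in b])
print([round(v, 2) for v in forecasts])
```

Up to rounding, the estimates and the three forecasts agree with the values reported in the Results above.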
(ii) The Moving Average Model
The second model often used to estimate the trend is the Moving Average (M.A.)
of appropriate order (usually the period of the process). If k (the period of
seasonality) is even, then we need a further MA of order two to center the estimates,
so that these estimates correspond to the existing observed values. In our example
we have the seven Moving Averages of order k=3 as:
[43.00, 45.67, 49.33, 52.67, 55.33, 58.33, 62.00] .
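The moving averages are easy to compute by hand, but a short sketch makes the centering rule explicit. The function names below are chosen here for illustration; the even-order case is shown on a made-up sequence, since the sandwich data have odd period k = 3.

```python
def moving_average(values, k):
    """Simple moving averages of order k."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def centred_moving_average(values, k):
    """For even k, apply a further MA of order 2 to center the estimates."""
    ma = moving_average(values, k)
    return ma if k % 2 == 1 else moving_average(ma, 2)


y = [26, 53, 50, 34, 64, 60, 42, 73, 71]
ma3 = moving_average(y, 3)
print([round(v, 2) for v in ma3])

# Illustrative even-order case on a made-up sequence (not course data).
print(centred_moving_average([1, 2, 3, 4, 5, 6], 4))
```

For odd k each average lines up with the middle observation of its window, which is why the seven averages here correspond to T = 2, …, 8.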
A simple linear regression of the above smoothed values versus T = [2, 3, 4, 5, 6, 7, 8]
gives the fitted line $\hat{Y}_t = 36.56 + 3.15t$, with $R^2 = 99.8\%$.
Now the differences for all points [Y-(a+bT)] are calculated as
Shift        M        D       E      M        D       E      M        D       E
Difference   -13.71   10.13   3.98   -15.18   11.67   4.51   -16.64   11.20   6.05
To estimate the seasonal effects for the three points (shifts) we calculate the
averages of the corresponding differences, i.e.
$s_M = \frac{-13.71 - 15.18 - 16.64}{3} = -15.18$;
$s_D = \frac{10.13 + 11.67 + 11.20}{3} = 11.00$;
$s_E = \frac{3.98 + 4.51 + 6.05}{3} = 4.85$
Finally, the forecast for, let us say, T = 10, 11, 12 is trend + seasonal component:
$\hat{Y}_{10} = 36.56 + 3.15(10) - 15.18 = 68.11 - 15.18 = 52.93$; $\hat{Y}_{11} = 82.26$; $\hat{Y}_{12} = 79.26$
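The whole moving-average procedure (smooth, fit a trend line, average the differences by shift, then forecast) can be sketched as follows for the sandwich data. The helper `fit_line` and the variable names are illustrative choices made here, not part of the notes.

```python
def fit_line(x, y):
    """Least squares line y = a + b x; returns (a, b)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b


y = [26, 53, 50, 34, 64, 60, 42, 73, 71]
k = 3                                        # period of seasonality

# Step 1: moving averages of order k, centered on T = 2, ..., 8.
ma = [sum(y[i:i + k]) / k for i in range(len(y) - k + 1)]
t_ma = list(range(2, 9))

# Step 2: trend line fitted to the smoothed values.
a, b = fit_line(t_ma, ma)

# Step 3: differences Y - (a + bT) for all nine points, averaged by shift.
diffs = [yt - (a + b * t) for t, yt in enumerate(y, start=1)]
seasonal = [sum(diffs[s::k]) / len(diffs[s::k]) for s in range(k)]   # M, D, E

# Step 4: forecast = trend + seasonal component.
forecasts = [a + b * t + seasonal[(t - 1) % k] for t in (10, 11, 12)]

print(round(a, 2), round(b, 2))
print([round(s, 2) for s in seasonal])
print([round(f, 2) for f in forecasts])
```

Because the code keeps full precision throughout, its results match the notes' figures to two decimals even where the displayed coefficients 36.56 and 3.15 are rounded.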
Box and Jenkins models:
For stationary series (i.e. after removing the trend) we have a variety of models
which cover identification, estimation and prediction based on existing computer
packages. These models proved to be very popular and efficient during the
seventies, and they work reasonably well. One of the simple, characteristic models
is the autoregressive model of order p:
$Y_t = \gamma + \sum_{j=1}^{p} \phi_j Y_{t-j} + \varepsilon_t$, where $\varepsilon_t \sim N(0, \sigma^2)$, independent
The parameters γ and ϕ are estimated by the method of least squares!
Very simple linear regression models:
Due to the fact that the main components of most time series are the trend and the
seasonal effect, it is possible to fit simple models which take into account these two
main effects, for example:
$Y_t = \alpha + \beta Y_{t-k} + \varepsilon_t$, where $\varepsilon_t \sim N(0, \sigma^2)$, independent
and k is the period of seasonality. α and β are the parameters to be estimated
from the data. In our example we take k = 3 (k = 4 for quarterly data or k = 12 for
monthly data)!
So we regress
Yt = [34, 64, 60, 42, 73, 71] vs Y(t-3) = [26, 53, 50, 34, 64, 60],
to obtain the fitted line:
$\hat{Y}_t = 6.37 + 1.07 Y_{t-3}$, with $R^2 = 99.6\%$
Hence the corresponding predicted values for comparison are
$\hat{Y}_{10} = 6.37 + 1.07 Y_7 = 6.37 + 1.07(42) = 51.31$; $\hat{Y}_{11} = 84.48$; $\hat{Y}_{12} = 82.34$
Note that some differences in decimals arise because calculations have
been performed straight from the calculator with all the accuracy provided!
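The lag-k regression can be sketched in the same way: the series is regressed on itself shifted by k = 3 periods. The helper and variable names are illustrative choices made here.

```python
def fit_line(x, y):
    """Least squares line y = a + b x; returns (a, b, R squared)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    b = sxy / sxx
    return ybar - b * xbar, b, sxy ** 2 / (sxx * syy)


y = [26, 53, 50, 34, 64, 60, 42, 73, 71]
k = 3

lagged = y[:-k]       # Y(t-3): [26, 53, 50, 34, 64, 60]
current = y[k:]       # Y(t):   [34, 64, 60, 42, 73, 71]
a, b, r2 = fit_line(lagged, current)

# Forecasts for T = 10, 11, 12 use the last k observations Y7, Y8, Y9.
forecasts = [a + b * prev for prev in y[-k:]]

print(round(a, 2), round(b, 2), round(r2, 3))
print([round(f, 2) for f in forecasts])
```

The fitted coefficients round to 6.37 and 1.07. With unrounded coefficients the forecasts come out at roughly 51.1, 84.1 and 82.0, slightly below the figures obtained by plugging the rounded values 6.37 and 1.07 into the fitted line.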
Examples – Exercises
1. Oil usage in a small farm is given in the table. Analyse the series fully and predict the
usage for the first and second quarters of 2001
        Quarter
Year    1     2     3     4
1997    125   96    72    119
1998    137   113   88    131
1999    117   118   94    142
2000    162   155   162   176
2. “HOPE” is a travel company which organizes package holidays that are sold
through a number of travel agents. It decides to offer the travel agents a bonus if
they can increase the number of holidays sold by 10% or more. The number of “HOPE”
holidays sold by Ajay, a travel agent, is shown in the table.
        January–April   May–August   September–December
2007    145             98           121
2008    123             85           101
2009    118             76           74
(a) Calculate values of a suitable moving average.
(b) Plot the moving averages on the graph on page 6 and draw a trend line.
(c) (i) Estimate the seasonal effect for January–April, and hence forecast the number of holidays
Ajay will sell during January–April 2010 if current trends continue.
(ii) Hence calculate how many holidays Ajay needs to sell during January–April 2010 to exceed
current trends by at least 10%. (2 marks)
(d) Ajay argues that, if he sells 82 or more holidays during January–April 2010, he will have
exceeded the September–December 2009 sales by more than 10% and so should qualify for a
bonus. The company argues that, in order to qualify for a bonus, he will need to sell 130 or
more holidays, as he sold 118 during January–April 2009.
Suggest a suitable value for the number of holidays Ajay will need to sell during
January–April 2010 in order to qualify for a bonus. Explain why your value is fairer than either
of the values suggested by Ajay and “HOPE”.
Revision for the Final Exam