
QUANTITATIVE METHODS - GBS 541

THE COPPERBELT UNIVERSITY
SCHOOL OF BUSINESS
QUANTITATIVE METHODS
EXTERNAL PROGRAMME
TAILOKA FRANK P (PROF)
CONTENTS
1. Introduction
2. Methods of Organizing and Presenting Data
   • Frequency Tables
   • Bar Charts
   • Pie Charts
3. Descriptive Measures
   • Measures of Central Tendency
   • Measures of Variability
4. Probability
   • Probability Experiment
   • Nature of Probability
   • Addition Rules
   • Conditional Probability and Independence
   • Multiplication Rules
   • Bayes’ Theorem
5. Probability Distributions
   • Binomial
   • Poisson
   • Normal
6. Sampling and Sampling Distribution
   • Distribution of the Sample Mean
   • Distribution of the Proportion
   • Distribution of Sums
7. Estimation
   • Point Estimates
   • Interval Estimates
   • The t-distribution
8. Hypothesis Testing
   • Type I and II Errors
   • Hypothesis Tests
   • Application
9. Analysis of Variance
   • The F-distribution
   • Tests under Analysis of Variance
   • Application
10. Time Series
   • Components of Time Series
   • Isolating Time Series Components
   • Application
11. Index Numbers
   • Construction of Index Numbers
   • Uses of Index Numbers
12. Assignments
CHAPTER 1
INTRODUCTION TO STATISTICAL ANALYSIS
Reading
Newbold 1.1, 1.3, parts of 1.2.
Anderson, Sweeney, and Williams Chapter 1
Wonnacott and Wonnacott Chapter 1
James T McClave, P. George Benson Chapter 1
Introductory Comments
This Chapter sets the framework for the book. Read it carefully, because the ideas
introduced here are the basis of this subject and of research methodology.
1. Random Sampling, Deductive and Inductive Statistics.
Random Sampling
Only in exceptional circumstances is it possible to consider every member of the
population. In most cases only a sample of the population can be considered, and
the results obtained from this sample must be generalized to apply to the
population.
In order that these generalizations should be accurate, the sample must be random;
that is, every possible sample has an equal chance of selection, and the choice of a
member of the sample must not be influenced by previous selections. This is simple
random sampling.
Example 1
Suppose that a population consists of six measurements: 1, 2, 3, 4, 5, and 7. List
all possible different samples of two measurements that could be selected from
the population. Give the probability associated with each sample in a random
sample of n = 2 measurements selected from the population.
Solution
All possible samples are listed below.
Sample   Measurements
1        1, 2
2        1, 3
3        1, 4
4        1, 5
5        1, 7
6        2, 3
7        2, 4
8        2, 5
9        2, 7
10       3, 4
11       3, 5
12       3, 7
13       4, 5
14       4, 7
15       5, 7
Now suppose that we draw a single sample of n = 2 measurements from the 15
possible samples of two measurements. The sample selected is called a random sample if
every sample had an equal probability (1/15) of being selected.
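The count of 15 samples and the probability 1/15 in Example 1 can be checked by enumeration. The Python sketch below is an added illustration, not part of the original notes, and the variable names are our own. It lists every sample of n = 2 drawn without replacement from the population and the probability that simple random sampling assigns to each.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5, 7]  # the six measurements in Example 1

# Every unordered sample of n = 2 distinct measurements
samples = list(combinations(population, 2))

# Under simple random sampling every sample is equally likely
prob_each = Fraction(1, len(samples))

for number, sample in enumerate(samples, start=1):
    print(number, sample, prob_each)
```

Replacing 2 by another sample size in combinations(population, 2) shows how quickly the number of possible samples grows.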
It is rather unlikely that we would ever achieve a truly random sample, because the
probabilities of selection will not always be exactly equal. But we do the best we can.
One of the simplest and most reliable ways to select a random sample of n measurements
from a population is to use a table of random numbers (see Appendix vii). Random
number tables are constructed in such a way that, no matter where you start in the table
and no matter what direction you move, the digits occur randomly and with equal probability.
Thus, if we wished to choose a random sample of n = 10 measurements from a population
containing 100 measurements, we could label the measurements in the population from
0 to 99 (or 1 to 100). Then, referring to Appendix vii and choosing a random starting
point, the next 10 two-digit numbers going across the page would indicate the labels of
the particular measurements to be included in the random sample. Similarly, by
moving up or down the page, we would also obtain a random sample.
Example 2
A small community consists of 850 families. We wish to obtain a random sample of 20
families to ascertain public acceptance of a wage and price freeze. Refer to Appendix vii
to determine which families should be sampled.
Solution
Assuming that a list of all families in the community is available (such as a telephone
directory), we could label the families from 0 to 849 (or, equivalently, from 1 to 850).
Then, referring to the Appendix, we choose a starting point. Suppose we have decided to
start at line 1, column 4. Going down the page, we choose the first 20 three-digit
numbers between 000 and 849 from Table B. We have
290  207  424  367  219
065  302  541  078  466
454  607  083  642  219
254  068  462  160  823
These 20 numbers identify the families that are to be included in our sample. Note that
219 appears twice; because no family should be selected twice, a repeated number is
in practice skipped and the next eligible number taken instead.
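A pseudo-random number generator can play the role of the printed table. The Python sketch below is a hypothetical illustration, not the procedure prescribed in the notes: Python's random module stands in for Appendix vii, and the function name sample_families and the seed 2024 are our own choices. It reads three-digit numbers, discarding any above 849 and any repeat, until 20 distinct family labels have been collected.

```python
import random

def sample_families(num_families=850, sample_size=20, seed=None):
    """Mimic reading three-digit random numbers from a table.

    Families are labelled 000 to num_families - 1. Numbers outside that
    range are skipped, and so are repeats, so no family is chosen twice.
    """
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, 999)  # one three-digit entry, 000 to 999
        if number < num_families and number not in chosen:
            chosen.append(number)
    return chosen

labels = sample_families(seed=2024)
print([f"{label:03d}" for label in labels])
```

Changing the seed gives a different, equally valid random sample; fixing it makes the draw reproducible.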
Deductive and Inductive Statistics.
The reasoning that is used in statistics hinges on understanding two types of logic,
namely deductive and inductive logic. The type of logic that reasons from the particular
(the sample) to the general (the population) is known as inductive logic, while the type that
reasons from the general to the particular is known as deductive logic.
Learning Objectives
After working through this chapter, you should be able to:
• Explain what random sampling is
• Explain the difference between a population and a sample
CHAPTER 2
METHODS OF ORGANISING AND PRESENTING DATA
Reading
Newbold Chapter 2
James T Mc Clave and P George Benson Chapter 2
Tailoka Frank P Chapter 3
Introductory Comments
This Chapter deals with the understanding of data. We construct graphical
representations of the data, which allow one to see its most important
characteristics easily. Most of the graphical representations are very tedious to construct without
the use of a computer. However, one understands much more if one tries a few with
pencil and paper.
Graphical Representations of Data
Types of business data; methods of representation of qualitative data; cumulative
frequency distribution.
Types of business data. Although the number of business phenomena that can be
measured is almost limitless, business data can generally be classified as one of two
types: quantitative or qualitative.
Quantitative data are observations that are measured on a numerical scale. Examples of
quantitative business data are:
i. The monthly unemployment percentage
ii. Last year’s sales for selected firms
iii. The number of women executives in an industry
Qualitative data are data that are not measurable, in the sense that height is measured, or
countable, in the sense that people entering a store are counted. Many characteristics can
only be classified into categories. Examples of qualitative business data are:
i) The political party affiliations of fifty randomly selected business executives.
Each executive would have one and only one political party affiliation.
ii) The brand of petrol last purchased by seventy-four randomly selected car
owners. Again, each measurement would fall into one and only one category.
Notice that each of these examples has nonnumerical, or qualitative, measurements.
Graphical methods for describing qualitative data.
(a) The Bar Graph
For example, suppose a women’s clothing store located in the downtown area of a
large city wants to open a branch in the suburbs. To obtain some information
about the geographical distribution of its present customers, the store manager
conducts a survey in which each customer is asked to identify her place of
residence with regard to the city’s four quadrants: Northwest (NW), Northeast
(NE), Southwest (SW), or Southeast (SE). Out-of-town customers are excluded
from the survey. The responses of n = 30 randomly selected resident customers
might appear as in Table 1.1 (note that the symbol n is used here and throughout
this course to represent the sample size, i.e. the number of measurements in a
sample). You can see that each of the thirty measurements falls in one and only
one of the four possible categories representing the four quadrants of the city.
Table 1.1. Customer residence survey: n = 30
Customer  Residence    Customer  Residence    Customer  Residence
1         NW           11        NW           21        NE
2         SE           12        SE           22        NW
3         SE           13        SW           23        SW
4         NW           14        NW           24        SE
5         SW           15        SW           25        SW
6         NW           16        NE           26        NW
7         NE           17        NE           27        NW
8         SW           18        NW           28        SE
9         NW           19        NW           29        NE
10        SE           20        SW           30        SW
A natural and useful technique for summarizing qualitative data is to tabulate the
frequency or relative frequency of each category.
Definition:
The frequency for a category is the total number of measurements that fall in the
category. The frequency for a particular category, say category i, will be denoted by the
symbol $f_i$.
The relative frequency for a category is the frequency of that category divided by the
total number of measurements; that is, the relative frequency for category i is
$\text{Relative frequency} = \frac{f_i}{n}$
where n = total number of measurements in the sample and
$f_i$ = frequency for the ith category.
The frequency for a category is the total number of measurements in that category,
whereas the relative frequency for a category is the proportion of measurements in the
category. Table 1.2 shows the frequency and relative frequency for the customer
residences listed in Table 1.1. Note that the sum of the frequencies should always equal
the total number of measurements in the sample and the sum of the relative frequencies
should always equal 1 (except for rounding errors) as in Table 1.2.
Table 1.2 Frequencies and relative frequencies of customer residences
Category  Frequency  Relative Frequency
NE        5          5/30 = .167
NW        11         11/30 = .367
SE        6          6/30 = .200
SW        8          8/30 = .267
Total     30         1.00
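The tabulation in Table 1.2 can be reproduced with a few lines of Python. This is an illustrative sketch (not from the notes) that uses collections.Counter from the standard library to obtain each $f_i$ and then divides by n.

```python
from collections import Counter

# Residences of the 30 customers in Table 1.1, in customer order 1 to 30
residences = [
    "NW", "SE", "SE", "NW", "SW", "NW", "NE", "SW", "NW", "SE",
    "NW", "SE", "SW", "NW", "SW", "NE", "NE", "NW", "NW", "SW",
    "NE", "NW", "SW", "SE", "SW", "NW", "NW", "SE", "NE", "SW",
]

n = len(residences)
frequency = Counter(residences)  # f_i for each category

for category in sorted(frequency):
    f_i = frequency[category]
    print(f"{category}  {f_i:2d}  {f_i / n:.3f}")  # frequency and relative frequency
```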
A common means of graphically presenting the frequencies or relative frequencies for
qualitative data is the bar chart. For this type of chart, the frequencies (or relative
frequencies) are represented by bars, one bar for each category.
The height of the bar for a given category is proportional to the category frequency (or
relative frequency). Usually the bars are placed in a vertical position with the base of the
bar on the horizontal axis of the graph. The order of the bars on the horizontal axis is
unimportant. Both a frequency bar chart and a relative frequency bar chart for the
customer residence example are shown in Figure 1.1.
Figure 1.1 Customer residence data: (a) a frequency bar chart; (b) a relative frequency bar chart. Horizontal axis: residential quadrant (NE, NW, SE, SW); vertical axis: frequency in (a), relative frequency in (b).
The Pie Chart
The second method of describing qualitative data sets is the pie chart. This is
often used in newspaper and magazine articles to depict budgets and other
economic information. A complete circle (the pie) represents the total number of
measurements. This is partitioned into a number of slices, with one slice for each
category. For example, since a complete circle spans 360°, if the relative
frequency for a category is .30, the slice assigned to that category is 30% of 360°,
or (.30)(360°) = 108°.
Figure 1.2 The portion of a pie chart corresponding to a relative frequency of .30 (a slice of 108°).
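Applying the same rule to the customer residence data of Table 1.2 gives the angle of each slice. In this illustrative Python sketch (not part of the notes), each angle is computed as frequency × 360 / n, which keeps the arithmetic exact for these counts.

```python
# Category frequencies from Table 1.2
counts = {"NE": 5, "NW": 11, "SE": 6, "SW": 8}
n = sum(counts.values())

# Each slice spans (relative frequency) x 360 degrees
angles = {category: count * 360 / n for category, count in counts.items()}

for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
```

The four angles (60°, 132°, 72° and 96°) add up to the full 360°.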
Graphical Methods for Describing Quantitative Data.
The Frequency Histogram and Polygon.
The histogram (often called a frequency distribution) is the most popular graphical
technique for depicting quantitative data. To introduce the histogram we will use thirty
companies selected randomly from the 1980 Financial Magazine (the top 500 companies
in sales for calendar year 1979). The variable X we will be interested in is the earnings
per share (E/S) for these thirty companies. The earnings per share is computed by
dividing the year’s net profit by the total number of shares of common stock outstanding.
This figure is of interest to the economic community because it reflects the economic
health of the company.
The earnings per share figures for the thirty companies are shown (to the nearest ngwee)
in Table 1.3.
Table 1.3 Earnings per share (E/S) for thirty companies
Company  E/S     Company  E/S     Company  E/S
1        1.85    11       2.80    21       2.75
2        3.42    12       3.46    22       6.58
3        9.11    13       8.32    23       3.54
4        1.96    14       4.62    24       4.65
5        6.48    15       3.27    25       0.75
6        5.72    16       1.35    26       2.01
7        1.72    17       3.28    27       5.36
8        8.56    18       3.75    28       4.40
9        0.72    19       5.23    29       6.49
10       6.28    20       2.92    30       1.12
How to Construct a Histogram
1. Arrange the data in increasing order, from smallest to largest measurement.
2. Divide the interval from the smallest to the largest measurement into between five
   and twenty equal sub-intervals, making sure that:
   a) Each measurement falls into one and only one measurement class.
   b) No measurement falls on a measurement class boundary.
   Use a small number of measurement classes if you have a small amount of
   data; use a larger number of classes for a large amount of data.
3. Compute the frequency (or relative frequency) of measurements in each
   measurement class.
4. Using a vertical axis of about three-fourths the length of the horizontal axis, plot
   each frequency (or relative frequency) as a rectangle over the corresponding
   measurement class.
Because the number of measurements, n = 30, is not large, we will use six classes to
span the distance between the smallest measurement, 0.72, and the largest
measurement, 9.11. This distance divided by 6 is
$\frac{\text{Largest measurement} - \text{smallest measurement}}{\text{Number of intervals}} = \frac{9.11 - 0.72}{6} = \frac{8.39}{6} \approx 1.4$
By locating the lower boundary of the first class interval at 0.715 (slightly below the
smallest measurement) and adding 1.4, we find the upper boundary to be 2.115. Adding
1.4 again, we find the upper boundary of the second class to be 3.515. Continuing this
process, we obtain the six class intervals shown in the table below. Note that each
boundary falls on a 0.005 value (one decimal place more than the measurements), which
guarantees that no measurement will fall on a class boundary.
The next step is to find the class frequencies and calculate the class relative frequencies.
Class  Measurement Class  Class Frequency  Class Relative Frequency
1      0.715 – 2.115      8                8/30 = .267
2      2.115 – 3.515      7                7/30 = .233
3      3.515 – 4.915      5                5/30 = .167
4      4.915 – 6.315      4                4/30 = .133
5      6.315 – 7.715      3                3/30 = .100
6      7.715 – 9.115      3                3/30 = .100
Total                     30               1.00
Table 1.4
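The class counts in Table 1.4 can be verified mechanically. The sketch below is an illustration in Python, not part of the original notes; it assumes the data of Table 1.3 and the boundaries chosen above (first boundary 0.715, width 1.4, six classes) and counts the measurements lying strictly between each pair of boundaries.

```python
# Earnings per share for the 30 companies (Table 1.3)
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

lower = 0.715  # lower boundary of the first class
width = 1.4    # class width, (9.11 - 0.72) / 6 rounded to 1.4
k = 6          # number of classes

# Class boundaries 0.715, 2.115, ..., 9.115 (rounded to remove float noise)
boundaries = [round(lower + j * width, 3) for j in range(k + 1)]

# Boundaries end in 5 at the third decimal, so no two-decimal value equals one
frequencies = [
    sum(1 for x in eps if boundaries[j] < x < boundaries[j + 1])
    for j in range(k)
]

n = len(eps)
for j, f in enumerate(frequencies):
    print(f"{boundaries[j]:.3f} - {boundaries[j + 1]:.3f}  {f}  {f / n:.3f}")
```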
Definition
The class frequency for a given class, say class i, is equal to the total number of
measurements that fall in that class. The class frequency for class i is denoted by the
symbol $f_i$.
Definition
The class relative frequency for a given class, say class i, is equal to the class frequency
divided by the total number n of measurements, i.e.
$\text{Relative frequency for class } i = \frac{f_i}{n}$
Figure: (a) Frequency histogram and (b) relative frequency histogram of earnings per share. Horizontal axis: earnings per share, with class boundaries 0.715, 2.115, 3.515, 4.915, 6.315, 7.715 and 9.115.
Cumulative Frequency Distribution
It is often useful to know the number or the proportion of the total number of
measurements that are less than or equal to those contained in a particular class. These
quantities are called the class cumulative frequency and the class cumulative relative
frequency respectively.
For example, if the classes are numbered 1, 2, 3, 4, . . . from the smallest to the largest
values of x, then the cumulative frequency for the third class would equal the sum of the
class frequencies corresponding to classes 1, 2, and 3:
$\text{Cumulative frequency for class 3} = f_1 + f_2 + f_3$
Similarly,
$\text{Cumulative relative frequency for class 3} = \frac{f_1 + f_2 + f_3}{n}$
where n is the total number of measurements in the sample.
Cumulative frequencies and cumulative relative frequencies for earnings per share data.
Class  Measurement    Class      Cumulative     Class Relative  Cumulative Relative
No.    Class          Frequency  Frequency      Frequency       Frequency
1      0.715 – 2.115  8          8              8/30 = .267     8/30 = .267
2      2.115 – 3.515  7          (8 + 7) = 15   7/30 = .233     15/30 = .500
3      3.515 – 4.915  5          (15 + 5) = 20  5/30 = .167     20/30 = .667
4      4.915 – 6.315  4          (20 + 4) = 24  4/30 = .133     24/30 = .800
5      6.315 – 7.715  3          (24 + 3) = 27  3/30 = .100     27/30 = .900
6      7.715 – 9.115  3          (27 + 3) = 30  3/30 = .100     30/30 = 1.00
Total                 30
Figure: Cumulative relative frequency distribution for earnings per share data. Horizontal axis: earnings per share (class boundaries 0.715 to 9.115); vertical axis: cumulative relative frequency (0 to 1.0).
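The running totals in the cumulative table are simply partial sums of the class frequencies. A minimal Python sketch, added for illustration and assuming the class frequencies of Table 1.4, uses itertools.accumulate to produce them.

```python
from itertools import accumulate

frequencies = [8, 7, 5, 4, 3, 3]  # class frequencies from Table 1.4
n = sum(frequencies)

cumulative = list(accumulate(frequencies))         # running totals f_1 + ... + f_i
cumulative_relative = [c / n for c in cumulative]  # each running total divided by n

for i, (c, r) in enumerate(zip(cumulative, cumulative_relative), start=1):
    print(f"class {i}: {c:2d}  {r:.3f}")
```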
Learning Objectives
After working through this Chapter you should be able to:
• Draw a pie chart and a bar chart, and construct frequency tables, relative
frequencies and histograms.
• Interpret the diagrams. You will understand the importance of captions, axis
labels and graduation of axes.
CHAPTER 3
DESCRIPTIVE MEASURES
Reading
Newbold Chapter 2
Wonnacott and Wonnacott Chapter 2
Tailoka Frank P. Chapter 4
James T McClave, Lawrence L Lapin and P George Benson Chapter 3
Introductory Comments
This Chapter contains themes which allow one to see the most important
characteristics of data easily. The idea is to find simple numbers, such as the mean and the
variance, that summarize those characteristics.
3. Numerical Description of Data
The Mode: A Measure of Central Tendency
Definition.
The mode is the measurement that occurs with the greatest frequency in the data set.
Because it emphasizes data concentration, the mode has application in marketing
as well as in the description of large data sets collected by state and federal agencies.
Unless the data set is rather large, the mode may not be very meaningful. For
example, consider the earnings per share measurements for the thirty
companies we used in the previous chapter. If you were to re-examine these data,
you would find that none of the thirty measurements is duplicated in this sample.
Thus, strictly speaking, all thirty measurements are modes for this sample.
Obviously, this information is of no practical use for data description. We can
calculate a more meaningful mode by constructing a relative frequency histogram
for the data. The interval containing the most measurements is called the modal
class, and the mode is taken to be the midpoint of this class interval.
The modal class, the one corresponding to the interval 0.715 – 2.115, lies to the left side
of the distribution. The mode is the midpoint of this interval; that is,
$\text{Mode} = \frac{0.715 + 2.115}{2} = 1.415$
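The modal-class rule can be written as a short Python sketch (an illustration, not from the notes) using the boundaries and frequencies of Table 1.4. Note that max returns the first class if two classes tie for the largest frequency; in that case the data would have more than one modal class.

```python
# Class boundaries and frequencies from Table 1.4
boundaries = [0.715, 2.115, 3.515, 4.915, 6.315, 7.715, 9.115]
frequencies = [8, 7, 5, 4, 3, 3]

# Index of the class with the largest frequency (the modal class)
modal = max(range(len(frequencies)), key=lambda j: frequencies[j])

# The mode is taken to be the midpoint of the modal class
mode = (boundaries[modal] + boundaries[modal + 1]) / 2
print(f"modal class {boundaries[modal]} - {boundaries[modal + 1]}, mode = {mode:.3f}")
```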
In the sense that the mode measures data concentration, it provides a measure of central
tendency of the data.
The Arithmetic Mean: A Measure of Central Tendency
The most popular and best understood measure of central tendency for a quantitative
data set is the arithmetic mean (or simply the mean).
Definition
The mean of a set of quantitative data is equal to the sum of the measurements divided by
the number of measurements contained in the data set. The mean of a sample is denoted
by $\bar{x}$ (read “x bar”), and the formula for this calculation is
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Example 1
Calculate the mean of the following five sample measurements: 5, 3, 8, 5, 6.
Solution
Using the definition of the sample mean and the summation shorthand notation, we find
$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{5 + 3 + 8 + 5 + 6}{5} = \frac{27}{5} = 5.4$
The mean of this sample is 5.4.
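The definition of the sample mean translates directly into code. This Python function is an illustrative sketch; the name sample_mean is our own, and the standard library's statistics.mean would give the same result.

```python
def sample_mean(data):
    """Sum of the measurements divided by the number of measurements."""
    return sum(data) / len(data)

print(sample_mean([5, 3, 8, 5, 6]))  # the five measurements of Example 1
```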
The sample mean will play an important role in accomplishing our objective of making
inferences about populations based on sample information. For this reason it is important
to use a different symbol when we want to discuss the mean of a population of
measurements, i.e. the mean of the entire set of measurements in which we are interested.
We use the Greek letter µ (“mu”) for the population mean.
The Median: Another Measure of Central Tendency
The median of a data set is the number such that half the measurements fall below the
median and half fall above. The median is of most value in describing large data sets. If
the data set is characterized by a relative frequency histogram, the median is the point on
the x-axis such that half the area under the histogram lies above the median and half lies
below. For a small, or even a large but finite, number of measurements, there may be
many numbers that satisfy the property indicated in the figure. For this
reason, we will adopt the following (somewhat arbitrary) rule for calculating the median of a data set.
Calculating a Median
1. If the number n of measurements in a data set is odd, the median is the middle
number when the measurements are arranged in ascending (or descending) order.
2. If the number n of measurements is even, the median is the mean of the two
middle measurements when the measurements are arranged in ascending (or
descending) order.
Example 2
Consider the following sample of n = 7 measurements:
5, 7, 4, 5, 20, 6, 2
a) Calculate the median of this sample.
b) Eliminate the last measurement (the 2) and calculate the median of the remaining
n = 6 measurements.
Solution
a) The seven measurements in the sample are first arranged in ascending order:
2, 4, 5, 5, 6, 7, 20
Since the number of measurements is odd, the median is the middle measurement.
Thus, the median of this sample is 5.
b) After removing the 2 from the set of measurements, we arrange the sample
measurements in ascending order as follows:
4, 5, 5, 6, 7, 20
Now the number of measurements is even, and so we average the middle two
measurements. The median is (5 + 6)/2 = 5.5.
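The two rules for the median can be expressed as a small Python function. This is a sketch for illustration (the function name median is our own); it sorts a copy of the data, so the original order of the measurements is left unchanged.

```python
def median(data):
    """Middle value if n is odd; mean of the two middle values if n is even."""
    ordered = sorted(data)  # ascending order, original list untouched
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([5, 7, 4, 5, 20, 6, 2]))  # part (a)
print(median([5, 7, 4, 5, 20, 6]))     # part (b)
```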
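Comparing the Mean and the Median
1. If the median is less than the mean, the data set is skewed to the right.
   Figure: Rightward skewness. Relative frequency against measurement units, with the median lying to the left of the mean.
2. The median will equal the mean when the data set is symmetric.
   Figure: Symmetry. The median and the mean coincide.
3. If the median is greater than the mean, the data set is skewed to the left.
   Figure: Leftward skewness. The mean lies to the left of the median.
The amount of skewness can be measured by Pearson’s coefficients of skewness:
$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} \quad \text{or} \quad \text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$
For moderately skewed distributions the two forms give approximately the same value; a
positive value indicates skewness to the right and a negative value skewness to the left.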
Measures of Variation
The Range: A Measure of Variability
Definition:
The range of a data set is equal to the largest measurement minus the smallest measurement.
When dealing with grouped data, either of two procedures may be adopted for
determining the range:
1. Range = class mark of highest class − class mark of lowest class.
2. Range = upper class boundary of highest class − lower class boundary of lowest
class.
Variance and Standard Deviation
The sample variance for a sample of n measurements is equal to the sum of the squared distances
from the mean divided by (n − 1). In symbols, using $S^2$ to represent the sample variance,
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
The second step in finding a meaningful measure of data variability is to calculate the
standard deviation of the data set.
The sample standard deviation, S, is defined as the positive square root of the sample
variance, $S^2$; thus
$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
The corresponding quantity, the population standard deviation, measures the variability of
the measurements in the population and is denoted by σ (“sigma”). The population
variance will therefore be denoted by $\sigma^2$.
Example 3
Calculate the standard deviation of the following sample: 2, 3, 3, 3, 4.
Solution
For this set of data, $\bar{x} = 3$. Then
$S = \sqrt{\frac{(2 - 3)^2 + (3 - 3)^2 + (3 - 3)^2 + (3 - 3)^2 + (4 - 3)^2}{5 - 1}} = \sqrt{\frac{2}{4}} = \sqrt{0.5} = 0.71$
Shortcut Formula for Sample Variance
$S^2 = \frac{(\text{sum of squares of sample measurements}) - \frac{(\text{sum of sample measurements})^2}{n}}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$
Example 4
Use the shortcut formula to compute the variances of these two samples of five
measurements each.
Sample 1: 1, 2, 3, 4, 5
Sample 2: 2, 3, 3, 3, 4
Solution
We first work with sample 1. The quantities needed are:
$\sum_{i=1}^{5} x_i = 1 + 2 + 3 + 4 + 5 = 15$
and
$\sum_{i=1}^{5} x_i^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 1 + 4 + 9 + 16 + 25 = 55$
Then
$S^2 = \frac{\sum_{i=1}^{5} x_i^2 - \frac{\left(\sum_{i=1}^{5} x_i\right)^2}{5}}{5 - 1} = \frac{55 - \frac{(15)^2}{5}}{4} = \frac{55 - 45}{4} = \frac{10}{4} = 2.5$
Similarly, for sample 2 we get
$\sum_{i=1}^{5} x_i = 2 + 3 + 3 + 3 + 4 = 15$
and
$\sum_{i=1}^{5} x_i^2 = 2^2 + 3^2 + 3^2 + 3^2 + 4^2 = 4 + 9 + 9 + 9 + 16 = 47$
Then the variance for sample 2 is
$S^2 = \frac{\sum_{i=1}^{5} x_i^2 - \frac{\left(\sum_{i=1}^{5} x_i\right)^2}{5}}{5 - 1} = \frac{47 - \frac{(15)^2}{5}}{4} = \frac{47 - 45}{4} = \frac{2}{4} = 0.5$
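Both the definitional formula used in Example 3 and the shortcut formula used in Example 4 can be coded and compared. The Python sketch below is illustrative only (the function names are our own). On a computer the definitional form is usually preferred, because the shortcut formula can lose accuracy to rounding when the measurements are large relative to their spread; for small whole-number samples like these, the two agree exactly.

```python
def variance_definition(data):
    """S^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def variance_shortcut(data):
    """S^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    total = sum(data)
    total_sq = sum(x * x for x in data)
    return (total_sq - total ** 2 / n) / (n - 1)

sample_1 = [1, 2, 3, 4, 5]
sample_2 = [2, 3, 3, 3, 4]

for data in (sample_1, sample_2):
    print(variance_definition(data), variance_shortcut(data))
```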
Example 5
The earnings per share measurements for thirty companies selected randomly from the 1980
Financial/Daily Mail are listed here. Calculate the sample variance, $S^2$, and the standard
deviation, S, from these measurements.
1.85  3.42  9.11  1.96  6.48  5.72  1.72  8.56  0.72  6.28
2.80  3.46  8.32  4.62  3.27  1.35  3.28  3.75  5.23  2.92
2.75  6.58  3.54  4.65  0.75  2.01  5.36  4.40  6.49  1.12
Solution
The calculation of the sample variance, $S^2$, would be very tedious for this example if we
tried to use the formula
$S^2 = \frac{\sum_{i=1}^{30} (x_i - \bar{x})^2}{30 - 1}$
because it would be necessary to compute all thirty squared distances from the mean.
However, for the shortcut formula we need only compute
$\sum_{i=1}^{30} x_i = 1.85 + 3.42 + \dots + 1.12 = 122.47$
and
$\sum_{i=1}^{30} x_i^2 = (1.85)^2 + (3.42)^2 + \dots + (1.12)^2 = 657.5239$
Then
$S^2 = \frac{\sum_{i=1}^{30} x_i^2 - \frac{\left(\sum_{i=1}^{30} x_i\right)^2}{30}}{30 - 1} = \frac{657.5239 - \frac{(122.47)^2}{30}}{29} = \frac{657.5239 - 499.9634}{29} = 5.4331$
Notice that we retained four decimal places in the calculation of $S^2$ to reduce rounding
errors, even though the original data were accurate to only two decimal places.
The standard deviation is
$S = \sqrt{S^2} = \sqrt{5.4331} = 2.33$
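The shortcut calculation of Example 5 can be reproduced in Python. This sketch, added for illustration, assumes the thirty earnings per share values listed above and computes the same intermediate sums as the worked solution.

```python
import math

# Earnings per share for the thirty companies in Example 5
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

n = len(eps)
sum_x = sum(eps)                   # sum of the measurements
sum_x2 = sum(x * x for x in eps)   # sum of the squared measurements

s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)  # shortcut formula
s = math.sqrt(s2)

print(f"sum x = {sum_x:.2f}, sum x^2 = {sum_x2:.4f}")
print(f"S^2 = {s2:.4f}, S = {s:.2f}")
```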
Interpreting the Standard Deviation
If we are comparing the variability of two samples selected from a population, the
sample with the larger standard deviation is the more variable of the two. Thus, we know
how to interpret the standard deviation on a relative or comparative basis, but we have
not explained how it provides a measure of variability for a single sample.
One way to interpret the standard deviation as a measure of variability of a data set would
be to answer questions such as the following. How many measurements are within 1
standard deviation of the mean? How many measurements are within 2 standard
deviations of the mean? For a specific data set, we can answer these questions by counting
the number of measurements in each of the intervals. However, if we are interested in
obtaining a general answer to these questions, the problem is more difficult. There are
two guidelines to help answer the questions of how many measurements fall within 1, 2,
and 3 standard deviations of the mean. The first set, which applies to any sample, is
derived from a theorem proved by the Russian mathematician Chebyshev. The second
set, the Empirical Rule, is based on empirical evidence that has accumulated over time
and applies to samples that possess mound-shaped frequency distributions: those that are
approximately symmetric, with a clustering of measurements about the midpoint of the
distribution (the mean, median and mode should all be about the same), and that tail off
as we move away from the center of the histogram.
Aids to the Interpretation of a Standard Deviation
1. A rule (from Chebyshev’s theorem) that applies to any sample of measurements,
   regardless of the shape of the frequency distribution:
   a. It is possible that none of the measurements will fall within 1 standard
      deviation of the mean ($\bar{x} - S$ to $\bar{x} + S$).
   b. At least 3/4 of the measurements will fall within 2 standard deviations of the
      mean ($\bar{x} - 2S$ to $\bar{x} + 2S$).
   c. At least 8/9 of the measurements will fall within 3 standard deviations of
      the mean ($\bar{x} - 3S$ to $\bar{x} + 3S$).
2. A rule of thumb, called the Empirical Rule, that applies to samples with
   mound-shaped frequency distributions:
   a) Approximately 68% of the measurements will fall within 1 standard
      deviation of the mean ($\bar{x} - S$ to $\bar{x} + S$).
   b) Approximately 95% of the measurements will fall within 2 standard
      deviations of the mean ($\bar{x} - 2S$ to $\bar{x} + 2S$).
   c) Essentially all the measurements will fall within 3 standard deviations of
      the mean ($\bar{x} - 3S$ to $\bar{x} + 3S$).
Example 6
Refer to the data for earnings per share for thirty companies selected randomly from the
1980 Financial/Daily Mail, for which $\bar{x} = 4.08$ and $S = 2.33$. Calculate the fraction of the thirty
measurements that lie within the intervals $\bar{x} \pm S$, $\bar{x} \pm 2S$, and $\bar{x} \pm 3S$, and compare the
results with those of Chebyshev’s rule and the Empirical Rule.
Solution
$(\bar{x} - S, \bar{x} + S) = (4.08 - 2.33, 4.08 + 2.33) = (1.75, 6.41)$
A check of the measurements shows that 19 of the 30 measurements, i.e. approximately
63%, are within 1 standard deviation of the mean. The 2 standard deviation interval,
$(\bar{x} - 2S, \bar{x} + 2S) = (4.08 - 4.66, 4.08 + 4.66) = (-0.58, 8.74)$
contains 29 measurements, or approximately 97% of the n = 30 measurements. Finally,
the 3 standard deviation interval around $\bar{x}$,
$(\bar{x} - 3S, \bar{x} + 3S) = (4.08 - 6.99, 4.08 + 6.99) = (-2.91, 11.07)$
contains all the measurements. These 1, 2 and 3 standard deviation percentages (63, 97
and 100) agree fairly well with the approximations of 68%, 95% and 100% given by the
Empirical Rule for mound-shaped distributions.
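The counting in Example 6 can be automated. In this illustrative Python sketch (not from the notes), the mean and standard deviation are computed from the raw data without rounding (about 4.082 and 2.331), so the interval end points differ slightly from those above, but the counts are the same; Chebyshev’s lower bound $1 - 1/k^2$ is printed alongside each count for comparison.

```python
# Earnings per share for the thirty companies (Example 5)
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

n = len(eps)
mean = sum(eps) / n
s = (sum((x - mean) ** 2 for x in eps) / (n - 1)) ** 0.5  # sample standard deviation

counts = {}
for k in (1, 2, 3):
    low, high = mean - k * s, mean + k * s
    counts[k] = sum(1 for x in eps if low <= x <= high)
    chebyshev = 1 - 1 / k ** 2  # Chebyshev's guaranteed minimum (0 for k = 1)
    print(f"within {k} S: {counts[k]}/{n} = {counts[k] / n:.0%}, "
          f"Chebyshev guarantees at least {chebyshev:.0%}")
```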
Example 7
The aids for interpreting the value of a standard deviation can be put to an immediate
practical use as a check on the calculation of the standard deviation. Suppose you have a
data set for which the smallest measurement is 20 and the largest is 80. You have
calculated the standard deviation of the data set to be S = 190.
How can you use Chebyshev’s rule or the Empirical Rule to provide a rough check on your
calculated value of S?
Solution
The larger the number of measurements in a data set, the greater will be the tendency for
very large or very small measurements (extreme values) to appear in the data set. But
from the rules, you know that most of the measurements (approximately 95% if the
distribution is mound-shaped) will be within 2 standard deviations of the mean, and,
regardless of how many measurements are in the data set, almost all of them will fall within 3
standard deviations of the mean. Consequently, we would expect the range to be between
4 and 6 standard deviations, i.e. between 4S and 6S.
Range = largest measurement − smallest measurement = 80 − 20 = 60.
Figure: The relation between the range and the standard deviation (the interval from $\bar{x} - 2S$ to $\bar{x} + 2S$ spans a range of about 4S).
Then, if we let the range equal 6S, we obtain
$\text{Range} = 6S, \quad 60 = 6S, \quad S = 10$
Or, if we let the range equal 4S, we obtain a larger (and more conservative) value for S,
namely
$\text{Range} = 4S, \quad 60 = 4S, \quad S = 15$
Now you can see that it does not make much difference whether you let the range equal
4S (which is more realistic for most data sets) or 6S (which is reasonable for large data
sets). It is clear that your calculated value, S = 190, is too large, and you should check
your calculations.
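The rough check in Example 7 can be written as a small Python function. This is an illustrative sketch (the name rough_sd_range is our own) that assumes, as the solution does, that the range covers between about 4S and 6S. It is a screen for gross arithmetic errors rather than a formal test, since a small sample can legitimately give an S slightly outside these bounds.

```python
def rough_sd_range(smallest, largest):
    """Rough bounds for S, assuming the range covers about 4S to 6S."""
    data_range = largest - smallest
    return data_range / 6, data_range / 4  # (range/6, range/4)

low, high = rough_sd_range(20, 80)
calculated_s = 190

# A calculated S outside these rough bounds (190 is far outside) calls for a recheck
outside = not (low <= calculated_s <= high)
print(f"S should be roughly {low:g} to {high:g}; calculated S = {calculated_s}")
if outside:
    print("Calculated S is outside the rough bounds: recheck the calculation")
```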
Calculating a Mean and Standard Deviation from Grouped Data
If your data have been grouped in classes of equal width and arranged in a frequency
table, you can use the following formulas to calculate $\bar{x}$, $S^2$, and S, where
$x_i$ = midpoint of the ith class
$f_i$ = frequency of the ith class
K = number of classes
$n = \sum_{i=1}^{K} f_i$ = total number of measurements
$\bar{x} = \frac{\sum_{i=1}^{K} x_i f_i}{n}$
$S^2 = \frac{\sum_{i=1}^{K} x_i^2 f_i - \frac{\left(\sum_{i=1}^{K} x_i f_i\right)^2}{n}}{n - 1}$
$S = \sqrt{S^2}$
Example 8
Compute the mean and standard deviation for the earnings per share data using the
grouping shown in frequency Table 1.4.
Solution
The six class intervals, midpoints, and frequencies are shown in the accompanying table.
Class           Class Midpoint $x_i$   Class Frequency $f_i$
0.715 – 2.115   1.415                  8
2.115 – 3.515   2.815                  7
3.515 – 4.915   4.215                  5
4.915 – 6.315   5.615                  4
6.315 – 7.715   7.015                  3
7.715 – 9.115   8.415                  3
                                       $n = \sum f_i = 30$
$\bar{x} = \frac{\sum_{i=1}^{K} x_i f_i}{n} = \frac{(1.415)(8) + (2.815)(7) + (4.215)(5) + \dots + (8.415)(3)}{30} = \frac{120.85}{30} = 4.03$
$S^2 = \frac{\sum_{i=1}^{K} x_i^2 f_i - \frac{\left(\sum_{i=1}^{K} x_i f_i\right)^2}{n}}{n - 1}$
We found $\sum_{i=1}^{K} x_i f_i = 120.85$ when we calculated $\bar{x}$; therefore
$S^2 = \frac{\left[(1.415)^2(8) + (2.815)^2(7) + \dots + (8.415)^2(3)\right] - \frac{(120.85)^2}{30}}{30 - 1} = \frac{646.49875 - 486.82408}{29} = 5.5060$
$S = \sqrt{5.5060} = 2.35$
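The grouped-data calculation of Example 8 can be reproduced in Python. This sketch is an illustration, not part of the notes; it assumes the class midpoints and frequencies of the table above and applies the grouped formulas exactly as stated.

```python
import math

midpoints = [1.415, 2.815, 4.215, 5.615, 7.015, 8.415]  # x_i, class midpoints
frequencies = [8, 7, 5, 4, 3, 3]                         # f_i, class frequencies

n = sum(frequencies)
sum_xf = sum(x * f for x, f in zip(midpoints, frequencies))       # sum of x_i f_i
sum_x2f = sum(x * x * f for x, f in zip(midpoints, frequencies))  # sum of x_i^2 f_i

mean = sum_xf / n
s2 = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(f"mean = {mean:.2f}, S^2 = {s2:.4f}, S = {s:.2f}")
```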
You will notice that the values of $\bar{x}$, $S^2$, and S from the formulas for grouped data usually do
not agree with those obtained for the raw data ($\bar{x} = 4.08$ and $S = 2.33$). This is because
we have substituted the value of the class midpoint for each value of x in a class
interval. Only when every value of x in each class is equal to its respective class
midpoint will the formulas for grouped and for ungrouped data give exactly the same
answers for $\bar{x}$, $S^2$, and S. Otherwise, the formulas for grouped data will give only
approximations to these numerical descriptive measures.
Measures of Relative Standing
Descriptive measures of the relationship of a measurement to the rest of the data are
called measure of relative standing.
One measure of relative standing of a particular measurement is its percentile ranking.
Definition
Let x1 , x2 , . . . , xn be a set of n measurements arranged in increasing (or decreasing)
order. The pth percentile is a number x such that p% of the measurements fall below the
pth percentile and (100 – p)% fall above it.
For example, if oil company A reports that its yearly sales are in the 90th percentile of all companies in the industry, the implication is that 90% of all oil companies have yearly sales less than A's, and only 10% have yearly sales exceeding company A's.
[Figure: relative frequency distribution of yearly sales; a proportion .90 of companies lies below company A's sales and .10 lies above.]
Another measure of relative standing in popular use is the Z-score. The Z-score makes use of the mean and standard deviation of the data set in order to specify the location of a measurement.
Definition
The sample Z-score for a measurement x is
$$Z = \frac{x - \bar{x}}{S}$$
The population Z-score for a measurement x is
$$Z = \frac{x - \mu}{\sigma}$$
The Z-score represents the distance between a given measurement x and the mean, expressed in units of standard deviation.
Example 9
Suppose 200 steel workers are selected, and the annual income of each is determined. The mean and standard deviation are x̄ = K14,000 and S = K2,000. Suppose Chipo's annual income is K12,000. What is his sample Z-score?
[Figure: annual income of steel workers, marking x̄ − 3S = K8,000, Chipo's income x = K12,000, x̄ = K14,000 and x̄ + 3S = K20,000.]
Solution
Chipo's annual income lies below the mean income of the 200 steel workers. We compute
$$Z = \frac{x - \bar{x}}{S} = \frac{12000 - 14000}{2000} = -1.0$$
This tells us that Chipo's annual income is 1.0 standard deviation below the sample mean. In short, his sample Z-score is −1.0.
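As a quick check on Example 9, the sample Z-score can be computed directly from its definition. This is a minimal Python sketch. `z_score` is a hypothetical helper name, not part of any library.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Example 9: Chipo's income against the steel workers' sample
print(z_score(12000, 14000, 2000))
```

The same function applies unchanged to a population, with μ and σ passed in place of x̄ and S.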
Example 10
Suppose a female bank executive believes that her salary is low as a result of sex
discrimination. To try to substantiate her belief, she collects information on the salaries
of her counterparts in the banking business. She finds that their salaries have a mean of
K17,000 and a standard deviation of K1,000. Her salary is K13,500. Does this
information support her claim of sex discrimination?
Solution
The analysis might proceed as follows. First, we calculate the Z-score for the woman's salary with respect to those of her male counterparts:
$$Z = \frac{13500 - 17000}{1000} = -3.5$$
The implication is that the woman's salary is 3.5 standard deviations below the mean of the male distribution. Furthermore, suppose a check of the male salary data shows that the frequency distribution is mound-shaped. We can then infer that very few salaries in this distribution should have a Z-score less than −3, as shown in the figure that follows.
[Figure: Male Salary Distribution, relative frequency against salary (K). The mean is K17,000, and the woman's salary of K13,500 lies at Z-score = −3.5.]
Therefore, a Z-score of –3.5 represents either a measurement from a distribution different
from the male salary distribution or a very unusual (highly improbable) measurement for
the male salary distribution.
Well, which of the two situations do you think prevails? Do you think the woman's salary is simply an unusually low one in the distribution of salaries, or do you think her claim of salary discrimination is justified? Most people would probably conclude that her salary does not come from the male salary distribution.
However, the careful investigator should require more information before inferring sex discrimination to be the case. We would want to know more about the data collection technique the woman used, and more about her competence at her job. Other factors, such as length of employment, should perhaps also be considered in the analysis.
Learning Objectives
After working through this Chapter you should be able to
• Calculate the arithmetic mean, standard deviation, variance, median and quartiles for grouped or ungrouped data.
• Explain the use of each of the above measures.
Sample Examination Questions
1. (a) Briefly state, with reasons, the type of chart which would best convey the information for each of the following:
(i) Students at the University classified by programme of study.
(ii) Members of a professional association classified by age.
(iii) Numbers of cars taxed for 2002, 2003 and 2004 in areas A, B and C of a city.
(b) The weekly cost (K) of rented accommodation was recorded for 100 students living in an area.

Amount (K'000)    Frequency
0 – 4             3
5 – 9             17
10 – 14           24
15 – 19           31
20 – 24           19
25 – 29           6

(i) Draw a histogram.
(ii) Give the median and the interquartile range.
(iii) Calculate the mean, mode and standard deviation.
(iv) What conclusions can you draw from the data?
2. The data below are the per capita per week numbers of cigarettes sold for 38 states in a country.

19.20  26.82  19.24  27.18  25.96  30.14  29.27  21.10  28.91  29.92
29.64  21.94  22.58  29.92  26.91  43.40  30.18  23.86  28.56  24.75
24.32  24.78  22.17  20.96  27.38  24.44  26.89  41.46  21.08  23.57
15.80  32.10  24.44  29.04  31.34  29.60  23.12  17.08

(a) Plot the data using an appropriate graphical method.
(b) Give the mean, the median and the mode.
(c) Assume this is a normal distribution and that the standard deviation of these figures is 4.387. What proportion of the states would you expect to have more than 20 cigarettes smoked per capita per week?
(d) How does this compare with the actual situation as shown in the table above?
3. (a) Briefly state, with reasons, the type of chart which would best convey the information in each of the following:
(i) A country's total imports of cigarettes by source.
(ii) Students in higher education classified by age.
(iii) Number of students registered for secondary school in 2001, 2002 and 2003 for areas X, Y and Z of a country.
(b) The weekly cost (K'000) of rented accommodation was recorded for 40 students living in an area.

35  56  33  30  31  55  29  27  21  32
43  33  29  27  30  29  26  26  27  26
35  32  28  27  31  27  33  24  27  28
33  49  22  19  46  36  26  38  36  55

(i) Summarize the data in a frequency distribution table.
(ii) Calculate the mean and the standard deviation from your frequency table.
(iii) Plot a histogram for these data. What is the value of the median?
(iv) What conclusions can you draw from these data?
4. (a) Given below is a sample of 25 observations. Calculate:
(i) The range
(ii) The arithmetic mean
(iii) The median
(iv) The lower quartile
(v) The upper quartile
(vi) The quartile deviation
(vii) The mean deviation
(viii) The standard deviation

5   18  29  42  50  61
8   20  33  43  54  63
10  21  35  46  56  67
11  25  39  48  58  69
14

(b) Explain the term 'measure of dispersion' and state briefly the advantages and disadvantages of using the following measures of dispersion:
(i) Range
(ii) Mean deviation
(iii) Standard deviation
5. A machine produces the following numbers of rejects in successive five-minute periods.

20  84  16  26  27  55  58  25  42  42
58   7  55  57  13  40  40  43  73  28
15  41  22  27  24  28  67  66  66  37
21  28  32   7  34  29  19  29  23  27
30  26  11  17  24  17  26  21  35  12

(a) Construct a frequency distribution from these data, using seven class intervals of equal width.
(b) Using the frequency distribution, calculate:
(i) the mean
(ii) the standard deviation
(c) Briefly explain the meaning of your calculated measures.
CHAPTER 4
PROBABILITY
Reading
Newbold Chapter 3
Tailoka Frank P Chapter 8
Wonnacott and Wonnacott Chapter 3
Introductory Comments
Probability is more abstract than other parts of this subject, and solving the problems may
be difficult. The concepts are very important for statistics because it is the rules of
probability that allow one to reason about uncertainty. Independence and conditional
probability are important to understand clearly for the purpose of statistical investigation.
4.
Elementary Probability
Counting Techniques. Introduction of the probability concept. The event and the
event relationships. Probability trees, conditional probability and statistical
independence.
Counting techniques: In calculating probabilities, it is essential to be able to work out n(S), the number of outcomes in the sample space, and n(E), the number of outcomes in the event, as straightforwardly as possible. Permutations and combinations are very helpful here. We begin with the following basic principle.
Fundamental principle of counting. Suppose two operations A and B are carried out, with m different ways of carrying out A and k different ways of carrying out B. Then the combined operation A and B may be carried out in m × k different ways.
Example 1
Suppose a license plate contains two distinct letters followed by three digits, with the first digit not zero. How many different license plates can be printed?
The choices are as follows:
• first letter: 26 ways;
• second letter: 25 ways (the letter printed first cannot be chosen again);
• first digit: 9 ways;
• each of the other two digits: 10 ways.
Hence
26 × 25 × 9 × 10 × 10 = 585,000
different plates can be printed.
Example 2.
A toy manufacturer makes a wooden toy in two parts, the top part may be coloured red,
white or blue and the bottom part brown, orange, yellow or green. How many differently
coloured toys can be produced?
A red top part may be combined with a bottom part of any of the four possible colours.
Similarly, either a white or a blue top part may be combined with each of the four differently coloured bottom parts. Hence the number of different coloured toys is
3 × 4 = 12
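The fundamental principle of counting can be checked by brute force on Example 2. Listing every (top, bottom) pair with `itertools.product` should give m × k toys. This is a minimal Python sketch. The colour lists come from the example, and the license-plate count from Example 1 is computed by the same multiplication.

```python
from itertools import product

tops = ["red", "white", "blue"]
bottoms = ["brown", "orange", "yellow", "green"]

# Each (top, bottom) pair is one differently coloured toy
toys = list(product(tops, bottoms))
print(len(toys))

# Example 1: two distinct letters, then three digits with the first digit non-zero
plates = 26 * 25 * 9 * 10 * 10
print(plates)
```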
Permutations: An arrangement of a set of n objects in a given order is called a permutation of the objects (taken all at a time). An arrangement of any r ≤ n of these objects in a given order is called an r-permutation, or a permutation of the objects taken r at a time.
Example 3
Consider the set of letters a, b, c and d. Then
i)
bdca, dcba and acdb are permutations of the 4 letters (taken all at a time).
ii)
bad, adb and bca are permutations of 4 letters taken 3 at a time.
iii)
ad, ca, da and bd are permutations of the 4 letters taken 2 at a time.
Example 4
The telephone switchboard in the company requires two operators whose chairs (positions) are side by side. When the telephone operators go to lunch, two of the four secretaries take their places. If we make a distinction between the two operators' positions, in how many ways can the four secretaries fill them?
We can answer this question by determining the number of possible permutations of 4 things taken 2 at a time. There are 4 secretaries, A, B, C and D, to fill the first position. Once this position has been filled, there are only 3 secretaries to fill the second position.
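The tree diagram below counts the possibilities.
[Figure: tree diagram counting the permutations. Each of A, B, C and D in the first position branches to the three remaining secretaries in the second position. This gives 12 end points: AB, AC, AD, BA, BC, BD, CA, CB, CD, DA, DB, DC.]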
The tree diagram illustrates that there are 4 × 3 = 12 possible permutations of four things taken two at a time. Suppose that n is the number of distinct objects from which an ordered arrangement is to be derived, and r is the number of objects in the arrangement. The number of possible ordered arrangements is the number of permutations of n things taken r at a time. This is written symbolically as P(n, r), or nPr, where
$$P(n, r) = n(n-1)(n-2)\cdots(n-r+1) \qquad (1)$$
We multiply the right-hand side of equation (1) by (n − r)!/(n − r)!. This is equivalent to multiplying by 1, and we obtain
$$P(n, r) = n(n-1)(n-2)\cdots(n-r+1)\cdot\frac{(n-r)!}{(n-r)!} = \frac{n(n-1)(n-2)\cdots(n-r+1)(n-r)!}{(n-r)!} = \frac{n!}{(n-r)!}$$
Example 5
i) In a stock room, 5 adjacent bins are available for storing 5 different items. The stock of each item can be stored satisfactorily in any bin. In how many ways can we assign the 5 items to the 5 bins?
We get the answer by evaluating P(5, 5). Recalling that 0! = 1, this is
$$P(5, 5) = \frac{5!}{(5-5)!} = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$$
ii) Suppose that there are 6 different parts to be stocked, but only 4 bins are available.
To find the number of possible arrangements, we need to determine the number of permutations of 6 things taken 4 at a time, which is
$$P(6, 4) = \frac{6!}{(6-4)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{2!} = 360$$
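The permutation formula P(n, r) = n!/(n − r)! can be cross-checked against Python's standard library. This sketch assumes Python 3.8 or later (for `math.perm`). `n_perm` is a hypothetical helper written out to mirror the formula. The brute-force listing repeats the secretaries count from Example 4.

```python
import math
from itertools import permutations

def n_perm(n, r):
    """P(n, r) = n! / (n - r)!, the number of ordered arrangements of r of n objects."""
    return math.factorial(n) // math.factorial(n - r)

print(n_perm(5, 5), n_perm(6, 4))            # Example 5 (i) and (ii)
assert n_perm(6, 4) == math.perm(6, 4)        # agrees with the library function

# Example 4: list the ordered pairs of secretaries explicitly
print(len(list(permutations("ABCD", 2))))
```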
Example 6
How many permutations are there of 3 objects, say a, b and c?
There are
$$P(3, 3) = \frac{3!}{(3-3)!} = 3! = 1 \cdot 2 \cdot 3 = 6$$
such permutations. These are abc, acb, bac, bca, cab, cba.
Permutations with repetitions:
Suppose n objects include n1 alike of one kind, n2 alike of another kind, ..., and nr alike of a further kind. The number of permutations of these objects is
$$\frac{n!}{n_1!\,n_2!\cdots n_r!}$$
where n = n1 + n2 + ... + nr.
Example 7
Find the number of permutations of the word "ACCOUNTANTS".
The total number of letters in "ACCOUNTANTS" is 11, of which there are two A's, two C's, two N's and two T's. So the required number of permutations is
$$\frac{11!}{2!\,2!\,2!\,2!} = 2494800.$$
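The formula for permutations with repetitions can be applied mechanically by counting how often each letter occurs. This is a minimal sketch using `collections.Counter` from the standard library. `distinct_arrangements` is a hypothetical name.

```python
import math
from collections import Counter

def distinct_arrangements(word):
    """n! / (n1! n2! ... nr!), where ni is the number of times each letter repeats."""
    result = math.factorial(len(word))
    for count in Counter(word).values():
        result //= math.factorial(count)   # each partial quotient is an exact integer
    return result

print(distinct_arrangements("ACCOUNTANTS"))
```

Letters that occur once contribute 1! = 1, so they do not change the result.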
Combinations
A combination is an arrangement of objects without regard to order.
Example 8
The combinations of the letters a, b, c, d taken 3 at a time are
{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, or simply abc, abd, acd, bcd. Observe that the following arrangements all represent the same combination:
abc, acb, bac, bca, cab, cba.
That is, each denotes the same set {a, b, c}.
The number of combinations of n objects taken r at a time will be denoted by C(n, r) or nCr.
Example 9
We determine the number of combinations of the four letters a, b, c, d taken 3 at a time. Note that each combination consisting of three letters determines 3! = 6 permutations of the letters in the combination.

Combinations    Permutations
abc             abc, acb, bac, bca, cab, cba
abd             abd, adb, bad, bda, dab, dba
acd             acd, adc, cad, cda, dac, dca
bcd             bcd, bdc, cbd, cdb, dbc, dcb
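Thus the number of combinations multiplied by 3! equals the number of permutations:
$$C(4, 3) \cdot 3! = P(4, 3) \quad\text{or}\quad C(4, 3) = \frac{P(4, 3)}{3!}$$
Now P(4, 3) = 4 · 3 · 2 = 24 and 3! = 6; hence C(4, 3) = 4, as noted above. In general, each combination of r objects gives r! permutations, so C(n, r) · r! = P(n, r). Thus
$$C(n, r) = \frac{n!}{r!\,(n-r)!}$$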
Example 10
A perfume manufacturer who makes 10 fragrances wants to prepare a gift package containing 6 fragrances. How many combinations of fragrances are available?
The answer is
$$C(10, 6) = \frac{10!}{6!\,(10-6)!} = \frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6! \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 210$$
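The relationship C(n, r) · r! = P(n, r) and the two combination examples can be verified with the standard library. This sketch assumes Python 3.8 or later for `math.comb` and `math.perm`.

```python
import math
from itertools import combinations

# Example 9: combinations of a, b, c, d taken 3 at a time (order ignored)
combos = ["".join(c) for c in combinations("abcd", 3)]
print(combos)

# Each combination of r objects gives r! permutations: C(n, r) * r! = P(n, r)
assert math.comb(4, 3) * math.factorial(3) == math.perm(4, 3)

# Example 10: gift packages of 6 fragrances chosen from 10
print(math.comb(10, 6))
```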
Tree Diagrams
A tree diagram is a device used to enumerate all the possible outcomes of a sequence of
experiments where each experiment can occur in a finite number of ways. The
construction of tree diagrams is illustrated in the following examples.
Example 11
Find the product A × B × C where A = {1, 2}, B = {a, b, c} and C = {3, 4}. The tree diagram follows:
[Figure: tree diagram branching from the start to 1 and 2, then to a, b and c, then to 3 and 4. Its 12 end points are (1, a, 3), (1, a, 4), (1, b, 3), (1, b, 4), (1, c, 3), (1, c, 4), (2, a, 3), (2, a, 4), (2, b, 3), (2, b, 4), (2, c, 3), (2, c, 4).]
Observe that the tree is constructed from left to right. The number of branches at each point corresponds to the number of possible outcomes of the next experiment.
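The tree diagram for A × B × C enumerates exactly what `itertools.product` generates: one branch per element at each stage, read from left to right. A minimal Python sketch:

```python
from itertools import product

A = [1, 2]
B = ["a", "b", "c"]
C = [3, 4]

# product() walks the tree left to right, like the branches of the diagram
outcomes = list(product(A, B, C))
print(len(outcomes))
for triple in outcomes:
    print(triple)
```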
Example 12
Mumba and Ened are to play a tennis tournament. The first person to win two games in a row or to win a total of three games wins the tournament. The following diagram shows the possible outcomes of the tournament.
[Figure: tree diagram of the tournament; each branch is labelled M (Mumba wins the game) or E (Ened wins the game).]
Observe that there are 10 end points, which correspond to the 10 possible outcomes of the tournament:
MM, MEMM, MEMEM, MEMEE, MEE, EMM, EMEMM, EMEME, EMEE, EE
The path from the beginning of the tree to an end point indicates who won which game in the individual tournament.
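The tournament tree can also be generated recursively: keep adding a game won by M or E until someone has won two in a row or three in total. This is a sketch under the stopping rule stated in the example. `tournament_outcomes` is a hypothetical helper.

```python
def tournament_outcomes(seq=""):
    """List all game sequences until one player wins two in a row or three in total."""
    if seq:
        last = seq[-1]
        two_in_row = len(seq) >= 2 and seq[-2] == last
        # Only the winner of the latest game can have just reached three wins
        if two_in_row or seq.count(last) == 3:
            return [seq]
    results = []
    for winner in "ME":   # branch on who wins the next game
        results.extend(tournament_outcomes(seq + winner))
    return results

outcomes = tournament_outcomes()
print(len(outcomes))
print(outcomes)
```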
Basics of Probability
Given a sample space S, we need to assign a number, called the probability of the event, to each event that can be obtained from S. This number will indicate the relative likelihood of the various events.
For events that are equally likely, the probability of an event can be found from the following basic probability principle. Suppose a sample space S contains n equally likely outcomes, and an event E contains m of these outcomes. Then the probability that event E occurs, written P(E), is
$$P(E) = \frac{m}{n} \qquad (1)$$
This same result can also be given in terms of the cardinal number of a set, where n(E) represents the number of elements in a finite set E. With the same assumptions given above,
$$P(E) = \frac{n(E)}{n(S)} \qquad (2)$$
Example 1
Suppose a fair coin is tossed twice. The sample space is S = {HH, HT, TH, TT}. Set S contains 4 outcomes, all of which are equally likely. (This makes n = 4 in formula (1) above.) Find the probability of each of the following events.
a) One head and one tail: E = {HT, TH}
Event E contains two elements, so
P(E) = 2/4 = 1/2
By this result, exactly one head and one tail will show up 1/2 of the time when a fair coin is tossed twice.
b) Two heads
Let F = {HH} be the event "two heads are observed when a fair coin is tossed twice". Event F contains one element, so
P(F) = 1/4
c) Three heads
A fair coin tossed twice can never show three heads. If G is this event, then G = ∅, and P(G) = 0/4 = 0.
The event is impossible.
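For equally likely outcomes, P(E) = n(E)/n(S) amounts to counting. This is a minimal Python sketch for the two-toss sample space. It uses `fractions.Fraction` so that the answers stay exact.

```python
from fractions import Fraction
from itertools import product

# Equally likely sample space for two tosses of a fair coin
S = ["".join(t) for t in product("HT", repeat=2)]   # ['HH', 'HT', 'TH', 'TT']

def prob(event):
    """P(E) = n(E) / n(S) for equally likely outcomes."""
    return Fraction(len(event), len(S))

E = [s for s in S if s.count("H") == 1]   # one head and one tail
F = [s for s in S if s.count("H") == 2]   # two heads
G = [s for s in S if s.count("H") == 3]   # three heads: impossible, so empty
print(prob(E), prob(F), prob(G))
```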
Example 2
If a single playing card is drawn at random from an ordinary 52-card bridge deck, find the probability of each of the following events.
a) An ace is drawn
There are four aces in the deck, out of 52 cards, so
P(ace) = 4/52 = 1/13
b) A face card is drawn
Since there are 12 face cards,
P(face card) = 12/52 = 3/13
c) A spade is drawn
The deck contains 13 spades, so
P(spade) = 13/52 = 1/4
d) A spade or a heart is drawn
Besides the 13 spades, the deck contains 13 hearts, so
P(spade or heart) = 26/52 = 1/2
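The card probabilities can be checked by building the 52-card deck explicitly and counting. In this sketch the rank and suit labels are illustrative choices, and the helper `prob` takes a condition on a (rank, suit) pair.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))          # 52 equally likely (rank, suit) cards

def prob(condition):
    """Fraction of the deck satisfying condition(card)."""
    return Fraction(sum(1 for card in deck if condition(card)), len(deck))

p_ace = prob(lambda c: c[0] == "A")
p_face = prob(lambda c: c[0] in ("J", "Q", "K"))
p_spade = prob(lambda c: c[1] == "spades")
p_spade_or_heart = prob(lambda c: c[1] in ("spades", "hearts"))
print(p_ace, p_face, p_spade, p_spade_or_heart)
```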
Example 3
The manager of a department store has decided to make a study of the size of purchases made by people coming into the store. To begin, he chooses a day that seems fairly typical and gathers the following data. (Purchases have been rounded to the nearest kwacha, and sales tax is ignored.)

Amount of purchase           Number of customers    Probability (relative frequency)
K0 and under K2250           160                    0.280
K2250 and under K11250       84                     0.147
K11250 and under K13500      50                     0.088
K13500 and under K20250      136                    0.239
K20250 and under K22500      77                     0.135
K22500 and over              63                     0.111
Total                        570                    1.000
Probability Distributions.
In Example 3 the outcomes were various purchase amounts, and a probability was assigned to each outcome. By this process, a probability distribution can be set up. That is, each possible outcome of an experiment is assigned a number, called the probability of that outcome.
Example 4
Set up a probability distribution for the number of heads observed when a fair coin is tossed twice.

Number of heads    Probability
0                  1/4
1                  2/4
2                  1/4
Total              1
The probability distributions that were set up suggest the following properties of probability.
Let S = {S1, S2, S3, ..., Sn} be the sample space obtained from the union of n distinct simple events {S1}, {S2}, {S3}, ..., {Sn}, with associated probabilities P1, P2, P3, ..., Pn. Then
1. 0 ≤ P1 ≤ 1, 0 ≤ P2 ≤ 1, ..., 0 ≤ Pn ≤ 1
(all probabilities are between 0 and 1 inclusive);
2. P1 + P2 + P3 + ... + Pn = 1
(the sum of all probabilities for a sample space is 1);
3. P(S) = 1.
Addition Principle
Suppose E = {S1, S2, ..., Sm}, where {S1}, {S2}, ..., {Sm} are distinct simple events. Then
P(E) = P({S1}) + P({S2}) + ... + P({Sm})
Example 5
Refer to Example 3 and find the probability that a customer spends at least K11,250 but less than K20,250.
This event is the union of two simple events: spending K11,250 to under K13,500, and spending K13,500 to under K20,250. The probability of spending at least K11,250 but less than K20,250 can thus be found by the addition principle. Let this event be A; then
P(A) = P(spending K11,250 – K13,500) + P(spending K13,500 – K20,250) = 0.088 + 0.239 = 0.327
Addition Principle for Mutually Exclusive Events
For mutually exclusive events E and F,
P(E ∪ F) = P(E) + P(F)
Example 6
Use the probability distribution of Example 4 to find the probability of getting at least one head when a fair coin is tossed twice.
Event E, "at least one head", is the union of three mutually exclusive outcomes: two heads (HH), a head then a tail (HT), and a tail then a head (TH). So
P(E) = P(HH) + P(HT) + P(TH) = 1/4 + 1/4 + 1/4 = 3/4
Complement: P(E') = 1 − P(E) and P(E) = 1 − P(E').
For example, in a particular experiment P(E) = 3/8. Find P(E').
P(E') = 1 − P(E) = 1 − 3/8 = 5/8.
Example 7
In Example 3 above, find the probability that a customer spends less than K22,500. Let E be the event "a customer spends less than K22,500". Then
P(E) = 0.280 + 0.147 + 0.088 + 0.239 + 0.135 = 0.889
Alternatively, E' is the event "a customer spends K22,500 and over". From the table, P(E') = 0.111, so P(E) = 1 − P(E') = 1 − 0.111 = 0.889.
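The purchase-size probabilities can be recomputed from the customer counts rather than from the rounded relative frequencies. This is a minimal sketch using exact fractions. The dictionary labels are illustrative. Results worked from the counts can differ in the third decimal place from sums of the rounded table values.

```python
from fractions import Fraction

# Customer counts per purchase class from the Example 3 table
counts = {
    "K0 to under K2250": 160,
    "K2250 to under K11250": 84,
    "K11250 to under K13500": 50,
    "K13500 to under K20250": 136,
    "K20250 to under K22500": 77,
    "K22500 and over": 63,
}
n = sum(counts.values())
p = {k: Fraction(v, n) for k, v in counts.items()}

# Example 5: at least K11250 but less than K20250 (addition principle)
p_a = p["K11250 to under K13500"] + p["K13500 to under K20250"]
# Example 7: less than K22500, via the complement
p_e = 1 - p["K22500 and over"]
print(round(float(p_a), 3), round(float(p_e), 3))
```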
Odds
The odds in favor of an event E are defined as the ratio of P(E) to P(E'), that is, P(E)/P(E').
Example 8
Suppose the weather forecaster says that the probability of rain tomorrow is 2/5. Find the odds in favor of rain tomorrow.
Let E be the event "rain tomorrow". Then E' is the event "no rain tomorrow". Since P(E) = 2/5, we have P(E') = 3/5. By the definition of odds, the odds in favor of rain are
(2/5)/(3/5) = 2/3, written 2 to 3 or 2:3.
In general, if the odds favoring event E are m to n, then
P(E) = m/(m + n) and P(E') = n/(m + n).
Example 9
The odds that a particular bid will be the low bid are 8 to 13. Find the probability that the bid will be the low bid.
Solution
Odds of 8 to 13 show 8 favorable chances out of 8 + 13 = 21 chances altogether, so
P(bid will be low bid) = 8/(8 + 13) = 8/21
There is a 13/21 chance that the bid will not be the low bid.
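The conversions between probability and odds are one-liners. This sketch uses exact fractions. `odds_in_favor` and `prob_from_odds` are hypothetical helper names.

```python
from fractions import Fraction

def odds_in_favor(p):
    """Odds in favor of E: P(E) / P(E')."""
    return p / (1 - p)

def prob_from_odds(m, n):
    """If the odds in favor of E are m to n, then P(E) = m / (m + n)."""
    return Fraction(m, m + n)

print(odds_in_favor(Fraction(2, 5)))   # Example 8: rain tomorrow, 2 to 3
print(prob_from_odds(8, 13))           # Example 9: low bid
```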
Extended Addition Principle
For any two events E and F from a sample space S,
P(E ∪ F) = P(E) + P(F) − P(E ∩ F)
Example 10.
If a single card is drawn from an ordinary deck, find the probability that it will be red or a face card.
Let R and F represent the events "red" and "face card" respectively. Then
P(R) = 26/52, P(F) = 12/52 and P(R ∩ F) = 6/52
(there are six red face cards in a deck). By the extended addition principle,
P(R ∪ F) = P(R) + P(F) − P(R ∩ F) = 26/52 + 12/52 − 6/52 = 32/52 = 8/13
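The extended addition principle can be checked by counting the union directly. This minimal sketch builds the deck, uses the standard suit-to-colour mapping, and compares the formula with a direct count of R ∪ F.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colour = {"hearts": "red", "diamonds": "red", "spades": "black", "clubs": "black"}
deck = list(product(ranks, suit_colour))    # iterating a dict yields its keys (suits)

red = {c for c in deck if suit_colour[c[1]] == "red"}
face = {c for c in deck if c[0] in ("J", "Q", "K")}

def p(event):
    return Fraction(len(event), len(deck))

# Extended addition principle versus counting the union directly
via_formula = p(red) + p(face) - p(red & face)
print(via_formula, p(red | face))
```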
Example 11
Suppose two fair dice are rolled. Find each of the following probabilities.
a) The first die shows a 2 or the sum is 6
The sample space consists of 36 equally likely outcomes (first die, second die):
(1,1) (2,1) (3,1) (4,1) (5,1) (6,1)
(1,2) (2,2) (3,2) (4,2) (5,2) (6,2)
(1,3) (2,3) (3,3) (4,3) (5,3) (6,3)
(1,4) (2,4) (3,4) (4,4) (5,4) (6,4)
(1,5) (2,5) (3,5) (4,5) (5,5) (6,5)
(1,6) (2,6) (3,6) (4,6) (5,6) (6,6)
Let A be the event "the first die shows a 2" and B the event "the sum is 6". Then
P(A) = 6/36, P(B) = 5/36, P(A ∩ B) = 1/36
By the extended addition principle,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 6/36 + 5/36 − 1/36 = 10/36 = 5/18
b) The sum is 5 or the second die is 4
P(sum is 5) = 4/36, P(second die is 4) = 6/36, P(sum is 5 and second die is 4) = 1/36
so
P(sum is 5 or second die is 4) = 4/36 + 6/36 − 1/36 = 9/36 = 1/4
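Both dice answers can be confirmed by enumerating the 36 equally likely outcomes. This is a minimal Python sketch. Python sets make the union and intersection literal (`|` and `&`).

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # 36 equally likely (first, second) pairs

def prob(event):
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == 2}        # first die shows 2
B = {s for s in S if sum(s) == 6}      # sum is 6
print(prob(A | B), prob(A) + prob(B) - prob(A & B))

C = {s for s in S if sum(s) == 5}      # sum is 5
D = {s for s in S if s[1] == 4}        # second die shows 4
print(prob(C | D))
```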
CONDITIONAL PROBABILITIES
Often we are interested in how certain events are related to the occurrence of
other events. In particular, we may be interested in the probability of the
occurrence of an event given that another related event has occurred. Such
probabilities are referred to as conditional probabilities.
The conditional probability of event E given event F, written P(E|F), is
P(E|F) = P(E ∩ F)/P(F), provided P(F) ≠ 0.
Example 11
The training manager for a large stockbrokerage firm has noticed that some of the firm's brokers use the firm's research advice, while other brokers tend to go with their own feelings about which stocks will go up. To see whether the research department is better than just the brokers' feelings, the manager surveyed 100 brokers. The results are shown in the following table.

                       Picked stocks    Didn't pick stocks    Total
                       that went up     that went up
Used research          30               15                    45
Didn't use research    30               25                    55
Totals                 60               40                    100
Let A represent the event "picked stocks that went up", and let B represent the event "used research". We can find the following probabilities.
P(A) = 60/100 = 0.6      P(A') = 40/100 = 0.4
P(B) = 45/100 = 0.45     P(B') = 55/100 = 0.55
Suppose we want to find the probability that a broker using research will pick stocks that go up. From the table above, of the 45 brokers who use research, 30 picked stocks that went up, so
P(broker who uses research picks stocks that go up) = 30/45 = 0.667.
This differs from the probability that a broker picks stocks that go up, 0.6. The reason is that we have additional information (the broker uses research), which has reduced the sample space. In other words, we found the probability that a broker picks stocks that go up, A, given the additional information that the broker uses research, B. This is called the conditional probability of event A, given that event B has occurred, written P(A|B). In the example above,
P(A|B) = P(A ∩ B)/P(B) = (30/100)/(45/100) = 30/45 = 0.667.
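The conditional probability P(A|B) = P(A ∩ B)/P(B) can be read straight off the survey table. In this sketch the dictionary keys are hypothetical labels for the table's four cells.

```python
from fractions import Fraction

# Cell counts from the survey: (used research?, picked stocks that went up?)
table = {
    ("research", "up"): 30, ("research", "not up"): 15,
    ("no research", "up"): 30, ("no research", "not up"): 25,
}
total = sum(table.values())

p_b = Fraction(table[("research", "up")] + table[("research", "not up")], total)
p_a_and_b = Fraction(table[("research", "up")], total)
p_a_given_b = p_a_and_b / p_b          # P(A|B) = P(A ∩ B) / P(B)
print(p_a_given_b, round(float(p_a_given_b), 3))
```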
Product Rule: For any events E and F,
P(E ∩ F) = P(F) · P(E|F)
Example 12.
A class is 2/5 women and 3/5 men. Of the women, 25% are business majors. Find the probability that a student chosen at random is a woman business major.
Solution
Let B and W represent the events "business major" and "woman", respectively. We want to find P(B ∩ W). By the product rule,
P(B ∩ W) = P(W) · P(B|W)
Using the given information, P(W) = 2/5 = 0.4 and P(B|W) = 0.25. Thus
P(B ∩ W) = 0.4(0.25) = 0.10
Example 13
Suppose an investment firm is interested in the following events:
A = {common stock in XYZ Corporation gains 10% next year}
B = {Gross National Product gains 10% next year}
The firm has assigned the following probabilities on the basis of available information:
P(A|B) = 0.8, P(B) = 0.3
That is, the investment company believes the probability is 0.8 that XYZ common stock will gain 10% in the next year, assuming that the GNP gains 10% in the same period. In addition, the company believes the probability is only 0.3 that the GNP will gain 10% in the next year. Use the formula for the probability of an intersection to calculate the probability that both XYZ common stock and the GNP gain 10% in the next year.
Solution
We want to calculate P(A ∩ B). The formula is
P(A ∩ B) = P(B) P(A|B) = (0.3)(0.8) = 0.24
Thus, according to this investment firm, the probability is 0.24 that both XYZ common stock and the GNP will gain 10% in the next year.
In the previous section we showed that the probability of an event A may be substantially
altered by the assumption that the event B has occurred. However, this will not always
be the case. In some instances the assumption that event B has occurred will not alter the
probability of event A at all. When this is true, we call events A and B independent.
Events A and B are independent if the assumption that B has occurred does not alter the probability that A has occurred, i.e.
P(A|B) = P(A)
When events A and B are independent it will also be true that
P(B|A) = P(B)
Events that are not independent are said to be dependent.
Example 14
The probability that interest rates will rise has been assessed as 0.8. If they do rise, the probability that the stock market index will drop is estimated to be 0.9. If interest rates do not rise, the probability that the stock market index will still drop is estimated as 0.4. What is the probability that the stock market index will drop?
Solution
P(A) = P(interest rates rise) = 0.8
P(B) = P(stock market index drops) = ?
Then the probability of A', the complement of A ("interest rates do not rise"), is P(A') = 1 − 0.8 = 0.2.
P(B|A) = P(stock market index drops | interest rates rise) = 0.9
P(B|A') = P(stock market index drops | interest rates do not rise) = 0.4
By the multiplication rule,
P(B ∩ A) = P(A) P(B|A) = 0.8 × 0.9 = 0.72 and
P(B ∩ A') = P(A') P(B|A') = 0.2 × 0.4 = 0.08
Since B ∩ A and B ∩ A' are mutually exclusive and together make up B,
P(B) = 0.72 + 0.08 = 0.80
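Example 14 applies the law of total probability, P(B) = P(A)P(B|A) + P(A')P(B|A'). This is a minimal sketch with the example's numbers. The variable names are illustrative, and the result is rounded because floating-point arithmetic gives 0.7999... rather than exactly 0.8.

```python
# Law of total probability for Example 14
p_a = 0.8                 # interest rates rise
p_b_given_a = 0.9         # index drops if rates rise
p_b_given_not_a = 0.4     # index drops if rates do not rise

# Split B into the mutually exclusive pieces B ∩ A and B ∩ A'
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a
print(round(p_b, 2))
```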
Example 15
Suppose we toss a fair die. Let B be the event "a number less than or equal to 4 is observed" and A the event "an even number is observed". Are events A and B independent?
P(B) = 4/6 = 2/3, since B = {1, 2, 3, 4}
P(A) = 3/6 = 1/2, since A = {2, 4, 6}
P(A ∩ B) = 2/6 = 1/3, where A ∩ B = {2, 4}
Now, given that A has occurred,
P(B|A) = P(A ∩ B)/P(A) = (1/3)/(1/2) = 2/3 = P(B)
Similarly,
P(A|B) = P(A ∩ B)/P(B) = (1/3)/(2/3) = 1/2 = P(A)
Therefore the events A and B are independent.
If events A and B are independent, the probability of the intersection of A and B equals the product of the probabilities of A and B, i.e.
P(A ∩ B) = P(A) P(B).
In the die-tossing experiment,
P(A ∩ B) = P(A) · P(B) = (1/2)(2/3) = 1/3
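Independence can be tested by checking whether P(A ∩ B) = P(A)P(B). This is a minimal sketch for the die example, using exact fractions.

```python
from fractions import Fraction

S = range(1, 7)                         # faces of a fair die
A = {s for s in S if s % 2 == 0}        # an even number
B = {s for s in S if s <= 4}            # a number at most 4

def prob(event):
    return Fraction(len(event), len(S))

independent = prob(A & B) == prob(A) * prob(B)
print(prob(A & B) / prob(A), prob(B), independent)   # P(B|A), P(B), test result
```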
Bayes’ Theorem
A posteriori Probabilities
Suppose three machines, A, B and C, produce similar engine components. Machine A produces 45 percent of the total components, machine B produces 30 percent, and machine C 25 percent. For the usual production schedule, 6 percent of the components produced by machine A do not meet established specifications. For machines B and C, the corresponding figures are 4 percent and 3 percent. One component is selected at random from the total output and is found to be defective. What is the probability that the component selected was produced by machine A?
The answer to this question is found by calculating the probability after the outcomes of the experiment have been observed. Such probabilities are called a posteriori probabilities. They contrast with a priori probabilities, which give the likelihood that an event will occur before any outcome has been observed.
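[Figure: Venn diagram of the sample space S partitioned into A, B and C; the event D overlaps each part, giving A ∩ D, B ∩ D and C ∩ D.]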
D is the event that a defective component is produced by machine A, machine B or machine C.
The three mutually exclusive events A, B and C form a partition of the sample space: besides being mutually exclusive, their union is precisely S.
1. The event D may be expressed as
D = (A ∩ D) ∪ (B ∩ D) ∪ (C ∩ D)
2. The event that a component is defective and is produced by machine A is given by A ∩ D.
Thus, the a posteriori probability that a defective component selected was produced by machine A is given by
$$P(A/D) = \frac{n(A \cap D)}{n(D)} = \frac{P(A \cap D)}{P(D)} = \frac{P(A \cap D)}{P(A \cap D) + P(B \cap D) + P(C \cap D)} \qquad (1)$$
Next, using the product rule, we may express
P(A ∩ D) = P(A) P(D/A),
P(B ∩ D) = P(B) P(D/B), and
P(C ∩ D) = P(C) P(D/C),
so that (1) may be expressed in the form
$$P(A/D) = \frac{P(A)P(D/A)}{P(A)P(D/A) + P(B)P(D/B) + P(C)P(D/C)} \qquad (2)$$
which is a special case of a result known as Bayes' Theorem.
Observe that the expression on the right of (2) involves the probabilities P(A), P(B), P(C) and the conditional probabilities P(D/A), P(D/B) and P(D/C), all of which may be calculated in the usual fashion. In fact, displaying these quantities on a tree diagram gives Figure 1.0. We may compute the required probability by substituting the relevant quantities into (2), or we may make use of the following device:
P(A/D) = (product of the probabilities along the limb through A ending at D) / (sum of the products of the probabilities along each limb ending at D)
Figure 1.0 (tree diagram: Step 1, machine; Step 2, condition; then probability of outcome). D denotes a defective component and D' a non-defective one.

Machine    P(machine)    P(D/machine)    P(D'/machine)    P(machine ∩ D)    P(machine ∩ D')
A          0.45          0.06            0.94             0.027             0.423
B          0.30          0.04            0.96             0.012             0.288
C          0.25          0.03            0.97             0.0075            0.2425
In either case, we obtain
$$P(A/D) = \frac{(0.45)(0.06)}{(0.45)(0.06) + (0.3)(0.04) + (0.25)(0.03)} = \frac{0.027}{0.027 + 0.012 + 0.0075} = \frac{0.027}{0.0465} = 0.581$$
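The tree-diagram device amounts to multiplying along each limb and then dividing the limb of interest by the total for D. This sketch carries out that calculation. `posterior` is a hypothetical helper, and the dictionaries hold the machine shares and defect rates from the example.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a partition: P(Ai/D) for every cell Ai."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}   # P(Ai ∩ D), one limb each
    p_d = sum(joint.values())                                 # P(D), sum over the limbs
    return {k: joint[k] / p_d for k in joint}

priors = {"A": 0.45, "B": 0.30, "C": 0.25}          # share of total output
defect_rates = {"A": 0.06, "B": 0.04, "C": 0.03}    # P(D/machine)
post = posterior(priors, defect_rates)
print({k: round(v, 3) for k, v in post.items()})
```

The posteriors for A, B and C sum to 1, since a defective component must have come from one of the three machines.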
Before looking at any further examples, let us state the general form of Bayes' Theorem.
Let A1, A2, ..., An be a partition of a sample space S, and let E be an event of the experiment such that P(E) ≠ 0. Then the posterior probability P(Ai/E) (1 ≤ i ≤ n) is given by
$$P(A_i/E) = \frac{P(A_i)P(E/A_i)}{P(A_1)P(E/A_1) + P(A_2)P(E/A_2) + \cdots + P(A_n)P(E/A_n)} \qquad (3)$$
Problems
1) In a certain city, 40 percent of the people consider themselves supporters of the Movement for Multiparty Democracy (MMD), and 35 percent consider themselves supporters of the United Party for National Development (UPND). The remaining 25 percent consider themselves independents (I). During a particular election, 45 percent of the MMD supporters, 40 percent of the UPND supporters and 60 percent of the independents voted. Suppose a person is randomly selected.
a) Find the probability that the person voted.
b) If the person voted, find the probability that the voter is
i) MMD
ii) UPND
iii) Independent.
2) Three girls, Chanda, Mumba and Chileshe, pack okra in a factory. From the batch allotted to them, Chanda packs 55%, Mumba 30% and Chileshe 15%. The probability that Chanda breaks some okra in a packet is 0.7, and the respective probabilities for Mumba and Chileshe are 0.2 and 0.1. The checker finds a packet with broken okra. What is the probability that it was packed by
a) Chanda?
b) Mumba?
c) Chileshe?
3) A publisher sends advertising material for an accounting text to 80% of all professors teaching the appropriate accounting courses. Thirty percent of the professors who received this material adopted the book, as did 10% of the professors who did not receive the material. What is the probability that a professor who adopts the book has received the advertising material?
Solutions
1. Let M, U and I denote MMD, UPND and independent, and let V be the event that the person voted.
MMD: P(M) = 0.40, P(V/M) = 0.45
UPND: P(U) = 0.35, P(V/U) = 0.40
Independent: P(I) = 0.25, P(V/I) = 0.60
a) P(V) = P(M)P(V/M) + P(U)P(V/U) + P(I)P(V/I)
= 0.40(0.45) + 0.35(0.40) + 0.25(0.60)
= 0.18 + 0.14 + 0.15 = 0.47
b) i) P(M/V) = P(M ∩ V)/P(V) = P(M)P(V/M)/[P(M)P(V/M) + P(U)P(V/U) + P(I)P(V/I)] = 0.18/0.47 = 0.383
ii) P(U/V) = P(U ∩ V)/P(V) = P(U)P(V/U)/P(V) = 0.14/0.47 = 0.298
iii) P(I/V) = P(I)P(V/I)/P(V) = 0.15/0.47 = 0.319
2. Let D, M and H be the events that the packet was packed by Chanda, Mumba and Chileshe respectively. Let B be the event that the packet contains broken okra.
P(D) = 0.55, P(M) = 0.30, P(H) = 0.15
P(B/D) = 0.7, P(B/M) = 0.2, P(B/H) = 0.1
P(B) = P(D)P(B/D) + P(M)P(B/M) + P(H)P(B/H)
= 0.55(0.7) + 0.30(0.2) + 0.15(0.1)
= 0.385 + 0.06 + 0.015 = 0.46
a) P(D/B) = P(D)P(B/D)/P(B) = 0.385/0.46 ≅ 0.837
b) P(M/B) = P(M)P(B/M)/P(B) = 0.06/0.46 ≅ 0.1304
c) P(H/B) = P(H)P(B/H)/P(B) = 0.015/0.46 ≅ 0.0326
3. Let R be the event that the professor received the material and A the event that the professor adopted the book. Then
P(R) = 0.8, P(A/R) = 0.30
P(R') = 0.2, P(A/R') = 0.10, P(A'/R') = 0.90
P(R/A) = P(R ∩ A)/P(A) = P(R)P(A/R)/[P(R)P(A/R) + P(R')P(A/R')]
= 0.8(0.30)/[0.8(0.30) + 0.2(0.10)]
= 0.24/(0.24 + 0.02) = 0.24/0.26
= 0.923
Learning Objectives
After working through this Chapter, you should be able to
• List the rules of probability.
• Explain conditional probability, independent events and mutually exclusive events.
• Apply Bayes' theorem to find conditional probabilities.
• Define combinations and permutations and be able to apply such results to problems.
CHAPTER 5
PROBABILITY DISTRIBUTION
Reading
Newbold Chapters 4 (not 4.4) and only 5.5 in Chapter 5
Wonnacott and Wonnacott Chapter 4
Tailoka Frank P Chapter 9
Introductory Comments
This Chapter introduces three useful standard distributions: two for counts (discrete probability distributions) and one for measurements (a continuous probability distribution). These are so often used that everyone should be familiar with them. We need to know the mean, the variance and how to find simple probabilities.
5.0
Discrete Random Variables
A random variable may be defined roughly as a variable that takes on different numerical values because of chance. Random variables are classified as either discrete or continuous. A discrete random variable is one that can take on only a finite or countable number of distinct values.
A random variable is said to be continuous in a given range if the variable can assume any value in that range. The term continuous random variable implies that the variation takes place along a continuum. Examples of continuous variables include weight, length, velocity, rate of production, dosage of a drug and the length of life of a given product. While discrete variables can be counted, continuous variables are measured, with some degree of accuracy.
A probability distribution of a discrete random variable X, whose value at x is f(x), possesses the following properties.
1. $f(x) \ge 0$ for all real values of x
2. $\sum_x f(x) = 1$
Property 1 simply states that probabilities are greater than or equal to zero. The second property states that the sum of the probabilities in a probability distribution is equal to 1. The notation $\sum_x f(x)$ means "sum of the values f(x) for all the values that x takes on". We will ordinarily use the term probability distribution for both discrete and continuous variables, although other terms (such as probability function) are sometimes used.
Probability distributions of discrete random variables are often referred to as
probability mass functions or simply mass functions because the probabilities are
massed at distinct points, for example along the x axis.
Probability distributions of continuous random variables are referred to as
probability density functions or density functions.
5.1
Cumulative Distribution Functions
Given a random variable X, the value of the cumulative distribution function at x, denoted F(x), is the probability that X takes on a value less than or equal to x. Hence
$$F(x) = P(X \le x) \qquad (1)$$
In the case of a discrete random variable, it is clear that
$$F(c) = \sum_{x \le c} f(x) \qquad (2)$$
The symbol $\sum_{x \le c} f(x)$ means "sum of the values of f(x) for all values of x less than or equal to c".
Example 1
Shoprite is interested in diversifying its product line into the soft goods market. Mr Phiri, Vice President in charge of mergers and acquisitions, is negotiating the acquisition of Quicksave, a discount shop. To determine the price Shoprite would have to pay per share for Quicksave, he sets up the probability distribution for the stock price shown in the table below.
Probability distribution and cumulative distribution for the price of Quicksave common stock.

| Price of Quicksave common stock (x) | Probability f(x) | Cumulative probability F(x) |
|---|---|---|
| K74 250 | 0.08 | 0.08 |
| K76 500 | 0.15 | 0.23 |
| K78 750 | 0.53 | 0.76 |
| K81 000 | 0.20 | 0.96 |
| K83 250 | 0.04 | 1.00 |
The probability that the price would be K78 750 or less is
$$P(X \le K78\,750) = F(K78\,750) = 0.08 + 0.15 + 0.53 = 0.76$$
Similarly,
$$P(X \le K76\,500) = F(K76\,500) = 0.23$$
A graph of the cumulative distribution function is a step function; that is, its values change in discrete "steps" at the indicated values of the random variable x.
[Figure: step graph of F(x), rising from 0.00 to 1.00, against the price of stock x at K74 250, 76 500, 78 750, 81 000 and 83 250.]
Graph of cumulative distribution of the price of Quicksave common stock.
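The cumulative column of the Quicksave table is a running total of f(x). The Python sketch below shows this. It first checks the two properties of a probability distribution, then evaluates F(c) as in equation (2). The function name `F` simply mirrors the notation of the text. The prices and probabilities are those in the table above.

```python
from itertools import accumulate

# Quicksave share prices (kwacha) and probabilities f(x), taken from the table above.
prices = [74_250, 76_500, 78_750, 81_000, 83_250]
probs = [0.08, 0.15, 0.53, 0.20, 0.04]

# A valid probability distribution needs f(x) >= 0 and probabilities summing to 1.
assert all(p >= 0 for p in probs)
assert abs(sum(probs) - 1.0) < 1e-9


def F(c):
    """F(c) = P(X <= c): add f(x) over every price x <= c (equation (2))."""
    return sum(p for x, p in zip(prices, probs) if x <= c)


# Running totals reproduce the cumulative column of the table.
for x, cum in zip(prices, accumulate(probs)):
    print(f"K{x:,}".replace(",", " "), round(cum, 2))
```

Because F(c) adds probabilities only at the listed prices, it stays flat between them. This is the step shape of the graph.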
5.2
Probability Distribution of Discrete Random Variables
We will discuss the binomial and Poisson probability distributions of discrete random variables. First, recall the mean and variance of a discrete random variable. The mean, or expected value, of a discrete random variable X is
$$\mu = E(X) = \sum_{\text{all } x} x\,P(x)$$
The variance of a discrete random variable X is
$$\sigma^2 = E(X - \mu)^2 = \sum_{\text{all } x} (x - \mu)^2 P(x)$$
In general, if g(X) is any function of the discrete random variable X, then
$$E[g(X)] = \sum_{\text{all } x} g(x)\,P(X = x)$$
For example,
$$E(20X) = \sum 20x\,P(X = x), \qquad E(X^2) = \sum x^2 P(X = x), \qquad E(X - 5) = \sum (x - 5)P(X = x)$$
Example 2
The random variable X has the following distribution for x = 1, 2, 3, 4.

| x | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| P(X = x) | 0.02 | 0.35 | 0.53 | 0.10 |

Calculate:
a) $E(X)$
b) $E(5X - 3)$
c) $E(X^2)$
d) $6E(X) + 8$
e) $E(5X^2 + 2)$
Solution
a)
$$E(X) = \sum x\,P(X = x) = 1(0.02) + 2(0.35) + 3(0.53) + 4(0.10) = 0.02 + 0.70 + 1.59 + 0.40 = 2.71$$
b)
$$E(5X - 3) = 5E(X) - 3 = 5\sum x\,P(X = x) - 3 = 5[1(0.02) + 2(0.35) + 3(0.53) + 4(0.10)] - 3 = 5(2.71) - 3 = 13.55 - 3 = 10.55$$
c)
$$E(X^2) = \sum x^2 P(X = x) = 1^2(0.02) + 2^2(0.35) + 3^2(0.53) + 4^2(0.10) = 0.02 + 1.40 + 4.77 + 1.60 = 7.79$$
d)
$$6E(X) + 8 = 6\sum x\,P(X = x) + 8 = 6(2.71) + 8 = 16.26 + 8 = 24.26$$
e)
$$E(5X^2 + 2) = 5E(X^2) + 2 = 5\sum x^2 P(X = x) + 2 = 5(7.79) + 2 = 40.95$$
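Every part of Example 2 applies $E[g(X)] = \sum g(x)\,P(X = x)$ with a different g. The Python sketch below makes that explicit. It takes a helper `expect` (an illustrative name, not from the text) that accepts any function g. The table values are those given in Example 2.

```python
xs = [1, 2, 3, 4]
ps = [0.02, 0.35, 0.53, 0.10]


def expect(g, xs, ps):
    """E[g(X)] = sum over all x of g(x) P(X = x)."""
    return sum(g(x) * p for x, p in zip(xs, ps))


mean_x = expect(lambda x: x, xs, ps)                  # (a) E(X)
e_5x_minus_3 = expect(lambda x: 5 * x - 3, xs, ps)    # (b) E(5X - 3), computed directly
mean_x_sq = expect(lambda x: x ** 2, xs, ps)          # (c) E(X^2)
six_mean_plus_8 = 6 * mean_x + 8                      # (d) 6E(X) + 8
e_5x_sq_plus_2 = expect(lambda x: 5 * x ** 2 + 2, xs, ps)  # (e) E(5X^2 + 2)

print([round(v, 2) for v in (mean_x, e_5x_minus_3, mean_x_sq, six_mean_plus_8, e_5x_sq_plus_2)])
# [2.71, 10.55, 7.79, 24.26, 40.95]
```

Parts (b) and (e) are computed here directly from the definition. They agree with the shortcut $E(aX + b) = aE(X) + b$ used in the text.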
In general, the following results hold when X is a discrete random variable.
1) $E(a) = a$, where a is any constant.
2) $E(aX) = aE(X)$, where a is any constant.
3) $E(aX + b) = aE(X) + b$, where a and b are any constants.
4) $E[f_1(X) + f_2(X)] = E[f_1(X)] + E[f_2(X)]$, where $f_1$ and $f_2$ are functions of X.
Variance, Var(X)
As for the variance, the following results are useful.
1) $\text{Var}(a) = 0$, where a is any constant.
2) $\text{Var}(aX) = a^2\,\text{Var}(X)$, where a is any constant.
3) $\text{Var}(aX + b) = a^2\,\text{Var}(X)$, where a and b are any constants.
Example 3
For the data in Example 2, calculate the following:
a) $\text{Var}(5X - 3)$
b) $\text{Var}(4X)$
c) $\text{Var}(3X + 2)$
Solution
a) $\text{Var}(5X - 3) = 25\,\text{Var}(X)$, so we first need $\text{Var}(X) = E(X^2) - [E(X)]^2$. From Example 2,
$$E(X) = \sum x\,P(X = x) = 2.71, \qquad E(X^2) = \sum x^2 P(X = x) = 7.79$$
$$\text{Var}(X) = E(X^2) - [E(X)]^2 = 7.79 - (2.71)^2 = 0.4459$$
Therefore
$$\text{Var}(5X - 3) = 25\,\text{Var}(X) = 25(0.4459) = 11.1475$$
b)
$$\text{Var}(4X) = 16\,\text{Var}(X) = 16(0.4459) = 7.1344$$
c)
$$\text{Var}(3X + 2) = 9\,\text{Var}(X) = 9(0.4459) = 4.0131$$
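The rule $\text{Var}(aX + b) = a^2\,\text{Var}(X)$ can be checked against the definition $\sum (y - \mu_Y)^2 P$ applied to the transformed values $y = ax + b$. The Python sketch below does this with the distribution of Example 2. The helper names `var_linear` and `var_direct` are illustrative only.

```python
xs = [1, 2, 3, 4]
ps = [0.02, 0.35, 0.53, 0.10]

mean = sum(x * p for x, p in zip(xs, ps))         # E(X) = 2.71
mean_sq = sum(x * x * p for x, p in zip(xs, ps))  # E(X^2) = 7.79
var_x = mean_sq - mean ** 2                       # Var(X) = E(X^2) - [E(X)]^2


def var_linear(a, var):
    """Var(aX + b) = a^2 Var(X); the constant b does not affect the spread."""
    return a * a * var


def var_direct(a, b):
    """Var(aX + b) from the definition, using the transformed values y = ax + b."""
    ys = [a * x + b for x in xs]
    m = sum(y * p for y, p in zip(ys, ps))
    return sum((y - m) ** 2 * p for y, p in zip(ys, ps))


for a, b in [(5, -3), (4, 0), (3, 2)]:
    print(f"Var({a}X + {b}):", round(var_linear(a, var_x), 4), round(var_direct(a, b), 4))
```

Both columns agree for each case, which confirms that adding the constant b only shifts the values without changing their spread.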
5.3
The Binomial Distribution
The binomial distribution, in which there are two possible outcomes on each experimental trial, is undoubtedly the most widely applied probability distribution of a discrete random variable. It has been used to describe a large variety of processes in business and the social sciences, as well as in other areas. The Bernoulli process, named after James Bernoulli (1654–1705), gives rise to the binomial distribution.
The Bernoulli process has the following characteristics.
a) On each trial, there are two mutually exclusive possible outcomes, which are referred to as "success" and "failure". In somewhat different language, the sample space of possible outcomes on each experimental trial is S = {failure, success}.
b) The probability of a success, denoted p, remains constant from trial to trial. The probability of a failure, denoted q, is equal to 1 − p.
c) The trials are independent. That is, the outcome of any given trial or sequence of trials does not affect the outcomes of subsequent trials.
Suppose we toss a coin 3 times; then we may treat each toss as one Bernoulli trial. The possible outcomes on any particular trial are a head and a tail. Assume that the appearance of a head is a success. Which outcome is called a success is arbitrary. For example, we may choose to refer to the appearance of a defective item in a production process as a success, and if a series of births is treated as a Bernoulli process, the appearance of a female (male) may be classified as a success.
Consider the experiment of tossing a fair coin three times. The possible sequences of outcomes are
HTH, HHH, HHT, THH, TTT, THT, TTH, HTT
Since the probabilities of a success and a failure on a given trial are p and q respectively, the probability of, for instance, the outcome {HTH} is $pqp = p^2q$, where p is the probability of observing a "head" and q is the probability of observing a "tail".
| Outcome | Probability |
|---|---|
| HTH | $pqp = p^2q$ |
| HHH | $ppp = p^3$ |
| HHT | $ppq = p^2q$ |
| THH | $qpp = p^2q$ |
| THT | $qpq = pq^2$ |
| TTT | $qqq = q^3$ |
| TTH | $qqp = pq^2$ |
| HTT | $pqq = pq^2$ |
We can obtain the number of such sequences from the formula for the number of combinations of n objects taken x at a time:
$$C(n, x) = \binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$
Thus the number of possible sequences in which two heads can occur is
$$C(3, 2) = \binom{3}{2} = \frac{3!}{2!\,1!} = 3$$
These are the events {HTH}, {HHT} and {THH}. Therefore the probability of exactly 2 heads is $P(X = 2) = C(3, 2)\,qp^2$.
In the case of the fair coin, we assign a probability of 1/2 to p and to q. Hence
$$P(X = 2) = C(3, 2)\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)^2 = \tfrac{3}{8}$$
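The counting argument can be checked by brute force. The Python sketch below lists all $2^3 = 8$ sequences of three tosses and keeps those with exactly two heads. It then adds their probabilities, using exact fractions for a fair coin. The helper name `seq_prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

p = Fraction(1, 2)  # fair coin: probability of a head (success)
q = 1 - p           # probability of a tail (failure)

# All 2^3 = 8 sequences of three tosses, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))


def seq_prob(seq):
    """Probability of one sequence: a factor p for each H and q for each T."""
    prob = Fraction(1)
    for toss in seq:
        prob *= p if toss == "H" else q
    return prob


two_heads = [s for s in outcomes if s.count("H") == 2]
print(["".join(s) for s in two_heads])       # ['HHT', 'HTH', 'THH']
print(sum(seq_prob(s) for s in two_heads))   # 3/8
```

The enumeration finds exactly C(3, 2) = 3 sequences, each with probability $p^2q$.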
This result may be generalized to obtain the probability of exactly x successes in n trials of a Bernoulli process. Let us assume that n − x failures occurred followed by x successes, in that order. We may then represent this sequence as
$$\underbrace{qq \cdots q}_{n - x \text{ failures}}\;\underbrace{pp \cdots p}_{x \text{ successes}}$$
The probability of this particular sequence is $q^{n-x}p^x$. The number of possible sequences of n trials resulting in exactly x successes is $\binom{n}{x}$.
Therefore, the probability of obtaining x successes in n trials of a Bernoulli process is given by
$$f(x) = \binom{n}{x} q^{n-x} p^x \qquad \text{for } x = 0, 1, 2, \ldots, n$$
If we denote by X the random variable "number of successes in these n trials", then $f(x) = P(X = x)$.
That this is a probability distribution is verified by noting the following conditions.
1) $f(x) \ge 0$ for all real values of x
2) $\sum_x f(x) = 1$
The term binomial probability distribution, or simply binomial distribution, is usually used to refer to the probability distribution resulting from a Bernoulli process.
In problems where the assumptions of a Bernoulli process are met, we can obtain the probabilities of zero, one or more successes in n trials from the respective terms of the binomial expansion $(q + p)^n$, where q and p denote the probabilities of failure and success on a single trial and n is the number of trials.
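The binomial formula $f(x) = \binom{n}{x} q^{n-x} p^x$ translates directly into Python. The sketch below uses the standard library's `math.comb` for $\binom{n}{x}$. It evaluates the fair coin tossed three times and confirms that the terms add up to $(q + p)^n = 1$. The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb, isclose


def binomial_pmf(x, n, p):
    """f(x) = C(n, x) q^(n - x) p^x for x = 0, 1, ..., n, where q = 1 - p."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return comb(n, x) * q ** (n - x) * p ** x


n, p = 3, 0.5  # a fair coin tossed three times
dist = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(dist)  # [0.125, 0.375, 0.375, 0.125]

# The terms are those of the expansion (q + p)^n, so they sum to 1.
assert isclose(sum(dist), ((1 - p) + p) ** n)
```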
Example 4
The tossing of a fair coin 3 times was used earlier as an example of a Bernoulli process. Compute the probabilities of all possible numbers of heads, and thus establish a particular binomial distribution.
Solution
This problem is an application of the binomial distribution with p = 1/2 and n = 3. Letting X represent the random variable "number of heads", the probability distribution is as follows:

| Number of heads, x | P(x) |
|---|---|
| 0 | $\binom{3}{0}\left(\frac{1}{2}\right)^0\left(\frac{1}{2}\right)^3 = \frac{1}{8}$ |
| 1 | $\binom{3}{1}\left(\frac{1}{2}\right)^1\left(\frac{1}{2}\right)^2 = \frac{3}{8}$ |
| 2 | $\binom{3}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^1 = \frac{3}{8}$ |
| 3 | $\binom{3}{3}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^0 = \frac{1}{8}$ |
Example 5
A machine that produces stampings for car engines is not working properly and is producing 15% defectives. The defective and non-defective stampings proceed from the machine in a random manner. If 4 stampings are randomly collected, find the probability that 2 of them are defective.
Solution
Let p = 0.15 be the probability that a single stamping will be defective, and let X equal the number of defectives in n = 4 trials. Then q = 1 − p = 1 − 0.15 = 0.85 and
$$p(x) = \binom{n}{x} p^x q^{n-x} = \binom{4}{x}(0.15)^x(0.85)^{4-x} = \frac{4!}{x!\,(4 - x)!}(0.15)^x(0.85)^{4-x}, \qquad x = 0, 1, 2, 3, 4$$
To find the probability of x = 2 defectives in a sample of n = 4, substitute x = 2 into the formula for p(x) to obtain
$$P(2) = \frac{4!}{2!\,(4 - 2)!}(0.15)^2(0.85)^2 = 6(0.01625625) = 0.0975375 \approx 0.0975$$
The mean, variance and standard deviation of a binomial random variable are given by:
Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard deviation: $\sigma = \sqrt{npq}$
To calculate the values of µ and σ in Example 5, substitute n = 4 and p = 0.15 into these formulas:
$$\mu = np = 4(0.15) = 0.60$$
$$\sigma = \sqrt{npq} = \sqrt{(4)(0.15)(0.85)} = \sqrt{0.51} = 0.714$$
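The shortcut formulas $\mu = np$ and $\sigma^2 = npq$ can be checked against the general definitions of mean and variance for a discrete random variable. The Python sketch below does this for the stamping machine of Example 5 (n = 4, p = 0.15). It also reproduces P(2). The variable names are illustrative.

```python
from math import comb, isclose, sqrt

n, p = 4, 0.15
q = 1 - p


def pmf(x):
    """Binomial probability of x defectives among the n = 4 stampings."""
    return comb(n, x) * p ** x * q ** (n - x)


p2 = pmf(2)              # P(X = 2)
mu = n * p               # shortcut mean
sigma = sqrt(n * p * q)  # shortcut standard deviation

# Cross-check the shortcuts against E(X) = sum x p(x) and Var(X) = sum (x - mu)^2 p(x).
mu_direct = sum(x * pmf(x) for x in range(n + 1))
var_direct = sum((x - mu_direct) ** 2 * pmf(x) for x in range(n + 1))
assert isclose(mu_direct, mu) and isclose(var_direct, n * p * q)

print(round(p2, 4), round(mu, 2), round(sigma, 3))  # 0.0975 0.6 0.714
```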
Example 6
Payani Serenje owns 5 stocks. The probability that each stock will rise in price is 0.6. What is the probability that three of the five stocks will rise in price?
Solution
n = 5, p = 0.6, q = 1 − p = 0.4
Let X be the number of stocks that rise in price. Then
$$P(X = 3) = \binom{5}{3}(0.6)^3(0.4)^2 = \frac{5!}{3!\,2!}(0.216)(0.16) = \frac{(5)(4)}{2}(0.216)(0.16) = 0.3456 \approx 0.346$$
From the tables, with n = 5 and p = 0.6,
$$P(3) = P(X \le 3) - P(X \le 2) = 0.663 - 0.317 = 0.346$$
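Binomial tables list cumulative probabilities $P(X \le k)$. A single probability is therefore the difference of two table entries. The Python sketch below builds the cumulative function for n = 5 and p = 0.6. It shows that the table method and the direct formula give the same P(3). The helper name `binom_cdf` is illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """F(k) = P(X <= k): the sum of the binomial probabilities for x = 0, ..., k."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(k + 1))


n, p = 5, 0.6
direct = comb(n, 3) * p ** 3 * (1 - p) ** 2             # formula method
from_cdf = binom_cdf(3, n, p) - binom_cdf(2, n, p)      # table method: F(3) - F(2)

print(round(binom_cdf(3, n, p), 3), round(binom_cdf(2, n, p), 3))  # 0.663 0.317
print(round(direct, 4), round(from_cdf, 4))                        # 0.3456 0.3456
```

Rounding the table entries to three decimals is why the table method in the text gives 0.346 rather than 0.3456.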
5.4
The Poisson Distribution
The Poisson distribution is named after the French mathematician and physicist Siméon Denis Poisson (1781–1840). It is a discrete probability distribution with the following formula:
$$P(x) = \frac{\mu^x e^{-\mu}}{x!}, \qquad x = 0, 1, 2, \ldots$$
where P(x) is the probability that a variable with a Poisson distribution equals x, µ is the mean or expected value of the Poisson distribution, and e ≈ 2.718 is the base of the natural logarithms.
One reason why the Poisson distribution is important in statistics is that it can be used as an approximation to the binomial distribution. If n (the number of trials) is large and p (the probability of success) is small, the binomial probabilities can be approximated by the Poisson distribution with µ = np. Experience indicates that the approximation is adequate for most practical purposes if n is at least 20 and p is no greater than 0.05.
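The quality of the Poisson approximation can be seen by setting the two formulas side by side. The Python sketch below does this. The values n = 100 and p = 0.02 are purely illustrative, not taken from the text; they simply satisfy the rule of thumb n ≥ 20 and p ≤ 0.05. The function names are likewise illustrative.

```python
from math import comb, exp, factorial


def poisson_pmf(x, mu):
    """P(x) = mu^x e^(-mu) / x! for x = 0, 1, 2, ..."""
    return mu ** x * exp(-mu) / factorial(x)


def binomial_pmf(x, n, p):
    """Exact binomial probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Illustrative values only (not from the text): n >= 20 and p <= 0.05.
n, p = 100, 0.02
mu = n * p  # the Poisson mean is matched to the binomial mean np

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, mu), 4))
```

For these values the two columns differ by less than 0.003 at every x printed.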
The Poisson distribution has been used to describe the probability function of situations such as:
1) Product demand.
2) Demand for service.
3) Number of telephone calls that come through a switchboard.
4) Number of death claims per day received by an insurance company.
5) Number of breakdowns of an electronic computer per month.
All the preceding have two elements in common:
1) The given occurrence can be described in terms of a discrete random variable, which takes on the values 0, 1, 2 and so forth.
2) There is some rate that characterizes the process producing the outcome. The rate is the number of occurrences per interval of time or space.
For instance, product demand can be characterized by the number of units purchased in a
specified period. Product demand may be viewed as a process that produces random
occurrences in continuous time.
The characteristics of a Poisson distribution are as follows:
1) The experiment consists of counting the number of times a particular event occurs during a given unit of time, or in a given area or volume (or any other unit of measurement).
2) The probability that an event occurs in a given unit of time, area or volume is independent of the number that occur in other units.
Example 7
Suppose X, the number of a company's absent employees on Tuesdays, has (approximately) a Poisson probability distribution. Assume that the average number of Tuesday absentees is 3.4.
a) Find the mean and standard deviation of X, the number of absent employees on Tuesday.
b) Find the probability that exactly 3 employees are absent on a given Tuesday.
c) Find the probability that at least two employees are absent on a Tuesday.
Solution
a) The mean and variance of a Poisson distribution are both equal to µ. Thus for this example
µ = 3.4, σ² = 3.4
Therefore the standard deviation is
$$\sigma = \sqrt{3.4} = 1.84$$
b) We want the probability that exactly three employees are absent on Tuesday. The probability distribution for X is
$$P(x) = \frac{\mu^x e^{-\mu}}{x!}$$
Then µ = 3.4, x = 3 and $e^{-3.4} = 0.033373$ (from Table 2). Thus
$$P(3) = \frac{(3.4)^3 e^{-3.4}}{3!} = \frac{(3.4)^3(0.033373)}{6} = 0.2186$$
c) To find the probability that at least two employees are absent on Tuesday, we need to find
$$P(X \ge 2) = P(2) + P(3) + \cdots = \sum_{x=2}^{\infty} P(x)$$
Alternatively, we could use the complementary event:
$$P(X \ge 2) = 1 - P(X \le 1) = 1 - [P(0) + P(1)] = 1 - \left[\frac{(3.4)^0 e^{-3.4}}{0!} + \frac{(3.4)^1 e^{-3.4}}{1!}\right]$$
$$= 1 - [0.033373 + (3.4)(0.033373)] = 1 - 0.1468412 = 0.8531588 \approx 0.8532$$
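The three parts of Example 7 can be reproduced in a few lines. In particular, part (c) uses the complement rule rather than summing an infinite series. The Python sketch below computes $e^{-3.4}$ with `math.exp` instead of reading it from Table 2. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial, sqrt

mu = 3.4  # mean number of Tuesday absentees


def poisson_pmf(x, mu):
    """P(x) = mu^x e^(-mu) / x!"""
    return mu ** x * exp(-mu) / factorial(x)


sigma = sqrt(mu)                                            # (a) variance equals the mean
p3 = poisson_pmf(3, mu)                                     # (b) exactly three absent
at_least_2 = 1 - (poisson_pmf(0, mu) + poisson_pmf(1, mu))  # (c) complement of X <= 1

print(round(sigma, 2), round(p3, 4), round(at_least_2, 4))  # 1.84 0.2186 0.8532
```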
Example 8
On Saturdays at Southdown, a small airport in Kalulushi, airplanes arrive at an average rate of 3 for the one-hour period 13 00 hours to 14 00 hours. If these arrivals are distributed according to the Poisson probability distribution, what are the probabilities that:
a) Exactly zero airplanes will arrive between 13 00 hours and 14 00 hours next Saturday?
b) Either one or two airplanes will arrive between 13 00 hours and 14 00 hours next Saturday?
c) A total of exactly two airplanes will arrive between 13 00 hours and 14 00 hours during the next three Saturdays?
Solution
a) µ = 3, and we let X be the number of arrivals during the specified time period. Then
$$P(0) = \frac{3^0 e^{-3}}{0!} \approx 0.049787068 \approx 0.0498$$
(From the table, we have 0.049787.)
b)
$$P(X = 1 \text{ or } X = 2) = P(X = 1) + P(X = 2) = \frac{3^1 e^{-3}}{1!} + \frac{3^2 e^{-3}}{2!} = e^{-3}\left(3 + \frac{9}{2}\right) = \left(\frac{15}{2}\right)(0.049787068) = 0.37340301 \approx 0.3734$$
c) A total of exactly two arrivals in three Saturdays during the period 13 00 hours to 14 00 hours can be obtained in several ways, for example by having two arrivals on the first day, none on the second day and none on the third day during the specified one-hour period.
The total number of ways in which the event in question can occur is shown in the table below.

| Arrivals, Saturday 1 | Arrivals, Saturday 2 | Arrivals, Saturday 3 |
|---|---|---|
| 2 | 0 | 0 |
| 0 | 2 | 0 |
| 0 | 0 | 2 |
| 1 | 1 | 0 |
| 1 | 0 | 1 |
| 0 | 1 | 1 |

Number of ways of obtaining a total of exactly 2 arrivals in 3 Saturdays.
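Treating the arrivals on different Saturdays as independent, the required probability is
$$3[P(X = 2)][P(X = 0)]^2 + 3[P(X = 1)]^2[P(X = 0)] = 3\left(\frac{3^2 e^{-3}}{2!}\right)\left(\frac{3^0 e^{-3}}{0!}\right)^2 + 3\left(\frac{3^1 e^{-3}}{1!}\right)^2\left(\frac{3^0 e^{-3}}{0!}\right)$$
$$= 3e^{-9}\left(\frac{9}{2} + 9\right) = \frac{81e^{-9}}{2} \approx 40.5(0.00012341) \approx 0.0049981 \approx 0.005$$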
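Part (c) can be checked by enumerating every split of the two arrivals over the three Saturdays, as in the table above. The Python sketch below does this, assuming (as the text does) that the arrivals on different Saturdays are independent. It also adds a cross-check that the text does not use: a sum of independent Poisson variables is itself Poisson. The total over three Saturdays should therefore be Poisson with mean 3 × 3 = 9.

```python
from itertools import product
from math import exp, factorial

mu = 3  # mean arrivals per Saturday between 13 00 and 14 00 hours


def poisson_pmf(x, mu):
    """P(x) = mu^x e^(-mu) / x!"""
    return mu ** x * exp(-mu) / factorial(x)


# Every (Saturday 1, Saturday 2, Saturday 3) split of the arrivals that totals 2.
ways = [w for w in product(range(3), repeat=3) if sum(w) == 2]
by_enumeration = sum(
    poisson_pmf(a, mu) * poisson_pmf(b, mu) * poisson_pmf(c, mu) for a, b, c in ways
)

# Cross-check (added fact): the total over three independent Saturdays is Poisson(9).
by_total_rate = poisson_pmf(2, 3 * mu)

print(len(ways), round(by_enumeration, 4), round(by_total_rate, 4))  # 6 0.005 0.005
```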
5.5
Continuous Random Variables
The probability distributions of continuous random variables are also important in statistical theory. They are a theoretical representation of a continuous random variable such as the time taken in minutes to do some work, or the mass in grammes of a bag of salt.
A continuous random variable is specified by its probability density function, which is written f(x), where f(x) ≥ 0 throughout the range of values for which x is defined. The probability density function (p.d.f.) can be represented by a curve, and probabilities are given by areas under the curve.
For a continuous random variable X that assumes values in the interval a < x < b, the probability that X lies between c and d (where a ≤ c < d ≤ b) is
$$P(c < X < d) = \int_c^d f(x)\,dx,$$
assuming the integral exists. Similar to the requirements for a discrete probability distribution, we require
$$f(x) \ge 0 \quad \text{and} \quad \int_a^b f(x)\,dx = 1.$$
If X is a continuous random variable with p.d.f. f(x), then
$$\text{var}(X) = \int_a^b x^2 f(x)\,dx - \mu^2, \qquad \text{where } \mu = E(X) = \int_a^b x f(x)\,dx.$$
The standard deviation of X is often written as $\sigma = \sqrt{\text{var}(X)}$.
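The integrals above can be approximated numerically when no closed form is handy. The Python sketch below uses a simple midpoint rule; the helper name `integrate` is illustrative. The density f(x) = 2x on 0 < x < 1 is a hypothetical example, not taken from the text. It is chosen because the exact answers are easy to verify: the total area is 1, $E(X) = 2/3$, $\text{var}(X) = 1/2 - 4/9 = 1/18$ and $P(0.25 < X < 0.5) = 0.1875$.

```python
def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of g from a to b."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


# Hypothetical density for illustration only: f(x) = 2x on 0 < x < 1.
def f(x):
    return 2 * x


a, b = 0.0, 1.0
total = integrate(f, a, b)                                 # should equal 1
mu = integrate(lambda x: x * f(x), a, b)                   # E(X)
var = integrate(lambda x: x ** 2 * f(x), a, b) - mu ** 2   # var(X) = E(X^2) - mu^2
prob = integrate(f, 0.25, 0.5)                             # P(0.25 < X < 0.5)

print(round(total, 6), round(mu, 6), round(var, 6), round(prob, 6))
```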
5.6
The Normal Distribution
The normal distribution plays a central role in statistical theory and practice, particularly in the area of statistical inference.
An important characteristic of the normal distribution is that we need to know only the mean and standard deviation to specify the entire distribution.
The normal probability distribution is defined by the equation
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty$$
The normal distribution is perfectly symmetric about its mean µ.
Computing areas over intervals under the normal probability distribution is a difficult task. As a result, we will use the computed areas listed in Table 3.
Example 1
Suppose you have a normal random variable x with µ = 50 and σ = 15. Find the probability that x will fall within the interval 30 < x < 70.
Solution
We compute the Z-score (or standard score) for the measurement x. The standard score is defined by
$$Z = \frac{\text{Value} - \text{Mean}}{\text{Standard deviation}} = \frac{x - \mu}{\sigma}$$
Thus
$$Z = \frac{30 - 50}{15} = -1.33$$
Because x = 30 lies to the left of the mean, the corresponding Z-score should be negative and of the same numerical value as the Z-score corresponding to x = 70:
$$Z = \frac{70 - 50}{15} = \frac{20}{15} = 1.33$$
[Figure: Normal frequency function f(x), µ = 50, σ = 15, with the area A and the points x = 30, 50 and 70 marked.]
To find the area corresponding to a Z-score of 1.33, we first locate the value 1.3 in the left-hand column. Since this column lists Z values to one decimal place only, we refer to the top row of the table to get the second decimal place, 0.03. Finally, we locate the number where the row labeled Z = 1.3 and the column labeled 0.03 meet. This number represents the area between the mean µ and the measurement that has a Z-score of 1.33:
A = 0.4082
That is, the probability that x will fall between 50 and 70 is 0.4082. By symmetry, the probability that x falls between 30 and 50 is also 0.4082, so the required probability is 2(0.4082) = 0.8164.
Example 2
Use Table 1 to determine the area to the right of the Z-score 1.64 for the standard normal distribution, i.e., find P(Z > 1.64).
Solution
[Figure: Standard normal distribution, µ = 0, σ = 1, with the area A between 0 and 1.64 marked.]
The probability that a normal random variable will fall more than 1.64 standard deviations to the right of its mean is indicated in the figure above. Because the normal distribution is symmetric, half of the total probability (0.5) lies to the right of the mean and half to the left. Therefore, the desired probability is
$$P(Z > 1.64) = 0.5 - A,$$
where A is the area between µ = 0 and Z = 1.64, as shown in the figure.
Referring to Table 1, the area A corresponding to Z = 1.64 is 0.4495, so
$$P(Z > 1.64) = 0.5 - A = 0.5 - 0.4495 = 0.0505.$$
Example 3
Find the probability that the value of the standard normal variable will be between −1.23 and +1.14.
Solution
Table 1 shows that the area under the standard normal curve between 0 and 1.23 is 0.3907, so the area between 0 and −1.23 must also be 0.3907. Table 1 also shows that the area between 0 and 1.14 is 0.3729. Thus, the area between −1.23 and +1.14 equals 0.3907 + 0.3729 = 0.7636, which means that the probability we want equals 0.7636.
[Figure: standard normal curve with the area between −1.23 and +1.14 marked.]
Example 4
Find the probability that the value of the standard normal variable will be between 0.43 and 1.55.
Solution
[Figure: standard normal curve with the area between 0.43 and 1.55 marked.]
From Table 1, the area between 0 and 1.55 is 0.4394 and that between 0 and 0.43 is 0.1664. Therefore the area between 0.43 and 1.55 is 0.4394 − 0.1664 = 0.2730.
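Instead of looking up Table 1, the standard normal cumulative probability can be computed from the error function: $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$. The Python sketch below repeats Examples 1 to 4 this way, using the same area arguments as the text. The helper names are illustrative. Results agree with the table answers to within about 0.0001, because the table is rounded. It also shows that in Example 1 the unrounded z-scores (±4/3 rather than ±1.33) give about 0.818.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def area_from_zero(z):
    """Area between 0 and z, the quantity listed in the normal table."""
    return std_normal_cdf(abs(z)) - 0.5


# Example 1: mu = 50, sigma = 15, P(30 < X < 70) with z rounded to 1.33 as in the text.
ex1 = 2 * area_from_zero(1.33)
# The same probability using the unrounded z-scores (x - mu) / sigma.
ex1_unrounded = std_normal_cdf((70 - 50) / 15) - std_normal_cdf((30 - 50) / 15)
# Example 2: P(Z > 1.64) = 0.5 - A.
ex2 = 0.5 - area_from_zero(1.64)
# Example 3: P(-1.23 < Z < 1.14), adding the areas on either side of 0.
ex3 = area_from_zero(-1.23) + area_from_zero(1.14)
# Example 4: P(0.43 < Z < 1.55), subtracting two areas measured from 0.
ex4 = area_from_zero(1.55) - area_from_zero(0.43)

for label, value in [("Ex 1", ex1), ("Ex 1, unrounded z", ex1_unrounded),
                     ("Ex 2", ex2), ("Ex 3", ex3), ("Ex 4", ex4)]:
    print(label, round(value, 4))
```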
The Normal Distribution As An Approximation To The Binomial Distribution
If n (the number of trials) is large and p (the probability of success) is not too close to 0 or 1, the probability distribution of the number of successes occurring in n Bernoulli trials can be approximated by a normal distribution with mean np and standard deviation $\sqrt{np(1 - p)}$. Experience indicates that the approximation is fairly accurate as long as $np > 5$ when $p \le \tfrac{1}{2}$, and $n(1 - p) > 5$ when $p > \tfrac{1}{2}$.
Example 5
The probability that a machine will be down for repairs next week is 1/2. A firm has 100 such machines, and whether one is down is statistically independent of whether another is down. What is the probability that at least 60 machines will be down?
Solution
The number of machines down for repair has a binomial distribution with mean equal to $100\left(\tfrac{1}{2}\right) = 50$ and standard deviation equal to $\sqrt{100\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)} = 5$. Because of the continuity correction, the probability that the number down for repairs is 60 or more can be approximated by the probability that the value of a normal variable with mean 50 and standard deviation 5 exceeds 59.5. The value of the standard normal variable corresponding to 59.5 is (59.5 − 50) ÷ 5 = 1.9. Table 3 shows that the area under the standard normal curve between zero and 1.9 is 0.4713, so the area to the right of 1.9 must equal 0.5000 − 0.4713 = 0.0287. This is the probability that at least 60 machines will be down for repair.
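The Python sketch below compares the normal approximation (with the continuity correction) for the 100 machines against the exact binomial tail. The exact tail is computed by summing $\binom{100}{x}(1/2)^{100}$ for x = 60, ..., 100. The normal cumulative probability again comes from the error function instead of Table 3. The variable names are illustrative.

```python
from math import comb, erf, sqrt


def std_normal_cdf(z):
    """P(Z <= z) for a standard normal variable, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


n, p = 100, 0.5
mu = n * p                     # 50
sigma = sqrt(n * p * (1 - p))  # 5

# Normal approximation with continuity correction: P(X >= 60) is taken as P(Y > 59.5).
approx = 1 - std_normal_cdf((59.5 - mu) / sigma)
# Exact binomial tail for comparison.
exact = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(60, n + 1))

print(round(approx, 4), round(exact, 4))
```

The two values differ only in the fourth decimal place. This illustrates why the approximation is described as fairly accurate when np and n(1 − p) are both well above 5.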
Learning Objectives
After working through this Chapter, you should be able to:
• Give the formal definition of a random variable, and distinguish between a random variable and the values it takes.
• Explain the difference between continuous and discrete random variables.
• Discuss distributions such as the binomial, Poisson and normal, and calculate probabilities of events for such random variables.
• Find the mean and the variance of the binomial, Poisson and normal distributions.
Sample Examination Questions
1.
a) It is estimated that 75% of a grapefruit crop is good; the other 25% have rotten centers that cannot be detected unless the grapefruit is cut open. The grapefruit are sold in sacks of 6. Let r be the number of good grapefruit in a sack.
i) Make a histogram of the probability distribution of r.
ii) What is the probability of getting no more than one bad grapefruit in a sack?
iii) What is the probability of getting at least one good grapefruit in a sack?
iv) What is the expected number of good grapefruit in a sack?
v) What is the standard deviation of the r probability distribution?
b) Let x have a normal distribution with µ = 10 and σ = 2. Find the probability that an x value selected at random from the distribution is between 11 and 14.
2.
a) In a lottery, you pay K12 500 to choose a number (integer) between 0 and 9999, inclusive. If the number is drawn, you win K12 500 000. What is your expected gain (or loss) per play?
b) A large hotel knows that on average 2% of its customers require a special diet for medical reasons. It is hosting a conference for 500 people.
i) Which probability distribution would you suggest for calculating the exact probability that no customer at the conference will require a special diet? Calculate this probability.
ii) Which probability distribution do you suggest as an approximation to this, and why? Calculate an approximate probability that no customers require a special diet.
iii) Compare your answers to (i) and (ii).
iv) From past records the hotel knows that 0.2% of its customers will require medical attention while staying in the hotel. Calculate the exact and approximate probability that no customer out of the 500 will require medical attention while attending the conference. Is this approximation better or worse than the approximation used in (ii)? Why?
3.
a) The table below shows the probabilities for the number of complaints received each day by a newspaper agency from customers not receiving a paper.

| No. of complaints | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|
| Probability | 0.35 | 0.42 | 0.18 | 0.03 | 0.02 |

i) Find the mean and standard deviation of the number of complaints.
ii) The agency states the cost (in kwacha) of daily complaints to be C = 600 + 300x, where x is the number of complaints. Find the mean and standard deviation of the cost of daily complaints.
b) A writer has prepared six articles to submit for publication. The probability of any article being accepted is 0.20. Assuming independence, find the probability that the writer will have
i) exactly one article accepted.
ii) at least two articles accepted.
iii) no more than three articles accepted.
iv) at most two articles accepted.
4.
a) A Toyota dealer wishes to know how many citations to order for the coming month. Estimated demand is normally distributed, with a mean of 120 and a standard deviation of 20.
i) What is the probability that he will need more than 160?
ii) What is the probability that he will need fewer than 90?
b) A client wishes to know what price he might be able to get for a business property. The realtor estimates that a sale price of K600 million for that property would be exceeded no more than 5% of the time, and that a price of at least K420 million should be obtained at least 90% of the time. Assuming the distribution of sale prices to be normal, answer the following questions.
i) What are µ and σ for this distribution?
ii) What is the probability of a sale price greater than K540 million, less than K640 million, and between K540 million and K600 million?
5.
a) Which of the following are continuous variables, and which are discrete variables?
i) Number of traffic fatalities per year in the town of Livingstone.
ii) Distance a ball travels after being kicked by a soccer player.
iii) Time required to drive from home to campus on any given day.
iv) Number of cars in Kitwe on any given day.
v) Your weight before breakfast each morning.
b) The ABCD Mother-in-law sociologists say that 80% of married women claim that their husbands' mothers are the biggest bones of contention in their marriages (sex and money are lower-rated areas of contention). Suppose that five married women are having lunch together one afternoon. What is the probability that:
i) All of them dislike their mothers-in-law?
ii) None of them dislikes her mother-in-law?
iii) At least four of them dislike their mothers-in-law?
iv) No more than three of them dislike their mothers-in-law?
c) The Mulenga Café has found that about 6% of the parties who make reservations don't show up. If 90 party reservations have been made, how many can be expected to show up? Find the standard deviation of this distribution.
6.
a) The mean and standard deviation on an examination are 85 and 15 respectively. Find the scores in standard units of students receiving grades
i) 65
ii) 89
b) Determine the probabilities
i) $P(Z \le 2.12)$
ii) $P(-16 \le Z < 1.13)$
where Z is assumed to be normal with mean 0 and variance 1.
c) What is the probability of obtaining at least 1280 heads if a coin is tossed 2500 times and heads and tails are equally likely?
d) The side effects of a certain drug cause discomfort to only a few patients. The probability that any individual will suffer from the side effects is 0.005. If the drug is given to 35 000 patients, what is the probability that three (3) will suffer side effects?
7.
a) The customer service center in a large Luksa department store has determined that the amount of time spent with a customer with a complaint is normally distributed with a mean of 9.3 minutes and a standard deviation of 2.5 minutes. What is the probability that, for a randomly chosen customer with a complaint, the amount of time spent resolving the complaint will be:
i) less than 10 minutes?
ii) more than 5 minutes?
iii) between 8 and 15 minutes?
b) A car rental company has determined that the probability a car will need service work in any given month is 0.25. The company has 850 cars.
i) What is the probability that more than 150 cars will require service work in a particular month?
ii) What is the probability that fewer than 180 cars will need service work in a given month? (Give reasons for the method used to calculate the probabilities in (i) and (ii).)
c) A contractor estimates the probabilities for the number of days required to complete a certain type of construction project as follows.

| Time (days) | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Probability | 0.04 | 0.21 | 0.34 | 0.31 | 0.10 |

i) What is the probability that a randomly chosen project will take less than 3 days to complete?
ii) Find the expected time to complete a project.
iii) Find the standard deviation of the time required to complete a project.
iv) The contractor's project cost is made up of two parts: a fixed cost of K100 000 000 plus K10 000 000 for each day taken to complete the project. Find the standard deviation of total project costs.
CHAPTER 6
SAMPLING AND SAMPLING DISTRIBUTION
Reading
Newbold Chapter 6
Wonnacott and Wonnacott Chapter 6
Tailoka Frank P Chapter 10
James T Mc Clave and P George Benson Chapter 7
Introductory Comments
We now start on the work that makes Statistics a distinct and unique
subject. The idea of sampling and of the sampling distribution of a statistic such as the mean must
be clearly understood by all users of statistics. This is not an easy Chapter to understand.
6. Sampling Theory
Sampling and Sampling Distribution
6.1 Sampling
If we draw an object from a box, we have the choice of replacing or not replacing
the object in the box before we draw again. In the first case a particular object
can come up again and again, whereas in the second it can come up only once.
Sampling where each member of a population may be chosen more than once is
called sampling with replacement, while sampling where each member cannot be
chosen more than once is called sampling without replacement.
Random Samples. Random Numbers
Clearly the reliability of conclusions drawn concerning a population depends on
whether the sample is properly chosen so as to represent the population
sufficiently well, and one of the important problems of statistical inference is just
how to choose a sample.
For a finite population, one way to do this is to make sure that each member
of the population has the same chance of being in the sample; such a sample is
called a random sample. Random sampling can be accomplished for relatively
small populations by drawing lots or, equivalently, by using a table of random
numbers specially constructed for such purposes.
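Today a pseudo-random number generator usually takes the place of a printed table of random numbers. The minimal sketch below uses only Python's standard `random` module; the ten-member population and the seed are arbitrary assumptions made for illustration. It draws one sample of five members with replacement and one without replacement.

```python
import random

# Hypothetical population of 10 labelled members (an assumption for illustration).
population = list(range(1, 11))

rng = random.Random(2024)  # fixed seed so the draws are reproducible

# Sampling WITH replacement: a member may be chosen more than once.
with_replacement = rng.choices(population, k=5)

# Sampling WITHOUT replacement: each member can be chosen at most once,
# and every subset of size 5 is equally likely (a simple random sample).
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```

Running it several times with different seeds shows repeats appearing only in the first list.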
Because inference from sample to population cannot be certain, we must use the
language of probability in any statement of conclusions.
6.2 Sampling Distributions
As we have seen, a sample statistic that is computed from $X_1, \ldots, X_n$ is a
function of these random variables and is therefore itself a random variable. The
probability distribution of a sample statistic is often called the sampling
distribution of the statistic.
Alternatively, we can consider all possible samples of size n that can be drawn
from the population, and for each sample compute the statistic. In this manner
we obtain the distribution of the statistic, which is its sampling distribution.
For a sampling distribution, we can of course compute a mean, variance, standard
deviation, etc. The standard deviation is sometimes also called the standard error.
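To make the idea concrete, the sketch below lists every possible sample of size 2 drawn with replacement from a small hypothetical population (the values 1, 3, 5, 7 are invented for illustration), computes the mean of each sample, and then finds the mean and standard deviation (standard error) of those sample means. Only Python's standard library is assumed. The standard error should agree with σ/√n, the result stated in Theorem 6.2.

```python
import math
from itertools import product
from statistics import mean, pstdev

# Small hypothetical population (not taken from the text).
population = [1, 3, 5, 7]
n = 2  # sample size

# Every ordered sample of size n drawn WITH replacement is equally likely.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# Mean and standard deviation (standard error) of the sampling distribution.
mean_of_means = mean(sample_means)
standard_error = pstdev(sample_means)

print(len(samples), mean_of_means, round(standard_error, 4))
print(round(pstdev(population) / math.sqrt(n), 4))  # sigma / sqrt(n), for comparison
```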
The Sample Mean
Let $X_1, X_2, \ldots, X_n$ denote the independent, identically distributed random
variables for a random sample of size n as described above. Then the mean of the
sample, or sample mean, is a random variable defined by
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \qquad (1)$$
If $x_1, x_2, \ldots, x_n$ denote the values obtained in a particular sample of size n, then the mean
for that sample is denoted by
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \qquad (2)$$
Sampling Distributions of Means
Let $f(x)$ be the probability distribution of some given population from which we draw a
sample of size n. Then it is natural to look for the probability distribution of the sample
statistic $\bar{X}$, which is called the sampling distribution of the sample mean, or the
sampling distribution of means. The following theorems are important in this connection.
Theorem 6.1
The mean of the sampling distribution of means, denoted by $\mu_{\bar{X}}$, is given by
$$\mu_{\bar{X}} = E(\bar{X}) = \mu \qquad (3)$$
where $\mu$ is the mean of the population. Theorem 6.1 states that the expected value of
the sample mean is the population mean.
Theorem 6.2
If a population is infinite and the sampling is random, or if the population is finite and
sampling is with replacement, then the variance of the sampling distribution of means,
denoted by $\sigma_{\bar{X}}^2$, is given by
$$E\big[(\bar{X} - \mu)^2\big] = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \qquad (4)$$
Theorem 6.3
If the population is of size N, if sampling is without replacement, and if the sample size
is $n \le N$, then (4) is replaced by
$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) \qquad (5)$$
while $\mu_{\bar{X}}$ is still given by Theorem 6.1.
Note that Theorem 6.3 reduces to Theorem 6.2 as $N \to \infty$, since $(N-n)/(N-1) \to 1$.
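Theorem 6.3 can be checked by brute force on a tiny population. The sketch below uses a hypothetical population of five values and Python's standard library only; it enumerates every sample of size 2 drawn without replacement and compares the variance of the sample means with formula (5), taking σ² as the population variance computed with divisor N.

```python
import math
from itertools import combinations
from statistics import mean, pvariance

# Hypothetical finite population of size N (an assumption for illustration).
population = [2, 4, 6, 8, 10]
N, n = len(population), 2

# All samples of size n drawn WITHOUT replacement (each subset equally likely).
sample_means = [mean(s) for s in combinations(population, n)]

# Variance of the sampling distribution obtained by enumeration ...
enumerated_var = pvariance(sample_means)

# ... and the value predicted by the finite-population formula (sigma^2 with divisor N).
sigma2 = pvariance(population)
theorem_var = sigma2 / n * (N - n) / (N - 1)

print(enumerated_var, theorem_var)  # both are 3 for this population
```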
Theorem 6.4
If the population from which samples are taken is normally distributed with mean $\mu$ and
variance $\sigma^2$, then the sample mean is normally distributed with mean $\mu$ and variance
$\sigma^2/n$.
Theorem 6.5
Suppose that the population from which samples are taken has a probability distribution with mean $\mu$
and variance $\sigma^2$ that is not necessarily a normal distribution. Then the standardized
variable associated with $\bar{X}$, given by
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (6)$$
is asymptotically normal, i.e.
$$\lim_{n \to \infty} P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\, du \qquad (7)$$
Theorem 6.5 is a statement of the Central Limit Theorem. It is assumed here that the
population is infinite or that sampling is with replacement. Otherwise, the above is
correct if we replace $\sigma/\sqrt{n}$ in Theorem 6.5 by $\sigma_{\bar{X}}$ as given by Theorem 6.3, i.e. by the square root of (5).
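A quick simulation illustrates Theorem 6.5. The sketch below assumes an exponential population with mean 1 and standard deviation 1, which is strongly skewed and clearly not normal; the sample size, the number of repetitions and the seed are arbitrary choices. It standardizes each sample mean as in (6) and compares the simulated P(Z ≤ 1) with the standard normal value.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(1)  # fixed seed for reproducibility

# Hypothetical non-normal population: exponential with mean 1 and SD 1.
mu, sigma = 1.0, 1.0
n, reps = 40, 10_000

z_values = []
for _ in range(reps):
    # Sample mean of n draws, then standardize it as in equation (6).
    xbar = math.fsum(rng.expovariate(1.0) for _ in range(n)) / n
    z_values.append((xbar - mu) / (sigma / math.sqrt(n)))

# Compare the simulated P(Z <= 1) with the standard normal value from (7).
simulated = sum(z <= 1.0 for z in z_values) / reps
normal = NormalDist().cdf(1.0)
print(round(simulated, 3), round(normal, 3))
```

The two printed numbers should be close but, being a simulation, not identical.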
Example 1.0
Five hundred ball bearings have a mean weight of 5.02kg and a standard deviation of
0.30kg. Find the probability that a random sample of 100 ball bearings chosen from this
group will have a combined weight of more than 510kg.
For the sampling distribution of means, $\mu_{\bar{X}} = \mu = 5.02$kg, and, since the 100 bearings are drawn without replacement from a finite population of N = 500,
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{0.30}{\sqrt{100}}\sqrt{\frac{500-100}{500-1}} = 0.027\text{kg}$$
The combined weight will exceed 510kg if the mean weight of the 100 bearings exceeds
5.10kg.
5.10 in standard units $= \dfrac{5.10 - 5.02}{0.027} = 2.96$
The required probability is the area to the right of z = 2.96.
[Figure 6.1: standard normal curve with the area to the right of z = 2.96 shaded]
The probability is 0.5 − 0.4985 = 0.0015. Therefore, there are only 3 chances in 2000 of
picking a sample of 100 ball bearings with a combined weight exceeding 510kg.
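The arithmetic of Example 1.0 can be reproduced with Python's `statistics.NormalDist` (Python 3.8 or later). This is a sketch; because it does not round the standard error to 0.027, z comes out near 2.98 rather than 2.96, and the probability is close to, though not exactly equal to, the table value of 0.0015.

```python
import math
from statistics import NormalDist

# Data from Example 1.0 (weights in kg).
N, n = 500, 100          # population size and sample size
mu, sigma = 5.02, 0.30   # population mean and standard deviation

# Standard error with the finite population correction (Theorem 6.3).
se = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

# Combined weight > 510 kg  <=>  sample mean > 5.10 kg.
z = (5.10 - mu) / se
prob = 1 - NormalDist().cdf(z)
print(round(se, 4), round(z, 2), prob)
```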
Sampling Distribution of Proportions
Suppose that a population is infinite and binomially distributed, with p and q = 1 − p being
the respective probabilities that any given member exhibits or does not exhibit a certain
property. For example, the population may be all possible tosses of a fair coin, in which
the probability of the event heads is p = 1/2.
Consider all possible samples of size n drawn from this population, and for each sample
determine the statistic that is the proportion $\hat{p}$ of successes. In the case of the coin, $\hat{p}$
would be the proportion of heads turning up in n tosses. Then we obtain a sampling
distribution whose mean $\mu_{\hat{p}}$ and standard deviation $\sigma_{\hat{p}}$ are given by
$$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}} \qquad (8)$$
For large values of n (n ≥ 30) the sampling distribution is very nearly a normal
distribution, as seen from Theorem 6.5.
For finite populations in which sampling is without replacement, $\sigma_{\hat{p}}$ in (8) is replaced by
$\sigma_{\bar{X}}$ as given by Theorem 6.3 with $\sigma = \sqrt{pq}$, that is,
$\sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}\sqrt{\dfrac{N-n}{N-1}}$.
Sampling Distribution of Differences and Sums
Suppose that we are given two populations. For each sample of size $n_1$ drawn from the first
population, let us compute a statistic $S_1$. This yields a sampling distribution for $S_1$,
whose mean and standard deviation we denote by $\mu_{S_1}$ and $\sigma_{S_1}$, respectively. Similarly, for
each sample of size $n_2$ drawn from the second population, let us compute a statistic $S_2$,
whose mean and standard deviation are $\mu_{S_2}$ and $\sigma_{S_2}$, respectively.
Taking all possible combinations of these samples from the two populations, we can
obtain a distribution of the differences $S_1 - S_2$, which is called the sampling distribution
of differences of the statistics. The mean and standard deviation of this sampling distribution,
denoted respectively by $\mu_{S_1 - S_2}$ and $\sigma_{S_1 - S_2}$, are given by
$$\mu_{S_1 - S_2} = \mu_{S_1} - \mu_{S_2}, \qquad \sigma_{S_1 - S_2} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2} \qquad (9)$$
provided that the samples chosen do not in any way depend on each other, i.e., the
samples are independent (in other words, the random variables $S_1$ and $S_2$ are
independent).
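If, for example, $S_1$ and $S_2$ are the sample means from two populations, denoted by $\bar{X}_1$ and $\bar{X}_2$,
respectively, then the sampling distribution of the differences of means is given, for
infinite populations with means and standard deviations $\mu_1, \sigma_1$ and $\mu_2, \sigma_2$, respectively, by
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_{\bar{X}_1} - \mu_{\bar{X}_2} = \mu_1 - \mu_2 \qquad (10)$$
and
$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (11)$$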
These results follow from Theorems 6.1 and 6.2. They also hold for finite populations if sampling is
done with replacement. The standardized variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \qquad (12)$$
is in that case very nearly normally distributed if $n_1$ and $n_2$ are large ($n_1, n_2 \ge 30$).
Similar results can be obtained for finite populations in which sampling is without
replacement by using Theorems 6.1 and 6.3.
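Corresponding results can be obtained for sampling distributions of differences of
proportions from two binomially distributed populations with parameters
$p_1, q_1$ and $p_2, q_2$. The mean and standard deviation of the difference of the sample proportions $\hat{p}_1 - \hat{p}_2$ are given by
$$\mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2 \qquad (13)$$
and
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \qquad (14)$$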
Instead of taking differences of statistics, we are sometimes interested in the sum of the
statistics. In that case, the sampling distribution of the sum of statistics $S_1$ and $S_2$ has mean
and standard deviation given by
$$\mu_{S_1 + S_2} = \mu_{S_1} + \mu_{S_2}, \qquad \sigma_{S_1 + S_2} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2} \qquad (15)$$
assuming the samples are independent. Results similar to those for $\mu_{\bar{X}_1 - \bar{X}_2}$ and $\sigma_{\bar{X}_1 - \bar{X}_2}$ can then be
obtained for sums of sample means.
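The following simulation sketch checks the mean and standard deviation of the difference of two independent sample means, equations (10) and (11). The two normal populations, the sample sizes and the seed are hypothetical; the simulated values should be close to the formula values, not identical to them.

```python
import math
import random
from statistics import pstdev

rng = random.Random(7)  # fixed seed for reproducibility

# Two hypothetical normal populations; all parameters are assumptions.
mu1, sigma1, n1 = 50.0, 6.0, 36
mu2, sigma2, n2 = 45.0, 8.0, 64

# Draw many independent pairs of samples and record xbar1 - xbar2 each time.
diffs = []
for _ in range(5_000):
    xbar1 = math.fsum(rng.gauss(mu1, sigma1) for _ in range(n1)) / n1
    xbar2 = math.fsum(rng.gauss(mu2, sigma2) for _ in range(n2)) / n2
    diffs.append(xbar1 - xbar2)

simulated_mean = math.fsum(diffs) / len(diffs)
simulated_sd = pstdev(diffs)
formula_mean = mu1 - mu2                                  # equation (10)
formula_sd = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # equation (11)
print(round(simulated_mean, 2), formula_mean)
print(round(simulated_sd, 3), round(formula_sd, 3))
```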
Example 2
It has been found that 2% of the tools produced by a certain machine are defective. What
is the probability that in a shipment of 400 such tools, 3% or more will prove defective?
$$\mu_{\hat{p}} = p = 0.02, \qquad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.02(0.98)}{400}} = \frac{0.14}{20} = 0.007$$
$$P(\hat{p} \ge 0.03) = P\left(Z \ge \frac{0.03 - 0.02}{0.007}\right) = P(Z \ge 1.43) = 0.5000 - 0.4236 = 0.0764$$
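Example 2 can be checked with a few lines of Python (standard library only). This sketch uses the unrounded z (about 1.4286), so the tail probability comes out near 0.077 instead of the table value 0.0764 obtained with z rounded to 1.43.

```python
import math
from statistics import NormalDist

p, n = 0.02, 400                  # defective rate and shipment size
se = math.sqrt(p * (1 - p) / n)   # sqrt(pq/n) = 0.007
z = (0.03 - p) / se               # about 1.43
prob = 1 - NormalDist().cdf(z)    # upper-tail area
print(round(se, 4), round(z, 2), round(prob, 4))
```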
Learning Objectives
After working through this Chapter, you should be able to:
• Give the formal definition of a random variable, and distinguish between a
random variable and the values it takes.
• Explain the difference between continuous and discrete random variables.
• Discuss distributions such as the binomial, Poisson and normal, and calculate probabilities
of events for such random variables.
• Find the mean and the variance of the binomial, Poisson and normal distributions.
CHAPTER 7
ESTIMATION
Reading
Newbold Chapter 7
Wonnacott and Wonnacott Chapter 7
Tailoka Frank P Chapter 10
Introductory Comments
We need to know how the mean of the population is related to the sample mean,
and what characteristics the sample mean must have. We need to know whether the sample
is likely to give us an estimate close to the population value. To tell us this, we use
confidence intervals.
7. Estimation Theory
7.1 Unbiased Estimates and Efficient Estimates
A statistic is called an unbiased estimator of a population parameter if the mean or
expectation of the statistic is equal to the parameter. The corresponding value of
the statistic is then called an unbiased estimate of the parameter.
If the sampling distributions of two statistics have the same mean, the statistic with
the smaller variance is called a more efficient estimator of that mean. The
corresponding value of the more efficient statistic is then called an efficient estimate.
Clearly one would in practice prefer to have estimators that are both efficient and
unbiased, but this is not always possible.
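Efficiency can be seen in a simulation. For a normal population, both the sample mean and the sample median are unbiased estimators of μ, but the mean has the smaller sampling variance. The sketch below assumes a normal population with mean 10 and standard deviation 2, samples of size 25 and a fixed seed; all of these are illustrative choices.

```python
import math
import random
from statistics import median, pvariance

rng = random.Random(3)  # fixed seed for reproducibility

# Hypothetical normal population (mean 10, SD 2) and sample size 25.
mu, sigma, n, reps = 10.0, 2.0, 25, 4_000

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(math.fsum(sample) / n)
    medians.append(median(sample))

# Both statistics are centred on mu here (the population is symmetric) ...
centre_of_means = math.fsum(means) / reps
centre_of_medians = math.fsum(medians) / reps
# ... but the sample mean has the smaller variance, so it is the more efficient one.
var_means, var_medians = pvariance(means), pvariance(medians)
print(round(centre_of_means, 2), round(centre_of_medians, 2))
print(round(var_means, 3), round(var_medians, 3))
```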
7.2 Point Estimates and Interval Estimates
An estimate of a population parameter given by a single number is called a point
estimate of the parameter. An estimate of a population parameter given by two
numbers between which the parameter may be considered to lie is called an
interval estimate of the parameter.
Example 7.0
If we say that a distance is 34.5km, we are giving a point estimate. If, on the
other hand, we say that the distance is 34.5 ± 0.04km, i.e., the distance lies
between 34.46 and 34.54km, we are giving an interval estimate.
A statement of the error or precision of an estimate is often called its reliability.
7.3 Confidence Interval Estimates of Population Parameters
Let $\mu_S$ and $\sigma_S$ be the mean and standard deviation (standard error) of the
sampling distribution of a statistic S. Then if the sampling distribution of S is
approximately normal (which we have seen is true for many statistics if the
sample size n ≥ 30), we can expect to find S lying in the intervals $\mu_S - \sigma_S$ to
$\mu_S + \sigma_S$, $\mu_S - 2\sigma_S$ to $\mu_S + 2\sigma_S$, or $\mu_S - 3\sigma_S$ to $\mu_S + 3\sigma_S$ about 68%, 95% and
99.7% of the time, respectively.
Equivalently, we can expect to find, or we can be confident of finding, $\mu_S$ in the
intervals $S - \sigma_S$ to $S + \sigma_S$, $S - 2\sigma_S$ to $S + 2\sigma_S$, or $S - 3\sigma_S$ to $S + 3\sigma_S$ about 68%,
95% and 99.7% of the time, respectively. Because of this, we call these respective
intervals the 68%, 95% and 99.7% confidence intervals for estimating $\mu_S$ (i.e., for
estimating the population parameter, in the case of an unbiased S). The end
points of these intervals ($S \pm \sigma_S$, $S \pm 2\sigma_S$, $S \pm 3\sigma_S$) are then called the 68%, 95%
and 99.7% confidence limits.
Similarly, $S \pm 1.96\sigma_S$ and $S \pm 2.58\sigma_S$ are the 95% and 99% confidence limits for
$\mu_S$. The percentage confidence is often called the confidence level. The numbers
1.96, 2.58, etc., in the confidence limits are called critical values and are denoted
by $Z_c$. From the confidence level we can find the critical value.
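Critical values can be computed from the confidence level rather than read from a table. The minimal sketch below uses `statistics.NormalDist.inv_cdf` (Python 3.8 or later); the helper name `critical_value` is hypothetical. The exact values for 68% and 99.7% come out at about 0.99 and 2.97, which the text rounds to 1 and 3.

```python
from statistics import NormalDist

def critical_value(confidence: float) -> float:
    """Two-sided critical value Z_c for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # Leave alpha/2 of the area in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.68, 0.95, 0.99, 0.997):
    print(level, round(critical_value(level), 2))
```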
7.4 Confidence Intervals for Means
We shall see how to create confidence intervals for the mean of a population
in two different cases. The first case is when we have a large sample
size (n ≥ 30), and the second is when we have a smaller sample
(n < 30) and the underlying population is normal.
Large Samples (n ≥ 30)
If the statistic S is the sample mean $\bar{x}$, then the 95% and 99% confidence limits
for estimation of the population mean $\mu$ are given by
$\bar{x} \pm 1.96\sigma_{\bar{x}}$ and $\bar{x} \pm 2.58\sigma_{\bar{x}}$, respectively.
More generally, the confidence limits are given by $\bar{x} \pm Z_c\sigma_{\bar{x}}$, where $Z_c$
depends on the particular level of confidence desired. The confidence limits for
the population mean are given by
$$\bar{x} \pm Z_c \frac{\sigma}{\sqrt{n}} \qquad (1)$$
in case of sampling from an infinite population or if sampling is done with
replacement from a finite population, and by
$$\bar{x} \pm Z_c \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \qquad (2)$$
if sampling is done without replacement from a population of finite size N.
In general, the population standard deviation $\sigma$ is unknown, so to obtain the
above confidence limits we use the sample estimate $\hat{s}$ or s in place of $\sigma$.
Example 2
Find a 95% confidence interval for estimating the mean height of the 1546 male
students at XYZ University by taking a sample of size 100. (Assume that the sample
mean $\bar{x}$ is 67.45cm and that the sample standard deviation $\hat{s}$ is
2.93cm.)
The 95% confidence limits are $\bar{x} \pm 1.96\dfrac{\sigma}{\sqrt{n}}$.
Using $\bar{x}$ = 67.45cm and $\hat{s}$ = 2.93cm as an estimate of $\sigma$, the confidence limits are
$$67.45 \pm 1.96\left(\frac{2.93}{\sqrt{100}}\right) \quad \text{or} \quad 67.45 \pm 0.57$$
Then the 95% confidence interval for the population mean $\mu$ is 66.88 to 68.02
cm, which can be denoted by 66.88 < $\mu$ < 68.02.
We can therefore say that the probability that the population mean height lies
between 66.88 and 68.02 cm is about 95%.
In symbols, we write
P(66.88 < $\mu$ < 68.02) = 0.95. This is equivalent to saying that we are 95%
confident that the population mean (true mean) lies between 66.88 and 68.02cm.
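The calculation in Example 2 can be reproduced as follows. Like the worked example, this sketch uses formula (1) with the sample standard deviation in place of σ and does not apply the finite population correction; the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist

xbar, s, n = 67.45, 2.93, 100        # sample mean, sample SD, sample size
z_c = NormalDist().inv_cdf(0.975)    # 95% critical value, about 1.96

half_width = z_c * s / math.sqrt(n)  # formula (1) with s in place of sigma
lower, upper = xbar - half_width, xbar + half_width
print(round(half_width, 2), round(lower, 2), round(upper, 2))
```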
7.5 Small Samples (n < 30) and Normal Population
In this case we use the t distribution to obtain confidence levels. For example, if
$-t_{0.025}$ and $t_{0.025}$ are the values of T for which 2.5% of the area lies in each tail of the t
distribution, then a 95% confidence interval for T is given by
$$-t_{0.025} < \frac{\bar{x} - \mu}{\hat{s}/\sqrt{n}} < t_{0.025} \qquad (3)$$
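From this we can see that $\mu$ can be estimated to lie in the interval
$$\bar{x} - t_{0.025}\frac{\hat{s}}{\sqrt{n}} < \mu < \bar{x} + t_{0.025}\frac{\hat{s}}{\sqrt{n}} \qquad (4)$$
with 95% confidence. In general, the confidence limits for population means are
given by
$$\bar{x} \pm t_c \frac{\hat{s}}{\sqrt{n}} \qquad (5)$$
where the $t_c$ values, for n − 1 degrees of freedom, can be read from Table 2.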
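For the small-sample case, a sketch of formula (5) follows. The Python standard library has no t distribution, so the critical value is typed in from a t table: t = 2.262 for 95% confidence with 9 degrees of freedom. The ten data values are hypothetical, and the population is assumed to be normal.

```python
import math
from statistics import mean, stdev

# Hypothetical small sample (n = 10) from a population assumed normal.
data = [10, 12, 11, 13, 9, 10, 12, 11, 14, 8]
n = len(data)

xbar = mean(data)
s_hat = stdev(data)  # sample SD with divisor n - 1

# t_c for 95% confidence and n - 1 = 9 degrees of freedom, read from a t table.
t_c = 2.262

half_width = t_c * s_hat / math.sqrt(n)  # formula (5)
print(xbar, round(xbar - half_width, 2), round(xbar + half_width, 2))
```

For other sample sizes, only `t_c` needs to change, to the table value for n − 1 degrees of freedom.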
7.6 Confidence Intervals for Proportions
Suppose that the statistic S is the proportion of “successes” in a sample of size
n ≥ 30 drawn from a binomial population in which p is the proportion of
successes (i.e., the probability of success). Then the confidence limits for p are
given by $P \pm Z_c\sigma_P$, where P denotes the proportion of successes in the sample of size
n. Using the value of $\sigma_P$ obtained in Chapter 6, we see that the confidence limits for
the population proportion are given by
$$P \pm Z_c\sqrt{\frac{pq}{n}} = P \pm Z_c\sqrt{\frac{p(1-p)}{n}} \qquad (6)$$
in case of sampling from an infinite population or if sampling is with replacement
from a finite population. Similarly, the confidence limits are
$$P \pm Z_c\sqrt{\frac{pq}{n}}\sqrt{\frac{N-n}{N-1}} \qquad (7)$$
if sampling is without replacement from a population of finite size N. Note that
these results are obtained from (1) and (2) on replacing $\bar{x}$ by P and $\sigma$ by $\sqrt{pq}$.
To compute the above confidence limits, we use the sample estimate P for p.
Example 3
A sample poll of 100 voters chosen at random from all voters in a given district
indicated that 55% of them were in favour of a particular candidate. Find the 99%
confidence limits for the proportion of all voters in favour of this candidate.
The 99% confidence limits for the population proportion p are
$$P \pm 2.58\sigma_P = P \pm 2.58\sqrt{\frac{P(1-P)}{n}} = 0.55 \pm 2.58\sqrt{\frac{0.55(0.45)}{100}} = 0.55 \pm 0.13$$
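Example 3 can be reproduced in a few lines, as a sketch: `p_hat` stands for the sample proportion P, and the 99% critical value is computed with `statistics.NormalDist` rather than taken as 2.58.

```python
import math
from statistics import NormalDist

p_hat, n = 0.55, 100                # sample proportion and sample size
z_c = NormalDist().inv_cdf(0.995)   # 99% critical value, about 2.58

# Formula (6) with the sample proportion used in place of p.
half_width = z_c * math.sqrt(p_hat * (1 - p_hat) / n)
print(round(half_width, 2), round(p_hat - half_width, 2), round(p_hat + half_width, 2))
```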
7.7 Confidence Intervals for Differences and Sums
If $S_1$ and $S_2$ are two sample statistics with approximately normal sampling
distributions, confidence limits for the difference of the population parameters
corresponding to $S_1$ and $S_2$ are given by
$$(S_1 - S_2) \pm Z_c\sigma_{S_1 - S_2} = (S_1 - S_2) \pm Z_c\sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2} \qquad (8)$$
while confidence limits for the sum of the population parameters are given by
$$(S_1 + S_2) \pm Z_c\sigma_{S_1 + S_2} = (S_1 + S_2) \pm Z_c\sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2} \qquad (9)$$
provided the samples are independent.
For example, confidence limits for the difference of two population means, in the
case where the populations are infinite and have known standard deviations
$\sigma_1, \sigma_2$, are given by
$$(\bar{x}_1 - \bar{x}_2) \pm Z_c\sigma_{\bar{x}_1 - \bar{x}_2} = (\bar{x}_1 - \bar{x}_2) \pm Z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (10)$$
where $\bar{x}_1, n_1$ and $\bar{x}_2, n_2$ are the respective means and sizes of the two samples
drawn from the populations.
Similarly, confidence limits for the difference of two population proportions,
where the populations are infinite, are given by
$$(P_1 - P_2) \pm Z_c\sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}} \qquad (11)$$
where $P_1$ and $P_2$ are the two sample proportions and $n_1$ and $n_2$ are the sizes of the
two samples drawn from the populations.
Example 4
In a random sample of 400 adults and 600 teenagers who watched a certain
television program, 100 adults and 300 teenagers indicated that they liked it.
Construct the 99.7% confidence limits for the difference in proportions of all
teenagers and all adults who watched the program and liked it.
Confidence limits for the difference in proportions of the two groups are given by
(11), where subscripts 1 and 2 refer to teenagers and adults, respectively, and
$Q_1 = 1 - P_1$, $Q_2 = 1 - P_2$. Here $P_1$ = 300/600 = 0.50 and $P_2$ = 100/400 = 0.25 are,
respectively, the proportions of teenagers and adults who liked the program.
The 99.7% confidence limits are given by
$$0.50 - 0.25 \pm 3\sqrt{\frac{(0.50)(0.50)}{600} + \frac{(0.25)(0.75)}{400}} = 0.25 \pm 0.09 \qquad (12)$$
Therefore, we can be 99.7% confident that the true difference in proportions lies between
0.16 and 0.34.
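A sketch reproducing Example 4 with formula (11); subscript 1 is teenagers and subscript 2 is adults, and the critical value 3 is taken from the text.

```python
import math

p1, n1 = 300 / 600, 600   # teenagers who liked the program
p2, n2 = 100 / 400, 400   # adults who liked the program
z_c = 3                   # 99.7% limits, as in the text

# Standard error of the difference of the two sample proportions, formula (11).
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff = p1 - p2
print(round(diff, 2), round(z_c * se, 2), round(diff - z_c * se, 2), round(diff + z_c * se, 2))
```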
Learning Objectives
After working through this Chapter you should be able to:
• Explain a point estimate and a confidence interval.
• Find confidence intervals for proportions and differences of proportions.
• Find confidence intervals for means of normal populations, and for differences of
means of two normal populations, both when the variance(s) are known and when
they are unknown.
CHAPTER 8
HYPOTHESIS TESTING
Reading
Newbold Chapter 9
Wonnacott and Wonnacott Chapter 9
Tailoka Frank P Chapter 10
Introductory Comments
We often need to answer questions about a population such as “Is the mean of the
population less than 5?” or “Is there any difference between two means?” In statistics we try
to answer these questions based on the information in samples. The material in this
Chapter is also useful in everyday life.
The theory of hypothesis testing is closely linked to that of confidence intervals.
8.0 Tests of Hypotheses and Significance
8.1 Statistical Decisions
Very often in practice we are called upon to make decisions about
populations on the basis of sample information. Such decisions are called
statistical decisions. For example, we may wish to decide on the basis of sample
data whether a new serum is really effective in curing a disease, whether one
educational procedure is better than another, or whether a given coin is loaded.
8.2 Statistical Hypotheses
In attempting to reach decisions, it is useful to make assumptions or
guesses about the populations involved. Such assumptions, which may or may
not be true, are called statistical hypotheses and in general are statements about
the probability distributions of the populations. For example, if we want to decide
whether a given coin is loaded, we formulate the hypothesis that the coin is fair,
i.e., p = 0.5, where p is the probability of heads. Similarly, if we want to decide
whether one procedure is better than another, we formulate the hypothesis that
there is no difference between the two procedures (i.e., any observed differences
are merely due to fluctuations in sampling from the same population). Such
hypotheses are often called null hypotheses, denoted by $H_0$.
Any other hypothesis that differs from a given null hypothesis is called an
alternative hypothesis. For example, if the null hypothesis is p = 0.5, possible
alternative hypotheses are p = 0.7, p ≠ 0.5 or p > 0.5. A hypothesis alternative
to the null hypothesis is denoted by $H_1$.
8.3 Type I and Type II Errors
If we reject a hypothesis when it happens to be true, we say that a Type I error
has been made. If, on the other hand, we accept a hypothesis when it should be
rejected, we say that a Type II error has been made. In either case, a wrong
decision or error in judgement has occurred.
In order for any tests of hypotheses or decision rules to be good, they must
be designed so as to minimize errors of decision. This is not a simple matter
since, for a given sample size, an attempt to decrease one type of error is
accompanied in general by an increase in the other type of error. In practice one
type of error may be more serious than the other, and so a compromise should be
reached that limits the more serious error. The only way to
reduce both types of error at once is to increase the sample size, which may or may not
be possible.
8.4 Level of Significance
In testing a given hypothesis, the maximum probability with which we
should be willing to risk a type I error is called the level of significance of the
test. This probability is often specified before any samples are drawn so that
results obtained will not influence our decision.
In practice a level of significance of 0.05 or 0.01 is customary, although other
values are used. If for example a 0.05 or 5% level of significance is chosen in
designing a test of a hypothesis, then there are about 5 chances in 100 that we
would reject the hypothesis when it should be accepted; i.e., whenever the null
hypothesis is true, we are about 95% confident that we would make the right
decision. In such cases we say that the hypothesis has been rejected at a 0.05
level of significance, which means that we could be wrong with probability 0.05.
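The meaning of the level of significance can be checked by simulation: if the null hypothesis is true and we test at the 0.05 level, we should reject it, and so commit a Type I error, in roughly 5% of samples. The sketch below assumes normal data with known σ; the parameter values, sample size and seed are illustrative.

```python
import math
import random

rng = random.Random(11)  # fixed seed for reproducibility

# Hypothetical setting: H0 says mu = 50 and it is TRUE; sigma = 10 is known.
mu0, sigma, n, reps = 50.0, 10.0, 30, 10_000
z_crit = 1.96  # two-tailed test at the 0.05 level

rejections = 0
for _ in range(reps):
    xbar = math.fsum(rng.gauss(mu0, sigma) for _ in range(n)) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1  # rejecting a true H0 is a Type I error

type_i_rate = rejections / reps
print(type_i_rate)  # close to 0.05
```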
8.5 Tests Involving the Normal Distribution
To illustrate the ideas presented above, suppose that under a given
hypothesis, the sampling distribution of a statistic S is a normal distribution with
mean $\mu_S$ and standard deviation $\sigma_S$. The distribution of the standardized variable
$Z = (S - \mu_S)/\sigma_S$ is the standard normal distribution (mean 0, variance 1) shown
in Figure 8.1, and extreme values of Z would lead to the rejection of the
hypothesis.
[Figure 8.1: standard normal curve with a central area of 0.95 between Z = −1.96 and Z = 1.96 and a critical region of area 0.025 in each tail]
As indicated in the figure, we can be 95% confident that, if the hypothesis
is true, the Z score of an actual sample statistic S will be between −1.96 and 1.96
(since the area under the normal curve between these values is 0.95).
However, if on choosing a single sample at random we find that the Z score of its
statistic lies outside the range −1.96 to 1.96, we would conclude that such an
event could happen with a probability of only 0.05 (the total shaded area in the
figure) if the given hypothesis were true. We would then say that this Z score
differed significantly from what would be expected under the hypothesis, and we
would be inclined to reject the hypothesis.
The total shaded area 0.05 is the level of significance of the test. It represents the
probability of our being wrong in rejecting the hypothesis, i.e., the probability of
making a Type I error. Therefore, we say that the hypothesis is rejected at a 0.05
level of significance or that the Z Score of the given sample statistic is significant
at 0.05 level of significance.
The set of Z scores outside the range −1.96 to 1.96 constitutes what is called the
critical region, the region of rejection of the hypothesis, or the region of
significance. The set of Z scores inside the range −1.96 to 1.96 could then be
called the region of acceptance of the hypothesis or the region of nonsignificance.
On the basis of the above remarks, we can formulate the following decision rule:
a)
Reject the hypothesis at a 0.05 level of significance if the Z score of the
statistic S lies outside the range −1.96 to 1.96 (i.e., if either Z > 1.96 or
Z < −1.96). This is equivalent to saying that the observed sample statistic
is significant at the 0.05 level.
b)
Accept the hypothesis (or, if desired, make no decision at all) otherwise.
8.6 One-Tailed and Two-Tailed Tests
In the above test we displayed interest in extreme values of the statistic S
or its corresponding Z score on both sides of the mean, i.e., in both tails of the
distribution. For this reason such tests are called two-tailed tests or two-sided
tests.
Often, however, we may be interested only in extreme values to one side
of the mean, i.e., in one tail of the distribution, as, for example, when we are
testing the hypothesis that one process is better than another (which is different
from testing whether one process is better or worse than the other). Such tests are
called one-tailed tests or one-sided tests. In such cases, the critical region is a
region to one side of the distribution, with area equal to the level of significance.
8.7 P-Value
The P-value is the smallest value of the significance level α which will lead to the rejection
of the null hypothesis for the observed sample.
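The decision rules for two-tailed and one-tailed tests can be written in terms of the P-value, as sketched below with Python's `statistics.NormalDist`. The function names are hypothetical; the code rejects H0 when the P-value does not exceed α, which matches the definition above.

```python
from statistics import NormalDist

def z_test_p_value(z: float, tail: str = "two") -> float:
    """P-value of an observed Z score.

    tail = "two"   : H1 is mu != mu0 (two-tailed test)
    tail = "upper" : H1 is mu >  mu0 (one-tailed, critical region on the right)
    tail = "lower" : H1 is mu <  mu0 (one-tailed, critical region on the left)
    """
    phi = NormalDist().cdf
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    if tail == "upper":
        return 1 - phi(z)
    if tail == "lower":
        return phi(z)
    raise ValueError("tail must be 'two', 'upper' or 'lower'")

def reject_h0(z: float, alpha: float = 0.05, tail: str = "two") -> bool:
    """Reject H0 exactly when the P-value does not exceed alpha."""
    return z_test_p_value(z, tail) <= alpha

print(round(z_test_p_value(2.0), 4), reject_h0(2.0), reject_h0(1.8), reject_h0(1.8, tail="upper"))
```

The last two calls show the same Z score being non-significant in a two-tailed test but significant in a one-tailed test at the 0.05 level.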
8.8 Special Tests
For large samples, many statistics have nearly normal distributions with
mean $\mu_S$ and standard deviation $\sigma_S$. In such cases we can use the above results
to formulate decision rules or tests of hypotheses and significance. The following
special cases are just a few of the statistics of practical interest. In each case the
results hold for infinite populations or for sampling with replacement. For
sampling without replacement from finite populations, the results must be
modified.
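1.
Means: Here $S = \bar{X}$, the sample mean; $\mu_S = \mu_{\bar{X}} = \mu$, the population mean;
$\sigma_S = \sigma_{\bar{X}} = \sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation and n is
the sample size. The standardized variable is given by
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (1)$$
for n ≥ 30. For n < 30 (and a normal population), the test statistic is
$$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
with n − 1 degrees of freedom, where S is the sample standard deviation.
2.
Proportions: Here S = P, the proportion of “successes” in a sample;
$\mu_S = \mu_P = p$, where p is the population proportion of successes and n is the
sample size; $\sigma_S = \sigma_P = \sqrt{pq/n}$, where q = 1 − p. The standardized variable is
given by
$$Z = \frac{P - p}{\sqrt{pq/n}} \qquad (2)$$
In case P = X/n, where X is the actual number of successes in a sample, (2)
becomes
$$Z = \frac{X - np}{\sqrt{npq}} \qquad (3)$$
3.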
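Differences of means: Let $\bar{X}_1$ and $\bar{X}_2$ be the sample means obtained in large
samples of sizes $n_1$ and $n_2$ drawn from respective populations having means
$\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$. Consider the null hypothesis that
there is no difference between the population means, i.e., $\mu_1 = \mu_2$. Then
$$\mu_{\bar{X}_1 - \bar{X}_2} = 0, \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (4)$$
The standardized variable is given by
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{\sigma_{\bar{X}_1 - \bar{X}_2}} = \frac{\bar{X}_1 - \bar{X}_2}{\sigma_{\bar{X}_1 - \bar{X}_2}} \qquad (5)$$
4.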
Difference of proportions: Let $P_1$ and $P_2$ be the sample proportions obtained in
large samples of sizes $n_1$ and $n_2$ drawn from respective populations having proportions $p_1$ and $p_2$.
Consider the null hypothesis that there is no difference between the population
proportions, i.e., $p_1 = p_2$, and thus that the samples are really drawn from the same
population. Then
$$\mu_{P_1 - P_2} = 0, \qquad \sigma_{P_1 - P_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where
$$\bar{p} = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2}$$
is used as an estimate of the common population proportion. By using the standardized variable
$$Z = \frac{(P_1 - P_2) - 0}{\sigma_{P_1 - P_2}} = \frac{P_1 - P_2}{\sigma_{P_1 - P_2}}$$
we can test observed differences at an appropriate level of significance and thereby test the null
hypothesis.
Tests involving other statistics can similarly be designed.
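A sketch of the test for a difference of proportions, using the pooled estimate of the common proportion described above. The counts are hypothetical and the function name `two_proportion_z` is illustrative.

```python
import math

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Z statistic for H0: p1 = p2, using the pooled estimate of p."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)   # = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

# Hypothetical counts: 120 of 200 successes in sample 1, 125 of 250 in sample 2.
z = two_proportion_z(120, 200, 125, 250)
print(round(z, 2))  # compare with +/-1.96 for a two-tailed test at the 0.05 level
```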
Example: The mean lifetime of a sample of 100 fluorescent light bulbs produced
by a company is computed to be 1570 hours with a standard deviation of 120
hours. If $\mu$ is the mean lifetime of all the bulbs produced by the company, test the
hypothesis $\mu$ = 1600 hours against the alternative $\mu$ ≠ 1600 hours. Use a significance level of 0.05 and find the P-value
of the test.
1.
$H_0: \mu = 1600$
$H_a: \mu \ne 1600$
2.
This is a two-tailed test.
[Figure: standard normal curve with area 0.95 between −1.96 and 1.96 and area 0.025 in each tail]
We reject $H_0$ if $Z_c$ is either > 1.96 or < −1.96.
3.
n = 100, $\bar{X}$ = 1570, $\mu$ = 1600, S = 120
$$Z_c = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{1570 - 1600}{120/\sqrt{100}} = \frac{-30}{12} = -2.5$$
4.
Since $Z_c$ = −2.5 < −1.96, we reject $H_0$.
P-value = 2P(Z ≤ −2.5) = 2(0.0062) = 0.0124.
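The fluorescent-bulb example can be verified with the standard library, as a sketch; the sample standard deviation is used in place of σ, exactly as in the worked solution.

```python
import math
from statistics import NormalDist

xbar, mu0, s, n = 1570, 1600, 120, 100
z = (xbar - mu0) / (s / math.sqrt(n))     # -30 / 12 = -2.5
p_value = 2 * NormalDist().cdf(-abs(z))   # two-tailed P-value
print(z, round(p_value, 4), abs(z) > 1.96)
```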
Learning Objectives
After working through this chapter you should be able to:
• Define and use the terminology of statistical testing.
• Carry out statistical tests of all the types covered in this Chapter.
• Calculate the P-value of the simpler tests.
• Explain the way in which the rejection regions of tests follow from the
distributional results, taking into account the level and considerations of power.
Sample Examination Questions
1.
A finite population consisting of the numbers 6, 7, 8, 10 and 11 can be converted
into an infinite population if we take a random sample of size 2 by first drawing one
element and then replacing it before drawing the second element.
(a)
Determine how many different samples of size 2 can be drawn from this
infinite population and list them.
(b)
Determine the means of the samples of part (a). What is the probability
assigned to each mean? Construct the sampling distribution of the mean for
random samples of size 2 drawn from this infinite population.
(c)
Calculate the mean and the standard deviation of the probability distribution
of part (b) and compare the value of the standard deviation with the
corresponding result obtained from the standard error of the mean formula.
2.
(a)
Explain briefly with examples:
(i)
population parameter
(ii)
sample statistic
(iii)
population
(b)
Chisha is a cocktail hostess in a very exclusive private club. The Zambia
Revenue Authority is auditing her tax return this year. Chisha claims that
her average tip last year was K23,750. To support her claim, she sent the
ZRA a random sample of 52 credit card receipts showing her bar tips.
When ZRA got the receipts, they computed the sample average and found it
to be $\bar{x}$ = K26,250 with sample standard deviation S = K5,750. Do these
receipts indicate that the average tip Chisha received last year was more
than K23,750? Use a 1% level of significance. Also find the P-value.
3.
(a)
Briefly define each of the following terms:
(i)
Finite population correction factor
(ii)
Simple random sampling
(iii)
Standard error
(b)
A government agency recently found that an artificial sweetener used in diet
soft drinks may have harmful side effects. Therefore, it sets the limit on the
amount that each can may contain at 0.1 ounce. The manager of a local soft
drink company, thinking that the mixing machine may not be staying within
the tolerance limit, runs a test on 100 cans. The test shows the cans to have an
average of 0.13 ounce of artificial sweetener. The population standard
deviation is 0.06.
(i)
Should the manager adjust the machine if α = 0.05?
(ii)
If α = 0.02, should the manager adjust the machine?
(iii)
Which value of α would you pick for this problem?
(iv)
What if $\bar{x}$ = 0.12 (α = 0.02)?
(v)
At what value of $\bar{x}$ should he keep the machine as it is (α = 0.02)?
4.
(a)
Define each of the following:
(i)
The power of a test.
(ii)
A Student's t test.
(b)
The table below shows the annual salaries in millions of kwacha of
randomly selected faculty in public educational institutions and private
educational institutions.
103
Public
private
5.
(a)
(b)
6.
(a)
(b)
80
86
90
95
100 110 85 75
105 115 92 74
65
64
85
92
72
73
74
(i)
Find a 90% confidence interval for the difference between
population mean annual salaries in the public and private
institutions.
(ii)
Test the null hypothesis that the mean salary for the private
institutions is K5,000,000 more than in the public institutions against
the alternative that the mean for the private institutions is more than
K5, 000,000 greater.
(iii)
State carefully the assumptions you have made in arriving at the test
and confidence interval.
Explain the following terms used in statistical hypothesis testing:
(i)
Rejection region
(ii)
Significance level of the test.
A random sample of 25 engineers in company A produces a mean salary of
K90,000,000 with standard deviation of K15,000,000; and a random sample
of 86 engineers in company B produces a mean salary K110, 000,000 with a
standard deviation of K20, 000, 000.
(i)
Can we conclude that company B pays its engineers more than
company A? Use an α = 0.05 level of significance.
(ii)
What is the P-value for this test?
Define each of the following:
(i)
The power of a test
(ii)
Rejecting a null hypothesis
(iii)
The Central Limit Theorem
An Air Force base mess hall has received a shipment of 10 000 gallon size
cans of cherries. The supplier claims that the average amount of liquid is
0.25 gallon per annum. A government inspector took a random sample of
100 cans and found the average liquid content to be 0.28 gallon per can
with a standard deviation of 0.10.
(i)
Does this indicate that the supplier’s claim is too low? (Use 95%
level of significance).
104
(ii)
7.
(a)
Compute the P-value.
A consumer group is testing camp stores. To test the heating capacity of a
store, the group measures the time required to bring 2 litres of water from
10°c to boiling (at sea level).
Two competing models are under consideration. Thirty-six
stores of each model are tested and the following results are obtained:
x1 = 11.4 min ; Standard deviation S1 = 25 min
Model 1:
Mean time
Model 2:
Mean time x2 = 9.9 min ;
Standard deviation S2 = 30 min
Is there any difference between the performance the performances of these
two models? (Use a 5% level of significance). Also find the P-value for the
sample test statistic.
(b)
Define briefly the following terms:
(i)
Type I error
(ii)
Decision
(iii)
Type II error
105
CHAPTER 9
ANALYSIS OF VARIANCE
Reading
Newbold Chapter 15
Wonnacott and Wonnacott Chapter 10
Tailoka Frank P Chapter 13
Introductory Comments
Analysis of Variance (ANOVA) is a popular tool that needs some time and effort to
appreciate. The idea of analysis of variance is to investigate how variation in structured
data can be split into pieces associated with components of the structure. Here we cover
one-way and two-way cases. Both tests and confidence intervals are widely used in
applications.
Analysis Of Variance
Uses of the F-distribution. One use of the F-distribution is to test the hypothesis that the
variance of one normal population equals the variance of another normal population.
The second use of the F-distribution involves the analysis of variance technique,
abbreviated ANOVA. Basically, analysis of variance uses sample information to
determine whether or not three or more treatments produce different results. A treatment
is a cause, or specific source, of variation in a set of data. The following questions
illustrate the meaning of a treatment.
Do different fertilizers affect yield? Do different grades of gasoline affect
performance? Do four different assembly methods result in different population means?
Assumptions Underlying The ANOVA Test
Before we actually conduct a test using the ANOVA technique, the assumptions
underlying the test will be examined. If the following assumptions cannot be met,
another analysis of variance technique may be applied.
1.
The three or more populations of interest are normally distributed.
2.
These populations have equal standard deviations.
3.
The samples we select from each of the populations are random and independent,
that is, they are not related.
Analysis Of Variance Procedure:
The ANOVA procedure can best be illustrated using an example. Suppose the manager
of ABC has resigned and three salespeople at the branch are being considered for the
position. All three have about the same length of service, education and so on. In order
to make a decision, it was suggested that their monthly sales be compared; a sample of
each person's monthly sales is shown in Table 1.0. The "treatments" in this problem are
the salespeople.
Table 1.0
Monthly Sales of Appliances for Three Salespeople (K000)
Sample    Ms Banda    Mr Mwenya    Mr Chisenga
1         25          25           19
2         15          15           17
3         14          17           13
4         10          16           11
5         21          17           12
Mean      17          18           14.4
The ANOVA procedure follows the same five-step hypothesis-testing procedure outlined in
the lecture notes on estimation and hypothesis testing.
STEP 1
The null hypothesis H0 states that there is no significant difference among
the mean sales of the three salespeople, that is, µ1 = µ2 = µ3. Ha states that
at least one mean is different. As before, if H0 is rejected, Ha will be
accepted.
STEP 2
The level of significance is selected. In our case we choose the 0.05 level.
STEP 3
Select the test statistic. The appropriate test statistic is F, which follows the
F-distribution. Underlying this procedure are several assumptions:
1) The data must be at least of interval level.
2) The sample of sales must be selected using a probability (random) procedure.
3) The distribution of the monthly sales for each of the populations is normal.
4) The variances of the three populations are equal, i.e. σ1² = σ2² = σ3².
F is the ratio of two estimates of the population variance:
$$F = \frac{\text{estimated population variance based on the variation between the sample means}}{\text{estimated population variance based on the variation within the samples}} = \frac{MSTR}{MSE}$$
The numerator has K − 1 degrees of freedom and the denominator has N − K
degrees of freedom, where K is the number of treatments and N is the total number
of observations.
STEP 4
Formulate the decision rule. As noted previously, the F-distribution and its
accompanying curve are positively skewed and depend on:
1) the number of treatments, K, and
2) the total number of observations, N.
For this problem we have K − 1 = 3 − 1 = 2 degrees of freedom in the numerator.
There are 15 observations (three samples of five each), so there are
N − K = 15 − 3 = 12 degrees of freedom in the denominator.
Using the predetermined 0.05 level, the decision rule is to accept the null hypothesis
H0 if the computed F value is less than or equal to 3.89 (the tabled value of F at the
0.05 level with 2 and 12 degrees of freedom); we reject H0 if the computed F
value is greater than 3.89. The decision rule is shown diagrammatically.
[Figure: Distribution of F for K = 3 and N = 15. The region of acceptance lies to the left of the critical value 3.89 on the F scale; the region of rejection, with area α = 0.05, lies to the right.]
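The tabled critical value 3.89 can also be checked numerically. The sketch below is an illustration only, using nothing beyond the Python standard library: it integrates the F density with Simpson's rule to obtain P(F ≤ x) and then bisects for the point that leaves α in the upper tail. The names f_pdf, f_cdf and f_critical are hypothetical helpers, and the sketch assumes at least 2 numerator degrees of freedom (with 1 the density is unbounded at 0 and the integration breaks down).

```python
import math


def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F-distribution with d1 (numerator) and d2 (denominator) df; assumes d1 >= 2."""
    if x < 0:
        return 0.0
    log_const = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2)
                 - math.lgamma(d2 / 2) + (d1 / 2) * math.log(d1 / d2))
    return (math.exp(log_const) * x ** (d1 / 2 - 1)
            * (1 + d1 * x / d2) ** (-(d1 + d2) / 2))


def f_cdf(x: float, d1: int, d2: int, steps: int = 2000) -> float:
    """P(F <= x) by composite Simpson's rule on [0, x]; steps must be even."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2   # Simpson weights 1, 4, 2, 4, ..., 4, 1
        total += weight * f_pdf(i * h, d1, d2)
    return total * h / 3


def f_critical(alpha: float, d1: int, d2: int) -> float:
    """Critical value c with P(F > c) = alpha, found by bisection on [0, 20]."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < 1 - alpha:
            lo = mid   # not enough probability below mid yet: move right
        else:
            hi = mid
    return (lo + hi) / 2


crit = f_critical(0.05, 2, 12)       # compare with the tabled 3.89
p_value = 1 - f_cdf(0.818, 2, 12)    # upper-tail area beyond the computed F of the example
print(round(crit, 2), round(p_value, 2))   # prints: 3.89 0.46
```

For 2 numerator degrees of freedom the upper tail also has the closed form (1 + 2x/d2)^(−d2/2), which gives a convenient check on the numerical integration.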
STEP 5
Compute F and arrive at a decision. The first step is to set
up an ANOVA table. It is merely a convenient form in which to record the sums of
squares and other computations. The general format for a one-way
analysis of variance problem is shown in Table 2.0.
Table 2.0
A General Format for an Analysis of Variance Table
Source of variation          (1) Sum of squares   (2) Degrees of freedom   (3) Mean square = (1)/(2)
Between treatments           SST                  K − 1                    MSTR = SST/(K − 1)
Error (within treatments)    SSE                  N − K                    MSE = SSE/(N − K)
Total                        SS Total
Formula for F:
$$F = \frac{SST/(K-1)}{SSE/(N-K)} = \frac{MSTR}{MSE}$$
Where
MSTR is the mean square between treatments.
MSE is the mean square due to error. It is also referred to as the mean
square within treatments.
SST is the abbreviation for the sum of squares due to treatments and is found by:
$$SST = \sum \frac{T^2}{n} - \frac{\left(\sum X\right)^2}{N}$$
SSE is the abbreviation for the sum of squares due to error.
Where:
T    = the treatment total
n    = the number of observations for each respective treatment
ΣX   = the sum of all the observations (sales)
ΣX²  = the sum of the squares of the observations (each observation is squared and the squares are summed)
K    = the number of treatments (salespeople)
N    = the total number of observations
Compute SST
$$SST = \sum \frac{T^2}{n} - \frac{\left(\sum X\right)^2}{N} = \left[\frac{(85)^2}{5} + \frac{(90)^2}{5} + \frac{(72)^2}{5}\right] - \frac{(247)^2}{15} = 4101.8 - 4067.27 = 34.53$$
Compute SSE
$$SSE = \sum X^2 - \sum \frac{T^2}{n} = \left[(25)^2 + (15)^2 + \cdots + (12)^2\right] - \left[\frac{(85)^2}{5} + \frac{(90)^2}{5} + \frac{(72)^2}{5}\right] = 4355 - 4101.8 = 253.2$$
Total variation (SS Total) is the sum of the between-treatments (between-columns)
variation and the within-treatments (error) variation, that is,
SS Total = SST + SSE = 34.53 + 253.2 = 287.73.
As a check,
$$SS\ Total = \sum X^2 - \frac{\left(\sum X\right)^2}{N} = 4355 - \frac{(247)^2}{15} = 4355 - 4067.27 = 287.73$$
The three sums of squares and the calculations needed for F are transferred to the ANOVA
table, Table 3.0.
Table 3.0
ANOVA Table for the Store Manager Problem
Source of variation          (1) Sum of squares   (2) Degrees of freedom   (3) Mean square = (1)/(2)
Between treatments           SST = 34.53          K − 1 = 3 − 1 = 2        34.53/2 = 17.265
Error (within treatments)    SSE = 253.2          N − K = 15 − 3 = 12      253.2/12 = 21.1
Total                        SS Total = 287.73
Computing F:
$$F = \frac{SST/(K-1)}{SSE/(N-K)} = \frac{MSTR}{MSE} = \frac{17.265}{21.1} = 0.818$$
The decision rule states that if the computed value of F is less than or equal to the critical
value of 3.89, the null hypothesis is accepted. If the F value is greater than 3.89, H0 is
rejected and Ha is accepted. Since 0.818 < 3.89, the null hypothesis is accepted at the
0.05 level. To put it another way, the differences in the mean monthly sales (K17,000,
K18,000 and K14,400) can be attributed to chance (sampling). From a practical standpoint,
the levels of sales of the three salespeople being considered for store manager are the same.
No decision with respect to the position can be made on the basis of monthly sales.
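As a cross-check on the hand calculation, here is a minimal Python sketch (standard library only) of the shortcut formulas used above. The function name one_way_anova is a hypothetical helper written for this illustration, and the critical value 3.89 is simply copied from the F table rather than computed.

```python
# Monthly sales (K000) from Table 1.0, one list per salesperson (treatment)
sales = {
    "Ms Banda": [25, 15, 14, 10, 21],
    "Mr Mwenya": [25, 15, 17, 16, 17],
    "Mr Chisenga": [19, 17, 13, 11, 12],
}


def one_way_anova(groups):
    """One-way ANOVA with the shortcut formulas: returns sums of squares, mean squares and F."""
    all_obs = [x for g in groups for x in g]
    k, n_total = len(groups), len(all_obs)
    cm = sum(all_obs) ** 2 / n_total                       # (sum X)^2 / N
    sum_t2_over_n = sum(sum(g) ** 2 / len(g) for g in groups)
    sum_x2 = sum(x * x for x in all_obs)
    sst = sum_t2_over_n - cm                               # between treatments
    sse = sum_x2 - sum_t2_over_n                           # within treatments (error)
    mstr, mse = sst / (k - 1), sse / (n_total - k)
    return {"SST": sst, "SSE": sse, "SS_total": sum_x2 - cm,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}


result = one_way_anova(list(sales.values()))
critical_f = 3.89  # F(0.05; 2, 12), read from the F table as in the text
decision = "reject H0" if result["F"] > critical_f else "accept H0"
print(round(result["F"], 3), decision)   # prints: 0.818 accept H0
```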
Inferences About Treatment Means
Suppose that in carrying out the ANOVA procedure we decide to reject the null
hypothesis. This allows us to conclude that not all treatment means are the same.
Sometimes we may be satisfied with this conclusion, but in other instances we may want
to know which treatment means differ. Let us consider the following example:
Four groups of students were subjected to different teaching techniques and tested at the
end of a specified period of time. As a result of dropouts from the experimental groups
(due to sickness, transfer, and so on), the number of students varied from group to group.
Do the data shown below present sufficient evidence to indicate a difference in the mean
achievement for the four teaching techniques? Use the 0.05 level of significance.
Teaching technique
1        2        3        4
65       75       59       94
87       69       78       89
73       83       67       80
79       81       62       88
81       72       83
69       79       76
         90
Totals   454      549      425      351
$$SS\ Total = \sum\sum X_{ij}^2 - \frac{\left(\sum\sum X_{ij}\right)^2}{N} = 139511 - \frac{(1779)^2}{23} = 139511 - 137601.78 = 1909.22$$
$$SST = \sum_{i=1}^{K} \frac{T_i^2}{n_i} - CM = \frac{(454)^2}{6} + \frac{(549)^2}{7} + \frac{(425)^2}{6} + \frac{(351)^2}{4} - 137601.78$$
= 34352.667 + 43057.286 + 30104.167 + 30800.25 − 137601.783
= 138314.369 − 137601.783 = 712.59
where CM = (ΣΣXij)²/N = 137601.78 is the correction term computed for SS Total.
SSE = SS Total − SST = 1909.22 − 712.59 = 1196.63
Table 4.0
ANOVA Table for the Students
Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   712.59           3                    237.53        237.53/62.98 = 3.77
SSE                   1196.63          19                   62.98
SS Total              1909.22          22
Reject H0 if the computed F value is greater than F.05, 3, 19 = 3.13. Since F = 3.77 >
3.13, we reject H0.
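The same table can be reproduced without the shortcut formulas. The sketch below, a hypothetical helper written for illustration, uses the definitional form (squared deviations about the grand mean and about each group mean), which gives the same sums of squares and handles unequal group sizes directly. It takes technique 1's scores as 65, 87, 73, 79, 81, 69, the values consistent with the total T1 = 454 and ΣΣX² = 139511 used in the text.

```python
from statistics import mean

# Achievement scores for the four teaching techniques (group sizes differ).
# Technique 1's second score is taken as 87, the value consistent with T1 = 454.
scores = [
    [65, 87, 73, 79, 81, 69],          # technique 1
    [75, 69, 83, 81, 72, 79, 90],      # technique 2
    [59, 78, 67, 62, 83, 76],          # technique 3
    [94, 89, 80, 88],                  # technique 4
]


def anova_by_deviations(groups):
    """One-way ANOVA from deviations about the means; unequal n is allowed."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Between treatments: n_i times the squared distance of each group mean from the grand mean
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within treatments: squared distance of each observation from its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (sst / df_between) / (sse / df_within)
    return sst, sse, df_between, df_within, f_ratio


sst, sse, df1, df2, f_ratio = anova_by_deviations(scores)
print(round(sst, 2), round(sse, 2), df1, df2, round(f_ratio, 2))   # prints: 712.59 1196.63 3 19 3.77
```

The definitional form is slower to do by hand, which is why the chapter uses the shortcut formulas, but it makes clear what each sum of squares measures.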
Recall that in the store manager data there was no difference in the treatment means. In that
case further analysis of the treatment means is not warranted. However, in the foregoing
example, regarding mean achievement for the four teaching techniques, there was a
difference in the treatment means. That is, the null hypothesis is rejected and the
alternate hypothesis accepted. If the achievement levels do differ, the question is: between
which groups do the treatment means differ?
Several procedures are available to answer this question. Perhaps the simplest is through
the use of confidence intervals. A confidence interval for the difference between two
population means is found by:
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,N-K}\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Where:
X̄1   is the mean of the first treatment
X̄2   is the mean of the second treatment
t    is obtained from the t table; the degrees of freedom are equal to N − K
MSE  is the mean square error term obtained from the ANOVA table, SSE/(N − K)
n1   is the number of observations in the first treatment
n2   is the number of observations in the second treatment.
If the confidence interval includes 0, we conclude that there is no significant difference
between the pair of treatment means. However, if both end points of the confidence interval
are of the same sign, this indicates that the treatment means differ.
The 95 percent confidence interval for the difference between µ1 and µ2 is found by
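$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,N-K}\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (75.67 - 78.43) \pm 2.093\sqrt{62.98\left(\frac{1}{6} + \frac{1}{7}\right)} = -2.76 \pm 9.24$$
that is, from −12.00 to 6.48,
where
X̄1 = 75.67, X̄2 = 78.43
t = 2.093 from Appendix A, Table A.6 (N − K = 19 degrees of freedom)
MSE = 62.98 from the ANOVA table
n1 = 6, n2 = 7.
Since this interval includes 0, there is no significant difference between techniques 1 and 2.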
Similarly, consider X̄1 = 75.67 and X̄4 = 87.75.
The 95 percent confidence interval for µ1 − µ4 ranges from −22.80 up to −1.36. Both
end points are negative, so we can conclude that these treatment means differ significantly.
That is, students subjected to teaching technique 4 have higher scores than those subjected to
teaching technique 1.
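The interval calculation can be sketched in a few lines of Python. This is an illustration only: diff_ci and means_differ are hypothetical helper names, and the t value must still be read from a t table (here 2.093 for 19 degrees of freedom), since the standard library has no t quantile function.

```python
import math


def diff_ci(mean1, mean2, n1, n2, mse, t_value):
    """Interval (x1 - x2) +/- t * sqrt(MSE * (1/n1 + 1/n2)).

    t_value is the tabled t for alpha/2 with N - K degrees of freedom.
    """
    half_width = t_value * math.sqrt(mse * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - half_width, diff + half_width


def means_differ(interval):
    """True when both end points have the same sign, i.e. the interval excludes 0."""
    lo, hi = interval
    return lo > 0 or hi < 0


# Values from the teaching-techniques example: t = 2.093 (19 df), MSE = 62.98
ci_12 = diff_ci(75.67, 78.43, 6, 7, 62.98, 2.093)   # techniques 1 and 2
ci_14 = diff_ci(75.67, 87.75, 6, 4, 62.98, 2.093)   # techniques 1 and 4
print([round(v, 2) for v in ci_12], means_differ(ci_12))
print([round(v, 2) for v in ci_14], means_differ(ci_14))
```

The first interval straddles 0 (no significant difference), while the second lies entirely below 0, matching the conclusions reached by hand.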
Caution
The investigation of differences in treatment means is a sequential process. The initial
step is to conduct the ANOVA test. Only if the null hypothesis that the treatment means
are equal is rejected should any analysis of the treatment means be attempted.
Two-Way ANOVA:
In the appliance sales example, we were unable to show that a difference exists among
the mean sales of the three salespeople. In the computation of the F statistic, variation was
considered as originating from two sources: it either originated from the treatments
(between-treatment variation) or was considered random (within-treatment variation).
There are other possible sources of variation, such as the training the salespeople
had, the days of the week on which the sample data were obtained, etc. Two-way
analysis of variance allows us to consider at least one other of these possibilities.
Example:
EUROAFRICA is expanding bus services from the Capital City into the heart of the
Copperbelt. Four routes are being considered, from Kitwe to four other towns.
The travel times in minutes along each of the four routes are given below.
Travel Time (minutes) From Kitwe To Four Other Towns
Day          Luanshya   Ndola   Chingola   Mufulira
Monday       40         45      46         34
Tuesday      38         42      44         30
Wednesday    38         40      44         33
Thursday     37         43      42         40
Friday       41         41      40         32
At the 0.05 significance level, can it be concluded that there is a difference among the four
routes? Does it make a difference which day of the week it is?
If the null hypothesis is only that the mean time is the same along the four routes, the
one-way ANOVA approach applies. The variation that occurs because of differences among
the days of the week is then considered random and is included in the MSE term, which
inflates the denominator and so reduces the F ratio. If the variation due to the day of the
week can be removed, the denominator of the F ratio will be reduced. In this case, the day of
the week is called a blocking variable. Hence, we have variation due to treatments and
variation due to blocks. The sum of squares due to blocks (SSB) is computed as follows:
$$SSB = \sum \frac{B^2}{K} - \frac{\left(\sum X\right)^2}{N}$$
where B refers to the block total, that is, the total for each row, and K refers to the
number of items in each block.
The same format is used for the two-way ANOVA table as was used in the one-way
ANOVA case. SST and SS Total are computed as before. SSE is obtained by subtraction
(SSE = SS Total − SST − SSB). Table 4.0 shows the necessary calculations.
Calculations Needed For Two-Way ANOVA
Day              Luanshya   Ndola   Chingola   Mufulira   Row sum
Monday           40         45      46         34         165
Tuesday          38         42      44         30         154
Wednesday        38         40      44         33         155
Thursday         37         43      42         40         162
Friday           41         41      40         32         154
Column total     194        211     216        169        790
Sum of squares   7538       8919    9352       5769       31578
Sample size      5          5       5          5
Analogous to the ANOVA table for a one-way analysis, the general format for a two-way
analysis is:
Source of variation   (1) Sum of squares   (2) Degrees of freedom   (3) Mean square = (1)/(2)
Treatments            SST                  K − 1                    MSTR = SST/(K − 1)
Blocks                SSB                  n − 1                    MSB = SSB/(n − 1)
Error                 SSE                  (K − 1)(n − 1)           MSE = SSE/[(K − 1)(n − 1)]
Total                 SS Total
where n is the number of blocks and MSB is the mean square for blocks.
As before, to compute SST:
$$SST = \sum \frac{T^2}{n} - \frac{\left(\sum X\right)^2}{N} = \left[\frac{(194)^2}{5} + \frac{(211)^2}{5} + \frac{(216)^2}{5} + \frac{(169)^2}{5}\right] - \frac{(790)^2}{20} = 31474.8 - 31205 = 269.8$$
SSB is found by:
$$SSB = \sum \frac{B^2}{K} - \frac{\left(\sum X\right)^2}{N} = \left[\frac{(165)^2}{4} + \frac{(154)^2}{4} + \frac{(155)^2}{4} + \frac{(162)^2}{4} + \frac{(154)^2}{4}\right] - 31205 = 31231.5 - 31205 = 26.5$$
The remaining sums of squares are:
$$SS\ Total = \sum X^2 - \frac{\left(\sum X\right)^2}{N} = 31578 - \frac{(790)^2}{20} = 31578 - 31205 = 373$$
SSE = SS Total − SST − SSB
= 373 − 269.8 − 26.5
= 76.7
The values for the various components of the ANOVA table are as follows:
Source of variation   (1) Sum of squares   (2) Degrees of freedom   (3) Mean square = (1)/(2)
Treatments            269.8                3                        89.933
Blocks                26.5                 4                        6.625
Error                 76.7                 12                       6.392
Total                 373                  19
There are two sets of hypotheses being tested:
1.
H0: The treatment means are the same, µ1 = µ2 = µ3 = µ4.
Ha: The treatment means are not all the same.
2.
H0: The block means are the same, µ1 = µ2 = µ3 = µ4 = µ5.
Ha: The block means are not all the same.
First we test the hypothesis concerning the treatment means. There are K − 1 = 4 − 1 = 3
degrees of freedom in the numerator and (K − 1)(n − 1) = (4 − 1)(5 − 1) = 12 degrees of
freedom in the denominator. Using the 0.05 significance level, the critical value of F is
3.49. The null hypothesis that the mean times for the four routes are the same is rejected
if the F ratio exceeds 3.49.
$$F = \frac{MSTR}{MSE} = \frac{89.933}{6.392} = 14.07$$
The null hypothesis is rejected and the alternate accepted. It is concluded that mean
travel time is not the same for all routes. EUROAFRICA will want to conduct some tests
to determine which treatment means differ.
Next, we test whether the travel time is the same for different days of the week. The
degrees of freedom in the numerator for blocks are n − 1 = 5 − 1 = 4. The degrees of freedom
in the denominator are the same as before: (n − 1)(K − 1) = (5 − 1)(4 − 1) = 12. The null
hypothesis that the block means are the same is rejected if the F ratio exceeds 3.26.
$$F = \frac{MSB}{MSE} = \frac{6.625}{6.392} = 1.04$$
Since 1.04 < 3.26, the null hypothesis is accepted. The mean travel time is the same for the
various days of the week.
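A minimal sketch of the randomized block calculation, assuming one observation per day and route (no replication, so no interaction term). The helper name randomized_block_anova is hypothetical, and the critical values 3.49 and 3.26 are taken from the text rather than computed.

```python
# Travel times (minutes): one row per day (block), one column per route (treatment).
# Column order: Luanshya, Ndola, Chingola, Mufulira.
times = [
    [40, 45, 46, 34],  # Monday
    [38, 42, 44, 30],  # Tuesday
    [38, 40, 44, 33],  # Wednesday
    [37, 43, 42, 40],  # Thursday
    [41, 41, 40, 32],  # Friday
]


def randomized_block_anova(table):
    """Two-way ANOVA without interaction, one observation per cell."""
    n = len(table)         # number of blocks (rows)
    k = len(table[0])      # number of treatments (columns)
    all_obs = [x for row in table for x in row]
    cm = sum(all_obs) ** 2 / (n * k)                       # (sum X)^2 / N
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    sst = sum(t * t for t in col_totals) / n - cm          # treatments
    ssb = sum(b * b for b in row_totals) / k - cm          # blocks
    ss_total = sum(x * x for x in all_obs) - cm
    sse = ss_total - sst - ssb                             # error, by subtraction
    mstr = sst / (k - 1)
    msb = ssb / (n - 1)
    mse = sse / ((k - 1) * (n - 1))
    return {"SST": sst, "SSB": ssb, "SSE": sse, "SS_total": ss_total,
            "F_treatments": mstr / mse, "F_blocks": msb / mse}


res = randomized_block_anova(times)
# Critical values F(0.05; 3, 12) = 3.49 and F(0.05; 4, 12) = 3.26, as quoted in the text
print(round(res["F_treatments"], 2), res["F_treatments"] > 3.49)   # prints: 14.07 True
print(round(res["F_blocks"], 2), res["F_blocks"] > 3.26)           # prints: 1.04 False
```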
Problems
1)
Suppose that we want to compare the cholesterol contents of four competing diet
foods on the basis of the following data (in milligrams per package), which were
obtained for three 6-ounce packages of each of the diet foods.
Diet food    A         B         C         D
             3.6       3.1       3.2       3.5
             4.1       3.2       3.5       3.8
             4.0       3.9       3.5       3.8
             nA = 3    nB = 3    nC = 3    nD = 3
The means of these four samples are ȲA = 3.9, ȲB = 3.4, ȲC = 3.4 and ȲD = 3.7.
We want to know whether the differences among them are significant or whether
they can be attributed to chance. Use the 0.05 level of significance.
2)
Customers are randomly selected from each of the three banks in Kitwe, and
their waiting times before service are recorded.
Bank                       Waiting time (minutes)
ZNCB                       4.8   5.5   6.3
Standard Chartered Bank    6.9   8.5   5.3   4.3
Barclays Bank              7.1   3.5
Do these data indicate a significant difference among the mean waiting times of
these banks? Use the 0.05 significance level.
3)
A wholesaler is interested in comparing the weights, in grammes, of tomatoes from
Lusaka, Ndola and Kitwe.
Lusaka    Ndola    Kitwe
5.6       7.8      11.0
8.8       8.2      10.1
9.0       7.4      8.9
          8.2      9.3
                   10.0
a)
What are the null and alternate hypotheses?
b)
Fill in an ANOVA table.
c)
What is the critical value of F, assuming a 0.01 level of significance?
d)
What decision should the wholesaler make?
4)
Refer to problem 3. Let µA and µB respectively denote the mean weights in
grammes of tomatoes from Lusaka and Ndola.
a)
Find a 95 percent confidence interval for µA.
b)
Find a 95 percent confidence interval for µB.
c)
Find a 95 percent confidence interval for µA − µB.
d)
What conclusion can you draw from the interval in (c)?
5)
An experiment was conducted to compare the effect of four different chemicals,
A, B, C and D, in producing water resistance in textiles. Pieces of material were
randomly assigned to receive one of the four chemicals, A, B, C or D. This
process was replicated three times, thus producing a randomized block design.
The design, with moisture-resistance measurements, is as shown in the
accompanying diagram (low readings indicate low moisture penetration).
Block 1:   C 9.9     A 10.1    B 11.4    D 12.1
Block 2:   D 13.4    B 12.9    A 12.2    C 12.3
Block 3:   B 12.7    D 12.9    C 11.4    A 11.9
a)
Do these data indicate a significant difference among the mean moisture
penetrations for the four chemicals? Use the 0.05 significance level.
b)
Do the data provide evidence to indicate that blocking increased the
amount of information in the experiment?
c)
Find a 95% confidence interval for the difference in mean moisture
penetration for fabric treated by chemicals A and D.
d)
Interpret the interval.
ANSWERS
1)
Diet food    A        B        C        D
             3.6      3.1      3.2      3.5
             4.1      3.2      3.5      3.8
             4.0      3.9      3.5      3.8
Total ΣX     11.7     10.2     10.2     11.1
ΣX²          45.77    35.06    34.74    41.13
$$SS\ Total = \sum X^2 - \frac{\left(\sum X\right)^2}{N} = 156.7 - \frac{(43.2)^2}{12} = 156.7 - 155.52 = 1.18$$
$$SST = \sum \frac{T^2}{n} - \frac{\left(\sum X\right)^2}{N} = \frac{(11.7)^2 + (10.2)^2 + (10.2)^2 + (11.1)^2}{3} - 155.52 = \frac{136.89 + 104.04 + 104.04 + 123.21}{3} - 155.52 = 156.06 - 155.52 = 0.54$$
SSE = SS Total − SST = 1.18 − 0.54 = 0.64
Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   0.54             3                    0.18          2.25
SSE                   0.64             8                    0.08
SS Total              1.18             11
F.05, 3, 8 = 4.07. Since 2.25 < 4.07, accept H0.
2)
Bank                       Waiting time          Sample size   ΣX     ΣX²
ZNCB                       4.8, 5.5, 6.3         3             16.6   92.98
Standard Chartered Bank    6.9, 8.5, 5.3, 4.3    4             25.0   166.44
Barclays                   7.1, 3.5              2             10.6   62.66
$$SS\ Total = 322.08 - \frac{(52.2)^2}{9} = 322.08 - 302.76 = 19.32$$
$$SST = \frac{(16.6)^2}{3} + \frac{(25)^2}{4} + \frac{(10.6)^2}{2} - 302.76 = 91.853 + 156.25 + 56.18 - 302.76 = 304.283 - 302.76 = 1.523$$
SSE = SS Total − SST = 19.32 − 1.523 = 17.797
Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   1.523            2                    0.7615        0.257
SSE                   17.797           6                    2.966
SS Total              19.32            8
F.05, 2, 6 = 5.14. Since 0.257 < 5.14, accept H0.
3)
H0: µ1 = µ2 = µ3
Ha: not all equal
Reject H0 if F is greater than F.01, 2, 9 = 8.02.
$$SS\ Total = 928.59 - \frac{(104.3)^2}{12} = 928.59 - 906.54 = 22.05$$
$$SST = \frac{(23.4)^2}{3} + \frac{(31.6)^2}{4} + \frac{(49.3)^2}{5} - 906.54 = 11.718$$
SSE = 22.05 − 11.718 = 10.332
Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   11.718           2                    5.859         5.10
SSE                   10.332           9                    1.148
SS Total              22.05            11
Since 5.10 < 8.02, H0 cannot be rejected. The evidence does not suggest any differences in
the weights of tomatoes.
4)
a)
For a single treatment mean:
$$\bar{T}_i \pm t_{\alpha/2}\,\frac{S}{\sqrt{n_i}}, \qquad S = \sqrt{MSE}$$
7.8 ± t.025, 9 (1.071)/√3
7.8 ± 2.262 (0.618)
(6.402, 9.198)
b)
7.9 ± (2.262)(1.071)/√4
7.9 ± 1.2
(6.7, 9.1)
c)
$$(\bar{T}_i - \bar{T}_j) \pm t_{\alpha/2}\,S\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$
(7.8 − 7.9) ± (2.262)(1.071)√(1/3 + 1/4)
−0.1 ± 1.85
(−1.95, 1.75)
d)
This interval contains 0, which implies there is no significant difference
between the two means.
5)
$$SSB = \frac{(43.5)^2}{4} + \frac{(50.8)^2}{4} + \frac{(48.9)^2}{4} - \frac{(143.2)^2}{12} = 473.0625 + 645.16 + 597.8025 - 1708.85 = 7.175$$
$$SS\ Total = 1721.76 - \frac{(143.2)^2}{12} = 1721.76 - 1708.85 = 12.91$$
$$SST = \frac{(34.2)^2}{3} + \frac{(37)^2}{3} + \frac{(33.6)^2}{3} + \frac{(38.4)^2}{3} - 1708.85 = 389.88 + 456.33 + 376.32 + 491.52 - 1708.85 = 5.2$$
SSE = 12.91 − 7.175 − 5.2 = 0.535
Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   5.2              3                    1.7333        19.43
SSB                   7.175            2                    3.5875        40.22
SSE                   0.535            6                    0.0892
SS Total              12.91            11
a)
H0: µA = µB = µC = µD
Ha: not all equal.
F.05, 3, 6 = 4.76. Since 19.43 > 4.76, reject H0.
b)
H0: µ1 = µ2 = µ3 (block means)
Ha: not all equal.
F.05, 2, 6 = 5.14. Since 40.22 > 5.14, reject H0; blocking did increase the amount of
information in the experiment.
c)
$$(11.4 - 12.8) \pm t_{.025,\,6}\sqrt{0.0892\left(\frac{1}{3} + \frac{1}{3}\right)} = -1.4 \pm 2.447(0.2439) = -1.4 \pm 0.597$$
(−1.997, −0.803)
Learning Objectives
After working through this chapter you should be able to:
•
Explain the purpose of analysis of variance
•
Carry out small examples of one-way and two-way analysis of variance with a
hand calculator, presenting the results in an ANOVA table.
•
Carry out tests of hypotheses, and write down confidence intervals, as in this
chapter.
Sample Examination Questions
1.
(a)
A restaurant owner operates three restaurants within a city: one in a major
shopping centre (A), one near the college campus (B), and one in the park
area (C). The management has collected the following data on daily sales
(in thousands of kwacha).
Day         A       B       C
Monday      10.5    8.4     5.9
Tuesday     8.4     9.3     7.1
Friday      12.6    11.4    6.7
Saturday    18.3    7.9     14.2
Sunday      10.8    6.3     13.7
(i)
What type of experimental design is represented here?
(ii)
Construct an ANOVA summary table for this experiment.
(iii)
Is there evidence of a difference in mean sales among the
restaurants? (Use α = 0.05.)
(iv)
Is there evidence (at α = 0.05) of a difference in the mean sales for
the five days?
(v)
Estimate the difference in mean sales between the restaurant located
at the shopping centre and the one near the college campus. Use a 90%
confidence interval.
(vi)
State the assumptions required for the validity of the procedures used
in parts (ii) to (v).
(b)
A major appliance dealer wishes to compare his mean television sales
during three different periods of the week: Beginning (Monday, Tuesday),
Middle (Wednesday, Thursday), and End (Friday, Saturday). His plan is to
select random samples of sales records from each period and record the
number of television sets sold. What type of experimental design is this?
2.
(a)
What is a two-way ANOVA test?
(b)
A power plant, which uses water from the surrounding bay for cooling its
condensers, is required by the Environmental Protection Agency (EPA) to
determine whether discharging its heated water into the bay has a
detrimental effect on the flora (plant life) in the water. The EPA requests
that the power plant make its investigation at three strategically chosen
locations, called stations. Stations 1 and 2 are located near the plant's
discharge tubes, while station 3 is further out in the bay. During one
randomly selected day in each of 4 months, a diver is sent down to each of
the stations, randomly samples a square meter area of the bottom, and
counts the number of blades of the different types of grasses present. The
results are as follows for one important grass type.
Month      Station 1   Station 2   Station 3
May        28          31          53
June       25          22          61
July       37          30          56
August     20          26          48
(i)
Is there sufficient evidence to indicate a difference among the mean
numbers of blades found per square meter per month for the three
stations? Use α = 0.05.
(ii)
Is there sufficient evidence to indicate a difference among the mean
numbers of blades found per square meter for the 4 months? Use
α = 0.05.
(c)
Place a 90% confidence interval on the difference in means between stations
1 and 3.
3.
(a)
An advertising firm is studying the effects of four different kinds of displays
of a product in a grocery store in three different sales areas in the city.
Within each sales area, four stores are selected, and each receives one of the
four displays. Over the duration of the experiment, the number of units of
the product sold is recorded. The data are shown in the table.
             Sales area
Display      1       2       3
A            120     76      95
B            114     60      102
C            140     85      122
D            102     80      85
(i)
Which model is appropriate for analyzing these data? Explain.
(ii)
Do the four displays result in different averages? Use α = 0.05.
(b)
State the three assumptions about the error term in the analysis of variance
models. Which of the three assumptions is most critical in validating an
analysis of variance model fitted to a data set?
4.
(a)
What is an ANOVA test?
(b)
A supermarket chain conducted a study to determine where to place its
generic brand products in order to increase sales. Sales (in thousands of
kwacha) for one were as follows:
                   Store 1   Store 2   Store 3
High shelf         60        56        52
Eye-level shelf    53        58        56
Low shelf          55        55        59
Perform a two-way analysis of variance, using the level of significance
α = 0.05.
5.
(a)
Three of the currently most popular television shows produced the following
ratings (percentage of the television audience tuned into the show) over a
period of four weeks:
                  Show
Week       A        B        C        Totals
1          34.7     28.4     23.8     86.9
2          38.1     32.2     20.7     91.0
3          35.1     32.4     25.8     93.3
4          30.4     28.2     28.9     87.5
Totals     138.3    121.2    99.2     358.7
(i)
Is there evidence (at α = 0.01) that the mean ratings differ
for the three shows?
(ii)
Is there evidence (at α = 0.01) that the use of weeks as blocks is
justified in this experiment?
(iii)
Construct a 95% confidence interval for the difference in mean
ratings between shows B and C.
(iv)
State the assumptions necessary for the validity of the procedures
used in (i) to (iii).
(b)
Independent random samples of six assistant professors, four associate
professors and five full professors were asked to estimate the amount of
time outside the classroom spent on teaching responsibilities in the last
week. Results, in hours, are shown in the accompanying table.
Assistant   Associate   Full
8           16          12
13          13          8
12          16          7
16          9           10
10                      8
12
(i)
What type of experimental design is represented here?
(ii)
Set out the analysis of variance table.
(iii)
Test the null hypothesis that the three population mean times are equal.
Use α = 0.05.
CHAPTER 10
TIME SERIES
Reading
Newbold Chapter 17
Tailoka Frank P Chapter 6
Plane and Oppermann 395
Introductory Comments
This chapter follows on from the material on index numbers and shows some alternative
ways of presenting the results. Index numbers play an important role in forecasting, and
models of forecasting are presented here.
10.1
Introduction
Any variable that is measured over time in sequential order is called a time series.
The primary characteristic of a time series is the assumption that the observations
have some form of dependence on time. Since this time dependence may take on
any number of possible patterns, the problem becomes one of identifying the most
important factors.
Business people, economists, and analysts of various kinds all look back at the
sequence of events that occurred over the past year or years in order to understand
what happened and thereby (they hope) to be in a better position to anticipate
what may happen in the future.
A leveling-off of long-term population growth, for example, may indicate to a
particular firm that future market expansion may not be unlimited and that more
careful attention should be paid to increasing the firm's market share. Even with
a general slowdown in population growth, the gradual aging of the population
may imply to another firm (one concentrating on consumer goods for older
people) that its total market potential is growing substantially year after year.
Other types of time-dependent patterns may exist as well. In looking at a time
series of monthly or quarterly beer sales, for example, we may discover a regular
seasonal pattern in which beer consumption peaks at certain times of the year. Other
regular periodic or seasonal variation can be observed in sales of college textbooks and
in the observance of such social customs as giving Christmas gifts and Valentine's Day
flowers.
The task of time-series analysis can therefore be thought of quite generally as a
matter of identifying and isolating the various major time-dependent patterns in a
given time-series data array. Once this is accomplished, the analysis should enhance the
user's ability to forecast variables of interest into the future.
The classical time-series model focuses on the decomposition of the time-dependent
variable into four component parts: trend (T), cycle (C), seasonal
variation (S), and residual or irregular variation (I).
The model may be additive in its component parts,
$$Y_t = T_t + S_t + I_t + C_t$$
or multiplicative in its component parts,
$$Y_t = T_t \times C_t \times S_t \times I_t$$
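To see how the two forms of the model differ, the sketch below combines one hypothetical set of component values both ways. The numbers are invented purely for illustration; the point is that in the additive model every component is an amount in the units of Y, whereas in the multiplicative model the trend carries the units and C, S and I act as ratios around 1 applied to it.

```python
def additive(trend, cycle, seasonal, irregular):
    """Y_t = T_t + S_t + I_t + C_t: every component is in the units of Y."""
    return trend + seasonal + irregular + cycle


def multiplicative(trend, cycle, seasonal, irregular):
    """Y_t = T_t x C_t x S_t x I_t: trend in units of Y, the rest as ratios near 1."""
    return trend * cycle * seasonal * irregular


# Hypothetical quarter: trend level 200 units; cycle 3% above trend,
# seasonal effect 10% above, irregular effect 1% below.
y_add = additive(200, 6, 20, -2)                 # effects as absolute amounts
y_mult = multiplicative(200, 1.03, 1.10, 0.99)   # effects as relative factors
print(y_add, round(y_mult, 2))                   # prints: 224 224.33
```

If the trend level doubled to 400 units, the additive seasonal effect would stay at 20 units, whereas the multiplicative seasonal factor of 1.10 would add 10% of 400; this is why the multiplicative form is usually preferred when seasonal swings grow with the level of the series.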
The movements of a time series may be classified as follows:
1. A trend (also known as a secular trend) is a long-term, relatively smooth pattern
or direction that the series exhibits. By definition, it has a duration of more than
one year. For example, data for beer sales show an upward trend over time,
whereas birth rates over the last few years seem to show a downward trend.
2. A cycle is a wavelike or oscillatory pattern about a long-term trend that is
generally apparent over a number of years. By definition, it has a duration of
more than one year. Examples of cycles are the well-known business cycles that
record periods of economic recession and inflation, long-term product demand
cycles, and cycles in the monetary and financial sectors.
3. Residual or irregular variation is the random movement that a series exhibits
after the trend, cycle, and seasonal variation are removed. For example, daily
centimeters of rainfall in a particular urban area during a given month are often
random in this sense. Notice that all time series exhibit random variation, although
they may not have a trend, a cycle, or seasonal variation. Moreover, whether or
not a particular trend, cycle, or seasonal variation is present in a given time series
critically depends on the time period chosen for observation.
4. Seasonal variation consists of the oscillations that depend on the season of the
year. Thus, employment is usually higher at harvest time at Nakambala Sugar Estate
in Mazabuka, and rainfall is higher at some times of the year than at others.
The motivation behind decomposing a time series is twofold. On the one hand,
we wish to see whether a particular component is present in a given time series
and to understand the extent to which it explains some of the movements in the
variable of interest. On the other hand, if we wish to forecast a particular
variable, we can usually improve our forecasting accuracy by first breaking it into
component parts, then forecasting each of these parts separately, and finally
combining the individual effects to produce the composite overall forecast.
Business Forecasting is concerned with estimating the future value of some
variable of interest. This may be done for the short-term or for the long-term, and
different forecasting models are more appropriate for one case than for the other.
Forecasting may be done in any of three possible ways: using regression models,
using time-series models, or using forecasting models specially created for a
specific purpose. Indeed, quantitative forecast models have even been designed
for cases in which historical databases are not available, such as when a firm
wishes to forecast sales of a new product or the expected profitability or market
share for such a product.
Today, forecasters have developed a specialized terminology, and many
forecasting models require a level of mathematical sophistication, and access to
computers and specialized software, that go far beyond the scope of this book.
Our objective in this course is therefore to give the student a basic understanding
of the issues underlying the use of various types of forecasting models, rather than
a sophisticated level of hands-on experience.
10.2 Trend Analysis
The first component of a time series that we will consider is the long-term trend.
A trend can be linear or nonlinear and, indeed, can take on a whole host of other
functional forms such as polynomials and logarithmic trends, among others. We
shall begin by working through an example using a linear model.
Example
Annual sales for a pharmaceutical company have been recorded over the past 10
years; they are shown in Table 1.1. Fit a linear trend to the data.
Table 1.1 – Annual Data for Pharmaceutical Example
Year    Sales (K millions)
1975    18.0
1976    19.4
1977    18.0
1978    19.9
1979    19.3
1980    21.1
1981    23.5
1982    23.2
1983    20.4
1984    24.4
How we measure time along the horizontal axis turns out to be irrelevant in
time-series analysis. We can suit ourselves, picking whatever numbers reduce the
computational burden. A common practice is to number the time periods
consecutively (1, 2, 3, ...), and we shall do so here.
Table 1.2 – Calculations for Example 1.1
Year    Time X    Sales Y    X²     XY
1975    1         18.0       1      18.0
1976    2         19.4       4      38.8
1977    3         18.0       9      54.0
1978    4         19.9       16     79.6
1979    5         19.3       25     96.5
1980    6         21.1       36     126.6
1981    7         23.5       49     164.5
1982    8         23.2       64     185.6
1983    9         20.4       81     183.6
1984    10        24.4       100    244.0
Total   55        207.2      385    1,191.2
Least Squares Method: The simplest method of fitting a linear trend is to use the
least squares approach we discussed in the handout on Regression Analysis. In
this method, the formulas for the slope and intercept are:
$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}, \qquad a = \bar{y} - b\bar{x}$$
Substituting the totals from Table 1.2 (n = 10):
$$b = \frac{1191.2 - \dfrac{(55)(207.2)}{10}}{385 - \dfrac{(55)^2}{10}} = \frac{51.6}{82.5} = 0.6255$$
$$a = \frac{207.2}{10} - 0.6255\left(\frac{55}{10}\right) = 17.28$$
and the trend equation can be written as:
$$\hat{Y} = 17.28 + 0.6255x$$
Since 1984 corresponds to x = 10, the forecast for 1985 is obtained by setting x = 11:
$$\hat{Y} = 17.28 + 0.6255(11) = 24.1605 \approx 24.16$$
that is, sales of about K24.16 million. Similarly, forecasting 2 years ahead would
involve setting x equal to 12, and so on.
Both confidence and prediction intervals can be constructed to place bounds on our
forecast. The caveat about forecasting outside the data range must be emphasized
here, especially if forecasting more than one time period ahead is being contemplated.
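As a check on the arithmetic, the least squares calculation can be sketched in a few lines of Python. The notes themselves contain no code, so the language and the function name `linear_trend` are our own choices; time is coded x = 1, 2, ..., 10 exactly as in Table 1.2, and the slope and intercept follow the two formulas above.

```python
def linear_trend(y):
    """Least squares trend y_hat = a + b*x, with time coded x = 1, 2, ..., n."""
    n = len(y)
    x = range(1, n + 1)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # slope
    a = sum_y / n - b * sum_x / n                                  # intercept
    return a, b


# Pharmaceutical sales (K millions), 1975-1984, from Table 1.1.
sales = [18.0, 19.4, 18.0, 19.9, 19.3, 21.1, 23.5, 23.2, 20.4, 24.4]
a, b = linear_trend(sales)
forecast_1985 = a + b * 11  # x = 11 is one year beyond the data
print(f"a = {a:.2f}, b = {b:.4f}, forecast for 1985 = {forecast_1985:.2f}")
```

Changing `11` to `12` gives the two-years-ahead forecast mentioned above; the same caveat about extrapolating beyond the data applies.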
Example 1.2
Among the more common functional forms used in trend analysis are the following three:
1. A linear model,
$$y = P_0 + P_1 x$$
which is appropriate if the first differences (the differences between successive
values in the time series) are roughly equal.
2. A polynomial form,
$$y = P_0 + P_1 x + P_2 x^2 \quad \text{(parabola)} \qquad \text{or} \qquad y = P_0 + P_1 x^2 \quad \text{(parabola)}$$
which is appropriate if the second differences (the differences between successive
first differences) are roughly equal.
3. A logarithmic or exponential trend,
$$Y = P_0 (P_1)^x \qquad \text{or} \qquad \log y = \log P_0 + (\log P_1)x$$
which is appropriate if neither a linear nor a polynomial form fits but there
nonetheless appears to be a constant rate of increase over time.
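The three rules of thumb above (roughly equal first differences, roughly equal second differences, or a roughly constant rate of increase) are easy to check mechanically. The Python sketch below is illustrative only: the helper names are our own, and the three short series are made up so that each rule shows up cleanly. A constant rate of increase appears as roughly constant ratios between successive values.

```python
def first_differences(y):
    """Differences between successive values: y[t+1] - y[t]."""
    return [later - earlier for earlier, later in zip(y, y[1:])]


def second_differences(y):
    """Differences between successive first differences."""
    return first_differences(first_differences(y))


def successive_ratios(y):
    """Ratios y[t+1] / y[t]; roughly constant ratios mean a constant rate of increase."""
    return [later / earlier for earlier, later in zip(y, y[1:])]


# Hypothetical series, chosen only to illustrate each pattern.
linear_like = [3, 5, 7, 9, 11]
parabola_like = [1, 4, 9, 16, 25]
exponential_like = [2, 4, 8, 16, 32]

print(first_differences(linear_like))       # [2, 2, 2, 2] -> linear form
print(second_differences(parabola_like))    # [2, 2, 2] -> polynomial (parabola)
print(successive_ratios(exponential_like))  # [2.0, 2.0, 2.0, 2.0] -> exponential form
```

With real data the differences or ratios will only be roughly constant, so the choice of form remains a judgment call.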
10.3 Moving Averages
An alternative approach to trend-cycle analysis is to use moving averages. In a
sense, the moving average, MA, takes away the short-term seasonal and irregular
variation, leaving a combined trend-cycle. Moving averages are therefore widely
used to remove seasonal variation, irregular variation (or “noise”, as it is also
called), or both.
Example 1.2
Monthly sales figures for gasoline were recorded at all the gas stations in a
particular town, as shown in Table 1.3. Calculate the three-month and five-month
moving averages.
Table 1.3 – Monthly Regional Gasoline Sales
Month    Gasoline Sales (1000s of kilograms)
1        37
2        70
3        45
4        26
5        60
6        45
7        31
8        79
9        24
10       61
11       25
12       44
Solution
A moving average is a simple arithmetic average computed over any number of time
periods. For a three-period moving average, we take the first three months (1, 2
and 3) and average them. Then we move to the next group of months (2, 3 and 4)
and average them, and so on. In a similar fashion, we can compute 5-month moving
averages, as shown in Table 1.4, or averages over any other number of months.
Table 1.4 – Calculations for Moving Averages for Gasoline Sales Example
Month   Gasoline   3-Month Moving   3-Month Moving      5-Month Moving   5-Month Moving
        Sales      Total            Average (÷ 3)       Total            Average (÷ 5)
1       37         -                -                   -                -
2       70         152              50.7                -                -
3       45         141              47.0                238              47.6
4       26         131              43.7                246              49.2
5       60         131              43.7                207              41.4
6       45         136              45.3                241              48.2
7       31         155              51.7                239              47.8
8       79         134              44.7                240              48.0
9       24         164              54.7                220              44.0
10      61         110              36.7                233              46.6
11      25         130              43.3                -                -
12      44         -                -                   -                -
Notice that the longer the period over which we average, the smoother the series
becomes; eventually the moving average becomes almost a straight line. Smoothing
also reduces the number of observation points: for the 3-month moving average we
lose the first and last months, and for the 5-month moving average we lose both the
first 2 and the last 2 months.
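The moving totals and averages in Table 1.4 can be reproduced with a short Python sketch (our own illustration; the function name `moving_averages` is hypothetical). For an odd number of periods k, each average is centred on the middle month of its window, so (k − 1)/2 months are lost at each end, as noted above.

```python
def moving_averages(values, k):
    """Return (moving totals, moving averages) over windows of k consecutive values."""
    totals = [sum(values[i:i + k]) for i in range(len(values) - k + 1)]
    averages = [total / k for total in totals]
    return totals, averages


# Monthly gasoline sales (1000s of kilograms), months 1-12, from Table 1.3.
sales = [37, 70, 45, 26, 60, 45, 31, 79, 24, 61, 25, 44]

for k in (3, 5):
    totals, averages = moving_averages(sales, k)
    first = k // 2 + 1                 # month on which the first average is centred
    last = first + len(averages) - 1   # month on which the last average is centred
    print(f"{k}-month MA, months {first} to {last}:", [round(a, 1) for a in averages])
```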
In general, if we set the period of the moving average exactly equal to the number of
seasons in the seasonal pattern of a given time series, we remove that seasonal
variation exactly. For example, if we have quarterly observations and wish to remove
the four seasons, we choose a 4-period moving average. Here (and in general) when
the number of periods chosen is even, we must compute a centered moving average.
Example 1.3
Historical occupancy rates for a Kasaba resort hotel have been compiled by the
government tourism office; these are shown in Table 1.5. Calculate a 4-quarter
moving average.
Solution
To remove the seasonal variation, we need to compute a 4-period moving average.
This, however, would place each moving average exactly between two quarters.
Consequently, we next take a 2-period moving average of the 4-period moving averages,
thereby centering the final moving average on a particular quarter. Our calculations
appear in Table 1.6.
Notice that we first calculated the 4-quarter moving average and then centered it by
averaging each pair of adjacent moving averages. For example, the moving average
of the first four quarters is 1972.75. The moving average of quarters II, III and IV of
1980 and quarter I of 1981 is 1983.50. The centered moving average is
(1972.75 + 1983.50)/2 = 1978.1, which is centered on quarter III of 1980. The
remaining centered moving averages are computed in a similar manner.
Table 1.5 – Hotel Occupancy Rates
Year    Quarter I   Quarter II   Quarter III   Quarter IV
1980    1682        2105         2401          1703
1981    1725        2215         2603          1815
1982    1783        2215         2187          1801
1983    1867        2124         2417          1896
1984    1995        2504         2619          2011
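The two-stage centring described above (a 4-quarter moving average, then a 2-term average of adjacent pairs) can be sketched in Python as follows. This is our own illustration with a hypothetical function name; it returns `None` for the two quarters lost at each end, and its unrounded results agree with the Centered Moving Average column of Table 1.6 to within rounding.

```python
def centred_moving_average(values, period=4):
    """Centred moving average for an even period, e.g. 4 quarters.

    Returns a list the same length as values, with None where no value exists.
    """
    if period % 2:
        raise ValueError("this two-stage centring is only needed for an even period")
    n = len(values)
    # Stage 1: ordinary moving averages; each one falls between two quarters.
    ma = [sum(values[i:i + period]) / period for i in range(n - period + 1)]
    # Stage 2: average adjacent pairs so that each value sits on a quarter.
    centred = [(ma[i] + ma[i + 1]) / 2 for i in range(len(ma) - 1)]
    half = period // 2
    return [None] * half + centred + [None] * half


occupancy = [1682, 2105, 2401, 1703,   # 1980, quarters I-IV
             1725, 2215, 2603, 1815,   # 1981
             1783, 2215, 2187, 1801,   # 1982
             1867, 2124, 2417, 1896,   # 1983
             1995, 2504, 2619, 2011]   # 1984

cma = centred_moving_average(occupancy)
print(cma)  # first value at index 2, i.e. 1980 quarter III
```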
Moving averages are specifically designed to remove seasonal and/or irregular variations.
As such, they can be thought of as serving three purposes. First, they are one of several
types of smoothing techniques that remove short-term variation and leave only a
combined trend-cycle. In other words, if we think of the classical multiplicative
time-series model, we have
$$Y = T \cdot C \cdot S \cdot I$$
and dividing both sides by $S \cdot I$ gives
$$\frac{Y}{S \cdot I} = \frac{T \cdot C \cdot S \cdot I}{S \cdot I} = T \cdot C = MA$$
That is, we are left with the moving average series, which is composed solely of the trend
and cycle.
Second, we can set the period of the moving average exactly equal to the number of
seasonal effects we wish to remove. In that sense, the moving average deseasonalizes the
time series.
Table 1.6 – Centered Moving Average Calculation for Hotel Occupancy
(Each 4-quarter moving average falls between two quarters and is shown on its own line.)
Year   Quarter   Occupancy   4-Quarter        2-Term         Centered
                             Moving Average   Moving Total   Moving Average
1980   I         1682
       II        2105
                             1972.75
       III       2401                         3956.25        1978.1
                             1983.50
       IV        1703                         3994.50        1997.3
                             2011.00
1981   I         1725                         4072.50        2036.3
                             2061.50
       II        2215                         4151.00        2075.5
                             2089.50
       III       2603                         4193.50        2096.8
                             2104.00
       IV        1815                         4208.00        2104.0
                             2104.00
1982   I         1783                         4104.00        2052.0
                             2000.00
       II        2215                         3996.50        1998.3
                             1996.50
       III       2187                         4014.00        2007.0
                             2017.50
       IV        1801                         4012.25        2006.1
                             1994.75
1983   I         1867                         4047.00        2023.5
                             2052.25
       II        2124                         4128.25        2064.1
                             2076.00
       III       2417                         4184.00        2092.0
                             2108.00
       IV        1896                         4311.00        2155.5
                             2203.00
1984   I         1995                         4456.50        2228.3
                             2253.50
       II        2504                         4535.75        2267.9
                             2282.25
       III       2619                         -              -
       IV        2011                         -              -
Third, moving averages can be used for forecasting. This is one of the simplest methods
of forecasting, but it is only appropriate for series with no trend or seasonal effect. It is
often used to predict the demand for a product in the next time period so that sufficient
stock can be kept to supply it. (This is called demand forecasting.)
10.4 Irregular Variation
Irregular or random variation remains after the trend, cyclic and seasonal variation
have been removed. One way of removing it is through smoothing techniques,
such as the moving average discussed in section 10.3. Another popular
technique is exponential smoothing, which we shall look at shortly.
By definition, irregular variation is unpredictable and random; it can only
sometimes be identified through examination of major external events that might
have influenced the time series, and its movements often tend to cancel each other
out over time. Although certain mathematical techniques (such as spectral analysis)
address irregular variation and movements in residual error terms, they are
beyond the scope of this course.
Exponential Smoothing – Exponential smoothing offers an alternative to moving
averages as a way of smoothing a time series. In its extended form, exponential
smoothing is
$$S_t = \alpha Y_t + \alpha(1-\alpha)Y_{t-1} + \alpha(1-\alpha)^2 Y_{t-2} + \cdots$$
This formula states that the current period’s smoothed value of the time series, $S_t$,
depends on all past values of the variable, although these are weighted
progressively less the farther back they go. We set the smoothing constant α such that
$0 \le \alpha \le 1$, which means that the successive weights $\alpha, \alpha(1-\alpha), \alpha(1-\alpha)^2, \ldots$
get smaller and smaller. There is a mathematical procedure for selecting the best or
optimal value of the smoothing constant, but it is beyond the level of this course. Note,
however, that small values of α smooth the series more heavily than large values do.
By simple mathematical derivation, it can be shown that the extended exponential
smoothing equation just described reduces to a computationally simpler form, called
the basic exponential smoothing equation:
$$S_t = \alpha Y_t + (1-\alpha)S_{t-1} \qquad \text{or} \qquad S_t = \alpha(Y_t - S_{t-1}) + S_{t-1}, \qquad 0 < \alpha < 1 \tag{1}$$
Note that $S_t$ is the forecasted (smoothed) value and $Y_t$ is the actual value.
We begin the smoothing procedure by initially setting $S_1 = Y_1$ in the first period.
Successive values are then computed as:
$$S_2 = \alpha Y_2 + (1-\alpha)S_1$$
$$S_3 = \alpha Y_3 + (1-\alpha)S_2$$
and so on.
Setting the smoothing constant to either of its extremes yields one of two cases. When
α = 0,
$$S_t = (0 \cdot Y_t) + (1 - 0)S_{t-1} = S_{t-1}$$
Since we set $S_1 = Y_1$, it follows that $S_t = Y_1$ for all t. Thus the smoothed values are
simply equal to the initial value of the time series. Setting α = 1,
$$S_t = (1 \cdot Y_t) + (1 - 1)S_{t-1} = Y_t$$
Thus the smoothed value of the series is just the most recent observation, and all earlier
observations are ignored. Such a series is called a random walk or a naïve forecasting
model: the forecast value in any particular year is simply the previous year’s value.
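The recursion in equation (1) can be sketched as a small Python function. This is our own illustration (the name `exponential_smoothing` and the argument `s0` are hypothetical); passing the first observation as `s0` reproduces the convention $S_1 = Y_1$, and the calls at the end confirm the extreme cases α = 0 and α = 1 just discussed.

```python
def exponential_smoothing(values, alpha, s0):
    """Basic exponential smoothing: S_t = S_{t-1} + alpha * (Y_t - S_{t-1}).

    s0 is the starting value; passing values[0] gives S_1 = Y_1.
    Returns the list [S_1, S_2, ..., S_n].
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    smoothed = []
    s = s0
    for y in values:
        s = s + alpha * (y - s)
        smoothed.append(s)
    return smoothed


series = [40, 35, 39, 44]  # illustrative values only
print(exponential_smoothing(series, 0.0, series[0]))  # [40.0, 40.0, 40.0, 40.0]: stays at Y_1
print(exponential_smoothing(series, 1.0, series[0]))  # [40.0, 35.0, 39.0, 44.0]: naive model
print(exponential_smoothing(series, 0.5, series[0]))  # [40.0, 37.5, 38.25, 41.125]
```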
The layout for working out problems using equation (1) is as follows:
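$$
\begin{array}{cccc}
\text{Time period } (t) & \text{Actual value } (Y_t) & \alpha(Y_t - S_{t-1}) & \text{Forecasted value } (S_t) \\
\hline
1 & Y_1 & \alpha(Y_1 - S_0) & S_1 = S_0 + \alpha(Y_1 - S_0) \\
2 & Y_2 & \alpha(Y_2 - S_1) & S_2 = S_1 + \alpha(Y_2 - S_1) \\
3 & Y_3 & \alpha(Y_3 - S_2) & S_3 = S_2 + \alpha(Y_3 - S_2) \\
\vdots & \vdots & \vdots & \vdots \\
t & Y_t & \alpha(Y_t - S_{t-1}) & S_t = S_{t-1} + \alpha(Y_t - S_{t-1})
\end{array}
$$
Here $S_0$ is the old (initial) forecast for period 1; if none is available, setting $S_0 = Y_1$
gives $S_1 = Y_1$, as above. The forecast of the next value, $Y_{t+1}$, is the latest smoothed
value, $S_t = S_{t-1} + \alpha(Y_t - S_{t-1})$. This single value is then used as the forecast for
all future periods, $t+1, t+2, \ldots$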
Example 1.3
Consider the example used by Roger C. Pfaffenberger and James H. Patterson in their book
Statistical Methods (1987), page 899. Information on the monthly sales of computer software
from Daltons Software, Inc., in Fort Worth, Texas, for 1986 is given in Table 1.0. Using
smoothing constants α = 0.1 and α = 0.9, and a forecast of $2,100 for January 1986,
forecast sales for February 1986 through January 1987.
Table 1.0 – Monthly Software Sales, 1986, and Forecasts with α = 0.1
Month           t    Actual Sales Yt   α(Yt − St−1)   Forecast Sales
January 1986    1    $1,800            -30            $2,100
February        2    2,000             -7             2,070
March           3    1,800             -26            2,063
April           4    3,000             96             2,037
May             5    2,700             57             2,133
June            6    1,900             -29            2,190
July            7    3,000             84             2,161
August          8    2,600             36             2,245
September       9    1,700             -58            2,281
October         10   1,200             -102           2,223
November        11   2,400             28             2,121
December        12   1,500             -65            2,149
January 1987                                          2,084
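The forecast column of Table 1.0 can be regenerated with the sketch below (our own Python illustration; the function name `one_step_forecasts` is hypothetical). It also produces the α = 0.9 forecasts that the example asks for but the table does not show. Because the table rounds to whole dollars at every step, the unrounded α = 0.1 values drift from it by about $1 in the later months (for example, about 2,083 rather than 2,084 for January 1987).

```python
def one_step_forecasts(actual, alpha, first_forecast):
    """Forecasts F_1, ..., F_{n+1} with F_{t+1} = F_t + alpha * (Y_t - F_t).

    F_1 is the given starting forecast; F_{n+1} is the forecast one period
    beyond the data.
    """
    forecasts = [first_forecast]
    for y in actual:
        previous = forecasts[-1]
        forecasts.append(previous + alpha * (y - previous))
    return forecasts


# Daltons Software monthly sales, January-December 1986.
sales_1986 = [1800, 2000, 1800, 3000, 2700, 1900,
              3000, 2600, 1700, 1200, 2400, 1500]

for alpha in (0.1, 0.9):
    f = one_step_forecasts(sales_1986, alpha, 2100)
    # f[0] is the January 1986 forecast; f[12] is the January 1987 forecast.
    print(f"alpha = {alpha}:", [round(v) for v in f])
```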
A useful rule for finding α is given by the formula
$$\alpha = \frac{2}{n+1}$$
where n is the number of periods in the equivalent moving average. For example, for a
4-quarterly moving average over 1 year (n = 4), α = 0.4. The larger the value of n, and
hence the smaller the value of α, the greater the smoothing effect.
Worked Examples
1. Exponentially smooth the following observed series of values:
40, 35, 39, 44, 45, 43, 46.
The old forecast for the first observed value should be taken as 40, with α = 0.2.
$$S_t = \alpha(Y_t - S_{t-1}) + S_{t-1}, \qquad \alpha = 0.2$$
t    Yt    St−1    α(Yt − St−1)    St
1    40    40       0              40
2    35    40      -1              39
3    39    39       0              39
4    44    39       1              40
5    45    40       1              41
6    43    41       0.4            41.4
7    46    41.4     0.92           42.32
The new forecast is 42.32.
2. Exponentially smooth the following data. What is the new forecast for the
production of aircraft in 1971? (Take α = 0.25.)
Year                         1960  1961  1962  1963  1964  1965  1966  1967  1968  1969  1970
Production of new aircraft   518   395   487   450   319   415   431   312   278   500   450
Working (adjustments and smoothed values rounded to whole numbers at each step):
Year   t    Yt    St−1   α(Yt − St−1)   St
1960   1    518   518     0             518
1961   2    395   518    -31            487
1962   3    487   487     0             487
1963   4    450   487    -9             478
1964   5    319   478    -40            438
1965   6    415   438    -6             432
1966   7    431   432    -0.25          432
1967   8    312   432    -30            402
1968   9    278   402    -31            371
1969   10   500   371     32            403
1970   11   450   403     12            415
The new forecast for the production of aircraft in 1971 is 415.
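Both worked examples can be checked with one small Python sketch (our own illustration; the function `smooth` and its `round_steps` switch are hypothetical names). The aircraft table rounds each adjustment to a whole number, so the sketch offers that as an option: with rounding it reproduces the forecast of 415, and without rounding it gives about 414.95, which also rounds to 415.

```python
def smooth(values, alpha, old_forecast, round_steps=False):
    """Apply S_t = S_{t-1} + alpha * (Y_t - S_{t-1}) to every value.

    Returns the list of smoothed values; the last one is the new forecast.
    If round_steps is True, each adjustment is rounded to a whole number,
    imitating the hand working in the aircraft example.
    """
    s = old_forecast
    smoothed = []
    for y in values:
        step = alpha * (y - s)
        if round_steps:
            step = round(step)
        s = s + step
        smoothed.append(s)
    return smoothed


# Worked example 1: alpha = 0.2, old forecast 40.
example1 = smooth([40, 35, 39, 44, 45, 43, 46], 0.2, 40)
print([round(v, 2) for v in example1])  # last value is the new forecast, 42.32

# Worked example 2: aircraft production 1960-1970, alpha = 0.25, starting from 518.
aircraft = [518, 395, 487, 450, 319, 415, 431, 312, 278, 500, 450]
print(smooth(aircraft, 0.25, 518, round_steps=True)[-1])  # 415, as in the table
print(round(smooth(aircraft, 0.25, 518)[-1], 2))          # 414.95 without rounding
```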
Problems:
1. The accompanying table shows earnings per share of a corporation over a
period of 18 years.
Year   Earnings     Year   Earnings     Year   Earnings
1      3.63         7      7.01         13     3.54
2      3.62         8      6.37         14     1.65
3      3.66         9      5.82         15     2.15
4      5.31         10     4.98         16     6.09
5      6.14         11     3.43         17     5.95
6      6.42         12     3.40         18     6.26
(a) Using smoothing constants α = 0.3, 0.5, 0.7 and 0.9, find the forecasts based
on simple exponential smoothing.
(b) Which of the forecasts would you choose to use?
2. Manufacturer Sales of Women’s Footwear (m. pairs)
Year   1st Quarter   2nd Quarter   3rd Quarter   4th Quarter
1966   20.9          17.3          15.6          13.9
1967   17.5          14.7          13.5          13.1
1968   17.0          13.5          13.5          13.7
Is there any evidence that manufacturers’ sales of women’s footwear are subject
to seasonal variation? Predict manufacturers’ sales during the first quarter of
1969.
New forecasted value = old forecasted value + α(actual observation − old forecasted value), with α = 0.2.
Period      Actual    Old        New
Reference   Demand    Forecast   Forecast
1           16        16         16.00
2           20        16.00      16.80
3           15        16.80      16.44
4           19        16.44      16.95
5           17        16.95      16.96
6           21        16.96      17.77
7           25        17.77      19.22
The new forecasts (here $y_t$ denotes the new forecast for period t) are computed as:
$$y_2 = 16 + 0.2(20 - 16) = 16 + 0.2(4) = 16.80$$
$$y_3 = 16.80 + 0.2(15 - 16.80) = 16.80 + 0.2(-1.8) = 16.44$$
$$y_4 = 16.44 + 0.2(19 - 16.44) = 16.44 + 0.2(2.56) = 16.952 \approx 16.95$$
$$y_5 = 16.95 + 0.2(17 - 16.95) = 16.95 + 0.2(0.05) = 16.96$$
$$y_6 = 16.96 + 0.2(21 - 16.96) = 16.96 + 0.2(4.04) = 17.77$$
$$y_7 = 17.77 + 0.2(25 - 17.77) = 17.77 + 0.2(7.23) = 19.22$$
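The same updating rule can be run in a loop. This Python sketch (our own illustration) carries full precision between steps instead of rounding to two decimals, yet after rounding it gives the same New Forecast column: 16.00, 16.80, 16.44, 16.95, 16.96, 17.77, 19.22.

```python
alpha = 0.2
demand = [16, 20, 15, 19, 17, 21, 25]  # actual demand, periods 1-7
old_forecast = 16.0                    # old forecast for period 1

new_forecasts = []
for actual in demand:
    # New forecast = old forecast + alpha * (actual - old forecast)
    new_forecast = old_forecast + alpha * (actual - old_forecast)
    new_forecasts.append(new_forecast)
    old_forecast = new_forecast        # becomes the old forecast for the next period

print([round(f, 2) for f in new_forecasts])
```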
Learning Objectives
After working through this Chapter you should be able to:
• Define the term time series.
• Discuss the appropriate model to use when forecasting: the least squares method,
the moving average method and the exponential smoothing method.
Sample Examination Questions
1. Live Births in XX and XY, 2002–2004 (in thousands)
Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
2002   262         263         264         250
2003   255         256         253         240
2004   251         250         247         237
(a) By means of a moving average, find the trend and seasonal adjustments.
(b) Forecast the number of live births in XX and XY for the first two quarters of
2005.
(c) Discuss briefly the accuracy of your forecasts.
2. The sales (× K1 000 000) of golf equipment by a large sports shop are shown for each
quarter for four years as follows:
Quarter   2002   2003   2004   2005
First     -      10     25     45
Second    -      35     55     67
Third     -      65     85     97
Fourth    29     25     45     -
(a) Using an additive model, find the centred moving average trend.
(b) Find the average seasonal variation for each quarter.
(c) Predict sales for the last quarter of 2005 and the first quarter of 2006, stating
any assumptions.
3. In the series below the trend is linear and is given by the equation
$$Y_t = 5.42 + 1.49X_t$$
where t indicates the quarter, taking the values 1, 2, 3, ..., 16.
Quarterly Sales of ‘Musenge’ (Sales in K ‘000’ 000)
Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
1999   8.2         5.8         8.7         13.9
2000   15.5        12.2        13.5        20.0
2001   21.2        17.0        20.2        26.2
2002   25.4        23.5        25.3        32.6
(a) Plot the data and the trend line on a graph.
(b) Using the graph (or otherwise), obtain the deviations from the trend line and
hence obtain estimates of the seasonal variations. Assume that the additive
model $Y_t = T_t + S_t + R_t$ is appropriate, where $S_t$ and $R_t$ are respectively the
seasonal and the residual components of the series.
(c) Obtain a forecast of the sales for the quarters of 2003. State clearly the
assumptions underlying the procedure that you have used.
4. (a) The number of prescriptions dispensed by chemists under the Health Board
in country X during a five-year period is shown in the table below.
Health Board Prescriptions (millions)
Year   Quarterly figures, in order
1      62, 73
2      71, 69, 64, 71
3      75, 68, 64, 70
4      71, 68, 67, 69
5      77
(i) Plot the data on a graph.
(ii) Calculate and plot on the same graph a suitable moving average.
(iii) Using the additive model, obtain the average seasonal variation.
(b) With which characteristic movement of a time series would you mainly
associate each of the following:
(i) a boom in business
(ii) an increase in employment during the harvest season
(iii) a minor fire delaying production for a month
(iv) a decrease in the sales of black and white television sets.
CHAPTER 11
INDEX NUMBERS
Reading
Newbold Chapter
Plane and Oppermann Chapter 16
Tailoka Frank P Chapter 5
Introductory Comments
This Chapter looks at index numbers, which are useful in describing the way in which
the economy changes from period to period, using prices, quantities, etc.
A device constructed by statisticians which attempts to measure the magnitude of
economic changes over time is called an index number.
An index number shows the rate of change of a variable from one specification to
another.
You will realize that the index of retail prices attempts to measure the change in the price
of a whole range of goods and services that we regularly buy. So you can see that it is
attempting to measure the cost of living, something that concerns us all. In times of
inflation, the retail price index is probably more important than at any other time in its
existence; in developed countries, for example, it is used to determine increases in pay
and to index-link pensions.
However, to use it we need to know what an index is, how it is calculated and what its
limitations are. The primary function of a price index is to compare prices in one year with
those in some other year. Technically, prices in a given year are compared with
prices in the base year, which are taken as the standard. Conventionally, $P_1$ refers to the
price in the given year and $P_0$ refers to the price in the base year.
A Price Index measures the change in the money value of a group of items over time. If
only one item, such as bread, is being considered, the comparison between years may be
made by calculating price relatives, i.e., the prices in the given year relative to the
base year:
$$\text{Price relative} = \frac{P_1}{P_0} \times 100$$
For example, if the price of a loaf was K1100 in 1999 and K1700 in 2000, the 2000 price
relative to 1999 was
$$\frac{1700}{1100} \times 100 = 154.5$$
The interpretation of a price index is straightforward. The price index for 2000 is 154.5.
This means that the 2000 price of a loaf of bread is 154.5 percent of the 1999 (base year)
price of a loaf of bread.
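In code the price relative is a one-line calculation; the Python sketch below (our own illustration, with a hypothetical function name) reproduces the bread example.

```python
def price_relative(p_given, p_base):
    """Price in the given year as a percentage of the base-year price: (P1 / P0) * 100."""
    return p_given / p_base * 100


bread_2000 = price_relative(1700, 1100)  # loaf price in 2000 relative to 1999
print(f"{bread_2000:.1f}")               # 154.5
```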
Four Main Considerations to Be Borne in Mind When Constructing an Index Number
i) The purpose of the index number.
ii) Selection of items for inclusion.
iii) Selection of appropriate weights.
iv) Selection of a base year.
If more than one commodity is to be considered, to give an overall impression of
rising or falling prices, it becomes necessary to combine the prices of these items into
some form of weighted average or index number. The most commonly used form is
that calculated by the Laspeyres formula:
$$I_1 = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100$$
where $I_1$ = index number for the given year
$q_0$ = weight applied to each price, calculated from the base year (base-year quantities)
$P_0$ = price in the base year
$P_1$ = price in the given year
$\sum$ = sum to be taken over all the items.
Consider question one (Tailoka Frank P, Chapter 5). $I_1$ is then the index number for
1991, with 1990 as the base year.
Item    P1     q0     P0    P1q0     P0q0
A       75     500    45    37500    22500
B       90     35     50    3150     1750
C       100    65     55    6500     3575
Total                       47150    27825
$$I_1 = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100 = \frac{47150}{27825} \times 100 = 169.5$$
It may now be stated that prices have risen by 69.5% overall from 1990 to 1991
based on the evidence of these three commodities.
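The Laspeyres calculation can be sketched in Python as below (our own illustration; the function name `laspeyres` is hypothetical). The lists hold the prices and base-year quantities of items A, B and C in the order of the table above.

```python
def laspeyres(p0, p1, q0):
    """Laspeyres price index: sum(P1*q0) / sum(P0*q0) * 100 (base-year quantity weights)."""
    numerator = sum(p * q for p, q in zip(p1, q0))
    denominator = sum(p * q for p, q in zip(p0, q0))
    return numerator / denominator * 100


# Items A, B and C; 1990 is the base year, 1991 the given year.
p0 = [45, 50, 55]    # 1990 prices
p1 = [75, 90, 100]   # 1991 prices
q0 = [500, 35, 65]   # 1990 quantities (weights)

print(f"{laspeyres(p0, p1, q0):.1f}")  # 169.5
```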
This index is a reasonable measure of the change in prices over a short period of,
say, two years, but if the given year is further from the base year, the weights
tend to become out of date as spending habits change, and they no longer give a
realistic comparison between the two years. This disadvantage may be overcome
by using a given-year weighted index, as calculated by the Paasche formula:
$$I_1 = \frac{\sum P_1 q_1}{\sum P_0 q_1} \times 100$$
This index compares the total value of the given year’s consumption with the value
that the same quantities would have had at base-year prices.
Item    P1     q1     P0    P1q1     P0q1
A       75     800    45    60000    36000
B       90     150    50    13500    7500
C       100    80     55    8000     4400
Total                       81500    47900
$$I_1 = \frac{81500}{47900} \times 100 = 170.1 \approx 170$$
From this calculation prices may be said to have risen by about 70% overall. However,
this formula is equally unrealistic in that it compares a hypothetical past value (given-year
quantities at base-year prices) with the real current value, rather than vice versa. One
suggested way out of the dilemma is to calculate an average index number, Fisher’s
index, which is the geometric mean of the Laspeyres and the Paasche index numbers.
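The Paasche calculation, and the geometric-mean compromise just described, can be sketched in Python as below. This is our own illustration with hypothetical function names, using the same three items; the Fisher value of about 169.8 lies, as a geometric mean must, between the Laspeyres figure (169.5) and the Paasche figure (about 170.1).

```python
import math


def weighted_total(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


def paasche(p0, p1, q1):
    """Paasche price index: sum(P1*q1) / sum(P0*q1) * 100 (given-year weights)."""
    return weighted_total(p1, q1) / weighted_total(p0, q1) * 100


def fisher(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche ratios, times 100."""
    laspeyres_ratio = weighted_total(p1, q0) / weighted_total(p0, q0)
    paasche_ratio = weighted_total(p1, q1) / weighted_total(p0, q1)
    return math.sqrt(laspeyres_ratio * paasche_ratio) * 100


p0 = [45, 50, 55]     # 1990 prices of A, B, C
p1 = [75, 90, 100]    # 1991 prices
q0 = [500, 35, 65]    # 1990 quantities
q1 = [800, 150, 80]   # 1991 quantities

print(f"Paasche: {paasche(p0, p1, q1):.1f}")     # 170.1
print(f"Fisher:  {fisher(p0, p1, q0, q1):.1f}")  # 169.8
```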
$$I_F = \sqrt{I_L \times I_P} \times 100 = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}} \times 100$$
where $I_L$ and $I_P$ are the Laspeyres and Paasche indices expressed as ratios (before
multiplying by 100).
2. Changing the Base
The base of an index number series is changed by taking proportions, as illustrated
below. Index A has 1971 as its base year and Index B has 1976 as its base year. To
convert Index A to Index B, each Index A value was divided by 150 (the 1976 value of
Index A) and multiplied by 100. It can be seen that the numbers for each year are in the
same proportions for both Index A and Index B.
Base Change
Year    Index A    Index B
1971    100        66.7
1972    110        73.3
1973    120        80.0
1974    130        86.7
1975    140        93.3
1976    150        100
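Rebasing is a single division per value. In this Python sketch (our own illustration; `rebase` is a hypothetical name) every Index A value is divided by the 1976 value, 150, and multiplied by 100, which leaves the year-to-year proportions unchanged.

```python
def rebase(index_values, new_base_value):
    """Re-express index numbers so that the new base period equals 100."""
    return [value / new_base_value * 100 for value in index_values]


years = [1971, 1972, 1973, 1974, 1975, 1976]
index_a = [100, 110, 120, 130, 140, 150]               # base 1971 = 100
index_b = rebase(index_a, index_a[years.index(1976)])  # base 1976 = 100

for year, a, b in zip(years, index_a, index_b):
    print(year, a, round(b, 1))
```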
3. Chain Index Numbers
In a chain base index the base period moves forward by one time period each time;
therefore each index number is interpreted relative to the previous period.
$$\text{Chain index} = \frac{\text{Price/Quantity at time } n}{\text{Price/Quantity at time } n-1} \times 100$$
Example: The table below shows the week-ending share price on the stock
exchange over a period of four weeks for a local company’s shares:
Week        1      2      3      4
Price (K)   250    300    350    225
Calculate and interpret a chain base index using week 1 as the base.
$$\text{Index (wk 1)} = 100$$
$$\text{Index (wk 2)} = \frac{\text{Price wk 2}}{\text{Price wk 1}} \times 100 = \frac{300}{250} \times 100 = 120$$
$$\text{Index (wk 3)} = \frac{\text{Price wk 3}}{\text{Price wk 2}} \times 100 = \frac{350}{300} \times 100 = 116.67 \text{ (to 2 d.p.)}$$
$$\text{Index (wk 4)} = \frac{\text{Price wk 4}}{\text{Price wk 3}} \times 100 = \frac{225}{350} \times 100 = 64.29 \text{ (to 2 d.p.)}$$
At the end of the second week the share price had increased by 20% from the end
of the first week. By the end of the third week the share price had increased again,
but at a slower rate (16.67%) compared with week 2. In week 4 the price dipped,
a 35.71% decrease from week 3.
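The chain index divides each week's price by the previous week's price rather than by the week 1 price. A minimal Python sketch (our own illustration; the function name is hypothetical):

```python
def chain_index(values):
    """Each value as a percentage of the previous period's value; first period = 100."""
    return [100.0] + [current / previous * 100
                      for previous, current in zip(values, values[1:])]


prices = [250, 300, 350, 225]  # week-ending share prices, weeks 1-4
print([round(i, 2) for i in chain_index(prices)])  # [100.0, 120.0, 116.67, 64.29]
```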
4. Splicing Overlapping Series of Index Numbers
Suppose index A has a base of 1972 and that in 1974 it becomes necessary to alter
the weights used, thus producing a new index, B, based on 1974. An index series
covering only three years, such as A, is not very meaningful, but continuity would
be maintained if the new series B could be expressed in terms of series A.
The process is really one of taking proportions using a chain index, and it is
illustrated using the data in Table 2.0.
Table 2.0
Year    Series A (∑Pq)    Series B (∑Pq)
1972    240               -
1973    200               -
1974    180               200
1975    -                 180
1976    -                 160
Index A (base 1972 = 100):
$$I_{72} = \frac{240}{240} \times 100 = 100, \qquad I_{73} = \frac{200}{240} \times 100 = 83.3, \qquad I_{74} = \frac{180}{240} \times 100 = 75$$
Index B (base 1974 = 100):
$$I'_{74} = \frac{200}{200} \times 100 = 100, \qquad I'_{75} = \frac{180}{200} \times 100 = 90, \qquad I'_{76} = \frac{160}{200} \times 100 = 80$$
The chain index numbers for series B are:
$$I'_{75,74} = \frac{180}{200} = 0.9, \qquad I'_{76,75} = \frac{160}{180} = 0.89$$
One can expect the ratio of 1975 to 1974 to be the same for both index A and index B:
$$\frac{I_{75}}{I_{74}} = \frac{I'_{75}}{I'_{74}} \quad\Rightarrow\quad I_{75} = \frac{I'_{75}}{I'_{74}} \cdot I_{74} = \frac{90}{100} \times 75 = 67.5$$
It can be seen that $I'_{75}/I'_{74}$ is the definition of the chain index $I'_{75,74}$, and
therefore the formula for calculating $I_{75}$ may be rewritten as:
$$I_{75} = I_{74} \cdot I'_{75,74} = 75 \times 0.9 = 67.5$$
In general, the next value in series A, $I_{K+1}$, may be obtained by multiplying the
previous value, $I_K$, by the chain index $I'_{K+1,K}$:
$$I_{K+1} = I_K \cdot I'_{K+1,K}$$
The index series B came into being because the weights were changed in 1974. It
would, of course, be possible to change the weights every year and, using the chain
index technique, relate each year back to the original base of Series A. This is the
method used in calculating the index of retail prices.
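The rule $I_{K+1} = I_K \cdot I'_{K+1,K}$ can be applied repeatedly to carry series A forward for as long as series B runs. The Python sketch below is our own illustration (the function name `splice` is hypothetical); besides reproducing $I_{75} = 67.5$, it extends the same rule one more year, giving 60.0 for 1976 on the 1972 base (75 × 0.9 × 160/180).

```python
def splice(old_index_at_overlap, new_index):
    """Continue an old index series using the chain ratios of a new series.

    new_index must start at the overlap period. Each step applies
    I_{K+1} = I_K * I'_{K+1,K}, where I'_{K+1,K} = I'_{K+1} / I'_K.
    """
    spliced = [old_index_at_overlap]
    for previous, current in zip(new_index, new_index[1:]):
        spliced.append(spliced[-1] * (current / previous))
    return spliced


# Table 2.0: aggregates for the two weighting schemes.
series_a = {1972: 240, 1973: 200, 1974: 180}
series_b = {1974: 200, 1975: 180, 1976: 160}

index_a = {year: total / series_a[1972] * 100 for year, total in series_a.items()}
index_b = {year: total / series_b[1974] * 100 for year, total in series_b.items()}

continued = splice(index_a[1974], list(index_b.values()))  # 1974, 1975, 1976
print([round(v, 1) for v in continued])                    # [75.0, 67.5, 60.0]
```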
5. Deflating Prices and Incomes
Indicators of inflation are rising prices and incomes. The question sometimes
asked is: by how much has real income increased in, for example, the past two
years? It may be answered by deflating the income figures, that is, by dividing them
by the retail price index. Prices of individual commodities may be deflated in the same
manner, thus showing the change in real price.
Table 3.0 – Deflating Income
Year    Income        Price Index    Real Income
1974    K2,610,000    100            K2,610,000.00
1976    K3,150,000    157            K2,006,369.43
Example:
Suppose that the income column in Table 3.0 shows the income of a sales
representative in 1974 and 1976, that the base year of the index of retail prices has
been taken as 1974, and that the index value for 1976 is 157. Real income may be
calculated by dividing actual income by the price index expressed as a ratio
(157/100 = 1.57):
$$\text{1974 real income} = \frac{K2{,}610{,}000}{1.00} = K2{,}610{,}000$$
$$\text{1976 real income} = \frac{K3{,}150{,}000}{1.57} = K2{,}006{,}369.43$$
It may be said that the salesman’s real income, measured in 1974 prices, has
decreased by K603,630.57 over the two years.
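Deflating is again a simple division. The Python sketch below (our own illustration; `deflate` is a hypothetical name) reproduces Table 3.0, converting the index to a ratio by dividing it by 100 first.

```python
def deflate(money_values, price_index):
    """Real values: divide each money value by its price index expressed as a ratio."""
    return [value / (index / 100) for value, index in zip(money_values, price_index)]


income = [2_610_000, 3_150_000]  # sales representative's income, 1974 and 1976 (K)
rpi = [100, 157]                 # retail price index, 1974 = 100

real = deflate(income, rpi)
print([f"K{v:,.2f}" for v in real])                       # ['K2,610,000.00', 'K2,006,369.43']
print(f"Fall in real income: K{real[0] - real[1]:,.2f}")  # K603,630.57
```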
Learning Objectives
After working through this Chapter you should be able to:
• Explain what an index number is.
• Compute simple index numbers and interpret them.
• Calculate the Paasche, Laspeyres and Fisher index numbers.
• Change an index from one base to another.
Sample Examination Questions
1. The following figures give the distribution of income percentages for an average
family:
Item              %
Food              45
Fuel and light    15
Clothing          5
Rent              20
Other items       15
Average prices (K’000) for three successive years were as follows:
Item              2003    2004    2005
Food              180     200     215
Fuel and light    40      45      42
Clothing          95      80      95
Rent              50      55      60
Other items       65      80      80
(i) Calculate a cost of living index for the years 2004 and 2005, taking 2003 as
the base year.
(ii) Comment briefly on the problem of the choice of items and weights when
constructing an index number.
2. (a) What are the main considerations to be borne in mind when constructing an
index number?
(b) The following table shows the total weekly expenditure on four
commodities in July 2001 and July 2002, based on a representative sample of
1000 households:
Commodity    Quantity Purchased (kg)    Total Expenditure (K)
July 2001:
Butter       5 500                      2 500 000
Potatoes     10 500                     600 000
Apples       4 000                      800 000
Meat         8 000                      9 500 000
Total        28 000                     13 400 000
July 2002:
Butter       5 500                      3 400 000
Potatoes     9 500                      900 000
Apples       3 500                      850 000
Meat         8 500                      1 250 000
Total        27 000                     6 400 000
You are required to compute a Paasche index showing the extent of the rise
in prices of all four commodities.
(c) Explain briefly the major weakness of the Paasche index in this case and
suggest an alternative.
3. The following figures give the distribution of income percentages for an average
family:
Item              %
Food              25
Fuel and light    20
Clothing          25
Rent              10
Other items       20
Average prices for the successive years were as follows:
Item              1999    2000    2001
Food              180     195     210
Fuel and light    35      34      30
Clothing          100     90      95
Rent              45      45      50
Other items       65      75      75
(a) Calculate a cost of living index for the years 2000 and 2001, taking 1999 as
the base year.
(b) Comment briefly on the problem of the choice of items and weights when
constructing an index number.
4.
(a)
Define what is meant by a ‘fixed base index number’ and a ‘chain based
index number’ and explain the different ways in which these alternatives
have to be interpreted.
(b)
From the following data, calculate:
i)
ii)
A Laspeyre price index for 2003.
A Paasche quantity index for 2003.
In each case using 2001 as the base year.
5.
Commodity
A
2001
Average price (K)
18 250
B
(a)
(b)
Quantity
155
2003
Average price (K)
1 8 750
Quantity
195
39 100
275
46 000
310
C
7 000
120
9 000
195
D
14 750
435
22 700
380
E
74 200
95
101 800
130
What are the main considerations to be borne in mind when constructing an
index number?
The following table shows the total weekly expenditure on four
commodities in July 1993 and July 2004, based on a representative sample
of 1000 households.
Commodities    Quantities purchased (kg)    Total expenditure (K'000)
July 1993:
Butter          4 500                        1 680
Potatoes        9 500                          510
Apples          3 000                          600
Meat            7 000                        7 200
Total          24 000                        9 990
July 2004:
Butter          4 500                        4 200
Potatoes        8 500                        4 200
Apples          3 500                        1 500
Meat            7 500                       19 500
Total          24 000                       29 400
You are required to compute a Laspeyres index showing the extent of the rise in prices of all four commodities.
(c) Explain briefly the major weakness of the Laspeyres index in this case and suggest an alternative.
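For Question 4(b), the two formulas can be checked with a short Python sketch. It uses the standard definitions, Laspeyres price index Σp1q0 / Σp0q0 × 100 (base-year quantities as weights) and Paasche quantity index Σp1q1 / Σp1q0 × 100 (current-year prices as weights), with 2001 as period 0 and 2003 as period 1; the function names are illustrative:

```python
# Laspeyres price index and Paasche quantity index for Question 4(b) (sketch).
# Commodity order: A, B, C, D, E; period 0 = 2001, period 1 = 2003.
p0 = [18_250, 39_100, 7_000, 14_750, 74_200]
q0 = [155, 275, 120, 435, 95]
p1 = [18_750, 46_000, 9_000, 22_700, 101_800]
q1 = [195, 310, 195, 380, 130]


def weighted_sum(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres_price_index(p0, q0, p1):
    """Price change valued at base-period quantities."""
    return 100 * weighted_sum(p1, q0) / weighted_sum(p0, q0)


def paasche_quantity_index(p1, q0, q1):
    """Quantity change valued at current-period prices."""
    return 100 * weighted_sum(p1, q1) / weighted_sum(p1, q0)


print(f"Laspeyres price index 2003: {laspeyres_price_index(p0, q0, p1):.2f}")
print(f"Paasche quantity index 2003: {paasche_quantity_index(p1, q0, q1):.2f}")
```

The same `laspeyres_price_index` function applies to Question 5(b) once average unit prices have been derived as expenditure ÷ quantity for each commodity.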
CHAPTER 12
ASSIGNMENTS
Methods of Organizing and Presenting Data; Descriptive Measures
1. The work required on two types of machine, X and Y, has been categorized as routine maintenance, part replacement or specialist repair. Records kept for the past 12 months provide the following information:
Work required           Frequency
                        Type X    Type Y
Routine maintenance     11        15
Part replacement        5         2
Specialist repair       4         3
Present this information using:
(a) pie charts;
(b) appropriate bar charts.
2. The average weekly household expenditure on a particular range of products has been recorded from a sample of 20 households as follows:
K42,000    K45,500    K35,550    K48,150    K37,450
K51,000    K25,600    K55,600    K22,500    K65,600
K43,100    K79,600    K46,400    K39,450    K52,950
K29,000    K49,900    K39,500    K73,050    K41,550
Tabulate the data as a frequency distribution and construct a suitable diagram.
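A frequency distribution needs a choice of classes, which the question leaves open. The Python sketch below assumes equal classes of width K10,000 starting at K20,000 ("K20,000 and under K30,000", and so on); the function name is illustrative:

```python
from collections import Counter

# Weekly household expenditure (K) for the 20 households in question 2
expenditure = [
    42_000, 45_500, 35_550, 48_150, 37_450,
    51_000, 25_600, 55_600, 22_500, 65_600,
    43_100, 79_600, 46_400, 39_450, 52_950,
    29_000, 49_900, 39_500, 73_050, 41_550,
]


def frequency_table(data, start, width, n_classes):
    """Count observations in classes [start + i*width, start + (i+1)*width).

    Assumes every value falls inside the chosen classes.
    """
    counts = Counter((x - start) // width for x in data)
    table = []
    for i in range(n_classes):
        lower = start + i * width
        table.append((lower, lower + width, counts.get(i, 0)))
    return table


for lower, upper, freq in frequency_table(expenditure, 20_000, 10_000, 6):
    print(f"K{lower:,} and under K{upper:,}: {freq}")
```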
3. The number of new orders received by a company over the past 25 working days was recorded as follows:
4    5    5    4    5
1    3    6    1    3
2    6    2    3    4
5    4    5    1    4
5    7    3    6    2
Determine the mean, median and mode.
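Python's standard `statistics` module gives a quick check on hand calculations for ungrouped data such as these:

```python
import statistics

# New orders received on each of the 25 working days
orders = [4, 5, 5, 4, 5, 1, 3, 6, 1, 3, 2, 6, 2, 3, 4, 5, 4, 5, 1, 4, 5, 7, 3, 6, 2]

mean = statistics.mean(orders)      # sum / n
median = statistics.median(orders)  # middle (13th) value of the sorted data
mode = statistics.mode(orders)      # most frequent value

print(f"mean = {mean}, median = {median}, mode = {mode}")
```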
4. Determine the mean, median and mode from the following information on journey distance to work:
Distance (km)       Percentage
Under 1             20
1 and under 3       26
3 and under 10      35
10 and under 15     9
15 and over         10
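For grouped data the usual approach takes class midpoints for the mean and interpolates within the median class. The open-ended classes need assumed limits; the Python sketch below assumes "Under 1" starts at 0 km and "15 and over" ends at 20 km, and the answers shift if other limits are chosen. The mode is left out because, with unequal class widths, the modal class should be found from frequency density (frequency ÷ class width) rather than raw frequency.

```python
# Classes as (lower km, upper km, percentage); the open-ended limits 0 and 20 are assumptions
classes = [(0, 1, 20), (1, 3, 26), (3, 10, 35), (10, 15, 9), (15, 20, 10)]


def grouped_mean(classes):
    """Mean using class midpoints as representative values."""
    total = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / total


def grouped_median(classes):
    """Median by linear interpolation within the class containing the N/2-th value."""
    half = sum(f for _, _, f in classes) / 2
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= half:
            return lo + (half - cumulative) / f * (hi - lo)
        cumulative += f
    raise ValueError("empty distribution")


print(f"mean = {grouped_mean(classes):.2f} km, median = {grouped_median(classes):.2f} km")
```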
5.
Write a brief statement explaining the meaning of the standard deviation to
someone who knows nothing about statistics.
6. For the data in question 2, determine the range, quartile deviation and standard deviation.
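A Python sketch of the three measures for the question 2 data. Quartile conventions differ between textbooks; this uses `statistics.quantiles`, whose default "exclusive" method places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4, so a hand calculation with another convention may differ slightly. Both the population (divide by n) and sample (divide by n − 1) standard deviations are shown:

```python
import statistics

expenditure = [
    42_000, 45_500, 35_550, 48_150, 37_450,
    51_000, 25_600, 55_600, 22_500, 65_600,
    43_100, 79_600, 46_400, 39_450, 52_950,
    29_000, 49_900, 39_500, 73_050, 41_550,
]

data_range = max(expenditure) - min(expenditure)
q1, q2, q3 = statistics.quantiles(expenditure, n=4)  # default method="exclusive"
quartile_deviation = (q3 - q1) / 2                   # semi-interquartile range
sd_population = statistics.pstdev(expenditure)       # divisor n
sd_sample = statistics.stdev(expenditure)            # divisor n - 1

print(f"range = K{data_range:,}")
print(f"Q1 = K{q1:,.2f}, Q3 = K{q3:,.2f}, quartile deviation = K{quartile_deviation:,.2f}")
print(f"standard deviation: K{sd_population:,.2f} (population), K{sd_sample:,.2f} (sample)")
```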
7. A survey of workers in a particular industrial sector produced the following table:
Weekly income                        Number of workers
Under K500,000                       175
K500,000 but under K750,000          240
K750,000 but under K1,000,000        230
K1,000,000 but under K2,000,000      160
Over K2,000,000                      125
Estimate the standard deviation, range and quartile deviation.
Probability
Probability Distributions
(1) What is the probability that two successively chosen random digits are the same?
(2) A letter is chosen at random from the alphabet. What is the probability that it is:
(a) a vowel?
(b) a consonant?
(3) Assuming that any arrangement of one or more letters forms a word, how many words can be formed from:
(a) CAT?
(b) STRANGE?
(4) Two fair dice are thrown.
(a) If it is known that the total score was 7, what is the probability that the difference between the scores on the two dice was 2?
(b) If it is known that the difference between the scores was 3, what is the probability that the total score was 8?
(5) Events A and B are such that $P(A) = \frac{5}{12}$, $P(A/B) = \frac{7}{12}$ and $P(A \cap B) = \frac{1}{8}$. Find:
(a) $P(B)$
(b) $P(A/B)$
(c) $P(B/A)$
(d) $P(A \cup B)$
State whether events A and B are:
(a) mutually exclusive;
(b) independent.
(6) Three machines A, B and C produce, respectively, 40 percent, 10 percent and 50 percent of the items in a factory. The percentages of defective items produced by the machines are, respectively, 1 percent, 3 percent and 4 percent. An item from the factory is selected at random.
(a) Find the probability that the item is defective.
(b) If the item is defective, find the probability that it was produced by:
(i) Machine A
(ii) Machine B
(iii) Machine C
(7) Find the mean µ, variance σ², and standard deviation σ of each distribution.
(a)  x       2       3      8
     f(x)    19/30   1/5    1/6
(b)  x       2       3      8
     f(x)    1/4     1/2    1/4
(c)  x       1       2      3      4
     f(x)    0.4     0.1    0.2    0.3
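Exact fractions make a convenient check for several of the questions above. The Python sketch below enumerates the 36 equally likely outcomes for Question 4, applies Bayes' theorem to Question 6, and computes the moments for Question 7(c); the variable and function names are illustrative:

```python
import math
from fractions import Fraction
from itertools import product

# Question 4: all 36 equally likely outcomes of two fair dice
outcomes = list(product(range(1, 7), repeat=2))


def conditional(event, given):
    """P(event | given) by counting outcomes in the reduced sample space."""
    reduced = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in reduced if event(o)), len(reduced))


p_4a = conditional(lambda o: abs(o[0] - o[1]) == 2, lambda o: sum(o) == 7)
p_4b = conditional(lambda o: sum(o) == 8, lambda o: abs(o[0] - o[1]) == 3)

# Question 6: P(machine) and P(defective | machine)
share = {"A": Fraction(40, 100), "B": Fraction(10, 100), "C": Fraction(50, 100)}
defect_rate = {"A": Fraction(1, 100), "B": Fraction(3, 100), "C": Fraction(4, 100)}
p_defective = sum(share[m] * defect_rate[m] for m in share)  # total probability
posterior = {m: share[m] * defect_rate[m] / p_defective for m in share}  # Bayes' theorem


# Question 7: mean, variance and standard deviation of a discrete distribution
def moments(xs, ps):
    mean = sum(x * p for x, p in zip(xs, ps))
    variance = sum(x * x * p for x, p in zip(xs, ps)) - mean ** 2  # E[X^2] - mu^2
    return mean, variance, math.sqrt(variance)


probs_c = [Fraction(4, 10), Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)]
mu_c, var_c, sd_c = moments([1, 2, 3, 4], probs_c)

print(p_4a, p_4b, p_defective, posterior)
print(mu_c, var_c, round(sd_c, 4))
```

Both conditional probabilities in Question 4 come out as zero: a total of 7 is odd, so the difference of the two scores must also be odd, and likewise a difference of 3 forces an odd total.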
8. A player tosses 2 fair coins. The player wins K12,000 if 2 heads occur and K4,000 if 1 head occurs. For the game to be fair, how much should the player lose if no heads occur?
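A fair game is one whose expected winnings are zero. A small Python check of that condition, where the loss L on "no heads" satisfies P(2 heads) × 12,000 + P(1 head) × 4,000 − P(no heads) × L = 0:

```python
from fractions import Fraction

# Outcomes for two fair coins: HH, HT, TH, TT
p_two_heads = Fraction(1, 4)
p_one_head = Fraction(2, 4)
p_no_heads = Fraction(1, 4)

# Fair game: expected gain equals expected loss, so solve for the loss on "no heads"
loss = (p_two_heads * 12_000 + p_one_head * 4_000) / p_no_heads
print(f"The player should lose K{loss}")
```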
9. A fair coin is tossed 3 times. Find the probability that there will appear:
(a) 3 heads
(b) exactly 2 heads
(c) exactly 1 head
(d) no heads
10. The probability that an item produced by a factory is defective is 0.02. A shipment of 10,000 items is sent to a warehouse. Find the expected number of defective items and the standard deviation.
11. One-fifth of all accounts are found to contain errors. In a batch of eight accounts, find the probability that the number of accounts containing errors is:
(a) less than 2;
(b) more than 2.
Find the mean and standard deviation of the number of accounts containing errors.
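Questions 9 to 11 are all binomial. A Python sketch using `math.comb` for the probability function P(X = k) = C(n, k) p^k (1 − p)^(n − k), with mean np and standard deviation √(np(1 − p)):

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Question 9: number of heads in 3 tosses of a fair coin
q9 = {k: binom_pmf(k, 3, 0.5) for k in range(4)}  # k = 0, 1, 2, 3 heads

# Question 10: defectives in a shipment of 10,000 with p = 0.02
n10, p10 = 10_000, 0.02
mean10, sd10 = n10 * p10, sqrt(n10 * p10 * (1 - p10))

# Question 11: accounts with errors in a batch of 8, p = 1/5
n11, p11 = 8, 0.2
less_than_2 = sum(binom_pmf(k, n11, p11) for k in range(2))      # k = 0, 1
more_than_2 = 1 - sum(binom_pmf(k, n11, p11) for k in range(3))  # 1 - P(k <= 2)
mean11, sd11 = n11 * p11, sqrt(n11 * p11 * (1 - p11))

print(q9)
print(mean10, round(sd10, 2))
print(round(less_than_2, 4), round(more_than_2, 4), mean11, round(sd11, 4))
```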
12. An assembly line produces approximately 3% defective items. In a batch of 150 items, find the probability of obtaining:
(a) only two defective items;
(b) less than two defective items.
13. Serious accidents occur at random in a particular manufacturing industry at the rate of 2.6 per week. Find the probability of fewer than 3 accidents occurring during:
(a) a given week;
(b) a five-week period.
14. Given a normal distribution with mean 60 and standard deviation 10, find the areas under the normal curve:
(a) over 72
(b) under 60
(c) over 50
(d) between 60 and 75
(e) between 52 and 82
15. Hourly wage rates for unskilled workers in a particular nationwide industry are normally distributed with a mean of K10,000 and a standard deviation of K1,750.
(a) Find the probability that an employee selected at random will earn a basic rate of between K9,500 and K11,000 per hour.
(b) In a group of 500 unskilled employees, how many would you expect to earn more than K12,500 per hour?
(c) Approximately 20% earn less than the recommended minimum basic rate. What is this minimum rate?
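A Python sketch for Questions 12 to 15, using `statistics.NormalDist` for the normal areas. For Question 12 it shows both the exact binomial probabilities and the Poisson approximation with λ = np = 4.5; the question does not say which is intended, so treat that choice as an assumption:

```python
from math import comb, exp, factorial
from statistics import NormalDist


def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)


# Question 12: batch of 150 items, 3% defective
n12, p12 = 150, 0.03
lam12 = n12 * p12
exact_two, approx_two = binom_pmf(2, n12, p12), poisson_pmf(2, lam12)
exact_lt2 = sum(binom_pmf(k, n12, p12) for k in range(2))
approx_lt2 = sum(poisson_pmf(k, lam12) for k in range(2))

# Question 13: 2.6 accidents per week; "less than 3" means 0, 1 or 2
week = sum(poisson_pmf(k, 2.6) for k in range(3))
five_weeks = sum(poisson_pmf(k, 2.6 * 5) for k in range(3))

# Question 14: X ~ N(60, 10^2)
x = NormalDist(mu=60, sigma=10)
areas = {
    "(a) over 72": 1 - x.cdf(72),
    "(b) under 60": x.cdf(60),
    "(c) over 50": 1 - x.cdf(50),
    "(d) between 60 and 75": x.cdf(75) - x.cdf(60),
    "(e) between 52 and 82": x.cdf(82) - x.cdf(52),
}

# Question 15: hourly wages ~ N(10,000, 1,750^2)
wages = NormalDist(mu=10_000, sigma=1_750)
p_between = wages.cdf(11_000) - wages.cdf(9_500)
expected_over = 500 * (1 - wages.cdf(12_500))
minimum_rate = wages.inv_cdf(0.20)  # wage with 20% of rates below it

print(round(exact_two, 4), round(approx_two, 4), round(exact_lt2, 4), round(approx_lt2, 4))
print(round(week, 4), round(five_weeks, 6))
for label, area in areas.items():
    print(label, round(area, 4))
print(round(p_between, 4), round(expected_over, 1), round(minimum_rate))
```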
Sampling and sampling distribution
Estimation, Hypothesis testing
1. The values of orders received by a company are normally distributed with a mean of K17,000,000 and a standard deviation of K5,750,000. For a batch of 25 orders, find the probability that the average value is:
(a) in excess of K20m;
(b) below K15m;
(c) between K17.5m and K18.5m.
2. The average income tax allowance for employees in a company is K19.5m per annum, with a standard deviation of K3.75m.
(a) Find the probability that a group of 60 employees selected at random will have an average income tax allowance of:
(i) over K2m per annum; and
(ii) between K14.4m and K19m.
(b) Find the 95% confidence limits for the average tax allowance in such a group.
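Both questions use the sampling distribution of the mean: for samples of size n the sample mean is normal with mean µ and standard error σ/√n. A Python sketch (amounts in K millions); for Question 2(b) it computes the interval µ ± 1.96σ/√n, which is one reading of "confidence limits" for the group average:

```python
from math import sqrt
from statistics import NormalDist

# Question 1: order values ~ N(17, 5.75^2) in K millions; batches of 25
mean_1 = NormalDist(mu=17, sigma=5.75 / sqrt(25))  # distribution of the batch average
p_over_20 = 1 - mean_1.cdf(20)
p_below_15 = mean_1.cdf(15)
p_17_5_to_18_5 = mean_1.cdf(18.5) - mean_1.cdf(17.5)

# Question 2: allowances with mean 19.5 and s.d. 3.75 (K millions); groups of 60
se_2 = 3.75 / sqrt(60)
mean_2 = NormalDist(mu=19.5, sigma=se_2)
p_14_4_to_19 = mean_2.cdf(19) - mean_2.cdf(14.4)     # part (a)(ii)
z_975 = NormalDist().inv_cdf(0.975)                  # about 1.96
limits = (19.5 - z_975 * se_2, 19.5 + z_975 * se_2)  # part (b)

print(round(p_over_20, 4), round(p_below_15, 4), round(p_17_5_to_18_5, 4))
print(round(p_14_4_to_19, 4), tuple(round(v, 2) for v in limits))
```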
3.
In a random sample of 200 employees, 55% were found to be in favour of strike
action. Find the 95% confidence limits for the proportion of all employees in the
company who are in favour of such action.
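The usual large-sample interval for a proportion is p̂ ± 1.96√(p̂(1 − p̂)/n). A short Python check:

```python
from math import sqrt
from statistics import NormalDist

n, p_hat = 200, 0.55
standard_error = sqrt(p_hat * (1 - p_hat) / n)
z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96
lower, upper = p_hat - z * standard_error, p_hat + z * standard_error
print(f"95% confidence limits: {lower:.3f} to {upper:.3f}")
```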
4.
An assessment test is given to all prospective employees in a company. The test
scores are known to be normally distributed. A random sample of five
participants obtained the following results: 50, 65, 70, 72, 76
Test the assumption that the mean test score is 55 using the 5% significance level.
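With only five scores and σ unknown, a one-sample t-test with n − 1 = 4 degrees of freedom applies. The Python sketch below assumes a two-sided alternative (µ ≠ 55), for which the 5% critical value from t tables is 2.776. Under a one-sided alternative (µ > 55) the critical value would be 2.132 and the conclusion would change, so the choice of alternative matters here:

```python
import statistics
from math import sqrt

scores = [50, 65, 70, 72, 76]
mu_0 = 55
n = len(scores)

sample_mean = statistics.mean(scores)
sample_sd = statistics.stdev(scores)  # divisor n - 1
t_stat = (sample_mean - mu_0) / (sample_sd / sqrt(n))

T_CRITICAL = 2.776  # t(0.025, 4) from tables; assumes a two-sided test
reject = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.3f}; reject H0 at 5%: {reject}")
```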
5. A random sample of twelve blue-collar employees in a large manufacturing plant found the following figures for the number of hours of overtime worked in the last month:
23, 15, 30, 15, 20, 35, 24, 12, 40, 30, 27, 32
Use an unbiased estimation procedure to find an estimate of each of the following:
(a) the population mean;
(b) the population variance;
(c) the variance of the sample mean.
6. Let $x_1$, $x_2$ and $x_3$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Consider the following two point estimators of $\mu$:
$$\hat{\mu}^{(1)} = \frac{x_1 + 2x_2 + 3x_3}{6}, \qquad \hat{\mu}^{(2)} = \frac{x_1 + 4x_2 + x_3}{6}$$
(a) Show that both estimators are unbiased.
(b) Which estimator is more efficient?
(c) Find the relative efficiency.
(d) Find an unbiased estimator of the population mean that is more efficient than either of these estimators.
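A Python sketch for Questions 5 and 6. For Question 5, `statistics.variance` divides by n − 1, which is what makes it an unbiased estimator of σ², and the variance of the sample mean is estimated by s²/n. For Question 6, with independent observations an estimator Σcᵢxᵢ has mean µΣcᵢ and variance σ²Σcᵢ², so it is enough to work with the coefficients; relative efficiency is taken here as Var(µ̂⁽²⁾)/Var(µ̂⁽¹⁾), one of the two conventions in use:

```python
import statistics
from fractions import Fraction

# Question 5: overtime hours for the 12 sampled employees
hours = [23, 15, 30, 15, 20, 35, 24, 12, 40, 30, 27, 32]
mean_estimate = statistics.mean(hours)
variance_estimate = statistics.variance(hours)         # divisor n - 1: unbiased for sigma^2
var_of_mean_estimate = variance_estimate / len(hours)  # estimate of sigma^2 / n

# Question 6: estimators written as coefficient lists on (x1, x2, x3)
estimator_1 = [Fraction(1, 6), Fraction(2, 6), Fraction(3, 6)]
estimator_2 = [Fraction(1, 6), Fraction(4, 6), Fraction(1, 6)]
sample_mean = [Fraction(1, 3)] * 3


def is_unbiased(coeffs):
    """E[sum c_i x_i] = mu * sum c_i, so unbiased when the coefficients sum to 1."""
    return sum(coeffs) == 1


def variance_factor(coeffs):
    """Var(sum c_i x_i) / sigma^2 for independent x_i."""
    return sum(c * c for c in coeffs)


relative_efficiency = variance_factor(estimator_2) / variance_factor(estimator_1)
print(mean_estimate, round(variance_estimate, 4), round(var_of_mean_estimate, 4))
print(variance_factor(estimator_1), variance_factor(estimator_2), relative_efficiency)
```

The sample mean (x₁ + x₂ + x₃)/3 has variance σ²/3, smaller than both 7σ²/18 and σ²/2, which gives one answer to part (d).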
7. Consider a normal population with a standard deviation of 20. A random sample of 16 items is found to have a mean of 112. Test the assumption, at the 5% significance level, that the population has a mean of 100.
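With σ known, the test statistic is z = (x̄ − µ₀)/(σ/√n). A Python sketch, assuming a two-sided alternative (µ ≠ 100):

```python
from math import sqrt
from statistics import NormalDist

sigma, n, sample_mean, mu_0, alpha = 20, 16, 112, 100, 0.05

z = (sample_mean - mu_0) / (sigma / sqrt(n))
z_critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = abs(z) > z_critical
print(f"z = {z:.2f}, critical value = {z_critical:.2f}, p-value = {p_value:.4f}, reject H0: {reject}")
```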
8. Define α and β for a statistical test of hypotheses.
9. A vice president in charge of sales for a large corporation claims that salesmen are averaging no more than 15 sales contacts per week. (She would like to increase this figure.) As a check on her claim, n = 36 salesmen are selected at random, and the number of contacts is recorded for a single randomly selected week. The sample reveals a mean of 17 contacts and a variance of 9. Does the evidence contradict the vice president's claim? Use α = 0.05.
10. A psychological study was conducted to compare the reaction times of men and women to a certain stimulus. Independent random samples of 50 men and 50 women were employed in the experiment. The results are shown in Table 1.0. Do the data present sufficient evidence to suggest a difference between the mean reaction times for men and women? Use α = 0.05.
Table 1.0
Men                   Women
n1 = 50               n2 = 50
ȳ1 = 42 seconds       ȳ2 = 38 seconds
S1 = 18               S2 = 14
11. Suppose that the vice president of question 9 wants to be able to detect a difference equal to one call in the mean number of customer calls per week. That is, she is interested in testing H0: µ = 16. Using the data given in question 9, compute the p-value for this test.
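Questions 9 and 11 share the same data. With n = 36, the large-sample z statistic uses s = √9 = 3 in place of σ. A Python sketch, assuming the alternative in Question 9 is µ > 15 (the claim is "no more than 15"); because Question 11 does not say whether its p-value is one- or two-tailed, both are shown:

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sample_variance = 36, 17, 9
standard_error = sqrt(sample_variance) / sqrt(n)  # 3 / 6 = 0.5

# Question 9: H0: mu = 15 against H1: mu > 15 at alpha = 0.05
z_9 = (sample_mean - 15) / standard_error
z_critical = NormalDist().inv_cdf(0.95)  # about 1.645 (upper tail)
reject_9 = z_9 > z_critical

# Question 11: H0: mu = 16
z_11 = (sample_mean - 16) / standard_error
p_one_tailed = 1 - NormalDist().cdf(z_11)
p_two_tailed = 2 * p_one_tailed

print(z_9, reject_9, z_11, round(p_one_tailed, 4), round(p_two_tailed, 4))
```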
Analysis of variance
Time series
Index numbers
1. Four groups of students were subjected to different teaching techniques and tested at the end of a specified period of time. As a result of dropouts from the experimental groups (due to sickness, transfer and so on), the number of students varied from group to group. Do the data shown in Table 1.2 present sufficient evidence to indicate a difference in the mean achievement for the four teaching techniques?
Table 1.2: Data for question 1
Technique 1    Technique 2    Technique 3    Technique 4
60             62             58             95
80             50             54             80
82             65             80             85
84             85             62             72
90             63             72
               45
2. In an investigation of college major and initial salary at first job, a random sample of three students from each of business, engineering, history, journalism and pharmacy was taken. The initial monthly salaries (in millions of Kwacha) for those in the sample were:
Business    Engineering    History    Journalism    Pharmacy
3.6         4.0            3.2        3.6           3.6
4.2         4.8            2.8        3.6           4.0
4.2         4.4            2.4        4.8           4.8
Test the hypothesis that the average initial salary is the same for all majors and report your results. Use α = 0.05.
3. Articles manufactured by a company are produced by three operators using three different machines. The manufacturer wishes to determine whether there is a difference (a) between the operators and (b) between the machines. An experiment is performed to determine the number of articles per day produced by each operator using each machine; the results are shown in Table 1.3. Provide the desired information, using a significance level of 0.05.
Table 1.3
Operator    Machine A    Machine B    Machine C
1           25           36           30
2           29           32           27
3           26           30           29
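With one observation per operator-machine combination, this is a two-way analysis of variance without replication: the total sum of squares splits into operators, machines and a residual. A Python sketch, taking the rows of Table 1.3 as operators and the columns as machines; F(2, 4) = 6.94 at the 5% level is taken from F tables:

```python
# Rows: operators 1-3; columns: machines A, B, C (articles per day)
data = [
    [25, 36, 30],
    [29, 32, 27],
    [26, 30, 29],
]
r, c = len(data), len(data[0])
grand_total = sum(sum(row) for row in data)
correction = grand_total ** 2 / (r * c)  # correction factor T^2 / N

ss_total = sum(x * x for row in data for x in row) - correction
ss_operators = sum(sum(row) ** 2 for row in data) / c - correction
ss_machines = sum(sum(row[j] for row in data) ** 2 for j in range(c)) / r - correction
ss_error = ss_total - ss_operators - ss_machines

df_operators, df_machines = r - 1, c - 1
df_error = df_operators * df_machines
ms_error = ss_error / df_error
f_operators = (ss_operators / df_operators) / ms_error
f_machines = (ss_machines / df_machines) / ms_error

F_CRITICAL = 6.94  # F(0.05; 2, 4) from tables
print(f"operators: F = {f_operators:.3f}, significant: {f_operators > F_CRITICAL}")
print(f"machines:  F = {f_machines:.3f}, significant: {f_machines > F_CRITICAL}")
```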
4. The manager of a cycle and small motorcycle business, examining his sales of mopeds over the past three years, finds the data to be as follows:
No. of machines sold
Year    1st Quarter    2nd Quarter    3rd Quarter    4th Quarter
1997    10             12             11             8
1998    17             11             20             10
1999    22             15             27             17
2000    24                            -              -
(a) Use the data to make a reasonable estimate of the number of machines which should be ordered for the first half of 2002.
(b) What factors do you think are likely to affect the trend and seasonal variation in sales of mopeds?
5. The sales (x K1 000 000) of golf equipment by a department store are shown for each period of three months as follows:
Quarter    1999    2000    2001
First      9       22      43
Second     31      52      65
Third      61      82      95
Fourth     21      42      -
(a) Using the additive model, find the centered moving average trend.
(b) Find the average seasonal variation for each quarter.
(c) Predict sales for the last quarter of 2001 and the first quarter of 2002, stating any assumptions.
6. By using exponential smoothing with α = 0.025, smooth the weekly turnover figures for a company tabulated below:
Week                   1   2   3   4   5   6   7   8   9   10  11  12  13  14
Turnover (x K1 000)    27  33  30  24  30  35  40  30  38  45  40  38  44  52
Illustrate the weekly turnover together with the smoothed values, and comment on whether these smoothed values give a reliable indication of the trend.
7. For the following data, calculate:
(i) a Laspeyres price index for 2002;
(ii) a Paasche quantity index for 2002,
in each case using 2000 as the base year.
            2000                           2001
Commodity   Average price (K)   Quantity   Average price (K)   Quantity
A           K18,250             155        K18,750             195
B           K39,100             273        K46,000             300
C           K7,000              114        K9,000              190
D           K14,750             430        K22,700             360
E           K74,200             90         K101,800            130
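Returning to Questions 5 and 6 of this set: a Python sketch of the centered four-quarter moving average with additive seasonal variation (Question 5) and of simple exponential smoothing (Question 6). The smoothing assumes the first smoothed value equals the first observation, a common but not universal starting rule, and takes the last four turnover figures as weeks 11 to 14:

```python
# Question 5: golf equipment sales (x K1 000 000), 1999 Q1 to 2001 Q3
sales = [9, 31, 61, 21, 22, 52, 82, 42, 43, 65, 95]


def centered_moving_average(xs, period=4):
    """Average adjacent 4-quarter moving totals so each value centers on a quarter."""
    totals = [sum(xs[i:i + period]) for i in range(len(xs) - period + 1)]
    return [(a + b) / (2 * period) for a, b in zip(totals, totals[1:])]


trend = centered_moving_average(sales)
offset = 2  # the first centered value belongs to the 3rd observation (1999 Q3)

# Additive model: seasonal deviation = actual - trend, grouped by quarter
deviations = {q: [] for q in range(4)}
for i, t in enumerate(trend):
    deviations[(i + offset) % 4].append(sales[i + offset] - t)
averages = [sum(d) / len(d) for _, d in sorted(deviations.items())]
adjustment = sum(averages) / 4  # force the seasonal effects to sum to zero
seasonal = [a - adjustment for a in averages]


# Question 6: simple exponential smoothing, S_t = alpha * x_t + (1 - alpha) * S_(t-1)
def exponential_smoothing(xs, alpha):
    smoothed = [xs[0]]  # assumed starting value
    for x in xs[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


turnover = [27, 33, 30, 24, 30, 35, 40, 30, 38, 45, 40, 38, 44, 52]  # weeks 1-14
smoothed = exponential_smoothing(turnover, 0.025)

print("trend:", trend)
print("seasonal variation Q1-Q4:", seasonal)
print("smoothed:", [round(s, 2) for s in smoothed])
```

With α = 0.025 each new week moves the smoothed value only 2.5% of the way towards the latest observation, so the smoothed series lags well behind the turnover figures.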
8. Splice the old and new indexes so that 1999 = 100.
Year    Old index    New index
1999    231          -
2000    215          100
2001    -            108
9. In 2000, a particular index series was reset to 100; the following table shows the values of the old index and the new index:
Year    Old index    New index
1996    100          -
1997    95           -
1998    101          -
1999    110          -
2000    115          100
2001    -            110
(a) What is the value of the old index in 2001?
(b) What would have been the value of the new index in 1997?
10. The CPI for wage and clerical workers (base year 2000 = 100) is 205.3. What was the purchasing power of a Kwacha in terms of 2000 prices?
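Splicing and rebasing are ratio calculations: in the overlap year the two series describe the same price level, so one series is scaled by the ratio of their values in that year. A Python sketch for Questions 8 to 10 (the variable names are illustrative):

```python
# Question 8: old index (1999, 2000) and new index (2000 = 100, 2001); rebase to 1999 = 100
old_8 = {1999: 231, 2000: 215}
new_8 = {2000: 100, 2001: 108}
link = old_8[2000] / new_8[2000]                 # old-index points per new-index point in 2000
old_scale = {**old_8, 2001: new_8[2001] * link}  # whole series on the old index scale
spliced = {year: 100 * value / old_scale[1999] for year, value in old_scale.items()}

# Question 9: the series was reset to 100 in 2000, when the old index stood at 115
old_9 = {1996: 100, 1997: 95, 1998: 101, 1999: 110, 2000: 115}
new_9 = {2000: 100, 2001: 110}
old_index_2001 = new_9[2001] * old_9[2000] / new_9[2000]  # part (a)
new_index_1997 = old_9[1997] * new_9[2000] / old_9[2000]  # part (b)

# Question 10: purchasing power of K1 measured in base-year (2000) prices
cpi = 205.3
purchasing_power = 100 / cpi

print({year: round(v, 2) for year, v in spliced.items()})
print(old_index_2001, round(new_index_1997, 2), round(purchasing_power, 3))
```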