Uploaded by Jacob Ayubu Churi

Introduction to Statistics

VOLUME: II
CHAPTER 9: WHAT IS STATISTICS?
9.1 Introduction
In ordinary English, the term statistics often refers to an aggregate of items with numerical quantification. For example, one might speak of "trade statistics", meaning a mass of import and export figures. Although this conception is not far from the truth, it does not give the real meaning of the term statistics as far as the scope of its study is concerned.
Statistics is the science of collecting, summarising, analysing and presenting data. In general, statistics is a field of study concerned with the mathematical characterisation of a given aggregate of items.
Statistics as a science is essentially a branch of applied mathematics, just as mechanics and calculus are other branches of mathematics. Statistical applications find their way into almost every field of study, such as business, economics, sociology, agriculture, education and other related fields. Statistics applied to the biological field is termed biostatistics.
The subject matter of statistics can roughly be divided into two categories:
1. Descriptive Statistics
2. Inferential Statistics
Descriptive statistics is primarily concerned with the organisation and presentation of collected data. Inferential statistics, on the other hand, deals with decision making and judgements based on the collected data, using more advanced mathematical methods. Between the two categories lies mathematical (or theoretical) statistics, which can be thought of as the engine driving the study of both fields.
9.2 Data Collection
In any scientific research, data is an essential part of the exercise. Scientific inquiries and findings normally start by observing some facts (collected data or information) from which certain conclusions may be drawn. After the information has been collected, it is processed and analysed. Data that has just been collected and not yet organised in any form is referred to as raw data. Collected data that has been altered in one way or another is called organised data (information).
Data can first be classified into two groups: quantitative and qualitative data. Data on, say, height, students' scores or the number of plants are examples of quantitative data. Data on attributes such as beauty or goodness are examples of qualitative data. Statistics deals with both types.
Furthermore, quantitative data can be divided into two categories by virtue of the values they take. Data that take only exact (separate) values, such as 5, 6 and 8, are known as discrete data. An example is the number of tomatoes on each plant in a greenhouse. Some data cannot take exact values and can only be given within a certain range, or measured to a certain degree of accuracy. These are called continuous data. An example is the heights or weights of school children.
9.2.1 Population
The whole purpose of collecting data is to obtain information on the items of interest in a particular study. The set of all items under discussion in a particular survey or experiment is called the population of the study. For example, consider a survey on the performance of all horticultural produce in Morogoro. All the items belonging to the set of horticultural produce may be considered the population under study.
9.2.2 Sample
A sample is a set of a few items taken from a population, i.e. a subset of the population. Consider, say, a set of 10 scores by 10 students out of a class of 100 students. In this case, the scores of the ten students can be thought of as a sample of students' marks from that class.
9.2.3 Why should one take a sample?
It is in the interest of every surveyor to have as much information as possible on every item of a population. In practice, however, this is often not feasible, and a sample is taken instead, for the following reasons:
(i) It is costly to study the whole population.
(ii) A sample shortens the time needed to carry out a survey.
(iii) The information in a sample can be analysed with greater accuracy.
9.3 Types of Sampling Methods
A sample needs to be representative of the population; this is the chief characteristic of a good sample. The type of sample depends on the nature of the population and on the interest of the surveyor/researcher. However, based on the sampling method, samples can generally be put into two groups: random samples and non-random samples. A sample selected in such a way that every member of the population has an equal chance of being selected is called a random sample. Some types of random samples include:
9.3.1 Simple Random Sampling
When items are selected at random, each member of the population has an equal chance of being selected. Each member of the sampling frame is allocated a number. The sample is then selected using random numbers, obtained either from random number tables or generated by a computer or calculator. This can be done with or without replacement. Suppose we draw numbers from a hat: we can either replace each number in the hat before drawing again, or not replace it.
Sampling in which each item may be chosen more than once is called sampling with replacement.
Sampling in which each item may be chosen at most once is called sampling without replacement.
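The draw-from-a-hat procedure can be mimicked with Python's standard `random` module. `random.sample` draws without replacement, and `random.choices` draws with replacement. This is a minimal sketch, and it rests on a few assumptions. The frame of 100 numbered students and the function name `simple_random_sample` are illustrative only. A seeded generator stands in for random number tables.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n items from a numbered sampling frame.

    replace=False: sampling without replacement (no item can be drawn twice).
    replace=True:  sampling with replacement (an item may be drawn again).
    """
    rng = random.Random(seed)  # seeded generator in place of random number tables
    if replace:
        return rng.choices(frame, k=n)
    return rng.sample(frame, n)

# Hypothetical sampling frame: 100 students numbered 1 to 100
frame = list(range(1, 101))
without = simple_random_sample(frame, 10, seed=1)
with_repl = simple_random_sample(frame, 10, replace=True, seed=1)
print("without replacement:", without)
print("with replacement:   ", with_repl)
```

Fixing the seed makes a draw reproducible, which is convenient when a sample must be documented or re-checked.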
9.3.2 Systematic Sampling
Random sampling from a very large population is cumbersome. An alternative is to list the population elements in some order and then choose every kth member from the list, after obtaining a random starting point.
Example
Describe how to choose a systematic sample of 30 from a population of 100 items.
Sol: k = 100/30 ≈ 3.3, so each time we select an item we move about 3.3 places along the list. A random start between 1 and 3 inclusive is chosen; let this be 2. So we select the 2nd item, then
2 + 3.3 = 5.3 → 5th item
5.3 + 3.3 = 8.6 → 9th item
and so on, rounding each position to the nearest whole number, until 30 items have been selected.
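The position arithmetic in the example can be sketched as follows. The sketch assumes the exact step k = N/n (3.33... here, rather than the rounded 3.3). Each position is rounded to the nearest whole place. The function name `systematic_sample_positions` is hypothetical.

```python
import random

def systematic_sample_positions(N, n, start=None, seed=None):
    """1-based list positions for a systematic sample of n items from N (n <= N).

    The step k = N/n may be fractional; each position start + i*k is rounded
    to the nearest whole place (halves rounded up), using integer arithmetic
    to avoid floating-point surprises.
    """
    if start is None:
        # random start between 1 and the whole part of k, inclusive
        start = random.Random(seed).randint(1, N // n)
    return [start + (2 * i * N + n) // (2 * n) for i in range(n)]

positions = systematic_sample_positions(100, 30, start=2)
print(positions[:3])   # [2, 5, 9]
print(positions[-1])   # 99
```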
9.3.3 Stratified Sampling
Stratified sampling is used when the population consists of different strata (layers). Separate random samples are taken from each stratum and combined to form the sample. The allocation of sample units to the different strata is usually approached in one of two ways: proportional allocation or Neyman allocation. In the first method, units are allocated to the strata according to each stratum's share of the total population. For example, suppose a sample of 100 students is to be taken from a college with 500 1st year, 300 2nd year and 200 3rd year students. The numbers taken from the strata will be (5/10) × 100 = 50, (3/10) × 100 = 30 and (2/10) × 100 = 20 respectively. In the second method, considerations of cost and accuracy, rather than proportionality alone, determine the allocation.
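Proportional allocation is simple enough to sketch in a few lines. This example covers only the first method; the function name `proportional_allocation` is an assumption. The college sizes are those from the example above.

```python
def proportional_allocation(strata_sizes, n):
    """Share a total sample of n among strata in proportion to stratum size.

    In general the results may be fractional; they would then need rounding
    so that the allocations still add up to n.
    """
    total = sum(strata_sizes.values())
    return {name: n * size / total for name, size in strata_sizes.items()}

college = {"1st year": 500, "2nd year": 300, "3rd year": 200}
alloc = proportional_allocation(college, 100)
print(alloc)   # {'1st year': 50.0, '2nd year': 30.0, '3rd year': 20.0}
```

Within each stratum, the allocated number of units would then be drawn by simple random sampling.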
9.4 Methods of Data Capture
After you have determined the population elements and the type of sample to take, the next important step is to decide how you will capture the data (information).
9.4.1 Interview
If the information sought is to be obtained from persons, oral interviews can be used. One of the chief advantages of this method is that the surveyor can scrutinise the information given and question the respondent further. Hence, under normal circumstances, information collected through this method may generally be regarded as correct. However, the method is costly in terms of time and money, as it requires physical contact between the interviewer and the respondent.
9.4.2 Mailing
Sometimes it is impossible to visit every person, and questionnaires can be sent by mail instead. The chief disadvantages of this method are a low response rate and a high risk of less accurate information being given.
9.4.3 Measurements/Experiments
Sometimes the information sought cannot be obtained from an individual person and must be measured by the surveyor himself. This method is particularly important to researchers in the natural sciences. For example, suppose one is interested in the effect of a certain fertilizer on the yield of orange trees. This information cannot easily be obtained by interviewing illiterate orange growers in rural areas; instead, measurements should be taken. The method tends to give the most accurate information if carried out correctly. Its great disadvantage is the cost in terms of money, trained personnel and time. In statistics, measurements are often involved in a special branch of study called "Design of Experiments".
9.4.4 Available literatures/Secondary data sources
The information sought might already be available in documents, so one may simply dig it out of the literature instead of capturing data afresh. This method saves both time and financial resources. However, it has the great disadvantage that the information collected may not exactly fit the researcher's study area, as it may be outdated or incomplete.
Exercise 9
1. You are a surveyor at Sokoine University of Agriculture investigating the reasons for students' mass failure. With concrete reasons, suggest an appropriate sampling method.
2. Discuss the merits of the measurement method against the interview method in data capture.
3. With an example, explain the role of mathematics in statistics.
4. With typical examples, explain the relevance of statistics in our daily life.
5. Judges intend to find the most beautiful girl in a beauty contest. Explain how the exercise may be carried out. What kind of data is going to be used in this exercise?
6. Suppose you want to compare the efficiency of nitrogen fertilizer on maize yields in rural areas of Tanzania. Explain how you are going to carry out the exercise. What kinds of sampling methods will you employ, and how is the data going to be captured?
CHAPTER 10: BASIC MEASURES OF CENTRAL TENDENCY AND VARIABILITY
10.1 Introduction
The need for an average in statistics comes from the very meaning of the term statistics. After data has been collected, we need to summarise the information so that one can easily see what the data depict. Therefore, as a matter of necessity, we need a single value to represent the entire picture, and this is what we refer to as an "average". But what should an average be like? What is its chief characteristic? The answers to these questions should serve as the cornerstone of all the properties constituting an average. Obviously, an average in any field of study must be as representative as possible of the entire set of observations. Think of a situation where a student's scores in three subjects are 54, 60 and 78. The question then arises: what is the average performance of the student? By common practice, the average performance would be (54 + 60 + 78)/3 = 64. But why 64? This is a very important question for understanding a statistical average. The average is 64 because no other value is closer to all the observations taken together. Closeness here is measured by the sum of the squared deviations of the observations, which is smaller from 64 than from any other value. This kind of average is known in statistics as the "arithmetic mean".
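The phrase "closest to all the observations" can be checked by brute force. The sketch below scans whole-number candidates only, which is a simplifying assumption. It shows that 64 minimises the sum of squared deviations. The sum of absolute deviations, by contrast, is minimised by the middle score, 60. Which value counts as "closest" therefore depends on how closeness is measured.

```python
scores = [54, 60, 78]

def sum_sq_dev(c):
    """Sum of squared deviations of the scores from a candidate value c."""
    return sum((x - c) ** 2 for x in scores)

def sum_abs_dev(c):
    """Sum of absolute deviations of the scores from a candidate value c."""
    return sum(abs(x - c) for x in scores)

candidates = range(0, 101)  # whole-number candidates only
print(min(candidates, key=sum_sq_dev))   # 64, the arithmetic mean
print(min(candidates, key=sum_abs_dev))  # 60, the middle score
```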
When first introduced to the notion of an average, most students tend to think that an average always means an arithmetic mean; this is quite wrong! The nature and form of an average depend very much on the nature and form of the data concerned. Think of a beauty contest involving 5 girls, in which a group of judges ranks the girls in terms of beauty in the following order:
1. Elizabeth
2. Asha
3. Nyanzobe
4. Nyange
5. Tabitha
Someone who never attended the contest, but who happens to know Nyanzobe, then asks you a question. How beautiful were the girls compared with those in other beauty contests? In effect, this question asks for the average beauty of the five girls as perceived by the judges. In response, a knowledgeable statistician would say that the girls were more or less like Nyanzobe in terms of beauty, because she is in the middle of the judges' ranking. So, even though we do not have quantitative data, we have been able to establish a statistical average based on position. This kind of average is known in statistics as the median. It is applicable to both quantitative and (ranked) qualitative data.
From this discussion, it seems appropriate to establish the qualities of a good statistical average. We bear in mind that an average is basically a representative of the entire set of attributes of a population. In general, an average in statistics should possess at least the following characteristics:
1. It should be based on all the observations made.
2. It should be rigidly defined and not left to the mere estimation of the observer.
3. It should possess some simple and obvious properties and not be a mere mathematical abstraction.
4. It should be calculable with reasonable ease and rapidity.
5. It should be as stable as possible.
6. It should lend itself readily to algebraic treatment.
Below are the commonly known statistical averages in everyday use.
10.2 Commonly Known Measures of Central Tendency
10.2.1 The Arithmetic Mean
The arithmetic mean $\bar{x}$ of a set of variables $x_1, x_2, x_3, \ldots, x_N$ is defined by the formula
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Example
Find the mean yield of five plots of maize if the yields per plot were 2, 4, 5, 1.5 and 2.5 sacks of maize.
Sol:
$$\bar{x} = \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{2 + 4 + 5 + 1.5 + 2.5}{5} = \frac{15}{5} = 3 \text{ sacks}$$
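The same calculation can be done in Python. It is written once directly from the formula and once with the standard library's `statistics.mean`, as a quick cross-check:

```python
from statistics import mean

yields = [2, 4, 5, 1.5, 2.5]          # sacks of maize per plot
x_bar = sum(yields) / len(yields)     # (1/N) * sum of x_i
print(x_bar)          # 3.0
print(mean(yields))   # 3.0
```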
10.2.1.1 The Mathematical Properties of an Arithmetic Mean
1. The sum of the deviations of a set of variables from their arithmetic mean is zero.
Consider the following observations on the number of patients who died of intestinal cancer over a 5-year period.
Table 10.1
| Year | Deaths |
|------|--------|
| 1 | 22 |
| 2 | 26 |
| 3 | 40 |
| 4 | 42 |
| 5 | 30 |
The arithmetic mean is
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{22 + 26 + 40 + 42 + 30}{5} = \frac{160}{5} = 32$$
Table 10.2
| $x_i$ | $x_i - \bar{x}$ |
|-------|-----------------|
| 22 | -10 |
| 26 | -6 |
| 40 | 8 |
| 42 | 10 |
| 30 | -2 |
| Total | $\sum_{i=1}^{N}(x_i - \bar{x}) = 0$ |
From Table 10.2 above, we can clearly see that the sum of the deviations from the mean is zero.
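A quick numerical check of this property on the data of Table 10.1 is shown below. Here the mean is a whole number, so the sum comes out exactly 0.0. With a non-integer mean, floating-point rounding may leave a tiny non-zero remainder instead.

```python
deaths = [22, 26, 40, 42, 30]
x_bar = sum(deaths) / len(deaths)           # 32.0
deviations = [x - x_bar for x in deaths]    # x_i - x_bar
print(deviations)        # [-10.0, -6.0, 8.0, 10.0, -2.0]
print(sum(deviations))   # 0.0
```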
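Exercise 10.2.1(a)
Show algebraically that
$$\frac{1}{N}\sum_{i=1}^{N} d_i = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x}) = 0, \quad \text{where } d_i = x_i - \bar{x}.$$
2. If $d_i = x_i - C$, where $C$ is a constant, then $\bar{x} = C + \bar{d}$.
Consider the previous example and let $C = 30$.
Table 10.3
| $x_i$ | $d_i = x_i - C$ |
|-------|-----------------|
| 22 | -8 |
| 26 | -4 |
| 40 | 10 |
| 42 | 12 |
| 30 | 0 |
| Total | $\sum d_i = 10$ |
Therefore $\bar{d} = 10/5 = 2$ and
$$\bar{x} = C + \bar{d} = 30 + 2 = 32.$$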
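The working-origin (coding) method can be sketched as a small function. Any constant C gives the same mean, which is what property 2 states. The function name `coded_mean` is an assumption made for illustration.

```python
def coded_mean(data, C):
    """Mean via a working origin C: x_bar = C + mean of d_i, where d_i = x_i - C."""
    d = [x - C for x in data]
    return C + sum(d) / len(d)

deaths = [22, 26, 40, 42, 30]
print(coded_mean(deaths, 30))   # 32.0  (d_i = -8, -4, 10, 12, 0; d_bar = 2)
print(coded_mean(deaths, 0))    # 32.0  (C = 0 is just the ordinary mean)
```

In hand calculation, choosing C near the middle of the data keeps the d_i small, which is the practical point of the method.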
Exercise 10.2.1(b)
If $d_i = x_i - C$, where $C$ is a constant, show algebraically that $\bar{x} = C + \bar{d}$.
10.2.2 The Weighted Arithmetic Mean
The simple arithmetic mean assumes that all the observations have the same weight, i.e. the same contribution to the total. In many situations this is not the case. Suppose a student receives grades 70, 60, 75, 84 and 65 in courses carrying 1, 3, 2, 2 and 4 credit hours respectively. His average grade cannot simply be taken as (70 + 60 + 75 + 84 + 65)/5. The credit hours should be taken into consideration, and the average is found as follows:
$$\bar{x} = \frac{(70 \times 1) + (60 \times 3) + (75 \times 2) + (84 \times 2) + (65 \times 4)}{1 + 3 + 2 + 2 + 4} = \frac{828}{12} = 69$$
Therefore the weighted arithmetic mean may be defined as
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
where $w_i$ is the weight assigned to $x_i$.
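A direct translation of the weighted-mean formula is given below, using the grades and credit hours from the example. The function name `weighted_mean` is illustrative. The unweighted mean is printed alongside to show how much the weights matter.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

grades = [70, 60, 75, 84, 65]
credit_hours = [1, 3, 2, 2, 4]
print(weighted_mean(grades, credit_hours))   # 69.0
print(sum(grades) / len(grades))             # 70.8 (unweighted, for comparison)
```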
10.2.3 The Geometric Mean
For some kinds of data, the values occur in a geometric progression. It has been found that the proper average for such data is the geometric mean, given by
$$G.M = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}.$$
For example, the geometric mean of 2, 4 and 8 is $\sqrt[3]{2 \times 4 \times 8} = \sqrt[3]{64} = 4$.
10.2.4 The Harmonic Mean
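The harmonic mean (H) of a set of observations is defined as the reciprocal of the arithmetic mean of the reciprocals of the observations. Given a set of variables $x_1, x_2, x_3, \ldots, x_n$, the harmonic mean H is given by
$$\frac{1}{H} = \frac{1}{n}\left(\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}\right), \quad \text{i.e.} \quad H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}.$$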
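The definition translates directly into code: take the mean of the reciprocals, then take its reciprocal. The sketch uses the illustrative values 2, 4 and 8, an assumption chosen so as not to pre-empt Exercise 11. It cross-checks the result with `statistics.harmonic_mean`.

```python
from statistics import harmonic_mean

def harmonic(values):
    """Reciprocal of the arithmetic mean of the reciprocals: n / sum(1/x_i)."""
    return len(values) / sum(1 / x for x in values)

values = [2, 4, 8]                       # illustrative values
print(round(harmonic(values), 4))        # 3.4286 (exactly 24/7)
print(round(harmonic_mean(values), 4))   # 3.4286
```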
10.2.5 The Median
The median is the middle value of a set of observations arranged in order of magnitude. Given $x_1, x_2, \ldots, x_N$ such that $x_1 \le x_2 \le x_3 \le \cdots \le x_N$, the median of these observations is
$$M = x_{(N+1)/2} \text{ if } N \text{ is odd}, \qquad M = \tfrac{1}{2}\left(x_{N/2} + x_{(N+2)/2}\right) \text{ if } N \text{ is even}.$$
Example: Find the median of each of the following sets of observations.
(a) 30, 35, 7, 6, 20, 40, 15
(b) 13, 7, 8, 11, 5, 4
Sol:
(a) First arrange the data in ascending order: 6, 7, 15, 20, 30, 35, 40. Since N = 7 is odd, $M = x_{(N+1)/2} = x_{(7+1)/2} = x_4 = 20$.
(b) Again arrange the data in order: 4, 5, 7, 8, 11, 13. Since N = 6 is even,
$$M = \tfrac{1}{2}\left(x_{N/2} + x_{(N+2)/2}\right) = \tfrac{1}{2}\left(x_{6/2} + x_{(6+2)/2}\right) = \tfrac{1}{2}(x_3 + x_4) = \tfrac{1}{2}(7 + 8) = 7.5.$$
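The positional rule can be written out directly. Python lists are 0-based, so position $x_j$ becomes index `j - 1`. The helper name `median_by_position` is hypothetical. `statistics.median` applies the same odd/even rule and serves as a cross-check.

```python
from statistics import median

def median_by_position(data):
    """Median via the odd/even positional rule on the sorted data."""
    xs = sorted(data)
    N = len(xs)
    if N % 2 == 1:
        return xs[(N + 1) // 2 - 1]             # x_{(N+1)/2}
    return (xs[N // 2 - 1] + xs[N // 2]) / 2    # (x_{N/2} + x_{(N+2)/2}) / 2

a = [30, 35, 7, 6, 20, 40, 15]
b = [13, 7, 8, 11, 5, 4]
print(median_by_position(a), median_by_position(b))   # 20 7.5
print(median(a), median(b))                           # 20 7.5
```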
10.2.6 Mode
In a given set of observations, the value that occurs more frequently than any other is termed the mode of the distribution.
Example
Find the mode in the following records of a peasant's sales of horticultural produce over a 7-day period.
Table 10.4
| Day | Sales (Tshs) |
|-----|--------------|
| 1 | 1000 |
| 2 | 500 |
| 3 | 700 |
| 4 | 500 |
| 5 | 800 |
| 6 | 900 |
| 7 | 500 |
Sol: 500 occurs three times, while each of the other values occurs only once. So the mode is 500 Tshs.
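Counting occurrences is all that is needed to find the mode. The sketch uses `collections.Counter` together with `statistics.mode` and `statistics.multimode` (Python 3.8+). `multimode` returns every value tied for the highest frequency, which matters when a distribution has more than one mode.

```python
from collections import Counter
from statistics import mode, multimode

sales = [1000, 500, 700, 500, 800, 900, 500]   # Tshs, days 1 to 7
counts = Counter(sales)
print(counts.most_common(1))          # [(500, 3)]
print(mode(sales), multimode(sales))  # 500 [500]
```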
10.3 Measures of Dispersion/Variability
10.3.1 Introduction
The measures of dispersion came into existence as a necessary complement to the study of measures of central tendency. Recall the discussion of the statistical average in section 10.1, where a student scored 54, 60 and 78 in three different subjects. It clearly shows the need for a study of measures of dispersion. We had an average score of 64 per subject, but it is not true that the score was 64 in every subject! The deviations of the subject scores from the average were -10 for the first, -4 for the second and +14 for the third. With such deviations, one may hesitate to say that the student's score in any one of the three subjects is likely to be 64. To be safe, one would first like to allow for a certain amount of variation around the score of 64 before making such a general statement. The variation from the average differs from one score to another. Such a precaution therefore calls for the average of the absolute deviations of the scores from the mean, which is (10 + 4 + 14)/3 = 28/3 ≈ 9.33. This kind of measure of dispersion is called the mean deviation. With this value, one would be safer to conclude that the student's score in a particular subject is likely to fall between 64 - 9.33 and 64 + 9.33, i.e. roughly between 54.7 and 73.3. Below are some commonly known measures of dispersion.
10.3.2 Range
The range is the difference between the highest and the lowest values in the observations. For the peasant's sales in the previous example, Range = 1000 - 500 = 500 Tshs.
10.3.3 Mean deviation
As introduced before, the mean deviation (M.D) of a set of N values $x_1, x_2, \ldots, x_N$ is defined as
$$M.D = \frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}|.$$
For the sales example in section 10.2.6,
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{7}\sum_{i=1}^{7} x_i = \frac{1000 + 500 + 700 + 500 + 800 + 900 + 500}{7} = \frac{4900}{7} = 700$$
$$M.D = \frac{|1000-700| + |500-700| + |700-700| + |500-700| + |800-700| + |900-700| + |500-700|}{7}$$
$$= \frac{300 + 200 + 0 + 200 + 100 + 200 + 200}{7} = \frac{1200}{7} \approx 171.43$$
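The range and the mean deviation of the sales figures can both be computed in a few lines. The sketch follows the two definitions above literally.

```python
sales = [1000, 500, 700, 500, 800, 900, 500]   # Tshs, days 1 to 7

x_bar = sum(sales) / len(sales)                              # 700.0
mean_dev = sum(abs(x - x_bar) for x in sales) / len(sales)   # 1200 / 7
sales_range = max(sales) - min(sales)                        # highest minus lowest

print(x_bar, round(mean_dev, 2), sales_range)   # 700.0 171.43 500
```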
10.3.4 The Standard Deviation
The mean deviation was supposed to be a standard measure of dispersion, but it has one major weakness: it is not easily amenable to further algebraic treatment. Another measure was therefore introduced to cater for this weakness: the standard deviation (S.D). It is defined as the non-negative square root of the variance (v). The standard deviation is used more frequently in statistics than any other measure of dispersion.
The standard deviation is normally computed as
$$S.D = \sqrt{v}, \quad \text{where} \quad v = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2.$$
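Example: Find the standard deviation of the following observations: 45, 32, 38, 48, 60, 75.
Sol:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1}{6}\sum_{i=1}^{6} x_i = \frac{298}{6} \approx 49.67$$
Table 10.5
| $x_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ |
|-------|-----------------|---------------------|
| 45 | -4.67 | 21.8089 |
| 32 | -17.67 | 312.2289 |
| 38 | -11.67 | 136.1889 |
| 48 | -1.67 | 2.7889 |
| 60 | 10.33 | 106.7089 |
| 75 | 25.33 | 641.6089 |
| Total: 298 | | 1221.3334 |
$$v = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{1}{6}(1221.3334) \approx 203.5556$$
Therefore $S.D = \sqrt{v} = \sqrt{203.5556} \approx 14.27$ (2 decimals).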
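The definition-based calculation is easy to reproduce, keeping the mean at full precision. `statistics.pstdev` uses the divisor N, as the text does. `statistics.stdev` would instead use N - 1 (the sample standard deviation), a different convention not covered here.

```python
import math
from statistics import pstdev

obs = [45, 32, 38, 48, 60, 75]
N = len(obs)
x_bar = sum(obs) / N                          # 49.666..., not rounded
v = sum((x - x_bar) ** 2 for x in obs) / N    # (1/N) * sum of squared deviations
sd = math.sqrt(v)

print(round(v, 4), round(sd, 2))   # 203.5556 14.27
print(round(pstdev(obs), 2))       # 14.27 (population S.D, divisor N)
```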
However, this method is relatively tedious. It is also less accurate, because the deviations are computed from a rounded mean. Another formula for the variance is
$$V = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2.$$
The formula is in essence the same as the previous one; indeed, it can be deduced algebraically from the other formula.
Exercise 10.3.4: Show that
$$V = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \bar{x}^2.$$
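Using this new formula, the standard deviation could have been calculated more easily as follows:
Table 10.6
| $x_i$ | $x_i^2$ |
|-------|---------|
| 45 | 2025 |
| 32 | 1024 |
| 38 | 1444 |
| 48 | 2304 |
| 60 | 3600 |
| 75 | 5625 |
| Total | 16022 |
$$V = \frac{1}{6}\sum_{i=1}^{6} x_i^2 - \bar{x}^2 = \frac{16022}{6} - \left(\frac{298}{6}\right)^2 \approx 2670.3333 - 2466.7778 = 203.5556,$$
so $S.D = \sqrt{203.5556} \approx 14.27$ (2 decimals). Note that $\bar{x}$ should be kept to full precision here. Using the rounded value 49.67 gives $V \approx 203.2244$ and $S.D \approx 14.26$. The small rounding error in $\bar{x}$ is magnified when it is squared and subtracted from a large number.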
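The shortcut formula, and its sensitivity to rounding the mean, can be demonstrated directly. The sketch evaluates the formula once with the full-precision mean and once with the mean rounded to 49.67.

```python
import math

obs = [45, 32, 38, 48, 60, 75]
N = len(obs)
sum_x2 = sum(x * x for x in obs)                 # sum of x_i^2 = 16022
x_bar = sum(obs) / N                             # 49.666... (full precision)

v_full = sum_x2 / N - x_bar ** 2                 # shortcut formula
v_rounded = sum_x2 / N - round(x_bar, 2) ** 2    # same formula with x_bar = 49.67

print(sum_x2)                                    # 16022
print(round(math.sqrt(v_full), 2))               # 14.27
print(round(math.sqrt(v_rounded), 2))            # 14.26
```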
10.3.5 The Quartile Deviation (Semi-Interquartile Range)
A quartile marks off one fourth of the entire set of observations. Suppose the observations are arranged in ascending order and divided into four equal parts by number of observations. The dividing points are the 1st, 2nd and 3rd quartiles. In fact, the second quartile is the median of the observations. For example, consider the following 15 observations: 5, 6, 9, 11, 12, 15, 18, 19, 20, 22, 23, 23.6, 25, 26, 26.7. The 1st quartile ($Q_1$) is 11 (the 4th value), the second ($Q_2$) is 19 (the 8th value) and the third ($Q_3$) is 23.6 (the 12th value). Alongside quartiles we also have deciles, which divide the observations into tenths, percentiles, which divide them into hundredths, and so on. Again, the fifth decile and the fiftieth percentile are equal to the median of the distribution.
The quartile deviation, or semi-interquartile range, is defined as one half of the difference between the 3rd and the 1st quartiles:
$$Q = \frac{Q_3 - Q_1}{2}.$$
In the example given, $Q = (23.6 - 11)/2 = 6.3$.
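Python's `statistics.quantiles` (Python 3.8+) with `method="exclusive"` places the quartiles at positions (n+1)/4, 2(n+1)/4 and 3(n+1)/4 of the sorted data. For these 15 values, that is the 4th, 8th and 12th values, matching the example. Other conventions, such as `method="inclusive"`, can give slightly different quartiles for the same data.

```python
from statistics import quantiles

obs = [5, 6, 9, 11, 12, 15, 18, 19, 20, 22, 23, 23.6, 25, 26, 26.7]
q1, q2, q3 = quantiles(obs, n=4, method="exclusive")
Q = (q3 - q1) / 2          # semi-interquartile range
print(q1, q2, q3)          # 11.0 19.0 23.6
print(round(Q, 2))         # 6.3
```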
CHAPTER 11: FREQUENCY DISTRIBUTIONS
11.1 Introduction
In most cases, the collected data come to us as large samples and in a form unsuitable for immediate interpretation. In such cases, it is important to group the data into an appropriate number of classes before their general characteristics can be detected and measured. Let us, for instance, consider the following raw data. They are the weights (in kg) of 50 animals, measured after the animals were fed a special animal feed:
50, 60, 71, 85, 83, 67, 68, 62, 63, 95, 45, 74, 95, 68, 75, 90, 91, 69, 71, 84,
82, 72, 76, 83, 63, 74, 88, 62, 65, 45, 58, 61, 60, 79, 80, 88, 93, 59, 60, 83,
78, 62, 88, 57, 53, 67, 77, 74, 75, 75
It is difficult to trace the most general characteristics of these data unless they are grouped. The data can conveniently be grouped and shown in tabular form through the following procedure:
1. Find the range. In this case, range = 95 - 45 = 50.
2. Decide on the number of classes to be formed. In practice there is no fixed rule for the number of classes, although between 10 and 20 classes are usually preferred. Suppose we want to form 11 classes. The class size is then C = 50/11 ≈ 4.55, which we round up to 5.
3. Starting from the smallest value, 45, form the classes 45-49, 50-54, and so on.
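Table 11.1
| Classes | Frequencies |
|---------|-------------|
| 45-49 | 2 |
| 50-54 | 2 |
| 55-59 | 3 |
| 60-64 | 9 |
| 65-69 | 6 |
| 70-74 | 6 |
| 75-79 | 7 |
| 80-84 | 6 |
| 85-89 | 4 |
| 90-94 | 3 |
| 95-99 | 2 |
| Total | 50 |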
N.B.
- When data have been organised so that they can be described in terms of class frequencies, the result is called a frequency distribution.
- The width of a class is called the class interval.
- Associated with every class are the class boundaries, which can be found as follows:
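Table 11.2
| Classes | Class boundaries | Frequencies |
|---------|------------------|-------------|
| 45-49 | 44.5-49.5 | 2 |
| 50-54 | 49.5-54.5 | 2 |
| 55-59 | 54.5-59.5 | 3 |
| 60-64 | 59.5-64.5 | 9 |
| 65-69 | 64.5-69.5 | 6 |
| 70-74 | 69.5-74.5 | 6 |
| 75-79 | 74.5-79.5 | 7 |
| 80-84 | 79.5-84.5 | 6 |
| 85-89 | 84.5-89.5 | 4 |
| 90-94 | 89.5-94.5 | 3 |
| 95-99 | 94.5-99.5 | 2 |
| Total | | 50 |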
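The grouping procedure can be automated to reproduce Tables 11.1 and 11.2 from the raw weights. This sketch assumes whole-number data, so that class limits such as 45-49 are exact. Each boundary lies half a unit outside its limit. The function name `group` is hypothetical.

```python
weights = [
    50, 60, 71, 85, 83, 67, 68, 62, 63, 95, 45, 74, 95, 68, 75, 90, 91, 69, 71, 84,
    82, 72, 76, 83, 63, 74, 88, 62, 65, 45, 58, 61, 60, 79, 80, 88, 93, 59, 60, 83,
    78, 62, 88, 57, 53, 67, 77, 74, 75, 75,
]

def group(data, start, width, n_classes):
    """Return ((lower, upper), (lower boundary, upper boundary), frequency) per class."""
    rows = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1                  # whole-number class limits
        freq = sum(lower <= x <= upper for x in data)
        rows.append(((lower, upper), (lower - 0.5, upper + 0.5), freq))
    return rows

table = group(weights, start=45, width=5, n_classes=11)
for (lo, hi), (lb, ub), f in table:
    print(f"{lo}-{hi}  {lb}-{ub}  {f}")
```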
Recall that we determined the measures of central tendency and of dispersion for ungrouped data. We can do the same for grouped data. The formulas for these measures remain essentially the same, with a few alterations due to the presence of frequencies. The midpoint of each class plays the role of a value in the ungrouped data, counted as many times as the class frequency.
11.2 Measures of Central Tendency and Variability in Grouped Data
We shall consider only the arithmetic mean, the standard deviation, the mode and the median. The reader is advised to find out for himself about the geometric mean, the harmonic mean, the mean deviation, the range and the quartile deviation. Some of these will be treated in Exercise 11.
11.2.1 The Arithmetic Mean
The arithmetic mean $\bar{x}$ of grouped data is given by
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i}$$
where $x_i$ is the midpoint and $f_i$ the frequency of the $i$th group.
11.2.2 The Standard Deviation
$$S.D = \sqrt{v}, \quad \text{where} \quad v = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2 f_i}{\sum_{i=1}^{n} f_i} = \frac{\sum_{i=1}^{n} x_i^2 f_i}{\sum_{i=1}^{n} f_i} - \bar{x}^2$$
and $f_i$ is the frequency of the $i$th group.
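11.2.3 Mode
$$\hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\,h$$
where:
- L is the lower boundary of the class containing the mode (the modal class);
- $\Delta_1$ is the excess of the modal frequency over the frequency of the preceding class;
- $\Delta_2$ is the excess of the modal frequency over the frequency of the following class;
- h is the width of the modal class.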
11.2.4 Median
$$\text{Median} = L + \frac{N/2 - C}{f}\,h$$
where:
- L is the lower boundary of the median class;
- f is the frequency of the median class;
- h is the width of the median class;
- C is the cumulative frequency up to the class preceding the median class;
- N is the total frequency.
Let us now apply the above measures to the animal weight distribution given in section 11.1.
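Table 11.3
| Classes | Frequencies ($f_i$) | Midpoint ($x_i$) | $x_i f_i$ | $x_i^2 f_i$ |
|---------|---------------------|------------------|-----------|-------------|
| 45-49 | 2 | 47 | 94 | 4418 |
| 50-54 | 2 | 52 | 104 | 5408 |
| 55-59 | 3 | 57 | 171 | 9747 |
| 60-64 | 9 | 62 | 558 | 34596 |
| 65-69 | 6 | 67 | 402 | 26934 |
| 70-74 | 6 | 72 | 432 | 31104 |
| 75-79 | 7 | 77 | 539 | 41503 |
| 80-84 | 6 | 82 | 492 | 40344 |
| 85-89 | 4 | 87 | 348 | 30276 |
| 90-94 | 3 | 92 | 276 | 25392 |
| 95-99 | 2 | 97 | 194 | 18818 |
| Total | 50 | 792 | 3610 | 268540 |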
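1. The mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{3610}{50} = 72.2$$
2. The standard deviation
$$V = \frac{\sum_{i=1}^{n} x_i^2 f_i}{\sum_{i=1}^{n} f_i} - \bar{x}^2 = \frac{268540}{50} - 72.2^2 = 5370.8 - 5212.84 = 157.96$$
Therefore $S.D = \sqrt{157.96} \approx 12.57$.
3. Mode
The modal class is 60-64 (frequency 9), so L = 59.5, $\Delta_1 = 9 - 3 = 6$, $\Delta_2 = 9 - 6 = 3$ and h = 5:
$$\hat{x} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2}\,h = 59.5 + \frac{6}{6 + 3} \times 5 \approx 62.8$$
4. Median
Here N/2 = 25. The cumulative frequencies are 2, 4, 7, 16, 22, 28, ..., so the median class is 70-74, with L = 69.5, C = 22, f = 6 and h = 5:
$$\text{Median} = L + \frac{N/2 - C}{f}\,h = 69.5 + \frac{50/2 - 22}{6} \times 5 = 72$$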
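All four grouped-data calculations can be reproduced from the class limits and frequencies. The sketch follows the formulas of section 11.2. It assumes equal class widths of 5 and class boundaries half a unit below each lower limit. The variable names are illustrative.

```python
import math

lowers = list(range(45, 100, 5))          # class lower limits 45, 50, ..., 95
freqs = [2, 2, 3, 9, 6, 6, 7, 6, 4, 3, 2]
h = 5                                     # class width
mids = [lo + 2 for lo in lowers]          # midpoints 47, 52, ..., 97
N = sum(freqs)

mean = sum(x * f for x, f in zip(mids, freqs)) / N
var = sum(x * x * f for x, f in zip(mids, freqs)) / N - mean ** 2
sd = math.sqrt(var)

# Mode: the class with the highest frequency, with L taken as its lower boundary
m = freqs.index(max(freqs))
d1 = freqs[m] - (freqs[m - 1] if m > 0 else 0)
d2 = freqs[m] - (freqs[m + 1] if m + 1 < len(freqs) else 0)
mode = (lowers[m] - 0.5) + d1 / (d1 + d2) * h

# Median: the first class whose cumulative frequency reaches N/2
cum = 0
for lo, f in zip(lowers, freqs):
    if cum + f >= N / 2:
        median = (lo - 0.5) + (N / 2 - cum) / f * h
        break
    cum += f

print(mean, round(var, 2), round(sd, 2), round(mode, 2), median)
# 72.2 157.96 12.57 62.83 72.0
```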
11.3 Moments
Some of the measures defined in the preceding section fall into a rather broad category of statistics known as moments. The $r$th moment about a point $a$ of a given distribution is defined as
$$\mu'_r = \frac{1}{N}\sum_{i=1}^{n}(x_i - a)^r f_i, \quad \text{where } N = \sum_{i=1}^{n} f_i.$$
Of particular interest in this study are the moments about the mean and about the origin. It follows from this formula that the arithmetic mean is the 1st moment about the origin, while the variance is the 2nd moment about the mean. In general, the $r$th moment about the origin is given by
$$\mu'_r = \frac{1}{N}\sum_{i=1}^{n} x_i^r f_i \quad (a = 0),$$
whereas the $r$th moment about the mean is
$$\mu_r = \frac{1}{N}\sum_{i=1}^{n}(x_i - \bar{x})^r f_i.$$
Moments have wide applications in the study of probability distributions, as they serve to define the characteristics that distinguish one distribution from another.
11.3.1 The relationship between moments about the mean and moments about any point “a”
Consider the $r$th moment about the mean,
$$\mu_r = \frac{1}{N}\sum_{i=1}^{n}(x_i - \bar{x})^r f_i = \frac{1}{N}\sum_{i=1}^{n}\left[(x_i - a) + (a - \bar{x})\right]^r f_i.$$
Let $d = a - \bar{x}$ and $\mu'_r = \frac{1}{N}\sum_{i=1}^{n}(x_i - a)^r f_i$. Using the binomial expansion, we obtain
$$\mu_r = \frac{1}{N}\sum_{i=1}^{n}\left[{}^rC_0\, d^0 (x_i - a)^r + {}^rC_1\, d^1 (x_i - a)^{r-1} + \cdots + {}^rC_r\, d^r (x_i - a)^0\right] f_i$$
$$= \mu'_r + {}^rC_1\, d\, \mu'_{r-1} + {}^rC_2\, d^2 \mu'_{r-2} + \cdots + {}^rC_r\, d^r,$$
since $\mu'_0 = 1$.
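As an example, let us compare the 2nd moment about the mean (which is the variance) with the moments about the origin, i.e. about $a = 0$, so that $d = -\bar{x}$ and $\mu'_1 = \bar{x}$. We have
$$\mu_2 = \mu'_2 + {}^2C_1\, d\, \mu'_1 + {}^2C_2\, d^2 \mu'_0 = \frac{1}{N}\sum_{i=1}^{n} x_i^2 f_i - 2\bar{x}\cdot\bar{x} + \bar{x}^2 = \frac{1}{N}\sum_{i=1}^{n} x_i^2 f_i - \bar{x}^2 = \operatorname{Var}(x).$$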
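The binomial relation can be checked numerically on the animal weight distribution. The sketch compares moments about the mean computed directly with the same moments rebuilt from moments about an arbitrary point a. For r = 2 both routes give the variance, 157.96. The helper names `moment` and `central_via_binomial` are assumptions for illustration.

```python
from math import comb

mids = [47, 52, 57, 62, 67, 72, 77, 82, 87, 92, 97]
freqs = [2, 2, 3, 9, 6, 6, 7, 6, 4, 3, 2]
N = sum(freqs)

def moment(a, r):
    """r-th moment about the point a: (1/N) * sum((x_i - a)^r * f_i)."""
    return sum((x - a) ** r * f for x, f in zip(mids, freqs)) / N

x_bar = moment(0, 1)  # the 1st moment about the origin is the mean

def central_via_binomial(a, r):
    """mu_r = sum over k of C(r, k) * d^k * mu'_{r-k}, with d = a - x_bar."""
    d = a - x_bar
    return sum(comb(r, k) * d ** k * moment(a, r - k) for k in range(r + 1))

for r in (2, 3):
    print(r, round(moment(x_bar, r), 4),
          round(central_via_binomial(0, r), 4),
          round(central_via_binomial(60, r), 4))
```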
Exercise 11
1. Express the moments about the origin in terms of the moments about the mean.
2. Consider grouped data with midpoints $x_1, x_2, x_3, \ldots, x_n$ and frequencies $f_1, f_2, f_3, \ldots, f_n$. Show that its G.M satisfies
$$\log(G.M) = \frac{1}{N}\left(f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n\right), \quad \text{where } N = \sum_{i=1}^{n} f_i.$$
3. Find the harmonic mean of 30, 60 and 90.
4. Establish a formula for the harmonic mean of grouped data with midpoints $x_1, x_2, x_3, \ldots, x_n$ and frequencies $f_1, f_2, f_3, \ldots, f_n$.
5. The mean and variance of the marks of 40 students, grouped in class intervals of 10 marks, are 40 and 49 respectively. It was later observed that two observations belonging to the class interval 21-30 were included in the class interval 31-40 by mistake. Find the new mean and standard deviation.
CHAPTER 12: DATA PRESENTATION
12.1 Introduction
It is always important to present the collected information so that even someone unfamiliar with statistics can easily grasp it. One of the most common ways of presenting data is through diagrams. Diagrams should aid the reader by saving time and by making it easy to identify some of the salient features of the data. There are different forms of diagrammatic presentation, some of which are outlined below.
12.2 Pie-charts
A pie chart presents data from a population divided into non-overlapping categories. Its main purpose is to show the share of each category in the population. For example, consider the following breakdown of the results of the 1st year B.Sc. ESM students in the 2001 University Examination at Sokoine University of Agriculture.
Table 12.1
| S/N | Category | Number | Percentage |
|-----|----------|--------|------------|
| 1 | Passing | 47 | 65% |
| 2 | Repeating | 8 | 11% |
| 3 | Disco | 17 | 24% |
| 4 | Total | 72 | 100% |
Figure 12.1: University examination results for B.Sc. ESM students in the year 2001 (pie chart with sectors Passing 65%, Repeating 11%, Disco 24%)
Source: Sokoine University Examination Statistics for the year 2001
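Drawing a pie chart by hand requires each category's sector angle, which is its share of 360 degrees. The sketch computes percentages and angles for Table 12.1 without any plotting library. The function name `pie_sectors` is an assumption.

```python
def pie_sectors(counts):
    """Map each category to (percentage, sector angle in degrees)."""
    total = sum(counts.values())
    return {cat: (100 * c / total, 360 * c / total) for cat, c in counts.items()}

results = {"Passing": 47, "Repeating": 8, "Disco": 17}
for cat, (pct, angle) in pie_sectors(results).items():
    print(f"{cat:<10} {pct:5.1f}%  {angle:5.1f} degrees")
```

For these figures the sectors come out at 235, 40 and 85 degrees, which add up to the full 360.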
12.3 Bar graphs
Another way of presenting data is through bar graphs. These may roughly be divided into two types: one for non-continuous (discrete) data and the other for continuous data.
12.3.1 Bar graphs for discrete data
Consider again the Sokoine University Examination results, this time the failure rates of five 1st year B.Sc. degree programmes for the year 2001.
Table 12.2
| S/N | Degree Programme | Failure (%) |
|-----|------------------|-------------|
| 1 | B.Sc Animal Science | 43% |
| 2 | B.Sc ESM | 35% |
| 3 | B.Sc Agric. General | 31% |
| 4 | B.Sc Horticulture | 29% |
| 5 | B.Sc Education and Extension | 23% |
Figure 12.2: Failure rate in five 1st year B.Sc degree programmes in year 2001 (bar graph; horizontal axis: programmes; vertical axis: failure rate (%), 0 to 50)
Source: Sokoine University Examination Statistics for the year 2001
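Where no plotting library is available, a bar graph for discrete categories can be sketched as text. Each programme gets its own separate bar. This is only an illustration under stated assumptions: the function name `text_bar_chart` and the scale of one `#` per 2 percentage points are arbitrary choices.

```python
failure_rates = {
    "Animal Science": 43,
    "ESM": 35,
    "Agric. General": 31,
    "Horticulture": 29,
    "Education and Extension": 23,
}

def text_bar_chart(data, scale=2):
    """One '#' per `scale` units; each category gets its own separate bar."""
    width = max(len(name) for name in data)
    return [f"{name:<{width}} | {'#' * (value // scale)} {value}%"
            for name, value in data.items()]

for line in text_bar_chart(failure_rates):
    print(line)
```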
12.3.2 Histogram
This kind of bar graph is used to represent continuous data such as heights, weights and temperatures. Unlike other kinds of bar graphs, a histogram has its bars joined together. As an example, let us draw the histogram for the animal weight distribution presented in section 11.1.
Figure 12.3: Histogram for animal weight distribution (horizontal axis: weight (kg), class midpoints 47 to 97; vertical axis: number of animals)
12.4 Line Graphs
12.4.1 Frequency polygons
If straight lines (rather than a free-hand curve) join the points plotted from the class midpoints and their frequencies, we obtain what is called a frequency polygon. An example is given below:
Figure 12.4: Frequency polygon for animal weight distribution (horizontal axis: weight in kg, midpoints 47 to 97; vertical axis: number of animals)
12.4.2 Ogives
Ogives are "less than" or "greater than" cumulative frequency curves. Below are the cumulative frequencies of the animal weight distribution given in section 11.1, together with their corresponding graphs.
Table 12.3
| Class boundaries | Less than cumulative frequencies | Greater than cumulative frequencies |
|------------------|----------------------------------|-------------------------------------|
| 44.5 | 0 | 50 |
| 49.5 | 2 | 48 |
| 54.5 | 4 | 46 |
| 59.5 | 7 | 43 |
| 64.5 | 16 | 34 |
| 69.5 | 22 | 28 |
| 74.5 | 28 | 22 |
| 79.5 | 35 | 15 |
| 84.5 | 41 | 9 |
| 89.5 | 45 | 5 |
| 94.5 | 48 | 2 |
| 99.5 | 50 | 0 |
Figure 12.5: Less than and greater than ogives for animal weight distribution (horizontal axis: animal weight in kg, class boundaries 44.5 to 99.5; vertical axis: cumulative frequencies)
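The two cumulative columns of Table 12.3 are running totals of the class frequencies. The sketch computes them with `itertools.accumulate`, taken from the bottom and from the top respectively. Each "greater than" value is simply the total frequency minus the corresponding "less than" value.

```python
from itertools import accumulate

boundaries = [44.5 + 5 * i for i in range(12)]      # 44.5, 49.5, ..., 99.5
freqs = [2, 2, 3, 9, 6, 6, 7, 6, 4, 3, 2]
less_than = [0] + list(accumulate(freqs))           # counts below each boundary
greater_than = [sum(freqs) - c for c in less_than]  # counts above each boundary

for b, lt, gt in zip(boundaries, less_than, greater_than):
    print(f"{b:5.1f} {lt:3d} {gt:3d}")
```

Plotting `less_than` and `greater_than` against `boundaries` gives the two ogives; the curves cross at the median.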
12.4.3 Frequency curve
If the points in Figure 12.4 are joined with a free-hand curve, we obtain what is called a frequency curve.
Figure 12.6: Frequency curve for animal weight distribution (horizontal axis: weight in kg; vertical axis: number of animals)
As the figure above shows, the curve gives a picture of the peakedness and symmetry of the distribution about its centre (the mean). When examined carefully, these features give rise to two important concepts of a distribution, known as skewness and kurtosis, which are discussed below.
12.5 Skewness
Skewness refers to the degree of asymmetry of a distribution, that is, how far it departs from symmetry about its mean. The mean is the overall representative of the entire population, so for a symmetric distribution the observations are spread evenly on either side of it. This is not the case for many distributions, which is why we study the degree of asymmetry of a given distribution. As far as skewness is concerned, we roughly have three types of distributions.
If the frequency curve of a distribution has a longer tail to the right, the distribution is said to be positively skewed. In such a distribution a few unusually large observations pull the mean to the right, so the mean is typically larger than the median, which in turn is larger than the mode. On the other hand, if the frequency curve has a longer tail to the left, the distribution is said to be negatively skewed, and the mean is typically smaller than the median and the mode. When the distribution is skewed neither to the left nor to the right, it has zero skewness, and the mean, median and mode coincide. The normal distribution is the best-known example of such a symmetric distribution; it is a good model for many measured quantities, such as weights, heights and students' marks.
Figure 12.7: Frequency curves of (1) a negatively skewed, (2) a positively skewed and (3) a zero-skewness distribution
12.5.1 Measures of Skewness
Based on the explanations of the preceding section, we can deduce measures of skewness. One such measure is the well-known Pearson's coefficient of skewness, given as
$$Sk = \frac{\text{mean} - \text{mode}}{\text{S.D.}}.$$
The sign of $Sk$ reflects the three types of skewness defined in the preceding section. When the mode is difficult to determine, it can be replaced using the empirical relationship between the mean, the median and the mode of a distribution,
$$\text{mean} - \text{mode} \approx 3(\text{mean} - \text{median}),$$
which gives the alternative form
$$Sk = \frac{3(\text{mean} - \text{median})}{\text{S.D.}}.$$
The empirical relationship holds approximately for moderately skewed distributions, and it can be used to approximate one of the three measures given the other two.
For example, let us find the coefficient of skewness for the example in section 11.1. We had $\bar{x} = 72.2$, $\tilde{x} = 72$ and S.D. $= 12.57$, hence
$$Sk = \frac{3(72.2 - 72)}{12.57} \approx 0.048,$$
which shows that the distribution is slightly positively skewed.
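Another well-known measure, which needs no details of the mode or the median, uses the second and third moments about the mean, $\mu_2$ and $\mu_3$, of the given distribution:
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}.$$
Because $\beta_1$ is a square it is never negative; the direction of the skewness is given by the sign of $\mu_3$, or equivalently of $\gamma_1 = \mu_3/\mu_2^{3/2}$.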
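The figures quoted in the skewness example (mean 72.2, median 72, S.D. 12.57) can be reproduced from the grouped data as a check. This sketch assumes the class frequencies recovered from Table 12.3 (they are not listed in this section), uses class midpoints, a divisor of n for the standard deviation, and linear interpolation within the median class; it then evaluates Pearson's coefficient in its median form, 3(mean - median)/S.D. The helper name `grouped_median` is illustrative.

```python
import math

# Animal weight distribution (frequencies recovered from Table 12.3)
width = 5
lower_boundaries = [44.5 + width * k for k in range(11)]
midpoints = [b + width / 2 for b in lower_boundaries]   # 47, 52, ..., 97
freqs = [2, 2, 3, 9, 6, 6, 7, 6, 4, 3, 2]
n = sum(freqs)

mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
# Standard deviation with divisor n
sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n)

def grouped_median(lower, freqs, width):
    """Interpolate within the class that contains the (n/2)-th observation."""
    half = sum(freqs) / 2
    cum = 0
    for low, f in zip(lower, freqs):
        if cum + f >= half:
            return low + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

median = grouped_median(lower_boundaries, freqs, width)
sk_median = 3 * (mean - median) / sd   # Pearson's coefficient, median form

print(round(mean, 2), round(median, 2), round(sd, 2), round(sk_median, 3))
```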
12.6 Kurtosis
Kurtosis is a measure of the degree of peakedness of a distribution. With reference to peakedness, we basically have three types of distributions. Distributions with a high degree of peakedness are known as leptokurtic, while those with a low degree are called platykurtic. Distributions with a moderate degree of peakedness are known as mesokurtic, the normal distribution being an example. The kurtosis of a distribution is given by
$$\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3,$$
where $\mu_4$ is the fourth moment about the mean. If $\gamma_2 > 0$ the distribution is leptokurtic, if $\gamma_2 < 0$ it is platykurtic, and if $\gamma_2 = 0$ it is mesokurtic.
Figure 12.8: Leptokurtic, mesokurtic and platykurtic frequency curves
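A sketch of the moment-based measures for ungrouped data, assuming central moments with divisor n. The function names and the two small data sets are hypothetical, chosen only to illustrate a symmetric case and a case with a long right tail.

```python
def central_moments(data, orders=(2, 3, 4)):
    """Central moments mu_r = (1/n) * sum((x - mean)**r) for each r in orders."""
    n = len(data)
    mean = sum(data) / n
    return {r: sum((x - mean) ** r for x in data) / n for r in orders}

def moment_skewness(data):
    """gamma_1 = mu_3 / mu_2**1.5; its sign gives the direction of the skew."""
    m = central_moments(data)
    return m[3] / m[2] ** 1.5

def beta_1(data):
    """beta_1 = mu_3**2 / mu_2**3; never negative, so it measures size only."""
    m = central_moments(data)
    return m[3] ** 2 / m[2] ** 3

def excess_kurtosis(data):
    """gamma_2 = mu_4 / mu_2**2 - 3: > 0 leptokurtic, < 0 platykurtic."""
    m = central_moments(data)
    return m[4] / m[2] ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]
print(moment_skewness(symmetric), excess_kurtosis(symmetric))
print(moment_skewness(right_tail) > 0)
```

For the evenly spread values 1 to 5 the third moment is zero and the kurtosis measure is -1.3, so by the rule above that set would be classed as platykurtic.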
Exercise 12
1. The following are the figures (in millions of USD) of Tanzanian trade with SADC for the period 1994-1998.
Table 12.4
Year      1994    1995    1996    1997    1998
Exports   87.3    96.4    80.7    102.9   69
Imports   233.8   220.8   193.9   226.3   294.1
Source: Tanzania Revenue Authority-Customs Department.
Discuss how the data given can be presented in a bar graph.
2. What is special about a histogram as opposed to other kinds of bar graphs?
3. What is the main purpose of data presentation? Discuss the important aspects of presenting data.
CHAPTER 13: CORRELATION & REGRESSION ANALYSIS
13.1 Introduction
In the real world there are relationships between two or more variables. For example, an infant's age has something to do with the infant's weight. Similarly, a person's height and weight are associated. There are many examples in the real world showing that quantitative relationships among variables exist. In statistics, we seek first to establish mathematically whether such relationships exist, and then to find the functional form of the relationship. The first task is called "Correlation Analysis", while the latter is known as "Regression Analysis". For the purpose of this subject we shall confine ourselves to the study of the linear relationship between two variables only, known as simple correlation and regression analysis. The reader is advised to revisit the topic on linear functions and their properties for a thorough understanding of the material presented in this section.
13.2 Correlation Analysis
Suppose we have three different sets of paired observations for variables x and y, plotted (with the origin at the means of x and y) in the so-called scatter diagrams shown in the figures below.
Figure 13.1: Scatter diagram showing a positive linear association
Figure 13.2: Scatter diagram showing a negative linear association
Figure 13.3: Scatter diagram showing no linear association
(In each diagram the x-axis runs from -30 to 30 and the y-axis from -200 to 200.)
On the basis of these figures, it seems plausible to base our measure of correlation on the product "xy". In figure 13.1 the two variables have a positive linear relationship, so the observations are concentrated in the first and third quadrants, where the product "xy" is positive. Similarly, in figure 13.2 most observations are found in the second and fourth quadrants, where the product "xy" is negative, and the variables have a negative linear association. Things are different in figure 13.3, where the observations are scattered over all four quadrants: positive and negative products roughly cancel, so their sum is close to zero, meaning that no linear association exists between the two variables. Indeed, we cannot trace any linear trend among the points in figure 13.3. But what about the points in figures 13.1 and 13.2 that do not follow the general trend? In figure 13.1 some points lie outside quadrants 1 and 3, and similarly in figure 13.2 some points lie outside quadrants 2 and 4. This suggests averaging over all the points, so the quantity
$$\frac{1}{n}\sum xy$$
serves to indicate the overall direction and extent of the relationship. Even though some of the pairwise products of x and y may have different signs, their overall sum tells us the direction of the relationship as a whole. However, one more problem remains: the units of measurement. Suppose, for instance, that one person computes this measure between height in feet and weight in pounds for a group of individuals, and another person does the same for the same individuals but with heights in metres and weights in kilograms. The two results would differ in magnitude, even though they would agree in direction. Or suppose one wants to know which of weight and age is more closely related to height. No comparison is possible unless the units of age and weight are harmonised. This problem is easily overcome by using standard scores: instead of looking for the correlation between x and y, we look for the correlation between the standardised variables of x and y. So the suggested measure
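becomes
$$\frac{1}{n}\sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right).$$
After some manipulation, the correlation coefficient of the two variables x and y, denoted by r, is given as
$$r = \frac{\frac{1}{n}\sum xy - \frac{\sum x \sum y}{n^2}}{s_x s_y},$$
where $s_x$ and $s_y$ are the standard deviations of x and y respectively. The numerator is known as the covariance of x and y, denoted by
$$\operatorname{Cov}(x, y) = \frac{1}{n}\sum xy - \frac{\sum x \sum y}{n^2}.$$
Substituting $s_x = \sqrt{\frac{1}{n}\sum x^2 - \left(\frac{\sum x}{n}\right)^2}$ and the corresponding expression for $s_y$, r can also be written as
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}.$$
This formula is the most practical for computation. Note that the absolute value of r never exceeds 1, i.e. $-1 \le r \le 1$. A value of r close to 1 indicates a very strong positive linear relationship, a value close to -1 a very strong negative one, and a value close to 0 a weak linear relationship.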
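The following Python sketch checks that the standard-score definition and the computational formula give the same r, and that r does not change when the units change (feet to metres, pounds to kilograms). The height and weight values are invented for illustration; both functions use divisor n, as in the text, and their names are illustrative.

```python
import math

def pearson_r_zscores(x, y):
    """r as the average product of standard scores (divisor n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return sum((a - mx) / sx * (b - my) / sy for a, b in zip(x, y)) / n

def pearson_r_computational(x, y):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx**2) * (n*Syy - Sy**2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical heights (ft) and weights (lb), for illustration only
heights_ft = [5.0, 5.5, 5.8, 6.0, 6.2]
weights_lb = [120, 140, 150, 165, 170]
heights_m = [h * 0.3048 for h in heights_ft]
weights_kg = [w * 0.45359237 for w in weights_lb]

r1 = pearson_r_zscores(heights_ft, weights_lb)
r2 = pearson_r_computational(heights_ft, weights_lb)
r3 = pearson_r_computational(heights_m, weights_kg)
print(round(r1, 4), round(r2, 4), round(r3, 4))
```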
Example 1
The following are the scores, in terms of G.P.A, of 10 pre-entry female students at SUA in 2000/1, together with their entry points (based on A-level performance).
Table 13.1
S/N            1    2    3    4    5    6    7    8    9    10
Entry points   3    3.5  4    3.5  3.5  4    3    3.5  3    3.5
G.P.A          2.3  2.1  2.6  3.2  3.2  2.5  3.1  2.8  3.6  2.8
Find the sample correlation coefficient and comment on the nature of their relationship.
Sol:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Table 13.2
S/N    X     Y     XY     X²      Y²
1      3     2.3   6.9    9       5.29
2      3.5   2.1   7.35   12.25   4.41
3      4     2.6   10.4   16      6.76
4      3.5   3.2   11.2   12.25   10.24
5      3.5   3.2   11.2   12.25   10.24
6      4     2.5   10     16      6.25
7      3     3.1   9.3    9       9.61
8      3.5   2.8   9.8    12.25   7.84
9      3     3.6   10.8   9       12.96
10     3.5   2.8   9.8    12.25   7.84
Total  34.5  28.2  96.75  120.25  81.44
$$r = \frac{10(96.75) - (34.5)(28.2)}{\sqrt{\left[10(120.25) - (34.5)^2\right]\left[10(81.44) - (28.2)^2\right]}} = \frac{-5.4}{\sqrt{(12.25)(19.16)}} \approx -0.35$$
Comment: Since r = -0.35 is negative but close to 0, the pre-entry female students' G.P.As have a weak negative linear association with their A-level entry points.
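A quick check of Example 1 in Python. It also shows why the unrounded column totals matter: feeding totals rounded to one decimal place (96.8, 120.3, 81.4) into the same formula gives about -0.32 rather than -0.35. The helper name `r_from_totals` is illustrative.

```python
import math

# Table 13.1: entry points (x) and G.P.A (y) of the 10 students
x = [3, 3.5, 4, 3.5, 3.5, 4, 3, 3.5, 3, 3.5]
y = [2.3, 2.1, 2.6, 3.2, 3.2, 2.5, 3.1, 2.8, 3.6, 2.8]

def r_from_totals(n, sx, sy, sxy, sxx, syy):
    """Computational formula for r from n and the five column totals."""
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

n = len(x)
r_exact = r_from_totals(
    n,
    sum(x), sum(y),
    sum(a * b for a, b in zip(x, y)),
    sum(a * a for a in x),
    sum(b * b for b in y),
)
# The same formula with the column totals rounded to one decimal place
r_rounded_totals = r_from_totals(10, 34.5, 28.2, 96.8, 120.3, 81.4)
print(round(r_exact, 3), round(r_rounded_totals, 3))
```

Because the formula subtracts two nearly equal quantities, small rounding errors in the totals are magnified; it is safer to carry the totals unrounded.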
Example 2
A group of 5 students took tests before and after training and obtained the following scores.
Table 13.3
Before (X)   2     2.5   2.5   3   5
After (Y)    2.5   3     3     5   5
Find the correlation coefficient r and comment on the nature of the relationship.
Sol:
Table 13.4
        X     Y      X²     XY    Y²
        2     2.5    4      5     6.25
        2.5   3      6.25   7.5   9
        2.5   3      6.25   7.5   9
        3     5      9      15    25
        5     5      25     25    25
Total   15    18.5   50.5   60    74.25
$$r = \frac{5(60) - (15)(18.5)}{\sqrt{\left[5(50.5) - (15)^2\right]\left[5(74.25) - (18.5)^2\right]}} = \frac{22.5}{\sqrt{(27.5)(29)}} \approx 0.797$$
Comment
Since r is positive and close to 1, there is a strong positive linear association between the scores before training and the scores after training.
13.2.1 A note on the correlation coefficient r
Although in most cases the results of correlation analysis give us a picture of how things relate, they should be interpreted with great care. Results sometimes surprise us and seem to go against intuition; whenever this happens, we need to interpret them carefully. For example, in one instance the 2001 UE results for Communication Skills (SC100), Biometry (MB101) and Development Studies (DS100) for B.Sc ESM students were considered, and Communication Skills was found to be more highly correlated with Biometry than with Development Studies. This seems to go against the intuition that Communication Skills should be highly correlated with Development Studies, a subject whose mastery is largely based on mastery of language. What should we conclude? Instead of simply concluding that DS and SC are weakly connected, we should think of exceptional factors in the way the subjects are conducted. One possible reason is that most instructors, including those of these subjects, are concerned with the material content of the subject rather than its grammatical aspects. As a result, the language component may not be reflected in performance in most subjects, including ones that would seem to be closely related. That Biometry was highly correlated with Communication Skills is no mere chance event: with mathematical concepts running through its teaching, Biometry was a measure of a student's general understanding in all subjects, including Communication Skills.
13.3 Regression Analysis
In an attempt to find the form of an equation connecting two or more variables, statisticians employ so-called regression analysis. In mathematics this is known as curve fitting. There are many types of curves, including linear, polynomial, logarithmic and so on. As said before, we shall confine ourselves to fitting linear equations in two variables only. This is called simple linear regression analysis.
13.3.1 Simple Linear Regression Analysis
Simple linear regression analysis deals with determining an equation that describes the linear association between two variables. Assume that we have n paired observations of x and y, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. We intend to find a and b such that $y_i \approx a + bx_i$. Consider, for example, the following observations on x and y, where y is the number of rats dying after receiving dose x of a drug.
Table 13.5
Dose x          50   90   56   62   65   80   70
Rats dying y    0    3    4    5    10   9    6
Let us plot the paired observations on the x and y axes to obtain the so-called scatter diagram, which readily reveals the extent of linear association between the two variables.
Figure 13.4: Scatter diagram for the number of dying rats against the dose of a drug (x-axis: level "x" of a dose, 0 to 100; y-axis: number "y" of dying rats, 0 to 12)
As the plotted graph shows, the suggested linear relationship does not hold exactly. We can, however, find an approximate linear relation by choosing a line l so that the sum of the squared vertical distances from the points to the line is a minimum. This is the main principle underlying linear regression; the resulting line is called the line of best fit, or the regression line, and fitting a line by this principle is called the method of least squares estimation.
13.3.2 The derivation of the least squares estimates
We have $e_i = y_i - \hat{y}_i$, where $\hat{y}_i = a + bx_i$. So we consider
$$S = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - bx_i)^2.$$
Based on the principle defined in the previous section, the least squares estimates are found as follows (the reader is advised to refer to the sections on partial derivatives and matrix algebra):
i) $\dfrac{\partial S}{\partial a} = -2\sum (y_i - a - bx_i)$
ii) $\dfrac{\partial S}{\partial b} = -2\sum (y_i - a - bx_i)x_i$
Equating (i) and (ii) to zero, we have
$$an + b\sum x_i = \sum y_i \qquad \text{(i)}$$
and
$$a\sum x_i + b\sum x_i^2 = \sum x_i y_i. \qquad \text{(ii)}$$
Solving (i) and (ii) simultaneously using matrix algebra, we write
$$\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}.$$
Pre-multiplying both sides by
$$\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}^{-1} = \frac{1}{n\sum x_i^2 - \left(\sum x_i\right)^2}\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix},$$
we get
$$\begin{pmatrix} a \\ b \end{pmatrix} = \frac{1}{n\sum x_i^2 - \left(\sum x_i\right)^2}\begin{pmatrix} \sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i \\ n\sum x_i y_i - \sum x_i \sum y_i \end{pmatrix}.$$
Thus
$$a = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$$
and
$$b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{\frac{1}{n}\sum x_i y_i - \frac{\sum x_i \sum y_i}{n^2}}{\frac{1}{n}\sum x_i^2 - \left(\frac{\sum x_i}{n}\right)^2} = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)}.$$
The expression for a can also be shown to be equal to $a = \bar{y} - b\bar{x}$. These values of a and b always minimise the sum of the squared vertical deviations from the regression line.
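As a sketch (pure Python, no matrix library assumed), the normal equations can be solved with the explicit 2×2 inverse and the result compared with the covariance/variance form. The data are the dose and rat counts of Table 13.5; the function names are hypothetical.

```python
# Table 13.5: dose x and number of dying rats y
x = [50, 90, 56, 62, 65, 80, 70]
y = [0, 3, 4, 5, 10, 9, 6]

def least_squares_normal_equations(x, y):
    """Solve [[n, Sx], [Sx, Sxx]] (a, b) = (Sy, Sxy) using the explicit 2x2 inverse."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx ** 2              # determinant of the coefficient matrix
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    return a, b

def least_squares_cov_var(x, y):
    """b = Cov(x, y) / Var(x) and a = ybar - b * xbar (divisor n in both)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    cov = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / n
    var = sum((u - xbar) ** 2 for u in x) / n
    b = cov / var
    return ybar - b * xbar, b

a1, b1 = least_squares_normal_equations(x, y)
a2, b2 = least_squares_cov_var(x, y)
print(round(a1, 3), round(b1, 4))   # intercept and slope
```

Both routes give b = 657/8006 ≈ 0.082 and a = -2077/8006 ≈ -0.26.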
Let us apply these results to the previous problem (refer to Table 13.5).
Table 13.6
S/N    X     Y     XY     X²
1      50    0     0      2500
2      90    3     270    8100
3      56    4     224    3136
4      62    5     310    3844
5      65    10    650    4225
6      80    9     720    6400
7      70    6     420    4900
Total  473   37    2594   33105
$$b = \frac{7(2594) - (473)(37)}{7(33105) - (473)^2} = \frac{657}{8006} \approx 0.082$$
$$a = \bar{y} - b\bar{x} = \frac{37}{7} - (0.082)\frac{473}{7} \approx -0.26$$
Interpretation
· The value a = -0.26 is the y-intercept, i.e. the value of y when x = 0. Taken literally, it says that about -0.26 rats are expected to die when no dose is given. Since we cannot have a negative number of rats, it suffices to say that no rat is expected to die in the absence of the dose; note also that x = 0 lies far outside the observed doses (50 to 90), so the intercept should not be over-interpreted.
· b = 0.082 is the slope of the line: for every unit increase in the dose x, there is a corresponding increase of 0.082 in the expected number of dying rats. Since there cannot be 0.082 of a rat, a more useful reading is that about 12 more units of the dose (1/0.082 ≈ 12.2) are associated with one more rat dying.
Example 1
The following data were obtained on the paired variables x and y.
Table 13.7
x   -2   -1   0   1   2
y    2    2   3   4   4
(a) Find the linear regression line of y on x and comment on the result.
(b) Calculate the correlation coefficient r and comment on the result.
(c) Find $s_x^2$ and $s_y^2$.
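Sol:
Table 13.8
S/N    x    y    xy   x²   y²
1      -2   2    -4   4    4
2      -1   2    -2   1    4
3      0    3    0    0    9
4      1    4    4    1    16
5      2    4    8    4    16
Total  0    15   6    10   49
(a) Using the least squares formulas (or a spreadsheet program such as Microsoft Excel), $b = \frac{5(6) - (0)(15)}{5(10) - 0^2} = 0.6$ and $a = \bar{y} - b\bar{x} = 3 - 0.6(0) = 3$, so the regression line is y = 3 + 0.6x. Since the slope is positive, y increases by 0.6 on average for every unit increase in x.
(b) $r = \frac{5(6) - (0)(15)}{\sqrt{\left[5(10) - 0^2\right]\left[5(49) - 15^2\right]}} = \frac{30}{\sqrt{1000}} \approx 0.95$, which indicates a strong positive linear association between the two variables.
(c) $s_x^2 = \frac{10}{5} - 0^2 = 2$ and $s_y^2 = \frac{49}{5} - 3^2 = 0.8$.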
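For readers without a spreadsheet, this sketch reproduces parts (a) to (c) of Example 1 from the column totals of Table 13.8, using divisor n for the variances (variable names are illustrative).

```python
import math

x = [-2, -1, 0, 1, 2]
y = [2, 2, 3, 4, 4]
n = len(x)

sx, sy = sum(x), sum(y)                      # 0, 15
sxy = sum(u * v for u, v in zip(x, y))       # 6
sxx = sum(u * u for u in x)                  # 10
syy = sum(v * v for v in y)                  # 49

b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
a = sy / n - b * sx / n
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
var_x = sxx / n - (sx / n) ** 2              # divisor n
var_y = syy / n - (sy / n) ** 2
print(f"y = {a:g} + {b:g}x, r = {r:.2f}, sx^2 = {var_x:g}, sy^2 = {var_y:g}")
```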
13.3.3 Explained, Unexplained, and Total Variations
· The variance of the observed values of y is called the total variation, denoted $s_y^2$.
· While finding the regression line, we had the vertical distances $e_1, e_2, e_3, \ldots, e_n$ of the points from the line. The variance of these vertical distances is known as the variance of the errors of estimate, or the unexplained variation, and is denoted by $s_e^2$.
· Once we have obtained the estimated values $\hat{y}$, we can also compute their variance, denoted $s_{\hat{y}}^2$. The variance of the estimated values of y is known as the explained variation.
Consider the previous example, where we had y = 3 + 0.6x.
Table 13.9
S/N   x    y    ŷ     e = y - ŷ
1     -2   2    1.8   0.2
2     -1   2    2.4   -0.4
3     0    3    3     0
4     1    4    3.6   0.4
5     2    4    4.2   -0.2
From the table:
(1) $s_y^2 = 0.8$
(2) $s_e^2 = 0.08$
(3) $s_{\hat{y}}^2 = 0.72$
From these results you will find that $s_y^2 = s_{\hat{y}}^2 + s_e^2$, that is,
Total variation in y = Explained variation + Unexplained variation.
This result can be proved algebraically, based on the fact that the estimated values of y have, on average, the same mean as the observed values of y.
The partition of the total variation in y into these two components enables us to define another measure of the goodness of fit of a regression line, known as the coefficient of determination:
$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}}.$$
For this example, $R^2 = 0.72/0.8 = 0.9 = 90\%$. Note that $R^2 = r^2$ in simple linear regression analysis. The coefficient of determination shows the extent to which the variation in y is explained by the regression line: the greater the value of $R^2$, the better the fitted line, and vice versa. In this example, 90% of the variation in y is accounted for by the regression line.
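A short numerical check of the partition for the line y = 3 + 0.6x, with all variances taken with divisor n (an assumption consistent with the figures quoted). The helper name `variance` is illustrative.

```python
x = [-2, -1, 0, 1, 2]
y = [2, 2, 3, 4, 4]
a, b = 3.0, 0.6                       # fitted line y-hat = 3 + 0.6x

def variance(values):
    """Population variance (divisor n)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

y_hat = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, y_hat)]

total = variance(y)                   # s_y^2
explained = variance(y_hat)           # s_yhat^2
unexplained = variance(residuals)     # s_e^2
r_squared = explained / total
print(round(total, 2), round(explained, 2), round(unexplained, 2), round(r_squared, 2))
```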
Exercise 9-13
1. Clearly distinguish between statistics and mathematics.
2. Discuss any four methods of data capture.
3. Outline the advantages of the arithmetic mean as a statistical average compared to the standard deviation.
4. The mean of five items of an observation is 4 and the variance is 5.2. If three of the items are 1, 2 and 6, find the other two.
5. The mean and standard deviation of a sample of size 10 were found to be 9.5 and 2.5 respectively. Later on, an additional observation of 1.5 became available and was included in the original sample. Find the mean and standard deviation of the 11 observations.
6. The following table shows the distribution of ages of a group of people in a village.
Age (years)        0-9   10-19   20-29   30-39   40-49
Number of people   25    35      75      41      24
Calculate:
(i) The arithmetic mean
(ii) The standard deviation
(iii) The coefficient of skewness, and comment on the result
7. Find the mean and standard deviation of the following two samples put together.
Sample No.   Size   Mean   S.D
1            50     158    5.1
2            60     164    4.6
8. The mean and standard deviation of a set of observations were found to be 16.5 and 4.7. Later on, it was discovered that the values 16 and 12 had been wrongly entered instead of 6 and 2. Find the correct mean and standard deviation.
9. Find the mean and standard deviation of the values 4, 5, 6 and 10.
10. Mention and explain any four advantages of taking a sample in a survey instead of carrying out a complete enumeration.
11. What is stratified sampling? Describe it with concrete examples.
12. An experimenter made the observations 1, 3, 5, 7, …, 99. Find the mean and standard deviation of the observations.
13. In question 6, find the following:
(a) Median
(b) Mode
(c) Draw a histogram, a frequency polygon and a greater than cumulative frequency curve.
(d) Use the greater than cumulative frequency curve to estimate the number of people aged less than 28 years.
(e) Find the range.
14. Explain the difference between the mean and the standard deviation.
15. We have two investment proposals:
                       Expected mean cash flow   Standard deviation
Oil Venture            Rs 1,00,000               7,200
Real Estate Venture    Rs 10,00,000              14,000
Explain which venture is riskier.
16. The coefficients of variation of two series are 75% and 90%, and their standard deviations are 15 and 18 respectively. Find their means.
17. For a frequency distribution of History marks of 200 candidates (grouped in intervals 0-5, 5-10, …), the mean and S.D were found to be 40 and 15. Later on, it was discovered that the score 43 had been misread as 53 in obtaining the frequency distribution. Find the corrected mean and S.D corresponding to the corrected frequency distribution.
18. What is the arithmetic mean of the following data?
Variate:     0   1             2             …   n
Frequency:   1   ${}^nC_1$     ${}^nC_2$     …   ${}^nC_n$
19. Two groups of students reported mean weights of 162 and 148 pounds respectively. Under what condition would the mean weight of both groups together be 155 pounds?
20. Find the mean and standard deviation of the following series of observations: -1, 4, -9, 16, -25, 36, …, 10,000.
21. The expenditure of 100 families is given below:
Expenditure       0-10   10-20   20-30   30-40   40-50
No. of families   14     ?       27      ?       15
The mode of the distribution is 24. Calculate the missing frequencies.
22. Let $d_i$ be the deviations of a set of values from an arbitrary constant C. Show that the standard deviation of the variate $d_i$ is the same as that of the original values.
23. Consider a sequence of numbers in arithmetic progression, with first term $a_1$ and last term $a_n$. Find an expression for the mean deviation. Hence, or otherwise, find the mean deviation of the sequence 1, 2, 3, …, 1000.
24. The mean and standard deviation of a variate x are m and σ respectively. Obtain the mean and standard deviation of (ax + b)/c, where a, b and c are constants.
The following were the points scored by five B.Sc. Environmental Sciences & Management students in two different subjects in the June/July 2001 University Examination.
Math (x)   1   3   2   0   2
Comm (y)   5   3   5   3   3
Answer the following questions:
25. The correlation coefficient between the two subjects is
a) 2   b) -0.9   c) -0.08   d) 0   e) 0.98   f) None of the given
26. Comment on the nature of the linear relationship between the two subjects:
a) Poor and direct relationship; b) Indirect and strong relationship; c) Strong and positive; d) Strong and direct relationship; e) No relationship at all; f) None of the given
27. The line of best fit of y on x is
a) y = 3x + 2   b) y = -0.9x + 5   c) y = 8x + 3   d) y = -3.9 + 0.2x   e) y = -0.08x + 3.9   f) y = 5
28. For every unit increase in the score of Maths (x), the corresponding change in the score of Communication Skills is
a) an increase of 3.9 marks   b) an increase of 3 marks   c) a decrease of 0.08 marks   d) a decrease of 0.9 marks   e) an increase of 8 marks   f) None of the given
The following table shows the distribution of ages of a group of people in a village.
Age (years)        0-9   10-19   20-29   30-39   40-49
Number of people   25    35      75      41      24
Answer the following questions:
29. The modal age is
a) 17   b) 24   c) 30   d) 45   e) 25   f) None of the given
30. The median age is
a) 24   b) 20   c) 34   d) 35   e) None of the given
31. The number of people aged less than 36 is
a) 176   b) 167   c) 175   d) 160   e) 168   f) You cannot tell
32. The number of people aged more than 10 is
a) 172   b) 175   c) 140   d) 160   e) 170   f) None of the above
Given the following sequence of observations, 1, 3, 6, 10, …, 5050:
33. The mean and median are
a) 172, 1000   b) 1717, 1300.5   c) 225, 1598   d) 2555, 1555   e) 2000, 1476   f) None of the given
34. Asha and Janet are female students studying Mathematics in two different classes, taught by two different instructors under different circumstances. When a test was given to the two classes, Asha got 60% and Janet also got 60%. However, the mean score was 50% in the first class and 40% in the second, while the standard deviations for the two classes were 3 % and  % respectively. Which of the two girls performed better in Mathematics?
a) Asha; b) Janet; c) No one; d) Both of them; e) You cannot tell; f) None of the given
35. Asha and Janet took part in a beauty contest in which Asha was ranked fourth out of the seven girls who participated. How beautiful is Janet in the opinion of the judges?
a) Like Asha   b) The ugliest   c) The most beautiful   d) Like the wife of one of the judges   e) You cannot tell   f) None of the given
36. The average of the sequence 2, 4, 8 is 4 and not 4.67, i.e. (2 + 4 + 8)/3. Why do you think this is so?
a) Because 4 < 4.67   b) Because 4 is the geometric mean   c) No reason   d) Not true   e) You cannot tell   f) None of the given
37. A surveyor had already identified about 2280 items from which a systematic sample would be drawn, with a sampling interval of 10. Find:
(a) The sample size to be taken.
(b) If the first item picked was the 9th in the list, the last item in the list to be included in the sample.
(c) Treating the positions of the items in the list as their numerical values, the mean and median of the sample.
38. y and x are said to be related by a law of the form y = ax + bx², and the observed values of x and y are as shown in the table.
x   1    5     6     4
y   10   190   360   124
With your knowledge of linear regression analysis, show that they do indeed obey this law. Hence estimate the value of y when x is 10.
39. The following experimental values are said to obey a law of the form $y = ae^{bx}$. The observed values of x and y are as shown in the table below.
x   1.2     0.8     5       0.7
y   73.20   22.05   64.74   16.33
(i) Estimate the best values of a and b.
(ii) Hence estimate the value of y when x is 0.3.
40. Given that variables x and y have a linear relationship and the following data: $s_x = 9.83$, $s_y = 25.916$ and b = 2.069. Comment on the nature of their relationship, clearly stating the reasons for your comment.
41. The first three moments about the value 4 of a variable are 2, 9.7 and -48. Find the first three moments about the mean. Also compute the coefficient of skewness and comment on the nature of the distribution.
42. Compute the coefficients of skewness and kurtosis if the first four moments about the value 3 of a variable are 1.7, 8.9, 39.5 and 211.7.
43. If the regression line of y on x is y = 0.1x + 1 and that of x on y is x = 2y - 2, find the values of y and r.
44. Show that it is impossible for two variates x and y to have the following properties: E(x) = 3, E(y) = 2, E(x²) = 10, E(y²) = 29, E(xy) = 0.
45. Is it true that if the regression line of y on x is y = 3x + 1, then that of x on y is x = 2y + 3?
46. Let r be the correlation coefficient between x and y. What is the correlation coefficient between (3x + 1) and (2y - 3)? What general conclusion can you draw about this property of the correlation coefficient r?
47. State the main principle underlying linear regression analysis. Explain the main difference between regression and correlation analysis.
48. Show that the variance of the first n positive integers is $\frac{1}{12}(n^2 - 1)$.
49. Show algebraically that Total variation = Explained variation + Unexplained variation. (Hint: consider the fact that, on average, the mean of the estimated y's is the same as the mean of the observed y's.)
CHAPTER 14: ELEMENTARY PROBABILITY THEORY
14.1 Introduction
The word probability, in its ordinary meaning, refers to the chance that a certain event will take place. The events concerned are ones whose outcome no one can be certain about in advance: they occur without any fixed rule or order. Such random events usually arise from random experiments, such as tossing a coin or throwing a die. It follows that, in order to determine the probability of an event A, we must know the outcomes favourable to A and the outcomes not favourable to A. Together these make up the universal set of all possible outcomes under consideration. Accordingly, we define the probability of an event A as the fraction of all possible outcomes that are favourable to A.
Think of an experiment in which one tosses a coin. There are two possibilities, a head or a tail, forming a universal set of two elements. It would seem logical that the chance of a head turning up is fifty-fifty, i.e. ½. But why? This is of course based on the assumption that the coin is not biased, so there is no reason why one side should be favoured over the other. This approach is known as the classical theory of probability, named after the classical period during which it was first developed.
However, there are many cases where such assumptions are not valid, or where it is difficult to enumerate all the outcomes of the universal set. Such situations call for another approach to probability, known as empirical probability. The idea is that the probability of an event should not merely be a matter of one's subjective perception, but should be determined by empirical evidence from data observed over a considerable period of time and space. Think, for example, of how you would establish the chance that a first year student at SUA fails the examination, or the chance that a smoker dies of cancer. Both require empirical verification, as described above.
At this point, a note on the use of the word probability in statistical theory is in order. Saying, for instance, that the chance of a student passing an exam at SUA is 0.9 is not a statement that a particular student X sitting that exam will pass. Rather, it means that if the student were allowed to sit the exam ten times, he or she would be expected to pass about nine times (it would be no surprise if the actual number of passes were fewer than nine). The relevance of the probability of an event in real life should be understood in terms of repeated trials.
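The repeated-trials reading of probability can be illustrated with a simulation. This sketch assumes, hypothetically, that each sitting of the exam is an independent trial with pass probability 0.9; the seed is fixed only so that runs are reproducible, and the function name is illustrative.

```python
import random

def relative_frequency(p, trials, seed=0):
    """Fraction of successes in `trials` independent trials, each succeeding with chance p."""
    rng = random.Random(seed)  # private generator, seeded for reproducibility
    successes = sum(rng.random() < p for _ in range(trials))
    return successes / trials

# Hypothetical model: each sitting of the exam is passed with probability 0.9
for trials in (10, 100, 10_000, 100_000):
    print(trials, relative_frequency(0.9, trials))
```

As the number of trials grows, the relative frequency of passes settles near 0.9, while over only 10 trials the number of passes can easily differ from nine.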
For the mastery of this topic, the reader is advised, among other things, to make sure that the set theory and combinatorial mathematics outlined at the beginning are clearly understood.
14.2.Basic probability Concepts
14.2.1 Experiment
An experiment in probability theory is any well-defined action (trial) whose outcome is not known with certainty. Examples of trials include the toss of a fair coin, the throw of a die, or sitting for an examination.
14.2.2 Possibility space/set
In an experiment, the set of all possible outcomes is called the possibility set (or sample space), while each outcome is called a sample point. In throwing a die there are six possible outcomes, which form a sample space of six elements.
14.2.3 The probability of an event A
As explained in the introductory part, the probability of an event A is defined as the ratio of the number of elements in the event set to the number of elements in the possibility set. That is, if the possibility set S consists of equally likely outcomes, then the probability of an event A, written P(A), is defined as $P(A) = n(A)/n(S)$.
Consider an experiment of throwing an ordinary die, and the event A that the number occurring is 5 or 1.
The event set is $A = \{5, 1\}$ and the possibility set is $S = \{1, 2, 3, 4, 5, 6\}$, so $P(A) = n(A)/n(S) = 2/6 = 1/3$.
From this definition it follows that always $0 \le P(A) \le 1$, because A is a subset of the universal set. If P(A) = 0 the event cannot occur, i.e. it is an impossible event. If P(A) = 1 the event is a sure event, i.e. it must occur.
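The classical definition $P(A) = n(A)/n(S)$ can be checked by direct counting. The Python sketch below is illustrative only; it assumes all outcomes in S are equally likely, and `classical_probability` is a hypothetical helper name. Python's `fractions.Fraction` keeps the answers exact.

```python
from fractions import Fraction

# Illustrative sketch; classical_probability is a hypothetical helper name.
def classical_probability(event, sample_space):
    """P(A) = n(A)/n(S), assuming every outcome in sample_space is equally likely."""
    space = set(sample_space)
    # Only outcomes that belong to S can count towards n(A)
    return Fraction(len(set(event) & space), len(space))

S = {1, 2, 3, 4, 5, 6}     # throwing an ordinary die
A = {5, 1}                 # "the number occurring is 5 or 1"
print(classical_probability(A, S))        # 1/3
print(classical_probability(set(), S))    # 0  (impossible event)
print(classical_probability(S, S))        # 1  (sure event)
```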
14.2.4 Complementary Events
Let A denote the event A does not occur, then the following is true
P ( A ) =1-P (A) or P (A) +P ( A ) =1
Example
A bag contains 5 black and 3 white balls. A ball is drawn from the bag. Find the probability that (i) the ball drawn is black, (ii) it is not black.
Sol:
(i) P(black) = 5/8
(ii) P(not black) = 1 - P(black) = 1 - 5/8 = 3/8
14.2.5 Independent Events
Two events A and B are independent if the occurrence of one event does not affect the occurrence of the other. If two events A and B are independent, then $P(A \cap B) = P(A) \times P(B)$. This is called the multiplication law for two independent events.
Example.1
The chances that A and B will solve a question are ½ and 2/3 respectively. If they both attempt the question independently, find the chance that the question will not be solved.
Sol:
The question will not be solved if and only if both of them fail to solve it, so we need the joint occurrence of the events "A fails" and "B fails". Here P(A fails) = 1 - 1/2 = 1/2 and P(B fails) = 1 - 2/3 = 1/3. Since the two events are independent, the chance is
1/2 x 1/3 = 1/6.
Example 2
A die is thrown twice. Find the probability of obtaining a 4 on the first throw and an odd number on the second throw.
Sol:
We consider two events: event A, a 4 on the first throw, and event B, an odd number on the second throw.
P(A) = 1/6, and P(B) = 3/6, since there are three odd numbers among the six possible outcomes. Since A and B are independent, P(A and B) = P(A) x P(B) = 1/6 x 3/6 = 3/36 = 1/12.
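The multiplication law for independent events can be verified by listing all 36 equally likely outcomes of two throws. A minimal Python sketch, assuming a fair die; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

# Illustrative sketch: all 36 equally likely ordered outcomes of two throws.
outcomes = list(product(range(1, 7), repeat=2))

def prob(condition):
    """Proportion of the 36 outcomes satisfying `condition` (hypothetical helper)."""
    return Fraction(sum(1 for o in outcomes if condition(o)), len(outcomes))

p_a = prob(lambda o: o[0] == 4)                       # a 4 on the first throw
p_b = prob(lambda o: o[1] % 2 == 1)                   # odd on the second throw
p_ab = prob(lambda o: o[0] == 4 and o[1] % 2 == 1)    # both together

print(p_a, p_b, p_ab)       # 1/6 1/2 1/12
print(p_ab == p_a * p_b)    # True: the multiplication law holds
```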
14.2.6 Conditional Probability
Example
An urn contains 3 red balls and 5 white balls. A ball is drawn at random and its colour is noted; without replacing the first ball, a second ball is drawn and its colour is noted as well. Find the probability that
(a) both balls are white (the second is white and the first was also white);
(b) both balls are red.
Sol:
Let w = the event that a selected ball is white and r = the event that a selected ball is red, with subscripts 1 and 2 for the first and second draws. We shall write $P(w_2/w_1)$ to mean the probability that the second ball selected is white given that the first was white. Note that the notation does not imply division! After a white ball has been removed, 7 balls remain, of which 4 are white, so $P(w_2/w_1) = 4/7$.
(a) $P(w_1 \cap w_2) = P(w_1) \times P(w_2/w_1) = (5/8) \times (4/7) = 5/14$
(b) $P(r_1 \cap r_2) = P(r_1) \times P(r_2/r_1) = (3/8) \times (2/7) = 3/28$
In general, if two events A and B are not independent, then $P(A \cap B) = P(A) \times P(B/A)$ or $P(A \cap B) = P(B) \times P(A/B)$.
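The conditional probabilities in the urn example can be confirmed by enumerating every ordered pair of draws without replacement. A minimal Python sketch, assuming each remaining ball is equally likely to be drawn at each stage; `prob` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations

# Illustrative sketch: label each ball so balls of the same colour stay distinct.
balls = ["r"] * 3 + ["w"] * 5
# Every ordered pair (first draw, second draw) of distinct balls is equally likely.
draws = list(permutations(range(len(balls)), 2))     # 8 * 7 = 56 pairs

def prob(first, second):
    """P(first draw has colour `first` and second draw has colour `second`)."""
    hits = sum(1 for i, j in draws if balls[i] == first and balls[j] == second)
    return Fraction(hits, len(draws))

print(prob("w", "w"))    # 5/14
print(prob("r", "r"))    # 3/28
# Agrees with P(w1) x P(w2/w1) = (5/8)(4/7)
print(prob("w", "w") == Fraction(5, 8) * Fraction(4, 7))   # True
```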
14.2.7 Additive law of probability
If A and B are any two events of the same experiment, then
P(A or B) = P(A) + P(B) - P(A and B)
Symbolically, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Example 1
A and B are shooting at a target. The chance for A to hit the target is 1/3 and for B it is 2/3. Find the probability that the target will be hit. The target will be hit if either A or B hits it.
Sol:
Assuming that A and B shoot independently, P(A and B) = (1/3) x (2/3), so from the additive law,
P(A or B) = P(A) + P(B) - P(A and B) = 1/3 + 2/3 - (1/3)(2/3) = 7/9
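The additive law can be written as a one-line function. The Python sketch below assumes, as in the solution, that A and B shoot independently, and cross-checks the answer through the complement 1 - P(both miss); `prob_union` is a hypothetical name.

```python
from fractions import Fraction

# Illustrative sketch; prob_union is a hypothetical helper name.
def prob_union(p_a, p_b, p_a_and_b):
    """Additive law: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

p_a, p_b = Fraction(1, 3), Fraction(2, 3)
# Assuming independent shooters, P(A and B) = P(A) x P(B)
p_hit = prob_union(p_a, p_b, p_a * p_b)
print(p_hit)                          # 7/9
# Cross-check via the complement: the target is missed only if both miss
print(1 - (1 - p_a) * (1 - p_b))      # 7/9
```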
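Example 2
Given that P(A/B) = 2/5, P(B) = 1/4 and P(A) = 1/3, find P(B/A).
Sol:
From the multiplication law, $P(A \cap B) = P(A/B) \times P(B) = (2/5)(1/4) = 1/10$, so $P(B/A) = P(A \cap B)/P(A) = (1/10)/(1/3) = 3/10$.
The multiplication law of probability can be extended to more than two events.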
Exercise
Given three events A, B, C, establish the additive law for the three events.
14.2.8 Mutually Exclusive Events
If either event A or B can occur but not both, then the two events A and B are said to be mutually exclusive. In this particular case $P(A \cap B) = 0$, so for mutually exclusive events A and B, $P(A \text{ or } B) = P(A \cup B) = P(A) + P(B)$.
Example
A die is thrown. Find the probability of obtaining a number which is either less than 3 or greater than 4.
Sol
Let A = {1, 2} (less than 3) and B = {5, 6} (greater than 4). The two events are mutually exclusive, since no number can be less than 3 and at the same time greater than 4.
Accordingly, P(A or B) = P(A) + P(B) = 2/6 + 2/6 = 2/3
14.2.9 Exhaustive Events
If two events A and B are such that $A \cup B = S$, so that $P(A \cup B) = 1$, then A and B are said to be exhaustive.
Consider $S = \{1, 2, 3, 4, 5, 6\}$, $A = \{1, 2\}$ and $B = \{3, 4, 5, 6\}$. Then $A \cup B = \{1, 2, 3, 4, 5, 6\}$ and $P(A \cup B) = n(A \cup B)/n(S) = 6/6 = 1$. Therefore, A and B are exhaustive events.
14.3 Worked examples
Example 1
A box contains 12 balls, of which 4 are white, 3 blue and 5 red. 3 balls are drawn at random from the box. Find the chance that
(i) all three balls are of the same colour;
(ii) exactly two of the balls are of the same colour; and
(iii) all three balls are of different colours.
Sol:
There are $\binom{12}{3} = 220$ equally likely ways of choosing the 3 balls.
(i) The same colour means either all white, all blue or all red. These three events are mutually exclusive. Accordingly,
$$P(\text{same colour}) = \frac{\binom{4}{3} + \binom{3}{3} + \binom{5}{3}}{\binom{12}{3}} = \frac{4 + 1 + 10}{220} = \frac{3}{44}$$
(ii) Two of the same colour means two blue with one of the rest, or two white with one of the rest, or two red with one of the rest. Accordingly,
$$P(\text{two of one colour}) = \frac{\binom{3}{2}\binom{9}{1} + \binom{4}{2}\binom{8}{1} + \binom{5}{2}\binom{7}{1}}{\binom{12}{3}} = \frac{27 + 48 + 70}{220} = \frac{29}{44}$$
(iii) All three of different colours means one white, one blue and one red. Accordingly,
$$P(\text{all different}) = \frac{\binom{4}{1}\binom{3}{1}\binom{5}{1}}{\binom{12}{3}} = \frac{60}{220} = \frac{3}{11}$$
Example 2
There are two urns. The first contains 5 red balls and 3 white balls. The second contains 7 red balls and 3 white balls. An urn is selected at random and a ball is drawn from it. What is the probability that the ball drawn is red?
Sol:
The red ball can be drawn from either the 1st urn or the 2nd urn. The chance that a red ball is drawn from the 1st urn is ½ x 5/8, while the chance that a red ball is drawn from the 2nd urn is ½ x 7/10. Since the two events are mutually exclusive, the probability of drawing a red ball is
½ x 5/8 + ½ x 7/10 = 25/80 + 28/80 = 53/80
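The same total-probability calculation in a few lines of Python, as an illustrative sketch assuming each urn is chosen with probability ½ (the tuple layout is a hypothetical choice):

```python
from fractions import Fraction

# Illustrative sketch: (probability of choosing the urn, red balls, total balls)
urns = [(Fraction(1, 2), 5, 8), (Fraction(1, 2), 7, 10)]

# Total probability: sum over urns of P(urn) x P(red / urn)
p_red = sum(p_urn * Fraction(red, n) for p_urn, red, n in urns)
print(p_red)    # 53/80
```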
14.4 Bayesian Probability theory
Suppose event A can be caused by a set of mutually exclusive and exhaustive events $E_1, E_2, \ldots, E_n$. By conditional probability, $P(A \cap E_i) = P(A/E_i)\,P(E_i)$. Since event A can occur through any of $E_1, E_2, \ldots, E_n$, the probability that A occurs is
$$P(A) = P(A \cap E_1) + P(A \cap E_2) + \cdots + P(A \cap E_n) = \sum_{i=1}^{n} P(A \cap E_i) = \sum_{i=1}^{n} P(A/E_i)\,P(E_i).$$
This probability of event A is known as the total probability. Suppose event A has occurred or must occur. What would then be our interest? Most certainly, our interest would be to know which event caused it or may lead to its occurrence. For example, if A occurs, what is the chance that event $E_i$ caused it? We know that
$$P(E_i/A) = \frac{P(A \cap E_i)}{P(A)} = \frac{P(A/E_i)\,P(E_i)}{\sum_{j=1}^{n} P(A/E_j)\,P(E_j)}.$$
What we have deduced is known as Bayes' theorem, named after the English theologian and probabilist Thomas Bayes (1702-1761), who first derived it.
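Bayes' theorem translates directly into code: form the joint probabilities $P(A \cap E_i)$, add them to get the total probability, then divide. The Python sketch below is illustrative; `bayes` is a hypothetical helper name, and the figures are taken from the two examples that follow.

```python
from fractions import Fraction

def bayes(priors, likelihoods):
    """Illustrative helper (hypothetical name).

    priors[i]      = P(E_i) for mutually exclusive, exhaustive events E_1..E_n
    likelihoods[i] = P(A / E_i)
    Returns the total probability P(A) and the list of P(E_i / A).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(A and E_i)
    total = sum(joint)                                      # total probability P(A)
    return total, [j / total for j in joint]

# Machines M1, M2, M3, each used with probability 1/3; A = "defective"
p_d, post_m = bayes([Fraction(1, 3)] * 3,
                    [Fraction(3, 10), Fraction(2, 10), Fraction(5, 10)])
print(p_d, post_m[1])        # 1/3 1/5

# Urns U1, U2, each chosen with probability 1/2; A = "green ball drawn"
p_g, post_u = bayes([Fraction(1, 2)] * 2, [Fraction(2, 10), Fraction(3, 7)])
print(p_g, post_u[1])        # 11/35 15/22
```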
Example 1
A product in a certain plant can be manufactured by any of three different machines, M1, M2 and M3. The chances for the machines to manufacture a defective item are respectively 0.3, 0.2 and 0.5. Assume that the chance for a product to be manufactured by any one of the machines is 1/3. Find
(a) the chance that a manufactured product is defective;
(b) the chance that a defective product was manufactured by machine M2.
Sol:
We have P(D/M1) = 0.3, P(D/M2) = 0.2 and P(D/M3) = 0.5, with P(M1) = P(M2) = P(M3) = 1/3.
(a) The chance of a defective product is
$$P(D) = \sum_{i=1}^{3} P(D \cap M_i) = 0.3 \times \tfrac{1}{3} + 0.2 \times \tfrac{1}{3} + 0.5 \times \tfrac{1}{3} = \tfrac{1}{3} \approx 0.333$$
(b) $P(M_2/D) = P(D \cap M_2)/P(D) = P(D/M_2) \times P(M_2)/P(D) = (0.2 \times \tfrac{1}{3})/\tfrac{1}{3} = 0.2$
Example 2
There are two urns. The 1st contains 8 red and 2 green balls. The 2nd contains 4 red and 3 green balls. A ball is drawn at random from one of the urns and found to be green. What is the chance that the ball was drawn from the 2nd urn?
Sol
Let
U1 stand for the 1st urn,
U2 stand for the 2nd urn,
G stand for the event that a green ball is selected.
Then P(U1) = 1/2, P(U2) = 1/2, P(G/U1) = 2/10 and P(G/U2) = 3/7.
We want to find $P(U_2/G) = P(U_2 \cap G)/P(G)$. But a green ball can be selected from either the 1st urn or the 2nd urn. Accordingly,
$$P(G) = P(G \cap U_1) + P(G \cap U_2) = \sum_{i=1}^{2} P(G/U_i)\,P(U_i) = 0.5 \times \tfrac{2}{10} + 0.5 \times \tfrac{3}{7} = \tfrac{11}{35} \approx 0.31$$
Therefore
$$P(U_2/G) = \frac{P(G/U_2) \times P(U_2)}{P(G)} = \frac{(3/7) \times 0.5}{11/35} = \frac{15}{22} \approx 0.68$$
Exercise 14
1. An ordinary die is thrown. Find the probability that the number obtained
(a) is a multiple of 3
(b) is less than 7
(c) is greater than 10
2. A sample of 28 rats was treated with a certain dose of a drug, with 5 different effects anticipated. The number of effects observed and the number of rats showing them were recorded as follows:
Number of effects   0    1    2    3    4    5
Number of rats      4   12    8    3    2    1
If a rat is chosen at random, find the probability that there are:
(a) 3 effects on the rat
(b) 2 or 5 effects
(c) at least one effect
3. For the events A and B it is known that P(A) = 2/3, $P(A \cap B) = 5/12$ and $P(A \cup B) = 3/4$. Find P(B).
4. A and B are two events such that P(A) = 8/15, P(B) = 2/3 and $P(A \cap B) = 1/5$. Are A and B exhaustive events?
5. A and B are exhaustive events and it is known that P(A/B) = 1/4 and P(B) = 2/3. Find P(A).
6. Write short notes on the following terms:
(a) Probability
(b) Sample space
(c) Sample point
(d) Mutually exclusive events
(e) Independent events
7. The probability that an animal treated with a certain chemical will die is 0.2. Find the probability that
(a) two treated animals will die;
(b) of two treated animals, one will die and the other will survive.
8. The probability that a customer will visit a pharmacy in a day is 0.025. Find the probability that on two consecutive days at least one customer will visit the pharmacy.
9. A coin is tossed and a die is thrown. What is the probability of obtaining a head on the coin and an even number on the die?
10. Suppose that in general 20% of the patients affected with a specific disease die. In a random sample of 3 such patients, what is the probability of 2 deaths?
11. A die is thrown 3 times. What is the probability that
(a) all throws show 6;
(b) all throws are alike?
12. In a certain experiment, known as a binomial experiment, a coin was tossed 3 times. Find the probability of having
(i) 0 heads
(ii) 1 head
(iii) 2 heads
(iv) 3 heads
Find the sum of the probabilities found above and comment on the result obtained.
13. Of 150 patients examined at a clinic, it was found that 90 had heart trouble, 50 had diabetes, and 20 had both diseases. What percentage of the patients had either heart trouble or diabetes?
14. If 60 percent of American males aged 20 and 65 percent of American females aged 20 live to be 70, what is the probability that an American couple married when they were 20 years old will live to celebrate their golden wedding?
15. If two guinea pigs, one of pure black race and the other of pure white race, are mated, the probabilities that each offspring of the second generation is pure white, pure black or of mixed colour are respectively ¼, ¼ and ½. What is the probability that 3 such offspring would possess different colours?
16. From a sample of 6 orange trees and 8 lemon trees, 5 trees are chosen at random for experimentation. Find the probability that
(a) 2 orange trees and 3 lemon trees are chosen;
(b) more orange trees than lemon trees are chosen.
17. A balanced diet is said to have a combination of certain amounts of proteins, carbohydrates and vegetables. A student was asked to prepare different types of meals from a set of 5 different types of carbohydrates, 6 of proteins and 2 of vegetables. How many different types of meals can the student prepare if she uses:
(i) 2 types of carbohydrates, 3 of proteins and 1 of vegetables;
(ii) 1 type of carbohydrates, 2 of proteins and 2 of vegetables?
18. In how many different ways can the letters of the word STRATIFIED be arranged?
19. A box contains 5 white and 2 black balls. A second box, identical with the first, contains 3 white and 5 black balls. One box is chosen and a ball withdrawn from it. What is the probability that the ball drawn is white?
20. Find the number of permutations of the letters of the word GRAMMAR.
21. Find the number of ways in which 3 animals for an experiment can be chosen from eight different animals.
22. Find the number of ways in which six boys can be divided into two teams of three.
23. In a sample of 24 animals, 7 are black. If two animals are chosen at random from the sample, find the probability that
(i) they are both black;
(ii) neither is black.
(iii) If 3 animals are chosen at random, find the probability that more than 1 will be black.
24. In an experiment to determine the effects of a certain fertilizer on crops, two plots of land containing fruit and wood trees were considered. Plot 1 contains 40 fruit trees and 10 wood trees. Plot 2 contains 20 fruit trees and 70 wood trees. An unbiased coin is tossed. If a head turns up, a tree is selected from plot 1, while if a tail turns up a tree is selected from plot 2. Calculate the probability that a fruit tree is selected for the experiment in (a) one trial. (b) Given that a fruit tree is selected for experimentation, calculate the probability that a head was obtained when the coin was tossed.
25. In a certain experiment, 5 seeds of normal type were planted. Assuming other factors remain equal, find the chance that (i) none will germinate, (ii) only two will germinate, (iii) at least one will germinate.
26. If, on average, 1 ship in every 10 is sunk, find the chance that out of 5 ships expected, at least 4 will arrive safely.
27. A six-faced die is so biased that it is twice as likely to show an even number as an odd number when thrown. It is thrown twice. What is the probability that the sum of the two numbers thrown is even?
28. The chances of winning three of five games and four of five games are equal. What is the chance of winning all five games?
29. When a soldier fires at a target, the probability that he hits it is 1/3 for soldier A, 1/6 for soldier B, 1/6 for soldier C and 1/12 for soldier D. If all four soldiers A, B, C and D fire at the target simultaneously, calculate the probability that the target is hit by one or more of them.
30. In a bolt factory, machines A, B and C manufacture respectively 25%, 35% and 40% of the total production. Of their output, 5%, 4% and 2% respectively are defective bolts. A bolt is drawn at random from the production and found to be defective. What are the probabilities that it was manufactured by machines A, B and C?
31. An urn contains 4 white and 5 black balls; a second urn contains 5 white and 4 black balls. One ball is transferred from the first to the second urn, then a ball is drawn from the second urn. What is the probability that it is white?
32. A and B toss a coin alternately, on the understanding that the first to obtain a head wins. A begins. Find their respective chances of winning.
33. What is the probability of picking either a red piece or a white piece from a container which contains 15 red, 5 white and 13 green pieces?
34. If P(A) = 1/2, P(B) = 1/3 and P(C) = 2/3, find P(A or B or C).
35. A card is picked at random from cards numbered 1, 2, 3, …, 2000. Find the chance that the picked card is divisible by either 3 or 7.
36. There are 4 female students, 4 male students and 4 lecturers available for interview. Three persons are chosen at random for interview. Find the probability that all three categories of people are selected, given that (i) sampling is done with replacement, (ii) sampling is done without replacement.
37. Three groups of plants contain respectively 3 lemon trees and 1 orange tree; 2 lemon trees and 2 orange trees; and 1 orange tree and 3 lemon trees. One tree is selected at random from each group. Find the probability that the three selected trees consist of 1 lemon tree and 2 orange trees.
38. A bag contains 50 tickets numbered 1, 2, 3, …, 50, of which 5 are drawn at random and arranged in ascending order of their numbers:
$X_1 < X_2 < X_3 < X_4 < X_5$
What is the probability that $X_3 = 30$?
CHAPTER 15: INTRODUCTION TO PROBABILITY DISTRIBUTIONS
15.1 Discrete random variables and discrete probability distributions
A variable whose value is subject to chance is known as a random variable. In tossing a coin n times, the variable X denoting the number of heads that appear is a random variable whose possible values are 0, 1, 2, 3, …, n. For a variable to be random, it must be determined by a random experiment. Basically, a random variable gives a numerical description of a certain outcome of interest in a given experiment. Our interest in this discussion is focused on all possible values of a variable and not just some of them: a random variable describes numerically an exhaustive set of sample points for the event of interest in a given experiment. In conclusion, a discrete variable X assuming the values $x_1, x_2, \ldots, x_n$ with associated probabilities $p_1, p_2, \ldots, p_n$ will be a random variable if and only if
$$\sum_{i=1}^{n} p_i = 1.$$
Example 1
Let X be the variable "the number of fours when two dice are thrown". Show that X is a discrete random variable.
Sol:
When two dice are thrown, the possible number of fours is 0, 1 or 2. Therefore X can take the values 0, 1 and 2, meaning that X is a discrete variable.
P(X=0) = (5/6) x (5/6) = 25/36
P(X=1) = (5/6) x (1/6) + (1/6) x (5/6) = 10/36
P(X=2) = (1/6) x (1/6) = 1/36
Now $\sum_i p(x_i) = 25/36 + 10/36 + 1/36 = 1$.
Therefore X is a discrete random variable.
Example 2
If we toss a fair coin twice, the number of heads obtained is 0, 1 or 2.
P(X=0) = (1/2) x (1/2) = 1/4
P(X=1) = (1/2) x (1/2) + (1/2) x (1/2) = 1/2 (a head then a tail, or a tail then a head)
P(X=2) = (1/2) x (1/2) = 1/4
The value being considered is the number of heads. It can only take the values 0, 1 and 2, and so it is called a discrete variable.
Again, P(X=0) + P(X=1) + P(X=2) = 1/4 + 1/2 + 1/4 = 1. So X is a discrete random variable.
In the examples given, we had the set of values to be assumed by a specified random variable together with their corresponding probabilities. Such a distribution is called a probability distribution (recall the frequency distribution). We can present such a distribution in tabular form:
Table 15.1
X          0     1     2
P(X=x)    1/4   1/2   1/4
Sometimes the probability distribution of X can be expressed as a function of x in the form of a formula. For the table above, one could present the relationship in the following way:
$$p(x) = \begin{cases} 1/4, & \text{if } x = 0, 2 \\ 1/2, & \text{if } x = 1 \end{cases}$$
Such a function, providing the probabilities of X at its various values, is known as the probability density function (p.d.f.) or probability mass function (p.m.f.).
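The probability distribution of Example 1 can be built by counting outcomes. A minimal Python sketch, assuming two fair dice; storing the p.m.f. as a dictionary {x: P(X = x)} is an illustrative choice, not a convention of these notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Illustrative sketch: X = number of fours when two fair dice are thrown.
counts = Counter(sum(1 for d in pair if d == 4)
                 for pair in product(range(1, 7), repeat=2))
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

print(pmf)                 # {0: Fraction(25, 36), 1: Fraction(5, 18), 2: Fraction(1, 36)}
print(sum(pmf.values()))   # 1, so X is a discrete random variable
```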
Example
The p.d.f. of a discrete random variable Y is given by $P(Y=y) = cy^2$ for y = 0, 1, 2, 3, 4. Given that c is a constant, find the value of c.
Sol:
From the definition of a random variable, $\sum p_i = \sum_y cy^2$ must be equal to unity. So
$$\sum_{y=0}^{4} P(Y=y) = c\,(0^2 + 1^2 + 2^2 + 3^2 + 4^2) = 30c = 1,$$
and thus c = 1/30.
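The normalising constant can be found the same way in code. A short illustrative Python sketch (variable names are hypothetical):

```python
from fractions import Fraction

# Illustrative sketch: P(Y = y) = c * y^2 for y = 0, 1, 2, 3, 4
support = range(5)
c = Fraction(1, sum(y * y for y in support))   # probabilities must sum to 1
pmf = {y: c * y * y for y in support}
print(c)                    # 1/30
print(sum(pmf.values()))    # 1
```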
15.2 Expectation, E(x)
15.2.1 Introduction
The expectation E(X) of a random variable X is simply the mean of the probability distribution of X. It is the average value of X to be expected over many repetitions of a given random experiment. Thus
$$E(X) = \frac{\sum_{i=1}^{n} x_i p_i}{\sum_{i=1}^{n} p_i} = \sum_{i=1}^{n} x_i p_i,$$
since $\sum_{i=1}^{n} p_i$ is always equal to 1.
Example 1
A random variable X has the probability distribution shown in the table below. Find E(X).
Table 15.2
X          1     2     3
P(X=x)    0.3   0.2   0.5
Sol:
E(X) = 1 x 0.3 + 2 x 0.2 + 3 x 0.5 = 2.2
Example 2
In a certain gambling game, a die is thrown and you bet on a number to appear. If your guess is correct you are awarded 100/=; otherwise you pay 100/=. If Mr A is to play only once, what is his expected gain?
Sol:
We need to find the probability distribution of the random variable concerned. The gain X can assume only two values, either -100 or 100, depending on the number that shows on the die.
Table 15.3
X         -100    100
P(X=x)     5/6    1/6
E(X) = -100 x 5/6 + 100 x 1/6 = -400/6 ≈ -67/=
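The expectation formula is a weighted sum. The Python sketch below applies it to Table 15.2 and to the gambling game; `expectation` is a hypothetical helper that also checks that the probabilities sum to 1.

```python
from fractions import Fraction

# Illustrative sketch; expectation is a hypothetical helper name.
def expectation(pmf):
    """E(X) = sum of x * P(X = x) for a finite distribution {x: P(X = x)}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

table_15_2 = {1: Fraction(3, 10), 2: Fraction(2, 10), 3: Fraction(5, 10)}
gamble = {-100: Fraction(5, 6), 100: Fraction(1, 6)}

print(expectation(table_15_2))       # 11/5, i.e. 2.2
print(float(expectation(gamble)))    # about -66.67 (rounded to -67/= in the text)
```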
15.2.2 The Expectation of any Function f(x)
Sometimes we are interested in the expectation of a function of X. For example, one might be interested in the expected value of a linear function of X, a quadratic function of X, and so on. In general, if g(x) is any function of the discrete random variable X, then
$$E(g(X)) = \sum_i g(x_i)\,P(x_i).$$
Theorems
1. E(a) = a, where a is a constant
2. E(aX) = aE(X), where a is a constant
3. E(aX + b) = aE(X) + b, where a and b are any constants
4. E[f(X) + g(X)] = E f(X) + E g(X)
The student is advised to prove the above identities. Their proofs should be straightforward, based on the properties of sigma notation outlined earlier.
Example 3
The probability distribution of X is shown in the table below.
Table 15.4
X          0     1     4
P(X=x)    0.4   0.4   0.2
Find the following:
(a) E(X) (b) E(2X) (c) E(7X+1) (d) E(X²) (e) E(X + 5X²)
Sol:
(a) E(X) = 0 x 0.4 + 1 x 0.4 + 4 x 0.2 = 1.2
(b) E(2X) = 2 x E(X) = 2 x 1.2 = 2.4
(c) E(7X+1) = 7E(X) + 1 = 7 x 1.2 + 1 = 9.4
(d) $E(X^2) = 0^2 \times 0.4 + 1^2 \times 0.4 + 4^2 \times 0.2 = 3.6$
(e) $E(X + 5X^2) = E(X) + E(5X^2) = E(X) + 5E(X^2) = 1.2 + 5 \times 3.6 = 19.2$
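The results of Example 3 can be checked with a generic E(g(X)). This Python sketch takes P(X = 0) = 0.4, the value used in the solution, so that the probabilities sum to 1; `expect` is a hypothetical helper name.

```python
from fractions import Fraction

# Illustrative sketch: Table 15.4 with P(X = 0) = 0.4
pmf = {0: Fraction(4, 10), 1: Fraction(4, 10), 4: Fraction(2, 10)}

def expect(g, pmf):
    """E(g(X)) = sum of g(x) * P(X = x) (hypothetical helper)."""
    return sum(g(x) * p for x, p in pmf.items())

print(expect(lambda x: x, pmf))                # 6/5   (a) 1.2
print(expect(lambda x: 2 * x, pmf))            # 12/5  (b) 2.4
print(expect(lambda x: 7 * x + 1, pmf))        # 47/5  (c) 9.4
print(expect(lambda x: x ** 2, pmf))           # 18/5  (d) 3.6
print(expect(lambda x: x + 5 * x ** 2, pmf))   # 96/5  (e) 19.2
```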
15.2.3 Variance, V(X)
Recall that in a frequency distribution
$$V(X) = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i} = \frac{\sum f_i x_i^2}{\sum f_i} - \bar{x}^2.$$
But if X is a discrete random variable with E(X) = μ, then
$$V(X) = \frac{\sum p_i (x_i - \mu)^2}{\sum p_i} = \frac{\sum x_i^2 p_i}{\sum p_i} - \mu^2 = \sum x_i^2 p_i - \mu^2 = E(X^2) - \mu^2.$$
For example, the variance of the distribution given in the preceding section (Table 15.4) is $V(X) = 3.6 - 1.2^2 = 2.16$.
Theorem
1. V(aX) = a² Var(X)
2. Var(aX + b) = a² Var(X)
The student is advised to verify the above identities.
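The shortcut V(X) = E(X²) - μ² and the identity Var(aX + b) = a²Var(X) can be checked numerically. A minimal Python sketch on the distribution of Table 15.4 (taking P(X = 0) = 0.4 so that the probabilities sum to 1); the helper names are hypothetical. A check for one choice of a and b illustrates the identity but does not prove it.

```python
from fractions import Fraction

# Illustrative sketch; helper names are hypothetical.
# Table 15.4 with P(X = 0) = 0.4 so that the probabilities sum to 1
pmf = {0: Fraction(4, 10), 1: Fraction(4, 10), 4: Fraction(2, 10)}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V(X) = E(X^2) - mu^2."""
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2

def linear_transform(pmf, a, b):
    """Distribution of aX + b (a != 0, so distinct x give distinct values)."""
    return {a * x + b: p for x, p in pmf.items()}

print(variance(pmf))                                               # 54/25, i.e. 2.16
print(variance(linear_transform(pmf, 3, 5)) == 9 * variance(pmf))  # True
```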
Exercise
A committee of six students is chosen at random from a class of 10 females and 8 males. Find the expected number of females in the committee. Find also the standard deviation of the number of female students.
15.2.4 Moments
Just as we have moments in frequency distributions, we also have moments in probability distributions: the frequencies are replaced by probabilities in the original formulae. The rth moment about the mean of a probability distribution is given by
$$\mu_r = \sum_i (x_i - \mu)^r P(x_i) = E(X - \mu)^r,$$
whereas the rth moment about a point a is given by
$$\mu_r' = \sum_i (x_i - a)^r P(x_i) = E(X - a)^r.$$
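Both kinds of moment come from one function, since the moments about the mean are simply the moments about the point a = μ. An illustrative Python sketch using the Table 15.4 distribution (taking P(X = 0) = 0.4); `moment_about` is a hypothetical name.

```python
from fractions import Fraction

# Illustrative sketch; Table 15.4 with P(X = 0) = 0.4
pmf = {0: Fraction(4, 10), 1: Fraction(4, 10), 4: Fraction(2, 10)}

def moment_about(pmf, r, a):
    """r-th moment about the point a: sum of (x - a)^r * P(X = x) = E(X - a)^r."""
    return sum((x - a) ** r * p for x, p in pmf.items())

mu = moment_about(pmf, 1, 0)        # first moment about the origin = the mean
print(mu)                           # 6/5
print(moment_about(pmf, 2, mu))     # 54/25, the variance (2.16)
print(moment_about(pmf, 3, mu))     # 462/125, i.e. 3.696
```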
15.3 Bivariate probability distributions
It sometimes happens that our interest is focused on events involving the joint occurrence of two or more variables. These variables may be independent or dependent, and this has implications for the statistics involving them. If we consider the joint occurrence of two variables X and Y, the following statements are always valid:
1. $E(X \pm Y) = E(X) \pm E(Y)$
2. $E(XY) = E(X) \times E(Y)$ if X and Y are independent
3. $Var(X \pm Y) = V(X) + Var(Y) \pm 2\,Cov(X, Y)$
4. $Var(X \pm Y) = V(X) + Var(Y)$ if X and Y are independent
Proof
Let $p_{ij} = P(X = x_i, Y = y_j)$, with marginal probabilities $p_i = \sum_j p_{ij}$ and $p_j = \sum_i p_{ij}$.
1.
$$E(X \pm Y) = \sum_{i=1}^{n}\sum_{j=1}^{k} (x_i \pm y_j)\,p_{ij} = \sum_{i=1}^{n} x_i \sum_{j=1}^{k} p_{ij} \pm \sum_{j=1}^{k} y_j \sum_{i=1}^{n} p_{ij} = \sum_{i=1}^{n} x_i p_i \pm \sum_{j=1}^{k} y_j p_j = E(X) \pm E(Y)$$
2. If X and Y are independent, then the joint p.d.f. of X and Y, P(X=x, Y=y), may be written as $P(X=x) \times P(Y=y) = p_i p_j$. So
$$E(XY) = \sum_{i=1}^{n}\sum_{j=1}^{k} x_i y_j\, p_i p_j = \sum_{i=1}^{n} x_i p_i \sum_{j=1}^{k} y_j p_j = E(X)\,E(Y).$$
3. Having proved the 1st and the 2nd, the 3rd becomes straightforward:
$$Var(X \pm Y) = E[(X \pm Y)^2] - [E(X \pm Y)]^2 = E(X^2) + E(Y^2) \pm 2E(XY) - E^2(X) - E^2(Y) \mp 2E(X)E(Y).$$
Rearranging the terms accordingly, we get $Var(X \pm Y) = V(X) + Var(Y) \pm 2\,Cov(X, Y)$, where
$$Cov(X, Y) = E(XY) - E(X)E(Y).$$
4. If the two variables are independent, then from the 2nd, Cov(X, Y) = 0 and hence $Var(X \pm Y) = V(X) + Var(Y)$.
Example 1
Suppose two coins are tossed once. Let X be the random variable denoting the number of heads on the 1st coin and let Y be the random variable denoting the number of heads on the 2nd coin. Obtain the probability distributions of X and Y, and
(a) find E(X), Var(X), E(Y) and Var(Y);
(b) obtain the probability distributions of the random variables X+Y, X-Y and XY;
(c) find E(X+Y), Var(X+Y), E(X-Y), Var(X-Y) and E(XY), and compare the results with those of part (a).
Sol.
The probability distributions are as follows:
Table 15.5: For X
X       0     1
P(X)   0.5   0.5
Table 15.5: For Y
Y       0     1
P(Y)   0.5   0.5
(a) E(X) = 0 x 0.5 + 1 x 0.5 = 0.5
Var(X) = E(X²) - μ² = (0² x 0.5 + 1² x 0.5) - 0.5² = 0.25
By symmetry, E(Y) = 0.5 and V(Y) = 0.25.
(b) Let V = X+Y, U = X-Y and R = XY.
We shall consider all the possible pairs of x and y in order to determine the possible values assumed by U, V and R. Accordingly, we have the following probability distributions for U, V and R:
Table 15.6: For V
V     P(V=v)
0     1/2 x 1/2 = 0.25
1     1/2 x 1/2 + 1/2 x 1/2 = 0.5
2     1/2 x 1/2 = 0.25
Table 15.7: For U
U     P(U=u)
-1    1/2 x 1/2 = 0.25
0     1/2 x 1/2 + 1/2 x 1/2 = 0.5
1     1/2 x 1/2 = 0.25
Table 15.8: For R
R     P(R=r)
0     1/2 x 1/2 + 1/2 x 1/2 + 1/2 x 1/2 = 0.75
1     1/2 x 1/2 = 0.25
(c) Comparison
(i) For X+Y
E(X+Y) = E(V) = 0 x 0.25 + 1 x 0.5 + 2 x 0.25 = 1 = 0.5 + 0.5 = E(X) + E(Y)
Var(X+Y) = Var(V) = (0 x 0.25 + 1 x 0.5 + 4 x 0.25) - 1² = 0.5 = 0.25 + 0.25 = V(X) + V(Y), since X and Y are independent.
(ii) For X-Y
E(X-Y) = E(U) = -1 x 0.25 + 0 x 0.5 + 1 x 0.25 = 0 = 0.5 - 0.5 = E(X) - E(Y)
Var(X-Y) = Var(U) = ((-1)² x 0.25 + 0² x 0.5 + 1² x 0.25) - 0² = 0.5 = 0.25 + 0.25 = V(X) + V(Y), since X and Y are independent.
(iii) For XY
E(XY) = E(R) = 0 x 0.75 + 1 x 0.25 = 0.25 = 0.5 x 0.5 = E(X) x E(Y), since X and Y are independent.
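Example 1 can be reproduced by working directly with the joint distribution of (X, Y). A minimal Python sketch, assuming two fair coins tossed independently, so that each of the four pairs has probability ¼; `E`, `Var`, `X` and `Y` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import product

# Illustrative sketch; helper names are hypothetical.
# Joint distribution of (X, Y) for two independent fair coins (1 = head, 0 = tail)
joint = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

def E(g):
    """E(g(X, Y)) = sum of g(x, y) * P(X = x, Y = y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def Var(g):
    """Var(g(X, Y)) = E(g^2) - [E(g)]^2."""
    return E(lambda x, y: g(x, y) ** 2) - E(g) ** 2

def X(x, y):
    return x

def Y(x, y):
    return y

print(E(lambda x, y: x + y), Var(lambda x, y: x + y))   # 1 1/2
print(E(lambda x, y: x - y), Var(lambda x, y: x - y))   # 0 1/2
print(E(lambda x, y: x * y), E(X) * E(Y))               # 1/4 1/4
print(Var(X) + Var(Y))                                  # 1/2
```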
Example 2
The p.d.f. of a joint distribution of X and Y is given as
$$f(x, y) = \begin{cases} \dfrac{0.2y^2 + x}{46}, & \text{for } x = 1, 2, 4 \text{ and } y = 1, 2, 3, 4 \\ 0, & \text{otherwise} \end{cases}$$
(The constant 46 makes the probabilities sum to one, since $\sum_{\text{all } y}\sum_{\text{all } x}(0.2y^2 + x) = 0.2(30)(3) + (7)(4) = 46$.)
Find (i) E(X), (ii) E(Y), (iii) E(XY), and verify whether X and Y are independent or not.
Sol:
Note that over y = 1, 2, 3, 4 we have $\sum y = 10$, $\sum y^2 = 30$ and $\sum y^3 = 100$, and over x = 1, 2, 4 we have $\sum x = 7$ and $\sum x^2 = 21$.
(i)
$$E(X) = \sum_{\text{all } y}\sum_{\text{all } x} x\,\frac{0.2y^2 + x}{46} = \frac{1}{46}\Big[0.2\sum_{\text{all } y} y^2 \sum_{\text{all } x} x + \sum_{\text{all } y} 1 \sum_{\text{all } x} x^2\Big] = \frac{1}{46}\big[0.2(30)(7) + (4)(21)\big] = \frac{126}{46} \approx 2.74$$
(ii)
$$E(Y) = \sum_{\text{all } y}\sum_{\text{all } x} y\,\frac{0.2y^2 + x}{46} = \frac{1}{46}\Big[0.2\sum_{\text{all } y} y^3 \sum_{\text{all } x} 1 + \sum_{\text{all } y} y \sum_{\text{all } x} x\Big] = \frac{1}{46}\big[0.2(100)(3) + (10)(7)\big] = \frac{130}{46} \approx 2.83$$
(iii)
$$E(XY) = \sum_{\text{all } y}\sum_{\text{all } x} xy\,\frac{0.2y^2 + x}{46} = \frac{1}{46}\Big[0.2\sum_{\text{all } y} y^3 \sum_{\text{all } x} x + \sum_{\text{all } y} y \sum_{\text{all } x} x^2\Big] = \frac{1}{46}\big[0.2(100)(7) + (10)(21)\big] = \frac{350}{46} \approx 7.61$$
Since $E(X)E(Y) = \frac{126}{46} \times \frac{130}{46} \approx 7.74 \neq 7.61 = E(XY)$, X and Y are not independent.
15.4 Marginal Probability Density Function
The joint probability distribution function P(x, y) describes the joint occurrence of the two variables x and y. We can, however, derive the probability distribution of one variable from the joint distribution by summing over the other. If we sum P(x, y) over all y (keeping x fixed), i.e. $\sum_{\text{all } y} P(x, y)$, we obtain simply P(x), and vice versa. Such probability distributions for a single variable, derived from the joint distribution of the two, are known as marginal density functions. Consider the joint p.d.f. of Example 2 in the preceding section, where we had
$$f(x, y) = \frac{0.2y^2 + x}{46} \quad \text{for } x = 1, 2, 4 \text{ and } y = 1, 2, 3, 4,$$
and 0 otherwise. The marginal densities for x and y can be found as follows:
$$P(x) = \frac{1}{46}\sum_{\text{all } y}(0.2y^2 + x) = \frac{0.2}{46}\sum_{\text{all } y} y^2 + \frac{x}{46}\sum_{\text{all } y} 1 = \frac{0.2(30) + 4x}{46} = \frac{4x + 6}{46} \quad \text{for } x = 1, 2, 4$$
$$P(y) = \frac{1}{46}\sum_{\text{all } x}(0.2y^2 + x) = \frac{0.2y^2}{46}\sum_{\text{all } x} 1 + \frac{1}{46}\sum_{\text{all } x} x = \frac{0.6y^2 + 7}{46} \quad \text{for } y = 1, 2, 3, 4$$
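Example 2 and its marginals can be checked by exact enumeration over the 12 support points. This Python sketch is illustrative: it normalises the weights 0.2y² + x by their own total, so it does not rely on any particular constant in the printed formula; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Illustrative sketch; helper names are hypothetical.
xs, ys = (1, 2, 4), (1, 2, 3, 4)
# Unnormalised weights 0.2*y^2 + x on the 12 support points (exact arithmetic)
w = {(x, y): Fraction(1, 5) * y * y + x for x, y in product(xs, ys)}
total = sum(w.values())                      # normalising constant (46)
f = {k: v / total for k, v in w.items()}     # joint p.m.f.; sums to 1

def E(g):
    """E(g(X, Y)) under the joint p.m.f. f."""
    return sum(g(x, y) * p for (x, y), p in f.items())

ex, ey, exy = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)
print(total, ex, ey, exy)    # 46 63/23 65/23 175/23
print(exy == ex * ey)        # False: X and Y are not independent

# Marginal p.m.f.s: sum the joint p.m.f. over the other variable
px = {x: sum(f[(x, y)] for y in ys) for x in xs}    # (4x + 6)/46
py = {y: sum(f[(x, y)] for x in xs) for y in ys}    # (0.6y^2 + 7)/46
print(px[1], py[1])          # 5/23 19/115
```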
15.5 Continuous random variable and continuous probability distributions
15.5.1 Introduction
A continuous random variable is a theoretical representation of a continuous variable such as height, mass or time. The probability density function of a continuous random variable X is often denoted by f(x), where $f(x) \ge 0$ throughout the range of values for which X is valid. For a discrete random variable the probabilities sum to one; likewise, for a continuous random variable the total area under f(x) is one. However, a continuous random variable cannot be assigned a positive probability at an exact value such as 2, 5 or 3, so we consider intervals of x values rather than single values. In fact, P(X = a) = 0 for any exact value a.
Example
X is the random variable "the delay in hours of a flight from an airport", where f(x) = 0.2 - 0.02x for $0 \le x \le 10$. Find the probability that (a) the delay will be less than 4 hours, (b) the delay will be between 2 and 6 hours.
[Figure: sketch of f(x) = 0.2 - 0.02x, a straight line falling from 0.2 at x = 0 to 0 at x = 10]
As you can see, the values of x, which is the number of hours, range from 0 to 10, and there are infinitely many values between 0 and 10 inclusive. Any interval selected from the given range of x values may be thought of as having a corresponding portion of the total area under the curve of f(x). In other words, considering P(0 < X < a) is much like considering the area under the curve between x = 0 and x = a, out of the total area under the curve from x = 0 to x = 10. Therefore
$$P(0 < X < a) = \frac{\int_0^a f(x)\,dx}{\int_0^{10} f(x)\,dx} = \int_0^a f(x)\,dx.$$
Note that the whole area under the curve then represents the total of all the probabilities, which is 1.
(a) $P(0 < X < 4) = \int_0^4 (0.2 - 0.02x)\,dx = \big[0.2x - 0.01x^2\big]_0^4 = 0.64$
(b) $P(2 < X < 6) = \int_2^6 (0.2 - 0.02x)\,dx = \big[0.2x - 0.01x^2\big]_2^6 = 0.84 - 0.36 = 0.48$
All the statistics discussed for discrete random variables also apply to continuous probability distributions. It is important to realise that this time we are dealing with infinitely many values of X within a given range. Wherever there is a sum in the discrete case, there is a sum over infinitely many values in the continuous case, which in turn leads us to evaluate a definite integral over the specified limits.
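The interval probabilities of the flight-delay example can be approximated numerically. This Python sketch uses the midpoint rule (a swapped-in numerical method, not the exact integration used above); for a straight-line density such as 0.2 - 0.02x it agrees with the exact integral up to rounding. `integrate` is a hypothetical helper name.

```python
# Illustrative sketch; integrate is a hypothetical helper name.
def integrate(f, a, b, n=10_000):
    """Approximate the definite integral of f from a to b by the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """p.d.f. of the flight delay X in hours; zero outside 0 <= x <= 10."""
    return 0.2 - 0.02 * x if 0 <= x <= 10 else 0.0

print(round(integrate(f, 0, 10), 4))   # 1.0  (total area under the curve)
print(round(integrate(f, 0, 4), 4))    # 0.64 = P(0 < X < 4)
print(round(integrate(f, 0, 2), 4))    # 0.36 = P(0 < X < 2)
print(round(integrate(f, 2, 6), 4))    # 0.48 = P(2 < X < 6)
```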
The similarity of the formulae for discrete and continuous random variables is shown in the following table.
Table 15.9
S/N | Discrete | Continuous
1. | $\sum p_i = 1$ | $\int_{\text{all } x} f(x)\,dx = 1$
2. | $E(X) = \sum_{i=1}^{n} x_i\,p(x_i)$ | $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
3. | $E(g(X)) = \sum_i g(x_i)\,P(x_i)$ | $E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$
4. | $V(X) = \sum x_i^2 p_i - \mu^2 = E(X^2) - \mu^2$ | $V(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = E(X^2) - \mu^2$
5. | E(a) = a | E(a) = a
6. | E(aX) = aE(X) | E(aX) = aE(X)
7. | E(aX + b) = aE(X) + b | E(aX + b) = aE(X) + b
8. | Var(aX + b) = a² Var(X) | Var(aX + b) = a² Var(X)
9. | $\mu_r = \sum (x_i - \mu)^r P(x_i) = E(X - \mu)^r$ | $\mu_r = \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx = E(X - \mu)^r$
10. | $\mu_r' = \sum (x_i - a)^r P(x_i) = E(X - a)^r$ | $\mu_r' = \int_{-\infty}^{\infty} (x - a)^r f(x)\,dx = E(X - a)^r$
Exercise
In the example given above, find the following:
(i) the mean
(ii) Var(X)
(iii) the mode
(iv) the coefficient of skewness
(v) the median
15.5.2 Bi-variate Continuous Probability Distributions
If X and Y are two continuous random variables, we can consider their joint occurrence just as we did for discrete random variables. So z = f(x, y) will denote the joint p.d.f. of X and Y. The following statements are still valid for continuous probability distributions:
1. $E(X \pm Y) = E(X) \pm E(Y)$
2. $E(XY) = E(X) \times E(Y)$ if X and Y are independent
3. $Var(X \pm Y) = V(X) + Var(Y) \pm 2\,Cov(X, Y)$
4. $Var(X \pm Y) = V(X) + Var(Y)$ if X and Y are independent
Example
The joint p.d.f. of X and Y is given as
$$f(x, y) = \frac{3(2xy^2 + y^2)}{80} \quad \text{for } 1 < x < 3 \text{ and } 0 < y < 2,$$
whereas the marginal densities are respectively
$$f(x) = \frac{2x + 1}{10} \quad \text{and} \quad f(y) = \frac{3y^2}{8}.$$
Find (i) E(X), (ii) E(Y), (iii) E(XY), and verify whether X and Y are independent or not.
Sol:
(i)
(ii)
(ii)
E ( x) 
E ( y) 
E(XY)

3


1
2

0
 xf ( x)dx  
x(2 x  1)
dx  2.13
10
 yf ( y)dx  
y (3 y 2 )
dy  1.5
8
 2 xy 2  y 2 
3 2 x 2 y 3  xy 3


=  xyf ( x, y)dxdy   xy3
dxdy
dxdy  80 
80



3
2
3
2

3 
2
3
3
=
2 x dx y dy   xdx y dy  3.2
80  1
0
1
0

Since E (XY) = 2.13 x 1.5=E (X) x E (Y) then X and Y are independent continuous random variables
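The example can be checked numerically. The following Python sketch (an illustration, not part of the original solution) approximates each double integral with a two-dimensional midpoint rule; the grid size n = 200 is an arbitrary assumption. It should confirm that the p.d.f integrates to 1 and reproduce E(X) ≈ 2.13, E(Y) = 1.5 and E(XY) = 3.2.

```python
def joint_pdf(x, y):
    """Joint p.d.f of the example: (3/80)(2xy^2 + y^2) on 1 < x < 3, 0 < y < 2."""
    return 3.0 / 80.0 * (2 * x * y ** 2 + y ** 2)


def double_integral(g, ax, bx, ay, by, n=200):
    """Two-dimensional midpoint rule for the integral of g(x, y) over a rectangle."""
    hx, hy = (bx - ax) / n, (by - ay) / n
    total = 0.0
    for i in range(n):
        x = ax + (i + 0.5) * hx
        for j in range(n):
            y = ay + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy


bounds = (1.0, 3.0, 0.0, 2.0)                                   # 1 < x < 3, 0 < y < 2
total = double_integral(joint_pdf, *bounds)                     # should be close to 1
e_x = double_integral(lambda x, y: x * joint_pdf(x, y), *bounds)
e_y = double_integral(lambda x, y: y * joint_pdf(x, y), *bounds)
e_xy = double_integral(lambda x, y: x * y * joint_pdf(x, y), *bounds)
print(round(total, 4), round(e_x, 4), round(e_y, 4), round(e_xy, 4))
print("E(XY) - E(X)E(Y) =", round(e_xy - e_x * e_y, 5))
```

A near-zero difference E(XY) − E(X)E(Y) is only consistent with independence; the factorisation f(x, y) = f(x)f(y) is what actually establishes it.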
15.5.3 Marginal Probability Density Functions for a Continuous Random Variable.
Similar to the discrete case, we can obtain the p.d.f of one variable from the joint probability distribution of X and Y by integrating out the other variable. So the p.d.f of X is given by $f(x)=\int_{-\infty}^{\infty} f(x,y)\,dy$, and vice versa for Y. In the example given, we can derive the marginal densities of X and Y as follows:
$f(x)=\int_0^2 f(x,y)\,dy=\frac{3}{80}\int_0^2 \left(2xy^2+y^2\right)dy=\frac{3}{80}\left[2x\int_0^2 y^2\,dy+\int_0^2 y^2\,dy\right]=\frac{2x+1}{10}$ for $1<x<3$
$f(y)=\int_1^3 f(x,y)\,dx=\frac{3}{80}\int_1^3 \left(2xy^2+y^2\right)dx=\frac{3}{80}\left[2y^2\int_1^3 x\,dx+y^2\int_1^3 dx\right]=\frac{3y^2}{8}$ for $0<y<2$
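The marginalisation step can also be sketched numerically: integrating the joint p.d.f over y at a fixed x (or over x at a fixed y) should reproduce the two marginal formulas. This Python sketch is illustrative only; the helper names and the midpoint-rule step count are assumptions, not part of the text.

```python
def joint_pdf(x, y):
    """Joint p.d.f from the example: (3/80)(2xy^2 + y^2) on 1 < x < 3, 0 < y < 2."""
    return 3.0 / 80.0 * (2 * x * y ** 2 + y ** 2)


def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))


def marginal_x(x):
    """f(x): integrate the joint p.d.f over 0 < y < 2 (y is integrated out)."""
    return integrate(lambda y: joint_pdf(x, y), 0.0, 2.0)


def marginal_y(y):
    """f(y): integrate the joint p.d.f over 1 < x < 3 (x is integrated out)."""
    return integrate(lambda x: joint_pdf(x, y), 1.0, 3.0)


for x in (1.5, 2.0, 2.5):
    print(f"x = {x}: numeric f(x) = {marginal_x(x):.5f}, (2x+1)/10 = {(2 * x + 1) / 10:.5f}")
for y in (0.5, 1.0, 1.5):
    print(f"y = {y}: numeric f(y) = {marginal_y(y):.5f}, 3y^2/8 = {3 * y * y / 8:.5f}")
```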
15.6 The Moment Generating Function
The moment generating function (m.g.f) of a probability distribution is defined as $E(e^{tX})$. Accordingly, for a discrete random variable X the m.g.f is given by $M_X(t)=E(e^{tX})=\sum_{\text{all } x} e^{tx}p(x)$, while for a continuous random variable it is given by $M_X(t)=E(e^{tX})=\int_{-\infty}^{\infty} e^{tx}f(x)\,dx$. As the name suggests, the moment generating function is a very useful tool for generating the moments of most probability distributions. If an m.g.f exists (for t in an interval around 0), it uniquely determines that particular distribution. We can see how the m.g.f is used by expanding $E(e^{tX})$ with the Maclaurin series for $e^x$. We know that
$E(e^{tX})=E\left(1+tX+\frac{t^2X^2}{2!}+\frac{t^3X^3}{3!}+\cdots\right)=1+t\mu_1'+\frac{t^2\mu_2'}{2!}+\frac{t^3\mu_3'}{3!}+\cdots$
where $\mu_r'=E(X^r)$. As the expansion suggests, the moments about the origin appear as the coefficients of $t^r/r!$ in this infinite series. In each case, the 1st, 2nd, 3rd, ..., rth moments about the origin can be obtained by substituting t = 0 in the rth derivative of the m.g.f, i.e.
$\mu_r'=\left.\frac{d^r M_X(t)}{dt^r}\right|_{t=0}$
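As a rough numerical illustration of "differentiate the m.g.f and set t = 0", the Python sketch below builds $M_X(t)=\sum e^{tx}p(x)$ for an assumed two-point distribution, P(X=0) = 0.8 and P(X=1) = 0.2, and approximates the first two derivatives at t = 0 with central finite differences. The finite-difference step h and the function names are assumptions made for this sketch; in the exercises the derivatives are of course taken analytically.

```python
import math


def mgf(t, values, probs):
    """M_X(t) = E(e^{tX}) = sum over all x of e^{tx} p(x), for a discrete X."""
    return sum(math.exp(t * x) * p for x, p in zip(values, probs))


def moment_about_origin(r, values, probs, h=1e-3):
    """Approximate mu'_r = d^r M_X(t)/dt^r at t = 0 by central finite differences.

    Only r = 1 and r = 2 are handled in this sketch.
    """
    if r == 1:
        return (mgf(h, values, probs) - mgf(-h, values, probs)) / (2 * h)
    if r == 2:
        return (mgf(h, values, probs) - 2 * mgf(0.0, values, probs) + mgf(-h, values, probs)) / h ** 2
    raise ValueError("this sketch only handles r = 1 or r = 2")


values, probs = [0, 1], [0.8, 0.2]            # assumed illustrative distribution
m1 = moment_about_origin(1, values, probs)    # approximates E(X)
m2 = moment_about_origin(2, values, probs)    # approximates E(X^2)
print(round(m1, 4), round(m2, 4), round(m2 - m1 ** 2, 4))   # 0.2 0.2 0.16
```

Here $M_X(t)=0.8+0.2e^t$, so every derivative at t = 0 equals 0.2, which gives E(X) = 0.2, E(X²) = 0.2 and a variance of 0.2 − 0.2² = 0.16.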
Exercise 15
1. The discrete random variable X has p.d.f as shown:
X | 1 | 2 | 3 | 4 | 5
P(X=x) | 0.2 | 0.25 | 0.4 | a | 0.05
Find (i) the value of a (ii) P(1 ≤ X ≤ 3) (iii) P(2 ≤ X < 5) (iv) E(X)
2. The p.d.f of a discrete random variable X is given by P(X=x) = kx for x = 12, 13, 14. Find the value of the constant k.
3. A committee of 3 is to be chosen from 4 girls and 7 boys. Find the expected number of girls on the committee if the members of the committee are chosen at random.
4. The discrete random variable X has p.d.f given by P(X=x) = kx for x = 1, 2, 3, 4, 5, where k is a constant. Find E(X).
5. Independent random variables X and Y are such that E(X) = 4, E(Y) = 5, Var(X) = 1, Var(Y) = 2. Find (a) E(X+Y) (b) E(5X+6Y) (c) Var(X+Y) (d) Var(3X+2Y) (e) Var(3X−5Y)
Independent random variables X and Y have the following probability distributions:
X | 0 | 1
P(X=x) | 1/4 | 3/4
Y | 0 | 1
P(Y=y) | 2/3 | 1/3
Find (i) E(X) (ii) E(Y) (iii) E(XY), and verify whether X and Y are independent (iv) Var(X−Y)
6. Two independent random variables X and Y are such that E(X) = 3, E(X²) = 12, E(Y) = 4 and E(Y²) = 18. Find the values of (a) E(3X−2Y) (b) Var(2X+Y) (c) Var(2X−Y)
7. X has a p.d.f given by P(X=x) = kx for x = 1, 2, 3, 4. Find (a) k (b) E(X) (c) Var(X) (d) Var(3X)
8. Two independent random variables X and Y are such that E(X²) = 14, E(Y²) = 20, Var(X) = 10, Var(Y) = 11. Find the values of (a) E(3X−2Y) (b) Var(2X+Y) (c) Var(5X+2Y)
9. The p.d.f is such that P(X=x) = kx for x = 0, 1, 2, 3, …, n. Find the value of k.
10. Given that X is a discrete random variable with p.d.f P(X=x) = p^x for x = 1, 2, …, ∞, where p < 1:
(a) Find the value of the constant p
(b) Find E(X)
(c) Find the coefficients of skewness and kurtosis and comment on the nature of the probability distribution.
11. (i) Distinguish between a probability distribution and a frequency distribution.
(ii) The p.d.f of a discrete random variable X is given by P(X=x) = mx for x assuming the values −1, 4, −9, 16, −25, 36, …, 10,000. Find the value of the constant m.
12. The p.d.f of a discrete random variable X is given as $P(X=x)=K\,\frac{2^x\,[{}^{40}P_x]}{x!}$ for x = 0, 1, 2, …, 40, where K is a constant. Find the value of the constant K.
13. A box contains two red and two white balls. Balls are drawn at random without replacement. Let X be the random variable, the number of white balls chosen before the first red. Find E(X).
14. The p.d.f of a discrete random variable X is given as P(X=x) = Kx for x = 1, 3, 6, 10, 15, …, 465.
(i) Find the value of the constant K
(ii) Find P(1 < X < 465)
15. The p.d.f of a discrete random variable X is given as $P(X=x)=K\left(\frac{1}{x^2+x}\right)$ for x = 1, 2, 3, …, 100. Find P(1 ≤ X ≤ 50).
17. The p.d.f of a discrete random variable X is given as P(X=x) = Kx for x = 1, 7, 18, 34, …, 970.
(i) Find the value of the constant K
(ii) Find P(X = 34)
18. The joint probability distribution for the variables X and Y is given as $f(x,y)=\frac{xy+x^2y^2}{480}$ for x = 1, 2, 3, 4 and y = 1, 2, 3.
(i) Determine the marginal density functions for the variables X and Y.
(ii) Find P(X+Y > 3) and P(X−Y is an even number).
(iii) Find Cov(X, Y) and comment on the nature of X and Y.
(iv) Find Var(X) and Var(Y).
(v) Determine the correlation coefficient between X and Y.
19. The joint p.d.f of the continuous random variables X and Y is given as $f(x,y)=e^{-(x+y)}$ for x > 0 and y > 0. Find the marginal density functions of X and Y and hence, or otherwise, tell whether X and Y are independent or not.
20. Given f(x, y) = M7xy² with 0 < x < 1 and 1 < y < 2 as the joint probability function for the variables X and Y, find
(i) P(0.2 < X < 1)
(ii) the mean and mode of f(x) and f(y)
(iii) the mode of f(x, y)
(iv) P(X + Y = 1)*
(v) P(X + Y = 0.2)*
21. Define a continuous random variable. The following is a p.d.f of a continuous random variable X:
f(x) = $Kx^2+1$ for 0 < x < 2
     = 0 otherwise
Find the value of the constant K and hence the mean and median of the distribution.
22. A discrete random variable X has its p.d.f given as $P(X=x)={}^nC_x\,p^x q^{n-x}$ for x = 0, 1, 2, …, n, where p + q = 1. Using the concept of a moment generating function, find its mean and variance.
23. A discrete random variable X has its p.d.f given as $P(X=x)=q^x p$ for x = 0, 1, 2, …, where p + q = 1. Using the concept of a moment generating function, find its mean and variance.
24. A discrete random variable X has its p.d.f given as $P(x)=\frac{\lambda^x e^{-\lambda}}{x!}$ for x = 0, 1, 2, …, ∞. Using the concept of a moment generating function, find its mean and variance.
25. A continuous random variable Z has p.d.f $f(z)=\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}$, −∞ < z < ∞.
a. Using the moment generating function or otherwise, show that its mean is 0 and its variance is 1.
b. Find the mode and median and comment on the nature of its distribution.
26. A continuous random variable X has its probability distribution given as
f(x) = $\frac{1}{b-a}$ for a ≤ X ≤ b
     = 0 otherwise
Determine its mean, variance and median.
CHAPTER 16: SOME COMMON PROBABILITY DISTRIBUTIONS
16.1 Introduction
Several probability distributions occur commonly in everyday applications. The probability distributions presented in this chapter are the binomial distribution, the Poisson distribution, the geometric distribution, the hypergeometric distribution, the uniform distribution and the normal distribution.
16.2 Binomial Distribution
A binomial distribution is a discrete probability distribution which arises from an experiment having the following features:
• Each trial has exactly two possible outcomes, i.e. either success or failure
• The number of trials n for that particular experiment is fixed
• The probability of success p, and that of failure q = 1 − p, is the same in every trial
• The trials are all independent
Examples of binomial experiments, which lead to a binomial distribution:
(i) A fair coin is tossed three times and the number of heads is counted
(ii) A die is thrown ten times and the number of sixes obtained is counted
16.2.1 The probability density function of a binomial distribution
Suppose we toss a coin n times, and our interest is in the number x of heads that may appear; in other words, we want x heads and n − x tails out of the n trials. The x heads and n − x tails may appear in different orders, for instance TTTHHH…T or THTHTH…T, etc. From our knowledge of permutations, the number of different arrangements of n objects, of which x are alike and the remaining n − x are also alike, is
$\frac{n!}{x!(n-x)!}={}^nC_x$
Since the trials are independent, the probability of any one such arrangement is $p^x q^{n-x}$. Hence the probability of obtaining x heads is ${}^nC_x\,p^x q^{n-x}$.
The p.d.f of a binomial distribution is thus established as
P(x) = ${}^nC_x\,p^x q^{n-x}$ for x = 0, 1, 2, …, n
     = 0 otherwise
Example 1
The probability that an animal is properly fed in a certain village is 0.6. Find the probability that in a randomly selected sample of 8 lambs exactly 3 will be well fed.
Sol:
${}^8C_3(0.6)^3(0.4)^5\approx 0.12$
In a binomial distribution E(X) = np and Var(X) = npq.
The proofs of the two are left as an exercise to the student.
Example.2
A coin is tossed 6 times. Find the expected number of heads and their variance
Sol:
E (X)=np=6x1/2=3 and Var (X)=npq = 6x1/2x1/2 = 1.5
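The two examples can be reproduced with a short Python sketch (illustrative, not from the original notes). It implements the binomial p.d.f with `math.comb` and obtains the mean and variance by summing over all x, so the results can be compared with np and npq rather than assumed.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = nCx p^x q^(n-x) for x = 0, 1, ..., n, and 0 otherwise."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


def binomial_mean_var(n, p):
    """Mean and variance computed directly from the p.d.f (sums over all x)."""
    mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    ex2 = sum(x * x * binomial_pmf(x, n, p) for x in range(n + 1))
    return mean, ex2 - mean ** 2


# Example 1: 8 lambs, probability 0.6 of being well fed, exactly 3 well fed.
print(round(binomial_pmf(3, 8, 0.6), 4))      # 0.1239
# Example 2: a coin tossed 6 times; compare with np = 3 and npq = 1.5.
mean, var = binomial_mean_var(6, 0.5)
print(round(mean, 6), round(var, 6))          # 3.0 1.5
```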
16.3 The Poisson distribution
It sometimes happens that a binomial experiment has an extremely large number of trials n while the probability of success p is small. Under such circumstances the binomial probabilities can be replaced by their limit as n approaches infinity, with the mean np held fixed. Usually we consider the mean number of occurrences (the expected value of X) of such events within a specified interval of time or space. Events of this nature are said to follow the so-called Poisson distribution. Examples are the number of telephone calls received at a particular call centre within a specified time, say one hour, or the number of car accidents in a month at a particular road junction. The p.d.f of the Poisson distribution can be derived from the binomial distribution as follows.
We have $P(x)={}^nC_x\,p^x q^{n-x}$, and the mean value is λ = np, implying that $p=\frac{\lambda}{n}$ and $q=1-\frac{\lambda}{n}$. Hence
$P(x)=\frac{n(n-1)\cdots(n-x+1)(n-x)!}{x!\,(n-x)!}\left(\frac{\lambda}{n}\right)^x\left(1-\frac{\lambda}{n}\right)^{n-x}=\frac{\lambda^x}{x!}\cdot\frac{n(n-1)\cdots(n-x+1)}{n\cdot n\cdots n}\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-x}$
$=\frac{\lambda^x}{x!}\cdot 1\left(1-\frac{1}{n}\right)\left(1-\frac{2}{n}\right)\cdots\left(1-\frac{x-1}{n}\right)\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-x}$
As n → ∞, each factor $\left(1-\frac{k}{n}\right)$ and the factor $\left(1-\frac{\lambda}{n}\right)^{-x}$ tend to 1, while, writing $t=-\frac{n}{\lambda}$,
$\lim_{n\to\infty}\left(1-\frac{\lambda}{n}\right)^{n}=\lim_{|t|\to\infty}\left[\left(1+\frac{1}{t}\right)^{t}\right]^{-\lambda}=e^{-\lambda}$
Therefore
$P(x)=\frac{\lambda^x e^{-\lambda}}{x!}$ for x = 0, 1, 2, …, ∞
where λ is the mean of the distribution, also called the parameter of the distribution.
If X is distributed in this way, then we write X ~ Po(λ).
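The limiting argument above can be watched numerically. The Python sketch below (not from the original text) keeps the mean np = λ fixed while n grows and compares the binomial probability of x successes with the Poisson value $\lambda^x e^{-\lambda}/x!$. The values λ = 2 and x = 3 are arbitrary assumptions for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    """Binomial probability nCx p^x q^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Poisson probability lambda^x e^(-lambda) / x!."""
    return lam ** x * exp(-lam) / factorial(x)


lam, x = 2.0, 3                      # illustrative values (assumed)
approx = {}
for n in (10, 100, 1000, 10_000):
    p = lam / n                      # keep the mean np = lambda fixed while n grows
    approx[n] = binomial_pmf(x, n, p)
    print(f"n = {n:>6}: binomial P(X = {x}) = {approx[n]:.6f}")
print(f"Poisson limit:        {poisson_pmf(x, lam):.6f}")
```

The binomial values approach the Poisson value as n increases, which is the content of the limit taken in the derivation.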
Example 1
Given that X follows a Poisson distribution with parameter 1.5, find the probability of having X = 2 or 3.
Sol:
P(X = 2 or 3) = P(X = 2) + P(X = 3)
$=\frac{1.5^2e^{-1.5}}{2!}+\frac{1.5^3e^{-1.5}}{3!}=0.25+0.13=0.38$
Example 2
A coin is tossed 100 times. Use the Poisson distribution to approximate the probability of having 10 heads.
Sol:
Since the number of trials is large, the binomial probabilities can be approximated by the Poisson distribution. We know λ = np = 100 × 1/2 = 50. Hence the p.d.f will be given by
$P(x)=\frac{50^x e^{-50}}{x!}$, so that $P(X=10)=\frac{50^{10}e^{-50}}{10!}\approx 5.2\times 10^{-12}$
(Strictly, the Poisson approximation is recommended when n is large and p is small; with p = 1/2, as here, it can be very inaccurate.)
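Because p = 1/2 is far from small, it is worth comparing the Poisson approximation with the exact binomial probability. This Python sketch (an illustration, not part of the original solution) uses the figures from the solution, λ = np = 100 × 1/2 = 50 and x = 10.

```python
from math import comb, exp, factorial

n, p, x = 100, 0.5, 10
lam = n * p                                        # lambda = np = 50

poisson = lam ** x * exp(-lam) / factorial(x)      # Poisson approximation
exact = comb(n, x) * p ** x * (1 - p) ** (n - x)   # exact binomial probability
print(f"Poisson approximation: {poisson:.3e}")
print(f"Exact binomial:        {exact:.3e}")
```

The exact binomial probability is smaller than the Poisson value by several orders of magnitude, which illustrates why the approximation is normally reserved for large n with small p.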
16.4 Geometric Distribution
A geometric distribution arises when we consider the number of failures preceding the first success in a sequence of independent trials with a fixed probability p of success in each trial. So if the random variable X denotes the number of failures before the first success, the p.d.f of X is $p(x)=q^x p$ for x = 0, 1, 2, …, where p + q = 1.
Exercise
a. Show that the given formula is indeed a p.d.f.
b. Establish the mean and variance of a geometric distribution.
Example
In a certain game, A throws a die hoping to obtain a 2. The throwing continues until the die shows a 2. Find
(i) the chance that exactly 3 failures occur before the first 2 appears (i.e. the first 2 appears on the 4th throw)
(ii) the expected number of failures before the first success
Sol:
(i) $P(X=3)=q^3p=\left(\frac{5}{6}\right)^3\left(\frac{1}{6}\right)=\frac{125}{1296}\approx 0.1$
(ii) It can be shown from the exercise above that for a geometric distribution E(X) = q/p. Hence the expected number of failures is $\frac{q}{p}=\frac{5/6}{1/6}=5$.
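A small Python sketch (not from the original text) confirms both parts of the example: it evaluates $q^3p$ exactly with `fractions.Fraction`, and approximates E(X) by truncating the infinite series $\sum x q^x p$ at x = 2000, a cut-off chosen here as an assumption because the remaining tail is negligible.

```python
from fractions import Fraction


def geometric_pmf(x, p):
    """P(X = x) = q^x p, the chance of exactly x failures before the first success."""
    return (1 - p) ** x * p


p = Fraction(1, 6)                       # chance of throwing a 2 on a fair die
p3 = geometric_pmf(3, p)
print(p3, float(p3))                     # 125/1296 and about 0.0965

# E(X) = sum over x of x q^x p. Truncating the series at x = 2000 leaves a
# negligible tail, so the sum should agree with the closed form q/p.
mean = sum(x * geometric_pmf(x, 1 / 6) for x in range(2000))
print(round(mean, 6), float((1 - p) / p))
```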
16.5 Hypergeometric Distribution
We have already met probability problems in which sampling is made without replacement from a population of (a + b) observations, of which a are of one type and the other b are of another type. If our interest is in the number x of items of type a included in a sample of size n, then X has a hypergeometric distribution. An example of such a distribution is the variable X denoting the number of female students included in a sample of 9 students selected from a class consisting of 10 female and 8 male students.
The p.d.f of a hypergeometric distribution is given by
$f(x)=\dfrac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}}$ for x = 0, 1, 2, …, n
     = 0 otherwise
The mean and variance of a hypergeometric distribution are given as
$E(X)=\dfrac{na}{a+b}$ and $Var(X)=\dfrac{nab\,(a+b-n)}{(a+b)^2(a+b-1)}$
So in the given distribution, the expected number of girls in the sample is $\frac{9\times 10}{10+8}=5$ girls, while the expected number of males is $\frac{9\times 8}{10+8}=4$ males. The variance of the distribution is $\frac{9\times 10\times 8\,(10+8-9)}{(10+8)^2(10+8-1)}\approx 1.2$.
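The mean and variance formulas can be checked against the p.d.f itself. The Python sketch below (illustrative, not part of the original notes) computes the hypergeometric probabilities with `math.comb` for the class example (a = 10 females, b = 8 males, n = 9) and compares the summed moments with the closed forms.

```python
from math import comb


def hypergeom_pmf(x, a, b, n):
    """P(X = x) = aCx * bC(n-x) / (a+b)Cn for a sample of n drawn without replacement."""
    if x < 0 or x > n:
        return 0.0
    return comb(a, x) * comb(b, n - x) / comb(a + b, n)   # comb returns 0 if n - x > b


a, b, n = 10, 8, 9                     # 10 female and 8 male students, sample of 9
probs = [hypergeom_pmf(x, a, b, n) for x in range(n + 1)]
mean = sum(x * p for x, p in enumerate(probs))
var = sum(x * x * p for x, p in enumerate(probs)) - mean ** 2

print(round(sum(probs), 10))                                            # total probability
print(round(mean, 6), n * a / (a + b))                                  # E(X) two ways
print(round(var, 6), n * a * b * (a + b - n) / ((a + b) ** 2 * (a + b - 1)))  # Var(X) two ways
```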
16.6 The Uniform Distribution
If it is known that the probability (or probability density) is the same for every value of X in its range, then we have what is known as a uniform distribution. For a continuous random variable with a ≤ x ≤ b, the uniform distribution is given as
f(x) = $\frac{1}{b-a}$ for a ≤ x ≤ b
     = 0 otherwise
Proof:
We know that f(x) = k for a ≤ x ≤ b, where k is a constant, and 0 otherwise. By the requirement that the total probability is 1,
$\int_a^b f(x)\,dx=\int_a^b k\,dx=k(b-a)=1\;\Rightarrow\;k=\frac{1}{b-a}$
Similarly, for a discrete random variable that takes m equally likely values between a and b, the uniform distribution is given as
f(x) = $\frac{1}{m}$ for each of the m values of x in a ≤ x ≤ b
     = 0 otherwise
(for the consecutive integers a, a + 1, …, b, m = b − a + 1).
16.7 The Normal Distribution
The normal distribution is an example of a continuous probability distribution. It is given as
$f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ for $-\infty<x<\infty$,
where μ is the mean of the distribution and σ² is the variance of the distribution. When X is distributed in this way we write X ~ N(μ, σ²).
16.7.1 Main features of the normal distribution and the test for normality
1. It is bell shaped and symmetrical about x = μ. This means the coefficient of skewness is zero.
2. Its frequency curve is mesokurtic: the coefficient of kurtosis β₂ equals 3, i.e. the excess kurtosis β₂ − 3 is zero. The maximum value of f(x) occurs at x = μ and is given as $f(\mu)=\frac{1}{\sigma\sqrt{2\pi}}$.
3. Approximately 95% of the distribution lies within 2 standard deviations of the mean (more precisely 95.4%), whereas about 99.7% of the distribution lies within 3 standard deviations.
[Sketch: the bell-shaped normal curve, symmetrical about x = μ]
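The three features listed above can be checked with a few lines of Python (a sketch, not from the original text). The coverage within k standard deviations is written with `math.erf`, since $P(|X-\mu|<k\sigma)=\operatorname{erf}(k/\sqrt{2})$ for every normal distribution; the parameters μ = 70 and σ = 8 are assumed purely for illustration.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """f(x) = 1/(sigma sqrt(2 pi)) exp(-(x - mu)^2 / (2 sigma^2))."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


def within_k_sd(k):
    """P(mu - k sigma < X < mu + k sigma); the same for every normal distribution."""
    return erf(k / sqrt(2))


mu, sigma = 70.0, 8.0                                           # illustrative parameters (assumed)
print(normal_pdf(mu, mu, sigma), 1 / (sigma * sqrt(2 * pi)))    # peak value at x = mu
print(normal_pdf(mu - 5, mu, sigma) == normal_pdf(mu + 5, mu, sigma))  # symmetry about mu
print(round(within_k_sd(2), 4), round(within_k_sd(3), 4))       # 0.9545 0.9973
```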
16.7.2 The standard normal distribution and reading from the normal distribution
When μ = 0 and σ² = 1, the p.d.f of a normal distribution becomes $f(z)=\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}$. This kind of normal distribution is called the standard normal distribution, and we write Z ~ N(0, 1). Consider, say, P(Z < 1.8). Finding P(Z < 1.8) would require us to evaluate $\int_{-\infty}^{1.8}\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}z^2}\,dz$, which has no closed form and is very tedious to approximate by hand. For that reason statisticians have tabulated such definite integrals, so that probabilities under the standard normal curve can be found easily. Using the standard normal tables, P(Z < 1.8) is found to be 0.9641.
16.7.3 Using the standard normal Tables for any normal distribution
In order to use the standard normal tables for any normal distribution, we standardise the given random variable, say X. So if X ~ N(μ, σ²), then $Z=\frac{X-\mu}{\sigma}$ ~ N(0, 1). Suppose a random variable X denotes a student's examination score in mathematics at a certain college, where the score has been found to be normally distributed with mean 70 and variance 64 (so σ = 8). What is the chance of a student scoring between 60 and 90? In order to use the standard normal tables we must standardise the variable X into a standard variate Z. This can be done as follows:
$P(60<X<90)=P\left(\frac{60-70}{8}<\frac{X-\mu}{\sigma}<\frac{90-70}{8}\right)=P(-1.25<Z<2.5)$
This means that finding P(60 < X < 90) under the given distribution is the same as finding P(−1.25 < Z < 2.5) under the standard normal distribution. So
P(60 < X < 90) = P(−1.25 < Z < 2.5) = Φ(2.5) − Φ(−1.25) ≈ 0.9938 − 0.1056 ≈ 0.888
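Where tables are not at hand, the standard normal c.d.f Φ(z) can be computed from the error function, $\Phi(z)=\tfrac{1}{2}\left[1+\operatorname{erf}(z/\sqrt{2})\right]$. The Python sketch below (the function names are hypothetical, chosen for this illustration) reproduces P(Z < 1.8) and the examination-score example (mean 70, variance 64, scores between 60 and 90) by standardising exactly as described above.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Phi(z) = P(Z < z) for Z ~ N(0, 1), written in terms of the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_prob_between(lo, hi, mu, variance):
    """P(lo < X < hi) for X ~ N(mu, variance), by standardising Z = (X - mu) / sigma."""
    sigma = sqrt(variance)
    z_lo, z_hi = (lo - mu) / sigma, (hi - mu) / sigma
    return z_lo, z_hi, std_normal_cdf(z_hi) - std_normal_cdf(z_lo)


print(round(std_normal_cdf(1.8), 4))                     # P(Z < 1.8) = 0.9641
z_lo, z_hi, prob = normal_prob_between(60, 90, mu=70, variance=64)
print(z_lo, z_hi, round(prob, 4))                        # -1.25 2.5 0.8881
```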
16.7.4 The bivariate normal distribution
It is worth mentioning at this point the distribution of the sum and difference of two independent normally distributed random variables X and Y, as we shall frequently encounter it in inferential statistics.
Let X and Y be any two independent normally distributed random variables with means μ_x, μ_y and variances σ_x², σ_y² respectively. Then the following is true:
1. V = X + Y is also normally distributed with mean μ_x + μ_y and variance σ_x² + σ_y².
2. V = X − Y is also normally distributed with mean μ_x − μ_y and variance σ_x² + σ_y².
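A quick simulation makes the point that the variances add in both cases, even for the difference X − Y. The Python sketch below uses `random.gauss` with a fixed seed; the parameters μ_x = 5, σ_x² = 4, μ_y = 2, σ_y² = 9 and the sample size are assumptions for illustration only, so the printed values will be close to, not exactly equal to, the theoretical ones.

```python
import random
import statistics

random.seed(1)                     # fixed seed so the run is reproducible
mu_x, var_x = 5.0, 4.0             # assumed illustrative parameters for X
mu_y, var_y = 2.0, 9.0             # assumed illustrative parameters for Y
N = 200_000                        # number of simulated (X, Y) pairs

xs = [random.gauss(mu_x, var_x ** 0.5) for _ in range(N)]   # gauss takes the standard deviation
ys = [random.gauss(mu_y, var_y ** 0.5) for _ in range(N)]


def mean_and_variance(data):
    """Sample mean and (divisor-N) variance of a list of numbers."""
    m = statistics.fmean(data)
    return m, statistics.fmean([(d - m) ** 2 for d in data])


sum_mean, sum_var = mean_and_variance([x + y for x, y in zip(xs, ys)])
diff_mean, diff_var = mean_and_variance([x - y for x, y in zip(xs, ys)])
print(f"X + Y: mean {sum_mean:.2f} (theory 7), variance {sum_var:.2f} (theory 13)")
print(f"X - Y: mean {diff_mean:.2f} (theory 3), variance {diff_var:.2f} (theory 13)")
```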
Exercise 16
1. An unbiased coin is tossed 4 times. Find the probability of having 3 heads.
2. Assume that two crossed animals of types A1 and A2 are equally likely to produce either an animal similar to A1 or one similar to A2. Find the probability that in a group of five calves there will be more animals of type A1.
3. State the conditions under which the binomial distribution occurs.
4. Of Sokoine University students, 60% have their ages between 20 and 26 years. From a sample of 10 students chosen at random, find the probability that (a) only 3 have ages between 20 and 26 (b) more than 8 have ages between 20 and 26.
5. X is a r.v. such that X ~ Bin(n, p). Given that E(X) = 2.4 and p = 0.3, find the standard deviation of the distribution.
6. Show that if $P(X=x)={}^nC_x\,p^x q^{n-x}$, x = 0, 1, 2, …, n, then E(X) = np.
7. The probability that a target is hit is 0.3. Find the least number of shots which should be fired if the probability that the target is hit at least once is to be greater than 0.95.
8. There are ten multiple-choice questions. The probability of guessing a correct answer is 1/3. Find the probability of having (a) 4 answers correct (b) all answers correct.
9. Of the articles from a certain production line, 10% are defective. If a sample of 25 articles is taken, find the expected number of defective articles and the standard deviation.
10. The mean number of bacteria per millilitre of a liquid is known to be 4. Assuming that the number of bacteria follows a Poisson distribution, find the probability that in 1 ml of liquid there will be (a) no bacteria (b) 4 bacteria (c) fewer than 3 bacteria.
11. The random variable X follows a Poisson distribution with standard deviation 2. Find P (X 3)
12. If Z ~ N(0, 1), find (a) P(0.829 < Z < 1.843) (b) P(−2.05 < Z < 0) (c) P(Z < 1.78) (d) P(Z > 2.326)
13. If Z ~ N(0, 1), find y if (a) P(Z < y) = 0.506 (b) P(Z < y) = 0.891 (c) P(Z > y) = 0.001 (d) P(Z > y) = 0.00122
14. If X ~ N(100, 80), find (a) P(85 < X < 112) (b) P(105 < X < 115) (c) P(X − 100 < 80)
15. The heights of boys at a particular age follow a normal distribution with mean 150.3 cm and standard deviation 5 cm. Find the probability that a boy picked at random from this age group has height (a) less than 153 cm (b) less than 148 cm (c) more than 158 cm (d) between 147 cm and 149.5 cm.
16. The marks in an examination were normally distributed with mean μ and standard deviation σ. 10% of the candidates had more than 75 marks and 20% had less than 40 marks. Find the values of μ and σ.
17. The lengths of rods produced in a workshop follow a normal distribution with mean μ and variance 4. 10% of the rods are less than 17.4 cm long. Find the probability that a rod chosen at random will be between 18 and 23 cm long.
18. If X ~ N(μ, σ²) and P(X < 35) = 0.2, P(35 < X < 45) = 0.65, find μ and σ.
19. The lengths of certain items follow a normal distribution with mean μ cm and standard deviation 6 cm. It is known that 4.78% of the items have a length greater than 82 cm. Find the value of the mean μ.
20. A discrete random variable X follows a uniform distribution with values 2, 6, 12, 20, …, 600. Find P(12 < X < 462).
21. A discrete random variable X follows a uniform distribution with a < X < b. Establish the mean and variance of the distribution.
22. Show that the mean and variance of the hypergeometric distribution are given by $E(X)=\frac{na}{a+b}$ and $Var(X)=\frac{nab\,(a+b-n)}{(a+b)^2(a+b-1)}$.
23. In a certain college a student is allowed to sit an exam any number of times until he passes. If the chance that a student passes the exam is 0.4, how many times do you expect a student to sit such an exam?
24. Six students are sitting for an examination 8 times simultaneously. The chance that a student passes an examination is 0.1. What is the chance of having 4 students passing at least 3 examinations?
25. In a certain college, the chance that a female student passes an exam is 0.4, while the chance for a male student to pass is 0.7. Find the chance of having 4 students passing an exam in a class of 10 male and 8 female students sitting for the examination.
26. In question 25 above, what is the chance of having the same number of each sex passing the examination?
27. Show that the variance of a Poisson distribution is the same as the mean. A random variable X follows the Poisson distribution with parameter 7. Using the normal distribution, approximate P(X > 7).
28. By considering some properties of two independent random variables, establish the p.d.f of the sum of two independently distributed Poisson variates with means λ₁ and λ₂ respectively.
29. Each pair of faces of a die is painted with one of the colours yellow, green and blue respectively. The die is thrown ten times. What is the chance of having 3 yellow faces, 2 green faces and 5 blue faces in the whole exercise? [Hint: This is an example of a multinomial distribution]
30. A random variable X ~ N(100, 80) and a random variable Y ~ N(62.5, 50). Find the chance that the difference between X and Y is not larger than 20.
31. By considering the properties of independent random variables, establish the p.d.f of the sum of two independent binomial variates with parameters (n₁, p) and (n₂, p) respectively.
32. Of SUA students, 60% have their ages between 20 and 26 years, 30% between 26 and 30 years and 10% above 30 years. A random sample of 10 students is selected. Find the probability that 2, 5 and 3 students in the categories mentioned will be selected.
CHAPTER 17: SAMPLING DISTRIBUTIONS
17.1 Random sampling
We discussed measures of central tendency and variability in a population when we learnt about frequency distributions and probability distributions. These measures exist for a sample as well. When they are computed from sample observations, we refer to them as statistics. In general, a statistic is any characteristic derived from a sample. If a sample of 10 students taking introductory statistics is selected and their scores obtained, their average score will be a statistic. Similarly, the mode, median and variance of their scores will all be statistics. By contrast, the corresponding measures of a population's distribution are known as parameters. Statistics are used to represent (estimate) the unknown parameters of a population.
Since we can take more than one sample of items from a population, it follows that a statistic also has a distribution over the possible samples, with its own mean and variance. If we consider the various sample means from the different possible samples that could be taken from a certain population, then we are thinking of the sampling distribution of the mean. In fact we can find the sampling distribution of any statistic, i.e. the mode, mean, median, variance, etc. However, for the purpose of our study we shall centre our discussion on the sampling distribution of the mean in this section and of the variance in section 20.
Sampling can be done with or without replacement, from a finite or an infinite population. However, in practice sampling is done either with replacement from a finite population, or without replacement from an infinite population. These two procedures constitute what is known as random sampling.
17.1.1 Sampling with replacement
If $X_1, X_2, \ldots, X_n$ is a random sample of n independent observations from a population with mean μ and variance σ², then $E(\bar{x})=\mu$ and $Var(\bar{x})=\frac{\sigma^2}{n}$ if sampling is done with replacement. We can demonstrate the validity of these statements with the following example. Suppose a discrete random variable X has probability distribution P(X=x) as shown in the table below.
Table 17.1
X | 0 | 1
P(X=x) | 0.8 | 0.2
(i) Show that μ = 0.2 and Var(X) = σ² = 0.16.
(ii) By considering all possible samples of size 2, find the probability distribution of the mean of such samples. Verify that $E(\bar{x})=\mu$ and $Var(\bar{x})=\frac{\sigma^2}{2}$.
Sol:
(i) μ = E(X) = 0 × 0.8 + 1 × 0.2 = 0.2 and Var(X) = 0² × 0.8 + 1² × 0.2 − 0.2² = 0.16
(ii) The possible samples are:
Table 17.2
S/N | Sample | Sample mean | Probability
1 | (0, 0) | 0 | 0.8 × 0.8 = 0.64
2 | (0, 1) | 0.5 | 0.8 × 0.2 = 0.16
3 | (1, 0) | 0.5 | 0.2 × 0.8 = 0.16
4 | (1, 1) | 1 | 0.2 × 0.2 = 0.04
The probability distribution of x̄ is therefore
Table 17.3
x̄ | 0 | 0.5 | 1
P(x̄) | 0.64 | 0.32 | 0.04
E(x̄) = 0 × 0.64 + 0.5 × 0.32 + 1 × 0.04 = 0.2, which is the same as μ, the population mean.
Var(x̄) = E(x̄²) − (E(x̄))². But E(x̄²) = 0² × 0.64 + 0.5² × 0.32 + 1² × 0.04 = 0.12.
Therefore Var(x̄) = 0.12 − 0.2² = 0.08 = 0.16/2, which is the same as $\frac{\sigma^2}{n}$, the population variance divided by the sample size.
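Enumerating every ordered sample, as in Table 17.2, is easy to automate. The Python sketch below (the function names are hypothetical) uses `itertools.product` to list all samples of size n drawn with replacement, multiplies the individual probabilities because the draws are independent, and collects the distribution of the sample mean. Applied to Table 17.1 it should reproduce Table 17.3, E(x̄) = 0.2 and Var(x̄) = 0.08; the same function can be applied to Table 17.4 with n = 3.

```python
from itertools import product


def sampling_distribution_with_replacement(dist, n):
    """Return {sample mean: probability} over all ordered samples of size n drawn with replacement.

    dist maps each value x to P(X = x); draws are independent, so a sample's
    probability is the product of the individual probabilities.
    """
    result = {}
    for sample in product(dist.items(), repeat=n):
        xbar = sum(x for x, _ in sample) / n
        prob = 1.0
        for _, p in sample:
            prob *= p
        result[xbar] = result.get(xbar, 0.0) + prob
    return result


def mean_var(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    m = sum(x * p for x, p in dist.items())
    return m, sum(x * x * p for x, p in dist.items()) - m * m


population = {0: 0.8, 1: 0.2}                        # Table 17.1
xbar_dist = sampling_distribution_with_replacement(population, 2)
print(xbar_dist)                                     # compare with Table 17.3
mu, sigma2 = mean_var(population)
e_xbar, var_xbar = mean_var(xbar_dist)
print(round(e_xbar, 6), round(var_xbar, 6), round(sigma2 / 2, 6))   # 0.2 0.08 0.08
```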
Exercise
Given the following distribution
Table 17.4
X | 0 | 1 | 2 | 3
P(X=x) | 0.2 | 0.1 | 0.5 | 0.2
(a) Find the mean μ and the variance σ².
(b) By taking all possible samples of size 3, verify that $E(\bar{x})=\mu$ and $Var(\bar{x})=\frac{\sigma^2}{n}$.
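17.1.2 Sampling without replacement
If sampling is done without replacement from a finite population of size N, then the following is true:
$E(\bar{x})=\mu$ and $Var(\bar{x})=\frac{N-n}{N-1}\cdot\frac{\sigma^2}{n}=\left(1-\frac{n}{N}\right)\frac{s^2}{n}$
where $s^2=\frac{N\sigma^2}{N-1}$ is the population variance computed with divisor N − 1. However, when N is large relative to n, meaning that the population considered can be treated as infinite, the formula for the variance of the sample means becomes the same as in the case of sampling with replacement. Hence $Var(\bar{x})=\frac{\sigma^2}{n}$.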
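The finite-population correction (N − n)/(N − 1) can be verified by brute force on a small population. In the Python sketch below the population {2, 4, 6, 8} and the sample size n = 2 are hypothetical; `itertools.combinations` lists every unordered sample drawn without replacement, each being equally likely, and the mean and variance of the resulting sample means are compared with the formulas above.

```python
from itertools import combinations
from statistics import fmean, pvariance

population = [2, 4, 6, 8]          # hypothetical finite population, N = 4
n = 2
N = len(population)

# Every unordered sample of size n drawn without replacement is equally likely.
means = [fmean(s) for s in combinations(population, n)]

mu = fmean(population)
sigma2 = pvariance(population)     # population variance with divisor N
formula = (N - n) / (N - 1) * sigma2 / n

print(fmean(means), mu)            # E(x-bar) should equal mu
print(pvariance(means), formula)   # Var(x-bar) should equal (N-n)/(N-1) * sigma^2/n
```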
17.2 The sampling distribution of the mean I (normal distribution)
Theorem
"If x̄₁, x̄₂, …, x̄_N are the means of random samples of size n taken from a normal distribution where X ~ N(μ, σ²), then the distribution of x̄ is also normal with mean μ and variance σ²/n."
The intuition behind the theorem should be clear. We know that X ~ N(μ, σ²) and that x̄ is a sum of n independent normal variates divided by n. From our knowledge of the bivariate/multivariate normal distribution, x̄ should therefore also follow a normal distribution, with the stated mean and variance.
Example
A random sample of size 10 is taken from among students whose scores in introductory statistics are normally distributed with mean 70 and variance 36. Find the probability that the sample mean is less than 67.
Sol:
We are required to find P(x̄ < 67). This cannot be done unless we know the distribution of the sample mean x̄. But from the theorem stated above, we know that $\bar{x}\sim N\left(\mu,\frac{\sigma^2}{n}\right)$. Accordingly,
$P(\bar{x}<67)=P\left(\frac{\bar{x}-\mu}{\sqrt{\sigma^2/n}}<\frac{67-70}{\sqrt{36/10}}\right)=P(Z<-1.58)=0.0571$
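The same calculation can be sketched in Python, with Φ computed from `math.erf` instead of read from tables. Using the unrounded standardised value z ≈ −1.581 gives about 0.0569; the table value 0.0571 corresponds to z rounded to −1.58, so the small difference is only a rounding effect.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """P(Z < z) for Z ~ N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, variance, n = 70, 36, 10
z = (67 - mu) / sqrt(variance / n)      # standardise using Var(x-bar) = sigma^2 / n
print(round(z, 3), round(std_normal_cdf(z), 4))
```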
Exercise
1. In the example above, what would be the chance that the sample mean does not exceed the population mean score by 5?
2. In a certain college the female scores in mathematics are normally distributed with mean 64 and variance 16, while the male scores in the subject are normally distributed with mean 70 and variance 16. A sample of 4 male students and 5 female students is taken. What is the chance that the mean score of the four males will be better than that of the five females?
3. Show algebraically that "If x̄₁, x̄₂, …, x̄_N are the means of random samples of size n taken from a normal distribution where X ~ N(μ, σ²), then the distribution of x̄ is also normal with mean μ and variance σ²/n."
17.2.1 The central limit theorem
If x̄₁, x̄₂, …, x̄ₙ are the means of random samples of size n taken from any distribution with mean μ and variance σ², then for large n the distribution of the sample mean x̄ is approximately normal, $\bar{x}\sim N\left(\mu,\frac{\sigma^2}{n}\right)$. Note that in practice n is taken to be at least 30.
Example
A random sample of size 40 is taken from an unknown distribution whose mean is known to be 8 and variance 4. Find the probability that the sample mean exceeds 6.
Sol:
We do not know whether the population follows a normal distribution or not. However, under the central limit theorem the distribution of x̄ will still be approximately normal with mean μ and variance σ²/n, since the sample size taken is large.
$P(\bar{x}>6)=P\left(\frac{\bar{x}-\mu}{\sqrt{\sigma^2/n}}>\frac{6-8}{\sqrt{4/40}}\right)=P(Z>-6.32)\approx 1$
Exercise
A sample of size 30 was taken from a Poisson distribution with parameter λ = 3. Find the probability that the sample mean falls between 4 and 7. (Hint: consider 4.5 and 6.5.)
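The central limit theorem can be illustrated by simulation. The Python sketch below assumes a clearly non-normal population, an exponential distribution with mean 8 (hence variance 64), draws 20,000 samples of size 40 with a fixed seed, and checks that the sample means centre on μ with variance close to σ²/n and that roughly 95% of them fall within 2 standard errors of μ, as a normal distribution would predict. The population, sample size and number of repetitions are all assumptions for illustration.

```python
import random
import statistics

random.seed(2)                 # fixed seed for reproducibility
mu = 8.0                       # assumed population: exponential with mean 8, so variance 64
n, reps = 40, 20_000           # sample size and number of repeated samples

sample_means = [
    statistics.fmean([random.expovariate(1 / mu) for _ in range(n)])
    for _ in range(reps)
]
m = statistics.fmean(sample_means)
v = statistics.fmean([(x - m) ** 2 for x in sample_means])
print(f"mean of sample means {m:.3f} (mu = {mu}), variance {v:.3f} (sigma^2/n = {mu ** 2 / n})")

# Rough normality check: about 95% of sample means should fall within 2 standard errors of mu.
se = (mu ** 2 / n) ** 0.5
inside = sum(abs(x - mu) < 2 * se for x in sample_means) / reps
print(f"proportion within 2 standard errors: {inside:.3f}")
```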
17.3 The distribution of the sample mean II (Student's t distribution)
The Student's t distribution is also one of the sampling distributions of the mean. It is a continuous probability distribution, and it arises when the population variance σ² is unknown. Recall that if X ~ N(μ, σ²), then x̄ ~ N(μ, σ²/n), so that $\frac{\bar{x}-\mu}{\sqrt{\sigma^2/n}}\sim N(0,1)$. If we only have the sample variance s², which in practice is usually the case, then the statistic $\frac{\bar{x}-\mu}{\sqrt{s^2/n}}$ follows the t distribution. W. S. Gosset, writing under the pen name "Student", first introduced the distribution. It was later proved by R. A. Fisher that the distribution is symmetrical and bell shaped, but not normal, and is of the form
$f(t)=c\left(1+\frac{t^2}{v}\right)^{-\frac{v+1}{2}}$ for $-\infty<t<\infty$,
where c is a constant that makes the total area equal to 1, and the parameter v is called the number of degrees of freedom. Normally v = n − 1, where n is the size of the sample taken. So in terms of n the distribution may be written as
$f(t)=c\left(1+\frac{t^2}{n-1}\right)^{-\frac{n}{2}}$
Example 1
A sample of size 9 of the heights of the students in a certain school was taken from a population with distribution N(1.5, σ²). The sample variance was found to be 0.63. What is the chance that the sample mean was less than 2?
Sol:
We are looking for P(x̄ < 2). Upon standardising we get
$P(\bar{x}<2)=P\left(\frac{\bar{x}-\mu}{\sqrt{s^2/n}}<\frac{2-1.5}{\sqrt{0.63/9}}\right)=P(t<1.89)$
Since σ² is unknown, the quantity $\frac{\bar{x}-\mu}{\sqrt{s^2/n}}$ follows the Student's t distribution with n − 1 = 8 degrees of freedom. The meaning here is the same as for the normal distribution: we have to evaluate the definite integral
$\int_{-\infty}^{1.89} f(t)\,dt=\int_{-\infty}^{1.89} c\left(1+\frac{t^2}{8}\right)^{-\frac{9}{2}}dt$
However, statistical tables have been constructed for this purpose. At 8 degrees of freedom the tabulated value t₀.₀₅ = 1.860, so the area to the right of 1.89 is just under 0.05. Hence P(x̄ < 2) = P(t < 1.89) ≈ 0.95.
(Refer to the Student's t tables at the back of the book.)
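Without tables, the t probabilities can be sketched numerically from the density itself. The Python code below computes the constant $c=\Gamma\left(\frac{v+1}{2}\right)/\left(\sqrt{v\pi}\,\Gamma\left(\frac{v}{2}\right)\right)$ through `math.lgamma` (to avoid overflow for large v) and integrates the density from 0 to |t| with Simpson's rule, using the symmetry of the distribution about 0. The function names and the number of Simpson steps are assumptions of this sketch, not a standard library API.

```python
from math import exp, lgamma, log, pi


def t_pdf(t, v):
    """Student's t density c (1 + t^2/v)^(-(v+1)/2), c = Gamma((v+1)/2) / (sqrt(v pi) Gamma(v/2))."""
    log_c = lgamma((v + 1) / 2) - lgamma(v / 2) - 0.5 * log(v * pi)
    return exp(log_c) * (1 + t * t / v) ** (-(v + 1) / 2)


def t_cdf(t, v, steps=2000):
    """P(T < t): 0.5 plus a Simpson's-rule integral of the density from 0 to |t| (uses symmetry)."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, v) + t_pdf(a, v)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, v)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


t_value = (2 - 1.5) / (0.63 / 9) ** 0.5        # standardised sample mean from Example 1
print(round(t_value, 3), round(t_cdf(t_value, 8), 3))   # about 1.89 and 0.952
```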
Example 2
The performance of students in a certain class was found to have a mean of 60. A sample of
is the chance that their score had exceeded 55 if their variance score was 64.
5 students was selected, what
Sol: We are looking for P ( x >55). Upon standardizing x we have
P ( x >55). = P (
x
s2
n

55  60
), which is P (t>-1.118).
100
5
From the student’s tables at 4 d.f we have P (t>-1.118) =1-P (t<-1.118)=1-0.1 = 0.99
At this point the reason that we need the student’s t distribution should now be very clear. However, the central limit theorem
does also apply to the student’s t-distribution. That the distributions of the sample means under the student’s t-distribution
will eventually follow under the normal distribution if n is very large (greater than 30). In example 2 above what could be the
solution to the question if a sample of size30 was taken instead of size 5?
71
Since n is large, by the central limit theorem the standardised value of $\bar{x}$ approximately follows the normal distribution even though the variance is not known. So we would have
$$P\left(\frac{\bar{x} - \mu}{\sqrt{s^2/n}} > \frac{55 - 60}{\sqrt{100/30}}\right) = P(Z > -2.739) \approx 0.997.$$
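The two versions of Example 2 can be compared side by side. This sketch uses the figures from the worked solution (variance 100) and Python's standard `statistics.NormalDist`. For the t probability at 4 d.f. it uses the known closed form $F(t) = \frac12 + \frac{3x}{4}\left(1 - \frac{x^2}{3}\right)$ with $x = t/\sqrt{t^2 + 4}$, which is valid only for $\nu = 4$. The function names are illustrative.

```python
import math
from statistics import NormalDist


def t_cdf_4df(t: float) -> float:
    """Closed-form CDF of Student's t with exactly 4 degrees of freedom."""
    x = t / math.sqrt(t * t + 4)
    return 0.5 + 0.75 * x * (1 - x * x / 3)


def standardised_mean(xbar: float, mu: float, var: float, n: int) -> float:
    """(xbar - mu) / sqrt(var / n), as in the worked examples."""
    return (xbar - mu) / math.sqrt(var / n)


# n = 5: small sample, Student's t with 4 d.f.
t5 = standardised_mean(55, 60, 100, 5)       # about -1.118
p_small = 1 - t_cdf_4df(t5)                  # P(t > -1.118)

# n = 30: large sample, standard normal approximation.
z30 = standardised_mean(55, 60, 100, 30)     # about -2.739
p_large = 1 - NormalDist().cdf(z30)          # P(Z > -2.739)

print(round(p_small, 3), round(p_large, 3))
```

The larger sample gives a much smaller standard error, so the event "mean exceeds 55" becomes almost certain.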
Exercise 17
1. If $X \sim N(200, 80)$ and a random sample of size 5 is taken from the distribution, find the probability that the sample mean
(a) is greater than 207,
(b) lies between 201 and 209.
2. If $X \sim N(200, 100)$ and a random sample of size 10 is taken from the distribution, find the probability that the sample mean lies outside the range 198 to 205.
3. If $X \sim N(50, 12)$ and a random sample of size 12 is taken from the distribution, find the probability that the sample mean
(a) is less than 48.5,
(b) is less than 52.3,
(c) lies between 50.7 and 51.7.
4. At a college, the masses of the male students are distributed approximately normally with mean mass 70 kg and standard deviation 5 kg. Four male students are chosen at random. Find the probability that their mean mass is less than 65 kg.
5. A normal distribution has a mean of 40 and a standard deviation of 4. If 25 items are drawn at random, find the probability that their mean is (a) 41.4 or more, (b) between 38.7 and 40.7, (c) less than 39.5.
6. If a large number of samples of size n are taken from a population which follows a normal distribution with mean 74 and standard deviation 6, (a) find n if the probability that the sample mean exceeds 75 is 0.282, (b) find n if the probability that the sample mean is less than 70.4 is 0.00135.
7. A normal distribution has a mean of 30 and a variance of 5. Find the probability that (a) the average of 10 observations exceeds 30.5, (b) the average of 40 observations exceeds 30.5, (c) the average of 100 observations exceeds 30.5. Find n such that the probability that the average of n observations exceeds 30.5 is less than 1%.
8. The r.v. X is such that $X \sim N(\mu, 4)$. A random sample of size n is taken from the population. Find the least n such that $P(|\bar{X} - \mu| < 0.5) > 0.95$.
9. $\bar{X}$ is the r.v. 'the sample mean of samples of size 15 taken from N(30, 18)' and $\bar{Y}$ is the r.v. 'the sample mean of samples of size 8 taken from N(20, 16)'. Find the distribution of
(a) $\bar{X} - \bar{Y}$, (b) $\bar{X} + \bar{Y}$, (c) $\bar{Y} - \bar{X}$, (d) $5\bar{X} + 3\bar{Y}$, (e) $4\bar{X} - 2\bar{Y}$.
10. In a certain country the heights of men are normally distributed with mean 175 cm and standard deviation 5 cm, and the heights of women are normally distributed with mean 165 cm and standard deviation 6 cm. Find the probability that the mean height of three women chosen at random is greater than the mean height of four men chosen at random from the population.
11. A random sample $X_1, X_2$ is drawn from a distribution with mean μ and standard deviation σ. State the mean and standard deviation of the distribution of
(a) $X_1 + X_2$, (b) $X_1 - X_2$, (c) $\bar{X}$.
A student's performance is equally good in two subjects. The marks she might be expected to score in each subject may be treated as independent observations drawn from a normal distribution with mean 45 and standard deviation 5. Two procedures might be used to decide whether to give the student an overall pass. One is to demand that she pass separately in each subject, the pass mark being 40; the other is to require that her mean mark in the two subjects exceed 40. Find the probability that the student will obtain an overall pass by each of these procedures.
12. In a certain nation, men have heights distributed normally with mean 1.70 m and standard deviation 10 cm. Find the probability that the average height of three men chosen at random is greater than 1.78 m, and the probability that all three have heights greater than 1.83 m.
For the same nation, women have heights distributed normally with mean 1.60 m and standard deviation 7.5 cm. Find the probability that a husband and wife differ in height by not more than 5 cm, and state the assumptions that you have made in the calculation.
13. $X_1$ and $X_2$ are random variables such that $X_1$ is normally distributed with mean 120 and variance 8 and $X_2$ is normally distributed with mean 150 and variance 22. A random sample of size 20 is taken from the distribution of $3X_1 + 4X_2$. Find the distribution of the sample mean.
14. Random variables X and Y are such that $X \sim N(100, 10)$ and $Y \sim N(120, 20)$. Random samples of size 50 are taken from each distribution. Find the probability that the sample from the distribution of Y will have a mean which is at least 21 more than the mean of the sample from the distribution of X.
15. (a) If X and Y are independent random variables with means $\mu_x, \mu_y$ and variances $\sigma_x^2, \sigma_y^2$ respectively, show from first principles that the mean and variance of $aX + bY$ are $a\mu_x + b\mu_y$ and $a^2\sigma_x^2 + b^2\sigma_y^2$ respectively, where a and b are constants. (b) The diameters x of 110 steel rods were measured in centimetres and the results were summarised as follows:
$$\sum x = 36.5, \qquad \sum x^2 = 12.49.$$
Find the mean and standard deviation of these measurements.
Assuming these measurements are a sample from a normal distribution with this mean and this variance, find the probability that the mean diameter of a sample of size 110 is greater than 0.245 cm.
16. A random sample of size 100 is taken from Bin(20, 0.6). Find the probability that (a) $\bar{X}$ is greater than 12.4, (b) $\bar{X}$ is less than 12.2, where $\bar{X}$ is the sample mean.
17. A random sample of size 30 is taken from Po(4). Find (a) $P(\bar{X} < 4.5)$, (b) $P(\bar{X} > 3.8)$, (c) $P(3.8 < \bar{X} < 4.5)$.
18. If a large number of samples of size n are taken from Po(4.6) and approximately 2.5% of the sample means are less than 4.005, estimate n.
19. If a large number of samples of size n are taken from Po(2.9) and approximately 1% of the sample means are greater than 3.41, estimate n.
20. If a large number of samples of size n are taken from Bin(20, 0.2) and approximately 90% of the sample means are less than 4.354, estimate n.
21. The standard deviation of the masses of articles in a large population is 4.55 kg. If random samples of size 100 are drawn from the population, find the probability that a sample mean will differ from the true population mean by less than 0.8 kg.
22. Two red balls and two white balls are placed in a bag. Balls are drawn one by one, at random and without replacement. The random variable X is the number of white balls drawn before the first red ball is drawn.
(i) Show that $P(X = 1) = \frac{1}{3}$, and find the rest of the probability distribution of X.
(ii) Find E(X) and show that $\mathrm{Var}(X) = \frac{5}{9}$.
(iii) The sample mean for 80 independent observations of X is denoted by $\bar{X}$. Using a suitable approximation, find $P(\bar{X} > 0.75)$.
23. The mass of coffee in a randomly chosen jar sold by a certain company may be taken to have a normal distribution with mean 203 g and standard deviation 2.5 g.
(i) Find the probability that a randomly chosen jar will contain at least 200 g of coffee.
(ii) Find the mass m such that only 3% of jars contain more than m grams of coffee.
(iii) Find the probability that two randomly chosen jars will together contain between 400 g and 405 g of coffee.
(iv) The random variable $\bar{C}$ denotes the mean mass (in grams) of coffee per jar in a random sample of 20 jars. Find the value of a such that $P(|\bar{C} - 203| < a) = 0.95$.
CHAPTER 18: ESTIMATION OF POPULATION PARAMETERS
18.1 Introduction
If, from the observations in a sample, a single value is calculated as an estimate of an unknown population parameter, the procedure is known as point estimation. If instead a population parameter is estimated to lie within a range between two values, we refer to the procedure as interval estimation. Interval estimation is generally preferred for decision making and inferential statistics. Recall from our discussion of measures of central tendency that simply regarding an arithmetic mean as representative of the entire population is much riskier than giving two limits that allow for variation about the mean. This shows the strength of interval estimation over point estimation. However, the two types of estimation are closely related, since interval estimation first requires a point estimate of the population parameter. Our discussion in this chapter will be centred on two population parameters, the mean μ and the variance σ².
18.2 Point estimation
Following what has been discussed in the previous section, what then should an estimator of a particular population parameter be, and how should it behave? For the population mean it is easy to say intuitively that its estimate should be the sample mean. Likewise, the estimator of the population variance should be a statistic more or less like the sample variance. But intuition does not always work for other population parameters.
Instead of relying on such intuitive knowledge, general procedures are employed to find an estimator of a population parameter, such as the method of moments, least squares estimation (already discussed) and the method of maximum likelihood. The treatment of these methods is beyond the scope of this presentation; a rigorous student is strongly advised to consult other references.
The methods mentioned above are meant to satisfy some very important characteristics of a good estimator of a population parameter, amongst which is that an estimator $\hat{\theta}$ of a population parameter θ should be unbiased, that is, $E(\hat{\theta}) = \theta$. From our discussion of sampling distributions we can see that the sample mean is an unbiased estimator of the population mean, since we proved that $E(\bar{x}) = \mu$. What is meant is that the estimator $\hat{\theta}$, averaged over all possible samples, equals the parameter θ.
Things are different for the sample variance: $E(s^2) \neq \sigma^2$, that is, the sample variance is not an unbiased estimator of the population variance σ². To obtain an unbiased estimator we make a slight adjustment and use $ns^2/(n-1)$ as an unbiased estimator of σ². However, when n is very large (preferably n > 30) the factor $n/(n-1)$ is close to 1, so the estimator $ns^2/(n-1)$ is practically equal to s².
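The bias can be seen exactly, without simulation, by listing every possible sample. This sketch assumes a hypothetical population {2, 4, 9} and samples of size 3 drawn with replacement (so the observations are independent), with $s^2 = \sum(x - \bar{x})^2/n$ as in this text. Exact fractions avoid any rounding.

```python
from fractions import Fraction
from itertools import product


def biased_var(sample):
    """s^2 = sum((x - xbar)^2) / n, the sample variance used in this text."""
    n = len(sample)
    xbar = Fraction(sum(sample), n)
    return sum((x - xbar) ** 2 for x in sample) / n


population = [2, 4, 9]  # hypothetical small population
n = 3                   # sample size
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

# Every ordered sample of size n drawn with replacement is equally likely.
samples = list(product(population, repeat=n))
mean_s2 = sum(biased_var(s) for s in samples) / len(samples)
mean_corrected = sum(Fraction(n, n - 1) * biased_var(s) for s in samples) / len(samples)

print(sigma2, mean_s2, mean_corrected)  # 26/3 52/9 26/3
```

The average of $s^2$ over all samples falls short of σ² by the factor $(n-1)/n$, while the average of $ns^2/(n-1)$ equals σ² exactly.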
Exercise
1. Show algebraically that $E\left[\frac{ns^2}{n-1}\right] = \sigma^2$, where $s^2 = \frac{\sum (x - \bar{x})^2}{n}$.
2. Find the unbiased estimates of the population mean and variance from the following sample values: 7, 8, 9, 10, 11, 12, 15.
18.3 Interval Estimation of Population Parameters
18.3.1 Introduction
As said earlier, it is sometimes preferable to give a range of values within which a population parameter may fall instead of a single value. Such an estimation of a population parameter is called interval estimation. Usually, an interval estimate is obtained after specifying a degree of confidence, called the confidence level, and the resulting interval is called a confidence interval. For instance, a 95% confidence interval for the population mean μ consists of limits a and b, computed from the sample, such that $P(a \le \mu \le b) = 95\% = 0.95$.
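18.3.2 Confidence Interval for μ
18.3.2.1 Confidence Interval for μ when the population variance σ² is known
Let us consider a 100(1 − α)% confidence interval for μ.
If $X \sim N(\mu, \sigma^2)$, then for any n, $\bar{x} \sim N(\mu, \sigma^2/n)$.
By standardisation we have
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}, \qquad \text{where } Z \sim N(0, 1).$$
We know that the central 100(1 − α)% of N(0, 1) lies between $\pm z_{\alpha/2}$.
(Sketch)
Therefore
$$-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}.$$
Multiplying throughout by $\sigma/\sqrt{n}$, we have
$$-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
Multiplying throughout by −1 (which reverses the inequalities, leaving the symmetric limits unchanged), we have
$$-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu - \bar{x} \le z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
Adding $\bar{x}$ throughout, we have
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
This may finally be abbreviated as $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$.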
Example
A sample of 19 students taking Agriculture General was taken to assess the performance of the students in Introductory Statistics. The mean score of the students was found to be 30. Find a 95% confidence interval for the mean score of all the students taking Agriculture General if it is known that the variance of their scores is 9.5.
Sol:
We know that the central 95% of the standard normal distribution lies between ±1.96.
From the formula established above, a 95% confidence interval is given by $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$.
Accordingly we have
$$30 \pm 1.96 \times \frac{3.1}{\sqrt{19}} = 30 \pm 1.4 = (28.6,\ 31.4),$$
where $3.1 \approx \sqrt{9.5}$.
The interpretation is that if a hundred such samples were taken and an interval computed from each, about 95 of the intervals would be expected to contain the true mean μ.
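A minimal sketch of the interval and of its "95 out of 100" interpretation follows. It uses Python's standard `statistics.NormalDist().inv_cdf` for $z_{\alpha/2}$. The simulation part assumes, purely for illustration, that the true mean is 30 and the true variance 9.5; the function names, seed and number of repetitions are hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def z_interval(xbar, sigma, n, level=0.95):
    """xbar +/- z_{alpha/2} * sigma / sqrt(n) when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for level 0.95
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half


# Worked example: n = 19, xbar = 30, sigma^2 = 9.5.
print(z_interval(30, math.sqrt(9.5), 19))  # roughly (28.6, 31.4)


def coverage(mu=30.0, sigma=math.sqrt(9.5), n=19, reps=2000, seed=1):
    """Fraction of simulated 95% intervals that contain the assumed true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        lo, hi = z_interval(xbar, sigma, n)
        hits += lo <= mu <= hi  # True counts as 1
    return hits / reps


print(coverage())  # close to 0.95
```

The simulation shows that "95%" describes the long-run proportion of intervals that capture μ, not a probability statement about one particular computed interval.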
Exercise
In one incident two people argued about the mean age of first-year students at a certain Tanzanian college, one saying 23 and the other 26. To settle the matter, a random sample of 10 students was taken and revealed a mean age of 25 years. By finding a 99% confidence interval for the population mean, comment on the dispute between the two if the population variance of the ages is known to be 16.
18.3.2.2 Confidence Interval for μ when the population variance σ² is unknown
The situation described in 18.3.2.1 is a very idealised one: how could one know the population variance and yet not know the population mean? Of course, one may take the population variance in the current study to be the same as that found in earlier studies of similar populations. This seems to be the only way of arriving at the situation described in 18.3.2.1. In practice, however, we rarely know either the population mean or the population variance.
So what happens when we do not know the population variance σ²? From what we learnt about the sampling distribution of the mean, the standardised $\bar{x}$ may still be treated as normal, depending on the sample size. Usually, if n is large, $\bar{x}$ is approximately normal in spite of the absence of the population variance; otherwise it follows the Student's t distribution.
Example 1
A random sample of 40 individuals indicated a mean height of 3 m and a variance of 0.2 m². Find a 99% confidence interval for the mean height of the population from which the 40 individuals were drawn.
Sol:
Although σ² is unknown, the sample mean is approximately distributed as $\bar{x} \sim N(\mu, \sigma^2/n)$ because n = 40 > 30, and s² may be used in place of σ².
The central 99% of the standard normal distribution lies between −2.575 and 2.575.
Accordingly, the 99% confidence interval for μ would be
$$3 \pm 2.575 \times \frac{\sqrt{0.2}}{\sqrt{40}} = 3 \pm 0.18 = (2.82,\ 3.18).$$
Example 2
A sample of 5 petty traders was taken to assess the monthly sales of petty traders in Morogoro urban area. The monthly sales, in hundred thousands of shillings, of the five petty traders were found to be 30, 26, 28, 35, 40. Find a 95% confidence interval for the mean monthly sales of all petty traders in the urban area.
Sol:
In this example, just as in Example 1, we know nothing about the population variance. Worse still, the sample size is very small. So of necessity the standardised $\bar{x}$ follows the Student's t distribution with 5 − 1 = 4 d.f. Consequently the 95% confidence interval is
$$\bar{x} \pm t_{(n-1),\,\alpha/2}\,\frac{\hat{\sigma}}{\sqrt{n}}, \qquad \text{where } \hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2}{n-1} = \frac{\sum x^2 - n\bar{x}^2}{n-1} = \frac{ns^2}{n-1}.$$
From the data given, $\bar{x} = 31.8$ and $\hat{\sigma} = 5.67$. The t values containing the central 95% of the distribution at 4 d.f. are ±2.776.
Thus the 95% C.I. for μ is
$$31.8 \pm 2.776 \times \frac{5.67}{\sqrt{5}} = 31.8 \pm 7.0 = (24.8,\ 38.8).$$
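Both examples share the form $\bar{x} \pm (\text{critical value}) \times (\text{standard deviation})/\sqrt{n}$; only the critical value and the standard deviation change. The sketch below assumes the critical values are read from tables (2.575 for the normal, 2.776 for t at 4 d.f.) and passed in, since Python's standard library has no t quantile function. `statistics.stdev` uses the divisor n − 1, so it returns $\hat{\sigma}$ directly. The name `mean_interval` is illustrative.

```python
import math
import statistics


def mean_interval(xbar, sd, n, crit):
    """xbar +/- crit * sd / sqrt(n); crit comes from z or t tables."""
    half = crit * sd / math.sqrt(n)
    return xbar - half, xbar + half


# Example 1: large sample, variance 0.2, normal critical value for 99%.
print(mean_interval(3, math.sqrt(0.2), 40, 2.575))  # roughly (2.82, 3.18)

# Example 2: small sample, t critical value for 95% at 4 d.f.
sales = [30, 26, 28, 35, 40]
sigma_hat = statistics.stdev(sales)  # sqrt(sum((x - xbar)^2) / (n - 1))
print(mean_interval(statistics.mean(sales), sigma_hat, len(sales), 2.776))
```

Note that the variance 0.2 in Example 1 must be square-rooted before use; it is the standard deviation that enters the formula.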
18.3.3 Confidence Interval for the difference between two population means (μ1 − μ2)
18.3.3.1 Introduction
Sometimes our interest is focused on the difference between two population means. For example, a medical doctor may wish to know which of two medicines, A and B, is more effective in treating a certain disease. Or a district education officer may be interested in comparing the performance of students in School A and School B. These situations call for scrutiny of the difference between the two population means.
18.3.3.2 Confidence Interval for (μ1 − μ2) when σ1² and σ2² are known
Suppose $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, and independent samples of sizes $n_1$ and $n_2$ are taken. Let us consider a 100(1 − α)% confidence interval for μ1 − μ2. It is given by
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
Exercise
Prove the above result that a 100(1 − α)% confidence interval for μ1 − μ2 is given by
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$
(Hint: consider the sampling distribution of $\bar{x}_1 - \bar{x}_2$.)
Example
Two samples of sizes 10 and 8 of animal weights were taken from two different populations of animals, type A and type B. The sample means were 5 kg for the first sample and 7 kg for the second. It is known that the weights for the two populations are normally distributed with variances 2.61 and 3.00 respectively. Find a 95% confidence interval for the difference between the two population means and comment on the result.
Sol:
We know that the central 95% of the standard normal distribution lies between ±1.96. Taking the difference as type B minus type A, the formula established above gives the 95% confidence interval for $\mu_B - \mu_A$ as
$$(\bar{x}_B - \bar{x}_A) \pm 1.96\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}} = (7 - 5) \pm 1.96\sqrt{\frac{2.61}{10} + \frac{3}{8}} = 2 \pm 1.6 = (0.4,\ 3.6).$$
Comment:
The interval does not contain zero, so the difference between the two population means is significant at the 5% level. In other words, as the samples indicate, animals of type B are heavier than those of type A.
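The two-sample interval can be sketched in a few lines of Python. The function name `two_sample_z_interval` is illustrative, and the critical value defaults to 1.96 (95%); the order of the arguments fixes which mean is subtracted from which.

```python
import math


def two_sample_z_interval(xbar1, var1, n1, xbar2, var2, n2, z=1.96):
    """(xbar1 - xbar2) +/- z * sqrt(var1/n1 + var2/n2)."""
    diff = xbar1 - xbar2
    half = z * math.sqrt(var1 / n1 + var2 / n2)
    return diff - half, diff + half


# Type B (n = 8, mean 7, variance 3.00) minus type A (n = 10, mean 5, variance 2.61).
lo, hi = two_sample_z_interval(7, 3.00, 8, 5, 2.61, 10)
print(round(lo, 2), round(hi, 2))
print("zero inside interval:", lo <= 0 <= hi)
```

With sample variances in place of σ² and large samples, the same function covers the pig-diet example of Case I below: `two_sample_z_interval(15, 25.61, 30, 13, 12, 40)` gives about (−0.1, 4.1).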
18.3.3.3 Confidence Interval for μ1 − μ2 when σ1² and σ2² are unknown
Just as in 18.3.2, the condition described in 18.3.3.2 is an ideal one. In most situations we know neither the population means nor the population variances. However, if the two population variances are unknown while both sample sizes are large enough (n1, n2 > 30), the sampling distribution of $\bar{x}_1 - \bar{x}_2$ still approximately follows the normal distribution; otherwise it follows the Student's t distribution.
Case I: when σ1² and σ2² are unknown, but the sample sizes are large enough (n1, n2 > 30)
Example 1
A random sample of 40 pigs was fed on diet A and another sample of 30 pigs was fed on diet B, and the increase in weight was noted in each case. The mean increase in weight was 13 kg for diet A and 15 kg for diet B, and the variances were 12 kg² and 25.61 kg² respectively. Determine a 95% confidence interval for the difference between the mean weight increases of the two populations and comment on the result.
Although σ1² and σ2² are unknown, the statistic $\bar{x}_1 - \bar{x}_2$ approximately follows the normal distribution, because n1 = 40 and n2 = 30 are both large, and the sample variances may be used in place of the population variances. So a 95% C.I. for $\mu_B - \mu_A$ is
$$(15 - 13) \pm 1.96\sqrt{\frac{12}{40} + \frac{25.61}{30}} = 2 \pm 2.1 = (-0.1\ \text{kg},\ 4.1\ \text{kg}).$$
Comment:
The two limits are of opposite signs, so the interval contains zero and the true difference may well be zero. Hence we conclude that the two diets do not differ significantly (at the 5% level) in their contribution to the pigs' mean increase in weight.
Case II: when σ1² and σ2² are unknown and the sample sizes are small (n1, n2 < 30)
We have seen a case in which the two population variances are unknown but the sample sizes are large. What about when the population variances are unknown and the sample sizes are not large?
When samples are small (usually taken to mean fewer than 30 observations), the sample variances $s_1^2$ and $s_2^2$ cannot be used as estimates of $\sigma_1^2$ and $\sigma_2^2$. An assumption is thus made that the two populations have the same variance, and we look for a pooled estimate of this common variance, i.e.
$$S_p^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}.$$
Further to that, the sampling distribution of $\bar{x}_1 - \bar{x}_2$, standardised using $S_p^2$, follows the Student's t distribution with $n_1 + n_2 - 2$ d.f. As a result, a 100(1 − α)% confidence interval for μ1 − μ2 is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\,\alpha/2}\sqrt{\frac{S_p^2}{n_1} + \frac{S_p^2}{n_2}}.$$
(Hint: consider the sampling distribution of $\bar{x}_1 - \bar{x}_2$.)
Example
The following were the students' scores in MB201 for random samples taken from two different degree programmes at the Sokoine University of Agriculture in the year 2002.
Table 18.1
Programme | Sample size | Sample mean | Sample variance
B.Sc. Agricultural Engineering | $n_1 = 12$ | $\bar{x}_1 = 7.5$ | $s_1^2 = 0.58$
Bachelor of Veterinary Medicine | $n_2 = 8$ | $\bar{x}_2 = 5.9$ | $s_2^2 = 3.1$
Find a 95% confidence interval for the difference between the mean scores and compare the students' performance in the two degree programmes.
Sol
Both population variances are unknown and both sample sizes are smaller than 30, so the formula above will be used.
From the Student's t tables we have $t_{n_1+n_2-2,\,\alpha/2} = t_{18,\,0.025} = 2.101$.
From the data given,
$$S_p^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{12(0.58) + 8(3.1)}{12 + 8 - 2} = 1.76.$$
Therefore the 95% C.I. for μ1 − μ2 is
$$(7.5 - 5.9) \pm 2.101\sqrt{\frac{1.76}{12} + \frac{1.76}{8}} = 1.6 \pm 1.3 = (0.3,\ 2.9).$$
Since the interval lies entirely above zero, the performance of the Engineering students is significantly better (at the 5% level) than that of the Veterinary students in this subject.
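A sketch of the pooled-variance interval follows. It takes the sample variances with divisor n (as in this chapter, so $n_i s_i^2 = \sum (x - \bar{x}_i)^2$) and assumes the t critical value 2.101 has been read from tables. The function name `pooled_t_interval` is illustrative.

```python
import math


def pooled_t_interval(n1, xbar1, s1_sq, n2, xbar2, s2_sq, t_crit):
    """Pooled-variance interval for mu1 - mu2; s1_sq and s2_sq use divisor n."""
    sp_sq = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)
    half = t_crit * math.sqrt(sp_sq / n1 + sp_sq / n2)
    diff = xbar1 - xbar2
    return sp_sq, (diff - half, diff + half)


# Table 18.1: Engineering (n = 12) versus Veterinary Medicine (n = 8), t_{18, 0.025} = 2.101.
sp_sq, (lo, hi) = pooled_t_interval(12, 7.5, 0.58, 8, 5.9, 3.1, t_crit=2.101)
print(round(sp_sq, 2), round(lo, 2), round(hi, 2))
```

If the sample variances had instead been computed with divisor n − 1, the numerator of the pooled estimate would be $(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2$; the function above follows this chapter's convention only.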
Exercise 18
1. The scores x of 63 seniors on a Graduate Record Examination showed $\sum x = 34{,}540$ and $\sum x^2 = 19{,}480{,}000$. Calculate the unbiased estimate of the population variance. What is your estimate of σ?
2. If two samples of sizes 10 and 15 are drawn from the same population and have variances of 2.40 and 2.70 respectively, what is an unbiased estimate of the population variance?
3. (a) When is an estimator q said to be an unbiased estimator of the population parameter Q?
(b) From a lot of capsules, the contents of five samples of four each were weighed in milligrams. The results follow:
Sample no. | Contents (mg)
1 | 62.3, 61.9, 63.1, 62.5, 62.5
2 | 62.0, 61.8, 61.9, 62.3, 62.1
3 | 62.9, 62.0, 61.9, 62.5, 61.8
4 | 62.8, 62.2, 62.6, 61.6, 62.9
Calculate the variance of each sample and from these variances estimate the population (lot) variance of the weighed contents of the capsules.
4. If the mean age at death of 64 men engaged in a somewhat hazardous occupation is 52.4 years with a standard deviation of 10.2 years, what are the 98% confidence limits for the mean age of all men so engaged?
5. 150 bags of flour of a particular brand are weighed and the mean mass is found to be 748 g with standard deviation 3.6 g. Find (a) 90%, (b) 95%, (c) 98% confidence intervals for the mean mass of bags of flour of this brand.
6. A special aptitude test was given to 26 law school freshmen. The results showed a mean score of 82.0 and a variance of 49.00. Set up the 90 percent confidence interval for the mean score of all law school freshmen.
7. In No. 6 above, suppose the population variance was known to be 52. Find a 90 percent confidence interval.
8. Four rats were fed a special ration during the first three months of their lives. The following gains in weight (grams) were noted: 55, 62, 58, 65. Find a 99% confidence interval for the population mean of rats fed with such a special ration.
9. Two laboratory assistants make 10 observations each on the same galvanometer for the same experiment. The average readings were 61 and 58 with variances of 0.60 and 0.40 respectively. Find a 95% confidence interval for the difference between the two mean readings.
10. Ten pupils from one school have a mean I.Q. of 108 and a variance of 60; 17 pupils from another school show a mean of 114 with a variance of 80. Find a 99% confidence interval and comment on the significance of the difference in I.Q. between the pupils of the two schools.
11. The total nitrogen (N) content (mg per cc) of rat blood plasma was determined for a group of 60 rats of age 50 days and for a group of 70 rats of age 80 days. The mean N content for the first group was 0.983 and the variance was 0.00253; for the second group the corresponding statistics were 1.042 and 0.00224 respectively. Find a 99% confidence interval and comment on whether N content varies with age.
12. The systolic blood pressures of a group of 60 patients showed $\bar{x} = 140$, $s_x = 10$. A second group of 60 showed $\bar{y} = 145$ and $s_y = 13$. Find a 95% confidence interval for the mean difference between the two groups.
CHAPTER 19: HYPOTHESIS TESTING
19.1 Introduction
A hypothesis is a statement which can be either true or false. It is a supposition about a certain characteristic of a population. We need hypotheses in statistics because our investigations are oftentimes based on sample data rather than on the entire population. In relying upon sample data we will always be uncertain about the size or nature of the population parameters. Hypothesis testing is a general procedure for reaching a statistical decision based on sample statistics. Interval estimation, as we have seen, may also serve the same purpose in decision-making. When making a statistical enquiry, we often put forward a hypothesis concerning a population parameter. For example,
- The mean score of the students in Group 1 is 50%.
- The mean height of 15-year-old girls is 1.62 m.
These hypotheses are called null hypotheses and are denoted by H0.
In order to test the validity of H0, we consider the observations made from random samples taken from the populations and perform a statistical test. If the statistical test indicates that we should reject the null hypothesis H0, we do so in favour of an alternative hypothesis, denoted by H1. But such a statistical test is never perfect. It may reject a correct null hypothesis and accept a wrong alternative hypothesis, and vice versa. For that reason a statistical test for a population parameter is performed after specifying the level of significance α, i.e. the probability of rejecting the null hypothesis when it is in fact true. In other words, the probability of accepting a true null hypothesis is 1 − α. If a true null hypothesis is rejected, we have committed a so-called Type I error. On the other hand, when a false null hypothesis is accepted, we have committed a so-called Type II error.
19.2 The test statistic
Suppose we wish to investigate whether the mean of a normal population is 50 or not.
The hypotheses to be stated would be
H0: μ = 50 (i.e. the population mean is 50)
H1: μ ≠ 50 (i.e. the population mean is not 50)
But how can we know whether the mean is 50 or not? It would seem miraculous to know this directly. The only possible way is by having extra information about the population. Suppose a value was picked at random from the population and found to be 40. Can we use this random observation to decide whether the mean is really 50? Indeed we can. Since the population mean is ideally representative of the entire population, under normal circumstances this value should be close to the suggested mean μ; otherwise the suggested mean is not the true mean. To make this precise we standardise the value X. The standardised value Z is then known as the test statistic. In this example we would use
$$z = \frac{X - 50}{\sigma}.$$
If z is close to zero, i.e. |z| is small, we accept that the sample value could have been taken from a population with mean 50 and we do not reject H0; otherwise we reject H0 and conclude that the population mean is not 50. However, is this statistic reliable? As a matter of fact, it is very risky to use just a single value as our test statistic, as we have done. The quality of a test statistic depends upon a number of factors, among which is the number of observations it involves. For that reason, in practice we use statistics such as the sample mean and the sample variance, which involve more than one observation. In general, a good test statistic is based on an estimate of the population parameter under consideration.
19.3 Critical region and critical values
In the preceding section we spoke of rejecting or accepting a null hypothesis depending on whether the standardised value is close to zero or not. But how close should it be? To avoid a subjective conclusion we need to specify limits on either side of zero within which the value of Z is expected to fall if H0 is true. This calls for an interval similar to the confidence interval discussed earlier. Normally we select a set of values containing 100(1 − α)% of the distribution, for which the null hypothesis is accepted. The region in which the null hypothesis is rejected is called the critical region, and the boundaries of the critical region are called the critical values.
19.4 One-tailed and two-tailed tests
The test considered above, μ = 50 vs μ ≠ 50, is an example of a two-tailed test. In a two-tailed test, the population parameter may be either greater than or less than the specified value. For example, μ ≠ 25 means either μ > 25 or μ < 25.
In a one-tailed test we are interested in either a definite increase or a definite decrease of a population parameter from a specified value. The hypotheses could thus be either μ = 25 vs μ < 25, or μ = 25 vs μ > 25.
19.5 Procedures for carrying out a statistical test
In general, when performing a significance test it is useful to follow a set procedure. The following steps can roughly be followed:
- State the null hypothesis H0 and the alternative hypothesis H1.
- If we are looking for a definite increase or a definite decrease in the population parameter, we use a one-tailed test; if we are looking for any change, we use a two-tailed test.
- Consider the appropriate distribution given by the null hypothesis.
- Decide on the level of the test. This fixes the critical values of the test statistic.
- Decide on the rejection criteria.
Now consider the sample values and:
- Calculate the value of the test statistic.
- Make a conclusion: if the value of the test statistic lies in the critical region, reject H0; if it does not lie in the critical region, do not reject H0.
19.6 Testing for μ = C based on a sample mean
19.6.1 When the population variance σ² is known
When the sample statistic considered is the sample mean $\bar{x}$, the test statistic is obtained by standardising $\bar{x}$, and is given by
$$Z = \frac{\bar{x} - C}{\sigma/\sqrt{n}}.$$
Example
A sample of the test scores of 49 students was taken and the mean score was found to be 68.9. The variance of the test scores of all the students in the school is known to be 36. Is there sufficient evidence at the 3% level to assert that the mean score of all the students was 70?
Sol
(i) H0: μ = 70; H1: μ ≠ 70
(ii) We suppose that indeed μ = 70. Accordingly $\bar{x} \sim N(70,\ 36/49)$.
(iii) The z values containing the central 97% of the distribution are −2.17 and 2.17.
(iv) The test statistic is
$$Z = \frac{\bar{x} - C}{\sigma/\sqrt{n}} = \frac{68.9 - 70}{6/\sqrt{49}} = -1.28.$$
(v) The absolute value of the calculated statistic, 1.28, is smaller than the critical value 2.17. Hence we do not reject H0 and conclude that the population mean could be 70 at the 3% level of significance.
19.6.2 When the population variance σ² is unknown
As noted before on confidence intervals, the preceding section describes an ideal situation. Oftentimes we know neither the population mean nor the population variance. In that case the test statistic involves the sample variance s² rather than the population variance σ². Accordingly we have
$$Z = \frac{\bar{x} - C}{s/\sqrt{n}} \ \text{for large } n, \qquad t = \frac{\bar{x} - C}{\hat{\sigma}/\sqrt{n}} \ \text{for small } n,$$
where the t statistic has n − 1 degrees of freedom.
Example 1
A normal distribution is thought to have a mean of 50. A random sample of 100 gave a sample mean $\bar{x} = 52.6$ and s = 14.5. Is there sufficient evidence that the population mean differs from 50? Test at the 5% level.
Sol:
In this example the population variance is not known, but the sample size is very large (> 30); hence the distribution of the sample mean is approximately normal.
(i) H0: μ = 50; H1: μ ≠ 50
(ii) We suppose that indeed μ = 50. Accordingly $\bar{x} \sim N(50,\ 14.5^2/100)$ approximately.
(iii) The z values containing the central 95% of the distribution are −1.96 and 1.96.
(iv) The test statistic is
$$Z = \frac{\bar{x} - C}{s/\sqrt{n}} = \frac{52.6 - 50}{14.5/\sqrt{100}} = 1.79.$$
(v) The calculated value 1.79 is smaller than the critical value 1.96. Hence we do not reject H0 and conclude that the population mean could be 50 at the 5% level of significance.
Example 2
A sample of size 5 of the scores by student’s in-group E yielded the following results.
Table 19.1
Student 1
2
3
4
5
Score
50
45
40
60
50
Can we assert at 5% level of significance that the mean score in-group E was 55?
Sol:
In this example the population variance is not known and, worse, the sample size is small ($n < 30$); hence the standardised sample mean follows Student's $t$ distribution with $n - 1 = 4$ d.f. From the data given, $\bar{x} = 49$ and $\hat{\sigma} = 7.42$.
(i) $H_0: \mu = 55$ against $H_1: \mu \neq 55$.
(ii) We suppose that indeed $\mu = 55$. Accordingly $\dfrac{\bar{x} - 55}{\hat{\sigma}/\sqrt{5}}$ follows the $t$ distribution with 4 d.f.
(iii) The $t$ values containing 95% of the entire distribution at 4 d.f. are $-2.776$ and $2.776$.
(iv) The test statistic is $t = \dfrac{\bar{x} - C}{\hat{\sigma}/\sqrt{n}} = \dfrac{49 - 55}{7.42/\sqrt{5}} = -1.808$.
(v) The calculated value $|t| = 1.808$ is smaller than the critical value 2.776. Hence we conclude that the population mean could be 55 at the 5% level of significance.
19.7 Testing for the difference between two population means, $\mu_1 - \mu_2 = C$
19.7.1 Testing for $\mu_1 - \mu_2 = C$ based on the sample means when the population variances $\sigma_1^2$ and $\sigma_2^2$ are known
If $X_1$ and $X_2$ are independent (unpaired) samples of sizes $n_1$ and $n_2$ such that $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, then
$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$.
Since the population variances are given, the standardised value of $\bar{x}_1 - \bar{x}_2$ follows the standard normal distribution, and the test statistic is
$Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - C}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$.
Example
A random sample of size 100 is taken from a normal population with variance $\sigma_1^2 = 40$; the sample mean $\bar{x}_1$ is 38.3. Another random sample, of size 80, is taken from a normal population with variance $\sigma_2^2 = 30$; the sample mean $\bar{x}_2$ is 40.1. Test at the 5% level whether there is a significant difference between the population means $\mu_1$ and $\mu_2$.
Sol:
With $C = 0$ the test statistic gives
$Z = \dfrac{(38.3 - 40.1) - 0}{\sqrt{40/100 + 30/80}} = -2.04$.
Since $|Z| = 2.04$ exceeds the critical value 1.96 at the 5% level of significance, we reject the null hypothesis that $\mu_1 - \mu_2 = 0$.
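A sketch of the two-sample Z statistic of Section 19.7.1 follows; `two_sample_z` is a hypothetical helper name. The large-sample statistic of Section 19.7.2 has the same form with the sample variances in place of the population variances, so the same function also covers that case when sample variances are passed in.

```python
import math

def two_sample_z(m1, var1, n1, m2, var2, n2, c=0.0):
    """Z = ((m1 - m2) - C) / sqrt(var1/n1 + var2/n2)."""
    return ((m1 - m2) - c) / math.sqrt(var1 / n1 + var2 / n2)

# Example of Section 19.7.1: known variances 40 and 30
z = two_sample_z(38.3, 40, 100, 40.1, 30, 80)
print(round(z, 2), abs(z) > 1.96)
```

For instance, `two_sample_z(140, 100, 60, 145, 169, 60)` evaluates the blood-pressure data of Example 1 in Section 19.7.2 and gives about -2.36.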
19.7.2 Testing for $\mu_1 - \mu_2 = C$ based on the sample means when the population variances $\sigma_1^2$ and $\sigma_2^2$ are unknown
Once again let us consider the realistic situation in the field, where the population variances are usually unknown. The sampling distribution of the difference of the means then follows either the $t$ distribution or the standard normal ($Z$) distribution, depending on the sample sizes $n_1$ and $n_2$. If both sample sizes are large, the test statistic is
$Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - C}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$,
whereas if the sample sizes are small the test statistic is
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - C}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$,
where $s_p^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$ is the pooled estimate of the common population variance, and $t$ has $n_1 + n_2 - 2$ degrees of freedom.
Example 1.
The systolic blood pressures of a group of 60 patients showed $\bar{x} = 140$, $s_x = 10$. A second group of 60 showed $\bar{y} = 145$ and $s_y = 13$. Is there a significant difference between the mean blood pressures of the two groups?
Sol: In this example the population variances are not known, but both sample sizes are larger than 30. Therefore
$Z = \dfrac{(140 - 145) - 0}{\sqrt{100/60 + 169/60}} = -2.36$.
Since $|Z| = 2.36$ exceeds the critical value 1.96 at the 5% level of significance, we reject the null hypothesis that $\mu_1 - \mu_2 = 0$.
Example 2
The following were the students' scores in MB201 for random samples taken from two different degree programmes at the Sokoine University of Agriculture in the year 2002.
Table 19.2
                   B.Sc. Agric Engineering    Bachelor of Veterinary Medicine
Sample size        n1 = 12                    n2 = 8
Sample mean        7.5                        5.9
Sample variance    0.58                       3.1
Test at the 5% level of significance whether $\mu_1 - \mu_2 = 0$ or not, and hence comment on the students' performance in the two degree programmes.
Sol
As can be seen, the two population variances are unknown and the sample sizes are smaller than 30.
From the Student's $t$ tables we have $t_{n_1 + n_2 - 2,\ \alpha/2} = t_{18,\ 0.025} = 2.101$.
From the data given,
$s_p^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \dfrac{12(0.58) + 8(3.1)}{12 + 8 - 2} = 1.76$.
Therefore
$t = \dfrac{(7.5 - 5.9) - 0}{\sqrt{1.76/12 + 1.76/8}} = 2.64$.
We reject the null hypothesis because $2.64 > 2.101$. It is thus evident that the performance of the Engineering students is significantly better than that of the Veterinary Medicine students in this subject.
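The pooled-variance calculation of this example can be sketched as follows. The helper `pooled_t` is hypothetical, and it follows the text's formula literally: it assumes the tabulated sample variances are divisor-n variances, so that n s² is each sample's sum of squared deviations.

```python
import math

def pooled_t(m1, s1_sq, n1, m2, s2_sq, n2, c=0.0):
    """Small-sample two-sample t with a pooled variance.

    Assumes s1_sq and s2_sq are divisor-n sample variances, as in the
    pooled formula (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2).
    Returns (t, pooled variance, degrees of freedom).
    """
    df = n1 + n2 - 2
    sp_sq = (n1 * s1_sq + n2 * s2_sq) / df
    t = ((m1 - m2) - c) / math.sqrt(sp_sq / n1 + sp_sq / n2)
    return t, sp_sq, df

t, sp_sq, df = pooled_t(7.5, 0.58, 12, 5.9, 3.1, 8)
print(f"s_p^2 = {sp_sq:.2f}, t = {t:.2f} on {df} d.f.")
```

The resulting t is then compared with the tabulated value for n1 + n2 - 2 degrees of freedom.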
19.8 A test for paired observations
The examples in the preceding section involved two independent random samples taken from two different populations. Sometimes observations are made in pairs, and as a consequence the two samples are not independent. For example, one may be interested in comparing the mean performance of students in two different subjects, say Mathematics and History; the two scores of each student form a paired observation. If $x$ and $y$ are the two variables observed jointly, the distribution of their sum or difference cannot easily be established unless we involve $\mathrm{cov}(x, y)$.
Instead, we consider the distribution of the differences $d_i = x_i - y_i$. The test for $\mu_x - \mu_y = C$ is then simply the test for $\mu_d = C$; in particular, the test for equality of the two population means is the test for $\mu_d = 0$. The test statistic is thus
$Z = \dfrac{\bar{d} - C}{s_d/\sqrt{n}}$ for large $n$, and $t = \dfrac{\bar{d} - C}{\hat{\sigma}_d/\sqrt{n}}$ for small $n$ (with $n - 1$ d.f.).
Example
The following were the points scored by the students of B.Sc. Environmental Sciences & Management in two different subjects (BIOMETRY and DEV-STUDY) in the June/July 2001 University Examination at Sokoine University of Agriculture.
Table 19.3 (scores of the students in the two subjects)
Test the hypothesis that the performance of the students in the two subjects is the same.
Sol
These are paired observations, so we consider the differences $d_i = x_i - y_i$, as shown in the table below:
Table 19.4 (rows X, Y and $d_i = x_i - y_i$ for the $n = 20$ students)
Considering the differences $d_i$ (the third row of Table 19.4), we have $\bar{d} = -0.06$, $\hat{\sigma}_d = 1.43$ and $n = 20$. Since $n$ is small, the test statistic is
$t = \dfrac{\bar{d} - C}{\hat{\sigma}_d/\sqrt{n}} = \dfrac{-0.06 - 0}{1.43/\sqrt{20}} = -0.1876$.
The $t$ values containing 95% of the entire distribution at 19 d.f. are $-2.093$ and $2.093$.
Since $|t| = 0.1876 < 2.093$, we accept the null hypothesis that $\mu_d = 0$, meaning that the performance in the two subjects is not significantly different at the 5% level of significance.
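A paired test only needs the differences d_i. The sketch below (the helper `paired_t` is hypothetical) computes the mean difference, sigma-hat_d with divisor n - 1, and the t statistic with n - 1 d.f. A tiny made-up data set (hypothetical) exercises the function, and the example above is re-evaluated from its summary figures.

```python
import math
import statistics

def paired_t(xs, ys, c=0.0):
    """t = (d_bar - C) / (sigma_hat_d / sqrt(n)) on d_i = x_i - y_i; returns (t, d.f.)."""
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(xs, ys)]
    n = len(d)
    d_bar = statistics.mean(d)
    sd_hat = statistics.stdev(d)   # divisor n - 1
    return (d_bar - c) / (sd_hat / math.sqrt(n)), n - 1

# Made-up pairs (hypothetical data) to exercise the function
t_demo, df_demo = paired_t([3, 4, 5], [1, 4, 4])
print(round(t_demo, 3), df_demo)

# Summary figures of the example: d_bar = -0.06, sigma_hat_d = 1.43, n = 20
t_example = (-0.06 - 0) / (1.43 / math.sqrt(20))
print(round(t_example, 4))
```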
Exercise: 19
1. What is a statistical hypothesis? Explain the essence of having hypotheses in statistics.
2. The following were random samples of the weights of six children under five years of age at Kigugu village: 12, 13, 14, 11, 15, 8, 9. Can the mean weight for the entire village be equal to 20? Test at the 5% level of significance.
3. Two samples drawn from two different normal populations revealed the following.
S/N   Sample size   Sample variance   Sample mean
1     8             2                 12
2     12            3                 14
Do the two population means differ significantly?
4. The following table shows the mean number of bacteria colonies per plate obtained by four slightly different methods from soil samples taken at 4 P.M. and 8 P.M. respectively.
Method   4 P.M.   8 P.M.
A        29.75    39.20
B        27.50    40.60
C        30.25    36.20
D        27.80    42.40
Are there significantly more bacteria at 8 P.M. than at 4 P.M.?
5. If X and Y are two dependent variables corresponding to two different populations, derive the formulae for the mean and variance of the sampling distribution of the mean difference of the two populations.
6. Ten soldiers visit the rifle range in two consecutive weeks. In the first week their scores were 67, 24, 57, 55, 63, 54, 56, 68, 33, 43. In the second week their scores, in the same order, were 70, 38, 58, 58, 56, 67, 68, 75, 42, 38. Is there any significant improvement?
CHAPTER 20: THE $\chi^2$ DISTRIBUTION
20.1 Introduction
If the variables $Z_1, Z_2, \ldots, Z_n$ are independent and each has the $N(0, 1)$ distribution, then $y = Z_1^2 + Z_2^2 + \cdots + Z_n^2$ follows the so-called chi-square ($\chi^2$) distribution with $n$ degrees of freedom. The p.d.f. of a chi-square distribution is given as
$y = C\, e^{-\chi^2/2} (\chi^2)^{(n/2) - 1}$,
where $C$ is a constant depending on $n$, the number of degrees of freedom.
Sketch of the curve
Consider a random sample $x_1, x_2, \ldots, x_n$ taken from the normal distribution $N(\mu, \sigma^2)$. Based on the given definition, the statistic
$\chi^2 = \dfrac{(x_1 - \mu)^2}{\sigma^2} + \dfrac{(x_2 - \mu)^2}{\sigma^2} + \cdots + \dfrac{(x_n - \mu)^2}{\sigma^2}$
would follow the $\chi^2(n)$ distribution. However, if $\mu$ is unknown and is estimated from the data by $\bar{x}$, we have
$\chi^2 = \dfrac{(x_1 - \bar{x})^2}{\sigma^2} + \dfrac{(x_2 - \bar{x})^2}{\sigma^2} + \cdots + \dfrac{(x_n - \bar{x})^2}{\sigma^2} = \dfrac{(n-1)\hat{\sigma}^2}{\sigma^2}$,
which follows the $\chi^2(n-1)$ distribution.
Exercise
1. Show that the mean of the chi-square distribution with $n$ degrees of freedom is $n$.
2. Show that $\dfrac{(n-1)\hat{\sigma}^2}{\sigma^2}$ follows the $\chi^2(n-1)$ distribution.
(Hint: The sum of two independent chi-square variates with $n$ and $m$ degrees of freedom follows the chi-square distribution with $n + m$ d.f.)
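The loss of one degree of freedom when $\mu$ is replaced by $\bar{x}$ can be checked by simulation, using the fact (Exercise 1 above) that a $\chi^2(n)$ variable has mean $n$. This Python sketch uses only the standard `random` module; the values mu = 5, sigma = 2, the seed and the number of replications are arbitrary choices for illustration, and `mean_chi_square_sums` is a hypothetical name.

```python
import random

def mean_chi_square_sums(n, sigma=2.0, mu=5.0, reps=20000, seed=1):
    """Monte Carlo averages of sum((x - mu)^2)/sigma^2 and sum((x - xbar)^2)/sigma^2."""
    rng = random.Random(seed)
    known, estimated = 0.0, 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        known += sum((x - mu) ** 2 for x in xs) / sigma ** 2       # mu known
        estimated += sum((x - xbar) ** 2 for x in xs) / sigma ** 2  # mu estimated by xbar
    return known / reps, estimated / reps

print(mean_chi_square_sums(5))
```

With n = 5 the two averages come out close to 5 and 4 respectively, in line with $\chi^2(n)$ and $\chi^2(n-1)$.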
20.2 The sampling distribution of the variance
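An immediate application of this result is to inferences about the population variance $\sigma^2$. Consider a $(1-\alpha)100\%$ confidence interval for $\sigma^2$. We know that the chi-square values containing $(1-\alpha)100\%$ of the $\chi^2(n-1)$ distribution are $\chi^2_{n-1,\,1-\alpha/2}$ and $\chi^2_{n-1,\,\alpha/2}$. Accordingly
$\chi^2_{n-1,\,1-\alpha/2} < \dfrac{(n-1)\hat{\sigma}^2}{\sigma^2} < \chi^2_{n-1,\,\alpha/2}$,
which is the same as
$\dfrac{(n-1)\hat{\sigma}^2}{\chi^2_{n-1,\,\alpha/2}} < \sigma^2 < \dfrac{(n-1)\hat{\sigma}^2}{\chi^2_{n-1,\,1-\alpha/2}}$.
Here $\chi^2_{\nu,\,p}$ denotes the value exceeded with probability $p$, and $(n-1)\hat{\sigma}^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$.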
Example
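Establish a 95% confidence interval for the variance of a normal distribution using the random sample below: 4, 5, 7, 8, 10.
Sol
From the data, $\bar{x} = 6.8$ and $\sum (x_i - \bar{x})^2 = 22.8$, so $\hat{\sigma}^2 = 22.8/4 = 5.7$ (equivalently, $s^2 = 22.8/5 = 4.56$), while $\chi^2_{4,\,0.025} = 11.143$ and $\chi^2_{4,\,0.975} = 0.484$.
Therefore the 95% confidence interval for $\sigma^2$ is
$\dfrac{4 \times 5.7}{11.143} < \sigma^2 < \dfrac{4 \times 5.7}{0.484}$, that is, $2.05 < \sigma^2 < 47.11$.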
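The interval can be computed with a short sketch. `statistics.variance` uses divisor n - 1, so `(n - 1) * statistics.variance(data)` is the sum of squared deviations; the two chi-square table values are passed in because the standard library has no chi-square quantile function. The helper name `variance_ci` is hypothetical.

```python
import statistics

def variance_ci(data, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2 from chi-square table values.

    chi2_upper: value with alpha/2 in the upper tail (11.143 for 4 d.f., 95%)
    chi2_lower: value with 1 - alpha/2 in the upper tail (0.484 for 4 d.f., 95%)
    """
    n = len(data)
    ss = (n - 1) * statistics.variance(data)   # sum of squared deviations
    return ss / chi2_upper, ss / chi2_lower

low, high = variance_ci([4, 5, 7, 8, 10], 11.143, 0.484)
print(f"{low:.2f} < sigma^2 < {high:.2f}")
```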
Exercise
A sample of 5 students in a certain class showed a variance of 20 in their mathematics scores. Is it possible that the variance of the scores of the entire class is 25? Test at the 10% level of significance.
20.3 The uses of a chi-square distribution
Apart from being the sampling distribution of the variance, the chi-square distribution has many other uses resulting from the following theorem:
If $O_i$ and $E_i$ are the observed and expected frequencies of the $i$th event in a given random experiment consisting of $n$ events, then the quantity
$\sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$
follows (approximately) the $\chi^2(n-1)$ distribution.
The proof of this theorem is beyond the scope of this presentation. However, the theorem finds applications in many practical inquiries dealing with data involving counts. Below are some of the applications of this theorem.
20.3.1 The test of goodness of fit (single classification)
The chi-square distribution can be used to test whether a set of observations follows a suggested distribution or not.
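Example
A student was asked to throw a die 100 times and recorded the following results.
Score on x            1    2    3    4    5    6
Observed frequency    24   12   14   15   26   11
Does the data support the hypothesis that the die is unbiased? Test at the 5% level.
Sol
1. Under the hypothesis that the die is unbiased, every score has the same expected frequency. The observed frequencies total 102, so each expected frequency is 102/6 = 17:
Score on x            1    2    3    4    5    6
Observed frequency    24   12   14   15   26   11
Expected frequency    17   17   17   17   17   17
2. The quantity
$\sum_{i=1}^{6} \dfrac{(O_i - E_i)^2}{E_i} = \dfrac{49 + 25 + 9 + 4 + 81 + 36}{17} = 12.0$.
3. With $6 - 1 = 5$ d.f., the 5% critical value is $\chi^2_{5,\,0.05} = 11.070$. Since $12.0 > 11.070$, we reject the hypothesis that the die is unbiased at the 5% level.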
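The goodness-of-fit statistic from the theorem above is a one-line sum. In this sketch (the helper names `chi_square_statistic` and `expected_from_ratio` are hypothetical) expected frequencies are obtained by splitting the observed total according to the hypothesised proportions; the critical value still comes from chi-square tables with (number of classes - 1) d.f.

```python
def chi_square_statistic(observed, expected):
    """Sum over classes of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("need one expected frequency per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def expected_from_ratio(total, ratio):
    """Split a total count according to theoretical proportions, e.g. 9:3:3:1."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

# Die example: six equally likely faces
die_obs = [24, 12, 14, 15, 26, 11]
die_exp = expected_from_ratio(sum(die_obs), [1] * 6)
print(round(chi_square_statistic(die_obs, die_exp), 2))

# Mendel's peas (Example 2 below): proportions 9:3:3:1
peas = [315, 108, 101, 32]
peas_stat = chi_square_statistic(peas, expected_from_ratio(sum(peas), [9, 3, 3, 1]))
print(round(peas_stat, 2))
```

Applied to Mendel's data in Example 2 below, the same functions give a statistic of about 0.47, to be compared with the tabulated value at 3 d.f.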
Example 2
Mendel reported the following results for a dihybrid (double-heterozygous) cross with peas:
1. Round/yellow: 315
2. Round/green: 108
3. Wrinkled/yellow: 101
4. Wrinkled/green: 32
The theory predicts that the frequencies should be in the proportions 9:3:3:1. Are these results consistent with the hypothesis of independent segregation and simple dominance of yellow over green and of round over wrinkled?
Sol
Example 3
Genetic theory states that children having one parent of blood type M and the other of blood type N will always be of one of the types M, MN or N, and that the proportions of the three types will on average be 1:2:1. A report states that out of 300 children having one M parent and one N parent, 30% were found to be of type M, 45% of type MN and the remainder of type N. Test whether the genetic theory is true or not.
Sol
20.3.2 The test of independence on the association of attributes (double classification)
So far, we have considered cases in which individuals/observations are classified by a single criterion. Sometimes, however, individuals are classified by more than one criterion. The chi-square distribution can then be used to test whether the two criteria of classification are independent or dependent.
Example.
Four hundred school children were classified as left-handed or right-handed, and as left-eye-dominant or right-eye-dominant. The following were the results:
                Left-eyed   Right-eyed   Total
Left-handed     27          27           54
Right-handed    110         236          346
Total           137         263          400
Is there any association between being left-handed and being left-eyed, and vice versa?
Sol
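For a two-way table the text does not spell out how the expected frequencies are obtained. The sketch below uses the usual rule under independence, expected count = (row total × column total) / grand total, with (r - 1)(c - 1) degrees of freedom, and applies no continuity correction; these are standard choices assumed here, and `contingency_chi_square` is a hypothetical name.

```python
def contingency_chi_square(table):
    """Chi-square statistic for independence in an r x c table of counts.

    Expected count for cell (i, j) = row_total_i * col_total_j / grand_total.
    Returns (statistic, degrees of freedom (r - 1)(c - 1)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Handedness (rows) by eye dominance (columns), totals excluded
stat, df = contingency_chi_square([[27, 27], [110, 236]])
print(f"chi-square = {stat:.2f} on {df} d.f.")
```

For the handedness table this gives a statistic of about 6.88 on 1 d.f., to be compared with the tabulated chi-square value at the chosen level.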
20.3.3 Remarks on the $\chi^2$ distribution
(a) Each expected frequency should be at least 5, and preferably much larger. If one or more of the expected frequencies falls below 5, we pool the smaller classes to form larger ones until the condition is fulfilled. In combining frequency classes in this way we lose degrees of freedom.
(b) The sum of the expected frequencies must equal the sum of the observed frequencies.
(c) The number $m$ of classes or cells should preferably be neither too large nor too small. If $5 \le m \le 20$, one is usually on the safe side.
CHAPTER 21: THE F-DISTRIBUTION: A TEST FOR $\sigma_1^2 = \sigma_2^2$
21.1 Introduction
When testing for the difference between two population means with the population variances unknown and the sample sizes small, we have so far assumed that the two population variances are equal and used the pooled variance as an estimate of the common variance of the two populations. This assumption does not always hold, and sometimes we need to test whether really $\sigma_1^2 = \sigma_2^2$.
Such a test can be made using the so-called F-distribution, which can be derived from the chi-square distribution using the following theorem:
"Given two independent chi-square variables $\chi^2_{\nu_1}$ and $\chi^2_{\nu_2}$, the quantity $F = \dfrac{\chi^2_{\nu_1}/\nu_1}{\chi^2_{\nu_2}/\nu_2}$ follows the F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom."
The p.d.f. of the F-distribution is given by
$y = C\, F^{(\nu_1 - 2)/2} \left(1 + \dfrac{\nu_1}{\nu_2} F\right)^{-(\nu_1 + \nu_2)/2}$,
where $C$ is a constant depending on $\nu_1$ and $\nu_2$.
Of particular interest is the ratio of two sample variances corresponding to sample sizes $n_1$ and $n_2$. Recall that $\dfrac{(n-1)\hat{\sigma}^2}{\sigma^2}$ follows $\chi^2(n-1)$. It follows that if $\hat{\sigma}_1^2$ and $\hat{\sigma}_2^2$ are the unbiased variance estimates from two independent samples of sizes $n_1$ and $n_2$ taken from two different normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, the quantity
$\dfrac{(n_1 - 1)\hat{\sigma}_1^2 \big/ \left[\sigma_1^2 (n_1 - 1)\right]}{(n_2 - 1)\hat{\sigma}_2^2 \big/ \left[\sigma_2^2 (n_2 - 1)\right]} = \dfrac{\hat{\sigma}_1^2/\sigma_1^2}{\hat{\sigma}_2^2/\sigma_2^2}$
follows the F-distribution with $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom. We write this as $F(n_1 - 1, n_2 - 1)$.
If we assume that the null hypothesis $\sigma_1^2 = \sigma_2^2$ is true, the quantity becomes simply
$F = \dfrac{\hat{\sigma}_1^2}{\hat{\sigma}_2^2}$,
which is the ratio of two unbiased estimators of the two population variances.
Sketch of the F-curve
Example
In testing the percentage ash content, 17 tests on one shipment of coal showed $s = 2.66$ percent, and 21 tests on a second shipment showed $s = 4.55$ percent. Test $H_0: \sigma_1^2 = \sigma_2^2$ against $H_1: \sigma_1^2 \neq \sigma_2^2$ with $\alpha = 0.10$.
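A sketch of the variance-ratio computation for this example follows. It assumes the quoted s values are the unbiased standard deviations sigma-hat, and it puts the larger variance in the numerator, a common convention that lets the upper-tail F table (alpha/2 = 0.05 here) be used; `variance_ratio` is a hypothetical helper.

```python
def variance_ratio(s_a, n_a, s_b, n_b):
    """F = larger variance / smaller variance, with matching (numerator, denominator) d.f."""
    var_a, var_b = s_a ** 2, s_b ** 2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

f, df1, df2 = variance_ratio(2.66, 17, 4.55, 21)
print(f"F = {f:.3f} with ({df1}, {df2}) d.f.")
```

Here F is about 2.93 with (20, 16) d.f., to be compared with the tabulated upper 5% point of F(20, 16).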
21.2 The uses of the F-distribution
The F-distribution is frequently used in the analysis of variance, where one considers variation from two different sources contained in a set of observations. The ratio of the variances from the two sources tells whether there is a significant difference between the two sources of variation; if there is, the sources differ significantly in their contribution to the total variation. At the moment we can use the F-distribution to test the validity of a given regression line.
Recall that
TOTAL VARIATION = EXPLAINED VARIATION + UNEXPLAINED VARIATION,
or $\mathrm{TSS} = R^2\,\mathrm{TSS} + (1 - R^2)\,\mathrm{TSS}$,
where $R^2$ is the coefficient of determination, $R^2$ = EXPLAINED VARIATION / TOTAL VARIATION.
We have two sources of variation: the explained variation $R^2\,\mathrm{TSS}$, with 1 degree of freedom, and the unexplained variation $(1 - R^2)\,\mathrm{TSS}$, with $n - 2$ degrees of freedom. Hence
$F = \dfrac{R^2\,\mathrm{TSS}/1}{(1 - R^2)\,\mathrm{TSS}/(n - 2)} = \dfrac{R^2 (n - 2)}{1 - R^2}$,
so that under simple linear regression the ratio of the two variations follows $F(1, n - 2)$.
$R^2$ is required to be significant, meaning that the explained variation is significantly larger than the unexplained variation.
Example
The mental ages $x$ and the scores on a test $y$ of a group of 4 boys were as follows.
x    5    5    7    8
y    0    5    8    10
(a) Find the regression line of $y$ on $x$.
(b) Test at the 5% level whether the regression line is relevant or not.
(c) Comment on part (a) in light of the result in part (b).
(d) On the basis of the result in part (b), estimate the score at the age of 6.
Sol
(a) $y = -10.22 + 2.56x$
(b) From the data given, $R^2 = 0.7768$. The calculated $F$ is $\dfrac{R^2 (n - 2)}{1 - R^2} = \dfrac{0.7768 \times 2}{1 - 0.7768} = 6.96$.
The $F$ value at 1 and 2 d.f. (5% level) is 18.51.
Since the calculated value 6.96 is smaller than 18.51, the regression sum of squares is not significantly larger than the error sum of squares. Hence the regression line is not significant at the 5% level.
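The computations in parts (a) and (b) can be sketched as follows. The least-squares formulas b = Sxy/Sxx and a = ȳ - b x̄ are the standard ones and are assumed here; `regression_f_test` is a hypothetical name. The returned F is to be compared with the tabulated F value at 1 and n - 2 d.f.

```python
def regression_f_test(xs, ys):
    """Least-squares line y = a + b x, R^2, and F = R^2 (n - 2) / (1 - R^2) on (1, n - 2) d.f."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx                     # slope
    a = ybar - b * xbar               # intercept
    r_sq = sxy ** 2 / (sxx * syy)     # coefficient of determination
    f = r_sq * (n - 2) / (1 - r_sq)   # explained / unexplained, per degree of freedom
    return a, b, r_sq, f

a, b, r_sq, f = regression_f_test([5, 5, 7, 8], [0, 5, 8, 10])
print(f"y = {a:.2f} + {b:.2f}x, R^2 = {r_sq:.4f}, F = {f:.2f} on (1, 2) d.f.")
```

Run on the data above, it reproduces the intercept -10.22 and gives a slope of about 2.56, R² of about 0.777 and F of about 6.96.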
Exercise: 20-21
1. A sample of 30 drawn from a normal distribution had a variance of 10. Find the 90% confidence interval for the population variance.
2. Assume that, for a certain age group of Tanzanian males, systolic blood pressures show a variance of 268. A selected sample of 20 men from this age group had a variance of 313. May one conclude that this age group represents a population with variance not equal to 268?
3. Samples of the mathematics scores of 5 males and 4 females were taken. Find the probability that the variance of the female scores is 9 times larger than that of the male scores.
4. The following were paired observations of x and y: (3,1), (4,3), (4,2), (2,5), (1,7). Find the linear regression of y on x. Is the regression line significant at the 5% level?
5. The following table shows the association, among 1000 schoolboys, between their general ability and their mathematical ability. Find out whether there is a connection between one's mathematical ability and one's general ability.
                      Math ability
General ability   Good   Fair   Poor   Total
Good              44     265    41     350
Fair              22     257    91     370
Poor              4      178    98     280
Total             70     700    230    1000
6. It is known that, over a period of five years, a certain college comprised 600 first-years, 400 second-years and 300 third-years. Can a group of 20 first-years, 10 second-years and 5 third-year students found at random be said to belong to such a college?