Chapter one
Introduction
1.1. Definition of Statistics
How do we define Statistics?
It has two meanings. In the more common usage (layman definition), statistics refers to a
collection of numerically expressed facts or data.
Examples:
The number of colleges in a city;
The number of students in a college;
Per capita income statistics;
Statistics of imports, exports, consumption, etc;
But the subject statistics has a much broader meaning than just collecting and publishing numerical
information.
Therefore, we define statistics as the science of collecting, organizing, presenting, analyzing, and
interpreting numerical data to assist in making more effective decisions.
According to Dominick Salvatore and Derrick Reagle “statistics refers to collection, presentation,
analysis and utilization of numerical data to make inferences and reach decisions in the face of
uncertainty in economics, business and other social and physical sciences.”
As the definition suggests:
• The first step in investigating a problem is to collect data.
• The data must then be organized in some way and perhaps presented in a chart.
• Only after the data have been organized and presented can we analyze and interpret them.
Example: If economics students at a university would like to know the monthly household income of
200 residents in a town, then they
a) have to collect the data, that is, the income of the households under study,
b) should organize the data (say, by arranging them in ascending or descending order),
c) should present the data by using charts, tables, etc., and
d) should do some analysis (say, find the average, median, mode, variance, standard
deviation, etc.) and interpret the data.
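The organizing and analysis steps of this example can be sketched in a few lines of Python. The income figures below are hypothetical values invented for illustration (they are not data from this module), and the eight households are treated as the whole group of interest, so the population forms of the variance and standard deviation are used.

```python
import statistics

# Hypothetical monthly household incomes in Birr (illustrative values only).
incomes = [1200, 3000, 1500, 3000, 2500, 1800, 3000, 2200]

# Step b) organize: arrange the data in ascending order.
ordered = sorted(incomes)

# Step d) analyze: compute the usual numerical summaries.
summary = {
    "mean": statistics.mean(ordered),
    "median": statistics.median(ordered),
    "mode": statistics.mode(ordered),
    "variance": statistics.pvariance(ordered),  # population variance
    "std_dev": statistics.pstdev(ordered),      # population standard deviation
}

print("Ordered data:", ordered)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Presenting the data (step c) would normally use a table or chart; it is left out here to keep the sketch short.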
1.2. Types of Statistics
The study of statistics is usually divided into two categories:
a) Descriptive Statistics
• It is a statistical method that deals with describing (summarizing) a given set of data without
making conclusions about a larger population.
• It consists of the collection, organization and presentation of data in an informative way;
that is, no decision is involved.
• Tables, graphs and numerical summary measures may be used to describe data.
• In descriptive statistics, the statistician tries to describe a situation.
Examples on descriptive statistics:
1) Consider the national census conducted by the Ethiopian government in 1999 E.C. Results of
this census give the average age, average household income, and other characteristics of the
Ethiopian population and these are descriptive statistics.
2) A survey found that 49% of the population of Ethiopia is male. The statistic 49 describes
the number out of every 100 persons who are male.
3) According to Consumer Reports, Sony TV owners reported 2 defective TVs per 100 TVs
(2%) in 2001. The statistic 2 (2%) describes the number of problems out of every 100 TVs.
4) According to the Bureau of Labor Statistics, the average daily wage of workers in a town
was Birr 15 in August 2007.
5) The GDP of country X was 100 million in 1960 and 140 million in 2007. If we calculate the
percentage growth of GDP from 1960 to 2007, that is still a descriptive statistic. What is
the percentage growth of GDP from 1960 to 2007?
[Answer: $\frac{140 - 100}{100} \times 100\% = 40\%$]
Query: Would it be descriptive statistics if we used this GDP growth rate (40%) to estimate
the GDP of country X in the year 2010? Why? What type of statistics is it?
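The arithmetic in example 5 can be checked with a short Python sketch. The function name `percentage_growth` is a hypothetical helper introduced only for this illustration.

```python
def percentage_growth(initial, final):
    """Percentage change from an initial value to a final value."""
    return (final - initial) * 100 / initial


# GDP of country X: 100 million in 1960 and 140 million in 2007.
growth = percentage_growth(100, 140)
print(f"GDP growth from 1960 to 2007: {growth:.0f}%")
```

The result only summarizes the two observed GDP figures, which is why it counts as a descriptive statistic.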
b) Inferential Statistics
• It is also called statistical inference or inductive statistics.
• It is a statistical method that involves taking a sample from a population, computing a statistic
based on the sample, and inferring from the statistic the value of the corresponding
parameter.
• It is a branch of statistics that is used to determine something about a population on the basis of a
sample taken from that population.
• It is a decision, estimate, prediction, or generalization about a population, based on a sample.
Examples:
1) The accounting department of a large firm selects a sample of invoices in order to check the accuracy of
all the invoices of the company.
2) Wine tasters sip a few drops of wine to make a decision with respect to all the wine waiting to be
released for sale.
Note the words “population” and “sample” in the definition of inferential statistics.
• A population is a collection of all possible individuals, objects or measurements of interest. When a
researcher gathers data from the whole population for a given measure of interest, it is called a census
(complete enumeration).
• A sample is a portion or part of the population of interest.
When we discuss inferential statistics, we have to differentiate between a parameter and a statistic.
• A parameter is a value calculated from a population (say, the population mean, population standard
deviation, etc.) and a statistic is a value calculated from a sample (say, the sample mean, sample standard
deviation, etc.). The difference between a sample statistic and its corresponding parameter is called the
sampling error.
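The relationship between a parameter, a statistic and the sampling error can be illustrated with a small simulation. This is only a sketch: the population of 1,000 household incomes is generated artificially, and the sample size of 50 and the fixed seed are arbitrary assumptions.

```python
import random
import statistics

random.seed(1)  # fixed seed so that the sketch gives repeatable numbers

# Hypothetical population: monthly incomes (Birr) of 1,000 households.
population = [random.randint(500, 5000) for _ in range(1000)]
parameter = statistics.mean(population)   # population mean (a parameter)

# Draw a sample of 50 households without replacement.
sample = random.sample(population, 50)
statistic = statistics.mean(sample)       # sample mean (a statistic)

# Sampling error = sample statistic minus the corresponding parameter.
sampling_error = statistic - parameter

print(f"Population mean (parameter): {parameter:.2f}")
print(f"Sample mean (statistic):     {statistic:.2f}")
print(f"Sampling error:              {sampling_error:.2f}")
```

In practice the parameter is unknown, so the sampling error cannot be computed directly; the simulation knows the whole population only because it generated it.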
Example on sample vs. population:
i. If we want to do research on the impact of high school GPA (transcript result) on the college GPA of
economics students at a university, the population is all economics students at that university.
ii. A researcher may select all students of economics at Debre Markos University as a sample to study
the impact of high school GPA on college GPA and infer (conclude) something about the impact of
high school GPA on the college GPA of economics students at all Ethiopian colleges/universities.
Exercise
The marketing department of a bank asked a sample of 1960 customers to try a newly developed banking
system. Of the 1960 customers sampled, 1176 said they would use the new system if it were marketed. What would the
marketing department report to the bank officials regarding the acceptance of the new system in the
population? Is this an example of descriptive or inferential statistics?
Solution:
Based on the sample of 1960 customers, we estimate that, if the new system is marketed, sixty percent
(1176/1960 × 100%) of all customers will use it. This is inferential statistics, because a sample
was used to draw a conclusion about how all customers in the population would react if the new system
were marketed.
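The sample proportion used in the solution can be computed as follows; the variable names are chosen only for this illustration.

```python
sample_size = 1960   # customers asked to try the new system
would_use = 1176     # customers who said they would use it

# Sample proportion, used as an estimate of the population proportion.
proportion = would_use / sample_size
print(f"Estimated share of all customers who would use the system: {proportion:.0%}")
```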
1.3. Why Do We Study Statistics?
Statistics is required for many college programs such as business, economics, engineering,
psychology, medicine, etc. The course content is basically the same. The biggest differences are the
examples used and the level of mathematics required. Colleges of business and
economics usually teach the course at a more applied level.
Thus, in business and economics, we are interested in such things as:
• Profits (revenue minus cost),
• Gross Domestic Product (GDP),
• Demand,
• Supply,
• Consumption,
• Cost,
• Wages, etc.
Dear distance learners, why is statistics required in so many fields of study? We study statistics
for the following reasons:
1) The first reason is that numerical information is everywhere.
If you look in magazines in Ethiopia, you will find a lot of numerical
information such as exchange rates (say, $1 = 10 birr), unemployment rates (say, 5% in
Bahir Dar, where unemployment rate = unemployed / labor force), per capita income
(= Gross National Income / Population), the consumption rate of cement, exports of coffee,
imports of cars, the inflation rate, the demand for kerosene, enrollment rates of high schools, etc.
Therefore, to be an educated consumer of this information, an understanding of the
concepts of basic statistics will be useful.
2) Students and/or professionals may be called on to conduct research in their fields, since statistical
procedures are basic to research.
To accomplish this, they must be able to design experiments; collect, organize, analyze
and summarize data and possibly make reliable predictions or forecast for future use.
They must also be able to communicate the results of the study in their own words.
3) Students, like professionals, must be able to read and understand the various statistical studies
performed in their field. To have such understanding, they must be knowledgeable about the
vocabulary, concepts and statistical procedures used in these studies.
4) Data are everywhere. No matter what your future line of work, you will make decisions that
involve data, and an understanding of statistical methods will help you make these decisions more
effectively.
1.4. Uses of Statistics
The importance of statistics is clearly stated in the following words of Carol D. Wright of the USA: “To a very
striking degree, our culture has become a statistical culture. Even a person who may never have heard of
an index number is affected by those index numbers which describe the cost of living. It is impossible to
understand psychology, sociology, economics, business, finance, or physical science without some
general idea of the meanings of an average, of variations, of sampling, of how to interpret charts and
tables.”
According to H.G. Wells, “statistical thinking will one day be as necessary for effective
citizenship as the ability to read and write.”
The main function of statistics is to enlarge our knowledge of complex phenomena. That is:
i. It presents facts in a definite and precise form. Example: Instead of saying that the per capita income
of Ethiopia is low, it is better and clearer to say that it is 110.
ii. It reduces data, i.e., it simplifies a complex mass of data and presents it in a few clear and useful
summaries. Bulky data may be summarized in totals, averages, percentages, etc.
iii. It measures the magnitude of variation in data.
iv. It furnishes techniques of comparison.
v. It helps to estimate an unknown population parameter from a sample.
vi. It helps to formulate and test hypotheses.
vii. It helps to study the relationship between two or more variables.
viii. It helps to forecast future events.
1.5. Users of Statistics
Most people become familiar with statistics through radio, television, newspapers, and magazines,
and statistical methods are used in almost all fields of human endeavor.
Statistical methods help people identify and solve many problems concerning the environment,
the economy, transportation, public health and other matters of public concern.
Economists use statistical techniques to predict future economic conditions, to understand
economic problems, to formulate economic policies, to do research in the areas of
economics, to do market analysis, etc.
Doctors use such methods to determine whether certain drugs help in the treatment of medical
problems.
Weather forecasters use statistics to help them predict the weather more accurately.
Engineers use it to set standards for product safety and quality.
Statistical ideas help scientists design effective experiments.
Lawyers are increasingly turning to statisticians to help weigh evidence and determine
reasonable doubt.
In education, researchers might want to know whether new methods of teaching are better than the
old ones.
1.6. Application of Statistics in Business and Economics
• Nowadays, the success of a particular business or industry depends very much on the accuracy
and precision of statistical analysis.
• Before undertaking a new venture or improving an existing one,
business executives must have a large number of quantitative facts. Examples:
  - cost of raw materials,
  - demand for products in the market,
  - price of products in the market,
  - various taxes to be paid,
  - labor conditions,
  - sales forecasts.
• All these facts are to be analyzed statistically before starting a new enterprise or before
fixing the price of a commodity.
• Statistical methods are now used
  - for exploring possibilities for advertising campaigns,
  - for the adjustment of production methods, and
  - as an aid in establishing standards.
• Statistical techniques help in forecasting future markets.
• Market research and market surveys by statistical sampling methods are now extremely useful for
any business person.
• In industry, statistics is widely used in quality control.
• In production engineering, statistical tools such as inspection plans, control charts, etc. are of great
use in finding whether a product conforms to specification.
• Statistics is widely applied in insurance companies, where premium rates are
fixed on the basis of mortality, average length of life, possibilities of investment, etc.
1.7. Limitations of Statistics
i. Statistics deals only with quantitative information, i.e., the information should be capable of
being expressed numerically, either directly or indirectly.
ii. Statistics deals only with aggregates of facts and not with individual data items.
iii. Statistical data are only approximately, and not mathematically, correct.
iv. Statistics can be easily misused and, therefore, should be used only by experts.
Misuse of statistics
Knowingly:
• Advertising media
• Government for political cause
• Inappropriate comparison
• Incomplete information
Unknowingly:
• Lack of knowledge in
  - Statistics
  - The subject matter to which it is applied
1.8. Steps of Statistical Investigation
A statistical study involves the following stages:
i. Determining the objective of the study;
ii. Collecting the data;
iii. Organizing the collected data;
iv. Presenting the data;
v. Analyzing the data; and
vi. Interpreting the results of the study and making recommendations.
1.9. Types of Variables
What is a variable?
• A variable is a measurable characteristic of a given phenomenon (object, process, event, etc.) which
can take different values in a given population or sample of elements; that is, it is a characteristic of
each element of a population or a sample.
Examples:
  - annual income (it can be Birr 2000, Birr 3000, Birr 4000, or any other value),
  - quantity demanded (it can be 2000 units, 3000 units, 4000 units, or any other value),
  - price (it can be Birr 2 per unit, Birr 4 per unit, Birr 10 per unit or any other value),
  - gender (female or male), etc.
• Data (singular: datum):
  - are the set of values collected for the variable from each of the elements of the sample;
  - are the actual measurements or observations that result from an investigation or survey;
  - are the values (responses) of the variable associated with the elements of a population or a
sample.
Example:
  - The variable monthly household income of a family in a town can assume different values
(say, Birr 1000, Birr 3000, etc.). But if we collect the monthly household incomes of 100
households, then the collected values are called data.
• Data set: a collection of data values. Example: the monthly household incomes of 100
residents in a town form a data set.
• Raw data: data collected in their original form (not yet organized).
• Information: a set of data corresponding to a specific aspect of knowledge, combined in an
organized way. Information is processed data that can be used directly. It can transfer knowledge and
meaning.
From the point of view of statistical methods, variables can be broadly classified into qualitative (or
categorical) and quantitative (or numerical) variables.
Qualitative Variable:
• When the characteristic being studied is non-numeric, the variable is called a qualitative variable or
attribute.
• It is a variable or characteristic which cannot be measured in quantitative form but can only be
identified by name or category.
• Examples include gender, religious affiliation, type of automobile owned, place of birth, eye color,
etc.
• When the data are qualitative, we are usually interested in how many or what proportion fall in each
category. For example, what percent of the population is male? What percent of the population
owns a Nokia mobile phone?
• Note that, although numerical codes can be assigned to the different categories of
a qualitative variable, arithmetic operations (addition, subtraction, multiplication and division) are not
applicable to qualitative data.
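For qualitative data, the natural summary is a count or percentage per category. The following Python sketch shows this with a small list of hypothetical survey responses (invented for illustration only).

```python
from collections import Counter

# Hypothetical responses to the question "gender" (illustrative values only).
genders = ["female", "male", "female", "female", "male"]

# How many observations fall in each category?
counts = Counter(genders)

# What percentage of the observations falls in each category?
shares = {category: count * 100 / len(genders) for category, count in counts.items()}

for category in counts:
    print(f"{category}: {counts[category]} ({shares[category]:.0f}%)")

# Coding the categories as 1 = female, 0 = male would not make them numeric:
# the codes are only labels, so an "average gender" has no meaning.
```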
Quantitative Variable:
• It is a variable that can be measured and expressed numerically.
• Examples: balance in your checking account, minutes remaining in class, number of children in a
family, time taken to finish an exam, etc. Quantitative variables can be classified as either discrete
or continuous.
1) Discrete variables can assume only certain values, and there are usually “gaps” between the values.
Discrete variables can be assigned values such as 0, 1, 2, 3, 4, 4.5, 7.75, etc., and are said to be
countable; typically, discrete variables result from counting. Examples: the number of
bedrooms in a house, the number of cars sold at a car market, etc.
2) Continuous variables can assume any value within a specified range. Examples: the pressure in
a tire, the weight of a stone, the height of students in a class, the distance from Debre Markos to
Bahir Dar, age, temperature, etc. Typically, continuous variables result from measuring something,
and therefore their values must be rounded to the limit of the measuring device.
Review exercises
1) A common (layman's) definition of statistics is:
a. A Collection of numerical values
b. A single value
c. The sum of several values
d. The largest value in a set of observations
2) In descriptive statistics our main objective is to
a) Infer something about the sample
b) Describe the data we collected
c) Infer something about the population
d) Compute an average and conclude about the population from which the data is collected
3) Which of the following statements is true regarding a population?
a) It must be a large number of values
b) It must refer to people
c) It is a collection of individuals, objects, or measurements
d) None of the above
4) Which of the following statements is true regarding a sample?
a) It is a part of population
b) It is a subset of the population
c) It is taken because a census is sometimes costly
d) All of the above are correct
5) A qualitative variable
a) Always refers to a sample
b) Is not numeric
c) Is numeric
d) All of the above are correct
6) A discrete variable is
a) An example of a qualitative variable
b) Can assume only whole number values
c) Can assume only certain clearly separated values
d) Cannot be negative
e) All except A
7) In inferential statistics our main objective is to
a) Describe the population
b) Describe the data we collected
c) Infer something about the population based on the sample
d) Compute an average
8) In each of these statements, tell whether descriptive or inferential statistics have been used.
a) In the year 2015, the enrolment rate of elementary schools in Ethiopia will be 100%.
b) The average household income for people aged 25-34 is birr 2000/month.
c) Drinking coffee may raise cholesterol levels by 7%.
d) Some economists say that National Bank of Ethiopia (NBE) may increase the interest rate on
deposits to lower the money supply of the economy.
9) Classify each of the following variables as qualitative or quantitative.
a) Color of the automobile
b) Number of desks in classrooms
c) Gender (1=female, 0=male)
d) Number of pages in a book
10) Classify each of the following variables as discrete or continuous.
a) Water temperature of the Sauna at a given health spa
b) Income of a household
c) Life time of batteries in a tape recorder
d) Weights of newly born infants at a certain hospital
11) Consider the following:
The selling price of a house depends on the following factors:
a. Number of bedrooms
b. Size of the house in square feet
c. Swimming pool (1=yes, 0=no)
d. Distance from the center of the city
e. Township
f. Garage Attached (1=yes, 0=no)
g. Number of bathrooms
Which of the variables given above are qualitative and which are quantitative? Why?
12) Briefly explain the difference between the following concepts and give examples, if necessary.
a) Qualitative variable vs. quantitative variable
b) Quantitative data vs. qualitative data
c) Descriptive statistics vs. inferential statistics
d) Sampling vs. census
e) Parameter vs. statistic
f) Sample vs. population
13) Describe the importance of Statistics for an Economist.
14) Select a newspaper article (say, from the Ethiopian Herald) that involves a statistical study and write a paper
answering the following questions.
a. Is the study descriptive or inferential in nature? Explain your answer.
b. What are the variables used in the study? Classify the variables as qualitative or quantitative.
15) Which one of the following is not true?
a. A population is sometimes referred to as the universe
b. The height of Ras Dashen mountain, 4440 m, can be considered a continuous variable
c. The age of students at Debre Markos University is a variable
d. None
16) The difference between the sample mean and the population mean is called
a) Population mean
b) Population standard deviation
c) Standard error of the mean
d) Sampling error
17) The number of TVs sold by a certain shop during the months of November, December, January and
February, respectively are 25, 40, 35, and 32. Indicate whether the following conclusions belong to
the domain of descriptive statistics or inferential statistics.
a) During the four months, the average number of TVs sold per month was 33
b) Since the average number of TVs sold per month was small, the shop should invest more on
advertisement.
c) Out of the four months, the sale in November was the least.
d) The number of TVs sold in December was the highest because of Christmas.
18) Is a student's ID number a qualitative or quantitative variable? Why?
19) Is the plate number of a car a qualitative or quantitative variable? Why? How about a house number?
Why?
20) Is a telephone number/region number a qualitative or quantitative variable? Why?
Chapter Two
Sampling Theory
Chapter Objectives:
When you have completed this chapter, you are expected to:
• Comprehend the basic concepts of sampling theory.
• Understand the reasons for sampling.
• Identify the basic sampling techniques.
• Demonstrate a knowledge of basic sampling methods.
• Apply sampling theory in business and economics.
2.1. Basic Concepts of Sampling Theory
Students are expected to know the following concepts in sampling theory:
1) Population or universe is the group of all elements/observations (persons, animals, objects,
measurements, etc.) under consideration in a certain problem. The word population is a technical
term in statistics, not necessarily referring to people.
Examples:
• All students in this university;
• All households in Debre Markos town;
• All light bulbs produced by a firm in a single day;
• All fish in a lake, etc.
2) Census is a collection of data from the whole population (that is, complete enumeration). It is the
actual measurement or observation of all possible elements from the population or it is a survey
of everyone in the population.
3) Reference population (source or target population) is the population of interest, to which the
researcher would like to generalize the results of the study. Example: If a researcher would like
to study the effect of a new fertilizer on crop yield in Ethiopia, then the reference population is all
farmers in Ethiopia who are using the new fertilizer.
4) Sampling theory is a study of relationships existing between a population and samples drawn
from the population. Attaining a specified precision at minimum cost is the main intention of
sampling theory. In sampling theory population is often required as an assumption.
5) Sample is the small group that is chosen for the study. It is a part, portion or subset of a
population, taken so that some generalizations about the population can be made. The main
concern in sampling is to ensure that the sample accurately represents the population we are
interested in studying. That is, samples are taken in such a way that they will be representative of the
population.
6) Sampling is the process of selecting a finite number of elements from a given
population of interest for the purposes of an inquiry. Example: In industry, the quality of a product is
assessed through sampling; public opinion on social, economic and political problems is
ascertained through sampling.
7) Sample size is the number of individuals or observations in a sample (usually denoted by n).
8) Parameter is any measurable characteristic of a population. Examples: population mean,
population standard deviation, population median, etc.
9) Statistic is a number resulting from the manipulation of sample data. That is, it is any measurable
characteristic of a sample. Examples: sample mean, sample standard deviation, sample median,
etc. A statistic is used to estimate a population parameter such as the population mean (μ),
the population standard deviation (σ), etc.
10) The sampling error is the difference between a sample statistic and its corresponding population
parameter. It is the error that occurs because a sample has been taken instead of a census. For
example: the sample mean may differ from the true population mean.
11) Sampling unit is the ultimate unit to be sampled (an element of the population to be sampled). It is
the unit of selection in the sampling process. Examples:
• In a sample of households, the sampling unit is a household;
• In a sample of students, the sampling unit is a student;
• In a sample of districts, the sampling unit is a district, etc.
12) Sampling frame is the list of all possible units in the reference population, from which a sample
is to be drawn. Example: If a researcher would like to do research on the poverty levels of residents
in a town and decides that the sampling unit for the study is an individual, then the
sampling frame would be the list of all individuals living in that town. A student roster is a
sampling frame for a sample of students.
13) Sample design is a set of procedures for selecting the units from the population that are to be in
the sample.
14) Sampling fraction is the ratio of the number of units in the sample to the
number of units in the sampling frame or in the reference population; its reciprocal gives the sampling
interval. For example, a sampling fraction of 1:3 corresponds to selecting 1 in every 3 units. This means that
the sample constitutes 33.3% of the total units in the sampling frame or in the reference
population.
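The sampling fraction in item 14 is a simple ratio, sketched below in Python. The sample size of 200 and frame size of 600 are hypothetical numbers chosen so that the ratio equals the 1:3 used in the text.

```python
from fractions import Fraction


def sampling_fraction(sample_size, frame_size):
    """Ratio of units in the sample to units in the sampling frame."""
    return Fraction(sample_size, frame_size)


# Hypothetical sizes: 200 units sampled from a frame of 600 units.
fraction = sampling_fraction(200, 600)
one_in_every = 1 / fraction  # how many frame units there are per sampled unit

print(f"Sampling fraction: {fraction}")                    # ratio in lowest terms
print(f"Share of frame sampled: {float(fraction):.1%}")
print(f"One unit is selected in every {one_in_every} units")
```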
An application of the terminologies
• Population: All students at Debre Markos University in 2001 E.C.
• Sampling frame: All students appearing in the list of students prepared by the registrar on Hidar
30, 2001 E.C.
• Sample design: Probability sampling
• Sample size: 200 students selected from the sampling frame.
• Sampling unit (unit of analysis): a student
• Statistic: Students in the sample have spent an average of 300 birr per month.
• Parameter: Students in the university are probably spending, on average, between 250 birr and
350 birr per month (estimate derived from the sample statistic).
2.2. Reasons for Sampling
Why a sample instead of a census? When studying the characteristics of a population, there are many
practical reasons why we prefer to select samples from the population. Some of the reasons for sampling are:
a) A census can be extremely expensive and time-consuming. Contacting every member of a
large population would require great expenditures of time and money, and sampling from the
list can provide satisfactory results more quickly and at much lower cost.
Efficiency is the most commonly cited advantage of sampling. For example, a researcher may
wish to determine the average annual income of households in Ethiopia. Surveying a sample of
households would take fewer days and cost less than interviewing all the households in
Ethiopia. Therefore, a sample has to be taken.
b) The physical impossibility of checking all items in the population (sometimes a census is
impossible). Example: populations of fish, birds, mosquitoes and the like are large and
constantly moving, being born and dying. Therefore, we take samples for
research, as it is impractical to conduct a census of such populations.
c) A census can be destructive: The Awash wine factory, like every other winery, employs
wine tasters to ensure the consistency of product quality. Naturally, it would be
counterproductive if the tasters consumed all of the wine, since none would be left to sell to
thirsty customers. Likewise, a firm wishing to ensure that its steel cable meets tensile-strength
requirements could not test the breaking strength of its entire output. As in the Awash factory
situation, the product "sampled" would be lost during the sampling process, so a complete
census is out of the question.
d) The sample results are usually adequate: In practice, a sample can be more accurate than a
census.
e) Speed: The collection and analysis of data can be done more quickly if the data are not
excessive. Time and energy are saved. That is, the data can be collected and summarized
more quickly with a sample than with a census. This is a valid consideration when the
information is urgently needed.
f) Sampling enables the researcher to get more detailed information about a particular subject under
investigation. If only a few people are surveyed, the researcher can conduct an in-depth
interview by spending more time with each person, thus getting more information about the
subject. That is not to say that the smaller the sample, the better; in fact, the opposite is true. In
general, larger samples, if correct sampling techniques are used, give more reliable
information about the population.
Disadvantages of sampling:
i. Reliability: If the sample is not truly representative of the population, then we may sacrifice
reliability in favor of less time and money.
ii. If complete information is required on each and every element of the population, a census should
be conducted.
2.3. Sampling Methods
The population is usually too large for information to be collected from all its members. Usually, a
representative sub-group of the population (a sample) is included in the investigation. Sampling involves
the selection of a number of study units from a defined population. The main concern in sampling is,
therefore, to ensure that the sample accurately represents the population we are interested in studying.
Sampling methods can be categorized as probability and non-probability methods.
2.3.1. Probability Sampling: A probability sample is a sample selected such that each item in the
population being studied has a known chance (greater than zero) of being included in the sample. These
methods remove human judgment from the sampling process and help ensure a more representative sample.
Methods of Probability Sampling: The four basic types of probability sampling methods are:
• Simple random sampling,
• Systematic sampling,
• Stratified sampling, and
• Cluster sampling.
The choice of which to use in any given situation will depend on the type of problem being
investigated, the aim of the research and the available resources.
a) Simple Random Sample (SRS): In SRS, each item in the population has a known, equal, nonzero chance of being included in the sample.
Random samples are selected by using methods such as random numbers (which can be generated
by computers) or the lottery method. To select a simple random sample, follow these
steps:
• Make a numbered list of all units in the population (the sampling frame),
• Number each unit on the list in sequence from 1 to N (where N is the size of the
population),
• Select the required number of study units, using a "lottery" or a table of random numbers.
Lottery Method in SRS
1) Numbered or named slips of paper, each representing one unit in the population, are placed in a hat.
2) The papers are thoroughly mixed, and a number of papers equal to the sample size is drawn
from the hat. For a sample of 200 students, the researcher would draw 200 papers.
3) The sample then consists of all units of the population corresponding to the selected papers.
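The lottery method can be imitated on a computer. The sketch below is only an illustration under stated assumptions: the frame of 1,000 numbered students is hypothetical, `lottery_sample` is a helper name invented here, and Python's `random.sample` plays the role of drawing papers from the hat without putting them back.

```python
import random


def lottery_sample(frame, sample_size, seed=None):
    """Draw `sample_size` distinct units from the frame, like papers from a hat."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(frame, sample_size)


# Hypothetical sampling frame: students numbered 1 to N, with N = 1000.
frame = list(range(1, 1001))

# Draw 200 "papers", as in the example of a sample of 200 students.
chosen = lottery_sample(frame, 200, seed=42)
print("First ten selected student numbers:", sorted(chosen)[:10])
```

Because the draw is without replacement, no student can be selected twice, and every student has the same chance of being included.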
Random Number Table Method in SRS
1) The researcher assigns a number to each unit of the population and obtains a table of random numbers.
2) Then s/he randomly selects a starting place (point), goes through the table across the rows or
down the columns and lists the numbers as they appear on the table.
3) Members of the population with the selected numbers constitute the sample.
4) A random number table is a list of numbers generated by a computer that has been programmed
to yield a set of random numbers.
5) It is possible for a unit’s number to appear more than once in the table. When this happens, the
repeated number is skipped and the next number is taken.
Advantages of SRS
 Ensures that the sample is unbiased in that every individual, and every possible sample of the same size, has an equal chance of
being chosen.
 SRS is the basic sampling method assumed in survey statistical computations, so the standard formulas can be used
with confidence.
Disadvantages of SRS
 SRS requires a sampling frame and this is sometimes impossible (the case of fish population),
 It is difficult to take samples if the reference population is scattered,
 If the population is extremely large, it is tedious and time consuming to number and select the
sample,
 Minority subgroups of interest in the population may not be represented in the sample.
Note that: In SRS, when we apply the table of random numbers, we have to ignore repeated numbers
and those lying above the population size. The following table shows random numbers
generated by a computer.
731  065  777  796  870  963  130  610
759  454  704  173  030  130  611  005
796  465  951  662  591  414  219  145
343  330  606  637  765  155  590  333
873  496  739  665  456  265  126  687
034  005  258  910  055  349  929  365
984  496  905  172  400  609  844  408
846  838  362  542  485  489  230  221
293  378  496  696  911  898  308  662
250  825  716  795  080  180  487  769
074  750  467  029  647  057  017  108
798  719  839  769  780  814  610  744
629  042  308  361  067  619  658  839
744  159  596  527  650  205  151  875
325  634  664  409  052  842  734  503
675  794  821  221  194  412  879  012
804  975  965  539  105  841  188  430
132  407  945  213  351  859  816  246
321  714  049  895  120  705  025  756
235  042  620  205  048  563  859  040
Example: Suppose a researcher wants to know the impact of microfinance on the clients' household
income. S/he wishes to select 10 clients out of 250 clients and a research assistant is required to select
a random sample. Assuming that you are a research assistant, select a simple random sample of 10
clients.
Solution:
1. Number each client from 1 to 250 (based on alphabet of their names or identity numbers),
2. Using the random numbers shown above, find the starting point. To find the starting point,
one generally closes one's eyes and places one's finger anywhere on the table. In this case, let
us select number 005 in the 6th row and 2nd column,
3. Going down the column and continuing to the next columns, select the first 10 numbers that lie
between 001 and 250, skipping any number that has already been selected,
4. The numbers are 005, 042, 159, 049, 173, 172, 029, 221, 213 and 205. Therefore, clients with
these numbers will be included in the sample for further analysis.
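The table-reading procedure in this example can be sketched in Python. This is only an illustration: the function name select_from_table and its parameters are made up for this sketch, and the table is assumed to be laid out as 20 rows of 8 three-digit numbers, read down each column from the starting cell and then from the top of the next column.

```python
# Random number table from the example: 20 rows x 8 columns.
TABLE = """
731 065 777 796 870 963 130 610
759 454 704 173 030 130 611 005
796 465 951 662 591 414 219 145
343 330 606 637 765 155 590 333
873 496 739 665 456 265 126 687
034 005 258 910 055 349 929 365
984 496 905 172 400 609 844 408
846 838 362 542 485 489 230 221
293 378 496 696 911 898 308 662
250 825 716 795 080 180 487 769
074 750 467 029 647 057 017 108
798 719 839 769 780 814 610 744
629 042 308 361 067 619 658 839
744 159 596 527 650 205 151 875
325 634 664 409 052 842 734 503
675 794 821 221 194 412 879 012
804 975 965 539 105 841 188 430
132 407 945 213 351 859 816 246
321 714 049 895 120 705 025 756
235 042 620 205 048 563 859 040
"""


def select_from_table(table_text, start_row, start_col, population_size, sample_size):
    """Read down the columns from the starting cell, skipping numbers that
    are outside 1..population_size or that were already selected."""
    rows = [line.split() for line in table_text.strip().splitlines()]
    n_rows, n_cols = len(rows), len(rows[0])
    chosen = []
    row, col = start_row - 1, start_col - 1  # convert to 0-based indices
    while len(chosen) < sample_size and col < n_cols:
        value = int(rows[row][col])
        if 1 <= value <= population_size and value not in chosen:
            chosen.append(value)
        row += 1
        if row == n_rows:  # bottom of the column: go to top of the next one
            row, col = 0, col + 1
    return chosen


sample = select_from_table(TABLE, start_row=6, start_col=2,
                           population_size=250, sample_size=10)
print(sample)  # [5, 42, 159, 49, 173, 172, 29, 221, 213, 205]
```

Starting from row 6, column 2, the sketch returns the same ten client numbers as the solution. The second 042 met in column 2 is skipped as a repeat.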
b) Systematic Sampling (Quasi-random sampling): In systematic sampling, the elements to be
included in a sample are picked at a constant interval. That is, the items or individuals of the
population are arranged in some order and a random starting point is selected from 1 through k
(where k = population size / sample size = N/n), and then every kth member of the population is
selected for the sample.
In systematic sampling:
 A complete list of all the elements within the population (sampling frame) is required.
 The procedure is to take every kth item from the sampling frame.
 Let N = population size; n = sample size; k = sampling interval, k = N/n.
 Choose any number between 1 and k at random. Suppose it is j (1 ≤ j ≤ k).
 The jth unit is selected first, then the (j+k)th, then the (j+2k)th, and so on, until the
required sample size is reached.
Example 1: Suppose there are 2000 subjects in the population and a sample of 50 subjects is
needed. Select a systematic sample of these 50 subjects.
Solution: The sampling interval (k) is 40 (2000/50). The number of the first subject to be included in the
sample is chosen randomly, for example, by blindly picking up one out of 40 pieces of paper numbered 1
to 40. Suppose subject 12 was the first subject selected; then the sample would consist of the subjects whose
numbers were 12, 52, 92, etc., up to 1972, at which point 50 subjects have been obtained.
A sample chosen this way is not strictly random. Each member of the population has the same 1/k chance of
being selected, but once the starting point is fixed the rest of the sample is determined, so not all
possible samples of size n have an equal chance of being selected.
Example 2: Suppose a researcher wants to know the impact of microfinance on the clients' household
income. S/he wishes to select 10 clients out of 250 clients and a research assistant is required to select
systematic samples. Assuming that you are a research assistant, select a systematic sample of 10 clients.
Solution:
1. Number each client from 1 to 250 (based on alphabet of their names or identity numbers),
2. Since there are 250 clients and 10 are to be selected, the rule is to select every 25th client. This
rule is determined by dividing 250 by 10, which gives 25,
3. The number of the first subject to be included in the sample is chosen randomly from numbers 1
to 25. In this case let us select number 5.
4. Then select every 25th number on the list starting from 5. The numbers include the following: 5,
30, 55, 80, 105, 130, 155, 180, 205 and 230. Therefore, clients with these numbers will be
included in the sample for further analysis.
Note: The answer is not unique, as it depends on which number is picked as the first subject to be
included.
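The selection rule used in Examples 1 and 2 can be written as a short Python sketch. The function name systematic_sample is hypothetical, and the sketch assumes that N is an exact multiple of n, as in both examples.

```python
def systematic_sample(population_size, sample_size, start):
    """Return the unit numbers j, j+k, j+2k, ... for sampling interval k = N/n."""
    k = population_size // sample_size  # sampling interval
    if not 1 <= start <= k:
        raise ValueError("the starting point must lie between 1 and k")
    return [start + i * k for i in range(sample_size)]


# Example 2: N = 250, n = 10, starting point j = 5
print(systematic_sample(250, 10, 5))
# [5, 30, 55, 80, 105, 130, 155, 180, 205, 230]

# Example 1: N = 2000, n = 50, starting point j = 12
example_1 = systematic_sample(2000, 50, 12)
print(example_1[:3], example_1[-1])  # [12, 52, 92] 1972
```

In practice the starting point would itself be drawn at random, for instance with random.randint(1, k), rather than fixed by hand as here.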
Advantages of Systematic Sampling:
 Less time consuming and easier to perform than SRS,
 It is more convenient to use as compared to SRS,
 It provides a good approximation to SRS.
Disadvantages of Systematic Sampling:
 If there is any sort of cyclic ordering of the subjects, the sample may not be representative of the
population. Example: If subjects in the population are arranged in a manner such as:
1) Defective item
2) Non-defective item
3) Defective item
4) Non-defective item
5) etc,
then an even sampling interval (k) produces a sample of all defective items or all non-defective
items, depending on the starting point, whereas an odd k alternates between the two types.
Example: starting point = defective item + even k = all defective items in the sample, and starting point
= non-defective item + even k = all non-defective items in the sample.
Example: Moha Company stores boxes containing Pepsi and Mirinda in the following order.
1) Box containing Pepsi
2) Box containing Mirinda
3) Box containing Pepsi
4) Box containing Mirinda
...
(and so on, in the same alternating order, up to box 200)
The quality department of the company would like to check the expiry date of the products by taking a
systematic sample of 40 boxes containing either Pepsi or Mirinda. Assume that you are working in
the quality department of the company; select the required systematic sample. Is the sample you selected
representative?
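One way to explore this exercise is to count how many boxes of each product a systematic sample picks up. The sketch below is an illustration, not a model answer: the helper names are made up, boxes in odd positions are assumed to hold Pepsi and boxes in even positions Mirinda, and the starting point 3 is arbitrary. The second call uses a hypothetical sample size of 50 only to show what an even interval does.

```python
from collections import Counter


def box_label(position):
    """Boxes alternate: odd positions hold Pepsi, even positions Mirinda."""
    return "Pepsi" if position % 2 == 1 else "Mirinda"


def label_counts(population_size, sample_size, start):
    """Count the products picked up by a systematic sample."""
    k = population_size // sample_size  # sampling interval
    positions = [start + i * k for i in range(sample_size)]
    return Counter(box_label(p) for p in positions)


# Sample of 40 from 200 boxes: k = 5 (odd), so the sample alternates.
print(label_counts(200, 40, start=3))

# Hypothetical sample of 50: k = 4 (even), so only one product is sampled.
print(label_counts(200, 50, start=3))
```

With k = 5 the sample contains 20 boxes of each product, while with k = 4 and an odd starting point every sampled box holds Pepsi.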
c) Stratified Sampling: In stratified sampling, a population is first divided into subgroups, called
strata (singular: stratum), and a sample is selected from each stratum by simple random or
systematic sampling. The strata are formed so that the units within each stratum are homogeneous with
respect to a characteristic such as sex, race, region or institutional affiliation (for example, faculty).
This sampling method is appropriate when the distribution of the characteristic to be studied is strongly
affected by certain variables. Note: Stratified sampling is applied if the population is heterogeneous.
Stratified sampling can be proportionate or non-proportionate (disproportionate). In the latter case, an equal
number of elements is drawn from each stratum, while in the former case the number drawn from each
stratum is proportional to its size.
a) Proportionate Stratified Sampling: The number of units selected from each stratum is directly
proportional to the size of the stratum. If Pi represents the proportion of the population included in
stratum i, and n represents the total sample size, the number of elements selected from
stratum i is ni = n × Pi.
Examples:
1) Let us suppose that we want a sample of size 30 to be drawn from a population of size 8000
which is divided into three strata of size 4000, 2400 and 1600. Adopting proportional allocation,
find the sample size for each stratum.
Solution: We get the sample sizes for the different strata as follows:
a. N1 = 4000, so P1 = 4000/8000 = 0.5 and hence n1 = n × P1 = 30 × 0.5 = 15
b. N2 = 2400, so P2 = 2400/8000 = 0.3 and hence n2 = n × P2 = 30 × 0.3 = 9
c. N3 = 1600, so P3 = 1600/8000 = 0.2 and hence n3 = n × P3 = 30 × 0.2 = 6
Check: N = N1 + N2 + N3 = 8000, P1 + P2 + P3 = 1 and n1 + n2 + n3 = 15 + 9 + 6 = 30.
Thus, using proportional allocation, the sample sizes for the different strata are 15, 9 and 6
respectively, which is in proportion to the sizes of the strata, namely 4000:2400:1600.
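The allocation rule ni = n × Pi can be sketched in Python as follows. The function name proportional_allocation is hypothetical. The sketch returns unrounded values, so when n × Pi is not a whole number the results would still need to be rounded by hand so that they add up to n.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Allocate the sample to strata in proportion to stratum size: n_i = n * N_i / N."""
    total = sum(stratum_sizes)
    return [sample_size * size / total for size in stratum_sizes]


# Strata of size 4000, 2400 and 1600; total sample size 30
print(proportional_allocation([4000, 2400, 1600], 30))  # [15.0, 9.0, 6.0]
```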
2) In a class of students, you can stratify the whole class on the basis of gender (F or M) and then
either draw an equal number of students from each group (disproportionate) or draw a number
of students from each group that reflects the proportion of males to females in the original class list
(proportionate). Let us take a numerical example: If there are 50 students in a class, of which 10 are
female, and if 10 students are needed for some study,
a) select a proportionate stratified sample of 10 students (8M, 2F), since 40/50 × 10 = 8 and 10/50 × 10 = 2;
b) select a disproportionate stratified sample of 10 students (5M, 5F).
Advantage: The representativeness of the sample is improved.
Disadvantages:
 If there are many variables of interest, dividing a large population into representative
subgroups requires a great deal of effort,
 If variables are somewhat complex or ambiguous (such as beliefs, attitudes, etc.), it is difficult
to separate individuals into the subgroups according to these variables.
Example (class work): Using the population of 20 students given below, select a sample of 8 students on
the basis of gender (female/male) and grade level (freshman/sophomore).
S.No  Name       Gender  Grade level    S.No  Name      Gender  Grade level
1     Abebe      M       Fr             11    Melat     F       Fr
2     Bekele     M       So             12    Nigusie   M       Fr
3     Birtukan   F       Fr             13    Petros    M       So
4     Chaltu     F       Fr             14    Rosa      F       So
5     Dagmawit   F       Fr             15    Regassa   M       Fr
6     Dagne      M       Fr             16    Selam     F       Fr
7     Huluka     M       Fr             17    Solomon   M       So
8     Lulit      F       So             18    Tigist    F       So
9     Melaku     M       So             19    Tibeyin   F       So
10    Mohammed   M       So             20    Tirhas    F       So
(Fr = freshman, So = sophomore)
Solution: Steps:
1) Divide the population into two groups based on gender.
2) Divide each subgroup further into two groups of freshman and sophomore.
3) Determine how many students need to be selected from each subgroup to have a
proportional representation of each subgroup in the sample. There are four groups of five
students each, so each group makes up 5/20 of the population. Since a total of eight students
are needed for the sample, 8 × 5/20 = 2 students must be selected from each subgroup.
4) Select two students from each group by using SRS or systematic sampling.
Solution: 1) Divide the population into two groups based on gender as shown below:
Males                                        Females
S.No  Name         Gender  Grade level       S.No  Name         Gender  Grade level
1     Abebe K.     M       Fr                11    Melat A.     F       Fr
2     Bekele M.    M       So                12    Lulit L.     F       So
3     Dagne K.     M       Fr                13    Birtukan L.  F       Fr
4     Huluka G.    M       Fr                14    Rosa M.      F       So
5     Melaku J.    M       So                15    Chaltu C.    F       Fr
6     Mohammed A.  M       So                16    Selam A.     F       Fr
7     Nigussie K.  M       Fr                17    Dagmawit B.  F       Fr
8     Petros L.    M       So                18    Tigist M.    F       So
9     Regassa K.   M       Fr                19    Tibeyin Y.   F       So
10    Solomon K.   M       So                20    Tirhas W.    F       So
2) Divide each subgroup further into two groups of freshman and sophomore as shown below:
Group 1                                      Group 2
S.No  Name         Gender  Grade level       S.No  Name         Gender  Grade level
1     Abebe K.     M       Fr                1     Melat A.     F       Fr
2     Dagne K.     M       Fr                2     Birtukan L.  F       Fr
3     Huluka G.    M       Fr                3     Chaltu C.    F       Fr
4     Nigussie K.  M       Fr                4     Selam A.     F       Fr
5     Regassa K.   M       Fr                5     Dagmawit B.  F       Fr
Group 3                                      Group 4
S.No  Name         Gender  Grade level       S.No  Name         Gender  Grade level
1     Mohammed A.  M       So                1     Lulit L.     F       So
2     Melaku J.    M       So                2     Rosa M.      F       So
3     Petros L.    M       So                3     Tigist M.    F       So
4     Solomon K.   M       So                4     Tibeyin Y.   F       So
5     Bekele M.    M       So                5     Tirhas W.    F       So
3) Determine how many students need to be selected from each subgroup to have a proportional
representation of each subgroup in the sample. There are four groups of five students each and a total of
eight students are needed for the sample, so two students (8 × 5/20 = 2) must be selected from each subgroup.
4) Select two students from each group by using random numbers. In this case we can select the following
students: Group 1: Students 5 & 4, Group 2: Students 5 & 2, Group 3: Students 1 & 3, Group 4: Students
3 & 4.
5) The stratified sample then consists of the following students:
S.No  Name         Gender  Grade level
1     Nigussie K.  M       Fr
2     Regassa K.   M       Fr
3     Mohammed A.  M       So
4     Petros L.    M       So
5     Birtukan L.  F       Fr
6     Dagmawit B.  F       Fr
7     Tigist M.    F       So
8     Tibeyin Y.   F       So
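The class-work solution can also be sketched in Python. This is a minimal illustration under stated assumptions: the function name stratified_sample is hypothetical, the strata are the four gender by grade-level groups, and two students are drawn from each stratum by simple random sampling. The seed is arbitrary, so the students drawn will generally differ from those in the worked solution.

```python
import random
from collections import Counter, defaultdict

# (name, gender, grade level) for the 20 students in the class-work table
STUDENTS = [
    ("Abebe", "M", "Fr"), ("Bekele", "M", "So"), ("Birtukan", "F", "Fr"),
    ("Chaltu", "F", "Fr"), ("Dagmawit", "F", "Fr"), ("Dagne", "M", "Fr"),
    ("Huluka", "M", "Fr"), ("Lulit", "F", "So"), ("Melaku", "M", "So"),
    ("Mohammed", "M", "So"), ("Melat", "F", "Fr"), ("Nigusie", "M", "Fr"),
    ("Petros", "M", "So"), ("Rosa", "F", "So"), ("Regassa", "M", "Fr"),
    ("Selam", "F", "Fr"), ("Solomon", "M", "So"), ("Tigist", "F", "So"),
    ("Tibeyin", "F", "So"), ("Tirhas", "F", "So"),
]


def stratified_sample(units, per_stratum, seed=None):
    """Group units by (gender, grade level) and draw an SRS from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for name, gender, level in units:
        strata[(gender, level)].append((name, gender, level))
    sample = []
    for key in sorted(strata):  # fixed stratum order for reproducibility
        sample.extend(rng.sample(strata[key], per_stratum))
    return sample


sample = stratified_sample(STUDENTS, per_stratum=2, seed=2024)
for name, gender, level in sample:
    print(name, gender, level)
```

Whatever the seed, the sample contains eight students, two from each of the four strata.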
d) Cluster Sampling: If the population is homogeneous and very large or resides in a large area, it
is costly and time consuming to take samples by using the three methods mentioned above. In this
case, we divide the population into groups called clusters and then select a sample of clusters
at random. Finally, the sample is taken from the selected clusters: we can take either all members
of the selected clusters or a sample from each of them, using another sampling technique.
Procedures:
1) The reference population is divided in to clusters or subgroups, preferably similar in size,
2) A sample of the clusters is taken by random or systematic sampling,
3) All the units in the selected clusters are then studied or we may select samples from each
cluster. If part of the elements in each cluster is included in the sample, then the procedure is
called two stage sampling. The first stage is selecting a sample of clusters and the second stage
is selecting a sample of elements from each cluster.
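The two stages described in the procedure can be sketched in Python. Everything in this sketch is hypothetical: the function name, the ten clusters of twenty units each, and the choice of simple random sampling at both stages.

```python
import random


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: draw clusters at random. Stage 2: draw units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}


# Hypothetical population: 10 clusters with 20 units each
clusters = {
    f"cluster_{c}": [f"unit_{c}_{u}" for u in range(1, 21)]
    for c in range(1, 11)
}

sample = two_stage_cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=1)
for name, units in sample.items():
    print(name, units)
```

Setting n_per_cluster equal to the cluster size gives ordinary (one-stage) cluster sampling, in which every unit of the selected clusters is studied.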
Advantages:
 A list of all individual study units in the reference population is not required.
 Reduces cost.
 Simplifies field work and is convenient.
Disadvantages:
 The members of a cluster are often more homogeneous than the members of the whole
population, and therefore the sample may not be representative.
 The elements in a cluster may not have the same variation in characteristics as elements selected
individually from the population.
e) Multi-Stage Sampling: This is a sampling technique that is used when the reference population is large and
widely scattered. Selection of samples is done in stages until the final sampling unit is obtained. The number
of stages is the number of times a sampling procedure is carried out. The primary sampling unit
(PSU) is the sampling unit in the first sampling stage, the secondary sampling unit (SSU) is the sampling
unit in the second sampling stage, and so on. For example, the PSUs can be the weredas and the SSUs can be the
kebeles. A sample of PSUs is selected by a suitable method, each selected PSU is further
subdivided into second-stage units (say kebeles), and from these SSUs a sample is again taken by some
suitable method. Further stages may be added if required.
Example:
A multistage sampling procedure was used to conduct a research study entitled “Health Service Utilization in
Amhara Region of Ethiopia.”
Procedures followed:
• The previous provinces of Gondar, Gojjam and Wollo were each divided into two zones.
• One of the two Gondar zones, one of the two Gojjam zones and one of the two Wollo zones were
randomly selected. Later one more zone, North Shoa, was included (total four zones).
• Two districts were selected from each zone except North Shoa, where only one district was selected
(total seven districts).
• Two rural kebeles and one urban kebele were chosen from each selected district (14
rural kebeles and 7 urban kebeles).
Advantages
• Cuts the cost of preparing a sampling frame.
Disadvantages
• Gives less precise estimates than SRS for the same sample size.
2.3.2. Methods of Non-Probability Sampling
Non-Probability Sampling: In non-probability sampling, not every unit in the population has a known chance
of being included in the sample, and the process involves at least some degree of personal subjectivity
instead of following predetermined, probabilistic rules for selection. This sampling technique is:
 Used when a sampling frame doesn't exist,
 Based on non-random selection, so the sample may be unrepresentative,
 Inappropriate if the aim is to measure variables and generalize findings to the population,
 Easier, quicker and cheaper to carry out than probability designs.
There are three non-probability sampling methods. These are:
a) Convenience Sampling: is a method in which a sample is chosen with ease of access being the
primary concern. Example: Interviews conducted in convenient locations such as student lounge.
b) Purposive (Judgmental) Sampling: the researcher exercises deliberate subjective choice in
drawing the sample, selecting the units s/he regards as most informative for the study being undertaken.
c) Quota Sampling: is a method that ensures that a certain number of sample units from different
categories with specific characteristics are represented. Here, judgmental and convenience
sampling methods are combined. Quota sampling can be applied for affirmative action. Example:
Suppose we know that 54% of the adults in a community are females, and the study requires 100
respondents as a sample. In quota sampling, we might interview the first 54 females and the first
46 males.
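The quota rule in this example can be sketched in Python. The function name quota_sample and the stream of respondents are hypothetical: people are assumed to arrive one at a time, alternating between males and females, and each is interviewed only while the quota for his or her category is still open.

```python
def quota_sample(respondents, quotas):
    """Take respondents in arrival order until every category quota is filled."""
    remaining = dict(quotas)
    selected = []
    for person, category in respondents:
        if remaining.get(category, 0) > 0:
            selected.append((person, category))
            remaining[category] -= 1
        if not any(remaining.values()):  # all quotas are filled
            break
    return selected


# 54% of 100 respondents should be female, the rest male
quotas = {"F": 54, "M": 46}

# Hypothetical arrival order: person 1 is male, person 2 is female, and so on
stream = [(i, "F" if i % 2 == 0 else "M") for i in range(1, 201)]

sample = quota_sample(stream, quotas)
print(len(sample))  # 100
```

Because respondents are taken in the order they happen to arrive, the selection is not random even though the gender proportions match the community.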
2.4. Errors in Sampling
There are two types of errors:
1. Sampling error: is the discrepancy between the population value (parameter) and the sample
value (statistic). It arises because only part of the population is observed, and it can be made worse by
an inappropriate sampling technique. It can be reduced by increasing the size of the sample. When
n = N, sampling error = 0.
2. Non-sampling errors (bias): are due to procedural bias such as:
 Subjects’ non-response
 Incorrect responses
 Problems with the sampling frame
 Measurement error
 Errors at different stages in processing the data.
Ways to reduce data error
 Ensure that survey instruments are well prepared, simple to read, and easy to understand.
 Properly select and train interviewer to control data gathering bias or error.
 Use sound editing, coding, and tabulating procedures to reduce the possibility of data
processing error.
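The statement that the sampling error is zero when n = N can be checked with a small Python sketch. The income figures and the helper name sampling_error are hypothetical, and the error is measured here as the difference between the sample mean and the population mean.

```python
import random
from statistics import mean


def sampling_error(population, sample):
    """Difference between the sample mean (statistic) and the population mean (parameter)."""
    return mean(sample) - mean(population)


# Hypothetical monthly household incomes (in Birr) for a population of N = 8
population = [200, 300, 189, 250, 410, 175, 320, 280]

rng = random.Random(0)
partial = rng.sample(population, 3)             # n = 3 < N
census = rng.sample(population, len(population))  # n = N

print(sampling_error(population, partial))
print(sampling_error(population, census))       # 0.0
```

The first value depends on which three households happen to be drawn. The second is zero because the "sample" contains every unit of the population.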
Review Exercises
1) What are the reasons for sampling? Discuss and give an example for each reason.
2) Differentiate between parameter and statistic. Which one is the result of taking a sample?
3) Define systematic sampling and explain how it is carried out. Describe how you would obtain a
systematic sample of 80 students from a population of 1600 students.
4) Briefly explain the difference between the following concepts and give examples, if necessary.
Sampling vs. Census
Cluster sampling vs. Stratified sampling
Sampling frame vs. Sampling unit
5) Assume that you are going to undertake research on Ethiopian culture. Before taking a
sample, you observe that the cultures are highly diverse and large in number. Which
sampling method are you going to use so that your sample represents all of the cultures?
Why?
6) Briefly explain cluster sampling. For which type of population is it the preferred way of selecting
a sample?
7) Assume that there are 500 students in FBE, DMU in five departments with student numbers of 150,
100, 50, 150 and 50. Assume that 20 students are to be selected from these five departments
for a scholarship based on probability sampling. Further assume that students from all
departments have an equal chance of being selected, i.e., departments with a large number of students
will send more students than others. If you are assigned to select 20 students from FBE, then
a) Which type of sampling method are you going to use?
b) Determine the sample size to be selected from each department.
8) To study the reaction of students to a policy issued by a college, a sample of 100 students is
required. The number of male students is 1000 and the number of female students in the college is
1500. If you want to select your sample of 100 students using a proportional allocation, how
many students of each sex should you include in your sample?
9) Suppose you are a Woreda administrator having five kebeles with respective population sizes
10000, 5000, 15000, 20000, and 50000. If you are supposed to select 1000 representatives of the
Woreda, determine the number of individuals to be selected from each kebele so that your selection
is fair.
10) Classify each of the following samples as simple random, systematic, stratified or cluster
a. In a large school district, all teachers from two buildings are interviewed to determine
whether they believe the students have less homework to do now than in the previous
years.
b. Every 7th customer entering a shopping mall is asked to select his or her favorite shoes.
c. Nursing supervisors are selected using random numbers to determine annual salaries.
Choose the best answer from the given choices and encircle it.
1. Which of the following is not a reason for sampling?
a. The destructive nature of certain tests
b. Sometimes census is impossible
c. The adequacy of sample results
d. None
2. If n=N then, the sampling error is:
a. Less than zero
b. Greater than zero
c. Equal to zero
d. None of the above
3. Which of the following is a method of non-probability sampling?
a. Simple Random sampling
b. Systematic sampling
c. Stratified sampling
d. Quota Sampling
4. A sample size
a) Is the number of sampling units included in the sample
b) Has more than 30 observations
c) Is usually identified as n
d) All of the above
5. In a simple random sample
a) Every kth item is selected to be in the sample
b) Every item has a chance to be in the sample
c) Every item has the same chance to be in the sample
d) All of the above
CHAPTER THREE:
DATA COLLECTION AND PRESENTATION
Chapter – Objectives:
When you have completed this chapter, you will be able to:
 identify the types of data,
 identify the sources of data,
 convert raw data into a data array,
 organize data using frequency distributions,
 visually represent data using graphs and charts.
3.1. Data collection
3.1.1 What is data?
 Data are the facts/values that variables assume.
 Data are raw facts that will be used to draw a conclusion or make a decision.
 Data are raw numerical descriptions of a variable, ready to be analyzed, which are obtained by
measuring or counting.
 In research, statisticians use data in many different ways. Data can be used to describe situations
or events or to make an inference.
3.1.2 Classification of data
Data are classified as:
i) Quantitative or qualitative data
ii) Primary or secondary data
iii) Time series or cross sectional data
i) Quantitative or qualitative data
a. Quantitative data are data that are expressed numerically; they are numerical observations of variables.
Example: age, Grade Point Average (GPA), sales, etc. Valid computations such as mean, variance, etc. are
possible in the case of quantitative data.
b. Qualitative data are data that are non-numeric. Example: marital status (married, single, widowed, divorced), race
(Asian, African, etc.), gender (male/female), blood type (A, B, O, AB). Valid computation: proportions in
each category are possible. Example: What percent of students in this class are female?
ii) Primary or secondary data
a) Primary data
 Data originally collected by the researcher for the purpose/problem at hand.
 Data generated from a primary source of data.
 Data that are collected by the investigator himself for the purpose of a specific inquiry or
study.
b) Secondary data
 When an investigator uses data which have already been collected by others, such data are
called “secondary data.”
 Data generated from a secondary source of data.
 Data generated by someone else for some other purpose.
 Secondary data can be obtained from journals, reports, government publications,
publications of professionals and research organizations, internet, videos, library, etc.
 One must be very careful before using secondary data, as they may contain errors such as
transcribing errors, estimating errors, errors due to bias, etc.
iii) Cross sectional or time series data
a. Cross sectional data: Data collected from a population at a given point in time.
Example: The data collected on the households of a town in 2001 can be presented as cross sectional
data as follows:
Observation number    Monthly household income in Birr
1                     200
2                     300
3                     189
b. Time series data: Data collected over time on one or more variables.
Example:
Year    Unemployment rate
1950    5%
1951    8%
1952    10%
3.1.3. Methods of data collection
Sources of data
There are two sources of data: primary sources and secondary sources.
i. Primary sources
 A primary source is a source of data that provides first-hand information for the immediate purpose.
 Data collected from primary sources are called primary data.
 Data collected from primary sources are new data which did not exist before and for
which the researcher receives full credit.
ii. Secondary sources
 Individuals or agencies which provide data originally collected for another purpose by them or by
others.
 Usually they are published or unpublished materials, records, reports, magazines, market reports,
etc.
 Data which do not originate with the investigator himself but which he gets from someone else’s
records.
 Compared to primary data, which are costly but generally more accurate and reliable, secondary data are less
costly and generally less accurate.
 Primary data can become secondary data when someone else uses them.
 Secondary sources exist as storage of previously collected information. Example: Archival or
library sources, published books, unpublished documents, videos, internet, annual reports,
statistical abstracts, census of population, economic censuses, etc.
Methods of collecting primary data
a) Survey research
b) Experimentation
c) Observational research
a) Survey research
In survey research, we communicate with a sample of individuals in order to generalize about the characteristics of the
population from which the sample was drawn.
Types of surveys: The three most common surveys are:
i. The mail survey: It can be conducted by electronic mail (e-mail) or through the post office.
A questionnaire is a group or sequence of questions, printed on paper, designed to collect data on a subject.
The questionnaire is either filled out personally by the respondent or
administered and completed by an interviewer.
Types of questions:
 Multiple choice
 Dichotomous (having only two choices: yes/no, female/male, etc.)
 Open-ended (where the respondents are free to give any response)
Characteristics of mail survey
 If one drafts a detailed questionnaire, it can be mailed to the respondent for filling or can be
put in charge of enumerators who go around and fill them after obtaining the desired
observation.
 It is relatively less costly.
 The individual should be literate to give an appropriate response.
 Non-response error may be high if mailing is costly.
 This survey can be used to cover a wider geographic area than telephone surveys or personal
interviews since mailed questionnaire surveys are less expensive to conduct.
 It tends to produce a low number of responses and some inappropriate answers to questions.
 It has a low return rate.
 Some people may have difficulty in reading or understanding the questions.
Enumerator method
 Here, a questionnaire is designed but selected agents called ’enumerators’ do the task of
filling the questionnaire.
 The method can be adopted even if the respondents are illiterate
 It is more expensive than the mailed questionnaire method.
 Non-response is low.
ii. The personal interview: It is an oral questioning of respondents, either individually or in a group.
Characteristics of personal interview
 It tends to be relatively expensive and time consuming and hence is not ideal for a large group of
informants.
 It offers a lot of flexibility in allowing the interviewer to explain questions and to probe more deeply into
the answers provided.
 It is more accurate and reliable.
 It maximizes trust and cooperation between the interviewer and the interviewee.
 It has a higher rate of response.
 It decreases refusals.
 The investigator presents himself personally before the informant and questions carefully.
 It is useful in situations where an in-depth study is required.
 In a face-to-face interview, the interviewer can see and assess the respondent’s non-verbal behaviors.
 Face-to-face interviews can take place with respondents who don’t have phones or the ability to read
a mailed questionnaire.
iii. The telephone interview
Characteristics of telephone interview
 It is similar to the personal interview, but uses the telephone instead of personal interaction.
 It makes it possible to complete a study in a relatively short span of time.
 It has a high response rate.
 It is less effective in a community with few telephone lines.
 It is less costly than the personal interview.
 A major drawback is that some people in the population may not have phones or may not be at
home when the calls are made. Hence, not all people have a chance of being surveyed.
b) Experimentation
 We record the results of our experiment.
 In experimentation, researchers are interested in identifying the cause and effect relationships
between variables.
c) Observational Research
 We see what is happening and record it, e.g. traffic accidents.
 Observation relies on watching or listening, then counting or measuring.
 There are no respondents.
 It is time consuming and expensive.
3.2. Data Presentation
3.2.1. Tabular Methods of Data Presentation
Tabulation is the arrangement of information or data in tables. There are various techniques of tabulation.
a) Data Array is a table showing data arranged in descending or ascending order.
 Descending (100, 99, 98, 97, …)
 Ascending (1, 2, 3, 4, 5, 6, 7, 8, 9, …)
Examples:
 An alphabetical list of post office renters can be considered as a data array of qualitative information.
 A list of monthly income recorded for several years and arranged in descending or ascending order is a data
array.
In general, the data array offers a number of advantages:
a) We can determine at a glance the highest and lowest values contained in the data.
b) We can identify groups of similar data values.
c) We can easily see differences between values in the data.
Given the following data set on Household Income (raw or ungrouped data)
112  100  127  120  134
105  110  118  109  112
110  118  117  116  118
114  114  122  105  109
107  112  114  115  118
118  122  117  106  110
116  108  110  121  113
119  111  120  104  110
120  113  120  117  105
118  112  110  114  114
Data Array
Ascending order (read down each column)
100  110  112  116  119
104  110  113  117  120
105  110  113  117  120
105  110  114  117  120
105  110  114  118  120
106  110  114  118  121
107  111  114  118  122
108  112  114  118  122
109  112  115  118  127
109  112  116  118  134
Descending order (take the reverse)
Maximum data value = 134
Minimum data value = 100
Range = 134 – 100 = 34
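The data array and the range for this data set can be reproduced with a few lines of Python. This is only a sketch, and the variable names are made up for illustration.

```python
# Raw (ungrouped) household income data from the text, 50 observations
incomes = [
    112, 100, 127, 120, 134, 105, 110, 118, 109, 112,
    110, 118, 117, 116, 118, 114, 114, 122, 105, 109,
    107, 112, 114, 115, 118, 118, 122, 117, 106, 110,
    116, 108, 110, 121, 113, 119, 111, 120, 104, 110,
    120, 113, 120, 117, 105, 118, 112, 110, 114, 114,
]

ascending = sorted(incomes)                 # data array in ascending order
descending = sorted(incomes, reverse=True)  # data array in descending order

maximum = max(incomes)
minimum = min(incomes)
data_range = maximum - minimum

print(ascending[:10])                # first column of the data array
print(maximum, minimum, data_range)  # 134 100 34
```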
b) Frequency Distribution
A frequency distribution is a table that groups data into non-overlapping intervals called classes and records the
number of observations in each class. The frequency distribution summarizes data in a condensed form that can be
readily understood and easily interpreted.
Key Terms in frequency distribution
Class: each category of the frequency distribution is called a class.
Frequency: the number of data values falling within each class.
Total frequency: the sum of the class frequencies.
If the classes are $x_1, x_2, \ldots, x_k$ and their frequencies are $f_1, f_2, \ldots, f_k$ (k is the number of classes), then
$f_1 + f_2 + \cdots + f_k = \sum_{i=1}^{k} f_i = n$,
where n is the total frequency, that is, the number of observations (sample size).
Class Limits are the boundaries for each class. These determine which data values are assigned to that class. Class
limits can be lower or upper class limits and they have the same number of decimal places as the data values.
Class boundaries are those limits which are determined mathematically so that no gap exists between classes. They are
also called true class limits.
Class interval is the width of each class. This is the difference between the lower limit of a class and
the lower limit of the next higher class (or, equivalently, between the two upper limits).
$\text{Approximate class width} = \frac{\text{Range}}{\text{number of classes desired}}$, where $\text{Range} = \text{maximum value} - \text{minimum value}$.
Class Mark is the midpoint of each class. This is midway between the upper and lower class limits.
Guidelines for the frequency distribution
In constructing a frequency distribution for a given data set, the following guidelines should be followed.
a) The set of classes must be mutually exclusive. That is, a given data value should fall into only one class/category. There should be no overlap between classes, so limits such as the following would be inappropriate:
Class        Frequency
15-20        4
20-25        5
This is not allowed, since a value of 20 could fit into either class; a clear boundary has to be set.
Class        Frequency
17.0-23.5    5
22.0-28.5    10
This is also not allowed, since the classes overlap. A data value of 22, for example, could be placed in either class, so it is unclear where to group it.
b) The classes must be exhaustive. That is, they must include all possible data values; no data value should fall outside the range covered by the frequency distribution. In the data set given above, the maximum data value is 134 and the minimum data value is 100. If the last class had class limits of 128-133, the distribution would not be exhaustive (complete), because the maximum value, 134, would not be included in any class.
c) If possible, the classes should have equal widths. Unequal class widths make it difficult to interpret both the frequency distribution and its graphical presentation. One exception occurs when there is an open-ended distribution, i.e., one that has no specific beginning value or no specific ending value.
Example:
Class
< 10 (meaning that any value below 10 will be tallied in this class)
10 – 20
21 – 31
32 – 42
43 – 53
54 – 64
≥ 65 (meaning that any value of 65 or above will be tallied in the last class)
Open-ended classes are classes with either no lower limit or no upper limit: generally, the lowest class lacks a lower limit or the highest class lacks an upper limit.
d) Select an appropriate number of classes. There is no hard and fast rule for determining the number of classes for a data set; it is a subjective process. If there are too few classes, important characteristics of the data may be buried within the small number of categories. If there are too many classes, many categories will contain either zero or a small number of values. In general, 5 to 20 classes are recommended.
e) When possible, class widths should be round numbers (e.g. 5, 10, 25, 50, 100, etc.).
f) If possible, avoid using open-ended classes.
Example: The following frequency distribution is given.
Births (per 1000 population)    Number of countries (f)
10-15                           29
15-20                           8
20-25                           10
25-30                           12
30-35                           10
35-40                           4
40-45                           18
45-50                           12
50-55                           2
Total                           105
Take the class with limits 20-25, that is, $[20, 25)$ or $20 \le X < 25$. All values in this class are at least 20 but less than 25.
The frequency of $20 \le X < 25$ is 10, which is the number of countries with a birth rate in this category.
Class interval/width is the difference between the lower class limit and that of the next higher class: $25 - 20 = 5$.
$$\text{Class mark} = \frac{\text{lower limit} + \text{upper limit}}{2} = \frac{20 + 25}{2} = 22.5$$
Types of frequency distributions
There are three types of frequency distribution tables. These are:
a) the absolute frequency;
b) the relative frequency;
c) the cumulative frequency.
a) Absolute frequency: An absolute frequency distribution table shows the absolute number of occurrences
of an entry or groups of entries in a data set. To construct an absolute frequency distribution table, list all
the scores in the first column and count the number of times each score occurs in the original data set.
Record this against each item in the second column.
b) Relative frequency: The relative frequency distribution table shows the number of occurrences of each item or class of items in the data set as a proportion of the total number of observations. This can be expressed in decimal, fraction or percentage form.
$$RF = \frac{AF}{TF} = \frac{AF}{n}$$
where RF = relative frequency, AF = absolute frequency, and TF = total frequency (the total number of observations, n).
c) Cumulative frequency: The cumulative frequency distribution table shows the absolute frequencies accumulated at each successive class in the data set. Alternatively, one can use the relative cumulative frequency table, which is based on relative frequencies.
Given the following frequency distribution:
Class limits   Class boundaries   Absolute frequency   Cumulative frequency   Relative frequency   Cumulative relative frequency
24-30          23.5-30.5          3                    3                      3/25                 3/25
31-37          30.5-37.5          1                    4                      1/25                 4/25
38-44          37.5-44.5          5                    9                      5/25                 9/25
45-51          44.5-51.5          9                    18                     9/25                 18/25
52-58          51.5-58.5          6                    24                     6/25                 24/25
59-65          58.5-65.5          1                    25                     1/25                 25/25 = 1
Total                             25                                          1
The boundaries 23.5-30.5 are read as $[23.5, 30.5)$, which implies that all values in the class are at least 23.5 but less than 30.5.
$$\sum_{i=1}^{k} f_i = n = 25, \qquad \sum_{i=1}^{k} \frac{f_i}{n} = \frac{25}{25} = 1$$
From the cumulative frequencies, 4 values are less than 37.5 and 22 values are greater than 30.5.
The class boundaries in the second column are used to separate classes so that there are no gaps in the frequency distribution. The basic rule of thumb is that the class limits should have the same decimal place value as the data, but the class boundaries should have one additional place value and end in a 5.
Example: lower limit − 0.5 = 31 − 0.5 = 30.5 => lower boundary
upper limit + 0.5 = 37 + 0.5 = 37.5 => upper boundary
The “less than” and “more than” cumulative frequencies
The “less than” cumulative frequency of a class is the total frequency of all values less than the upper boundary of the class, and the “more than” cumulative frequency of a class is the total frequency of all values greater than the lower boundary of the class.
Example:
Class limits   Class boundaries   Abs. freq.   Rel. freq.   Upper boundary   Less than CF   Lower boundary   More than CF
100-104        99.5-104.5         2            0.04         104.5            2              99.5             50
105-109        104.5-109.5        8            0.16         109.5            10             104.5            48
110-114        109.5-114.5        18           0.36         114.5            28             109.5            40
115-119        114.5-119.5        13           0.26         119.5            41             114.5            22
120-124        119.5-124.5        7            0.14         124.5            48             119.5            9
125-129        124.5-129.5        1            0.02         129.5            49             124.5            2
130-134        129.5-134.5        1            0.02         134.5            50             129.5            1
Total                             50           1
(CF = cumulative frequency)
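The relative, “less than” and “more than” cumulative frequencies in this table can be reproduced with a few lines of code. The sketch below is illustrative only; it assumes the seven class frequencies from the example, and the variable names are not from the text.

```python
from itertools import accumulate

# Absolute frequencies of the classes 100-104, 105-109, ..., 130-134.
frequencies = [2, 8, 18, 13, 7, 1, 1]
n = sum(frequencies)                      # total frequency

# Relative frequency of each class: f_i / n.
relative = [f / n for f in frequencies]

# "Less than" CF: running total up to the upper boundary of each class.
less_than = list(accumulate(frequencies))

# "More than" CF: everything not yet counted below the lower boundary.
more_than = [n - below for below in [0] + less_than[:-1]]

print("Relative: ", relative)
print("Less than:", less_than)
print("More than:", more_than)
```

The “more than” column is obtained by subtracting from n the cumulative count below each lower boundary, which is why it starts at 50 and decreases.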
Example: The following data are given on the monthly household income of a community. Construct a frequency distribution and
a) calculate the absolute, relative and cumulative frequencies;
b) calculate the less than and the more than cumulative frequencies;
c) interpret the values found in (a) and (b) above.
Data set
112  100  127  120  134  105  110  118  109  112
110  118  117  116  118  114  114  122  105  109
107  112  114  115  118  118  122  117  106  110
116  108  110  121  113  119  111  120  104  110
120  113  120  117  105  118  112  110  114  114
n = 50
Solution: Steps:
1. Array the data.
2. Determine the number of classes.
Rules of thumb
i) Recommended number of classes (based on the number of observations)
Number of observations    Number of classes
< 50                      5 – 7
50 – 200                  7 – 9
200 – 500                 9 – 10
500 – 1000                10 – 11
1000 – 5000               11 – 13
5000 – 50,000             13 – 17
> 50,000                  17 – 20
So, the recommended number of classes for this data set can be 7.
ii) We could use Sturges' formula to determine the number of classes (k): $k = 1 + 3.322 \log n$, where n is the number of observations.
In this case, $\log 50 \approx 1.7$, so $k = 1 + 3.322 \times 1.7 = 1 + 5.64 = 6.64$.
iii) Apply the $2^k$ rule: this guide suggests selecting the smallest number (k) for the number of classes such that $2^k$ is greater than the number of observations. Here n = 50; $2^5 = 32 < 50$ and $2^6 = 64 > 50$, so the recommended number of classes is 6.
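Both computational rules for the number of classes (Sturges' formula and the rule that picks the smallest k with 2^k greater than n) are easy to check in code. This is a minimal sketch; the function names `sturges_classes` and `two_to_the_k_classes` are hypothetical helpers, not part of any library.

```python
import math

def sturges_classes(n):
    """Sturges' formula: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def two_to_the_k_classes(n):
    """Smallest whole number k such that 2**k is greater than n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

n = 50
print("Sturges:", round(sturges_classes(n), 2))   # about 6.64
print("2^k rule:", two_to_the_k_classes(n))       # 6
```

Sturges' formula generally returns a non-integer, so in practice the result is rounded to a convenient whole number of classes.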
3. Determine the class interval/width.
Width = Range / Number of classes
Highest value = 134
Lowest value = 100
Range = 134 – 100 = 34 and the recommended k = 7, so width = 34/7 ≈ 4.9, which is rounded up to the nearest whole number, 5.
4. Select a starting point for the lowest class limit. This can be the smallest data value or any convenient number less than the smallest data value. In this case, let us use 100 as the starting point. Add the width to the starting point to get the lower limit of the next class, and keep adding until there are 7 classes. Subtract one unit from the lower limit of the second class to get the upper limit of the first class. Then add the width to each upper limit to get all the upper limits.
105 – 1 = 104
1st class = 100 – 104
2nd class = 105 – 109, etc.
Find the class boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to each upper class limit.
99.5 – 104.5 means 99.5 ≤ x < 104.5, i.e. the half-closed interval [99.5, 104.5).
104.5 – 109.5 means 104.5 ≤ x < 109.5, i.e. [104.5, 109.5).
5. Tally the data.
6. Find the frequencies from the tallies. The completed frequency distribution is given as:
Class limits   Class boundaries   Upper boundary   Abs. freq.   Rel. freq.   Less than CF   Lower boundary   More than CF
100-104        99.5-104.5         104.5            2            0.04         2              99.5             50
105-109        104.5-109.5        109.5            8            0.16         10             104.5            48
110-114        109.5-114.5        114.5            18           0.36         28             109.5            40
115-119        114.5-119.5        119.5            13           0.26         41             114.5            22
120-124        119.5-124.5        124.5            7            0.14         48             119.5            9
125-129        124.5-129.5        129.5            1            0.02         49             124.5            2
130-134        129.5-134.5        134.5            1            0.02         50             129.5            1
Total                                              50           1
(CF = cumulative frequency)
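The tallying in steps 5 and 6 can be sketched in Python as follows. The sketch assumes the choices made in the solution (starting point 100, width 5, 7 classes) and whole-number incomes; the variable names are illustrative.

```python
incomes = [
    112, 100, 127, 120, 134, 105, 110, 118, 109, 112,
    110, 118, 117, 116, 118, 114, 114, 122, 105, 109,
    107, 112, 114, 115, 118, 118, 122, 117, 106, 110,
    116, 108, 110, 121, 113, 119, 111, 120, 104, 110,
    120, 113, 120, 117, 105, 118, 112, 110, 114, 114,
]

start, width, k = 100, 5, 7   # starting point, class width, number of classes

# Lower class limits: 100, 105, ..., 130.
lower_limits = [start + i * width for i in range(k)]

# Tally: each value falls into exactly one class.
frequencies = [0] * k
for value in incomes:
    frequencies[(value - start) // width] += 1

for lower, f in zip(lower_limits, frequencies):
    print(f"{lower}-{lower + width - 1}: {f}")
```

Integer division by the width assigns every value to exactly one class, which mirrors the requirement that the classes be mutually exclusive and exhaustive.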
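* Note that the sum of the relative frequencies is always 1 or 100%. That is, $\sum_{i=1}^{k} \frac{f_i}{n} = 1$, where k is the number of classes.
Proof:
$$\sum_{i=1}^{k} \frac{f_i}{n} = \frac{f_1}{n} + \frac{f_2}{n} + \frac{f_3}{n} + \cdots + \frac{f_k}{n} = \frac{f_1 + f_2 + f_3 + \cdots + f_k}{n}$$
Since $f_1 + f_2 + f_3 + \cdots + f_k = n$, the right-hand side equals $\frac{n}{n} = 1$. Therefore, $\sum_{i=1}^{k} \frac{f_i}{n} = 1$.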
c) Interpretation
• 31 (18 + 13) of the households earn a monthly income of birr 110 – 119.
• 62% of the households earn a monthly income of birr 110 – 119 (31/50 × 100%).
• 28 of the households earn a monthly income of less than birr 114.5.
• 40 of the households earn a monthly income of at least birr 109.5.
Note: One can construct several different but correct frequency distributions for the same data by using:
• a different class width,
• a different number of classes, or
• a different starting point.
The reasons for constructing a frequency distribution are:
a) To organize the data in a meaningful way.
b) To enable researchers to draw charts and graphs for the presentation of data.
c) To enable a reader to make comparisons among different data sets.
3.2.2. Graphic Method of Data Presentation
After the data have been organized into a frequency distribution, they can be presented in graphical form. Why graphs? Graphs are used to:
• Convey the data to viewers in pictorial/graphic form,
• Get the audience's attention in a publication or a spoken presentation,
• Discuss an issue, reinforce a critical point, or summarize a data set,
• Make the data more understandable than when presented in tables and frequency distributions,
• Discover a trend or pattern in a situation over a period of time.
The three most commonly used graphs in research are:
a) The histogram
b) The frequency polygon
c) The cumulative frequency graph or ogive (pronounced "oh-jive")
a) The Histogram: a graph that displays the data by using adjacent vertical rectangles (unless the frequency of a class is zero) of various heights to represent the frequencies of the classes. That is, in a histogram the class boundaries are marked on the horizontal axis and the class frequencies on the vertical axis. N.B.: The height of the rectangles of a histogram (along the y-axis) can represent the absolute or relative frequencies of the classes. The tallest rectangle in a histogram is associated with the class having the greatest number of observations (the highest frequency).
Example 1: Construct a histogram given the following frequency distribution.
Class boundaries    Absolute frequency
99.5-104.5          2
104.5-109.5         8
109.5-114.5         18
114.5-119.5         13
119.5-124.5         7
124.5-129.5         1
129.5-134.5         1
Total               50
Solution: Steps:
1) Draw the x – y axes.
2) Label the class boundaries on the x-axis and the frequencies on the y-axis.
3) Using the frequencies as the heights, draw vertical bars for each class.
• The class with the greatest number of data values (18) is 109.5 – 114.5.
• We would have reached the same conclusions, and the shape of the histogram would have been the same, had we used a relative frequency distribution instead of the absolute (actual) frequencies. The only difference is that the vertical axis would have been reported in percentages (proportions) of households instead of the number of households.
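As a rough illustration of the histogram for Example 1, the sketch below prints a sideways text version in which each `#` stands for one household. It is an assumption-laden stand-in for a real plot (no plotting library is used) and the variable names are illustrative.

```python
# Class boundaries and absolute frequencies from Example 1.
boundaries = [
    (99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
    (119.5, 124.5), (124.5, 129.5), (129.5, 134.5),
]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# One bar per class; bar length is proportional to the class frequency.
for (low, high), f in zip(boundaries, frequencies):
    print(f"{low:>5}-{high:<5} | {'#' * f} ({f})")

# The tallest bar belongs to the class with the highest frequency.
modal_class = boundaries[frequencies.index(max(frequencies))]
print("Class with the most observations:", modal_class)
```

Replacing each frequency by its relative frequency would change only the scale of the bars, not their relative lengths.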
b) The frequency polygon: The frequency polygon consists of line segments connecting the points formed by the intersection of the class marks with the class frequencies. Relative frequencies or percentages may also be used in constructing the figure. Empty classes are included at each end so that the curve will intersect the x-axis. Using the frequency distribution given in Example 1 above, construct a frequency polygon.
Solution: Steps
1. Find the class marks.
Class boundaries    Class mark    Frequency
99.5 - 104.5        102           2
104.5 - 109.5       107           8
109.5 - 114.5       112           18
114.5 - 119.5       117           13
119.5 - 124.5       122           7
124.5 - 129.5       127           1
129.5 - 134.5       132           1
2. Draw the x – y axes. Label the x-axis with the class marks and use a suitable scale on the y-axis for the frequencies (absolute or relative).
3. Connect the coordinates (x, y) with line segments.
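The points of the frequency polygon can be computed before drawing. The following sketch assumes the class boundaries and frequencies of Example 1 and adds one empty class at each end so that the polygon meets the x-axis; the names `class_marks` and `points` are illustrative.

```python
boundaries = [
    (99.5, 104.5), (104.5, 109.5), (109.5, 114.5), (114.5, 119.5),
    (119.5, 124.5), (124.5, 129.5), (129.5, 134.5),
]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Class mark = midpoint of the class.
class_marks = [(low + high) / 2 for low, high in boundaries]

# Width of one class, used to place the empty classes at each end.
width = boundaries[0][1] - boundaries[0][0]

# (x, y) points: empty class, the real classes, empty class.
points = (
    [(class_marks[0] - width, 0)]
    + list(zip(class_marks, frequencies))
    + [(class_marks[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```

Joining consecutive points with straight line segments gives the polygon.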
c) The cumulative frequency graph (ogive): The ogive is a graph that displays cumulative values for frequencies, relative frequencies or percentages. These values can be either “more than” or “less than”.
Example: Construct an ogive for the frequency distribution given in Example 1 above.
Solution: Steps
1. Find the cumulative frequency for each class.
Class boundaries    Less than cumulative frequency    Found by
99.5 - 104.5        2                                 2+0
104.5 - 109.5       10                                2+8
109.5 - 114.5       28                                2+8+18
114.5 - 119.5       41                                2+8+18+13
119.5 - 124.5       48                                2+8+18+13+7
124.5 - 129.5       49                                2+8+18+13+7+1
129.5 - 134.5       50                                2+8+18+13+7+1+1
2. Draw the x – y axes and label the x-axis with the class boundaries and the y-axis with the cumulative frequencies.
3. Plot the cumulative frequency at each upper class boundary. Upper class boundaries are used since the cumulative frequencies represent the number of data values accumulated up to the upper boundary of each class.
Cumulative frequency graphs (less than cumulative frequency) are used to show visually how many values are below a certain upper class boundary. For example, to find how many households earn less than 114.50 birr, we locate 114.5 birr on the x-axis, draw a vertical line up until it intersects the graph, and then draw a horizontal line from that point to the y-axis. The value is 28 households.
The “More than” Cumulative Frequency (more than the lower boundary)
Lower boundary       CF
more than 99.5       50 (∑fi)
more than 104.5      48 (∑fi − 2)
more than 109.5      40 (∑fi − 10)
more than 114.5      22 (∑fi − 28)
more than 119.5      9 (∑fi − 41)
more than 124.5      2 (∑fi − 48)
more than 129.5      1 (∑fi − 49)
Note: The abscissa (x-value) of the point of intersection of the two ogive curves (less than and more than) gives the median of the given data.
3.2.3. Other Methods of data presentation
a) Line graphs
b) Bar charts
c) Pie charts
a) Line graphs (charts): Line charts are particularly effective for business and economic data because they show the change or trend in a variable over time. Time series data are most effectively presented on a line chart. The variable of interest, such as the number of units sold or the total value of sales, is scaled along the y-axis and time along the x-axis. Line graphs are widely used by investors to support decisions to buy and sell stocks and bonds in the financial market. The idea is to show a trend that is likely to continue into the future, and to use that pattern to make predictions for the immediate future.
Example: Given the following data on the unemployment rate of a country from 1992 to 2000
Year    Unemployment rate
1992    14.80%
1993    13.70%
1994    11%
1995    10.20%
1996    11.30%
1997    12.40%
1998    13.50%
1999    14.60%
2000    15.70%
NB: Two or more series of data can be plotted on the same line chart. Thus a chart can show the trend of several different variables, and this allows for a comparison of several series over the same period of time.
b) Bar Charts: These are used when the horizontal axis deals with information that is qualitative or non-continuous in nature, e.g. gender, marital status, etc. When we represent data using bar charts, the bars are not joined together. All the bars must have equal width and the distance between bars must be equal.
Example
Education level        Earnings/year
High school Diploma    22,895.00
Bachelor Degree        40,478.00
Master’s Degree        73,165.00
c) Pie Chart: useful for displaying a relative frequency distribution. A circle is divided proportionally to the relative frequencies, and portions of the circle are allocated to the different groups. Example: A sample of 200 athletes were asked to indicate their favorite type of running shoe. Draw a pie chart based on the following data.
Type of shoe   Number of athletes   Relative frequency   Percent    Angle
Nike           92                   0.46                 46%        0.46 × 360° = 165.6°
Adidas         49                   0.245                24.50%     0.245 × 360° = 88.2°
Reebok         37                   0.185                18.50%     0.185 × 360° = 66.6°
Asics          13                   0.065                6.50%      0.065 × 360° = 23.4°
Other          9                    0.045                4.50%      0.045 × 360° = 16.2°
Total          200                  1                    100%       360°
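The sector angles of the pie chart follow directly from the relative frequencies. A minimal sketch, assuming the shoe counts from the example (the dictionary name `shoe_counts` is illustrative):

```python
# Number of athletes preferring each type of running shoe.
shoe_counts = {"Nike": 92, "Adidas": 49, "Reebok": 37, "Asics": 13, "Other": 9}
total = sum(shoe_counts.values())

# Angle of each sector = relative frequency * 360 degrees.
angles = {brand: count / total * 360 for brand, count in shoe_counts.items()}

for brand, angle in angles.items():
    print(f"{brand}: {shoe_counts[brand] / total:.3f} -> {angle:.1f} degrees")
```

Because the relative frequencies sum to 1, the angles necessarily sum to 360 degrees.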
Review Exercises
Multiple Choice Questions
1) To find the class mark
a. We have to divide the class interval in to half
b. We have to find the average of the lower and upper class limits in a class
c. We have to divide the upper class limit in to half
d. All are true
2) Which one of the following is not true?
a. The sum of the relative frequencies is always 1
b. A telephone interview is an example of a primary source of data
c. A face-to-face interview is less costly than a mail survey
d. Internet is a secondary source of data
3) In a frequency distribution, the categories/classes must
a) Be mutually exclusive and exhaustive
b) Have at least 5 observations
c) Be of the same size
d) Contain open ended classes
4) To determine the class interval (width)
a) Divide the class frequencies in half
b) Divide the class frequency by the number of observations
c) Find the difference between consecutive lower class limits or upper class limits
d) Count the number of observations in the class
5) The class frequency is
a) The number of observations in each class
b) The difference between consecutive lower class limits
c) Always contains at least 5 observations
d) Usually a multiple of the lower limit of the first class
6) A research organization is making a study of the selling price of personal computers (PCs). There are 45 PCs in the study. How many classes would you recommend? (Apply the $2^k$ rule.)
a) 10 b) 20 c) 6 d) 3
7) To convert a frequency distribution to a relative frequency distribution
a) Find the difference between consecutive lower class limits
b) Divide the absolute frequency by the total number of observations
c) Divide the lower limit of the first class by the class interval
d) Multiply the class frequency by 100
8) Which one is not correct?
a) A pie chart is important to show the trends or changes in a variable over time.
b) A line obtained by taking class marks on the y-axis and class limits/boundaries on the x-axis is called a frequency polygon.
c) A line chart is important to show the trends or changes in a variable over time.
d) None of the above
9) Data which are collected afresh and happen to have original characteristics are
a) Chronological data b) Secondary data c) Quantitative data d) Primary Data
10) The difference between a histogram and a bar chart is:
a) The midpoints are connected with a histogram but not with a bar chart
b) The bars must be next to each other on a histogram and separated in a bar chart
c) Cumulative frequencies are required in a bar chart
d) None of the above
Workout/Short answers/ explanations
1) If the maximum and minimum heights of students in a class are 1.90m and 1.40m, and if it is
desired to group the students in to five classes based on their height, what will be the size of the
class width?
2) Explain the terms: Primary data and secondary data. Give some illustrations.
3) Define the following concepts:
a) Frequency distribution
b) Frequency
c) Relative frequency
d) Less than cumulative frequency
e) More than cumulative frequency
f) Class mark
g) Class width
4) From a certain frequency distribution table, if the upper class boundary and the lower class limit of the 3rd class are 20.5 and 16 respectively, determine the class mark of the 3rd class.
5) Given the following frequency distribution, find the values of x, y, z, a, b, and c.
Class limit    Absolute frequency    Cumulative frequency    Relative frequency
100-104        2                     2                       0.02
105-109        8                     10                      0.08
110-114        x                     y                       z
115-119        10                    a                       0.1
120-124        20                    b                       0.2
125-129        15                    c                       0.15
130-134        20                    100                     0.2
6) The following table is a grouped frequency distribution of the money spent per visit by a random sample of 100 customers of a department store.
Amount spent (in birr)    Number of customers
3-7                       10
8-12                      30
13-17                     35
18-22                     20
23-27                     5
Total                     100
i) State for each of the above classes
a. The class limits
b. The class boundaries
c. The class marks
d. The class width
ii) Construct
a) A histogram
b) The cumulative frequency distribution
c) The relative frequency as well as the relative cumulative frequency distribution
iii) If possible, find the number of customers who spent:
a) At most Birr 12.50
b) Birr 12.50 or more
c) Less than Birr 12.50
d) At least Birr 17.50
e) Exactly Birr 12
7) A distribution has a constant class width, 6 classes, and 8 as the class mark of the second class.
a) If the class mark of the 4th class is 18, find
i) The class width
ii) The class limits and class boundaries of the distribution
8) The balance of payments (BOP) of Ethiopia over the years 1985-1990 was as follows.
Year    BOP (million)
1985    -40
1986    0
1987    10
1988    -5
1989    20
1990    30
Present the above data by an appropriate graph.
Chapter Four
Measures of Central Tendency
Chapter Objectives
When you have completed this chapter, you will be able to:
• Calculate the arithmetic mean, the weighted mean, the geometric mean, the harmonic mean, the median and the mode for ungrouped and grouped data;
• Explain the characteristic properties/uses of each measure of central tendency;
• Identify the position of the mean, median and mode for symmetric and skewed distributions;
• Understand other measures of location (quartiles, deciles and percentiles).
4.1. Introduction
In this chapter, we shall continue to develop methods to describe data by finding a typical single value to
describe a set of data. We refer to this single value as a measure of central tendency. Measures of
central tendency describe a distribution near its center. They provide indications on middle values or most
likely or most frequent values. In other words, they tell us where the center of the distribution of the data
is located.
The Summation Notation
Often statistical formulae require the addition of many variables. Summation or sigma notation is a convenient and simple form of shorthand used to give a concise expression for a sum of the values of a variable. In statistics, the symbol $\Sigma$ (Greek letter sigma) means to add or find the sum. For example, $\sum x_i$ means to add the numbers represented by the variable X. Thus, if X represents 5, 2, 8, 4, and 6, then $\sum x_i = 5 + 2 + 8 + 4 + 6 = 25$. Sometimes a subscript notation is used, such as $\sum_{i=1}^{5} x_i$. This notation means to find the sum of the five numbers represented by X. It is read as follows: sum the values of $X_i$ from $X_1$ through $X_5$.
$$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5$$
Generally, in order to make formulas more general, variables can be used with the summation notation. For example, $\sum_{i=1}^{n} x_i$ means to sum up the values of X from 1 to n, where n can be any number. Often an abbreviated form of the summation notation is used. For example, ΣX means to sum all the values of X. When only a subset of the values of X is to be summed, the full version is required. Thus, the sum of all elements of X except the first and the last would be indicated as $\sum_{i=2}^{n-1} x_i$, which would be read as the sum of X with i going from 2 to n-1. Some formulas require that each number be squared before the numbers are summed. This is indicated by $\sum_{i=1}^{n} x_i^2$, which means to square each value before summing.
$$\sum_{i=1}^{5} x_i^2 = 5^2 + 2^2 + 8^2 + 4^2 + 6^2 = 25 + 4 + 64 + 16 + 36 = 145$$
It is very important to note that it makes a big difference whether the numbers are squared first and then summed or summed first and then squared. The symbol (ΣX)² indicates that the numbers should be summed first and then squared. For the present example, this equals: (5 + 2 + 8 + 4 + 6)² = 25² = 625. This, of course, is quite different from 145.
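The difference between squaring first and summing first is easy to verify numerically. A short illustrative sketch for the values 5, 2, 8, 4 and 6 (variable names are not from the text):

```python
values = [5, 2, 8, 4, 6]

# Square each value first, then add: sum of x_i squared.
sum_of_squares = sum(x ** 2 for x in values)

# Add the values first, then square the total: (sum of x_i) squared.
square_of_sum = sum(values) ** 2

print("Sum of squares:", sum_of_squares)
print("Square of sum: ", square_of_sum)
```

For these values the sum of squares evaluates to 145, while the square of the sum is 625.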
Sometimes a formula requires that the sum of cross products be computed. For instance, given
X    Y
2    3
1    6
4    5
What is ΣXY? The sum of cross products is (2 × 3) + (1 × 6) + (4 × 5) = 32.
The notation $\sum (x - \bar{x})^2$ means perform the following steps:
1) Find the mean ($\bar{x} = \frac{\sum x}{n}$)
2) Subtract the mean from each value
3) Square the answers
4) Find the sum
Example: Find the value of $\sum (x - \bar{x})^2$ for the values 5, 2, 8, 4, 6. Here $\bar{x} = 25/5 = 5$.
x    x − x̄    (x − x̄)²
5    0         0
2    -3        9
8    3         9
4    -1        1
6    1         1
$$\sum (x - \bar{x})^2 = 20$$
Basic properties of summation notation:
1. Σ(X ± Y) = ΣX ± ΣY
Example:
X    Y
3    8
2    3
4    1
Σ(X + Y) = 11 + 5 + 5 = 21
ΣX = 3 + 2 + 4 = 9
ΣY = 8 + 3 + 1 = 12
ΣX + ΣY = 9 + 12 = 21
2. (ΣX)(ΣY) ≠ ΣXY
In the above example: (ΣX)(ΣY) = 9 × 12 = 108
ΣXY = 3 × 8 + 2 × 3 + 4 × 1 = 34. Thus, 108 ≠ 34.
3. ΣX² ≠ (ΣX)²
In the above example, ΣX² = 9 + 4 + 16 = 29
(ΣX)² = 9 × 9 = 81. Thus, 29 ≠ 81.
4. For any constant c, $\sum_{i=1}^{n} c = nc$ and $\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$.
Example:
$$\sum_{i=1}^{4} 5 = 5 + 5 + 5 + 5 = 4 \times 5 = 20$$
To check that $\sum_{i=1}^{n} 5x_i = 5 \sum_{i=1}^{n} x_i$, let $x_i = 5, 2, 4, 8, 6$. Then
$$\sum_{i=1}^{5} x_i = 5 + 2 + 4 + 8 + 6 = 25$$
The values of $5x_i$ are $5 \times 5,\ 5 \times 2,\ 5 \times 4,\ 5 \times 8,\ 5 \times 6$, so
$$\sum_{i=1}^{5} 5x_i = 25 + 10 + 20 + 40 + 30 = 125$$
$$5 \sum_{i=1}^{5} x_i = 5 \times 25 = 125$$
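The four properties of summation can be checked numerically with the same small data sets. The sketch below is only an illustration; the variable names are assumptions made for the example.

```python
xs = [3, 2, 4]
ys = [8, 3, 1]

# Property 1: the sum of (x + y) equals the sum of x plus the sum of y.
sum_of_pairs = sum(x + y for x, y in zip(xs, ys))
sum_separately = sum(xs) + sum(ys)

# Property 2: the product of the sums is NOT the sum of cross products.
product_of_sums = sum(xs) * sum(ys)
sum_of_products = sum(x * y for x, y in zip(xs, ys))

# Property 3: the sum of squares is NOT the square of the sum.
sum_of_squares = sum(x ** 2 for x in xs)
square_of_sum = sum(xs) ** 2

# Property 4: a constant factors out of a sum, and summing c n times gives n*c.
c, values = 5, [5, 2, 4, 8, 6]
constant_inside = sum(c * x for x in values)
constant_outside = c * sum(values)
constant_sum = sum(c for _ in range(4))

print(sum_of_pairs, sum_separately)        # equal
print(product_of_sums, sum_of_products)    # different
print(sum_of_squares, square_of_sum)       # different
print(constant_inside, constant_outside, constant_sum)
```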
Solved Exercises
Data
i     xi
1     1
2     2
3     3
4     4
1. Find
2. Find
Data
i     xi
1     -1
2     3
3     7
and c, which is a constant, = 11
3. Find
4. Find
5. Find
Data
i     xi    yi
1     10    0
2     8     3
3     6     6
4     4     9
5     2     12
6. Find
7. Find
8. Find
9. Find
4.2. Types of measures of central tendency
1) Arithmetic Mean: The arithmetic mean is the sum of the data set values divided by the number of observations. The arithmetic mean or average value of a variable is the most important numerical measure of central tendency. For ungrouped data, the population mean (usually denoted by “μ”) is the sum of all the population values divided by the total number of population values:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
where: N = number of elements in the population
μ = population mean
The population mean applies when the data represent all of the items within the population. For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
where: $\bar{X}$ = sample mean
n = number of elements in the sample/sample size
Example: A sample of five executives received the following salaries (Birr in thousands): 14.0, 15.0, 17.0, 16.0, and 15.0. Find the mean salary.
$$\bar{X} = \frac{\sum X_i}{n} = \frac{14.0 + 15.0 + 17.0 + 16.0 + 15.0}{5} = \frac{77}{5} = 15.4$$
Therefore, the mean salary of the executives is Birr 15,400.00.
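The sample mean of the executive salaries can be computed directly from its definition. A minimal sketch (the list name `salaries` is illustrative):

```python
# Salaries of the five executives, in thousands of Birr.
salaries = [14.0, 15.0, 17.0, 16.0, 15.0]

# Sample mean = sum of the sample values / number of sample values.
mean_salary = sum(salaries) / len(salaries)

print("Mean salary (thousand Birr):", mean_salary)
```

Python's standard library also offers `statistics.mean`, which returns the same value for this list.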
Properties of Arithmetic mean
a) The arithmetic mean is the most widely used measure of location/central tendency.
b) All the values are included in computing the mean.
c) A set of data has a unique mean.
d) Every set of quantitative data has a mean.
e) The mean is affected by unusually large or small data values, called outliers, and may not be the appropriate average to use in these situations.
f) We cannot determine a mean for open-ended data.
g) The sum of the deviations of each value from the mean is always zero: $\sum (x - \bar{x}) = 0$.
Example: Given $x_i = 5, 2, 4, 8, 6$, we have $\bar{x} = 5$ and
$$\sum (x - \bar{x}) = (5 - 5) + (2 - 5) + (4 - 5) + (8 - 5) + (6 - 5) = 0 - 3 - 1 + 3 + 1 = 0$$
Mathematically, since $\bar{x}$ is a constant,
$$\sum (x - \bar{x}) = \sum x - n\bar{x} = \sum x - n\left(\frac{\sum x}{n}\right) = \sum x - \sum x = 0$$
h) If $\bar{x}_1$ and $\bar{x}_2$ are the arithmetic means of $n_1$ and $n_2$ observations respectively, then the combined mean will be
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
(this is the same as the weighted mean).
Example: The mean ages of 12 men and 10 women are 45 and 42 respectively. What is the combined mean age?
Solution:
$$\bar{x}_c = \frac{12 \times 45 + 10 \times 42}{12 + 10} = \frac{960}{22} \approx 43.6$$
i) A short-cut formula can be used if the figures in the calculation have many digits. First transform the observations ($x_i$'s) as $y_i = x_i - c$, where c is any chosen value near the center; then $\bar{x} = \bar{y} + c$.
j) The arithmetic mean is affected by both change of origin and change of scale. That is,
• Given a mean for data values, if we add or subtract a constant number c from all data values, the new mean will be the old mean plus or minus c (change of origin).
• Given a mean for data values, if we multiply all data values by a constant number c, then the new mean will be c times the old one (change of scale).
Example: The mean life of a certain brand of bulbs is 1030 hours.
a) If a new process adds 50 hours to the life of each bulb, what will be the mean life? (Ans. 1080 hours)
b) If a recently developed method of production doubles the life of each bulb, what will happen to the mean life? (Ans. 2060 hours)
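Several of these properties can be confirmed numerically. The sketch below checks that deviations from the mean sum to zero, computes the combined mean for the age example, and illustrates change of origin and scale; the helper `combined_mean` is a hypothetical function written for this illustration.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

values = [5, 2, 4, 8, 6]
x_bar = mean(values)

# Property (g): deviations from the mean sum to zero.
deviation_sum = sum(x - x_bar for x in values)

def combined_mean(n1, mean1, n2, mean2):
    """Property (h): combined mean of two groups with known sizes and means."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

combined_age = combined_mean(12, 45, 10, 42)

# Property (j): adding c shifts the mean by c; multiplying by c scales it by c.
shifted_mean = mean([x + 50 for x in values])
scaled_mean = mean([2 * x for x in values])

print(deviation_sum, round(combined_age, 1), shifted_mean, scaled_mean)
```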
Arithmetic mean for grouped data
The mean of a sample of data organized in a frequency distribution is computed by the following formula:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where: $f_i$ = frequency of the i-th class
$X_i$ = class mark of the i-th class
k = number of classes
Example: Compute the arithmetic mean for the following grouped data:
Class boundaries    Class mark (Xi)    fi    fiXi
5.5-10.5            8                  1     8
10.5-15.5           13                 2     26
15.5-20.5           18                 3     54
20.5-25.5           23                 5     115
25.5-30.5           28                 4     112
30.5-35.5           33                 3     99
35.5-40.5           38                 2     76
Total                                  20    490
$$\sum_{i=1}^{7} f_i = 20, \qquad \sum_{i=1}^{7} f_i X_i = 490 \quad \Rightarrow \quad \bar{X} = \frac{490}{20} = 24.5$$
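The grouped-data mean can be reproduced from the class marks and frequencies. A short sketch under the assumption that each observation is represented by its class mark (names are illustrative):

```python
# Class marks and frequencies from the grouped-data example.
class_marks = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]

total_frequency = sum(frequencies)                                  # sum of f_i
weighted_total = sum(f * x for f, x in zip(frequencies, class_marks))  # sum of f_i * X_i

grouped_mean = weighted_total / total_frequency

print("Sum of f:", total_frequency)
print("Sum of fX:", weighted_total)
print("Grouped mean:", grouped_mean)
```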
2) Weighted mean: It is a special case of the arithmetic mean. It occurs when there are several observations of the same value, which might occur if the data have been grouped into a frequency distribution. It is the mean of data values that have been weighted according to their relative importance. The formula for the weighted mean for a population or a sample is as follows:
$$\mu_w \ \text{or} \ \bar{X}_w = \frac{\sum w_i x_i}{\sum w_i}$$
where: $\mu_w$ is the population weighted mean
$\bar{X}_w$ is the sample weighted mean
$w_i$ = weight assigned to the i-th data value
$x_i$ = the i-th data value
Examples:
i. During a one-hour period on Saturday afternoon a waiter served fifty drinks: 5 drinks for birr 0.50, 15 for birr 0.75, 15 for birr 0.90, and 15 for birr 1.10. Compute the weighted mean price of the drinks.
$$\bar{X}_w = \frac{5(0.50) + 15(0.75 + 0.90 + 1.10)}{50} = \frac{43.75}{50} = 0.875$$
ii. A student scored an A in Sophomore English (3 credit hours), a C in Psychology (3 credit hours), a B in Microeconomics-I (4 credit hours) and a D in Civics (2 credit hours). Assuming A has 4 grade points, B has 3 grade points, C has 2 grade points and D has 1 grade point, calculate the grade point average (GPA). Ans. The total grade points are 4(3) + 2(3) + 3(4) + 1(2) = 32 and the total credit hours are 3 + 3 + 4 + 2 = 12, so GPA = 32/12 ≈ 2.67.
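A weighted mean is the sum of weight times value divided by the sum of the weights. The sketch below applies this to the drinks example, using the quantities sold as weights; the function name `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Drinks example: prices in birr and the number of drinks sold at each price.
prices = [0.50, 0.75, 0.90, 1.10]
quantities = [5, 15, 15, 15]

mean_price = weighted_mean(prices, quantities)

print("Weighted mean price (birr):", round(mean_price, 3))
```

The same function works for a grade point average if grade points are passed as the values and credit hours as the weights.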
3) Geometric mean: The geometric mean (GM) of n positive numbers is defined as the nth root of their
product. The formula is:
GM = n  X 1 X 2  X 3.... Xn   n xi ,  => multiplication
 The geometric mean is useful in finding the average of percents, ratios, indexes, or growth rates.
It has a wide application in business and economics because we are often interested in finding the
percentage changes in sales, revenues, profits, GDP, etc.
Examples
a) The GM of 4 and 16 is
b) The GM of 1,3,9 is
3
4 *16  8
1* 3 * 9  3
c) The interest rates on three bonds were 5, 21, and 4 percent. The average interest rate is:
GM  3 5  21  4  7.49
d) The returns on investment earned by a company for four successive years were 30%, 20%,
-40% & 200%, what is the geometric rate of return on investment?
Solution: A 30% return means an additional gain on top of what we have (i.e. on top of 100%), so a 30% return is
expressed as 1.3, and -40% implies a reduction (1 - 0.4 = 0.6).
$GM = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = 1.294$. The geometric mean rate of return is therefore 1.294 - 1 = 0.294 = 29.4%
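As a rough check of examples (a), (b) and (d), the geometric mean can be sketched in Python as below. The helper names `geometric_mean` and `geometric_rate_of_return` are assumptions made for illustration; `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.prod(values) ** (1 / len(values))


def geometric_rate_of_return(returns):
    """Average rate of return; each return r is first turned into a growth factor 1 + r."""
    growth_factors = [1 + r for r in returns]
    return geometric_mean(growth_factors) - 1


gm_a = geometric_mean([4, 16])
gm_b = geometric_mean([1, 3, 9])
rate = geometric_rate_of_return([0.30, 0.20, -0.40, 2.00])

print(round(gm_a, 2), round(gm_b, 2))  # 8.0 3.0
print(round(rate * 100, 1))            # 29.4
```

Converting returns to growth factors first is what keeps the -40% year usable, since the geometric mean is only defined for positive numbers.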
Another use of the geometric mean is to determine the percent increase in sales, production or other
business or economic series from one time period to another.
$GM = \sqrt[n]{\dfrac{\text{value at end of period}}{\text{value at beginning of period}}} - 1$, where n = time gap (number of time periods)
Example:
1) The production of soaps for a soap factory increased from 755,000 in 1992 to 835,000 in
2000. What would be the rate of production increase?
Rate of production increase $= GM = \sqrt[8]{\dfrac{835,000}{755,000}} - 1 = 0.0127 = 1.27\%$
2) If the population of Ethiopia increased from 53,000,000 in 1980 to 73,000,000 in 2000, what
is the average annual increase?
$GM = \sqrt[20]{\dfrac{73,000,000}{53,000,000}} - 1 = 0.016 = 1.6\%$
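The growth-rate form of the geometric mean can be sketched as follows. The function name `average_growth_rate` is a hypothetical label for the formula above, and the inputs are the figures from the two examples.

```python
def average_growth_rate(start_value, end_value, periods):
    """nth root of (end / start), minus 1, where n is the number of periods."""
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("values and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Soap production, 1992 to 2000 (8 periods).
soap_rate = average_growth_rate(755_000, 835_000, 8)

# Population, 1980 to 2000 (20 periods).
population_rate = average_growth_rate(53_000_000, 73_000_000, 20)

print(round(soap_rate * 100, 2))        # 1.27
print(round(population_rate * 100, 1))  # 1.6
```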
3) If a person receives a 20% raise after one year of service and a 10% raise after the second
year of service, the average percentage raise is not 15% ($\frac{20\% + 10\%}{2}$) but 14.89%, as shown
below:
$GM = \sqrt{1.2 \times 1.1} = 1.1489$ or $GM = \sqrt{120 \times 110} = 114.89\%$
His salary is 120% of the previous level at the end of the first year and 110% of the previous level at the end of the second year.
This is equivalent to an average raise of 14.89%, since 114.89% - 100% = 14.89%. This answer
can also be shown by assuming that the person earns Birr 10,000 to start and receives two
raises of 20% and 10%.
Raise 1 = 10,000 * 20% = Birr 2000
Raise 2 = 12,000 * 10% = Birr 1200
His total raise is Birr 3200. This total is equivalent to two successive raises of 14.89%:
Birr 10,000 * 14.89% = Birr 1489
Birr (10,000 + 1489) * 14.89% = Birr 11,489 * 14.89% = Birr 1710.71
Total increase = Birr 1489 + Birr 1710.71 = Birr 3199.71 (almost equal to Birr 3200)
4) The price of a certain commodity in 1970 was 1.06 times that of 1969, in 1971 it was 1.04
times that of 1970. In the next two years it was 1.10 and 1.23 times that of the respective
preceding years. What is the average annual percentage increase in the given period?
$GM = \sqrt[4]{1.06 \times 1.04 \times 1.10 \times 1.23} = 1.105$, so the average annual increase is $(1.105 - 1) \times 100\% = 10.5\%$
For grouped data, the geometric mean is calculated as:
$GM = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_m^{f_m}}$
Where $f_i$ is the frequency of the ith class mark,
$x_i$ is the ith class mark,
m is the number of class marks (classes),
$n = \sum f_i$ is the total number of observations
Example:
1) Find the geometric mean for the following grouped data on the percentage increase in salary of 16
employees of a company.
% increase in salary | Number of employees | Class mark
0-4 | 5 | 2
5-9 | 6 | 7
10-14 | 3 | 12
15-19 | 2 | 17
Solution: $GM = \sqrt[16]{2^5 \times 7^6 \times 12^3 \times 17^2} = 5.85\%$. The geometric mean percentage increase in salary is
5.85%
If n is a large number, computing the nth root of the product is tedious work. To facilitate the
computation of the GM, we make use of logarithms. Taking the log of both sides of $GM = \sqrt[n]{X_1 X_2 X_3 \cdots X_n}$:
$\log GM = \dfrac{1}{n}\log(X_1 X_2 X_3 \cdots X_n) = \dfrac{\log x_1 + \log x_2 + \cdots + \log x_n}{n} = \dfrac{\sum \log x_i}{n}$
$GM = \text{antilog}\left[\dfrac{\sum \log x_i}{n}\right]$
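The logarithm route can be sketched in Python for the grouped salary example. This is a minimal illustration: the function name `grouped_geometric_mean` is an assumption, and each log is weighted by its class frequency so that the sum runs over all n observations.

```python
import math


def grouped_geometric_mean(class_marks, frequencies):
    """GM = antilog( sum(f_i * log10(x_i)) / n ), with n = sum(f_i)."""
    if any(x <= 0 for x in class_marks):
        raise ValueError("class marks must be positive")
    n = sum(frequencies)
    log_sum = sum(f * math.log10(x) for x, f in zip(class_marks, frequencies))
    return 10 ** (log_sum / n)  # antilog, base 10


# Class marks and frequencies from the salary-increase table (n = 16).
gm_salary = grouped_geometric_mean([2, 7, 12, 17], [5, 6, 3, 2])
print(round(gm_salary, 2))  # 5.85
```

Working with the sum of logs avoids forming the very large product directly, which is the practical reason the notes give for using logarithms.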
4) Harmonic Mean
The harmonic mean of n positive observations is defined as the number of values divided by the sum
of the reciprocals of each value. That is,
$HM = \dfrac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}} = \dfrac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}$
It is used for averaging rates of change, for example speed.
Example: Find the HM of 60, 50 and 40.
$HM = \dfrac{3}{\dfrac{1}{60} + \dfrac{1}{50} + \dfrac{1}{40}} = 48.65$
Example: Suppose a person drove 100 km at 40 km/hr and returned driving at 50 km/hr. What is the
average speed?
Solution: Since $\text{Speed} = \dfrac{\text{Distance}}{\text{Time}}$,
$t_1 = \dfrac{\text{Distance}}{\text{Speed}} = \dfrac{S}{V} = \dfrac{100\,km}{40\,km/hr} = 2.5$ hours to make the first trip
$t_2 = \dfrac{\text{Distance}}{\text{Speed}} = \dfrac{S}{V} = \dfrac{100\,km}{50\,km/hr} = 2$ hours to return
Total time = 2.5 hours + 2 hours = 4.5 hours
Total distance = 100 km + 100 km = 200 km
$V = \dfrac{S}{t} = \dfrac{200\,km}{4.5\,hr} = 44.44$ km/hr
Arithmetic mean (weighted by travel time) $= \dfrac{2.5(40) + 2(50)}{4.5} = 44.44$ km/hr
This value can also be found by using the harmonic mean formula:
$HM = \dfrac{2}{\dfrac{1}{40} + \dfrac{1}{50}} = 44.44$ km/hr
Here, we don't use the simple arithmetic mean to find the average speed because the person traveled
equal distances at different speeds on the two trips. If, however, he had traveled for equal times,
the simple arithmetic mean would have given the correct average. If we want to use the arithmetic mean
for equal distances, we have to take the travel times into account as weights.
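The equal-distance argument can be verified numerically. The sketch below (function name `harmonic_mean` chosen for illustration) computes the harmonic mean and cross-checks it against total distance divided by total time.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return len(values) / sum(1 / v for v in values)


hm_three = harmonic_mean([60, 50, 40])
hm_speed = harmonic_mean([40, 50])

# Cross-check: 100 km out at 40 km/hr and 100 km back at 50 km/hr.
total_distance = 100 + 100
total_time = 100 / 40 + 100 / 50
average_speed = total_distance / total_time

print(round(hm_three, 2))       # 48.65
print(round(hm_speed, 2))       # 44.44
print(round(average_speed, 2))  # 44.44
```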
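Harmonic mean for grouped data
$HM = \dfrac{n}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \cdots + \dfrac{f_m}{x_m}} = \dfrac{n}{\sum_{i=1}^{m} \dfrac{f_i}{x_i}}$
Where $x_i$ = class mark of the ith class, $f_i$ = frequency of the ith class, m = number of classes and $n = \sum f_i$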
Relationship between Arithmetic mean, Geometric Mean and Harmonic Mean
For a set of data containing n positive observations, the following relationship always holds:
$HM \le GM \le AM$
The three means become equal if and only if all values in the set of data are equal.
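The ordering of the three means can be illustrated with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The wrapper `three_means` is a hypothetical helper, and the data sets are small illustrative inputs, the first taken from the geometric mean example of 4 and 16.

```python
import math
import statistics


def three_means(values):
    """Return (harmonic, geometric, arithmetic) means of positive values."""
    return (
        statistics.harmonic_mean(values),
        statistics.geometric_mean(values),
        statistics.mean(values),
    )


hm, gm, am = three_means([4, 16])
print(hm <= gm <= am)  # True

# When all values are equal, the three means coincide (up to rounding).
hm_eq, gm_eq, am_eq = three_means([5, 5, 5])
print(math.isclose(hm_eq, gm_eq) and math.isclose(gm_eq, am_eq))  # True
```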
5) Median (MD)
The median of a set of values arranged in the order of their magnitudes, i.e., in an array, is the middle
value or the arithmetic mean of two middle values. Median is that value of a variable which divides an
array of items in such a manner that the number of items below it is equal to the number of items above it.
a) Median for Ungrouped Data
If the number of observations is odd, then MD = value of the $\left(\dfrac{n+1}{2}\right)^{th}$ observation
Example: Find the median of the following data set: 1, 5, 3, 9, 10, 12, 6
Solution: First array the data: 1, 3, 5, 6, 9, 10, 12; n = 7, which is odd
MD = $\left(\dfrac{n+1}{2}\right)^{th}$ observation = $\left(\dfrac{7+1}{2}\right)^{th}$ observation = 4th observation = 6
If the number of observations is even, then
$MD = \dfrac{\left(\dfrac{n}{2}\right)^{th}\text{ observation} + \left(\dfrac{n}{2} + 1\right)^{th}\text{ observation}}{2}$
Example: Find the median of the following data set: 1, 5, 2, 9, 7, 10, 12, 13
Solution: First array the data: 1, 2, 5, 7, 9, 10, 12, 13; n = 8, which is even
$MD = \dfrac{\left(\dfrac{8}{2}\right)^{th}\text{ obsn} + \left(\dfrac{8}{2} + 1\right)^{th}\text{ obsn}}{2} = \dfrac{4^{th}\text{ obsn} + 5^{th}\text{ obsn}}{2}$
The 4th observation is 7 and the 5th observation is 9, so MD = $\dfrac{7 + 9}{2}$ = 8
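The odd and even rules can be sketched together in Python. The function name `median_ungrouped` is illustrative; Python lists are indexed from 0, so the kth observation in the notes is `ordered[k - 1]` in the code.

```python
def median_ungrouped(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    if not values:
        raise ValueError("data set is empty")
    ordered = sorted(values)  # array the data first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation, i.e. index n // 2.
        return ordered[middle]
    # Even n: average of the (n / 2)th and (n / 2 + 1)th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2


md_odd = median_ungrouped([1, 5, 3, 9, 10, 12, 6])
md_even = median_ungrouped([1, 5, 2, 9, 7, 10, 12, 13])
print(md_odd)   # 6
print(md_even)  # 8.0
```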
b) Median for Grouped data
For grouped data, the median is calculated by using the following formula:
$MD = L_{md} + \left(\dfrac{\dfrac{n}{2} - cf}{f}\right) \times i$
Where
$L_{md}$ is the lower class boundary/class limit of the median class
n is the total number of observations
cf is the cumulative frequency preceding the median class
i is the class interval/width
f is the frequency of the median class
Example: find the median from the following frequency distribution
Class Limit | Frequency | Cumulative Frequency
30-40 | 2 | 2
40-50 | 18 | 20
50-60 | 24 | 44
60-70 | 20 | 64
70-80 | 8 | 72
80-90 | 3 | 75
Total | 75 |
Solution:
Steps:
a. Find the cumulative frequency.
b. Find $n = \sum f_i = 75$, which is odd.
c. Find the median class: it contains the $\left(\dfrac{75 + 1}{2}\right)^{th}$ observation = 38th observation.
d. In which class does the 38th observation fall? In the 3rd class, and thus the 3rd class (50-60) is the
median class.
e. Find the cumulative frequency preceding the median class. 20 in this case.
f. Find the class width. 10 in this case.
g. Find the frequency of the median class. 24 in this case.
$MD = 50 + \left(\dfrac{\dfrac{75}{2} - 20}{24}\right) \times 10 = 57.29$
Properties of Median
1. The data must be arranged in an array (ordered) before the median is calculated.
2. There is a unique median for each data set.
3. Geometrically, median divides the histogram or cumulative frequency curves into two parts
with equal area.
4. Median remains unaffected by the magnitude of the extreme values.
5. It can be calculated for an open ended frequency distribution if the median class doesn't lie in
an open ended class.
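The grouped median formula can be sketched as a small Python function. This is an illustration under stated assumptions: the name `grouped_median` is hypothetical, all classes are assumed to share one width, and the median class is located as the first class whose cumulative frequency reaches n/2 (which gives the same class as the 38th-observation rule in the worked example).

```python
def grouped_median(lower_boundaries, frequencies, width):
    """MD = L + ((n/2 - cf) / f) * i for the median class."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


# Frequency distribution from the grouped median example (n = 75).
md_grouped = grouped_median([30, 40, 50, 60, 70, 80], [2, 18, 24, 20, 8, 3], 10)
print(round(md_grouped, 2))  # 57.29
```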
6) Mode (MO)
 Mode is the most frequent value in a data set.
 The mode is the value of the observation that appears most frequently.
 The mode of the distribution is the value that has the greatest concentration of tendencies, i.e., the
value that occurs with greatest number of times in a distribution.
 The data value that occurs with greatest frequency is a mode.
Example: The examination scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81 and 87. Because
the score of 81 occurs three times, more often than any other score, it is the mode.
A data set may have
A. No mode at all, e.g. 1, 3, 9, 0, 7, 8
B. One mode (unimodal), e.g. 1, 3, 1, 7, 1, 9; the mode is 1
C. Two modes (bimodal), e.g. 7, 2, 4, 4, 7; the modes are 7 and 4
D. Many modes (multimodal), e.g. 1, 0, 0, 1, 3, 2, 2, 3, 7, 7, 4, 9; the modes are 1, 0, 3, 2 and 7
Mode of a grouped data
The approximate modal value of grouped data is calculated by the following formula:
$\text{Mode} = L_0 + \dfrac{f - f_1}{(f - f_1) + (f - f_2)} \times i = L_0 + \dfrac{f - f_1}{2f - f_1 - f_2} \times i$
Where:
$L_0$ = lower class boundary of the modal class (i.e., the class with the highest frequency)
f = frequency of the modal class
$f_1$ = frequency of the class immediately preceding the modal class
$f_2$ = frequency of the class immediately following the modal class
i = class interval/width
Note: the data must be arranged in an array.
Example: Find the mode of the following distribution:
Class Limit | Frequency
90-100 | 10
100-110 | 37
110-120 | 65
120-130 | 80
130-140 | 51
140-150 | 35
150-160 | 18
160-170 | 4
Solution: The modal class is 120-130, with f = 80, $f_1$ = 65 and $f_2$ = 51.
$\text{Mode} = 120 + \dfrac{80 - 65}{2(80) - 65 - 51} \times 10 = 120 + \dfrac{150}{44} = 123.41$
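The grouped mode formula can be sketched in Python as follows. The function name `grouped_mode` is hypothetical, and two choices here are assumptions not stated in the notes: when the modal class is the first or last class, the missing neighbouring frequency is taken as 0, and if several classes tie for the highest frequency the first one is used.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Mode = L0 + (f - f1) / (2f - f1 - f2) * i for the modal class."""
    # Index of the modal class (first class with the highest frequency).
    k = max(range(len(frequencies)), key=frequencies.__getitem__)
    f = frequencies[k]
    f1 = frequencies[k - 1] if k > 0 else 0                     # preceding class
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0  # following class
    denominator = 2 * f - f1 - f2
    if denominator == 0:
        raise ValueError("formula undefined: neighbouring classes are as frequent as the modal class")
    return lower_boundaries[k] + (f - f1) / denominator * width


lowers = [90, 100, 110, 120, 130, 140, 150, 160]
freqs = [10, 37, 65, 80, 51, 35, 18, 4]
mode_grouped = grouped_mode(lowers, freqs, 10)
print(round(mode_grouped, 2))  # 123.41
```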
Properties of mode
• It is the easiest average to compute.
• It can be obtained for both qualitative and quantitative data.
• It is not affected by extreme values.
• The mode may not exist for a data set.
• It is not unique. A data set can have more than one mode.
• The mode is not based on all observations.
Distribution, shape and measures of central tendency
The relative values of the mean, median and mode are very much dependent on the shape of the
distribution for the data they are describing. The data distributions may be described in terms of
symmetry and skewness. In other words, data can be either symmetric or skewed depending on how
the data are distributed around the center.
Symmetric (normal, bell-shaped) distribution: occurs when the data values are evenly distributed
around the center. In a symmetrical distribution, the left and right sides of the distribution are mirror
images of each other, and the values of the mean, median and mode are equal.
Skewed distribution: occurs when the data values are not evenly distributed around the center.
Skewness refers to the tendency of the distribution to “tail off” to the right or left. Skewness is lack of
symmetry of a distribution.
Right (positively) skewed distribution: The majority of the data values fall to the left of the mean and
cluster at the lower end of the distribution, with the tail extending to the right. The mean is greater
than the median, which in turn is greater than the mode, because the mean is influenced by a few
extremely high values more than the median or mode. In such distributions, the median tends to be a
better measure of central tendency than the mean. Mode < Median < Mean
Left (negatively) skewed distribution: the mean is less than the median, which in turn is less than
the mode. As with the positively skewed distribution, the median is less influenced by extreme values
and tends to be a better measure of central tendency than the mean.
Mean<Median<Mode
4.3. Quartiles, Deciles and Percentiles
Descriptive measures that describe the position (place) of a value in a given data set or distribution are
positional averages. Measures which divide data into a number of equal parts are called quantiles
(fractiles). The most important of these are quartiles, deciles and percentiles. To obtain such measures,
we first have to arrange the data in increasing order.
Quartiles
Quartiles divide the data into four equal parts. The jth quartile, denoted as $Q_j$ where j = 1, 2, 3, is defined
as
$Q_j = \left(\dfrac{j(n+1)}{4}\right)^{th}$ observation
Q1 gives the value where 25% of the observations lie below and 75% above it
Q2 gives the value where 50% of the observations lie below and 50% above it
Q3 gives the value where 75% of the observations lie below and 25% above it
Example: Find the quartiles (Q1, Q2, & Q3) from the following distribution: 8, 4, 8, 3, 4, 8, 5, 5, 10
Solution: Arrange first: 3, 4, 4, 5, 5, 8, 8, 8, 10
$Q_1 = \left(\dfrac{1(9+1)}{4}\right)^{th}$ item = (2.5)th item = 2nd item + 0.5(3rd item - 2nd item) = 4 + 0.5(4 - 4) = 4
$Q_2 = \left(\dfrac{2(9+1)}{4}\right)^{th}$ item = 5th item = 5
$Q_3 = \left(\dfrac{3(9+1)}{4}\right)^{th}$ item = (7.5)th item = 7th item + 0.5(8th item - 7th item) = 8 + 0.5(8 - 8) = 8
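The position rule with interpolation can be sketched as below. The function name `quantile_by_position` and its `parts` parameter (4 for quartiles) are illustrative assumptions; clamping positions that fall outside the data to the smallest or largest value is also an assumption, since the notes do not cover that case.

```python
def quantile_by_position(values, j, parts=4):
    """Value at the (j * (n + 1) / parts)th position, interpolating between neighbours."""
    ordered = sorted(values)
    n = len(ordered)
    position = j * (n + 1) / parts   # 1-based position, may be fractional
    whole = int(position)            # e.g. 2 for the 2.5th item
    fraction = position - whole      # e.g. 0.5 for the 2.5th item
    if whole < 1:
        return ordered[0]            # assumed: clamp below the first item
    if whole >= n:
        return ordered[-1]           # assumed: clamp above the last item
    lower_value = ordered[whole - 1]
    upper_value = ordered[whole]
    return lower_value + fraction * (upper_value - lower_value)


data = [8, 4, 8, 3, 4, 8, 5, 5, 10]
quartiles = [quantile_by_position(data, j) for j in (1, 2, 3)]
print(quartiles)  # [4.0, 5.0, 8.0]
```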
For grouped data,
$Q_j = L_j + \left(\dfrac{\dfrac{j \cdot n}{4} - cf}{f_j}\right) \times w$
Where j = 1, 2, 3
$L_j$ = lower class boundary of the jth quartile class (the class which contains the $\left(\dfrac{j \cdot n}{4}\right)^{th}$ item)
w = class width
$f_j$ = frequency of the jth quartile class
n = total number of observations
cf = the cumulative frequency of the class preceding the jth quartile class
Example: Consider the following frequency distribution (n = 20).
Class Boundaries | fi | cf
5.5-10.5 | 1 | 1
10.5-15.5 | 2 | 3
15.5-20.5 | 3 | 6
20.5-25.5 | 5 | 11
25.5-30.5 | 4 | 15
30.5-35.5 | 3 | 18
35.5-40.5 | 2 | 20
Q1: the $\left(\dfrac{n}{4}\right)^{th}$ item = $\left(\dfrac{20}{4}\right)^{th}$ item = 5th item is Q1 and it falls in the 3rd class, so 15.5 - 20.5 is the first quartile class.
$Q_1 = 15.5 + \left(\dfrac{\dfrac{1 \times 20}{4} - 3}{3}\right) \times 5 = 18.83$
Q2: the $\left(\dfrac{2n}{4}\right)^{th}$ item = $\left(\dfrac{40}{4}\right)^{th}$ item = 10th item is Q2 and it falls in the 4th class, so 20.5 - 25.5 is the second quartile class.
$Q_2 = 20.5 + \left(\dfrac{\dfrac{2 \times 20}{4} - 6}{5}\right) \times 5 = 20.5 + 4 = 24.5$ = median
Q3: the $\left(\dfrac{3n}{4}\right)^{th}$ item = $\left(\dfrac{60}{4}\right)^{th}$ item = 15th item is Q3 and it falls in the 5th class, so 25.5 - 30.5 is the third quartile class.
$Q_3 = 25.5 + \left(\dfrac{\dfrac{3 \times 20}{4} - 11}{4}\right) \times 5 = 25.5 + 5 = 30.5$
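The grouped quartile computation can be sketched in Python. The function name `grouped_quantile` and the `parts` parameter (4 for quartiles) are illustrative assumptions, and a common class width is assumed. The quartile class is taken as the first class whose cumulative frequency reaches j·n/4, as in the worked example.

```python
def grouped_quantile(lower_boundaries, frequencies, width, j, parts=4):
    """Q_j = L + ((j * n / parts - cf) / f) * w for the class containing the target item."""
    n = sum(frequencies)
    target = j * n / parts
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("quantile class not found")


# Class boundaries and frequencies from the grouped quartile example (n = 20).
lowers = [5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5]
freqs = [1, 2, 3, 5, 4, 3, 2]
q1, q2, q3 = (grouped_quantile(lowers, freqs, 5, j) for j in (1, 2, 3))
print(round(q1, 2), q2, q3)  # 18.83 24.5 30.5
```

Passing a different `parts` value would give the same interpolation for other equal divisions of the data.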
Deciles
Deciles are measures that divide a distribution/data set into ten equal parts.
The jth decile for a simple frequency distribution (ungrouped data), denoted as $D_j$ where j = 1, 2, 3, ..., 9, is
defined as
$D_j = \left(\dfrac{j(n+1)}{10}\right)^{th}$ observation
D1 gives the value where 10% of the observations lie below and 90% above it
D2 gives the value where 20% of the observations lie below and 80% above it
D3 gives the value where 30% of the observations lie below and 70% above it
.
.
D9 gives the value where 90% of the observations lie below and 10% above it
For grouped data,
$D_j = L_j + \left(\dfrac{\dfrac{j \cdot n}{10} - cf}{f_j}\right) \times w$
Where j = 1, 2, 3, 4, ..., 9
$L_j$ = lower class boundary of the jth decile class (the class which contains the $\left(\dfrac{j \cdot n}{10}\right)^{th}$ item)
w = class width
$f_j$ = frequency of the jth decile class
n = total number of observations
cf = the cumulative frequency of the class preceding the jth decile class
Percentiles
Percentiles divide a distribution/data set into 100 equal parts.
The jth percentile for a simple frequency distribution (ungrouped data), denoted as $P_j$ where j = 1, 2,
3, ..., 99, is defined as
$P_j = \left(\dfrac{j(n+1)}{100}\right)^{th}$ observation
P1 gives the value where 1% of the observations lie below and 99% above it
P2 gives the value where 2% of the observations lie below and 98% above it
P3 gives the value where 3% of the observations lie below and 97% above it
.
.
P99 gives the value where 99% of the observations lie below and 1% above it
For grouped data,
$P_j = L_j + \left(\dfrac{\dfrac{j \cdot n}{100} - cf}{f_j}\right) \times w$
Where j = 1, 2, 3, 4, ..., 99
$L_j$ = lower class boundary of the jth percentile class (the class which contains the $\left(\dfrac{j \cdot n}{100}\right)^{th}$ item)
w = class width
$f_j$ = frequency of the jth percentile class
n = total number of observations
cf = the cumulative frequency of the class preceding the jth percentile class
Observe that:
1. $Q_2 = D_5 = P_{50}$ = Median
2. $D_j = P_{10j}$, j = 1, 2, 3, 4, 5, 6, 7, 8, 9
3. $Q_j = P_{25j}$, j = 1, 2, 3
Review exercises
Choose the best answer
1. Which of the following measures of central tendency is affected most by extreme values
(outliers)?
a. Median b. Mean c. Mode d. Geometric Mean
2. In a set of observations, which measure of central tendency reports the value that occurs most
often?
a. Mean b. Median c. Mode d. Geometric Mean
3. The relationship between the geometric mean and the arithmetic mean is
a. They will always be the same
b. The geometric mean will always be larger
c. The geometric mean will be equal to or less than the mean
d. The mean will always be larger than the geometric mean
4. Suppose you compare the mean of raw data and the mean of the same raw data grouped into a
frequency distribution. These two means will be
a. Exactly equal
b. The same as the median
c. The same as the geometric mean
d. Approximately equal
5. In a set of 10 observations the mean is 20 and the median is 15. There are 2 values that are 6, and
all other values are different. What is the mode?
a. 15 b. 20 c. 6 d. None of the above
6. Which of the measures of central tendency is the largest in a positively skewed distribution?
a) Mean b) Mode c) Median d) Geometric Mean
7. The weighted mean is a special case of the
a) Mean b) Mode c) Median d) Geometric Mean
Workout/explain the following questions
1. Show that the sum of the deviations of each value from the mean is always zero.
2. Show that $\sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$
3. Given the data values: 5, 12, 8, 3, 4, find $\sum x$, $\sum x^2$, $(\sum x)^2$, $\sum (x - \bar{x})$, $\sum (x - \bar{x})^2$
4. Calculate the per capita income (average income) from the following data.
Salary (in birr) | No of Persons
120.00 | 4
400.00 | 4
10,000.00 | 1
50,000.00 | 1
5. A teacher assigns weights 4, 2, 3 respectively to seminar work, class work and monthly tests of
students. What is the average academic performance of a student scoring the following marks:
Work | Marks (100%) x | Weights w | wx
Seminar | 45 | 4 | 180
Class work | 62 | 2 | 124
Monthly test | 52 | 3 | 156
Total | | 9 | 460
Weighted mean = 460/9 = 51.1
The weight shows that seminar work is twice as important as the class work from the teacher's
point of view.
6. Two colleges show the following results. Which one is better on average?
Category | College A | College B
First year | 70% (200 students) | 80% (150 students)
Second year | 60% (150 students) | 60% (100 students)
Third year | 80% (100 students) | 80% (50 students)
7. In a class of 40 students, 10 have failed and their average marks are 30. The total mark secured
by the entire class was 2400. Find the average mark of those who have passed.
8. The average salary of 20 individuals working in a small scale industry was Birr 1000. But five
qualified persons were employed and then increased the average salary into Birr 1200. What
was the mean salary of the newly employed employees?
9. A nation faces a rate of inflation of 2% in 1990, 5% in 1992, and 12.5% in 1993. Find the
geometric mean of the inflation rates?
10. A firm pays 5/12 of its labour force an hourly wage of Birr 5, 1/3 of the labour force a wage of Birr
6 and 1/4 a wage of Birr 7. What is the average wage paid by this firm?
11. In a certain examination, the average grade of all students in section A is 70 and students in
section B is 75. If the average of both classes combined is 72, find the ratio of the number of
students in section A to the number of students in section B.
12. The average weekly wage of workers in a certain firm is Birr 50. The mean wage of female
workers is Birr 52 and that of male workers is Birr 42. What is the percentage of female workers
and male workers in the firm?
13. A household purchased Birr 600 worth teff for consumption in three equal purchases of Birr 200
each over a three months period. The first pack of teff was Birr 2.95/kg, the second Birr 3.10/kg
and the third Birr 3.25/kg. What was the average price per kg paid for all the teff?
14. If sixty percent of the populations in Ethiopia earn average monthly income of Birr 1,000.00 and
the remaining populations earn Birr 2,000.00, calculate the average monthly income of the
whole population in Ethiopia.
15. The mean of 200 items is 50. Later on it is discovered that two items were wrongly taken as 92
and 8 instead of 192 and 88. Find the correct mean.
16. Find out the mean from the following data:
 | Series X | Series Y
Arithmetic Mean | 12 | 20
No of items | 80 | 60
17. The mean age of all students in a class of 50 students is 17 years. If the mean age of 30 of them
is 18 years, find the mean age of the remaining 20 students.
18. The mean marks obtained by 300 students are 56. The mean of the top 100 students of them was
found to be 80 and the mean of the bottom 100 of them was found to be 22. What is the mean of
the remaining 100 students?
19. The arithmetic mean of two observations is 10 and the geometric mean is 8. Find out the values
of the two items.
20. Central Statistical Authority has calculated the per-capita income of the one million individuals
and it was found to be Birr 1500. Later, it is found that a person with income of amount
20,000.00 is not taken in to account. Calculate the correct per capita income including this
person's income in to manipulation.
21. The arithmetic mean of 20 observations is found to be 20. Later on, sample values 5 and 15
were incorrect. The correct values are 9 and 12. Find the correct mean.
22. A student scored B, A, C, & B in ECON 211, ACCT 201, MGMT 211, and FLEN 201 having
credit hours 4, 3, 1, and 3 respectively. Calculate GPA of this student.
23. The average monthly salary of employees in a company was Birr 2,500.00. Recently, each
employee is given additional monthly salary of Birr 200.00. Calculate the new average monthly
salary of instructors.
24. The mean age of 100 persons was found to be 30. Later, it was discovered that age 60 was
misread as 40. Find the correct mean.
25. Out of the total population of Ethiopia, 60% earn mean income of Birr 2,000.00 and the rest
earn mean income of Birr 5,000.00. Find the average income of the entire population.
26. The mean weight of 150 students in a certain class is 60 kg. The mean weight of boys is 70 kg
and that of girls is 55 kg. Find the number of boys & girls.
27. A motor car covered a distance of 100kms at four times. The first time at 50km/hr, the second
time at 40km/hr, the 3rd time at 45km/hr and the 4th time at 30km/hr. Calculate the average
speed.
28. If the arithmetic mean of the following frequency distribution is 28, find the missing frequency.
Class Limit | 0 – 10 | 10 – 20 | 20 – 30 | 30 – 40 | 40 – 50 | 50 – 60
Frequency | 12 | 18 | 27 | f1 | 17 | 6
29. If the median and mode are 25 and 24 respectively, find the missing frequencies and the arithmetic
mean from the following frequency distribution.
Class Limit | 0 – 10 | 10 – 20 | 20 – 30 | 30 – 40 | 40 – 50 | Total
Frequency | 14 | f1 | 27 | f2 | 15 | 105
30. For a sample of 50 stocks traded yesterday on the American Stock Exchange, 10 showed a
decline of $1.00, 15 showed no change, and 25 increased by $2.00. Find the weighted mean.
31. In the following grouped data, X is the class mark and C is any constant. If the arithmetic mean
of the original distribution is 35.84. Find the value of X corresponding to the value X-C=0
X-C | -21 | -14 | -7 | 0 | 7 | 14 | 21
f | 2 | 12 | 19 | 29 | 20 | 13 | 5
32. If the class midpoints in a frequency distribution of age of a group of persons are 25,
32,39,46,53 and 60. What are the class boundaries of the first class?
33. The following frequency distribution reports the number of students enrolled in each of the 50
sections of various courses taught in the College of Business last summer.
Students | Frequency
0 up to 10 | 3
10 up to 20 | 8
20 up to 30 | 16
30 up to 40 | 10
40 up to 50 | 9
50 up to 60 | 4
Total | 50
a. Determine the mean number of students per section.
b. Determine the median number of students per section.
Chapter Five
Measures of Dispersion
Chapter Objectives:
Dear reader, when you have completed this chapter, you will be able to:
• Compute and interpret the quartile deviation, the mean deviation, the variance and the standard
deviation of ungrouped and grouped data.
• Explain the characteristics, uses, advantages and disadvantages of each measure of dispersion.
• Compute and interpret the inter-quartile range and its relative measure.
• Compute and interpret the relative measures of dispersion.
• Compute and interpret the Z-score.
• Understand and measure Moments, Skewness and Kurtosis.
5.1. Types of Measures of Dispersion /Variation
Dispersion is the scatter or variation of items from a measure of central tendency. It measures the extent
to which the values vary among themselves.
Example 5.1. Consider the following data on the expenditures of two groups of workers:
• Group A: Br 6200, 2200, 1700, 1700, 1200 (the mean is Br 2400)
• Group B: Br 1600, 1700, 1300, 4200, 3200 (the mean is Br 2400)
If we were given only the average expenditure of the two groups, without knowing the actual
expenditures, we would simply conclude that the two groups spend identical amounts. But the actual
observations indicate that more variation is observed in group A.
To be specific, it is often difficult to assert which set of data is better represented by its mean value unless
we refer to dispersion. This points to the possibility when any two or more sets of sample data having the
same mean (as in the previous example), may differ considerably in terms of the degree of dispersion. For
instance, the average income in a community is not an adequate indicator of the well being of the
community since it doesn’t show us the inequality among the residents. But, the measure of dispersion
can show us this inequality. Therefore, it is useful to have a measure of dispersion to observe variability
of data.
A measure of dispersion may be in an absolute form or relative form.
A measure of dispersion is said to be in absolute form when it shows the actual amount of variation of an
item from a measure of central tendency, while a relative measure is a quotient obtained by dividing the
absolute measure by the quantity with respect to which the absolute deviation has been computed. Relative
measures are unitless and are used to compare variability between different sets of data.
The following are some of the qualities of a good measure of dispersion.
 It should be based on all observations
 It should be easily calculated.
 It should be easily understandable
 It should be affected as little as possible by sampling fluctuations.
 It should be capable of further statistical treatment.
There are many types of measures of dispersion as listed below
1. Range
2. Quartile deviation
3. Mean deviation
4. Variance and standard deviation
5. Coefficient of variation
As stated so far, when these measures express the magnitude of dispersion in the same unit of
measurement in which the data are recorded, they are known as measures of absolute dispersion.
However, when dispersion is expressed in percentages or ratios, these measures are called measures of
relative dispersion.
1. Range
Range is defined as the difference between the smallest and the largest observations in a given set of raw
data. Obtaining range from raw data thus requires identifying only these two extreme values, and taking
the difference between them
Properties of range
• Only two values are used in its calculation.
• It is influenced by extreme values.
• It is easy to compute and understand.
• It is the crudest measure of dispersion.
• It cannot be determined for open-ended data.
• The greater the range, the higher the variability of the data, and vice versa.
Example 5.2. Find the range of the raw data given in example 5.1. above.
Solution:
For Group A: the highest expenditure = 6200 birr and the lowest expenditure = 1200 birr
Range = highest value – lowest value
= 6200 – 1200 = 5000 Birr
For Group B: the highest expenditure = 4200 birr and the lowest expenditure = 1300 birr
Range = 4200 – 1300 = 2900 Birr
Therefore, in terms of expenditure, more variation is observed in group A.
Note that: for discrete grouped data we use the same formula as given above, i.e, highest value minus
lowest value.
Example 5.3. Compute the range of the following data.
Table 5.1. Results (out of 35%) of 20 students in Econometrics test.
Xi | 6 | 24 | 18 | 22 | 30 | 15
Fi | 3 | 2 | 5 | 1 | 4 | 5
Maximum value = 30 marks
Minimum value = 6 marks
Range = Highest value – lowest value = 30 – 6 = 24
In the case of continuous grouped data, range can be obtained in the following three ways:
i) In the first, range is found by taking the difference between the upper class limit of the last
class and the lower class limit of the first class. This is because the lowest and the highest
observations are not identifiable in the case of continuous grouped data. That is,
Range = UCLL – LCLF
Where UCLL = Upper class limit of the last class
LCLF = Lower class limit of the first class
ii) In the second, range is found by taking the difference between the upper class boundary of
the last class and the lower class boundary of the first class. That is,
Range = UCBL – LCBF
Where UCBL = Upper class boundary of the last class
LCBF = Lower class boundary of the first class
iii) In the third, range is found by taking the difference between the mid-points of the last and the
first class. This tends to yield a result closer to the actual range, as it reduces the margin of
error incurred when the range is computed by the first or the second method.
Example 5.4. – Compute the range of the data given below in table 5.2.
Table 5.2. Results (out of 35%) of 40 students in Econometrics test
Score (35%) | Class Boundary | Number of Students (Fi)
6 – 10 | 5.5 – 10.5 | 5
11 – 15 | 10.5 – 15.5 | 10
16 – 20 | 15.5 – 20.5 | 15
21 – 25 | 20.5 – 25.5 | 7
26 – 30 | 25.5 – 30.5 | 3
Solution
Range = UCBL – LCBF
= 30.5 – 5.5
= 25, or
Range = UCLL – LCLF
= 30 – 6
= 24, or it can be computed as the difference between the mid-point of the last class and the
mid-point of the first class. That is,
Range = 28 – 8 = 20
Note that in the above discussion range is measured in an absolute form. This implies that
such a measure cannot be used for comparing variabilities expressed in different units. Therefore, there is
a need for a measure of relative dispersion/variation. The relative range or coefficient of range is
defined as:
Coefficient of range = $\dfrac{\text{Range}}{\text{Sum of extreme values}} \times 100\% = \dfrac{\text{Highest value} - \text{Lowest value}}{\text{Highest value} + \text{Lowest value}} \times 100\%$ for raw data and
discrete grouped data.
Coefficient of range = $\dfrac{\text{UCBL} - \text{LCBF}}{\text{UCBL} + \text{LCBF}} \times 100\%$ for continuous grouped data.
Example 5.5. Compute the coefficient of range for the following raw data.
2, 4, 6, 8, 16, 18, 20
Solution: Coefficient of range = $\dfrac{20 - 2}{20 + 2} \times 100\% = \dfrac{18}{22} \times 100\% = 81.8\%$
Example 5.6. Find the coefficient of range (relative range) for the data given in table 5.2.
Solution: UCBL = 30.5
LCBF = 5.5
Coefficient of range = $\dfrac{30.5 - 5.5}{30.5 + 5.5} \times 100\% = \dfrac{25}{36} \times 100\% = 69.4\%$
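The range and its relative form can be checked with a few lines of Python. The function names `value_range` and `coefficient_of_range` are illustrative; the inputs are the raw data of Example 5.5 and the class boundaries used in Example 5.6.

```python
def value_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)


def coefficient_of_range(highest, lowest):
    """(highest - lowest) / (highest + lowest), expressed in percent."""
    return (highest - lowest) / (highest + lowest) * 100


raw = [2, 4, 6, 8, 16, 18, 20]
raw_range = value_range(raw)
raw_coefficient = coefficient_of_range(max(raw), min(raw))

# Continuous grouped data: use UCBL = 30.5 and LCBF = 5.5 as the extremes.
grouped_coefficient = coefficient_of_range(30.5, 5.5)

print(raw_range)                      # 18
print(round(raw_coefficient, 1))      # 81.8
print(round(grouped_coefficient, 1))  # 69.4
```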
Besides being simple to compute and understand, range is as good a measure of dispersion as any other
where the data consist of only a few observations, and it is advantageous when one wants to know only the
extent of the extreme dispersion under "ordinary" conditions. However, its major drawbacks are: (i) it tells
us nothing about the dispersion of the values which fall between the two extremes, (ii) it is highly sensitive
to sample size, and (iii) it is highly affected if the value of either extreme changes. Despite these and some
other limitations, it is often used to express the degree of dispersion.
2. Quartile Deviations
Quartiles are the values which divide the array into four equal parts. Q1 gives the value of the item which
is 1/4th of the way up the distribution, Q2 gives the value of the item which is half of the way and Q3 is the
value of the item 3/4th of the way up the distribution.
Inter-quartile range is the difference between Q3 and Q1.
That is,
Inter-quartile range = Q3 – Q1
Quartile deviation, denoted as QD, is defined as
$QD = \dfrac{Q_3 - Q_1}{2}$
Quartile deviation is also called the semi-quartile (semi-interquartile) range.
Example 5.7. Find the Quartile deviation of the following data.
Table 5.3. Results (out of 35%) of 40 students in Econometrics test.
Scores (35%) | Class Boundary | Frequencies (fi) | Less than cumulative frequencies
6 – 10 | 5.5 – 10.5 | 5 | 5
11 – 15 | 10.5 – 15.5 | 10 | 15 (Q1 class, as n/4 = 10th value)
16 – 20 | 15.5 – 20.5 | 15 | 30 (Q3 class, as 3n/4 = 30th value)
21 – 25 | 20.5 – 25.5 | 7 | 37
26 – 30 | 25.5 – 30.5 | 3 | 40
Total | | 40 |
Solution:
The ith quartile is computed as
$Q_i = L_{Q_i} + \dfrac{\dfrac{i \cdot n}{4} - CF_{PQ_i}}{F_{Q_i}} \times CW_{Q_i}$
Where: n = sample size
$L_{Q_i}$ = lower class boundary of the quartile class
$CF_{PQ_i}$ = cumulative frequency of the class preceding the quartile class
$CW_{Q_i}$ = class width of the quartile class
$F_{Q_i}$ = frequency of the quartile class
$Q_1 = 10.5 + \dfrac{\dfrac{1 \times 40}{4} - 5}{10} \times 5 = 10.5 + \dfrac{25}{10} = 13$
$Q_3 = 15.5 + \dfrac{\dfrac{3 \times 40}{4} - 15}{15} \times 5 = 15.5 + 5 = 20.5$
Quartile deviation (semi-quartile range) = $\dfrac{Q_3 - Q_1}{2} = \dfrac{20.5 - 13}{2} = 3.75$
Note that: The coefficient of quartile deviation, which provides us a relative measure, is defined as
Coefficient of QD = $\dfrac{\dfrac{Q_3 - Q_1}{2}}{\dfrac{Q_3 + Q_1}{2}} \times 100\% = \dfrac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\%$
Example 5.8. Compute the coefficient of quartile deviation for the data given in table 5.3.
Solution
Q3 = 20.5
Q1 = 13
Coefficient of QD = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\% = \dfrac{20.5 - 13}{20.5 + 13} \times 100\% = \dfrac{7.5}{33.5} \times 100\% = 22.4\%$
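Given Q1 = 13 and Q3 = 20.5 from Table 5.3, the quartile deviation and its coefficient can be sketched in Python as follows. The function names are illustrative.

```python
def quartile_deviation(q1, q3):
    """Half of the inter-quartile range."""
    return (q3 - q1) / 2


def coefficient_of_quartile_deviation(q1, q3):
    """(Q3 - Q1) / (Q3 + Q1), expressed in percent."""
    return (q3 - q1) / (q3 + q1) * 100


qd = quartile_deviation(13, 20.5)
qd_coefficient = coefficient_of_quartile_deviation(13, 20.5)

print(qd)                        # 3.75
print(round(qd_coefficient, 1))  # 22.4
```

The factor of 1/2 cancels in the coefficient, which is why only the difference and the sum of the two quartiles appear in it.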
Advantages of Quartile deviation include
• It is easy to compute and understand.
• It can be computed for open-ended classes, given that Q3 and Q1 can be found.
• It is not affected by extreme values.
Disadvantages of Quartile deviation include
• It ignores the first 25% and the last 25% of the items.
• It is not capable of mathematical manipulation.
• Its value is very much affected by sampling fluctuations.
• It doesn't show the scatter around the average, but only a distance on a scale.
3. Mean Deviation
The mean deviation, also called the average deviation, measures the average deviation (scatter) of a set of observations about a central value, usually the mean or the median of the distribution. It is computed by subtracting the mean/median from each individual observation, summing all the deviations ignoring the negative signs, and dividing the sum by the total number of observations. The negative signs are ignored because otherwise the sum of the deviations from the mean, i.e. $\sum (X_i - \bar{X})$, will be zero. The mean absolute deviation from the mean for a set of sample data consisting of n observations is computed as
$$MD \text{ from the mean} = \frac{\sum |X_i - \bar{X}|}{n}$$
Similarly, MD from the median is obtained as
$$MD \text{ from the median} = \frac{\sum |X_i - M_d|}{n}$$
in the case of ungrouped data. It is obtained as
$$MD \text{ from the mean} = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i}$$
$$MD \text{ from the median} = \frac{\sum f_i |X_i - M_d|}{\sum f_i}$$
in case of grouped data, where the Xi's are the class mid-points and $\sum f_i = n$.
Example 5.9. The ages of a sample of 10 students from a class are given below.
18, 19, 19, 19, 20, 21, 21, 22, 23, 24
Find the mean deviation (i) from the mean (ii) from the median
Solution:
$$\text{Arithmetic mean} = \frac{\sum X_i}{n} = \frac{206}{10} = 20.6$$
$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{th}\text{ value} + \left(\frac{n}{2} + 1\right)^{th}\text{ value}}{2} = \frac{20 + 21}{2} = 20.5$$
| Age | Absolute deviation from the mean | Absolute deviation from the median |
|---|---|---|
| 18 | /18 – 20.6/ = 2.6 | /18 – 20.5/ = 2.5 |
| 19 | /19 – 20.6/ = 1.6 | /19 – 20.5/ = 1.5 |
| 19 | /19 – 20.6/ = 1.6 | /19 – 20.5/ = 1.5 |
| 19 | /19 – 20.6/ = 1.6 | /19 – 20.5/ = 1.5 |
| 20 | /20 – 20.6/ = 0.6 | /20 – 20.5/ = 0.5 |
| 21 | /21 – 20.6/ = 0.4 | /21 – 20.5/ = 0.5 |
| 21 | /21 – 20.6/ = 0.4 | /21 – 20.5/ = 0.5 |
| 22 | /22 – 20.6/ = 1.4 | /22 – 20.5/ = 1.5 |
| 23 | /23 – 20.6/ = 2.4 | /23 – 20.5/ = 2.5 |
| 24 | /24 – 20.6/ = 3.4 | /24 – 20.5/ = 3.5 |
| Total | 16 | 16 |
Therefore,
$$MD \text{ from the mean} = \frac{\sum |X_i - \bar{X}|}{n} = \frac{16}{10} = 1.6$$
$$MD \text{ from the median} = \frac{\sum |X_i - M_d|}{n} = \frac{16}{10} = 1.6$$
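The computation in Example 5.9 can be checked with a short Python sketch. The helper name `mean_deviation` is a hypothetical name introduced here for illustration; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median


def mean_deviation(values, center):
    """Average of the absolute deviations of the values from a chosen center."""
    return sum(abs(x - center) for x in values) / len(values)


ages = [18, 19, 19, 19, 20, 21, 21, 22, 23, 24]

md_mean = mean_deviation(ages, mean(ages))       # deviations about the mean (20.6)
md_median = mean_deviation(ages, median(ages))   # deviations about the median (20.5)
print(round(md_mean, 2), round(md_median, 2))
```

Passing the center as an argument keeps one function for both versions of the measure.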
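Example 5.10. Find the mean absolute deviation from the mean and from the median for the data given in table 5.2.
Solution: First arrange the data as follows:
| Score (35%) | fi | Class mark (Xi) | Absolute deviation from the mean | fi × absolute deviation from the mean | Absolute deviation from the median | fi × absolute deviation from the median |
|---|---|---|---|---|---|---|
| 6 – 10 | 5 | 8 | 9.125 | 45.625 | 9.167 | 45.835 |
| 11 – 15 | 10 | 13 | 4.125 | 41.250 | 4.167 | 41.670 |
| 16 – 20 | 15 | 18 | 0.875 | 13.125 | 0.833 | 12.495 |
| 21 – 25 | 7 | 23 | 5.875 | 41.125 | 5.833 | 40.831 |
| 26 – 30 | 3 | 28 | 10.875 | 32.625 | 10.833 | 32.499 |
| Total | 40 | | | 173.75 | | 173.33 |
$$\sum f_i X_i = (5 \times 8) + (10 \times 13) + (15 \times 18) + (7 \times 23) + (3 \times 28) = 40 + 130 + 270 + 161 + 84 = 685$$
$$\text{Mean} = \frac{\sum f_i X_i}{n} = \frac{685}{40} = 17.125$$
$$\text{Median} = L_{md} + \frac{\frac{n}{2} - CF_{P}}{F_{md}} \times CW_{md} = 15.5 + \frac{20 - 15}{15} \times 5 = 17.167$$
Therefore,
$$MD \text{ from the mean} = \frac{\sum f_i |X_i - \bar{X}|}{\sum f_i} = \frac{173.75}{40} = 4.344$$
$$MD \text{ from the median} = \frac{\sum f_i |X_i - M_d|}{\sum f_i} = \frac{173.33}{40} = 4.333$$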
Note: the coefficients of mean deviation (relative measures) from the mean and from the median are given as follows:
(i) $$\text{Coefficient of } MD \text{ from the mean} = \frac{MD \text{ from the mean}}{\text{mean}} \times 100\%$$
(ii) $$\text{Coefficient of } MD \text{ from the median} = \frac{MD \text{ from the median}}{\text{median}} \times 100\%$$
Example 5.11. Compute the coefficient of mean deviation from the mean and from the median for the data given in example 5.10.
Solution:
MD from the mean = 4.344
MD from the median = 4.333
Mean = 17.125
Median = 17.167
$$\text{Thus, coefficient of } MD \text{ from the mean} = \frac{4.344}{17.125} \times 100\% = 25.37\%$$
$$\text{Coefficient of } MD \text{ from the median} = \frac{4.333}{17.167} \times 100\% = 25.24\%$$
Advantages of Mean Deviation
• It is easier to understand and compute than the standard deviation.
• It is not unduly influenced by large or small values.
• All values are used in its calculation.
Disadvantages of Mean Deviation
• It ignores the algebraic sign of the deviations.
• It is not suitable for further mathematical processing.
4. Variance and Standard Deviation
Like other measures, variance and standard deviation also quantify the dispersion of the observations around the mean value.
The population variance is defined as the arithmetic mean of the squared deviations from the population mean.
Properties of Population variance
• All values are used in the calculation.
• The units are awkward: they are the square of the original units.
The formula for the population variance for raw data is:
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
where:
$\mu$ = mean (population)
N = total number of observations
The sample variance is given by:
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
Where;
n = sample size
$\bar{X}$ = mean (sample)
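Alternatively, we can simplify it as follows
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{\sum \left(X_i^2 - 2\bar{X}X_i + \bar{X}^2\right)}{n - 1}$$
$$= \frac{\sum X_i^2}{n - 1} - \frac{2\bar{X}\sum X_i}{n - 1} + \frac{n\bar{X}^2}{n - 1}$$
$$= \frac{\sum X_i^2}{n - 1} - \frac{2n\bar{X}^2}{n - 1} + \frac{n\bar{X}^2}{n - 1} = \frac{\sum X_i^2 - n\bar{X}^2}{n - 1}$$
$$= \frac{\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}}{n - 1} = \frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)}$$
for small sample size, and
$$S^2 = \frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n^2}$$
for large sample.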
Why n-1?
The reason is that, in a small sample, dividing by n-1 provides a better estimate of the variance of the population from which the sample is drawn. However, as n increases above about 30, we can use n instead of n-1, as the two versions give approximately the same result for practical purposes.
Example 5.12. The ages of the members of a family (in years) are:
2, 18, 34, 42. What is the population variance?
Solution:
$$\mu = \frac{\sum X_i}{N} = \frac{96}{4} = 24$$
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{(2 - 24)^2 + (18 - 24)^2 + (34 - 24)^2 + (42 - 24)^2}{4} = \frac{944}{4} = 236$$
The population standard deviation is the square root of the population variance.
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$
and the sample standard deviation is the square root of the sample variance.
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} \quad \text{for small sample size, and}$$
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n}} \quad \text{for large sample size}$$
Alternatively, for small samples (less than about 30)
$$S = \sqrt{\frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)}}$$
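Example 5.13. From the sample data given below, compute the variance and standard deviation.
10, 15, 30, 22, 41, 32
Solution: n = 6
| Xi | Xi² |
|---|---|
| 10 | 100 |
| 15 | 225 |
| 30 | 900 |
| 22 | 484 |
| 41 | 1681 |
| 32 | 1024 |
| $\sum X_i = 150$ | $\sum X_i^2 = 4414$ |
$$\text{So, } S^2 = \frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)} = \frac{6(4414) - (150)^2}{6(5)} = \frac{3984}{30} = 132.8$$
$$S = \sqrt{S^2} = \sqrt{132.8} = 11.52$$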
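As a cross-check of Example 5.13, the short-cut formula can be compared with the definition-based result from Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for illustration.

```python
import math
from statistics import variance

data = [10, 15, 30, 22, 41, 32]
n = len(data)

sum_x = sum(data)                    # sum of the observations
sum_x2 = sum(x * x for x in data)    # sum of the squared observations

# Short-cut formula for the sample variance (divisor n - 1)
s2 = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
s = math.sqrt(s2)

print(sum_x, sum_x2, round(s2, 1), round(s, 2))
# statistics.variance uses the definition sum((x - mean)^2) / (n - 1)
print(round(variance(data), 1))
```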

Variance and Standard deviations for grouped data
For grouped data the population and sample variance, denoted by $\sigma^2$ and $S^2$ respectively, are given by:
$$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{\sum f_i} = \frac{N\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{N^2}$$
$$S^2 = \frac{\sum f_i (X_i - \bar{X})^2}{\sum f_i} = \frac{n\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n^2}$$
in which the Xi's are the class mid-points and $\sum f_i = N$ for the population and $\sum f_i = n$ for the sample.
Alternatively, for small sample size we can use:
$$S^2 = \frac{n\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n - 1)}$$
By definition, the standard deviations in each case are the square roots of the respective variances.
Example 5.14. From the continuous frequency distribution given in table 5.2, compute the sample variance and standard deviation.
Solution:
| Class limits (scores) | Class mark (Xi) | fi | fiXi | Xi – X̄ | (Xi – X̄)² | fi(Xi – X̄)² | Xi² | fiXi² |
|---|---|---|---|---|---|---|---|---|
| 6 – 10 | 8 | 5 | 40 | -9.125 | 83.26 | 416.328 | 64 | 320 |
| 11 – 15 | 13 | 10 | 130 | -4.125 | 17.016 | 170.16 | 169 | 1690 |
| 16 – 20 | 18 | 15 | 270 | 0.875 | 0.7656 | 11.48 | 324 | 4860 |
| 21 – 25 | 23 | 7 | 161 | 5.875 | 34.516 | 241.609 | 529 | 3703 |
| 26 – 30 | 28 | 3 | 84 | 10.875 | 118.26 | 354.80 | 784 | 2352 |
| Total | | 40 | 685 | | 253.82 | 1194.4 | | 12925 |
Therefore, for small sample size
$$S^2 = \frac{\sum f_i (X_i - \bar{X})^2}{n - 1} = \frac{1194.4}{40 - 1} = 30.625$$
$$S = \sqrt{S^2} = \sqrt{30.625} = 5.534$$
Alternatively,
$$S^2 = \frac{n\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n - 1)} = \frac{40(12925) - (685)^2}{40(39)} = 30.625$$
$$S = \sqrt{30.625} = 5.534$$
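Both routes of Example 5.14 (the definition and the short-cut formula) can be reproduced with a brief Python sketch. The class marks and frequencies are those of the example; the variable names are illustrative choices, not notation from the text.

```python
import math

class_marks = [8, 13, 18, 23, 28]   # mid-points X_i
freqs = [5, 10, 15, 7, 3]           # frequencies f_i

n = sum(freqs)
sum_fx = sum(f * x for x, f in zip(class_marks, freqs))
sum_fx2 = sum(f * x * x for x, f in zip(class_marks, freqs))
mean = sum_fx / n

# Route 1: definition, sum of f_i (X_i - mean)^2 divided by n - 1
s2_definition = sum(f * (x - mean) ** 2 for x, f in zip(class_marks, freqs)) / (n - 1)

# Route 2: short-cut formula
s2_shortcut = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

s = math.sqrt(s2_shortcut)
print(n, sum_fx, sum_fx2, s2_definition, s2_shortcut, round(s, 3))
```

The two routes agree because the short-cut formula is only an algebraic rearrangement of the definition.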
Important properties of Variance /Standard Deviation
The following are some of useful mathematical properties of variance and standard deviation:
1. The variance/standard deviation of any constant is always zero.
A standard deviation of zero implies that there is no variation at all in the data set. In other words
the data values are the same.
2. A variance/standard deviation can never be a negative number.
3. If a constant is added to or subtracted from each observation, the variance/standard deviation of the resulting observations will not be affected.
4. If every observation is multiplied by a constant K, then the new variance will be K² times the original variance and the new standard deviation will be |K| times the original standard deviation.
5. If there are two sets of data consisting of n1 and n2 observations with $S_1^2$ and $S_2^2$ as their respective variances, the combined variance $S_C^2$ of the (n1 + n2) observations is
$$S_C^2 = \frac{n_1\left(S_1^2 + d_1^2\right) + n_2\left(S_2^2 + d_2^2\right)}{n_1 + n_2}$$
where $d_1^2 = \left(\bar{X}_1 - \bar{X}_C\right)^2$ and $d_2^2 = \left(\bar{X}_2 - \bar{X}_C\right)^2$. Herein, the combined mean is
$$\bar{X}_C = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$$
In case $\bar{X}_1 = \bar{X}_2$,
$$S_C^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2}$$
Further, when n1 = n2,
$$S_C^2 = \frac{S_1^2 + S_2^2}{2}$$
6. If Y represents a linear transformation of X as Y = a + bX, with a as the additive constant and b as the multiplicative constant, then the variance of Y is:
$$S_Y^2 = b^2 S_X^2$$
where $S_X^2$ is the variance of X. It follows that the standard deviation of Y is $|b| S_X$, where $S_X$ is the standard deviation of X.
Example 5.15. Calculate the standard deviation of the combined group of 400 items from the following data.
Table 5.4.
| | Group A | Group B | Group C |
|---|---|---|---|
| Number of items (ni) | 50 | 150 | 200 |
| Mean ($\bar{X}_i$) | 40 | 50 | 60 |
| Variance ($S_i^2$) | 81 | 100 | 121 |
Solution:
$$\bar{X}_C = \frac{n_1\bar{X}_1 + n_2\bar{X}_2 + n_3\bar{X}_3}{n_1 + n_2 + n_3} = \frac{50(40) + 150(50) + 200(60)}{50 + 150 + 200} = 53.75$$
$$d_i = \bar{X}_i - \bar{X}_C$$
d1 = 40 – 53.75 = -13.75
d2 = 50 – 53.75 = -3.75
d3 = 60 – 53.75 = 6.25
Consequently, the combined variance is given as
$$S_C^2 = \frac{n_1\left(S_1^2 + d_1^2\right) + n_2\left(S_2^2 + d_2^2\right) + n_3\left(S_3^2 + d_3^2\right)}{n_1 + n_2 + n_3}$$
$$= \frac{50\left(81 + (-13.75)^2\right) + 150\left(100 + (-3.75)^2\right) + 200\left(121 + (6.25)^2\right)}{400}$$
$$= \frac{13503 + 17109 + 32012}{400} = 156.56$$
$$S_C = \sqrt{156.56} = 12.512$$
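The combined mean and combined variance of Example 5.15 generalize to any number of groups. The sketch below assumes each group is summarized as a (size, mean, variance) triple; the function name `combined_variance` is a hypothetical name used only for this illustration.

```python
import math


def combined_variance(groups):
    """groups: list of (n_i, mean_i, variance_i) summaries, one per group.

    Returns the combined mean and combined variance of all observations.
    """
    total = sum(n for n, _, _ in groups)
    combined_mean = sum(n * m for n, m, _ in groups) / total
    # Each group contributes its own variance plus the squared distance
    # d_i^2 of its mean from the combined mean, weighted by its size.
    combined_var = sum(n * (v + (m - combined_mean) ** 2) for n, m, v in groups) / total
    return combined_mean, combined_var


# Table 5.4: (number of items, mean, variance) for groups A, B and C
groups = [(50, 40, 81), (150, 50, 100), (200, 60, 121)]
mean_c, var_c = combined_variance(groups)
sd_c = math.sqrt(var_c)
print(mean_c, var_c, round(sd_c, 3))
```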
5. Coefficient of Variation
Coefficient of variation, developed by Karl Pearson (1857 – 1936), is a relative measure of dispersion which is very useful when either the data are in different units or the data are in the same units but the means are far apart. It is defined as the ratio of the standard deviation to the arithmetic mean (where the mean is different from zero), expressed as a percentage:
$$CV = \frac{\text{Standard deviation}}{\text{Mean}} \times 100\%$$
For a population,
$$CV = \frac{\sigma}{\mu} \times 100\%$$
while for a sample, it is obtained as
$$CV = \frac{S}{\bar{X}} \times 100\%$$
The coefficient of variation (CV) helps us in comparing the
• Variability,
• Heterogeneity/homogeneity,
• Uniformity, and
• Consistency of two or more distributions.
A series /distribution with smaller coefficient of variation is said to be more homogenous /uniform/
consistent than the other distribution. And a series /distribution with larger CV is said to be more variable
or more heterogeneous than the other distribution.
Example 5.16. The number of employees, the average wages and the variance of the wages for two factories are given below.
Table 5.5. Summary of wages and employees of two factories.
| | Factory A | Factory B |
|---|---|---|
| Number of employees | 50 | 100 |
| Average wages | 120 | 85 |
| Variance of the wages | 9 | 16 |
Which factory is more consistent with respect to the wages of its employees?
Solution:
Factory A. Given: nA = 50, $\bar{X}_A$ = 120, $S_A^2$ = 9
$$CV_A = \frac{S_A}{\bar{X}_A} \times 100\% = \frac{3}{120} \times 100\% = 2.5\%$$
Factory B. Given: nB = 100, $\bar{X}_B$ = 85, $S_B^2$ = 16
$$CV_B = \frac{S_B}{\bar{X}_B} \times 100\% = \frac{4}{85} \times 100\% = 4.7\%$$
Conclusion: CVA < CVB => the wages of the employees of factory A are more consistent than those of factory B.
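The comparison in Example 5.16 can be written as a small Python sketch. The function name `coefficient_of_variation` is a hypothetical name used here; it takes the variance, because that is what Table 5.5 reports.

```python
import math


def coefficient_of_variation(variance, mean):
    """CV in percent: standard deviation divided by the (non-zero) mean."""
    return math.sqrt(variance) / mean * 100


cv_a = coefficient_of_variation(9, 120)    # Factory A
cv_b = coefficient_of_variation(16, 85)    # Factory B

more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 1), round(cv_b, 1), more_consistent)
```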
Interpretation of Standard Deviation
Theorem (Gaussian rule): If the data in a sample are approximately normally distributed, then
a. $\bar{X} \pm S$ includes approximately 68% of the data.
b. $\bar{X} \pm 2S$ includes approximately 95% of the data.
c. $\bar{X} \pm 3S$ includes approximately 99.7% (almost 100%) of the data.
Standard Scores (Z-Scores)
The Z-score is defined to indicate the number of standard deviations that an observation is below or above
the mean depending on whether the Z-score is negative or positive.
Z is called the standard value, which is given by
$$Z = \frac{X_i - \bar{X}}{S}$$
where S is the standard deviation.
Example 5.17. Helen scored 65 in Auditing and Samuel scored 70 in Auditing. If the average score of all the students in Auditing is 67 and the standard deviation is equal to 3, which student performed better?
Solution
$$Z_{Helen} = \frac{X_{Helen} - \bar{X}}{S} = \frac{65 - 67}{3} = -0.67$$
$$Z_{Samuel} = \frac{X_{Samuel} - \bar{X}}{S} = \frac{70 - 67}{3} = 1$$
Therefore, Samuel performed better in Auditing than Helen and better than the average result of all the students.
Exercise: In a sample, 100 students doing a master program in management were tested in a general
knowledge paper carrying 100 marks. At the end of the exercise, they were found distributed according to
marks obtained as follows:
| Marks obtained | 30-34 | 35-39 | 40-44 | 45-49 | 50-54 | 55-59 | 60-64 |
|---|---|---|---|---|---|---|---|
| Number of students | 5 | 8 | 12 | 20 | 27 | 20 | 8 |
Find
a) The range of the distribution,
b) Quartile deviation,
c) Mean absolute deviation from the mean,
d) Variance and standard deviation, and
e) Coefficient of variation.
Answer:
a) using class limits = 34 / using mid-points = 30
b) QD = 5.375
c) MD = 6.46
d) S² = 61.24 and S = 7.82
e) CV = 15.8%
5.2. Moments, Skewness, and Kurtosis
In this section, we will deal with two other important characteristics of a frequency distribution. One refers to the lack of symmetry in the distribution, or its departure from being bell-shaped. The other relates to the degree of flatness or peakedness of a distribution at its top. The former is described as skewness and the latter as kurtosis.
5.2.1. Moments
• Moments give us information about the "shape" of the distribution.
• The rth moment is represented by Mr, r = 0, 1, 2, …
• We can have moments about any constant number: about the mean, zero or any desired value.
In general, the rth moment about any arbitrary constant number, say A, is given by
$$M_r = \frac{\sum (X_i - A)^r}{n}$$
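Example 5.18. Consider the following data and compute the first four moments about five (5).
2, 2, 3, 4, 4, 5, 6, 7, 8
Solution: A = 5, n = 9
$$M_r = \frac{\sum (X_i - 5)^r}{n}$$
| Xi | Xi – 5 | (Xi – 5)² | (Xi – 5)³ | (Xi – 5)⁴ |
|---|---|---|---|---|
| 2 | -3 | 9 | -27 | 81 |
| 2 | -3 | 9 | -27 | 81 |
| 3 | -2 | 4 | -8 | 16 |
| 4 | -1 | 1 | -1 | 1 |
| 4 | -1 | 1 | -1 | 1 |
| 5 | 0 | 0 | 0 | 0 |
| 6 | 1 | 1 | 1 | 1 |
| 7 | 2 | 4 | 8 | 16 |
| 8 | 3 | 9 | 27 | 81 |
| Total | -4 | 38 | -28 | 278 |
$$M_0 = \frac{\sum (X_i - 5)^0}{9} = \frac{9}{9} = 1$$
$$M_1 = \frac{\sum (X_i - 5)^1}{9} = \frac{-4}{9}$$
$$M_2 = \frac{\sum (X_i - 5)^2}{9} = \frac{38}{9}$$
$$M_3 = \frac{\sum (X_i - 5)^3}{9} = \frac{-28}{9}$$
$$M_4 = \frac{\sum (X_i - 5)^4}{9} = \frac{278}{9}$$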
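The moments of Example 5.18 can be reproduced exactly with Python's `fractions` module, which avoids rounding. This is an illustrative sketch; the function name `moment` and its parameters are hypothetical names chosen here.

```python
from fractions import Fraction


def moment(values, r, about=0):
    """r-th moment of ungrouped data about the constant `about`."""
    return Fraction(sum((x - about) ** r for x in values), len(values))


data = [2, 2, 3, 4, 4, 5, 6, 7, 8]

# Moments M0 .. M4 about A = 5, kept as exact fractions
moments = [moment(data, r, about=5) for r in range(5)]
for r, m in enumerate(moments):
    print(f"M{r} = {m}")
```

Setting `about=0` gives the moments about the origin, and passing the mean gives the central moments.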
Note: For grouped data the rth moment about any constant number, say A, is given as:
$$M_r = \frac{\sum f_i (X_i - A)^r}{\sum f_i}$$
where;
fi => frequency of Xi in case of discrete grouped data
fi => frequency of the ith class in case of continuous grouped data, and here Xi is the class mark of the ith class.
Note: M0 is always equal to 1.
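Example 5.19. Find the first three moments about 4 for the data given in table 5.6
Table 5.6 Number of children in ten families
| Xi | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| fi | 3 | 2 | 3 | 2 |
Solution:
| Xi | fi | Xi – 4 | fi(Xi – 4) | (Xi – 4)² | fi(Xi – 4)² | (Xi – 4)³ | fi(Xi – 4)³ |
|---|---|---|---|---|---|---|---|
| 2 | 3 | -2 | -6 | 4 | 12 | -8 | -24 |
| 3 | 2 | -1 | -2 | 1 | 2 | -1 | -2 |
| 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 2 | 1 | 2 | 1 | 2 | 1 | 2 |
| Total | 10 | | -6 | | 16 | | -24 |
$$M_0 = \frac{\sum f_i (X_i - 4)^0}{\sum f_i} = \frac{\sum f_i}{10} = \frac{10}{10} = 1$$
$$M_1 = \frac{-6}{10} = -0.6$$
$$M_2 = \frac{16}{10} = 1.6$$
$$M_3 = \frac{-24}{10} = -2.4$$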
Central Moments (Moments about the mean)
The rth central moment for ungrouped data is given by the formula
$$M_r = \frac{\sum (X_i - \mu)^r}{N}$$
for the population with N observations and mean $\mu$, and
$$M_r = \frac{\sum (X_i - \bar{X})^r}{n}$$
for sample data with sample size n and mean $\bar{X}$.
Similarly, for grouped data the central moment is defined as:
$$M_r = \frac{\sum f_i (X_i - \mu)^r}{\sum f_i}$$
for the population, and
$$M_r = \frac{\sum f_i (X_i - \bar{X})^r}{\sum f_i}$$
for sample data,
where; $\sum f_i = N$ for the population
$\sum f_i = n$ for the sample
Xi = class mark of the ith class in case of continuous grouped data.
fi = frequency of Xi in case of discrete grouped data, and frequency of the ith class in case of continuous grouped data.
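Example 5.20. Find the first three central moments for the population data given by: X = 2, 3, 7
Solution
$$\mu = \frac{\sum X_i}{N} = \frac{2 + 3 + 7}{3} = \frac{12}{3} = 4$$
M0 = 1
$$M_1 = \frac{(2 - 4) + (3 - 4) + (7 - 4)}{3} = 0$$
$$M_2 = \frac{(2 - 4)^2 + (3 - 4)^2 + (7 - 4)^2}{3} = \frac{14}{3} = 4.67$$
$$M_3 = \frac{(2 - 4)^3 + (3 - 4)^3 + (7 - 4)^3}{3} = \frac{18}{3} = 6$$
Note:
• For central moments
  • M0 = 1
  • M1 = 0
  • $M_2 = \frac{\sum (X_i - \mu)^2}{N} = \sigma^2$ (variance of X)
• M2, M3 and M4 help us to measure skewness and kurtosis.
• The moment about the origin (i.e., A = 0) is given by:
$$M_r = \frac{\sum X_i^r}{n}$$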
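Example 5.21. Compute the first four moments about the mean for the following sample data (discrete frequency distribution)
Table 5.7
| Xi | -3 | 1 | 2 | 3 | 5 |
|---|---|---|---|---|---|
| fi | 2 | 1 | 4 | 2 | 3 |
Solution: $\bar{X}$ = 2
| Xi | fi | Xi – X̄ | fi(Xi – X̄) | (Xi – X̄)² | fi(Xi – X̄)² | (Xi – X̄)³ | fi(Xi – X̄)³ | (Xi – X̄)⁴ | fi(Xi – X̄)⁴ |
|---|---|---|---|---|---|---|---|---|---|
| -3 | 2 | -5 | -10 | 25 | 50 | -125 | -250 | 625 | 1250 |
| 1 | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | 1 |
| 2 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 2 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 2 |
| 5 | 3 | 3 | 9 | 9 | 27 | 27 | 81 | 81 | 243 |
| Total | 12 | | 0 | | 80 | | -168 | | 1496 |
M0 = 1
M1 = 0
$$M_2 = \frac{80}{12} = 6.6667$$
$$M_3 = \frac{-168}{12} = -14$$
$$M_4 = \frac{1496}{12} = 124.67$$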
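For a frequency distribution such as Table 5.7, the central moments can be computed directly from the values and their frequencies. The following is a minimal sketch using exact fractions; the function name `central_moment` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction


def central_moment(xs, fs, r):
    """r-th moment about the mean for values xs with frequencies fs."""
    n = sum(fs)
    mean = Fraction(sum(f * x for x, f in zip(xs, fs)), n)
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n


xs = [-3, 1, 2, 3, 5]   # values X_i of Table 5.7
fs = [2, 1, 4, 2, 3]    # frequencies f_i

central = [central_moment(xs, fs, r) for r in range(5)]
for r, m in enumerate(central):
    print(f"M{r} = {m} = {float(m):.4f}")
```

The first two results illustrate the general facts that M0 = 1 and M1 = 0 for moments about the mean.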
5.2.2. Skewness
Skewness refers to a lack of symmetry. We study skewness to have an idea about the shape of the
curve which we can draw with the help of the frequency distribution.
A frequency distribution is often found to be skewed on either side of its central value. As a result, it has
a longer tail either to the left or to the right. When there is a longer tail to the right of the center,
the distribution is said to be positively skewed. If the tail is longer to the left of the center, the
distribution is said to be negatively skewed.
A positive skewness means a greater dispersal of individual observations towards the right of the
central value. A negative skewness, on the other hand, implies that individual observations have
greater dispersal towards the left of the central value.
Skewness, therefore, not only refers to the lack of symmetry in distribution, it also shows the
direction of dispersion of individual observations on either side of the center of the distribution.
Accordingly, a measure of skewness quantifies the extent of departure from symmetry and also
indicates the direction in which the departure takes place.
Diagrammatically, the shapes of the frequency curves are:
a) Positively skewed
b) Symmetrical distribution
c) Negatively skewed
Of the measures of skewness, two shall be discussed here.
a) Moment coefficient of skewness
b) Pearsonian coefficient of skewness
a) Moment coefficient of Skewness
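In terms of the moment coefficient, skewness (written here as $\alpha_3$) is defined as:
$$\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{S^3}$$
Where M2 = S² = variance and M3 is the third central moment.
Interpretation:
(1) If $\alpha_3$ = 0 => symmetrical distribution
(2) If $\alpha_3$ < 0 => negatively skewed distribution
(3) If $\alpha_3$ > 0 => positively skewed distribution
(4) A greater or smaller absolute value of $\alpha_3$ means a greater or smaller degree of skewness.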
Example 5.22. Find the skewness of the distribution given in example 5.18
Solution:
Thus, the moment coefficient of skewness = -0.567 < 0; therefore, the distribution is negatively skewed.
b) Pearsonian coefficient of Skewness
The Pearsonian coefficient of skewness was developed by Karl Pearson. This measure is based on the fact that when a distribution drifts away from symmetry, its mean, median, and mode tend to deviate from each other. This results from the presence of exceptionally high or low observations, which affect the value of the mean the most and that of the mode the least.
The value of the mean tends to be the highest and that of the mode the lowest when some observations in a given set of data are exceptionally high. Consequently, a distribution having exceptionally high observations has a longer tail towards the right. Conversely, the mean tends to be the lowest, and the mode the highest, when a set of data contains some exceptionally low observations. As a result, the distribution will have a longer tail towards the left.
Thus, it is the direction in which the mode drifts from the mean that determines whether a distribution will have positive or negative skewness. Using this conclusion, the Pearsonian coefficient of skewness, denoted here as $S_k$, is defined as
$$S_k = \frac{\text{Mean} - \text{Mode}}{S}$$
in which S is the standard deviation. Using the empirical relationship among mean, mode and median in a moderately skewed distribution, i.e., mode = mean – 3(mean – median), the above equation can be modified as
$$S_k = \frac{3(\text{Mean} - \text{Median})}{S}$$
Note:
1.
2. If the coefficient = 0, the distribution is symmetrical
3. If the coefficient > 0, the distribution is positively skewed
4. If the coefficient < 0, the distribution is negatively skewed
Example 5.23. Find the skewness of the following data using the Pearsonian coefficient of skewness.
Solution: Arrange the data in increasing order
1, 2, 4, 5, 6, 7, 8, 10, 30, 32
Median = 6.5
Mean = 10.5
S² = 124.06
S = 11.14
Therefore,
$$\text{Pearsonian coefficient of skewness} = \frac{3(\text{Mean} - \text{Median})}{S} = \frac{3(10.5 - 6.5)}{11.14} = 1.077$$
Interpretation: The distribution is positively skewed.
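Example 5.23 can be verified with Python's standard `statistics` module. This is a minimal sketch; `stdev` is the sample standard deviation (divisor n – 1), which is the version that reproduces the variance of 124.06 used above.

```python
from statistics import mean, median, stdev

data = [1, 2, 4, 5, 6, 7, 8, 10, 30, 32]

x_bar = mean(data)
med = median(data)
s = stdev(data)      # sample standard deviation, divisor n - 1

# Median-based Pearsonian coefficient of skewness
pearson_sk = 3 * (x_bar - med) / s
print(x_bar, med, round(s, 2), round(pearson_sk, 3))
```

A positive result indicates a longer tail to the right, caused here by the two large observations 30 and 32.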
5.2.3. Kurtosis
Another attribute of a frequency distribution is its peakedness, or flatness, at its top. A distribution may have a smaller or greater degree of flatness at its top. Thus, it is the characteristic of flatness or peakedness at the top of the distribution that kurtosis describes and measures.
Taking a symmetrical distribution as a frame of reference, a distribution which is more peaked than the normal, as in (a) below, is known as a leptokurtic distribution. The one whose polygon is flat at its top, as in (c) below, is called a platykurtic distribution. A distribution with a polygon which is neither too high in peak nor too flat at the top, as in (b), is termed a mesokurtic distribution.
a. Leptokurtic
b. Mesokurtic
c. Platykurtic
We have two measures of Kurtosis
(i) The coefficient of Kurtosis
(ii) Moment coefficient of Kurtosis
(i) The coefficient of Kurtosis
The coefficient of kurtosis, denoted by K, is defined as the ratio of the inter-quartile range to the inter-decile range.
$$K = \frac{Q_3 - Q_1}{D_9 - D_1}$$
Interpretation:
• If K = 0.5, approximately, the distribution is mesokurtic
• If K > 0.5, approximately, the distribution is leptokurtic
• If K < 0.5, approximately, the distribution is platykurtic.
(ii) Moment coefficient of Kurtosis
The moment coefficient of kurtosis is kurtosis expressed in terms of the fourth moment about the mean. It is denoted by B2 and is defined as
$$B_2 = \frac{M_4}{M_2^2} = \frac{M_4}{S^4}$$
Where S is the standard deviation.
Interpretation:
• If B2 = 3 => mesokurtic distribution
• If B2 > 3 => leptokurtic distribution
• If B2 < 3 => platykurtic distribution
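The two moment coefficients can be computed together from the central moments. The sketch below applies them to the frequency distribution of Table 5.7 purely as an illustration; the numerical results come from this sketch and are not quoted from the text, and the helper name `m` is a hypothetical choice.

```python
xs = [-3, 1, 2, 3, 5]   # values X_i of Table 5.7
fs = [2, 1, 4, 2, 3]    # frequencies f_i

n = sum(fs)
mean = sum(f * x for x, f in zip(xs, fs)) / n


def m(r):
    """r-th central moment of the frequency distribution."""
    return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n


skewness = m(3) / m(2) ** 1.5   # moment coefficient of skewness, M3 / M2^(3/2)
kurtosis = m(4) / m(2) ** 2     # moment coefficient of kurtosis, M4 / M2^2

shape = "negatively skewed" if skewness < 0 else "positively skewed or symmetrical"
peak = "platykurtic" if kurtosis < 3 else "mesokurtic or leptokurtic"
print(round(skewness, 3), round(kurtosis, 3), shape, peak)
```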
Review Exercises
1. Which of the following is not a measure of dispersion
a) Range
b) Standard deviation c) Variance d) Harmonic mean
2. A disadvantage of the range is
a) Only two values are used in its calculation
b) It is in different units than the mean
c) It is easy to calculate
d) All of the above
3. The standard deviation is
a) Based on squared deviations from the mean
b) In the same units as the mean
c) Uses all the observations in its calculation
d) All of the above
4. The variance is
a) Found by dividing the mean deviation by N
b) In the same units as the original data
c) Found by squaring the standard deviation
d) All of the above
5. In a positively skewed distribution
a) The mean, median, and mode are all equal
b) The mean is larger than the median
c) The median is larger than the mean
d) The standard deviation must be larger than the mean or the median
6. In a symmetric distribution
a) The mean, median, and mode are equal
b) The mean is the largest measure of location
c) The median is the largest measure of location
d) The standard deviation is the largest value
7. A coefficient of skewness of -2.73 was computed for a set of data. We conclude that
a) The mean is larger than the median
b) The median is larger than the mean
c) The standard deviation is a negative number
d) Something is wrong as the coefficient of skewness can't be less than -1.00
8. Which of the following statements is true regarding the standard deviation?
a) It cannot assume a negative value
b) If it is zero, then all the data values are the same
c) It is in the same units as the mean
d) All of the above
9. The standard deviation of a normal distribution is found to be 3. What must be the value of the
fourth central moment in order for the distribution to be:
a) Mesokurtic
b) Leptokurtic
c) Platykurtic
10. The mean and standard deviation of 25 observations were found to be 30 and 3 respectively. After
the calculations were made, it was found that two of the observations were recorded as 29 and 31
incorrectly. Find the mean and standard deviation if the incorrect observations are excluded
11. A person invested his money in to two areas A and B. His net profit (in Birr) for the first three
months are:
Area A
72
76
74
Area B
45
92
85
a) Find the mean net profit for each area of investment
b) Find the range of net profit in both areas.
c) Which area is risky to invest? In which area is the net profit more consistent?
12. The yearly salaries of all employees working for a company have a mean of Birr 42350 and a
standard deviation of Birr 3820. The years of schooling for the sample of employees have a mean
of 15 years and a standard deviation of 2 years. Is the relative variation in the salaries higher or
lower than that in years of schooling for these employees? Why?
13. The coefficient of variation of a distribution is 60% and its standard deviation is 12. Find out its
mean.
14. The mean and variance of five observations is 4.8 and 4.56 respectively. If the three of the five
observations are 2, 5 and 6, find the other two observations
15. Using the frequency distribution given below, find
a) The range,
b) Quartile deviation
c) Mean absolute deviation from mean
d) Variance and standard deviation
e) Pearsonian coefficient of skewness using two different formula
| Class Intervals | 50 - 52 | 53 - 55 | 56 - 58 | 59 - 61 | 62 - 64 |
|---|---|---|---|---|---|
| Frequencies | 5 | 10 | 21 | 8 | 6 |
Chapter 6
Simple linear Regression and Correlation
Chapter Objective:
Dear reader, after studying this chapter, you will be able to:
• Define regression analysis
• Define and fit simple linear regression
• Predict the population average value of the dependent variable on the basis of known (fixed) values of the independent variable
• Understand correlation
• Compute the Pearsonian and rank correlation coefficients.
6.1. Simple Linear Regression
In the preceding chapters we have been dealing with data on a single variable. Here we shall focus on methods of
dealing with paired data, which may be related in some way.
Regression Analysis:- is concerned with describing and evaluating the relationship between a dependent variable
and one or more independent variables. Therefore, regression is used for bringing out the nature of relationship and
using it to know the best approximate value of the other variable. In what follows, therefore, we will deal with the
problem of estimating and/or predicting the population mean/average values of the dependent variable on the basis
of known values of the independent variable (s).
The variable whose value is to be estimated/predicted is known as dependent variable while the variables which
help us in determining the value of the dependent variable are known as independent variables.
A regression equation which involves only two variables, a dependent and an independent variable, is referred to as simple regression. This model assumes that the dependent variable is influenced by only one systematic variable and the error term. However, when several variables (necessarily more than two) are included in the model, it is called multiple/multivariate regression.
The relationship between any two variables may be linear or non-linear. The former implies a constant absolute change in the dependent variable in response to a unit change in the independent variable, while the latter implies a varying marginal change in the dependent variable in response to changes in the independent variable.
Consequently, in this chapter we will confine ourselves to the type of regression involving only two variables and to the type of relationship between our variables which is linear. When both conditions hold, it is called simple linear regression.
6.1.1. The Scatter Diagram
Consider the following data collected by taking a sample of five industries in a given industrial sector on their input (number of workers) and output (thousands of Birr).
Table 6.1.
| Industry | Output in thousands of Birr (Yi) | Inputs, no. of workers (Xi) | Paired data (Xi, Yi) |
|---|---|---|---|
| 1 | 4 | 2 | (2,4) |
| 2 | 7 | 3 | (3,7) |
| 3 | 3 | 1 | (1,3) |
| 4 | 9 | 5 | (5,9) |
| 5 | 17 | 9 | (9,17) |
Output level (Yi) is believed to depend on number of workers (X i). Accordingly, Yi is a dependent variable and Xi is
independent variable.
In order to visualize the form of regression we plot these points on a graph as shown in fig. 6.1. What we get is a
scatter diagram.
Fig. 6.1. Scatter diagram of output (Y, thousands of Birr) against number of workers (X).
When carefully observed, the scatter diagram at least shows the nature of relationship; whether positive or negative
and whether the curve is linear or non-linear.
When the general course of movement of the paired points is best described by a straight line, the next task is to fit a
regression line which lies as close as possible to every point on the scatter diagram. This can be done by means of
either free hand drawing or the method of least squares. However, the latter is the most widely used method.
6.1.2. The Regression Equation
A regression equation is a statement of equality that defines the relationship between two variables. The equation of the line which is to be used in predicting the value of the dependent variable takes the form Ye = a + bX. The most universally used and statistically accepted method of fitting such an equation is the method of least squares.
The Method of Least Squares:
This method requires that the straight line be fitted such that the sum of the squared vertical deviations of the observed Y values from the straight line (predicted Y values) is the minimum.
As shown in fig 6.1, if e1, e2, …, e5 are the vertical deviations of the observed Y values from the straight line (predicted Y values, Ye), fitting a straight line in keeping with the above condition requires that (for sample size n)
$$\sum_{i=1}^{n} e_i^2$$
is minimum. This can be done by partially differentiating $\sum e_i^2$ with respect to a and b and equating the derivatives to zero.
• ei is the error made when taking Ye instead of Yi. Therefore, ei = Yi – Ye.
$$\sum e_i^2 = \sum (Y_i - Y_e)^2 = \sum (Y_i - a - bX_i)^2$$
Differentiating with respect to a:
$$\frac{\partial \sum e_i^2}{\partial a} = \frac{\partial \sum (Y_i - a - bX_i)^2}{\partial a} = -2\sum (Y_i - a - bX_i) = 0$$
$$\sum Y_i - na - b\sum X_i = 0$$
$$na = \sum Y_i - b\sum X_i$$
$$a = \frac{\sum Y_i}{n} - b\frac{\sum X_i}{n} = \bar{Y} - b\bar{X}$$
Differentiating with respect to b:
$$\frac{\partial \sum e_i^2}{\partial b} = \frac{\partial \sum (Y_i - a - bX_i)^2}{\partial b} = -2\sum (Y_i - a - bX_i)X_i = 0$$
$$\sum X_i Y_i - a\sum X_i - b\sum X_i^2 = 0$$
Substituting $a = \bar{Y} - b\bar{X}$ and noting that $\sum X_i = n\bar{X}$:
$$\sum X_i Y_i - n\bar{X}\bar{Y} + bn\bar{X}^2 - b\sum X_i^2 = 0$$
Therefore,
$$b = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}$$
Or equivalently, multiplying both the numerator and denominator by n, we get:
$$b = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2}$$
Example 6.1. Suppose we want to study the relationship between input (number of workers) and output (thousands of Birr) of the five factories given in table 6.1 above. To fit the regression line of Yi (thousands of Birr) on Xi (number of workers), we can employ the method of least squares as follows:
Solution. Arrange the data in tabular form.
Table 6.2.
| | Yi | Xi | YiXi | Xi² |
|---|---|---|---|---|
| | 4 | 2 | 8 | 4 |
| | 7 | 3 | 21 | 9 |
| | 3 | 1 | 3 | 1 |
| | 9 | 5 | 45 | 25 |
| | 17 | 9 | 153 | 81 |
| $\sum$ | 40 | 20 | 230 | 120 |
| Mean | 8 | 4 | | |
Where $\sum$ = summation/total and n = sample size; here n = 5.
Substituting these values in the above equations, we get
$$b = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{5(230) - (20)(40)}{5(120) - (20)^2} = \frac{350}{200} = 1.75$$
$$a = \bar{Y} - b\bar{X} = 8 - 1.75(4) = 1$$
Therefore, the least squares regression equation is:
$$Y_e = 1 + 1.75X_i$$
• Estimate the output (in thousands of Birr) that a factory will have if it has 8 workers.
Xi = 8
$$Y_e = 1 + 1.75(8) = 15$$
Consequently, if a factory has 8 workers, its level of output will be 15 thousand ETB.
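The least squares computation of Example 6.1 can be sketched in Python as follows. The function name `least_squares` is a hypothetical name chosen for this illustration; it implements the normal-equation formulas for b and a directly.

```python
def least_squares(xs, ys):
    """Intercept a and slope b of the least squares line Ye = a + b*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n     # a = mean(Y) - b * mean(X)
    return a, b


# Table 6.1: number of workers (X) and output in thousands of Birr (Y)
workers = [2, 3, 1, 5, 9]
output = [4, 7, 3, 9, 17]

a, b = least_squares(workers, output)
predicted = a + b * 8                 # estimated output for 8 workers
print(a, b, predicted)
```

The fitted line passes through the point of means, which is a quick check on any hand computation.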
Example 6.2. In what follows you are provided with sample observations on the price and quantity supplied of a commodity X by a competitive firm.
a) Construct the scatter diagram.
b) What is the linear regression of Yi (quantity supplied) on Xi (price of the commodity X)?
c) Suppose the price of the commodity X is 32; what will be the quantity supplied by the firm?
Tab. 6.3. Data on price and quantity supplied.
| | Yi | Xi | XiYi | Xi² |
|---|---|---|---|---|
| | 40 | 15 | 600 | 225 |
| | 45 | 20 | 900 | 400 |
| | 40 | 25 | 1000 | 625 |
| | 50 | 30 | 1500 | 900 |
| | 55 | 35 | 1925 | 1225 |
| | 60 | 40 | 2400 | 1600 |
| | 60 | 45 | 2700 | 2025 |
| | 65 | 50 | 3250 | 2500 |
| | 70 | 55 | 3850 | 3025 |
| | 75 | 60 | 4500 | 3600 |
| | 55 | 40 | 2200 | 1600 |
| | 60 | 45 | 2700 | 2025 |
| Total | 675 | 460 | 27,525 | 19,750 |
a) Scatter diagram of quantity supplied (Y) against price (X).
b) With n = 12,
$$b = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2} = \frac{12(27{,}525) - (460)(675)}{12(19{,}750) - (460)^2} = \frac{19{,}800}{25{,}400} = 0.7795$$
$$a = \bar{Y} - b\bar{X} = 56.25 - 0.7795(38.33) = 26.3718$$
Therefore, the estimated supply function is
Ye = 26.3718 + 0.7795 Xi
c) Xi = 32
Ye = 26.3718 + 0.7795 Xi
= 26.3718 + 0.7795 (32)
= 26.3718 + 24.944
= 51.3158
If the price of X is 32, the estimated quantity supplied will be approximately equal to 51 units.
6.1.3. Regression of X on Y
In the above sub-topic 6.1.2 we have explored regression of the Y on X type. Sometimes, it is possible and of interest to fit a regression of the X on Y type, i.e., taking Y as the independent and X as the dependent variable.
In such cases, the general form of the equation is given by:
$$X_e = a_0 + b_0 Y$$
Where Xe = expected value of X
a0 = X-intercept
b0 = slope of the regression
Applying the principle of least squares as before, the constants a0 and b0 are given as follows:
$$b_0 = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum Y_i^2 - \left(\sum Y_i\right)^2}$$
$$a_0 = \bar{X} - b_0\bar{Y}$$
N.B. The regression equations of the Y on X type and of the X on Y type coincide at the point $(\bar{X}, \bar{Y})$.
6.2. Correlation
The correlation coefficient measures the degree to which two variables are related (associated). For two variables it is called simple correlation and is denoted by r. For more than two variables we have multiple correlation.
Two variables may be positively correlated, negatively correlated, or uncorrelated. Furthermore, depending on the form of the relationship, the correlation between two variables may be linear or non-linear. In this section we are concerned with quantifying the degree of association between two variables that have a linear relationship.
Contrary to regression analysis explained in the previous section (6.1), the computation of the coefficient of correlation does not require one variable to be designated as dependent and the other as independent.
The measure of the degree of relationship between any two variables, known as the Pearsonian coefficient of correlation and usually denoted by r, is defined as
$$r = \frac{\sum x_i y_i}{\sqrt{\sum x_i^2 \sum y_i^2}}, \qquad x_i = X_i - \bar{X}, \quad y_i = Y_i - \bar{Y}$$
and is termed the product-moment formula. It can be further simplified as
$$r = \frac{n\sum X_iY_i - \sum X_i \sum Y_i}{\sqrt{[n\sum X_i^2 - (\sum X_i)^2][n\sum Y_i^2 - (\sum Y_i)^2]}}$$
N.B. The building blocks of this formula are, therefore, $\sum X_i$, $\sum Y_i$, $\sum X_iY_i$, $\sum X_i^2$, $\sum Y_i^2$ and n (sample size).
Properties of pearsonian coefficient of correlation
1.
2.
3. When r = 1 there is perfect positive correlation; when r = -1 there is perfect negative correlation.
4. Adding a constant number to each value of X and Y, or multiplying each value by a positive constant, does not affect the value of r.
5. The closeness of the relationship is not proportional to the value of r.
6. When r is positive and close to 1 there is high positive correlation, while when it is close to zero there is low positive correlation. Similarly, when r is negative and close to -1 there is high negative correlation, while when it is close to zero there is low negative correlation.
7. It is free of any units used.
Example 6.3. Find the Pearsonian coefficient of correlation for the two variables in the data of Table 6.1.
Solution
Table 6.4.
         Yi    Xi    Xi²    Yi²    XiYi
          4     2      4     16       8
          7     3      9     49      21
          3     1      1      9       3
          9     5     25     81      45
         17     9     81    289     153
Total    40    20    120    444     230

$$r = \frac{5(230) - (20)(40)}{\sqrt{[5(120) - (20)^2][5(444) - (40)^2]}} = \frac{350}{\sqrt{(200)(620)}} = 0.99$$
Interpretation: There is a strong positive linear relationship between X and Y.
Example 6.4. Find the Pearsonian coefficient of correlation for the two variables in the data of Table 6.3.
Solution:
Table 6.5.
         Yi     Xi      Xi²      Yi²     XiYi
         40     15      225    1,600      600
         45     20      400    2,025      900
         40     25      625    1,600    1,000
         50     30      900    2,500    1,500
         55     35    1,225    3,025    1,925
         60     40    1,600    3,600    2,400
         60     45    2,025    3,600    2,700
         65     50    2,500    4,225    3,250
         70     55    3,025    4,900    3,850
         75     60    3,600    5,625    4,500
         55     40    1,600    3,025    2,200
         60     45    2,025    3,600    2,700
Total   675    460   19,750   39,325   27,525

Therefore,
$$r = \frac{12(27{,}525) - (460)(675)}{\sqrt{[12(19{,}750) - (460)^2][12(39{,}325) - (675)^2]}} = \frac{19{,}800}{\sqrt{(25{,}400)(16{,}275)}} = 0.974$$
Interpretation: There is a strong positive linear relationship between X and Y.
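The simplified product-moment formula is easy to verify numerically. The following is a minimal sketch, assuming the raw-score form of r built from the column totals; `pearson_r` and the list names are illustrative, and the two data sets are those of Tables 6.1 and 6.3 as reproduced in Examples 6.3 and 6.4.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearsonian coefficient of correlation from the column totals."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Example 6.3 (data of Table 6.1)
x_61 = [2, 3, 1, 5, 9]
y_61 = [4, 7, 3, 9, 17]
r_61 = pearson_r(x_61, y_61)

# Example 6.4 (data of Table 6.3)
x_63 = [15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 40, 45]
y_63 = [40, 45, 40, 50, 55, 60, 60, 65, 70, 75, 55, 60]
r_63 = pearson_r(x_63, y_63)

print(round(r_61, 2), round(r_63, 3))
```

Because r does not depend on which variable is treated as dependent, `pearson_r(x, y)` and `pearson_r(y, x)` return the same value.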
Example 6.5. By adding a constant number, say 1, to each value of X and Y given in Table 6.1, show that property 4 holds true.
Solution
Table 6.6.
         Yi    Xi    Xi²    Yi²    XiYi
          5     3      9     25      15
          8     4     16     64      32
          4     2      4     16       8
         10     6     36    100      60
         18    10    100    324     180
Total    45    25    165    529     295

$$r = \frac{5(295) - (25)(45)}{\sqrt{[5(165) - (25)^2][5(529) - (45)^2]}} = \frac{350}{\sqrt{(200)(620)}} = 0.99$$
The value of r is the same as in Example 6.3. Therefore, we have shown that property 4 is true.
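Property 4 can also be checked numerically. The sketch below is an illustration under stated assumptions: it uses the deviation form of r, the data of Table 6.1, and arbitrarily chosen constants (1, 3, 0.5 and -2) that do not come from the text. It also shows the caveat that multiplying only one variable by a negative constant reverses the sign of r.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearsonian r computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

x = [2, 3, 1, 5, 9]      # Xi from Table 6.1
y = [4, 7, 3, 9, 17]     # Yi from Table 6.1

r_original = pearson_r(x, y)
r_shifted = pearson_r([v + 1 for v in x], [v + 1 for v in y])     # add 1 to every value
r_scaled = pearson_r([3 * v for v in x], [0.5 * v for v in y])    # positive multipliers
r_flipped = pearson_r([-2 * v for v in x], y)                     # one negative multiplier

print(round(r_original, 4), round(r_shifted, 4), round(r_scaled, 4), round(r_flipped, 4))
```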
Spearman's Rank Correlation Coefficient
The Pearsonian coefficient of correlation cannot be used when direct quantitative measurement of the phenomenon under study is not possible. In such cases, we make use of the rank correlation coefficient.
Steps involved in calculating Spearman's coefficient of rank correlation:
1. Rank the X values among themselves, giving rank 1 to the largest (or smallest) value, rank 2 to the next largest (or smallest) value, and so on.
2. Rank the Y values among themselves in the same way as for X.
3. When there are ties in rank, i.e., when there are values sharing the same rank, assign to each of the tied observations the mean of the ranks they jointly occupy, and skip the ranks that have been used up.
4. Find the sum of the squares of the differences between the ranks of the two variables.
5. Apply the formula
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
where n = number of pairs of observations
di = ith difference between the ranks of X and Y
As the steps above indicate, rs may also be calculated for numerical data after ranking the values according to numerical size.
Example 6.6. Consider the ranks given by two judges to five ladies in a beauty contest:
Table 6.7
Ladies     Judge A (RA)   Judge B (RB)   di    di²
AZEB            1              2          1     1
TIZITA          3              4          1     1
FATUMA          4              3         -1     1
LEMLEM          2              1         -1     1
CHALTU          5              5          0     0
Total                                           4
Solution: Here di = RB - RA and n = 5, so
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(4)}{5(5^2 - 1)} = 1 - \frac{24}{120} = 0.80$$
Interpretation: Since rs = 0.80, there is a high degree of agreement between the ranks given by Judge A and Judge B.
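The ranking steps and the rank-correlation formula can be sketched in Python as follows. This is a minimal illustration, assuming the standard formula rs = 1 - 6Σd²/(n(n² - 1)); the function names are hypothetical. With Σd² = 4 and n = 5, as in the judges' table, the formula evaluates to 0.80.

```python
def average_ranks(values):
    """Rank values with 1 = largest; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # first position occupied by v
        count = ordered.count(v)                # number of observations tied at v
        ranks.append(first + (count - 1) / 2)   # mean of the positions they occupy
    return ranks

def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation from two lists of ranks."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Ranks from Table 6.7 (AZEB, TIZITA, FATUMA, LEMLEM, CHALTU)
judge_a = [1, 3, 4, 2, 5]
judge_b = [2, 4, 3, 1, 5]
rs_judges = spearman_rs(judge_a, judge_b)
print(rs_judges)

# Step 3 (ties): two observations sharing ranks 2 and 3 each receive 2.5
tied = average_ranks([58, 58, 91])
print(tied)
```

For numerical data, the raw values would first be passed through `average_ranks` and the resulting ranks handed to `spearman_rs`.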
Review Exercises
1. Define and distinguish between:
a) Regression and correlation
b) Simple and multiple regression
c) Linear and non-linear relationship
2. Bring out the relevance of a scatter diagram in regression analysis.
3. Explain the meaning and status of the two constants a and b in the regression equation Ye = a + bXi.
4. The marks obtained by 10 students in their graduation with a B.A. degree in management and in the MBA entrance test were found to be as given below.
Graduation (Xi)      50   52   55   60   62   65   65   66   70   75
Entrance test (Yi)   52   50   57   65   65   62   65   65   71   75
Find:
a) The two regression equations
b) The correlation coefficient between the two sets of marks
5. Obtain the regression equations of X on Y and of Y on X for the paired data given below. Also compute the coefficient of correlation.
Market price of X   26   28   30   31   35
Market price of Y   20   27   28   30   25
6. Ten students got the following marks in Maths and Statistics.
Student          A    B    C    D    E    F    G    H    I    J
Maths (X)        78   36   98   25   75   82   90   62   65   39
Statistics (Y)   84   51   91   60   68   62   86   58   58   47
Compute the coefficient of rank correlation and interpret the result.
7. For a certain set of paired data on X and Y, 3Xi + 2Yi – 26 = 0 and 6Xi + Yi – 31 = 0 are the two regression equations.
a) Find the mean values
b) Find the coefficient of correlation
8. A leading company engaged in the production of detergents has 10 vacancies for salesmen, for which 15 (n) persons were called for personal interviews. The interview board consisted of the sales manager and a psychologist. The ranks given by the two to all 15 candidates who attended the interview are given below.
Sr. No. in the interview list   Ranking by the sales manager (Xi)   Ranking by the psychologist (Yi)
             1                                 2                                  1
             2                                 3                                  3
             4                                 1                                  2
             5                                 5                                  4
             8                                 4                                  6
             9                                 6                                  5
            10                                 8                                  7
            11                                 7                                  9
            13                                 9                                  8
            14                                10                                 11
            15                                12                                 10
            17                                11                                 12
            18                                13                                 14
            19                                14                                 13
            20                                15                                 15
Compute the rank correlation coefficient.
Chapter Seven
Elementary Probability
Chapter Objectives
Dear learner, at the end of this chapter, you are expected to:
• Define probability.
• Understand the basic terms such as experiment, outcome and event.
• Calculate probabilities applying the rules of addition and multiplication.
• Define the terms conditional probability and joint probability.
• Understand permutation and combination.
• Define the terms random variable and probability distribution.
• Distinguish between a discrete and a continuous probability distribution.
• Calculate the mean, variance and standard deviation of discrete probability distributions.
• Understand binomial and normal probability distributions.
• Define and calculate the Z-value.
• Compute probabilities using the standard normal distribution.
7.1. Introduction
Probability as a general concept can be defined as the chance of an event occurring. Probability theory gives us methods of dealing with uncertainty. As nothing is accurately predictable, uncertainty is a common feature of every decision-making process. In such situations probability theory comes to our aid by providing the methods needed to take appropriate decisions even under conditions of risk and uncertainty.
7.2. Definition and basic concepts
An Experiment is a process that leads to the occurrence of one or more possible observations.
Examples:
- Tossing a coin
- Rolling two dice once
- Drawing a card from a deck
Sample Space is a complete listing of all elementary events of an experiment.
Examples:
• The sample space for the experiment of tossing a coin is (H, T). If two coins are tossed once, the sample space is (H1, H2), (H1, T2), (T1, H2), (T1, T2).
• The sample space for the roll of a single die is (1, 2, 3, 4, 5, 6). If two dice are rolled once, the sample space consists of the 36 ordered pairs (1,1), (1,2), ..., (6,6).
Sample points are the elements of a sample space.
Example: 2 is one sample point of rolling a die.
To find the number of sample points, apply the formula $K^n$, where n is the number of experiments (trials) and K is the number of possible outcomes of a single experiment.
An Event is a collection of one or more outcomes of an experiment. Events are mutually exclusive if the occurrence of any one event means that none of the others can occur at the same time. That is, if two events cannot occur at the same time, they are mutually exclusive. Events are independent if the occurrence of one event does not affect the occurrence of another. Events are collectively exhaustive if at least one of the events must occur when an experiment is conducted.
Example: A fair die is rolled once. The experiment is rolling a die. The possible outcomes are the numbers 1, 2, 3, 4, 5 and 6. If the event is the occurrence of an even number, we collect the outcomes 2, 4 and 6.
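Sample spaces for repeated experiments can be enumerated mechanically. The sketch below is illustrative only: it uses Python's `itertools.product` to list every sample point for two coins and for two dice, and the variable names are assumptions.

```python
from itertools import product

coin = ["H", "T"]              # K = 2 outcomes for one toss
die = [1, 2, 3, 4, 5, 6]       # K = 6 outcomes for one roll

# Every ordered pair of outcomes when the experiment is repeated n = 2 times.
two_coins = list(product(coin, repeat=2))
two_dice = list(product(die, repeat=2))

print(two_coins)                           # the four sample points for two coins
print(len(two_coins), len(coin) ** 2)      # count of sample points and K**n
print(len(two_dice), len(die) ** 2)

# An event is a subset of the sample space, e.g. "an even number on one roll".
even_event = [outcome for outcome in die if outcome % 2 == 0]
print(even_event)
```

The counts agree with the rule that n repetitions of an experiment with K outcomes give K to the power n sample points.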
Probability is a measure of the chance or likelihood that a particular event will happen in the future. It can only assume values between 0 and 1, inclusive. The probability of an event E, written P(E), is a number with the following properties:
• 0 ≤ P(E) ≤ 1
• P(E) = 0 means the event will not happen; such an event is called an impossible event.
• P(E) = 1 means we are 100% sure that the event will occur (sure event).
Probability can be defined using three different approaches:
(i) Classical probability
(ii) Relative frequency (Empirical) probability
(iii) Subjective probability
i) Classical Probability: It is based on the assumption that the outcomes of an experiment are equally likely. It applies rules and laws and involves an experiment.
$$P(E) = \frac{n}{N}$$
Where: N = total possible outcomes of an experiment
n = the number of outcomes in which the event occurs out of the N outcomes in an experiment.
Examples:
• In a coin tossing experiment, what is the probability of getting a head on one toss of a coin? As there are only two possible outcomes, the probability is 50% or 0.5 or 1/2.
• An unbiased die is thrown. What is the probability that the digit 2 appears? Ans. 1/6.
ii) Relative frequency (Empirical) Probability: This method is based on cumulative past historical data. The probability of an event is the number of times the event occurred in the past divided by the total number of observations.
Examples:
a) Suppose that, of the last 70 days with conditions like those forecast for today, it rained on 12 days. What is the probability of rain today based on those historical days?
P(rain) = 12/70 = 0.17 or 17%
b) Throughout her teaching career a professor has awarded 186 A's out of 1200 students. What is the probability that a student in her section this semester will receive an A grade?
P(A) = 186/1200 = 0.155
iii) Subjective Probability: It uses a probability value based on an educated guess or estimate, employing opinions and inexact information. For example, a seismologist might say that there is a 45% probability that an earthquake will occur in Afar after thirty years.
7.3. Basic Rules of Probability
Special rule of addition
If two events A and B are mutually exclusive, the special rule of addition states that the probability of A or B occurring equals the sum of their respective probabilities: P(A or B) = P(A) + P(B)
Definition: Two events of a single experiment are said to be mutually exclusive if they cannot occur simultaneously as a result of the experiment. This is equivalent to saying that mutually exclusive events must have disjoint event sets.
Example: Abay Zuria transport association has recently supplied the following information on their trips from Bahir Dar to Debre Markos:
Arrival      Frequency
Early           100
On time         800
Late             75
Cancelled        25
Total         1,000
• If A is the event that a bus arrives early, then P(A) = 100/1000 = 0.10.
• If B is the event that a bus arrives late, then P(B) = 75/1000 = 0.075.
• The probability that a bus is either early or late is:
P(A or B) = P(A) + P(B) = 0.10 + 0.075 = 0.175.
The complement rule
The complement rule is used to determine the probability of an event occurring by subtracting the probability of the event not occurring from 1.
If P(A) is the probability of event A and P(~A) is the probability of the complement of A, then P(A) + P(~A) = 1 or P(A) = 1 - P(~A).
Examples:
1) Two events X and Y are mutually exclusive. Suppose P(X) = 0.05 and P(Y) = 0.02. What is the probability that either X or Y will occur? (Ans. 0.07) What is the probability that neither X nor Y will happen? (Ans. 0.93)
2) Suppose the probability that you will score an A in this class is 0.25 and the probability that you will get a B is 0.50. What is the probability that your grade will be above C? (Ans. 0.75)
3) The probabilities of events A and B are 0.20 and 0.30 respectively. The probability that both A and B occur is 0.15. What is the probability that either A or B will occur? (Ans. 0.35)
4) A student is taking two courses, microeconomics and statistics. The probability that the student will pass the microeconomics course is 0.60 and the probability of passing the statistics course is 0.70. The probability of passing both is 0.50. What is the probability of passing at least one course? (Ans. 0.80)
The general rule of addition
If A and B are two events that are not mutually exclusive, then P(A or B) is given by the following formula: P(A or B) = P(A) + P(B) - P(A and B)
Example: In a sample of 500 students, 320 said they had a radio, 175 said they had a TV, and 100 said they had both:
• If a student is selected at random, what is the probability that the student has a radio (S), has a TV (T), and has both a radio and a TV?
Solution: P(S) = 320/500 = 0.64. P(T) = 175/500 = 0.35. P(S and T) = 100/500 = 0.20.
• If a student is selected at random, what is the probability that the student has either a radio or a TV in his or her room?
Solution: P(S or T) = P(S) + P(T) - P(S and T) = 0.64 + 0.35 - 0.20 = 0.79.
Joint Probability
A joint probability measures the likelihood that two or more events will happen at the same time.
• An example would be the event that a student has both a radio and a TV in his or her dorm room.
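The general rule of addition for the radio and TV survey can be reproduced with exact fractions. This is a small sketch under the figures quoted in the example (500 students, 320 with a radio, 175 with a TV, 100 with both); the variable names are illustrative, and the "only radio" and "only TV" values are derived here rather than stated in the text.

```python
from fractions import Fraction

total = 500
p_radio = Fraction(320, total)      # P(S)
p_tv = Fraction(175, total)         # P(T)
p_both = Fraction(100, total)       # joint probability P(S and T)

# General rule of addition: subtract the joint probability once,
# because students with both items are counted in P(S) and in P(T).
p_either = p_radio + p_tv - p_both

# Students with exactly one of the two items.
p_only_radio = p_radio - p_both
p_only_tv = p_tv - p_both

print(float(p_either), float(p_only_radio), float(p_only_tv))
```

The three disjoint pieces (only a radio, only a TV, both) add up to the probability of having at least one of the two, which is the reasoning behind subtracting the joint probability.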
Special rule of multiplication
The special rule of multiplication requires that two events A and B are independent. Two events A and B are independent if the occurrence of one has no effect on the probability of the occurrence of the other. Two events originating from independent experiments will be independent, while two events originating from the same experiment will not, in general, be independent.
Example: Suppose two coins are tossed. The outcome of one coin (head or tail) is unaffected by the outcome of the other coin. That is, the outcome of the second event does not depend on the outcome of the first event.
This rule is written: P(A and B) = P(A)P(B)
7.4. Conditional Probability
A conditional probability is the probability of a particular event occurring, given that another event has occurred. The probability of the event A given that the event B has occurred is written P(A|B).
General rule of multiplication
The general rule of multiplication is used to find the joint probability that two events will occur. It states that for two events A and B, the joint probability that both events will happen is found by multiplying the probability that event A will happen by the conditional probability of B given that A has occurred. The joint probability, P(A and B), is given by the following formula:
P(A and B) = P(A)P(B|A) or
P(A and B) = P(B)P(A|B)
Where P(B|A) = probability of B given that event A has occurred. Rearranging gives the conditional probability:
$$P(A|B) = \frac{P(A \text{ and } B)}{P(B)}, \qquad P(B) \neq 0$$
Example: The Dean of the School of Business at a University collected the following information about undergraduate students in her college:
Major         Male   Female   Total
Accounting     170      110     280
Finance        120      100     220
Marketing      160       70     230
Management     150      120     270
Total          600      400   1,000
a) If a student is selected at random, what is the probability that the student is a female (F) and an Accounting major (A)?
P(A and F) = 110/1000 = 0.11
b) Given that the student is a female, what is the probability that she is an Accounting major?
P(A|F) = P(A and F)/P(F) = [110/1000]/[400/1000] = 0.275
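The joint and conditional probabilities in the Dean's example can be read straight off the contingency table. The sketch below is illustrative; the nested dictionary layout and variable names are assumptions made for the example, while the counts are those of the table.

```python
# Counts of students by major and gender (from the Dean's table).
counts = {
    "Accounting": {"Male": 170, "Female": 110},
    "Finance":    {"Male": 120, "Female": 100},
    "Marketing":  {"Male": 160, "Female": 70},
    "Management": {"Male": 150, "Female": 120},
}

total = sum(sum(row.values()) for row in counts.values())
females = sum(row["Female"] for row in counts.values())

p_a_and_f = counts["Accounting"]["Female"] / total   # joint probability P(A and F)
p_f = females / total                                # marginal probability P(F)
p_a_given_f = p_a_and_f / p_f                        # conditional probability P(A|F)

print(total, females)
print(round(p_a_and_f, 3), round(p_f, 3), round(p_a_given_f, 3))
```

Equivalently, P(A|F) is the share of Accounting majors within the female column alone, 110 out of 400.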
Let an experiment have a sample space S with E as any event. We define the probability of E occurring, written P(E), as a number satisfying the following conditions:
P(S) = 1, that is, $\sum p_i = 1$
Additional examples:
1. An experiment is performed by tossing a normal coin and observing which side (H or T) is shown uppermost.
a. Write down the sample space: S = (H, T)
b. Calculate P(H): P(H) = 1/2
c. Show that P(S) = 1: P(S) = 1/2 + 1/2 = 1
d. Show that E1 (H) and E2 (T) are mutually exclusive.
2. A fair die is rolled once as an experiment with S = (1, 2, 3, 4, 5, 6).
a. P(1 or 2) = P(1) + P(2) = 1/6 + 1/6 = 1/3
b. P(X < 4) = 1/2
c. P(even number) = 1/2
d. P(even or less than 4) = P(even number) + P(< 4) – P(even number and < 4) = 1/2 + 1/2 - 1/6 = 5/6
7.5. Counting Procedures
Permutation is any arrangement of r objects selected from n possible objects. The formula to count the total number of different permutations is
$${}_nP_r = \frac{n!}{(n-r)!}$$
where $n! = n(n-1)(n-2)\cdots 2 \cdot 1$. By definition, 0! (read as zero factorial) = 1.
NB. The arrangements abc and bac are different permutations.
Example: Suppose you have three guests (Abebe, Bekele, Chala) invited to come to your house.
a. In how many ways can they sit on the chairs available in your house?
Sitting arrangements:
Abebe, Bekele, Chala
Abebe, Chala, Bekele
Bekele, Abebe, Chala
Bekele, Chala, Abebe
Chala, Abebe, Bekele
Chala, Bekele, Abebe
$${}_3P_3 = \frac{3!}{(3-3)!} = 6$$
Therefore, there are 6 different arrangements for the three guests.
b. If you want to arrange seats for two guests out of three, in how many ways can you arrange them?
Abebe, Bekele
Abebe, Chala
Bekele, Abebe
Bekele, Chala
Chala, Abebe
Chala, Bekele
$${}_3P_2 = \frac{3!}{(3-2)!} = 6$$
Therefore, there are 6 different sitting arrangements for the two guests.
c. What if you are trying to give a seat to one guest out of the three?
Abebe; Bekele; Chala
$${}_3P_1 = \frac{3!}{(3-1)!} = 3$$
Therefore, there are 3 different sitting arrangements for a guest.
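The seating example can be reproduced by listing the arrangements and comparing the count with the permutation formula. This is an illustrative sketch using Python's standard `itertools.permutations` and `math.perm` (available from Python 3.8); the variable names are assumptions.

```python
from itertools import permutations
from math import factorial, perm

guests = ["Abebe", "Bekele", "Chala"]
n = len(guests)

counts = {}
for r in (3, 2, 1):
    arrangements = list(permutations(guests, r))   # ordered selections of r guests
    by_formula = factorial(n) // factorial(n - r)  # n! / (n - r)!
    assert len(arrangements) == by_formula == perm(n, r)
    counts[r] = len(arrangements)

print(counts)
print(list(permutations(guests, 2)))
```

Order matters here: ("Abebe", "Bekele") and ("Bekele", "Abebe") appear as separate arrangements.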
Combination is the number of ways to choose r objects from a group of n objects when the order of selection does not matter.
Formula:
$${}_nC_r = \frac{n!}{r!(n-r)!}$$
Example: Suppose executives Abebe, Bekele and Chala are to be chosen as a committee to negotiate the price of a car.
a. How many combinations of these three executives are possible?
Solution:
$${}_3C_3 = \frac{3!}{3!(3-3)!} = 1$$
There is only one combination of these three. The committee of Abebe, Bekele and Chala is the same as the committee of:
Bekele, Chala and Abebe or
Chala, Abebe and Bekele
Bekele, Abebe and Chala
Chala, Bekele and Abebe
Abebe, Chala and Bekele
b. How many combinations are possible if two executives are supposed to negotiate to buy a car?
Abebe, Bekele
Abebe, Chala
Bekele, Chala
$${}_3C_2 = \frac{3!}{2!(3-2)!} = \frac{3!}{2! \cdot 1!} = 3$$
Three combinations are possible.
c. How many combinations are possible if one executive is supposed to negotiate to buy a new car?
Abebe; Bekele; Chala
$${}_3C_1 = \frac{3!}{1!(3-1)!} = \frac{3!}{1! \cdot 2!} = 3$$
Three combinations are possible.
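For comparison with permutations, the committee example can be enumerated with `itertools.combinations` and checked against `math.comb`. The sketch is illustrative and the variable names are assumptions.

```python
from itertools import combinations
from math import comb, factorial

executives = ["Abebe", "Bekele", "Chala"]
n = len(executives)

counts = {}
for r in (3, 2, 1):
    committees = list(combinations(executives, r))                 # unordered selections
    by_formula = factorial(n) // (factorial(r) * factorial(n - r)) # n! / (r!(n - r)!)
    assert len(committees) == by_formula == comb(n, r)
    counts[r] = len(committees)

print(counts)
print(list(combinations(executives, 2)))
```

Each committee is listed once, regardless of the order in which its members are named, which is why choosing two executives gives 3 combinations but 6 permutations.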
7.6. Probability Distributions and Random Variables
Probability Distribution: It is a listing of all the outcomes of an experiment and the probability of each of these outcomes, in either tabular or graphical form.
Random Variables
A random variable is a numerical value determined by the outcome of an experiment.
Types of Probability Distributions
• A discrete probability distribution can assume only certain outcomes.
• A continuous probability distribution can assume an infinite number of values within a given range.
Examples of a discrete distribution are:
• The number of students in a class.
• The number of children in a family.
• The number of cars entering a carwash in an hour.
Examples of a continuous distribution include:
• The distance students travel to class.
• The time it takes an executive to drive to work.
Features of a Discrete Distribution
The main features of a discrete probability distribution are:
• The sum of the probabilities of the various outcomes is 1.00.
• The probability of a particular outcome is between 0 and 1.00.
• The outcomes are mutually exclusive.
Example: Consider a random experiment in which a coin is tossed three times. Let x be the number of heads. Let H represent the outcome of a head and T the outcome of a tail.
• The possible outcomes for such an experiment are: TTT, TTH, THT, THH, HTT, HTH, HHT, HHH.
• Thus the possible values of x (number of heads) are 0, 1, 2, 3.
• The outcome of zero heads occurs once.
• The outcome of one head occurs three times.
• The outcome of two heads occurs three times.
• The outcome of three heads occurs once.
• From the definition of a random variable, x as defined in this experiment is a random variable.
The probability distribution is given as
X      0     1     2     3
P(X)   1/8   3/8   3/8   1/8
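The distribution of the number of heads in three tosses can be built by enumerating the eight equally likely outcomes and counting heads in each. This is an illustrative sketch; the variable names are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing a coin three times.
outcomes = list(product("HT", repeat=3))

# The random variable x maps each outcome to its number of heads.
heads_frequency = Counter(outcome.count("H") for outcome in outcomes)

# Classical probability: favourable outcomes divided by total outcomes.
distribution = {
    x: Fraction(frequency, len(outcomes))
    for x, frequency in sorted(heads_frequency.items())
}

for x, p in distribution.items():
    print(x, p)
print(sum(distribution.values()))
```

The probabilities lie between 0 and 1 and sum to 1, which are the features required of a discrete probability distribution.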
The Mean of a Discrete Probability Distribution
The mean:
• reports the central location of the data.
• is the long-run average value of the random variable.
• is also referred to as its expected value, E(X), in a probability distribution.
• is a weighted average.
The mean is computed by the formula:
$$\mu = \sum [x P(x)]$$
where $\mu$ represents the mean and P(x) is the probability of the various outcomes x.
The Variance of a Discrete Probability Distribution
• The variance measures the amount of spread (variation) of a distribution.
• The variance of a discrete distribution is denoted by the Greek letter $\sigma^2$ (sigma squared).
• The standard deviation is the square root of $\sigma^2$.
The variance of a discrete probability distribution is computed from the formula:
$$\sigma^2 = \sum [(x - \mu)^2 P(x)]$$
Examples:
1. The tables listed below show random variables and their probabilities. However, only one of these is actually a probability distribution:
    (i)             (ii)            (iii)
X     P(X)      X     P(X)      X     P(X)
5     0.30      5     0.10      5     0.50
10    0.30      10    0.30      10    0.30
15    0.20      15    0.20      15   -0.20
20    0.40      20    0.40      20    0.40
a) Which one is a probability distribution? (Ans. Table (ii): its probabilities lie between 0 and 1 and sum to 1. The probabilities in table (i) sum to 1.20 and table (iii) contains a negative probability.)
b) Using the correct probability distribution, find the probability that X is
1) Exactly 15 (0.20)
2) Not more than 10 (0.40)
3) More than 5 (0.90)
c) Calculate the mean, variance and standard deviation of the correct probability distribution.
Mean = 5(0.10) + 10(0.30) + 15(0.20) + 20(0.40) = 0.5 + 3 + 3 + 8 = 14.5
2. According to recent information published in the Capital magazine, 36 percent of the households in Ethiopia have one TV set, 47 percent have 2 sets, 15 percent have 3 sets, and 2 percent have 4 sets.
a) Depict the probability distribution
X      1      2      3      4
P(X)   0.36   0.47   0.15   0.02
b) What is the mean number of sets per household?
$$\mu = 1(0.36) + 2(0.47) + 3(0.15) + 4(0.02) = 1.83$$
c) What is the variance of the number of sets per household?
$$\sigma^2 = (1 - 1.83)^2(0.36) + (2 - 1.83)^2(0.47) + (3 - 1.83)^2(0.15) + (4 - 1.83)^2(0.02) = 0.5611$$
3. The head of a department estimated the distribution of student admission to his department for the next semester based on past experience as follows:
Admission     1000   1200   1500
Probability   0.60   0.30   0.10
a) What is the expected number of students who will be admitted to the department next semester? (Ans. 1110)
b) Compute the variance and standard deviation
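The mean and variance formulas for a discrete distribution translate directly into code. The sketch below is a hedged illustration: the helper names are hypothetical, the probabilities are those given in the three examples, and the variance and standard deviation it reports for the first and third examples are computed here rather than quoted from the text.

```python
from math import sqrt

def is_distribution(dist, tolerance=1e-9):
    """Check that every probability is in [0, 1] and that they sum to 1."""
    in_range = all(0 <= p <= 1 for p in dist.values())
    return in_range and abs(sum(dist.values()) - 1) < tolerance

def summarize(dist):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())
    return mean, variance, sqrt(variance)

# Example 1: three candidate tables; only one is a probability distribution.
table_i = {5: 0.30, 10: 0.30, 15: 0.20, 20: 0.40}
table_ii = {5: 0.10, 10: 0.30, 15: 0.20, 20: 0.40}
table_iii = {5: 0.50, 10: 0.30, 15: -0.20, 20: 0.40}
valid = [is_distribution(t) for t in (table_i, table_ii, table_iii)]

# Example 2: TV sets per household. Example 3: student admissions.
tv_sets = {1: 0.36, 2: 0.47, 3: 0.15, 4: 0.02}
admissions = {1000: 0.60, 1200: 0.30, 1500: 0.10}

print(valid)
for name, dist in [("table (ii)", table_ii), ("TV sets", tv_sets), ("admissions", admissions)]:
    mean, variance, sd = summarize(dist)
    print(name, round(mean, 2), round(variance, 4), round(sd, 3))
```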
The binomial distribution
The binomial distribution has the following characteristics:
• An outcome of an experiment is classified into one of two mutually exclusive categories, such as a success or a failure.
• The data collected are the results of counts.
• The probability of success stays the same for each trial.
• The trials are independent.
Mean & Variance of the Binomial Distribution
• The mean is found by: $\mu = n\pi$
• The variance is found by: $\sigma^2 = n\pi(1 - \pi)$
• To construct a binomial distribution, let
  - n be the number of trials
  - x be the number of observed successes
  - $\pi$ be the probability of success on each trial
• The formula for the binomial probability distribution is:
$$P(x) = {}_nC_x \, \pi^x (1 - \pi)^{n-x}$$
Example: The Department of Labor reports that 20% of the workforce is unemployed. From a sample of 14 workers, calculate the following probabilities:
• Exactly three are unemployed.
• At least three are unemployed.
• At least one is unemployed.
Solution
• The probability of exactly 3:
$$P(3) = {}_{14}C_3 (0.2)^3 (1 - 0.2)^{11} = 364 \times 0.008 \times 0.0859 = 0.2501$$
• The probability of at least 3:
$$P(x \geq 3) = {}_{14}C_3 (0.2)^3 (0.8)^{11} + {}_{14}C_4 (0.2)^4 (0.8)^{10} + \cdots + {}_{14}C_{14} (0.2)^{14} (0.8)^0 = 0.552$$
• The probability of at least one being unemployed:
$$P(x \geq 1) = 1 - P(0) = 1 - {}_{14}C_0 (0.2)^0 (0.8)^{14} = 0.956$$
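The binomial probabilities in the unemployment example can be recomputed with the standard library. This is a minimal sketch of the formula P(x) = nCx π^x (1 - π)^(n - x); `binomial_pmf` is a hypothetical helper name, and "at least three" is obtained through the complement rule rather than by summing twelve terms.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

n, pi = 14, 0.20    # 14 workers sampled, 20% unemployment rate

p_exactly_3 = binomial_pmf(3, n, pi)
p_at_least_3 = 1 - sum(binomial_pmf(x, n, pi) for x in range(3))   # 1 - P(0) - P(1) - P(2)
p_at_least_1 = 1 - binomial_pmf(0, n, pi)

mean = n * pi                   # mu = n * pi
variance = n * pi * (1 - pi)    # sigma^2 = n * pi * (1 - pi)

print(round(p_exactly_3, 4), round(p_at_least_3, 4), round(p_at_least_1, 4))
print(round(mean, 2), round(variance, 2))
```

The binomial coefficient for choosing 3 of 14 is 364, and 0.8 raised to the 11th power is about 0.0859, which is where the factors in the hand calculation come from.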
The Normal Probability Distribution
Characteristics of a Normal Probability Distribution
• The normal curve is bell-shaped and has a single peak at the exact center of the distribution.
• The arithmetic mean, median, and mode of the distribution are equal and located at the peak. Thus half the area under the curve is above the mean and half is below it.
• The normal probability distribution is symmetrical about its mean.
• The normal probability distribution is asymptotic. That is, the curve gets closer and closer to the X-axis but never actually touches it.
• It is a continuous probability distribution.
• Theoretically, the curve extends to infinity in both directions.
The Standard Normal Probability Distribution
The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. It is also called the z distribution. A z-value is the distance between a selected value, designated X, and the population mean, divided by the population standard deviation.
The formula is:
$$z = \frac{X - \mu}{\sigma}$$
Example: The bi-monthly starting salaries of recent MBA graduates follow the normal distribution with a mean of Birr 2,000 and a standard deviation of Birr 200. What is the z-value for a salary of Birr 2,200?
$$z = \frac{X - \mu}{\sigma} = \frac{2{,}200 - 2{,}000}{200} = 1.00$$
What is the z-value for a salary of Birr 1,700?
$$z = \frac{X - \mu}{\sigma} = \frac{1{,}700 - 2{,}000}{200} = -1.50$$
A z-value of 1 indicates that the value of Birr 2,200 is one standard deviation above the mean of Birr 2,000. A z-value of –1.50 indicates that Birr 1,700 is 1.5 standard deviations below the mean of Birr 2,000.
Example: The daily water usage per person in New Providence, New Jersey is normally distributed with a mean of 20 gallons and a standard deviation of 5 gallons. About 68 percent of those living in New Providence will use how many gallons of water? About 68% of the daily water usage will lie within one standard deviation of the mean, that is, between 15 and 25 gallons.
• What is the probability that a person from New Providence selected at random will use between 20 and 24 gallons per day?
$$z = \frac{X - \mu}{\sigma} = \frac{20 - 20}{5} = 0.00 \qquad z = \frac{X - \mu}{\sigma} = \frac{24 - 20}{5} = 0.80$$
• The area under a normal curve between a z-value of 0 and a z-value of 0.80 is 0.2881.
• We conclude that 28.81 percent of the residents use between 20 and 24 gallons of water per day.
What percent of the population use between 18 and 26 gallons per day?
$$z = \frac{X - \mu}{\sigma} = \frac{18 - 20}{5} = -0.40 \qquad z = \frac{X - \mu}{\sigma} = \frac{26 - 20}{5} = 1.20$$
• The area associated with a z-value of –0.40 is 0.1554.
• The area associated with a z-value of 1.20 is 0.3849.
• Adding these areas, the result is 0.5403.
• We conclude that 54.03 percent of the residents use between 18 and 26 gallons of water per day.
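The z-values and areas used in the salary and water-usage examples can be reproduced without a printed table. The sketch below is illustrative and assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the helper names `z_value` and `area_between` are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()    # mean 0, standard deviation 1

def z_value(x, mu, sigma):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - mu) / sigma

def area_between(low, high, mu, sigma):
    """Probability that a normal variable falls between low and high."""
    z_low, z_high = z_value(low, mu, sigma), z_value(high, mu, sigma)
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

# Starting salaries: mean Birr 2,000, standard deviation Birr 200.
z_2200 = z_value(2200, 2000, 200)
z_1700 = z_value(1700, 2000, 200)

# Daily water usage: mean 20 gallons, standard deviation 5 gallons.
p_20_24 = area_between(20, 24, 20, 5)
p_18_26 = area_between(18, 26, 20, 5)
p_15_25 = area_between(15, 25, 20, 5)    # within one standard deviation

print(z_2200, z_1700)
print(round(p_20_24, 4), round(p_18_26, 4), round(p_15_25, 4))
```

A printed z-table usually lists the area between 0 and z, so the hand calculation adds the areas for z = -0.40 and z = 1.20, while the cumulative function used here subtracts one cumulative area from another; both routes give the same probability.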
Review Exercises
1) Which of the following is a correct statement about a probability?
a. It may range from 0 to 1
b. It may assume negative values
c. It may be greater than 1
d. It cannot be reported to more than 1 decimal place
e. All the above are correct
2) An experiment is a
a. Collection of events
b. Collection of outcomes
c. Always greater than 1
d. The act of taking a measurement or the observation of some activity
e. None of the above is correct
3) Events are independent if
a. By virtue of one event happening another cannot
b. The probability of their occurrence is greater than 1
c. We can count the possible outcomes
d. The probability of one event happening does not affect the probability of another event
happening
e. None of the above
4) When we find the probability of an event happening by subtracting the probability of the event
not happening from 1, we are using
a. Subjective probability
b. The complement rule
c. The general rule of addition
d. The special rule of multiplication
e. Joint probability
5) The Special Rule of Addition is used to combine
a) Independent events
b) Mutually exclusive events
c) Events that total more than one
d) Events based on subjective probabilities
e) Found by using joint probabilities
6) When we determine the number of combinations
a) We are really computing a probability
b) The order of the outcomes is not important
c) The order of the outcomes is important
d) We multiply the likelihood of two independent trials
e) None of the above
7) The difference between a permutation and a combination is
a. In a permutation order is important and in a combination it is not
b. In a permutation order is not important and in a combination it is important
c. A combination is based on the classical definition of probability
d. A permutation is based on the classical definition of probability
e. None of the above
8) Which of the following is not a requirement of a binomial distribution?
a. A constant probability of success
b. Only two possible outcomes
c. A fixed number of trials
d. Equally likely outcomes
9) The expected value of a probability distribution
a. Is the same as the random variable
b. Is another term for the mean
c. Is also called the variance
d. Cannot be greater than 1
10) The normal distribution is a
a. Discrete distribution
b. Continuous distribution
c. Positively skewed distribution
d. None of the above
11) Which of the following are characteristics of the normal distribution?
a. It is a symmetric distribution
b. It is bell-shaped
c. It is asymptotic
d. All of the above
12) Which of the following statements is correct regarding the standard normal distribution?
a. It is also called the z distribution
b. Any normal distribution can be converted to the standard normal distribution
c. The mean is 0 and the standard deviation is 1
d. All of the above are correct
13) The area under a normal curve between 0 and -1.75 is
a) 0.0401 b) 0.9599 c) 0.4599 d) None
14) The area under a normal curve less than 1.75 is
a) 0.0401 b) 0.9599 c) 0.4599 d) None
15) In the standard normal distribution, what is the probability of finding a z value between -1.25 and -1.00?
a) 0.3944 b) 0.3413 c) 0.7357 d) 0.0531
16) Which of the following is not a requirement of a probability distribution?
a) Equally likely probability of a success
b) Sum of the possible outcomes is 1.00
c) The outcomes are mutually exclusive
d) The probability of each outcome is between 0 and 1
17) In a continuous probability distribution
a) Only certain outcomes are possible
b) All the values within a certain range are possible
c) The sum of the outcomes is greater than 1.00
d) None of the above
18) In a normal distribution the relationship between the mean, median, and the mode is
a. They are all equal
b. The mean is the largest
c. The median is the largest
d. None of the above
Problems
19) Sixty percent of the students at Scandia Tech drive to class and 30 percent have GPAs of at least 3.00. Ten percent of the students have a GPA of at least 3.00 and drive to class. If we select a student at random, what is the likelihood that the student has a GPA of at least 3.00 or drives to class?
20) An insurance sales representative has an appointment with four clients today. From long experience she knows that the probability of selling a policy to a client is 0.80.
a. What is the probability of selling a policy to all 4 clients?
b. What is the probability of selling a policy to three or more clients?
21) There are 600 employees at the Tuesday Morning's Department Store corporate headquarters in Columbia. See the following breakdown.
Gender   No College   College   Total
Male         25         225      250
Female       75         275      350
Total       100         500      600
An employee is selected at random.
a. What is the probability the employee is female?
b. What is the probability the employee is either female or attended college?
c. What is the probability the employee attended college, given that the employee is female?
22) For a particular group of taxpayers, 25 percent of the returns are audited. Six taxpayers are randomly selected from the group.
a. What is the probability two are audited?
b. What is the probability two or more are audited?
23) Suppose P(A) = 0.75 and P(B/A) = 0.40. What is the joint probability of A and B?
Sample Answer for Review Exercises
Chapter one – Introduction
1. A
2. B
3. C
4. D
5. B
6. E
7. C
8. a. Inferential
   b. Descriptive
   c. Deferential
   d. Inferential
9. a. Qualitative
   b. Quantitative
   c. Qualitative
   d. Qualitative
15. D
16. D
Chapter Two – Sampling Theory
7. a. Probability
   b. From 150 = 6 samples; from 100 = 4 samples; from 50 = 2 samples
9. 100 from 10,000; 50 from 5,000; 150 from 15,000;
   2000 from 20,000 & 500 from 50,000
Choose the best answer
1. D
2. C
3. D
4. D
5. C
Chapter three - Data Collection & Presentation
Choose the best answer
1. B
2. C
3. A
4. C
5. A
6. C
7. B
8. C
9. D
10. B
Work Out
5. X = 25, Y = 35, a = 45, b = 65, c = 80, z = 0.25
6. iii) a = 40, b = 60, c = 40, d = 25, e = no answer
Chapter Four – Measures of Central Tendency
Choose the best answer
1. B
2. C
3. C
5. C
6. A
7. A
Work Out
7. 70
8. Birr 2000
9. 5%
10. Birr 5.83
11. 2/3
12. 20% & 80% respectively
13. Birr 3.095 /Kg
14. Birr 1400
15. 50.9
16. 15.43
17. 15.5 years
18. 66
19. 4 & 6
20. Birr 1500.0185
21. 20.05
22. 3.18
23. 2700
24. 30.20
25. 3200
26. 50 & 100 respectively
27. 39.08
28. f1 = 20
29. f1 = 25 & f2 = 24
30. $0.80
31. 35
32. 21.5 - 28.5
33. a. 30.2
b. 28.75
Chapter Five – Measures of Dispersion
1. D
2. A
3. A
4. D
5. B
9. a. 243
b. > 243
c. < 243
10. Mean = 30, S.D = 3.1
11. a. 74
b. 4 & 47
12. CV for salaries = 9%
CV for years of schooling = 13.33%
13. Mean = 20
14. X = 8, Y = 3
Chapter Six – Simple Linear Regression & Correlation
4. a) Ye = 1.434 + 0.993 Xi
Xe = 6.182 + 0.886 Yi
b) r = 0.938
5. Xe = 20.59 + 0.36 Yi
Ye = 12.29 – 0.46 Xi
r = 0.40
6. rs = 0.818
7.
= 4,
= 7, r = -0.5
8. rs = 0.96
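As a quick consistency check on answer 4, the correlation coefficient is the geometric mean of the two regression slopes and carries their common sign. The following is a minimal Python sketch, not part of the original solutions; the function name r_from_slopes is an illustrative assumption, and the slopes are the rounded values from answer 4.

```python
from math import copysign, sqrt


def r_from_slopes(b_yx, b_xy):
    """Correlation coefficient from the slope of Y on X and the slope of X on Y.

    Both slopes share the sign of r, so r = sign * sqrt(b_yx * b_xy).
    """
    if b_yx * b_xy < 0:
        raise ValueError("the two regression slopes must have the same sign")
    return copysign(sqrt(b_yx * b_xy), b_yx)


# Answer 4: Ye = 1.434 + 0.993 Xi and Xe = 6.182 + 0.886 Yi
r4 = r_from_slopes(0.993, 0.886)
print(round(r4, 3))  # 0.938
```

The product of the slopes is r squared, which is why the two fitted lines alone are enough to recover r without returning to the raw data.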
Chapter Seven – Introduction to Probability
1. A
2. D
3. D
4. B
5. B
11. D
18. A
19. P(A or D) = 0.6 + 0.3 – 0.1 = 0.8
20. a. P(4) = 0.4096
    b. P(X ≥ 3) = 0.8192
21. a. P(F) = 0.5833
    b. P(F or C) = 0.9583
    c. P(C/F) = 0.7857
22. a. P(2) = 0.2966
    b. P(X ≥ 2) = 0.466
23. 0.30
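The answers to problems 19 to 23 can be checked numerically. The following is a minimal Python sketch, not part of the original solutions; the helper name binom_pmf and the variable names are illustrative assumptions. It applies the general addition rule, the binomial formula (treating the clients in problem 20 and the taxpayers in problem 22 as independent trials), conditional probability, and the multiplication rule to the figures given in the problems.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for a binomial variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Problem 19: general addition rule, P(A or D) = P(A) + P(D) - P(A and D)
p19 = 0.60 + 0.30 - 0.10

# Problem 20: n = 4 clients, p = 0.80
p20_all = binom_pmf(4, 4, 0.80)
p20_three_plus = binom_pmf(3, 4, 0.80) + binom_pmf(4, 4, 0.80)

# Problem 21: counts read from the employee table
total, female, college, female_college = 600, 350, 500, 275
p21_f = female / total
p21_f_or_c = (female + college - female_college) / total
p21_c_given_f = female_college / female  # conditional probability P(C/F)

# Problem 22: n = 6 taxpayers, p = 0.25
p22_two = binom_pmf(2, 6, 0.25)
p22_two_plus = 1 - binom_pmf(0, 6, 0.25) - binom_pmf(1, 6, 0.25)

# Problem 23: multiplication rule, P(A and B) = P(A) * P(B/A)
p23 = 0.75 * 0.40

results = {
    "19": p19,
    "20a": p20_all,
    "20b": p20_three_plus,
    "21a": p21_f,
    "21b": p21_f_or_c,
    "21c": p21_c_given_f,
    "22a": p22_two,
    "22b": p22_two_plus,
    "23": p23,
}
for label, value in results.items():
    print(f"{label}: {value:.4f}")
```

For "two or more" in problem 22, the sketch uses the complement rule (one minus the probabilities of zero and one audit), which avoids summing five separate terms.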