Uploaded by Eric Atakora

RESEARCH,SAMPLING NOTE

RESEARCH


a. The systematic investigation into and study of materials, sources, etc., in order to establish facts and reach new conclusions.
b. An endeavour to discover new facts or collate old ones by the scientific study of a subject or by a course of critical investigation.
TYPES OF RESEARCH

From the viewpoint of objectives, research can be classified as:
• Descriptive
• Correlational
• Explanatory
• Exploratory
Descriptive research
Descriptive research attempts to describe systematically a situation, problem, phenomenon, service or programme, or provides information about the living conditions of a community, or describes attitudes towards an issue.
Correlational research
Correlational research attempts to discover or
establish the existence of a relationship/
interdependence between two or more aspects
of a situation.
Explanatory research
Explanatory research attempts to clarify why
and how there is a relationship between two or
more aspects of a situation or phenomenon.
Exploratory research
Exploratory research is undertaken to explore
an area where little is known or to investigate
the possibilities of undertaking a particular
research study (feasibility study/ pilot study).
From the point of view of application, there are two broad categories of research:
• Pure research
• Applied research
Pure Research

It involves developing and testing theories and
hypotheses that are intellectually challenging to the
researcher but may or may not have practical
application at the present time or in the future.

The knowledge produced through pure research is
sought in order to add to the existing body of
research methods.
Applied research

Applied research is done to solve specific, practical questions;
for policy formulation, administration and understanding of a
phenomenon. It can be exploratory, but is usually descriptive.

It is almost always done on the basis of basic research.
Applied research can be carried out by academic or industrial
institutions.

Often, an academic institution such as a university will have a
specific applied research program funded by an industrial
partner interested in that program.
From the point of view of the process adopted to find answers to research questions (the inquiry mode), there are two approaches:
• Structured approach
• Unstructured approach
Structured Approach

The structured approach to inquiry is usually classified as
quantitative research.

Everything that forms the research process (objectives, design, sample, and the questions that you plan to ask of respondents) is predetermined.

It is more appropriate for determining the extent of a problem, issue or phenomenon by quantifying the variation, e.g. how many people have a particular problem? How many people hold a particular attitude?
Unstructured Approach
The unstructured approach to inquiry is usually
classified as qualitative research.
This approach allows flexibility in all aspects of the
research process.
Steps in Research Process
1. Formulating the Research Problem
2. Extensive Literature Review
3. Developing the objectives
4. Preparing the Research Design including Sample Design
5. Collecting the Data
6. Analysis of Data
7. Generalisation and Interpretation
8. Preparation of the Report or Presentation of Results: formal write-ups of the conclusions reached.
Considerations in selecting a
research problem:


• Interest: A research endeavour is usually time consuming, and involves hard work and possibly unforeseen problems. Select a topic of great interest to you to sustain the required motivation.
• Magnitude: It is extremely important to select a topic that you can manage within the time and resources at your disposal. Narrow the topic down to something manageable, specific and clear.
• Measurement of concepts: Make sure that you are clear about the indicators and measurement of concepts (if used) in your study.
• Level of expertise: Make sure that you have an adequate level of expertise for the task you are proposing, since you need to do the work yourself.
• Relevance: Ensure that your study adds to the existing body of knowledge, bridges current gaps and is useful in policy formulation. This will help you to sustain interest in the study.
• Availability of data: Before finalizing the topic, make sure that data are available.
• Ethical issues: How ethical issues can affect the study population and how ethical problems can be overcome should be thoroughly examined at the problem-formulation stage.
Steps in formulation of a
research problem


Working through these steps presupposes a
reasonable level of knowledge in the broad
subject area within which the study is to be
undertaken.
Without such knowledge it is difficult to
clearly and adequately ‘dissect’ a subject
area.
Step 1: Identify a broad field or subject area of interest to you.
Step 2: Dissect the broad area into sub-areas.
Step 3: Select what is of most interest to you.
Step 4: Raise research questions.
Step 5: Formulate objectives.
Step 6: Assess your objectives.
Step 7: Double check.

SOURCES OF DATA
INTRODUCTION

There are two types of data sources: primary and secondary.
PRIMARY DATA

Data collected by the investigator for his own purpose, for the first time, from beginning to end, are called primary data.

In other words, data originally collected in the process of investigation are known as primary data. Primary data are original.

Primary data have not yet been published and are considered more reliable, authentic and objective.

Primary data have not been changed or altered; therefore their validity is considered greater than that of secondary data.
SOURCE OF PRIMARY DATA

Sources for primary data are limited, and at times it becomes difficult to obtain data from primary sources because of either scarcity of population or lack of cooperation.

Regardless of the difficulties one can face in collecting them, primary data are the most authentic and reliable.

Sources of primary data include:
EXPERIMENTS

Experiments require an artificial or natural setting in which to perform a logical study to collect data.

Experiments are more suitable for medicine, psychological studies, nutrition and other scientific studies.

In an experiment, the experimenter has to control the influence of any extraneous variables on the results.
SURVEY

The survey is a commonly used method in the social sciences.

A survey can be conducted using different methods such as the questionnaire, interview and observation.
Questionnaire

The questionnaire is the most commonly used method in surveys.

A questionnaire is a list of questions, either open-ended or close-ended, to which the respondent gives answers.

A questionnaire can be administered by telephone, by mail, live in a public area or in an institution, by electronic mail, by fax and by other methods.
Interview
An interview is a face-to-face conversation with the respondent.
• The main problem in an interview arises when the respondent deliberately hides information; otherwise it is an in-depth source of information.
• The interviewer can not only record the statements the interviewee makes but can also observe the body language, expressions and other reactions to the questions.
• This enables the interviewer to draw conclusions more easily.

Observation

Observation can be done while letting the person being observed know that he is being observed, or without letting him know.

Observations can be made in natural settings as well as in artificially created environments.
SECONDARY DATA

Secondary data are information which is already in existence and which was collected for some purpose other than answering the question at hand.

In other words, data collected by other persons are called secondary data. These data are therefore called second-hand data. They are available in published or unpublished forms.

The review of literature in any research is based on secondary data, mostly from books, journals and periodicals.
BOOKS

Books are available today on any topic that you want
to research.

The use of books starts even before you have selected the topic.

Books are the most reliable secondary source.
PUBLISHED SOURCES
Journals/Periodicals
Journals and periodicals are becoming more important as far as data collection is concerned.

The reason is that journals provide up-to-date information which at times books cannot, and secondly, journals can give information on the very specific topic you are researching rather than talking about more general topics.
Magazines/Newspapers

Magazines are also effective but not very reliable.

Newspapers, on the other hand, are more reliable, and in some cases the information can only be obtained from newspapers, as in the case of some political studies.
Electronic Sources
Internet: Information that is not available in printed form is available on the internet in the form of:
• E-journals
• Websites
• Blogs
OTHER SOURCES

Personal Records: Some unpublished data may also
be useful in some cases.

Diaries: Diaries are personal records and are rarely available, but if you are conducting descriptive research they might be very useful.

Letters: Letters, like diaries, are also a rich source but should be checked for their reliability before using them.
EDITING DATA

Editing is the process of checking data for errors
such as omissions, illegibility and inconsistency, and
correcting data where and when the need arises

Example 1: A questionnaire meant to be answered
by adults over the age of 30 years has also been
answered by some persons under the age of 30 years

Example 2: A respondent gives her year of birth as 1865, or claims to have car insurance but says she doesn't own a car.

Basic Principles of Editing:
• Checking the number of schedules / questionnaires
• Completeness (all questions have been filled in)
• Legibility
• Avoiding inconsistencies in answers
• Maintaining a degree of uniformity
• Eliminating irrelevant responses
Data Consistency and
Completeness

The data obtained from a questionnaire must be logically consistent, especially when questions are related.

Sometimes inconsistency of data may not be readily apparent. In this case, the data editor must judge what action to take (example: the salary of the CEO of a big corporation is given as USD 25,000 per annum, which is implausibly low).

Circumstances permitting, the data editor may have to insert data if answers to questions have been omitted by the respondent but can be answered on the basis of the other data obtained. Example: a respondent does not answer a question asking if his organization has a website, but somewhere later answers that the organization has three websites.
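
The checks described above can be sketched in code. The following is only an illustrative sketch, not a prescribed editing procedure: the field names (`age`, `birth_year`, `has_car_insurance`, `owns_car`, `has_website`, `website_count`) and the cut-off year 1900 are assumptions made up for the example.

```python
def edit_record(record, min_age=30, earliest_birth_year=1900):
    """Return (edited_copy, issues) for one questionnaire record.

    All field names and thresholds are hypothetical, chosen to mirror
    the examples in these notes.
    """
    issues = []
    edited = dict(record)  # work on a copy; keep the original answers intact

    # Example 1: respondent falls outside the target group (adults over 30).
    age = edited.get("age")
    if age is not None and age < min_age:
        issues.append("respondent is outside the target age group")

    # Example 2a: implausible year of birth.
    birth_year = edited.get("birth_year")
    if birth_year is not None and birth_year < earliest_birth_year:
        issues.append("implausible birth year")

    # Example 2b: related answers contradict each other.
    if edited.get("has_car_insurance") and edited.get("owns_car") is False:
        issues.append("car insurance reported without a car")

    # Omitted answer that can be inferred from a later answer.
    if edited.get("has_website") is None and (edited.get("website_count") or 0) > 0:
        edited["has_website"] = True
        issues.append("has_website inferred from website_count")

    return edited, issues


record = {
    "age": 45,
    "birth_year": 1865,
    "has_car_insurance": True,
    "owns_car": False,
    "has_website": None,
    "website_count": 3,
}
edited, issues = edit_record(record)
for issue in issues:
    print(issue)
```

The function only flags problems and fills in the one answer that follows from other data; deciding what to do with a flagged record remains the data editor's judgement.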
Non-Responses and Out-of-Order Answers

Often, questions are left unanswered by respondents (item non-response). In such cases, where data must be inserted, the data editor has some options, such as using a "plug value" according to some prespecified rule.

Sometimes respondents give answers to (open-ended) questions in other questions. In such cases, the data have to be moved to the questions they belong to.
There are two types of Editing :
1. Field Editing
2. Central Editing
Field Editing
Field Editing is a form of data editing which is
undertaken by the field supervisor while the data
collection is in process with a view to finding
omissions, checking the legibility of handwriting, and
clarifying responses by respondents that are logically
or conceptually inconsistent
Precautions you must take
while using Secondary Data
The investigator should take precautions before using secondary data. In this connection, the following precautions should be taken into account.
1. Suitable Purpose of Investigation:
The investigator must ensure that the data are suitable for the purpose of enquiry.
2. Adequacy of Data:
Adequacy of the data is to be judged in the light of the requirements of the survey as well as the geographical area covered by the available data.
3. Definition of Units:
The investigator must ensure that the definitions of units which are used by him are the same as in the earlier investigation.
4. Degree of Accuracy:
The investigator should keep in mind the degree of accuracy maintained by each investigator.
5. Time and Condition of Collection of Facts:
Before making use of available data, it should be ascertained in which period and under which conditions the data were collected.
6. Comparison:
The investigator should keep in mind whether the secondary data are reasonable, consistent and comparable.
7. Test Checking:
The user of secondary data must do test checking and see that totals and rates have been correctly calculated.
8. Homogeneous Conditions:
It is not safe to take published statistics at their face value without knowing their meaning, values and limitations.
Sampling
INTRODUCTION
Sampling indicates the selection of a part of a group or an aggregate with a view to obtaining information about the whole.
This aggregate or the totality of all members is known as the Population.
The selected part, which is used to ascertain the characteristics of the population, is called the Sample.
The total number of members of the population and the number included in the sample are called the Population Size and the Sample Size respectively.
Sampling methodology can be used by an auditor or an accountant to estimate the value of total inventory in the stores without actually inspecting all the items physically.
Opinion polls based on samples are used to forecast the result of a forthcoming election.
Advantages of sampling over
Census
The census or complete enumeration consists in
collecting data from each and every unit from the
population.
Sampling has a number of advantages as compared
to complete enumeration due to a variety of reasons:
Less Expensive
The first obvious advantage of sampling is that it is less expensive. If we want to study the consumer reaction before launching a new product, it will be much less expensive to carry out a consumer survey based on a sample rather than studying the entire population, which is the potential group of customers.
Less Time Consuming
The smaller size of the sample enables us to collect the data more quickly than surveying all the units of the population, even if we are willing to spend the money. This is particularly the case if the decision is time bound.

Greater Accuracy
Complete enumeration may result in inaccuracies in the data.
Consider an inspector who is visually inspecting the quality of finishing of a certain piece of machinery. After observing a large number of such items he may no longer be able to distinguish items with a defective finish from good ones.
Once such inspection fatigue develops, the accuracy of examining the population completely is considerably decreased.

On the other hand, if a small number of items is observed, the basic data will be much more accurate.

Physical Impossibility of Complete Enumeration
In many situations the elements being studied are destroyed when tested, so testing every unit would leave nothing of the population.
TYPES OF SAMPLING
There are two basic types of sampling depending on
whom or what is allowed to govern the selection of
the sample. We have:

Probability Sampling

Non-Probability Sampling
Classification of Sampling Methods
Sampling Methods
• Probability Samples
  – Simple Random
  – Systematic
  – Stratified
  – Cluster
• Nonprobability Samples
  – Convenience
  – Judgment
  – Quota
  – Snowball
PROBABILITY SAMPLING

In probability sampling the decision whether a particular
element is included in the sample or not is governed by
chance alone.

All probability sampling designs ensure that each element in the population has some non-zero probability of being included in the sample.

This means defining a procedure for picking the sample based on chance.
In the category of probability sampling, we have:
• Simple random sampling
• Stratified sampling
• Systematic sampling
• Cluster sampling
SIMPLE RANDOM SAMPLING
In simple random sampling the selected items are drawn "at random" from the population. It ensures that:
• Each of the possible samples of size n has an equal probability of being picked as the chosen sample.
• Each element of the population has an equal probability of being included in the sample.
Simple random sampling is the most widely used probability sampling method because it is easy to implement and easy to analyze.

It is imperative to have a list of all members of the population before a simple random sample can be picked.

Such an exhaustive list of all population members is called a sampling frame.
One way to obtain a simple random sample would be the lottery method. Each of the N population members is assigned a unique number (or marked). The numbers are placed in a bowl and thoroughly mixed. Then, a blindfolded researcher selects n numbers. Population members having the selected numbers are included in the sample.
Random Sampling With and Without Replacement

Suppose we use the lottery method described above to select
a simple random sample. After we pick a number from the
bowl, we can put the number aside or we can put it back into
the bowl. If we put the number back in the bowl, it may be
selected more than once; if we put it aside, it can be selected
only one time.

When a population element can be selected more than one
time, we are sampling with replacement. When a
population element can be selected only one time, we are
sampling without replacement.
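
As a rough illustration of the lottery method, the sketch below uses Python's standard `random` module as the "bowl". The function name `lottery_sample` and the population of 48 numbered members are assumptions for the example only.

```python
import random


def lottery_sample(population, n, replacement=False, seed=None):
    """Draw n members from the sampling frame by simulated lottery."""
    rng = random.Random(seed)  # seed only makes the draw repeatable
    if replacement:
        # Each drawn number goes back into the bowl, so repeats are possible.
        return rng.choices(population, k=n)
    # Each drawn number is put aside, so every member appears at most once.
    return rng.sample(population, n)


sampling_frame = list(range(1, 49))  # hypothetical members numbered 1 to 48
without_replacement = lottery_sample(sampling_frame, 5, seed=1)
with_replacement = lottery_sample(sampling_frame, 5, replacement=True, seed=1)
print(without_replacement)
print(with_replacement)
```

Without replacement the requested sample size cannot exceed the population size; with replacement it can, because the same member may be drawn again.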
Random Sampling Numbers
Random sampling numbers are a collection of digits generated through a probabilistic mechanism. The numbers have the following properties:

• The probability that each of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 will appear at any particular place is the same, namely 1/10.

• The occurrences of any two digits in any two places are independent of each other.
When reading from random number tables you can begin
anywhere (choose a number at random) but having once started
you should continue to read across the line or down a column.
An extract from a table of random sampling numbers:
 3680 2231 8846 5418 0498 5245 7071 2597

If we were doing market research and wanted to sample two houses from a street containing houses numbered 1 to 48 we would read off the digits in pairs:
36 80 22 31 88 46 54 18 04 98 52 45 70 71 25 97
and take the first two pairs that fall between 01 and 48, which gives house numbers 36 and 22.

If we wanted to sample two houses from a much longer road with 140 houses in it we would need to read the digits off in groups of three:
368 022 318 846 541 804 985 245 707 125 97
and the first two groups that fall between 001 and 140 would be the ones to visit: 22 and 125.
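
The reading procedure can be reproduced with a short sketch. It assumes the table extract is held as one string of digits and that a repeated number would simply be skipped (an assumption, since the notes do not say how repeats are handled); the function name `pick_from_digits` is hypothetical.

```python
def pick_from_digits(digits, width, population_size, sample_size):
    """Read random digits in groups of `width` and keep usable unit numbers."""
    chosen = []
    # Step through the digit string in complete groups of `width` digits.
    for start in range(0, len(digits) - width + 1, width):
        value = int(digits[start:start + width])
        # Keep the group only if it is a valid, not yet selected unit number.
        if 1 <= value <= population_size and value not in chosen:
            chosen.append(value)
        if len(chosen) == sample_size:
            break
    return chosen


table_extract = "36802231884654180498524570712597"  # the extract quoted above
print(pick_from_digits(table_extract, 2, 48, 2))   # street of 48 houses: [36, 22]
print(pick_from_digits(table_extract, 3, 140, 2))  # road of 140 houses: [22, 125]
```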
STRATIFIED RANDOM SAMPLING
When heterogeneity is present in the population with regard to the subject matter under consideration, it is often a good idea to divide the population into groups (segments or strata).
Stratified random sampling consists of selecting a certain number of sampling units from each stratum to ensure representation from all relevant segments in order to increase efficiency.

Example:
When designing a suitable marketing strategy for a consumer durable, the population of consumers may be divided into strata by income level and a certain number of consumers can be randomly selected from each stratum.

Landscapes: stratified by habitat characteristics.

People: stratified by characteristics (such as sex, occupation, etc.).

In stratified sampling, the population of N units is first divided into L sub-populations of $N_1, N_2, \ldots, N_L$ units respectively. These sub-groups are non-overlapping and together comprise the whole population, so that

\[ N_1 + N_2 + \cdots + N_L = N \]

A sample is selected independently from each stratum. The collection of these samples constitutes a stratified sample, so that

\[ n_1 + n_2 + \cdots + n_L = n \]

If a simple random sample selection scheme is used in each stratum, the corresponding sample is called a stratified random sample.

The stratification should be performed in such a way that the strata are homogeneous within themselves with respect to the characteristic under study.
On the other hand, the strata should be heterogeneous between themselves.
NOTATION
The suffix h (h = 1, 2, ..., L) denotes the stratum and i the unit within the stratum.

$N_h$: total number of population units in stratum h.

$n_h$: total number of sample units in stratum h.

$X_{hi}$: value of the characteristic for the i-th unit in stratum h.

$W_h = \frac{N_h}{N}$: the h-th stratum weight.

$X_h = \sum_{i=1}^{N_h} X_{hi}$: the total of the observations in stratum h.

$\bar{X}_h = \frac{1}{N_h} \sum_{i=1}^{N_h} X_{hi}$: the population mean of stratum h.

$s_h^2 = \frac{1}{N_h - 1} \sum_{i=1}^{N_h} (X_{hi} - \bar{X}_h)^2$: the population variance of stratum h.
Allocation of Sample Size to Different Strata
In stratified sampling, the sample is allocated to the different strata on the basis of three considerations:
• The total number of units in the stratum, i.e. the stratum size
• The variability within the stratum
• The cost of taking an observation per sampling unit in each stratum
Allocation of a Sample to Strata
Equal Allocation: If the strata are presumed to be of roughly equal size, and there is no additional information regarding the variability or distribution of the response in the strata, equal allocation to the strata is probably the best choice:

\[ n_h = \frac{n}{L} \]

Proportional Allocation: If the strata differ in size, allocation of sample sizes to strata might be performed proportional to these stratum sizes:

\[ n_h = n \, \frac{N_h}{N} \]

Optimum Allocation: The allocation which minimizes the variance of the estimator of the mean (and total).
Optimum allocation (with equal cost):

\[ n_h = n \, \frac{W_h s_h}{\sum_{h=1}^{L} W_h s_h} \]

Optimum allocation (with unequal cost), where $C_h$ is the cost per observation in stratum h:

\[ n_h = n \, \frac{W_h s_h / \sqrt{C_h}}{\sum_{h=1}^{L} W_h s_h / \sqrt{C_h}} \]
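
The allocation rules can be compared numerically with a small sketch. It assumes the standard forms of equal, proportional and optimum allocation, with the stratum standard deviations and per-unit costs supplied by the user; the function names are hypothetical, and the results are left unrounded so that any rounding rule can be applied afterwards.

```python
import math


def equal_allocation(n, stratum_sizes):
    """n_h = n / L for every stratum."""
    L = len(stratum_sizes)
    return [n / L for _ in stratum_sizes]


def proportional_allocation(n, stratum_sizes):
    """n_h = n * N_h / N."""
    N = sum(stratum_sizes)
    return [n * N_h / N for N_h in stratum_sizes]


def optimum_allocation(n, stratum_sizes, stratum_sds, costs=None):
    """n_h proportional to W_h * s_h / sqrt(C_h); equal costs if none are given."""
    if costs is None:
        costs = [1.0] * len(stratum_sizes)
    N = sum(stratum_sizes)
    terms = [
        (N_h / N) * s_h / math.sqrt(c_h)
        for N_h, s_h, c_h in zip(stratum_sizes, stratum_sds, costs)
    ]
    total = sum(terms)
    return [n * term / total for term in terms]


sizes = [100, 100]  # hypothetical stratum sizes N_h
sds = [1.0, 3.0]    # hypothetical stratum standard deviations s_h
print(equal_allocation(100, sizes))
print(proportional_allocation(100, sizes))
print(optimum_allocation(100, sizes, sds))
```

With equally sized strata, equal and proportional allocation coincide, while optimum allocation shifts the sample towards the more variable stratum.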
SYSTEMATIC SAMPLING
Systematic sampling is commonly used and simple to apply; it consists of taking every k-th sampling unit after a random start.
Example:
suppose you want to sample 8 houses from a street of 120
houses. 120/8=15, so every 15th house is chosen after a
random starting point between 1 and 15. If the random
starting point is 11, then the houses selected are 11, 26, 41,
56, 71, 86, 101, and 116.

If there were 125 houses, 125/8=15.625, so should you take
every 15th house or every 16th house? If you take every 16th
house, 8*16=128 so there is a risk that the last house chosen
does not exist. To overcome this the random starting point
should be between 1 and 13.

On the other hand if you take every 15th house, 8*15=120 so
the last five houses will never be selected. The random
starting point should now be between 1 and 20 to ensure that
every house has some chance of being selected.
Example
Select a sample of size 5 from a population of size 30 by using systematic sampling.
Solution
We first compute the sampling interval:

\[ k = \frac{N}{n} = \frac{30}{5} = 6 \]

The numbers from 1 to 30 are then written as follows:
 1  2  3  4  5  6
 7  8  9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
A number is then selected at random from the first row, i.e. from 1 to 6. If the selected number is 4 (the random start), then the sample will be 4, 10, 16, 22 and 28.
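
A minimal sketch of the same procedure follows. It assumes the population size is a multiple of the sample size, as in the worked examples; when it is not, the random start range has to be adjusted as discussed earlier. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every k-th unit after a random start between 1 and k."""
    k = population_size // sample_size  # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    return [start + i * k for i in range(sample_size)]


print(systematic_sample(30, 5, start=4))    # [4, 10, 16, 22, 28]
print(systematic_sample(120, 8, start=11))  # [11, 26, 41, 56, 71, 86, 101, 116]
```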
CLUSTER SAMPLING
Cluster sampling may be used when it is either
impossible or impractical to compile an exhaustive list
of the elements that make up the target population.
Usually, however, the population elements are already
grouped into subpopulations and lists of those
subpopulations already exist or can be created.
Example
Let's say the target population in a study was KNUST students, and there is no list of all KNUST students in the school. The researcher could, however, create a list of departments in the school, choose a sample of departments, and then obtain lists of students from those departments.
MULTISTAGE SAMPLING
The multi-stage sampling procedure is used for large-scale enquiries covering a large geographical area such as a region.
An illustration:
A bank may like to gather information regarding the quality of customer service it is offering in a region. A random sample of districts is selected from the list of districts.
From each of the selected districts a number of branches are randomly selected. From each of the selected branches a number of depositors, who are the ultimate sampling units, are selected randomly for collecting information.
• The districts are called the first stage units.
• The branches are known as the second stage units.
• The depositors are regarded as the third stage units.
This is an illustration of three stage sampling, the third stage units being the ultimate sampling units.
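
The three stages of the bank illustration can be sketched as nested random draws. The region below (5 districts, 4 branches each, 20 depositors per branch) is made-up data, and the function name `multistage_sample` is hypothetical; the sketch uses simple random sampling at every stage, which is only one possible choice.

```python
import random


def multistage_sample(region, n_districts, n_branches, n_depositors, seed=None):
    """Three-stage sample: districts, then branches, then depositors."""
    rng = random.Random(seed)
    selected = {}
    # Stage 1: random sample of districts (first stage units).
    for district in rng.sample(sorted(region), n_districts):
        branches = region[district]
        selected[district] = {}
        # Stage 2: random sample of branches within each selected district.
        for branch in rng.sample(sorted(branches), n_branches):
            # Stage 3: random sample of depositors (ultimate sampling units).
            selected[district][branch] = rng.sample(branches[branch], n_depositors)
    return selected


# Hypothetical region used only for illustration.
region = {
    f"district_{d}": {
        f"branch_{d}_{b}": [f"depositor_{d}_{b}_{k}" for k in range(20)]
        for b in range(4)
    }
    for d in range(5)
}
sample = multistage_sample(region, n_districts=2, n_branches=2, n_depositors=3, seed=0)
for district, branches in sample.items():
    for branch, depositors in branches.items():
        print(district, branch, depositors)
```

Only the lists for the selected districts and branches are needed, which is the practical advantage over compiling a full list of depositors for the whole region.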

NON-PROBABILITY SAMPLING
Non-probability sampling is a sampling procedure which does not provide any basis for estimating the probability that each item in the population has of being included in the sample.

In such a case, the sampling error is not measurable and the error in the estimator tends to increase sharply because the representativeness of the sample members is questionable.

Nevertheless, non-probability samples are useful in certain situations. This is the case when representativeness is not the primary issue.
In general, some types of non-probability sampling methods include:
• Convenience sampling
• Judgement sampling
• Quota sampling
• Snowball sampling

CONVENIENCE SAMPLING

Under convenience sampling, the samples are
selected at the convenience of the researcher or
investigator.

We have no way of determining the representativeness of the sample. This can result in biased estimates.

Therefore, it is not possible to estimate the sampling error, as the difference between the sample estimate and the population parameter is unknown both in magnitude and direction.

It is therefore suggested that convenience sampling should not be used in either descriptive or causal studies, as it is not possible to make any definitive statements about the results from such a sample.

Convenience sampling:
• may be quite useful in exploratory designs as a basis for generating hypotheses.
• is also useful in testing questionnaires etc. at the pretest phase of the study.
• is extensively used in marketing studies and otherwise.
JUDGEMENT SAMPLING

Judgement sampling is also called purposive sampling.

A researcher deliberately or purposively draws a sample from the population which he thinks is representative of the population.

However, not all members of the population are given a chance to be selected in the sample.

The personal bias of the investigator has a great chance of entering the sample.

If the investigator chooses a sample to give results which favour his viewpoint, the entire study may be vitiated.

However, if personal biases are avoided, then the relevant experience and the acquaintance of the investigator with the population may help to choose a relatively representative sample from the population.

It is not possible to make an estimate of sampling error as we cannot determine how precise our sample estimates are.
ILLUSTRATION
Suppose we have a panel of experts to decide about the
launching of a new product in the next year. If for some
reason or the other, a member drops out from the panel,
the chairman of the panel, may suggest the name of
another person whom he thinks has the same expertise
and experience to be a member of the said panel.

This new member was chosen deliberately - a case of
Judgement sampling
QUOTA SAMPLING

This is a very commonly used sampling method in
marketing research studies.

The sample is selected on the basis of certain basic
parameters such as age, sex, income and occupation
that describe the nature of a population so as to
make it representative of the population.

The investigators or field workers are instructed to
choose a sample that conforms to these parameters.

The field workers are assigned quotas of the
numbers of units satisfying the required
characteristics on which data should be collected.

However, before collecting data on these units the
investigators are supposed to verify that the units
qualify these characteristics.

Suppose that 20% of the population is in the high income group, 35% in the middle income group and 45% in the low income group, and that we decide to select a sample of size 200 from the population.

Then samples of size 40, 70 and 90 should come from the high income, middle income and low income groups respectively.
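
The quota arithmetic, and the way field workers fill quotas, can be sketched as follows. The group labels, the stream of respondents and both function names are assumptions made for illustration; respondents are accepted in the order they are met, which is what makes the method non-probabilistic.

```python
def quota_sizes(sample_size, shares):
    """Translate population shares into quotas per group."""
    return {group: round(sample_size * share) for group, share in shares.items()}


def fill_quotas(respondents, quotas):
    """Accept respondents in the order met until each group's quota is full."""
    remaining = dict(quotas)
    selected = []
    for person_id, group in respondents:
        if remaining.get(group, 0) > 0:
            selected.append(person_id)
            remaining[group] -= 1
    return selected


shares = {"high": 0.20, "middle": 0.35, "low": 0.45}
quotas = quota_sizes(200, shares)
print(quotas)  # {'high': 40, 'middle': 70, 'low': 90}

# Hypothetical stream of respondents met in the field: (id, income group).
stream = [(i, ("high", "middle", "low")[i % 3]) for i in range(600)]
selected = fill_quotas(stream, quotas)
print(len(selected))  # 200
```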
SNOWBALL SAMPLING
• A sampling procedure in which the initial respondents are chosen by probability or non-probability methods, and then additional respondents are obtained from information provided by the initial respondents.
Determining Sample Size
The required sample size can be determined for the mean or for the proportion.
Sampling Error
• The required sample size can be found to reach a desired margin of error (e) with a specified level of confidence (1 - α)
• The margin of error is also called sampling error.
– the amount of imprecision in the estimate of the
population parameter
– the amount added and subtracted to the point estimate
to form the confidence interval
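Determining Sample Size
• For the mean, the confidence interval is

\[ \bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

• Thus the sampling error (margin of error) is

\[ e = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

Making n the subject:

\[ n = \frac{Z_{\alpha/2}^2 \, \sigma^2}{e^2} \]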
To determine the required sample size for the mean,
you must know:
• The desired level of confidence (1 - α), which
determines the critical value, Zα/2
• The acceptable sampling error, e
• The standard deviation, σ
Required Sample Size Example
• If σ = 45, what sample size is needed to estimate the mean within ± 5 with 90% confidence?

\[ n = \frac{Z_{\alpha/2}^2 \, \sigma^2}{e^2} = \frac{(1.645)^2 (45)^2}{5^2} = 219.19 \]

• So the required sample size is n = 220
• Note: Always round up
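
The calculation can be checked with a short sketch. It obtains the critical value from the standard normal distribution through Python's `statistics.NormalDist` instead of a rounded table value, so the unrounded result differs slightly from 219.19, but the rounded-up sample size is the same. The function name `sample_size_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, e, confidence):
    """Smallest n with margin of error at most e: n = (Z * sigma / e)^2, rounded up."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_{alpha/2}
    return math.ceil((z * sigma / e) ** 2)


print(sample_size_for_mean(sigma=45, e=5, confidence=0.90))  # 220
```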
If σ is unknown
• If unknown, σ can be estimated when using
the required sample size formula
– Use a value for σ that is expected to be at
least as large as the true σ
– Select a pilot sample and estimate σ with
the sample standard deviation, S
Determining Sample Size
For the Proportion
The sampling error (margin of error):

\[ e = Z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}} \]

Making n the subject:

\[ n = \frac{Z_{\alpha/2}^2 \, p(1 - p)}{e^2} \]
To determine the required sample size for the proportion, you
must know:
• The desired level of confidence (1 - α), which determines
the critical value, Zα/2
• The acceptable sampling error, e
• The true proportion of events of interest, p
• p can be estimated with a pilot sample if necessary (or
conservatively use 0.5 as an estimate of p)
Required Sample Size Example
How large a sample would be necessary to estimate
the true proportion defective in a large population
within ±3%, with 95% confidence?
(Assume a pilot sample yields p = 0.12)

Solution:
For 95% confidence, use Zα/2 = 1.96
e = 0.03
The pilot sample proportion is 0.12, so use this to estimate p

\[ n = \frac{Z_{\alpha/2}^2 \, p(1 - p)}{e^2} = \frac{(1.96)^2 (0.12)(1 - 0.12)}{(0.03)^2} \approx 450.75 \]

Thus n = 451
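
The same check for the proportion, again as a sketch using `statistics.NormalDist` for the critical value; the function name `sample_size_for_proportion` is hypothetical. The second call uses p = 0.5, the conservative choice mentioned above when no pilot estimate is available.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, e, confidence):
    """Smallest n with margin of error at most e: n = Z^2 * p * (1 - p) / e^2, rounded up."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_{alpha/2}
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


print(sample_size_for_proportion(p=0.12, e=0.03, confidence=0.95))  # 451
print(sample_size_for_proportion(p=0.5, e=0.03, confidence=0.95))   # 1068
```

Because p(1 - p) is largest at p = 0.5, the conservative choice never gives a smaller sample size than a pilot-based estimate.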