Methods of Data Collection,
Processing and Analysis
The concept of Sampling
A sample is “a smaller (but hopefully representative)
collection of units from a population used to determine
truths about that population”
Why sample?
• Saves resources (time, money) and reduces workload
• With probability sampling, gives results whose accuracy is known and can be calculated mathematically
The sampling frame is the list from which the potential respondents are drawn, e.g.:
• Registrar’s office records
• Class rosters
• HR records
SAMPLING……
• What is your population of interest?
• To whom do you want to generalize your results?
• All doctors
• School children
• Ethiopians
• Women aged 15-45 years
• Can you sample the entire population?
SAMPLING…….
• Factors that influence sample representativeness
• Sampling procedure
• Sample size
• Participation (response)
• When might you sample the entire population?
• When your population is very small
• When you have extensive resources
• When you expect a low response rate, so you must approach everyone to obtain enough responses
• Studying the entire population is called a census.
SAMPLING BREAKDOWN
[Figure: target population, study population and sample]
Types of Sampling
• Probability (Random) Samples
• Simple random sample
• Systematic random sample
• Stratified random sample
• Cluster sample
• Non-Probability Samples
• Convenience sample
• Purposive sample
• Quota sample
The Sampling Process
• The sampling process comprises several stages:
• Defining the population of concern
• Specifying a sampling frame, a set of items or events possible to
measure
• Specifying a sampling method for selecting items or events from
the frame
• Determining the sample size
• Implementing the sampling plan
• Sampling and data collecting
• Reviewing the sampling process
Population definition
• A population can be defined as including all
people or items with the characteristic one
wishes to understand.
• Because there is very rarely enough time or
money to gather information from everyone
or everything in a population, the goal
becomes finding a representative sample (or
subset) of that population.
SAMPLING FRAME
• A sampling frame consists of a list of items from which the sample is to be drawn.
Types of Sampling
PROBABILITY SAMPLING
• A probability sampling scheme is one in which every
unit in the population has a chance (greater than zero)
of being selected in the sample, and this probability
can be accurately determined.
• When every element in the population has the same probability of selection, this is known as an 'equal probability of selection' (EPS) design.
PROBABILITY SAMPLING…….
•Probability sampling includes:
• Simple Random Sampling,
• Systematic Sampling,
• Stratified Random Sampling,
• Cluster Sampling
NON PROBABILITY SAMPLING
• Any sampling method in which some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage' or 'undercovered'), or in which the probability of selection cannot be accurately determined.
• It involves selecting elements on the basis of assumptions about the population of interest, which form the criteria for selection.
NONPROBABILITY SAMPLING…….
• Accidental (convenience) sampling
• Quota sampling
• Purposive sampling
SIMPLE RANDOM SAMPLING
• Applicable when the population is small, homogeneous and readily available.
• Every subset of the frame of a given size has an equal probability of being selected; each element of the frame thus has an equal probability of selection.
• It allows the greatest number of possible samples.
• Selection is done by assigning a number to each unit in the sampling frame and then using a table of random numbers or a lottery method to determine which units are selected.
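The numbering-and-lottery procedure above can be sketched in Python, with the standard library's pseudo-random generator standing in for a table of random numbers. This is a minimal illustration under stated assumptions: the frame of 500 units, the sample size of 25 and the seed are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: 500 numbered units (e.g., students on a class roster).
frame = [f"unit-{i:03d}" for i in range(1, 501)]
sample = simple_random_sample(frame, 25, seed=42)
print(sample[:5])
```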
SIMPLE RANDOM SAMPLING……..
• Disadvantages
• If the sampling frame is large, this method is impracticable.
• Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
SYSTEMATIC SAMPLING
• Systematic sampling relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list.
• It involves a random start within the first k elements and then proceeds with the selection of every kth element from then onwards, where k = population size / sample size (N/n).
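A minimal sketch of the random start and fixed interval described above, assuming a hypothetical ordered frame of 1,000 households and a sample of 50 (so k = 20). Python indices are 0-based, so the random start is drawn from 0 to k - 1.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start in the first interval, then every kth element of the ordered frame."""
    N = len(frame)
    k = N // n                                     # sampling interval k = N / n (rounded down)
    start = random.Random(seed).randint(0, k - 1)  # 0-based random start within the first k units
    # [:n] guards against one extra unit when N is not an exact multiple of n
    return [frame[i] for i in range(start, N, k)][:n]

# Hypothetical ordered frame: households numbered 1 to 1000.
households = list(range(1, 1001))
sample = systematic_sample(households, 50, seed=7)
print(sample[:5])
```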
SYSTEMATIC SAMPLING……
• ADVANTAGES:
• Sample easy to select
• Suitable sampling frame can be identified easily
• Sample evenly spread over entire reference population
• DISADVANTAGES:
• The sample may be biased if a hidden periodicity in the population coincides with the sampling interval.
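The periodicity problem can be shown with a small simulation. The frame is hypothetical: 100 flats listed floor by floor, ten per floor, where the last flat on each floor is a corner flat. Because the sampling interval (k = 10) equals that hidden period, the random start alone decides whether the sample contains no corner flats or only corner flats.

```python
# Hypothetical frame: 10 floors x 10 flats, listed floor by floor;
# the last flat on each floor (positions 9, 19, 29, ...) is a corner flat.
frame = ["corner" if i % 10 == 9 else "standard" for i in range(100)]
true_share = frame.count("corner") / len(frame)

k = 10  # sampling interval that happens to equal the period of the list
shares = {}
for start in range(k):
    sample = frame[start::k]  # systematic sample of 10 flats
    shares[start] = sample.count("corner") / len(sample)

print(true_share)  # 0.1
print(shares)      # starts 0-8 give 0.0, start 9 gives 1.0
```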
STRATIFIED SAMPLING
• Where the population embraces a number of distinct categories, the frame can be organized into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected.
• Every unit in a stratum has the same chance of being selected.
• Using the same sampling fraction for all strata ensures proportionate representation in the sample.
• Adequate representation of minority subgroups of interest can be ensured by stratification and by varying the sampling fraction between strata as required.
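The sketch below illustrates both uses of the sampling fraction described above: the same fraction in every stratum (proportionate) and a larger fraction for a small stratum (disproportionate). The frame of 900 staff and 100 managers and the fractions are hypothetical; within each stratum a simple random sample is drawn.

```python
import random

def stratified_sample(frame, stratum_of, fractions, seed=None):
    """Split the frame into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for name, units in strata.items():
        n_h = round(len(units) * fractions[name])  # stratum sample size from its sampling fraction
        sample.extend(rng.sample(units, n_h))
    return sample

def grade(unit):
    return unit[1]

def count(sample, g):
    return sum(1 for unit in sample if grade(unit) == g)

# Hypothetical frame: 900 staff and 100 managers, stored as (id, grade) pairs.
frame = [(i, "staff") for i in range(900)] + [(i, "manager") for i in range(900, 1000)]

proportionate = stratified_sample(frame, grade, {"staff": 0.1, "manager": 0.1}, seed=1)
# Disproportionate: over-sample the small "manager" stratum so it can be studied on its own.
disproportionate = stratified_sample(frame, grade, {"staff": 0.1, "manager": 0.3}, seed=1)

print(count(proportionate, "staff"), count(proportionate, "manager"))        # 90 10
print(count(disproportionate, "staff"), count(disproportionate, "manager"))  # 90 30
```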
STRATIFIED SAMPLING……
• Finally, since each stratum is treated as an independent
population, different sampling approaches can be applied to
different strata.
• Drawbacks of stratified sampling:
• A sampling frame has to be prepared separately for each stratum.
• In some cases stratified sampling can potentially require a larger sample than other methods would.
CLUSTER SAMPLING
• Cluster sampling is often carried out as 'two-stage sampling':
• First stage: a sample of areas (clusters) is chosen;
• Second stage: a sample of respondents within those areas is selected.
• The population is divided into clusters, usually based on geographical contiguity. Ideally the clusters resemble one another, each being internally as varied as the population.
• Sampling units are groups rather than individuals.
• A sample of such clusters is then selected.
• In single-stage cluster sampling, all units from the selected clusters are studied.
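A minimal sketch of both variants described above, assuming a hypothetical frame of 20 villages (clusters) with 30 households each. Leaving `per_cluster` unset gives single-stage cluster sampling (all units in the chosen clusters); setting it adds a second stage of random selection within each chosen cluster. The function and parameter names are illustrative only.

```python
import random

def cluster_sample(clusters, m, per_cluster=None, seed=None):
    """Stage 1: randomly choose m clusters.
    Stage 2 (optional): randomly choose per_cluster units inside each chosen cluster;
    if per_cluster is None, every unit in the chosen clusters is taken (single-stage)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)
    return {
        name: list(clusters[name]) if per_cluster is None else rng.sample(clusters[name], per_cluster)
        for name in chosen
    }

# Hypothetical frame of clusters: 20 villages with 30 households each.
villages = {f"village-{v:02d}": [f"v{v:02d}-hh{h:02d}" for h in range(30)] for v in range(20)}

single_stage = cluster_sample(villages, 4, seed=3)               # 4 villages x 30 households
two_stage = cluster_sample(villages, 4, per_cluster=10, seed=3)  # 4 villages x 10 households
print(sorted(single_stage))
```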
CLUSTER SAMPLING…….
• Advantages :
• Cuts down on the cost of preparing a sampling frame.
• This can reduce travel and other administrative costs.
• Disadvantage: sampling error is higher than for a simple random sample of the same size.
MULTISTAGE SAMPLING
• A complex form of cluster sampling in which two or more levels of units are embedded one in the other.
• First stage: a random sample of districts is chosen in every state.
• Second stage: a random sample of villages is chosen within each selected district.
• Third stage: a random sample of houses is chosen within each selected village.
• All ultimate units (houses, for instance) selected at the last step are surveyed.
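The three stages above can be sketched as a recursive draw over a nested structure. Everything here is hypothetical: 3 states, 10 districts per state, 8 villages per district and 40 houses per village, with 2 districts, 3 villages and 5 houses drawn at the successive stages. Every state is covered, as in the description, so sampling starts at the district level.

```python
import random

def multistage_sample(tree, sizes, rng):
    """tree: nested dicts (one level per stage) ending in lists of ultimate units.
    sizes: how many units to draw at each successive stage."""
    if isinstance(tree, list):                   # last stage: ultimate units (houses)
        return rng.sample(tree, sizes[0])
    chosen = rng.sample(sorted(tree), sizes[0])  # random sample of units at this stage
    units = []
    for key in chosen:
        units.extend(multistage_sample(tree[key], sizes[1:], rng))
    return units

# Hypothetical structure: 3 states x 10 districts x 8 villages x 40 houses.
country = {
    f"state-{s}": {
        f"s{s}-district-{d}": {
            f"s{s}-d{d}-village-{v}": [f"s{s}-d{d}-v{v}-house-{h}" for h in range(40)]
            for v in range(8)
        }
        for d in range(10)
    }
    for s in range(3)
}

rng = random.Random(11)
houses = []
for state in sorted(country):  # every state is included
    houses.extend(multistage_sample(country[state], [2, 3, 5], rng))  # 2 districts, 3 villages, 5 houses
print(len(houses))  # 3 states x 2 x 3 x 5 = 90
```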
QUOTA SAMPLING
• The population is first segmented into mutually exclusive sub-groups, as in stratified sampling; judgment is then used to select subjects or units from each segment based on a specified proportion.
• For example, an interviewer may be told to sample 200 females and 300 males aged between 45 and 60.
• In quota sampling the selection of the sample is non-random. For example, interviewers might be tempted to interview those who look most helpful.
• The problem is that these samples may be biased because not everyone gets a chance of selection.
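The sketch below mimics the interviewer's task from the example (200 females and 300 males aged 45 to 60) and makes the non-random character visible: eligible people are taken in the order they turn up until each quota is full. The stream of passers-by is hypothetical, generated deterministically for illustration.

```python
def fill_quotas(arrivals, quotas, min_age=45, max_age=60):
    """Take eligible people in the order the interviewer meets them until every quota is full.
    Nothing here is random: who is selected depends entirely on who turns up first."""
    counts = {group: 0 for group in quotas}
    selected = []
    for person_id, sex, age in arrivals:
        if sex in quotas and min_age <= age <= max_age and counts[sex] < quotas[sex]:
            selected.append(person_id)
            counts[sex] += 1
        if counts == quotas:
            break
    return selected, counts

# Hypothetical stream of passers-by: (id, sex, age).
arrivals = [(i, "female" if i % 3 == 0 else "male", 30 + i % 40) for i in range(2000)]
selected, counts = fill_quotas(arrivals, {"female": 200, "male": 300})
print(counts)  # {'female': 200, 'male': 300}
```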
CONVENIENCE SAMPLING
• Sometimes known as grab or opportunity sampling or accidental or haphazard sampling.
• A type of nonprobability sampling which involves the sample being drawn from that part of
the population which is close to hand. That is, readily available and convenient.
• The researcher using such a sample cannot scientifically make generalizations about the total
population from this sample because it would not be representative enough.
• For example, if an interviewer conducts a survey at a shopping center early in the morning on a given day, the people he/she can interview are limited to those present at that time. Their views would not represent those of other members of society in the area, as a survey conducted at different times of day and several times per week would.
• In social science research, snowball sampling is a similar technique, where existing study
subjects are used to recruit more subjects into the sample.
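Snowball recruitment can be sketched as a breadth-first walk over a referral network: each participant's contacts are invited in turn until the target size is reached. The referral network, the seed participant and the breadth-first order are all assumptions for illustration.

```python
from collections import deque

def snowball_sample(referrals, seeds, target):
    """referrals: dict mapping each person to the people they can refer; seeds: initial participants."""
    sample, queue, seen = [], deque(seeds), set(seeds)
    while queue and len(sample) < target:
        person = queue.popleft()
        sample.append(person)
        for contact in referrals.get(person, []):  # existing subjects recruit new ones
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}
print(snowball_sample(referrals, ["A"], target=5))  # ['A', 'B', 'C', 'D', 'E']
```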
Methods of Data Collection,
Processing and Analysis
TYPES OF DATA
• The primary data
• are those which are collected afresh and for the first time, and thus happen to
be original in character.
• The secondary data
• on the other hand, are those which have already been collected by someone
else and which have already been passed through the statistical process.
Collection of Primary Data
• There are several methods of collecting primary data, particularly in surveys and descriptive research:
• Observation method
• Interview method
• Questionnaires
• Schedules
Measurement and Scaling
• Measurement is the process of describing some property of a
phenomenon under study and assigning a numerical value to it
• For example, if we want to find the male-to-female attendance ratio among persons attending some show, we may tabulate those who come to the show according to sex.
• Measurement is considered the foundation of scientific inquiry.
• In daily life, we measure many things in different ways for different purposes.
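The attendance example above amounts to counting people by sex and dividing the two counts. A minimal sketch, assuming a hypothetical list recording the sex of each person entering the show:

```python
from collections import Counter

# Hypothetical record of the sex of each person entering the show.
attendees = ["male", "female", "female", "male", "male", "female", "male", "male"]

tally = Counter(attendees)               # measurement: count each category
ratio = tally["male"] / tally["female"]  # male-to-female attendance ratio
print(tally["male"], tally["female"])    # 5 3
print(f"male:female = {ratio:.2f} : 1")  # male:female = 1.67 : 1
```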
Measurement and Scaling
• The most widely used classification distinguishes the following measurement scales:
• nominal scale
• ordinal scale
• interval scale; and
• ratio scale.
Nominal Scale
• Nominal scale is simply a system of assigning number symbols to
events in order to label them.
• These numbers have no quantitative values; they only represent the category.
• So we cannot apply any arithmetic operations to this type of scale.
• We can only count the number of items in each category.
• A frequency distribution table is used to represent nominal data.
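A frequency distribution table for a nominal variable needs nothing more than category counts. The sketch uses hypothetical answers to a "working department" question; the percentages and the mode are the natural summaries, since the categories have no order or distance.

```python
from collections import Counter

# Hypothetical answers to "Specify your working department".
responses = ["HR", "Finance", "Sales", "HR", "Marketing", "Sales", "HR", "Production"]

freq = Counter(responses)
total = sum(freq.values())
print(f"{'Department':<12}{'Count':>6}{'Percent':>9}")
for dept, count in freq.most_common():
    print(f"{dept:<12}{count:>6}{100 * count / total:>8.1f}%")

# Only counts are meaningful for nominal data; the mode is the usual summary.
mode = freq.most_common(1)[0][0]
print(mode)  # HR
```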
Nominal
• Nominal scale is the least powerful level of measurement.
• It indicates no order or distance relationship and has no arithmetic
origin.
• A nominal scale simply describes differences between things by
assigning them to categories.
• Nominal data are, thus, counted data.
• In spite of all this, nominal scales are still very useful and are widely used in survey research when data are being classified by major subgroups of the population.
Nominal Scale
A. Specify your gender
   A. Male
   B. Female
B. Are you married?
   A. Yes
   B. No
C. You are from
   A. Urban
   B. Rural
D. Specify your working department
   A. Marketing
   B. HR
   C. Finance
   D. Sales
   E. Production
   F. Operations
E. Specify your food habit
   A. Vegetarian
   B. Non-vegetarian
• Here we can assign a number to each option, for example 1 to Male and 2 to Female; 1 to Yes and 2 to No; 1 to Urban and 2 to Rural; 1 to Marketing, 2 to HR, 3 to Finance, and so on.
Ordinal Scale
• We measure according to the rank order of the data without considering the degree of difference between the data points.
• Here "ordinal" indicates "order".
• In ordinal measurement, we assign a numerical value to each variable based on its relative ranking or position in comparison with the other data in the group.
• An ordinal scale indicates the logical hierarchy among the variables under observation.
Example: Ordinal measurement
• Suppose that in a 100-meter race Tirunesh finished first, Meseret second, Hiwot third and Ayalnesh fourth.
• Here we describe the data on a ranking scale: we arrange the data according to their relative positions.
• We do not consider the magnitude of the difference between Tirunesh and Meseret, Meseret and Hiwot, or Hiwot and Ayalnesh, but only the order of the finishing positions.
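To see why rank gaps say nothing about real differences, the sketch pairs the race ranks with finishing times. The times are hypothetical and not part of the example; they only show that three equal rank gaps of 1 can hide very unequal time gaps.

```python
# Finishing order from the race example (rank 1 = first).
ranks = {"Tirunesh": 1, "Meseret": 2, "Hiwot": 3, "Ayalnesh": 4}

# Hypothetical finishing times in seconds; they are NOT part of the example and serve
# only to show that equal rank gaps can hide very unequal real differences.
times = {"Tirunesh": 12.02, "Meseret": 12.05, "Hiwot": 12.90, "Ayalnesh": 12.95}

order = sorted(ranks, key=ranks.get)  # ordering is a valid operation on ranks
rank_gaps = [ranks[b] - ranks[a] for a, b in zip(order, order[1:])]
time_gaps = [round(times[b] - times[a], 2) for a, b in zip(order, order[1:])]
print(order)      # ['Tirunesh', 'Meseret', 'Hiwot', 'Ayalnesh']
print(rank_gaps)  # [1, 1, 1]
print(time_gaps)  # [0.03, 0.85, 0.05]
```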
Interval Scales
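• Whereas the nominal scale allows us only to qualitatively distinguish groups by categorizing them, and the ordinal scale to rank-order the preferences, the interval scale lets us measure the distance between any two points on the scale.
• An interval scale has equal units but an arbitrary zero point (e.g., temperature in °C), so differences between values are meaningful but ratios are not.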
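Temperature in degrees Celsius is a standard example of an interval scale. The sketch, using two hypothetical readings, shows that a difference survives conversion to Fahrenheit while a ratio does not, because the zero point of each scale is arbitrary.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

morning_c, noon_c = 10.0, 20.0  # hypothetical readings

# Differences are meaningful: a 10-degree Celsius rise is an 18-degree Fahrenheit rise.
diff_c = noon_c - morning_c
diff_f = celsius_to_fahrenheit(noon_c) - celsius_to_fahrenheit(morning_c)

# Ratios are not: "twice as hot" in Celsius is not twice as hot in Fahrenheit.
ratio_c = noon_c / morning_c
ratio_f = celsius_to_fahrenheit(noon_c) / celsius_to_fahrenheit(morning_c)
print(diff_c, diff_f)    # 10.0 18.0
print(ratio_c, ratio_f)  # 2.0 1.36
```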
Ratio Scales
• A ratio scale represents the actual amounts of variables and has a true zero point.
• Examples are measures of physical dimensions such as weight, height and distance.
• E.g., What is your age?
• Generally, all statistical techniques are usable with ratio scales, and all manipulations that one can carry out with real numbers can also be carried out with ratio-scale values.
• Multiplication and division can be used with this scale but not with the other scales mentioned above.
• Thus, proceeding from the nominal scale (the least precise type of scale) to the ratio scale (the most precise), progressively more information is obtained.
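Using the age question as the ratio-scale example, the sketch below shows that a ratio keeps its meaning when the unit changes (years to months), which is exactly what fails on an interval scale. The two ages are hypothetical.

```python
# Hypothetical ages of two respondents, first in years and then converted to months.
age_years = {"respondent_1": 20, "respondent_2": 40}
age_months = {name: years * 12 for name, years in age_years.items()}

# Because a ratio scale has a true zero, ratios survive a change of unit.
ratio_years = age_years["respondent_2"] / age_years["respondent_1"]
ratio_months = age_months["respondent_2"] / age_months["respondent_1"]
print(ratio_years, ratio_months)  # 2.0 2.0
```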
[Table: The differences between scales]
Processing and Analysis of Data
• After collection, the data have to be processed and analysed in accordance with the outline laid down for the purpose when the research plan was developed.
• Technically speaking, processing implies:
  • editing,
  • coding,
  • classification and
  • tabulation of collected data so that they are amenable to analysis.
• The term analysis refers to the computation of certain measures along with searching for patterns of relationship that exist among data groups.
Cont’d
• Thus, “in the process of analysis, relationships or
differences supporting or conflicting with original or
new hypotheses should be subjected to statistical tests
of significance to determine with what validity data
can be said to indicate any conclusions”
Cont’d
• Editing:
• Editing of data is a process of examining the collected raw data (especially in surveys) to detect errors and omissions and to correct these when possible.
• As a matter of fact, editing involves a careful scrutiny of the completed questionnaires.
• Editing is done to ensure that the data are accurate, consistent with other facts gathered, uniformly entered, as complete as possible and well arranged to facilitate coding and tabulation.
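Part of editing can be automated as checks run over each completed questionnaire before coding. In this minimal sketch the field names, the plausible age range and the allowed answers are all hypothetical rules; each check flags a problem for a human editor rather than correcting it silently.

```python
def edit_record(record):
    """Return the problems found in one completed questionnaire (an empty list means it passed)."""
    problems = []
    for field in ("id", "sex", "age", "marital_status"):
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")            # omission
    age = record.get("age")
    if isinstance(age, int) and not 0 <= age <= 120:
        problems.append(f"implausible age {age}")          # likely recording error
    if record.get("sex") not in (None, "", "male", "female"):
        problems.append(f"invalid sex {record['sex']!r}")  # not uniformly entered
    return problems

# Hypothetical completed questionnaires.
records = [
    {"id": 1, "sex": "female", "age": 34, "marital_status": "married"},
    {"id": 2, "sex": "male", "age": 340, "marital_status": "unmarried"},
    {"id": 3, "sex": "F", "age": 27, "marital_status": ""},
]
flagged = {r["id"]: edit_record(r) for r in records if edit_record(r)}
print(flagged)
```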
Cont’d
• Coding
• Coding refers to the process of assigning
numerals or other symbols to answers so that
responses can be put into a limited number of
categories or classes.
• E.g., Male = 1, Female = 2; Married = 1, Unmarried = 2
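A codebook like the one in the example can be held as a lookup table and applied to every response. This is a minimal sketch; the variable names `sex` and `marital_status` and the sample responses are hypothetical. An answer missing from the codebook raises a `KeyError`, which sends it back for editing.

```python
# Codebook with the codes Male = 1, Female = 2; Married = 1, Unmarried = 2.
CODEBOOK = {
    "sex": {"Male": 1, "Female": 2},
    "marital_status": {"Married": 1, "Unmarried": 2},
}

def code_row(row):
    """Replace each answer with its numeric code; a KeyError flags an answer the codebook does not cover."""
    return {variable: CODEBOOK[variable][answer] for variable, answer in row.items()}

# Hypothetical raw responses.
raw = [
    {"sex": "Female", "marital_status": "Married"},
    {"sex": "Male", "marital_status": "Unmarried"},
]
coded = [code_row(row) for row in raw]
print(coded)  # [{'sex': 2, 'marital_status': 1}, {'sex': 1, 'marital_status': 2}]
```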
Cont’d
• Classification
• Most research studies result in a large volume of raw data which must be reduced into homogeneous groups if we are to get meaningful relationships.
• This fact necessitates classification of data which
happens to be the process of arranging data in groups or
classes on the basis of common characteristics.
• Data having a common characteristic are placed in one
class.
Cont’d
• (a) Classification according to attributes
• Data are classified on the basis of common characteristics, which can either be descriptive (such as literacy, sex, honesty, etc.) or numerical (such as weight, height, income, etc.).
• Descriptive characteristics refer to qualitative phenomena which cannot be measured quantitatively; only their presence or absence in an individual item can be noticed.
• Data obtained this way on the basis of certain attributes are known as statistics of attributes, and their classification is said to be classification according to attributes.
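Classification according to attributes places each item in a class by the presence, absence or category of a descriptive characteristic. A minimal sketch over hypothetical records with a literacy attribute and a sex attribute:

```python
from collections import defaultdict

def classify_by_attribute(records, attribute):
    """Group record ids by the value (presence/absence or category) of a descriptive attribute."""
    classes = defaultdict(list)
    for record in records:
        classes[record[attribute]].append(record["id"])
    return dict(classes)

# Hypothetical edited questionnaire records.
records = [
    {"id": 1, "literate": True, "sex": "female"},
    {"id": 2, "literate": False, "sex": "male"},
    {"id": 3, "literate": True, "sex": "male"},
    {"id": 4, "literate": True, "sex": "female"},
]
print(classify_by_attribute(records, "literate"))  # {True: [1, 3, 4], False: [2]}
print(classify_by_attribute(records, "sex"))       # {'female': [1, 4], 'male': [2, 3]}
```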
Cont’d
• (b) Classification according to class intervals
• Unlike descriptive characteristics, numerical characteristics refer to quantitative phenomena which can be measured through some statistical units.
• Data relating to income, production, age, weight, etc. come under this category.
• Such data are known as statistics of variables and are classified on the basis of class intervals.
• E.g., age: under 18, 18-25, 26-45, 46-60
• Income: ETB 2000-5999, 6000-10000
• In this way the entire data may be divided into a number of groups or classes, usually called 'class intervals.'
• Each class interval thus has an upper limit as well as a lower limit, which are known as class limits.
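Grouping numerical data into class intervals means finding, for each value, the class whose lower and upper limits contain it, then counting per class. The sketch assumes whole-year ages and inclusive class limits 0-17, 18-25, 26-45 and 46-60; the ten ages are hypothetical.

```python
from collections import Counter

# Assumed class limits (inclusive, whole years).
CLASSES = [(0, 17), (18, 25), (26, 45), (46, 60)]

def class_of(age):
    """Return the label of the class interval containing age, or None if it falls outside all classes."""
    for lower, upper in CLASSES:
        if lower <= age <= upper:
            return f"{lower}-{upper}"
    return None

# Hypothetical ages of ten respondents.
ages = [16, 19, 22, 30, 45, 46, 59, 60, 25, 17]
freq = Counter(class_of(a) for a in ages)
for lower, upper in CLASSES:
    label = f"{lower}-{upper}"
    print(f"{label:>6}: {freq[label]}")
```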
Cont’d
• Tabulation
• When a mass of data has been assembled, it becomes necessary for the researcher to arrange it in some kind of concise and logical order.
• Tabulation is the process of summarising raw data and displaying them in compact form (i.e., in the form of statistical tables) for further analysis.
• In a broader sense, tabulation is an orderly arrangement of data in columns and rows.
• Tabulation is essential for the following reasons:
• 1. It conserves space and reduces explanatory and descriptive statements to a minimum.
• 2. It facilitates the process of comparison.
• 3. It facilitates the summation of items and the detection of errors and omissions.
• 4. It provides a basis for various statistical computations.
• Tabulation can be done by hand or by mechanical or electronic devices.
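Electronic tabulation of two coded variables produces a two-way table in rows and columns, with totals that support comparison and error checking. A minimal sketch using only the standard library; the six (sex, residence) responses are hypothetical.

```python
from collections import Counter

# Hypothetical edited and coded responses: (sex, residence).
responses = [("Male", "Urban"), ("Female", "Rural"), ("Female", "Urban"),
             ("Male", "Urban"), ("Female", "Urban"), ("Male", "Rural")]
rows, cols = ["Male", "Female"], ["Urban", "Rural"]
cells = Counter(responses)  # frequency of each (row, column) combination

print(f"{'':<8}" + "".join(f"{c:>7}" for c in cols) + f"{'Total':>7}")
for r in rows:
    counts = [cells[(r, c)] for c in cols]
    print(f"{r:<8}" + "".join(f"{n:>7}" for n in counts) + f"{sum(counts):>7}")
col_totals = [sum(cells[(r, c)] for r in rows) for c in cols]
print(f"{'Total':<8}" + "".join(f"{n:>7}" for n in col_totals) + f"{sum(col_totals):>7}")
```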