Professor Vipin 2015 Classification and Tabulation of Data

Collection of Data – Census and Sampling
A census is a study of every unit, everyone or everything, in a population. It is known as a complete
enumeration, which means a complete count.
A sample is a subset of units in a population, selected to represent all units in a population of
interest. It is a partial enumeration because it is a count from part of the population.
Information from the sampled units is used to estimate the characteristics for the entire population
of interest.
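The Python sketch below illustrates the difference between the two approaches. It computes a characteristic (the mean) from every unit of a population, as a census would. It then estimates the same characteristic from a simple random sample. The population values, the sample size of 100 and the function names `census_mean` and `sample_mean` are hypothetical and were chosen only for this example.

```python
import random
import statistics


def census_mean(population):
    """Complete enumeration: every unit in the population is counted."""
    return statistics.mean(population)


def sample_mean(population, n, seed=None):
    """Partial enumeration: estimate the population mean from n randomly chosen units."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # n distinct units, drawn without replacement
    return statistics.mean(sample)


# Hypothetical population of 1,000 household sizes, generated only for illustration
gen = random.Random(42)
population = [gen.randint(1, 8) for _ in range(1000)]

print("Census value:", census_mean(population))
print("Sample estimate (n = 100):", sample_mean(population, 100, seed=1))
```

The sample estimate will usually differ somewhat from the census value. That difference is the sampling error, and it tends to shrink as the sample size grows.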
Sampling Techniques
A sample must be robust in its design and large enough to provide a reliable representation of the
whole population. Aspects to be considered when designing a sample include the level of accuracy
required, cost, and the timing. Sampling can be random or non-random.
1. Random (or probability) sampling: Each unit in the population has a known, non-zero chance of being selected, and this probability can be determined exactly. Probability sampling includes, but is not limited to, simple random sampling, systematic sampling and stratified sampling. Because the selection probabilities are known, the data obtained from the sampled units can be used to produce population estimates.
a) Simple random sample: All members of the sample are chosen at random, and every unit has the same chance of being in the sample. A lottery draw is a good example of simple random sampling. The numbers are drawn at random from a defined range (e.g. 1 to 45), and each number has an equal chance of being selected.
b) Systematic random sample: The first member of the sample is chosen at random, and the remaining members are taken at fixed intervals (e.g. every 4th unit).
c) Stratified random sample: Relevant subgroups (strata) within the population are identified, and a random sample is selected from within each stratum.
2. Non-random (or non-probability) sampling: Some units of the population have no chance of selection, or the probability of their selection cannot be determined. With this method the sampling error cannot be estimated, so it is difficult to infer population estimates from the sample. Non-random sampling includes convenience sampling, purposive sampling, quota sampling and volunteer sampling:
a) Convenience sampling: Units are chosen based on their ease of access;
b) Purposive sampling: The sample is chosen based on what the researcher thinks is appropriate for the study;
c) Quota sampling: The researcher can select units as they choose, as long as they reach a defined quota; and
d) Volunteer sampling: Participants volunteer to be part of the survey. This is a common method for internet-based opinion surveys, where there is no control over how many people respond or who they are.
www.VipinMKS.com
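The three probability sampling methods described above can be sketched in Python as follows. This is a minimal illustration built on the standard `random` module. The function names, stratum labels and sample sizes are hypothetical. The stratified version simply takes a given number of units from each stratum, which is only one of several possible allocation rules.

```python
import random


def simple_random_sample(units, n, rng):
    """Every unit has the same chance of selection; n units drawn without replacement."""
    return rng.sample(units, n)


def systematic_sample(units, k, rng):
    """Random start among the first k units, then every k-th unit after it."""
    start = rng.randrange(k)  # random index in 0 .. k-1
    return units[start::k]


def stratified_sample(strata, sizes, rng):
    """Draw a separate simple random sample of the given size from each stratum."""
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


rng = random.Random(7)  # fixed seed so the draws are reproducible

# Lottery-style draw: 6 numbers from the range 1 to 45 (the count of 6 is an assumption)
print(sorted(simple_random_sample(list(range(1, 46)), 6, rng)))

# Every 4th unit from a hypothetical list of 100 units
print(systematic_sample(list(range(1, 101)), 4, rng))

# Hypothetical strata: 60 urban and 40 rural households
strata = {"urban": list(range(1, 61)), "rural": list(range(61, 101))}
print(stratified_sample(strata, {"urban": 6, "rural": 4}, rng))
```

In this example the stratum sample sizes (6 and 4) are proportional to the stratum sizes (60 and 40). Any other allocation could be passed in through `sizes`.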
Classification of Data
1. Quantitative data are those that can be measured in definite units of measurement. They refer to characteristics whose successive measurements yield numerical observations. Depending on the nature of the variable being measured, quantitative data can be further categorized into continuous and discrete data.
a) Continuous data represent the numerical values of a continuous variable. A continuous variable is one that can assume any value between any two points on a line segment, and so it covers an interval of values. Its values can be very precise and arbitrarily close to each other, yet still distinct. Characteristics such as weight, length, height, thickness, velocity, temperature and tensile strength are continuous variables. The data recorded on these and similar characteristics are therefore called continuous data. Note that a continuous variable is recorded in the finest unit of measurement available, that is, the unit that allows measurement to the greatest degree of precision.
b) Discrete data are the values assumed by a discrete variable. A discrete variable is one whose outcomes are separate, fixed values (usually whole numbers). Such data are essentially count data, derived from a process of counting, such as the number of items possessing or not possessing a certain characteristic. Examples of discrete data include the number of customers visiting a department store every day, the number of incoming flights at an airport and the number of defective items in a consignment received for sale.
2. Qualitative data refer to qualitative characteristics of a subject or an object. A characteristic is qualitative when its observations are defined and recorded in terms of the presence or absence of a certain attribute, and are summarized as counts. These data are further classified into nominal and rank data.
a) Nominal data are the outcome of classifying the items or units of a sample or a population into two or more categories according to some quality characteristic. Given any such basis of classification, each item can always be assigned to a particular class, and the items belonging to each class can then be counted. The count data so obtained are called nominal data.
b) Rank data, on the other hand, are the result of assigning ranks, the integers 1, 2, 3, ..., n, to specify order. Ranks may be assigned according to the level of performance in a test, a contest, a competition, an interview or a show. For example, the candidates appearing in an interview may be ranked in integers according to their performance. Ranks assigned in this way can be viewed as the discrete values of a variable that has performance as its quality characteristic.
3. Data sources can be classified into two types: secondary and primary.
a) Secondary data: These data already exist in some form, published or unpublished, in an identifiable secondary source. They are generally available from published source(s), though not necessarily in the form actually required.
b) Primary data: These data do not already exist in any form and so have to be collected for the first time from the primary source(s). By their very nature, they require fresh, first-time collection covering the whole population or a sample drawn from it.
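A small Python sketch helps show the two kinds of qualitative data. Nominal data come from counting how many items fall in each class, and rank data come from ordering candidates by performance. The category labels, candidate names, scores and function names below are hypothetical. In `assign_ranks`, ties are simply broken by input order; this is an assumption for the example, not a standard ranking rule.

```python
from collections import Counter


def nominal_counts(items):
    """Nominal data: count how many items fall into each category."""
    return Counter(items)


def assign_ranks(scores):
    """Rank data: give ranks 1, 2, ..., n in order of performance (highest score = rank 1).

    sorted() is stable, so tied candidates keep their input order (a simplifying assumption).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical inspection results for a consignment of 8 items
inspection = ["defective", "good", "good", "good", "defective", "good", "good", "good"]
print(nominal_counts(inspection))  # Counter({'good': 6, 'defective': 2})

# Hypothetical interview scores
interview = {"Asha": 78, "Ben": 91, "Chen": 85}
print(assign_ranks(interview))  # {'Ben': 1, 'Chen': 2, 'Asha': 3}
```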
Preparing a Frequency Distribution Table
Frequency is how often something occurs. By counting frequencies we can make a Frequency
Distribution table.
(1) Find the range of the data: The range is the difference between the largest and the smallest
values.
(2) Decide the approximate number of classes into which the data are to be grouped: There are no hard and fast rules for the number of classes. In most cases, 5 to 20 classes are used.
(3) Determine the approximate class interval size: The class interval size is obtained by dividing the range of the data by the number of classes. If the result is fractional, the next higher whole number is taken as the size of the class interval.
(4) Decide the starting point: The lower limit (boundary) of the first class should be at or below the smallest value in the raw data. It is usually taken as a multiple of the class interval.
(5) Determine the remaining class limits (boundaries): Once the lower boundary of the lowest class has been decided, add the class interval size to it to obtain that class's upper boundary. The remaining lower and upper class limits are found by adding the class interval size repeatedly until the largest value in the data falls within a class.
(6) Distribute the data into the respective classes: Each observation is marked against its class using tally bars (tally marks), which are a convenient way to sort observations into classes. The tally bars for each class are counted to get that class's frequency. The frequencies of all the classes together give the grouped data, or frequency distribution, of the data. The total of the frequency column must equal the number of observations.
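Steps (1) to (6) can be sketched in Python as shown below. The sketch assumes numeric data and uses the "exclusive" convention, in which each class includes its lower limit but not its upper limit. Under this convention, a value equal to an upper limit goes into the next class. The marks data and the function name `frequency_distribution` are hypothetical.

```python
import math


def frequency_distribution(data, num_classes):
    """Group numeric data into classes [lower, upper) and count the frequency of each."""
    smallest, largest = min(data), max(data)
    data_range = largest - smallest                              # step (1): range
    width = max(1, math.ceil(data_range / num_classes))          # steps (2)-(3): round width up (at least 1)
    start = (smallest // width) * width                          # step (4): multiple of width <= smallest

    classes = []                                                 # step (5): add the width repeatedly
    lower = start
    while True:
        upper = lower + width
        classes.append((lower, upper))
        if largest < upper:  # stop once the largest value falls inside a class
            break
        lower = upper

    counts = [0] * len(classes)                                  # step (6): tally each observation
    for x in data:
        counts[int((x - start) // width)] += 1

    assert sum(counts) == len(data)  # total frequency must equal the number of observations
    return list(zip(classes, counts))


# Hypothetical marks of 20 students
marks = [23, 45, 12, 67, 34, 56, 78, 41, 29, 63, 50, 38, 71, 15, 59, 44, 36, 82, 27, 48]
for (lower, upper), freq in frequency_distribution(marks, 7):
    print(f"{lower}-{upper}  {'|' * freq:<6} {freq}")
```

With these marks, the range is 82 - 12 = 70 and the width is 70 / 7 = 10. Because the first class starts at 10 (the multiple of 10 just below 12), an eighth class, 80-90, is needed to hold the largest value, 82.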
Tabulation of Data
The process of placing classified data into tabular form is known as tabulation. A table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal arrangements, whereas columns are vertical arrangements. Depending on the type of classification, a table may be simple, double or complex.
(1) Simple Tabulation or One-way Tabulation: When the data are tabulated according to one characteristic, this is called simple tabulation or one-way tabulation. For example, a table of the world's population classified by one characteristic, such as religion, is an example of simple tabulation.
(2) Double Tabulation or Two-way Tabulation: When the data are tabulated according to two characteristics at a time, this is called double tabulation or two-way tabulation. For example, a table of the world's population classified by two characteristics, such as religion and sex, is an example of double tabulation.
(3) Complex Tabulation: When the data are tabulated according to many (three or more) characteristics, this is called complex tabulation. For example, a table of the world's population classified by three characteristics, such as religion, sex and literacy, is an example of complex tabulation.
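One-way, two-way and complex tables differ only in how many characteristics the counts are cross-classified by, as the Python sketch below shows. The records, the field names (`religion`, `sex`, `literacy`) and the helper names `tabulate` and `print_two_way` are hypothetical. `tabulate` returns counts keyed by tuples of category values.

```python
from collections import Counter


def tabulate(records, *characteristics):
    """Count records by one (simple), two (double) or more (complex) characteristics."""
    return Counter(tuple(r[c] for c in characteristics) for r in records)


def print_two_way(records, row_char, col_char):
    """Print a two-way table of counts with row and column totals."""
    counts = tabulate(records, row_char, col_char)
    rows = sorted({r[row_char] for r in records})
    cols = sorted({r[col_char] for r in records})
    print(f"{row_char:<12}" + "".join(f"{c:>10}" for c in cols) + f"{'Total':>10}")
    for rv in rows:
        cells = [counts[(rv, cv)] for cv in cols]  # Counter returns 0 for empty cells
        print(f"{rv:<12}" + "".join(f"{n:>10}" for n in cells) + f"{sum(cells):>10}")
    col_totals = [sum(counts[(rv, cv)] for rv in rows) for cv in cols]
    print(f"{'Total':<12}" + "".join(f"{n:>10}" for n in col_totals) + f"{sum(col_totals):>10}")


# Hypothetical survey records (not real population data)
people = [
    {"religion": "A", "sex": "Male", "literacy": "Literate"},
    {"religion": "A", "sex": "Female", "literacy": "Literate"},
    {"religion": "B", "sex": "Female", "literacy": "Illiterate"},
    {"religion": "B", "sex": "Male", "literacy": "Literate"},
    {"religion": "A", "sex": "Female", "literacy": "Illiterate"},
]

print(tabulate(people, "religion"))                     # simple (one-way)
print_two_way(people, "religion", "sex")                # double (two-way)
print(tabulate(people, "religion", "sex", "literacy"))  # complex
```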