Chapter 2 – Data Collection and Presentation

In Chapter 1, we briefly discussed the
importance of samples. When we
select a sample from a population, the
sample must be representative of the
population.
Let’s consider an example:
Sampling Designs
Methods by which a representative sample
can be chosen from a population.
Four sampling designs in common use:
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling
Simple Random Sampling
The example of pooling all students'
names and thoroughly mixing them
before drawing each name
illustrates simple random sampling.
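The name-drawing idea above can be sketched in a few lines of Python. This is only an illustration: the roster, the function name `simple_random_sample`, and the optional `seed` parameter are assumptions made for the example, not part of the chapter.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    return rng.sample(list(population), sample_size)


# Hypothetical class roster, not taken from the text.
students = ["Ana", "Ben", "Chen", "Dara", "Eli", "Fay", "Gus", "Hana"]
print(simple_random_sample(students, 3, seed=1))
```

Drawing without replacement, as `random.Random.sample` does, corresponds to pulling names out of the mixed pool one at a time without putting them back.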
Systematic Sampling
In this sampling design, every kth unit
(or item) is selected from a population
until the sample size is reached.
k = (size of population) / (size of sample)
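A minimal sketch of systematic sampling follows. Two details are assumptions that the slide does not state: the ratio is rounded down to a whole number, and the first unit is picked at random from the first k units. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Select every kth unit, where k = population size // sample size."""
    units = list(population)
    k = len(units) // sample_size  # assumption: round the ratio down
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    # Assumption: start at a random position among the first k units.
    start = random.Random(seed).randrange(k)
    # Take every kth unit and stop once the sample size is reached.
    return units[start::k][:sample_size]


# Units numbered 1..100, sample of 10, so k = 10.
print(systematic_sample(range(1, 101), 10, seed=0))
```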
Stratified Sampling
In this sampling, the entire population
is divided into several groups, called
strata, and a subsample is selected
from each group. All subsamples are
then combined to form a sample. This
sampling design is used when a
population is not homogeneous.
Stratified sampling could be either
proportionate or disproportionate,
depending on the number of units
selected from each group.
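The proportionate case can be sketched as follows, assuming each stratum contributes units in proportion to its share of the population. The strata, their sizes, and the function name are made up for illustration. Rounding the shares can make the total differ slightly from the requested sample size when the proportions are not whole numbers.

```python
import random


def proportionate_stratified_sample(strata, sample_size, seed=None):
    """Draw a random subsample from each stratum, sized by the stratum's share."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = []
    for units in strata.values():
        # Share of the sample allotted to this stratum (rounded to a whole unit).
        share = round(sample_size * len(units) / total)
        sample.extend(rng.sample(list(units), share))
    return sample  # all subsamples combined into one sample


# Hypothetical strata of sizes 60, 30, and 10 (ID ranges chosen to be distinguishable).
strata = {
    "freshmen": range(0, 60),
    "sophomores": range(100, 130),
    "juniors": range(200, 210),
}
print(proportionate_stratified_sample(strata, 10, seed=0))
```

A disproportionate design would replace the computed `share` with a count chosen separately for each stratum.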
Cluster Sampling
This sampling design involves selecting
at random a few groups, called clusters,
from a population, and then selecting
units from each cluster. Cluster
sampling is used when a population is
large, fairly homogeneous and
scattered over a large geographical
area.
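The two stages of cluster sampling can be sketched as below. The regions, the number of units drawn per cluster, and the function name `cluster_sample` are assumptions chosen for the example.

```python
import random


def cluster_sample(clusters, n_clusters, units_per_cluster, seed=None):
    """Pick a few clusters at random, then pick units at random within each."""
    rng = random.Random(seed)
    # Stage 1: select clusters at random.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: select units from each chosen cluster.
    return {
        name: rng.sample(list(clusters[name]), units_per_cluster)
        for name in chosen
    }


# Hypothetical geographic clusters, each holding 50 unit IDs.
clusters = {
    "north": range(0, 50),
    "south": range(100, 150),
    "east": range(200, 250),
    "west": range(300, 350),
}
print(cluster_sample(clusters, 2, 5, seed=0))
```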
Data Organization
The process of selecting a sample from
a population amounts to data
collection. Once the data has been
collected, it must be organized to make
it meaningful. Unorganized data does
not convey any meaningful
information.
Raw Data
A set of unorganized data
Data Organization requires 2 major
steps:
1. Forming an array
2. Creating a frequency distribution
table.
Array and Frequency Distribution
Array
If a set of data is organized in either
ascending or descending order, an
array is formed.
From the array, one can get some
useful information, such as the lowest
and the highest data value.
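Forming an array amounts to sorting the raw data. The sketch below uses a small made-up data set of twelve integer scores; the values are not from the textbook.

```python
# Hypothetical raw data (unorganized).
raw_data = [70, 54, 91, 65, 83, 58, 75, 65, 88, 61, 79, 72]

# Ascending order forms the array.
array = sorted(raw_data)
print(array)

# The lowest and highest values can be read off the two ends.
print("lowest:", array[0], "highest:", array[-1])
```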
Frequency Distribution
Table that arranges data into several
classes.
All classes have:
• A lower limit
• An upper limit
Two questions:
1. How many classes to select?
2. What are the class limits?
Number of Classes
Generally, the number of classes
should be no fewer than six and no
more than 20.
A simple formula can be used to find
the total number of classes:
The total number of classes is the smallest k such that
2^k is at least equal to the total number of
observations in the data set.
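The rule can be checked with a short loop that raises k until 2 to the power k reaches the number of observations. The function name `number_of_classes` is hypothetical, and taking the smallest such k is the usual reading of the rule.

```python
def number_of_classes(n_observations):
    """Return the smallest k such that 2**k >= n_observations."""
    k = 1
    while 2 ** k < n_observations:
        k += 1
    return k


# With 50 observations: 2**5 = 32 is too small, 2**6 = 64 is enough.
print(number_of_classes(50))  # prints 6
```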
Class Limits
Once we know the number of classes, we
can find the class limits (lower and upper
limits) of the classes.
• Certain guidelines should be followed:
1. If the data values are integers, the lower
limit of the first class should be 0.5 less than
the lowest data value.
The midpoint of the class should be an
integer.
• For other classes, follow the guidelines
below:
1. The lower limit is the same as the
upper limit of the preceding class.
2. The interval length is the same for all
classes.
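The guidelines above can be turned into a small helper. This is a sketch under assumptions: the interval length is supplied by the caller (the slides do not say how to choose it), and the function name `class_limits` is made up. When the first lower limit ends in .5, an odd interval length gives integer midpoints.

```python
def class_limits(lowest_value, n_classes, width):
    """Build (lower, upper) limits for integer data.

    The first lower limit is 0.5 below the lowest value; each class starts
    where the preceding one ends, and all classes share the same width.
    """
    first_lower = lowest_value - 0.5
    limits = []
    for i in range(n_classes):
        lower = first_lower + i * width
        limits.append((lower, lower + width))
    return limits


# Hypothetical case: lowest value 54, three classes, interval length 13.
for lower, upper in class_limits(54, 3, 13):
    print(lower, upper, "midpoint:", (lower + upper) / 2)
```

With these inputs the classes run from 53.5 to 66.5, 66.5 to 79.5, and 79.5 to 92.5, with midpoints 60, 73, and 86.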
Frequency Distribution Table
See Table 2-2 on page 21.
Relative Frequency Distribution
• A frequency distribution can be converted
into a relative frequency distribution by dividing
each class frequency by the total number of
observations. See Table 2-3 on page 22.
• The relative cumulative frequency column
is obtained by cumulatively adding the relative
frequencies.
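As an illustration of how such a table is built, the sketch below tallies a made-up data set into three classes and derives the relative and relative cumulative columns. The data, the class limits, and the function name `frequency_table` are assumptions; they do not reproduce Tables 2-2 or 2-3.

```python
def frequency_table(data, limits):
    """Return one row per class: frequency, relative, and cumulative relative."""
    n = len(data)
    rows = []
    cumulative = 0
    for lower, upper in limits:
        # Limits end in .5, so integer data never fall on a boundary.
        freq = sum(1 for x in data if lower < x < upper)
        cumulative += freq
        rows.append({
            "class": (lower, upper),
            "frequency": freq,
            "relative": freq / n,
            "cumulative_relative": cumulative / n,
        })
    return rows


# Hypothetical integer data and class limits.
data = [54, 58, 61, 65, 65, 70, 72, 75, 79, 83, 88, 91]
limits = [(53.5, 66.5), (66.5, 79.5), (79.5, 92.5)]
for row in frequency_table(data, limits):
    print(row)
```

The relative frequencies sum to 1, so the last entry of the relative cumulative column is always 1.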
Data Presentation
• Data can be presented in several ways:
Histogram
Relative frequency histogram
Polygon
Ogive
• Histogram
A type of bar chart in which class limits are
shown on the x-axis and frequencies on the y-axis.
See Figure 2-1 on page 25.
• Relative Frequency Histogram
If relative frequencies are shown on the y-axis,
the histogram is called a relative frequency
histogram. See Figure 2-2 on page 25.
• The Polygon
If the midpoints of the tops of all bars of a
histogram are connected, a
frequency polygon is formed. Figure 2-3
(page 26) is a frequency polygon.
A relative frequency polygon is created
from a relative frequency histogram by
connecting the midpoints of the classes in
the same way. See Figure 2-4 on page 26.
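The points that a frequency polygon connects can be computed directly from the class limits and frequencies. The limits and frequencies below are made up for illustration, and `polygon_points` is a hypothetical helper name.

```python
def polygon_points(limits, frequencies):
    """Pair each class midpoint with its frequency."""
    return [
        ((lower + upper) / 2, freq)
        for (lower, upper), freq in zip(limits, frequencies)
    ]


# Hypothetical classes and frequencies.
limits = [(53.5, 66.5), (66.5, 79.5), (79.5, 92.5)]
frequencies = [5, 4, 3]
print(polygon_points(limits, frequencies))
```

Passing relative frequencies instead of counts gives the points of a relative frequency polygon.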
The Ogive
• On an ogive, the x-axis represents the upper limit
of each class and the y-axis represents cumulative
frequencies. The points are connected. The lower
limit of the first class is the beginning point with
zero frequency. Figure 2-5 on page 27 is an
ogive.
A relative cumulative frequency ogive can be
formed by replacing cumulative frequencies of an
ogive with relative cumulative frequencies. Look
at Figure 2-6 on page 27.
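The points of an ogive follow the description above: start at the lower limit of the first class with zero frequency, then plot each upper limit against the running total. The classes and frequencies are made up, and `ogive_points` is a hypothetical helper name.

```python
def ogive_points(limits, frequencies):
    """Return (x, cumulative frequency) points for an ogive."""
    # Starting point: lower limit of the first class, zero frequency.
    points = [(limits[0][0], 0)]
    cumulative = 0
    for (_, upper), freq in zip(limits, frequencies):
        cumulative += freq
        points.append((upper, cumulative))
    return points


# Hypothetical classes and frequencies.
limits = [(53.5, 66.5), (66.5, 79.5), (79.5, 92.5)]
frequencies = [5, 4, 3]
print(ogive_points(limits, frequencies))
```

Dividing each cumulative frequency by the total number of observations gives the points of the relative cumulative frequency ogive.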
Other tools for data presentation are pie charts and
bar charts, shown on pages 28 and 29.