Chapter 2 – Data Collection and Presentation

In chapter one, we briefly discussed the importance of samples. When we select a sample from a population, the sample must be representative of the population. Let's consider an example:

Sampling Designs
Methods by which a representative sample can be chosen from a population. Four sampling designs are in common use:
1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling

Simple Random Sampling
The example of putting all students' names together, thoroughly mixing them, and then drawing each name represents simple random sampling. Every unit in the population has the same chance of being selected.

Systematic Sampling
In this sampling design, every kth unit (or item) is selected from a population until the sample size is reached, where
k = (size of population) / (size of sample)

Stratified Sampling
In this sampling design, the entire population is divided into several groups, called strata, and a subsample is selected from each group. All subsamples are then combined to form the sample. This design is used when a population is not homogeneous. Stratified sampling can be either proportionate or disproportionate, depending on the number of units selected from each group.

Cluster Sampling
This sampling design involves selecting at random a few groups, called clusters, from a population, and then selecting units from each cluster. Cluster sampling is used when a population is large, fairly homogeneous, and scattered over a large geographical area.

Data Organization
The process of selecting a sample from a population amounts to data collection. Once the data has been collected, it must be organized to make it meaningful. Unorganized data does not convey any meaningful information.

Raw Data
A set of unorganized data.

Data organization requires two major steps:
1. Forming an array
2. Creating a frequency distribution table
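Returning to the four sampling designs described earlier in this chapter, the sketch below shows one way each could be carried out in Python. It is an illustration under stated assumptions, not a prescribed procedure: the function names, the random starting point in systematic sampling, the rounding of subsample sizes in stratified sampling, and the population of numbered units are all hypothetical choices made for this example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Select every k-th unit, with k = (size of population) // (size of sample)."""
    k = len(population) // n
    rng = random.Random(seed)
    # Assumption: start at a random position among the first k units.
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]


def proportionate_stratified_sample(strata, n, seed=None):
    """Draw a subsample from each stratum in proportion to its size, then combine."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = []
    for units in strata.values():
        # Assumption: round each share; the shares may not add up to exactly n.
        share = round(n * len(units) / total)
        sample.extend(rng.sample(units, share))
    return sample


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Pick a few clusters at random, then select units from each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample


# Hypothetical population: 100 units numbered 1 to 100.
population = list(range(1, 101))
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))  # k = 100 // 10 = 10

# Hypothetical strata of sizes 60 and 40.
strata = {"A": population[:60], "B": population[60:]}
print(proportionate_stratified_sample(strata, 10, seed=1))

# Hypothetical clusters: 10 groups of 10 consecutive units.
clusters = {c: population[c * 10:(c + 1) * 10] for c in range(10)}
print(cluster_sample(clusters, n_clusters=2, n_per_cluster=5, seed=1))
```

A disproportionate stratified sample would differ only in how the size of each subsample is chosen.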
Array and Frequency Distribution

Array
If a set of data is arranged in either ascending or descending order, an array is formed. From the array, one can get some useful information, such as the lowest and the highest data values.

Frequency Distribution
A table that arranges data into several classes. Every class has:
• A lower limit
• An upper limit
Two questions arise:
1. How many classes should be selected?
2. What are the class limits?

Number of Classes
Generally, the number of classes should be no fewer than six and no more than 20. A simple formula can be used to find the total number of classes: the total number of classes is the smallest k such that 2^k is at least equal to the total number of observations in the data set.

Class Limits
Once we know the number of classes, we can find the class limits (lower and upper limits) of the classes.
• Certain guidelines should be followed for the first class:
1. If the data values are integers, the lower limit of the first class should be 0.5 less than the lowest data value.
2. The midpoint of the class should be an integer.
• For the other classes, follow the guidelines below:
1. The lower limit is the same as the upper limit of the preceding class.
2. The interval length is the same for all classes.

Frequency Distribution Table
See Table 2-2 on page 21.

Relative Frequency Distribution
• A frequency distribution can be converted into a relative frequency distribution. See Table 2-3 on page 22.
• The relative cumulative frequency column is obtained by cumulatively adding the relative frequencies.

Data Presentation
• Data can be presented in several ways:
Histogram
Relative frequency histogram
Polygon
Ogive

• Histogram
A type of bar chart in which class limits are shown on the x-axis and frequencies on the y-axis. See Figure 2-1 on page 25.

• Relative Frequency Histogram
If relative frequencies are shown on the y-axis, the histogram is called a relative frequency histogram. See Figure 2-2 on page 25.
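The steps for building a frequency distribution described above (form an array, pick the number of classes with the 2^k rule, set the class limits, then count) can be sketched in Python as follows. This is a minimal illustration, not the textbook's own procedure: the function names, the way the class width is chosen (the smallest odd whole number that lets k classes cover the data, so that class midpoints stay integers), and the data values are all assumptions made for this example.

```python
import math


def number_of_classes(n):
    """Smallest k such that 2**k is at least the number of observations n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def frequency_table(data):
    """Build a frequency distribution for integer data."""
    array = sorted(data)                 # step 1: form an array
    n = len(array)
    k = number_of_classes(n)
    lower = array[0] - 0.5               # 0.5 less than the lowest data value
    # Assumption: smallest whole-number width that covers the highest value,
    # bumped to an odd number so every class midpoint is an integer.
    width = math.ceil((array[-1] - array[0] + 1) / k)
    if width % 2 == 0:
        width += 1
    rows = []
    cumulative = 0.0
    for i in range(k):
        lo = lower + i * width           # same as upper limit of preceding class
        hi = lo + width                  # same interval length for all classes
        freq = sum(1 for x in array if lo < x < hi)
        relative = freq / n
        cumulative += relative
        rows.append({
            "lower": lo,
            "upper": hi,
            "midpoint": (lo + hi) / 2,
            "frequency": freq,
            "relative": relative,
            "cumulative_relative": cumulative,
        })
    return rows


# Hypothetical data: 40 made-up integer scores between 20 and 60.
scores = [(7 * i) % 41 + 20 for i in range(40)]
for row in frequency_table(scores):
    print(row)
```

With 40 observations the rule gives k = 6, because 2^5 = 32 is less than 40 while 2^6 = 64 is at least 40.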
• The Polygon
If the midpoints of the tops of all bars of a histogram are connected, a frequency polygon is formed. Figure 2-3 (page 26) is a frequency polygon. A relative frequency polygon is created in the same way from a relative frequency histogram, by connecting the midpoints of its classes. See Figure 2-4 on page 26.

• The Ogive
On an ogive, the x-axis represents the upper limit of each class and the y-axis represents cumulative frequencies. The points are connected. The lower limit of the first class is the beginning point, with zero frequency. Figure 2-5 on page 27 is an ogive. A relative cumulative frequency ogive can be formed by replacing the cumulative frequencies of an ogive with relative cumulative frequencies. See Figure 2-6 on page 27.

Other tools for data presentation are pie charts and bar charts, shown on pages 28 and 29.
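To make the construction of the polygon and the ogive concrete, the sketch below computes the points that would be connected on each chart. It is only an illustration: the function names and the three classes with their frequencies are hypothetical, and the plotting itself is left out.

```python
from itertools import accumulate


def polygon_points(classes, frequencies):
    """(class midpoint, frequency) pairs that a frequency polygon connects."""
    return [((lo + hi) / 2, f) for (lo, hi), f in zip(classes, frequencies)]


def ogive_points(classes, frequencies, relative=False):
    """(upper class limit, cumulative frequency) pairs for an ogive.

    The first point is the lower limit of the first class with zero frequency.
    With relative=True, cumulative frequencies are divided by the total.
    """
    cumulative = list(accumulate(frequencies))
    total = cumulative[-1]
    points = [(classes[0][0], 0)]
    for (lo, hi), c in zip(classes, cumulative):
        points.append((hi, c / total if relative else c))
    return points


# Hypothetical classes (lower limit, upper limit) and their frequencies.
classes = [(19.5, 26.5), (26.5, 33.5), (33.5, 40.5)]
frequencies = [7, 7, 6]

print(polygon_points(classes, frequencies))
# [(23.0, 7), (30.0, 7), (37.0, 6)]
print(ogive_points(classes, frequencies))
# [(19.5, 0), (26.5, 7), (33.5, 14), (40.5, 20)]
print(ogive_points(classes, frequencies, relative=True))
```

The last point of a relative cumulative frequency ogive always has height 1, because all observations lie at or below the upper limit of the last class.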