COLLEGE OF BUSINESS MANAGEMENT AND ACCOUNTANCY
MODULE IN STATISTICAL ANALYSIS WITH SOFTWARE APPLICATION

Chapter 2: Collection, Organization & Presentation of Data

LEARNING OBJECTIVES:
At the end of the chapter, the students must be able to:
1. Learn some important points in collecting data.
2. Learn the different methods of data collection.
3. Organize and display data in a frequency distribution.
4. Present quantitative and qualitative variables graphically.

Collection of Data
Data collection is the process of gathering data for a specific purpose. The following guidelines help in managing the collection of data practically and effectively.

Important Points in Collecting Data
1. If measurements of some characteristic of people (such as height) are being obtained, better results will be achieved if the researcher does the measuring instead of asking the respondents for the value.
2. The method of data collection used may expedite or delay the process. Avoid a medium that would produce low response rates.
3. Ensure that the sample size is sufficiently large for the required purposes. The size of the sample will depend on the variability of the population data, the cost of sampling, and the margin of error.
4. If possible, the sampling method used to collect data should result in a sample that is representative of the population.

Methods of Data Collection
1. Direct or Interview Method
The direct or interview method of data collection involves at least two persons (an interviewer and one or more interviewees) exchanging information. This method gives precise and consistent information because clarifications can be made: if the respondent does not fully understand a question, the interviewer can repeat or rephrase it until it suits the interviewee’s level. However, the method is time-consuming and expensive, and it has limited field coverage.
2. Indirect or Questionnaire Method
This is a method where written answers are given to prepared questions.
The method requires less time and is inexpensive since the questionnaires can simply be mailed or hand-carried. It also gives respondents a sense of freedom and encourages honesty in answering the questions because of anonymity. Administering questionnaires online through various social media platforms is now common.
3. Registration Method
This is a method enforced by certain laws, civil codes, and regulations of each country. It requires the registration of vital events such as births, marriages, and deaths.
4. Observation Method
This method involves personal observation of the behavior of individuals or organizations in the study. It is also used when the respondents cannot read or write, so that the interview or questionnaire method is not feasible.
5. Experiment Method
This method is used when the objective of the study is to determine the cause and effect of a certain phenomenon or event. It should be made clear that causality can only be established with the use of the experimental method. Under this method, randomly chosen subjects are randomly assigned to a particular treatment under the experimental design adopted by the researcher.

Frequency Distributions
A frequency distribution is a table that summarizes arranged data into various classes or categories. When the data are organized in this form, analysis and interpretation become easier to manage.

Parts of a Frequency Distribution Table
1. Table Heading – includes the table number and the title of the table.
2. Body – the main part, which contains the information or figures.
3. Stubs or classes – classifications or categories describing the data, usually found at the leftmost side of the table.
4. Caption – designations or identifications of the information contained in a column, usually found at the top of the column.
Remark: In a quantitative frequency distribution table, the classes are class intervals, each composed of a Lower Limit (LL) and an Upper Limit (UL).

Example:
Table 2.1: Frequency Distribution of Staff Perception of the Leadership Behavior of the Administrator

Perception of Leadership Behavior    Frequency
Strongly Favorable                      10
Favorable                               11
Slightly Favorable                      12
Slightly Unfavorable                    14
Unfavorable                             22
Strongly Unfavorable                    31
TOTAL                                  100

Types of Frequency Distribution Table (FDT)
1. Qualitative or Categorical FDT – a frequency distribution table where the data are grouped according to some qualitative characteristic; that is, the data are grouped into nonnumerical categories.

Example of a Qualitative FDT:
Table 2.2: Frequency Distribution of the Gender of Respondents of a Survey

Gender of Respondents    Frequency
Male                        65
Female                      98
TOTAL                      163

2. Quantitative FDT – a frequency distribution table where the data are grouped according to some numerical or quantitative characteristic.

Example of a Quantitative FDT:
Table 2.3: Frequency Distribution for the Weights of 50 Pieces of Luggage

Weight (in kilograms)    Frequency
7-9                          2
10-12                        8
13-15                       14
16-18                       19
19-21                        7
TOTAL                       50

Steps in Constructing a Frequency Distribution Table
1. Determine the Range (R).
   R = highest value − lowest value
2. Determine the number of classes (K), where N is the total number of observations in the data set.
3. Determine the class size (c) by first calculating the preliminary class size c’.
   Preliminary class size: c’ = R / K
   Remarks:
   a. The class size c should have the same number of decimal places as the raw data; i.e., if the observations in the data set are all whole numbers, then c should be a whole number.
   b. The class size of an interval is the difference between the Upper Class Boundary (UCB) and the Lower Class Boundary (LCB) of that interval.
4. Enumerate the classes or categories.
   Remark: We usually take the lowest value as the lowest lower limit.
5. Tally the observations.
6. Compute the values in the other columns of the FDT as deemed necessary.
Note: Sometimes the computed number of classes (K) is not followed. An extra class is added if needed to accommodate the highest observed value in the data set, and a class is deleted if it turns out to be empty.

Other Columns in the FDT
1. Class Boundaries (CB)
   a. Lower class boundary (LCB)
      LCB = LL − ½ unit of measure
   b. Upper class boundary (UCB)
      UCB = UL + ½ unit of measure
2. Class Mark (CM) – the midpoint of the class interval, where the observations tend to cluster.
3. Relative Frequency (RF)
When the frequencies in a frequency distribution are expressed as percentages, the result is a relative frequency distribution. The relative frequency is obtained by dividing the frequency of each class by the total frequency and multiplying by 100 to express it as a percentage.
4. Cumulative Frequency Distribution
This indicates the number of observations that fall below or above the class limits of the class intervals.
Two Kinds of Cumulative Frequency Distribution
   1. Less Than Cumulative Frequency (<CF) – total number of observations whose values do not exceed the upper limit of the class.
   2. Greater Than Cumulative Frequency (>CF) – total number of observations whose values are not less than the lower limit of the class.

Illustrative Example: Given below are the raw data of the daily wages of 40 workers in Pangasinan.

Raw Data
201 324 649 623 458 322 486 234 650 493
453 129 568 357 145 540 583 349 695 698
124 127 389 405 340 267 653 321 276 295
390 489 680 395 601 212 175 489 203 392

Notice that the lowest value here is 124; hence, it is assigned as the lowest lower limit.

Graphical Presentations of Frequency Distributions
1. Histogram. The classes are plotted on the horizontal axis and the frequencies on the vertical axis. The lines that separate the bars intersect the x-axis at the lower and upper limits of the class intervals.
The height of each bar corresponds to the frequency of the class interval. Since the intervals are continuous, the lower limit of any one interval is also the upper limit of the previous interval, so the vertical bars must touch each other rather than be spaced apart.
Example: Consider the frequency distribution of the words per minute of 60 individuals using a word processor, given in the table below.

2. Frequency Polygon. It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies. The polygon is closed by adding a class at each end, and the ends of the lines are brought down to the horizontal axis at the midpoints of the additional classes.
Example: Frequency polygon of Table 2.5

3. Ogive. It is a graph where a point is plotted above each class boundary at a height equal to the cumulative frequency corresponding to that boundary.
Example: Ogive of Table 2.5

4. Stem-and-Leaf Plots
A stem-and-leaf plot is a relatively new way of displaying data. It gives a quick picture of the shape of a distribution while including the actual numerical values in the graph. A stem is the common leading digit or digits for a subset of the data set. A leaf is the trailing digit that follows the stem.
To make a stem-and-leaf plot:
   1. Separate each observation into a stem and a leaf. Stems may have as many digits as needed, but each leaf contains only a single digit.
   2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.
   3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.

Example: Consider the following grades of 20 students in a statistics subject:

75 86 82 77 78 91 82 84 93 79
80 83 94 85 86 76 92 88 93 85

In constructing the stem-and-leaf plot, first choose the leading digits, or stems, from the data; here they are 7, 8 and 9. Next, write the final digit, or leaf, of each number to the right of the appropriate leading digit.
Then, arrange the leaves in ascending order.

Figure 2.1: Stem-and-leaf Plot for the Grades Data

Frequency    Stem & Leaf
  5.00        7 - 56789
 10.00        8 - 02234 55668
  5.00        9 - 12334
Stem width: 10.00
Each leaf: 1 case(s)

Graphing Qualitative Variables
1. Column and Bar Graph. It consists of bars or heavy lines of equal width, either all vertical or all horizontal; the lengths of the bars represent the magnitudes of the quantities being compared. (for nominal, ordinal and interval data)
Example: The data on the causes of death due to accidents or violence for males during a recent year are as follows:

Table 2.6: Causes of Death Due to Accidents or Violence for Males during a Recent Year

Cause of Death            Number
Motor vehicle accident    30,500
All other accidents       27,500
Suicide                   20,234
Homicide                   8,342

Example: Bar graph of Table 2.6

2. Pie Chart. It is a circular graph that is useful in showing how a total quantity is distributed among a group of categories. It is constructed by dividing a circle (a pie) into sectors, each sector having a size proportional to the percentage it represents. (for nominal data)
Example: Pie chart of Table 2.6
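
Software illustration: The steps for constructing a frequency distribution table can be sketched in Python using the daily wage data of the 40 workers from the illustrative example. This is only a minimal sketch under stated assumptions, not the module's prescribed procedure. The function name build_fdt and the column keys are hypothetical. The number of classes is taken from Sturges' rule (K = 1 + 3.322 log N), which is an assumption, and a different K can be passed in. The class size is c’ = R / K rounded up to a whole number, and the unit of measure is assumed to be 1 because the data are whole numbers.

```python
import math

# Daily wages of the 40 workers from the illustrative example.
wages = [
    201, 324, 649, 623, 458, 322, 486, 234, 650, 493,
    453, 129, 568, 357, 145, 540, 583, 349, 695, 698,
    124, 127, 389, 405, 340, 267, 653, 321, 276, 295,
    390, 489, 680, 395, 601, 212, 175, 489, 203, 392,
]


def build_fdt(data, k=None):
    """Build a quantitative FDT for whole-number data (unit of measure = 1)."""
    n = len(data)
    lowest, highest = min(data), max(data)

    # Step 1: R = highest value - lowest value
    value_range = highest - lowest

    # Step 2: number of classes.
    # ASSUMPTION: Sturges' rule is used when the caller does not supply K.
    if k is None:
        k = round(1 + 3.322 * math.log10(n))

    # Step 3: preliminary class size c' = R / K.
    # ASSUMPTION: c' is rounded up to a whole number (at least 1).
    c = max(1, math.ceil(value_range / k))

    # Steps 4 and 5: enumerate the classes from the lowest value and tally.
    # The loop adds an extra class if one is needed to reach the highest value.
    rows = []
    lower = lowest
    while lower <= highest:
        upper = lower + c - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append({
            "LL": lower,
            "UL": upper,
            "LCB": lower - 0.5,        # LL - 1/2 unit of measure
            "UCB": upper + 0.5,        # UL + 1/2 unit of measure
            "CM": (lower + upper) / 2,  # class mark (midpoint)
            "f": freq,
            "RF": 100 * freq / n,      # relative frequency in percent
        })
        lower += c

    # Step 6: cumulative frequencies.
    running = 0
    for row in rows:                   # less than CF, top to bottom
        running += row["f"]
        row["<CF"] = running
    running = 0
    for row in reversed(rows):         # greater than CF, bottom to top
        running += row["f"]
        row[">CF"] = running
    return rows


fdt = build_fdt(wages)
for row in fdt:
    limits = f"{row['LL']}-{row['UL']}"
    print(f"{limits:>9}  f={row['f']:>2}  CM={row['CM']:>6}  "
          f"RF={row['RF']:>5.1f}%  <CF={row['<CF']:>2}  >CF={row['>CF']:>2}")
```

Under these assumptions, the sketch produces six classes for the wage data, from 124-219 up to 604-699, with frequencies 8, 4, 11, 6, 4 and 7. A different rule for K, or a different rounding of c’, would give different class limits.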