STATISTICAL RESEARCH AND TRAINING CENTER
J and S Building, 104 Kalayaan Avenue, Diliman, Quezon City

Training Course on Basic Statistics for Research
August 24-28, 2009

Methods of Organizing Data

Prepared by:
Josefina V. Almeda
Professor and College Secretary
School of Statistics
University of the Philippines, Diliman
August 2009

Quantitative Classification of Data

* Use quantitative classification if the observed values of the data result from counting or measurement.
* Organize this type of data in tabular form as a frequency distribution table.

A frequency distribution is a summary table in which the classes are either distinct values or intervals, each with a frequency count.

Forms of the Frequency Distribution

Single value grouping
* is a frequency count of observed values in which the classes are distinct values
* is suitable when the range of values is short and many values occur more than once

Grouping by class intervals
* is a frequency count of observed values in which the classes are intervals

Data for Single Value Grouping

Suppose we have data on the number of children of 50 currently married women using any modern contraceptive method. Construct a summary table for the data set below.

0 0 1 2 2 2 3 3 4 4
0 0 1 2 2 3 3 3 4 4
0 1 1 2 2 3 3 3 4 4
0 1 1 2 2 3 3 3 4 5
0 1 1 2 2 3 3 3 4 5

Example of Single Value Grouping

Distribution of Currently Married Women Using Any Modern Contraceptive Method by Number of Children

No. of Children   Frequency of Married Women     %
0                              7                14
1                              8                16
2                             11                22
3                             14                28
4                              8                16
5                              2                 4
TOTAL                         50               100

Definition of Terms Used in a Frequency Distribution Table

Class interval contains the numbers defining a class.

Class frequency is the number of observations falling within a class interval.

Class limits are the end numbers of a class interval.
* The lower class limit (LCL) is the lower end of the class interval and the upper class limit (UCL) is the upper end of the class interval.
* Class limits should be written to the same precision (number of decimal places) as the raw data.

Open class interval is a class interval that has either no lower class limit or no upper class limit.

Class boundaries are the true class limits.
* There are no gaps between class boundaries.
* Class boundaries have one more decimal place than the class limits.
* The lower class boundary (LCB) is the average of the lower class limit of the class interval and the upper class limit of the preceding class interval.
* The upper class boundary (UCB) is the average of the upper class limit of the class interval and the lower class limit of the next class interval.

Class size is the size of the class interval.
* It is the difference between two successive lower class limits, or two successive upper class limits, or two successive lower class boundaries, or two successive upper class boundaries.

Class mark is the midpoint of a class interval.
* It is the average of the lower class limit and the upper class limit, or equivalently the average of the lower class boundary and the upper class boundary, of a class interval.
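
The definitions of class boundaries, class size, and class mark above can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function name `class_summary` and the class limits 10-19, 20-29, 30-39 are hypothetical, and the sketch assumes equally sized classes, with the outermost boundaries found by extending the same half-gap used between neighboring classes.

```python
from typing import Dict, List, Tuple


def class_summary(limits: List[Tuple[float, float]]) -> List[Dict]:
    """Derive class boundaries, class size, and class mark from class limits.

    `limits` holds (LCL, UCL) pairs in increasing order with a common class
    size. Assumption: the first LCB and the last UCB use the same half-gap
    as the interior classes, since they have no neighboring class to average with.
    """
    if len(limits) < 2:
        raise ValueError("need at least two class intervals")
    # Gap between one class's UCL and the next class's LCL (1 for whole numbers).
    gap = limits[1][0] - limits[0][1]
    # Class size: difference between two successive lower class limits.
    size = limits[1][0] - limits[0][0]
    rows = []
    for lcl, ucl in limits:
        rows.append({
            "limits": (lcl, ucl),
            # LCB: average of this LCL and the preceding UCL.
            # UCB: average of this UCL and the next LCL.
            "boundaries": (lcl - gap / 2, ucl + gap / 2),
            "size": size,
            # Class mark: midpoint of the class interval.
            "mark": (lcl + ucl) / 2,
        })
    return rows


example_limits = [(10, 19), (20, 29), (30, 39)]  # hypothetical class limits
for row in class_summary(example_limits):
    print(row)
```

For the hypothetical class 10-19 this gives boundaries 9.5 and 19.5, a class size of 10, and a class mark of 14.5. The difference between the two boundaries equals the class size, which is a quick way to check a table by hand.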
Modal class is the class interval having the highest frequency.

Steps in Constructing a Frequency Distribution Table

1. Determine an adequate number of classes (K).
   * The number of classes should be neither too many nor too few.
   * Usually, the number of classes is between 5 and 20.
   * The class intervals should be non-overlapping.
2. Determine the range (R).
   Range = Maximum - Minimum
3. Calculate the approximate class size (C').
   C' = R/K
4. Determine the class size (C) by rounding off C' to a number that is easy to work with. We recommend class sizes that are multiples of 5, such as 5, 10, 15, or 20.
5. List the required number (K) of class intervals.
   * Start with the lower class limit of the lowest class interval.
   * Its value should be less than or equal to the minimum value of the data set.
   * Add the class size (C) to the lower class limit to get the next lower class limit.
   * The last class interval should include the maximum value.
6. Tally the frequency for each class interval.
7. Sum the frequency column and check it against the total number of observations.

TABLE 3.
Magnitude of Poor Population in the Philippines: 2000

Region / Province                      Poor Population
NCR (National Capital Region) [1]              848,962
    1st District                               120,663
    2nd District                               229,301
    3rd District                               292,611
    4th District                               206,387
CAR (Cordillera Administrative Region)         536,169
    Abra                                       110,937
    Apayao                                      28,770
    Benguet                                    122,762
    Ifugao                                     113,719
    Kalinga                                     83,844
    Mt. Province                                76,137
Region 1 (Ilocos Region)                     1,447,638
    Ilocos Norte                               115,116
    Ilocos Sur                                 190,297
    La Union                                   253,382
    Pangasinan                                 888,844
Region 2 (Cagayan Valley)                      820,786
    Batanes                                      2,535
    Cagayan                                    251,222
    Isabela                                    424,580
    Nueva Vizcaya                               82,895
    Quirino                                     59,555
Region 3 (Central Luzon)                     1,695,227
    Aurora                                      59,985
    Bataan                                      68,659
    Bulacan                                    147,812
    Nueva Ecija                                532,961
    Pampanga                                   331,739
    Tarlac                                     360,109
    Zambales                                   193,962
Region 4a (CALABARZON)                       1,699,333
    Batangas                                   440,603
    Cavite                                     244,712
    Laguna                                     207,184
    Quezon                                     667,385
    Rizal                                      139,449
Region 4b (MIMAROPA)                         1,030,987
    Marinduque                                 113,553
    Occidental Mindoro                         177,823
    Oriental Mindoro                           340,690
Region 5 (Bicol Region)                      2,540,618
    Albay                                      553,629
    Camarines Norte                            301,147
    Camarines Sur                              765,373
    Catanduanes                                116,866
    Masbate                                    483,651
    Sorsogon                                   319,952
Region 6 (Western Visayas)                   2,765,055
    Aklan                                      186,813
    Antique                                    208,169
    Capiz                                      328,635
    Guimaras                                    37,838
    Iloilo                                     690,639
    Negros Occidental                        1,312,961
Region 7 (Central Visayas)                   2,017,162
    Bohol                                      590,926
    Cebu                                       973,490
    Negros Oriental                            427,509
    Siquijor                                    25,237
Region 8 (Eastern Visayas)                   1,646,371
    Biliran                                     58,135
    Eastern Samar                              202,680
    Leyte                                      680,536
    Northern Samar                             240,228
    Southern Leyte                             116,738
    Western Samar                              348,054
Region 9 (Zamboanga Peninsula)               1,254,884
    Zamboanga del Norte                        433,091
    Zamboanga del Sur                          821,793
    Zamboanga Sibugay [2]
    Isabela City [3]
Region 10 (Northern Mindanao)                1,580,249
    Bukidnon                                   449,647
    Camiguin                                    41,017
    Lanao del Norte                            424,819
    Misamis Occidental                         260,764
    Misamis Oriental                           404,002
Region 11 (Davao Region)                     1,222,367
    Davao del Norte                            637,298
    Davao del Sur                              412,442
    Davao Oriental                             172,627
    Compostela Valley [4]
Region 12 (SOCCSKSARGEN)                     1,596,785
    North Cotabato                             509,463
    Sarangani                                  223,279
    South Cotabato                             469,874
    Sultan Kudarat                             344,172
    Cotabato City                               49,997
Region 13 (Caraga)                           1,071,005
    Agusan del Norte                           259,475
    Agusan del Sur                             353,825
    Surigao del Norte                          232,065
ARMM (Autonomous Region in Muslim Mindanao)  1,648,441
    Basilan                                    123,825
    Lanao del Sur                              432,307
    Maguindanao                                534,628
    Sulu                                       397,119
    Tawi-tawi                                  160,562

[1] Districts of NCR cover the following: 1st District: Manila; 2nd District: Mandaluyong, Marikina, Pasig, Quezon City, and San Juan; 3rd District: Valenzuela, Kaloocan City, Malabon, and Navotas; and 4th District: Las Pinas, Makati, Muntinlupa, Paranaque, Pasay City, Pateros, and Taguig.
[2] Zamboanga Sibugay was part of Zamboanga del Sur in 2000. Thus, the 2000 estimates for Zamboanga del Sur include Zamboanga Sibugay.
[3] Isabela City was part of Basilan in 2000. Thus, the 2000 estimates for Basilan still include Isabela City.
[4] Davao del Norte estimates for 2000 include Compostela Valley.

Source: National Statistical Coordination Board

TABLE 4.
Sorted Data (Array) of Magnitude of Poor Population for the 82 Provinces of the Philippines: 2000

 2,535   76,137  122,762  193,962  240,228  331,739  424,819  534,628    973,490
25,237   82,895  123,825  202,680  244,712  340,690  427,509  553,629  1,312,961
28,770   83,844  139,449  206,387  251,222  344,172  432,307  590,926
37,838  110,937  147,812  207,184  253,382  348,054  433,091  637,298
41,017  113,553  160,562  208,169  259,475  353,825  440,603  667,385
49,997  113,719  170,917  223,279  260,764  360,109  449,647  680,536
58,135  115,116  172,627  225,640  292,611  397,119  469,874  690,639
59,555  116,738  177,823  228,004  301,147  404,002  483,651  765,373
59,985  116,866  186,813  229,301  319,952  412,442  509,463  821,793
68,659  120,663  190,297  232,065  328,635  424,580  532,961  888,844

TABLE 5. Frequency Distribution Table on Magnitude of Poor Population for the 82 Provinces of the Philippines: 2000

TABLE 5a

CLASS LIMITS
      LCL          UCL     f
    2,500      152,499    24
  152,500      302,499    24
  302,500      452,499    18
  452,500      602,499     7
  602,500      752,499     4
  752,500      902,499     3
  902,500    1,052,499     1
1,052,500    1,202,499     0
1,202,500    1,352,499     1
Total                     82

TABLE 5b

CLASS LIMITS
      LCL          UCL     f
    2,500      202,499    31
  202,500      402,499    26
  402,500      602,499    16
  602,500      802,499     5
  802,500    1,002,499     3
1,002,500    1,202,499     0
1,202,500    1,402,499     1
Total                     82

TABLE 5c

CLASS LIMITS
      LCL          UCL     f
    2,500      192,499    30
  192,500      382,499    26
  382,500      572,499    16
  572,500      762,499     5
  762,500      952,499     3
  952,500    1,142,499     1
1,142,500    1,332,499     1
Total                     82

Example: This illustrates the use of appropriate column labels in a frequency distribution table.

TABLE 6. Frequency Distribution Table of the Magnitude of Poor Population in the Philippines: 2000

Magnitude of Poor Population    No. of Provinces
    2,500 -   192,499                         30
  192,500 -   382,499                         26
  382,500 -   572,499                         16
  572,500 -   762,499                          5
  762,500 -   952,499                          3
  952,500 - 1,142,499                          1
1,142,500 - 1,332,499                          1
Total                                         82

TABLE 7. Frequency Distribution Table with Class Boundaries and Class Marks

Class Limits (LCL - UCL)   Class Boundaries (LCB - UCB)    Class Mark    f
    2,500 -   192,499          2,499.5 -   192,499.5         97,499.5   30
  192,500 -   382,499        192,499.5 -   382,499.5        287,499.5   26
  382,500 -   572,499        382,499.5 -   572,499.5        477,499.5   16
  572,500 -   762,499        572,499.5 -   762,499.5        667,499.5    5
  762,500 -   952,499        762,499.5 -   952,499.5        857,499.5    3
  952,500 - 1,142,499        952,499.5 - 1,142,499.5      1,047,499.5    1
1,142,500 - 1,332,499      1,142,499.5 - 1,332,499.5      1,237,499.5    1
Total                                                                   82

Relative Frequency and Relative Frequency Percentage

Relative frequency
* divide the class frequency of a class interval by the total number of observations
* the sum of the relative frequency column is one

Relative frequency percentage
* multiply the relative frequency by 100
* the sum of the relative frequency percentage column is one hundred percent

TABLE 8. Frequency Distribution Table with Relative Frequency and Relative Frequency Percentage

Class Limits (LCL - UCL)    f   Relative Frequency   Relative Frequency Percentage
    2,500 -   192,499      30                0.366                            36.6
  192,500 -   382,499      26                0.317                            31.7
  382,500 -   572,499      16                0.195                            19.5
  572,500 -   762,499       5                0.061                             6.1
  762,500 -   952,499       3                0.037                             3.7
  952,500 - 1,142,499       1                0.012                             1.2
1,142,500 - 1,332,499       1                0.012                             1.2
Total                      82                1.000                           100.0

TABLE 9.
Frequency Distribution Table with Less Than Cumulative Frequency and Greater Than Cumulative Frequency Distributions

Class Limits (LCL - UCL)    f   Less Than Cumulative Frequency   Greater Than Cumulative Frequency
    2,500 -   192,499      30                               30                                  82
  192,500 -   382,499      26                               56                                  52
  382,500 -   572,499      16                               72                                  26
  572,500 -   762,499       5                               77                                  10
  762,500 -   952,499       3                               80                                   5
  952,500 - 1,142,499       1                               81                                   2
1,142,500 - 1,332,499       1                               82                                   1
Total                      82

Graphical Representation of the Frequency Distribution

Frequency Histogram
* plot the class frequency on the vertical axis and the class boundaries on the horizontal axis

Frequency Polygon
* plot the class frequency on the vertical axis and the class mark on the horizontal axis

Thank you.