STATISTICAL RESEARCH AND TRAINING CENTER
J and S Building, 104 Kalayaan Avenue, Diliman, Quezon City
Training Course on Basic Statistics for Research
August 24-28, 2009
Methods of Organizing Data
Prepared by:
Josefina V. Almeda
Professor and College Secretary
School of Statistics
University of the Philippines, Diliman
August 2009
Quantitative Classification of Data
* Use quantitative classification if the observed values of the data are the result of either a count or a measurement.
* Organize this type of data in tabular form, as a frequency distribution table.
A frequency distribution is a summary table in which the classes are either distinct values or intervals, each with a frequency count.
Forms of the Frequency Distribution
Single value grouping
* is a frequency count of observed values in which the classes are distinct values
* is used when the range of values is short and many of the distinct values occur more than once
Grouping by class intervals
* is a frequency count of observed values in which the classes are intervals
Data for Single Value Grouping
Suppose we have data on the number of children of 50
currently married women using any modern contraceptive
method. Construct a summary table for the data set below.
0 0 1 2 2 2 3 3 4 4
0 0 1 2 2 3 3 3 4 4
0 1 1 2 2 3 3 3 4 4
0 1 1 2 2 3 3 3 4 5
0 1 1 2 2 3 3 3 4 5
Example of Single Value Grouping
Distribution of Currently Married Women Using Any Modern Contraceptive Method, by Number of Children:
No. of Children   No. of Married Women (f)   %
0                 7                          14
1                 8                          16
2                 11                         22
3                 14                         28
4                 8                          16
5                 2                          4
TOTAL             50                         100
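Single value grouping amounts to counting how often each distinct value occurs. The following minimal Python sketch rebuilds the table above from the 50 observations. The function name `single_value_grouping` is illustrative and is not part of the course material.

```python
from collections import Counter

# Number of children of the 50 currently married women (data listed above)
children = [
    0, 0, 1, 2, 2, 2, 3, 3, 4, 4,
    0, 0, 1, 2, 2, 3, 3, 3, 4, 4,
    0, 1, 1, 2, 2, 3, 3, 3, 4, 4,
    0, 1, 1, 2, 2, 3, 3, 3, 4, 5,
    0, 1, 1, 2, 2, 3, 3, 3, 4, 5,
]


def single_value_grouping(values):
    """Return (value, frequency, percent) rows, one per distinct value."""
    counts = Counter(values)
    n = len(values)
    return [(v, counts[v], 100 * counts[v] / n) for v in sorted(counts)]


rows = single_value_grouping(children)
for value, freq, pct in rows:
    print(f"{value:>2} {freq:>3} {pct:5.1f}")
print("TOTAL", sum(f for _, f, _ in rows))
```

Sorting the distinct values keeps the classes in increasing order, as in the table.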
Definition of Terms Used in a Frequency Distribution Table
Class interval is the set of numbers defining a class.
Class frequency is the number of observations falling in a class interval.
Class limits are the end numbers of a class interval.
* The lower class limit (LCL) is the lower end of the class interval, and the upper class limit (UCL) is the upper end of the class interval.
* The class limits should have the same number of decimal places as the raw data.
Open class interval is a class interval with either no lower class limit or no upper class limit.
Class boundaries are the true class limits.
* There are no gaps between successive class boundaries.
* The number of decimal places is one more than the number of decimal places of the class limits.
* The lower class boundary (LCB) is the average of the lower class limit of the class interval and the upper class limit of the preceding class interval.
* The upper class boundary (UCB) is the average of the upper class limit of the class interval and the lower class limit of the next class interval.
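The boundary rules can be sketched in Python as follows, using the first three classes of Table 5c as input. This is a minimal sketch that assumes whole-number data, so the boundaries carry one decimal place (.5). It also assumes a common convention that the slide does not state: the first LCB and the last UCB extend the outer limits by half a unit. The function name `class_boundaries` is hypothetical.

```python
def class_boundaries(limits, unit=1):
    """Compute (LCB, UCB) pairs from a list of (LCL, UCL) class limits.

    Interior boundaries are averages of adjacent limits, as defined above.
    Assumption: the outermost boundaries extend the first LCL and the last
    UCL by half of the measurement unit (unit=1 for whole-number data).
    """
    half = unit / 2
    bounds = []
    for i, (lcl, ucl) in enumerate(limits):
        if i == 0:
            lcb = lcl - half
        else:
            lcb = (lcl + limits[i - 1][1]) / 2  # average with preceding UCL
        if i == len(limits) - 1:
            ucb = ucl + half
        else:
            ucb = (ucl + limits[i + 1][0]) / 2  # average with next LCL
        bounds.append((lcb, ucb))
    return bounds


limits = [(2_500, 192_499), (192_500, 382_499), (382_500, 572_499)]
print(class_boundaries(limits))
# [(2499.5, 192499.5), (192499.5, 382499.5), (382499.5, 572499.5)]
```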
Class size is the width of the class interval.
* It is the difference between two successive lower class limits, two successive upper class limits, two successive lower class boundaries, or two successive upper class boundaries.
Class mark is the midpoint of a class interval.
* It is the average of the lower and upper class limits, or the average of the lower and upper class boundaries, of a class interval.
Modal class is the class interval with the highest frequency.
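These three definitions translate directly into code. The minimal Python sketch below reuses the first three classes of Table 5c with their frequencies (30, 26, 16). The function names are illustrative. Ties for the modal class are resolved by taking the first such class, which is an assumption of this sketch.

```python
def class_size(limits):
    """Class size: difference between two successive lower class limits."""
    return limits[1][0] - limits[0][0]


def class_marks(limits):
    """Class mark: average of the lower and upper class limits."""
    return [(lcl + ucl) / 2 for lcl, ucl in limits]


def modal_class(limits, freqs):
    """Modal class: the class interval with the highest frequency.

    If several classes tie, max() returns the first one encountered.
    """
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    return limits[i]


limits = [(2_500, 192_499), (192_500, 382_499), (382_500, 572_499)]
freqs = [30, 26, 16]
print(class_size(limits))          # 190000
print(class_marks(limits))         # [97499.5, 287499.5, 477499.5]
print(modal_class(limits, freqs))  # (2500, 192499)
```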
Steps in Constructing a Frequency Distribution Table
1. Determine an adequate number of classes (K).
* The number of classes should be neither too many nor too few.
* Usually, the number of classes is between 5 and 20.
* The class intervals should be non-overlapping.
2. Determine the range (R): Range = Maximum - Minimum.
3. Calculate the approximate class size (C'): C' = R/K.
4. Determine the class size (C) by rounding C' to a number that is easy to work with. We recommend class sizes that are multiples of 5, 10, 15, 20, etc.
5. List the required number (K) of class intervals.
* Start with the lower class limit of the lowest class interval. Its value should be less than or equal to the minimum value of the data set.
* Add the class size (C) to each lower class limit to get the next lower class limit.
* The last class interval should include the maximum value.
6. Tally the frequency for each class interval.
7. Sum the frequency column and check the total against the number of observations.
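Steps 2 to 7 can be sketched in Python as below. This minimal sketch rests on three assumptions: the data are whole numbers, C' is rounded up to a multiple of a chosen `round_to`, and the first lower class limit defaults to the minimum. The function `frequency_table` and its parameters are hypothetical names, and step 1 (choosing K) is left to the user. The demonstration groups the number-of-children data into K = 3 classes purely for illustration.

```python
import math

# Number of children of the 50 women in the single value grouping example
children = [
    0, 0, 1, 2, 2, 2, 3, 3, 4, 4,
    0, 0, 1, 2, 2, 3, 3, 3, 4, 4,
    0, 1, 1, 2, 2, 3, 3, 3, 4, 4,
    0, 1, 1, 2, 2, 3, 3, 3, 4, 5,
    0, 1, 1, 2, 2, 3, 3, 3, 4, 5,
]


def frequency_table(data, k, round_to, first_lcl=None):
    """Group whole-number data into k equal classes (steps 2 to 7).

    Assumptions of this sketch: the data are integers, C' is rounded up to a
    multiple of `round_to`, and the first LCL defaults to the minimum value.
    """
    lo, hi = min(data), max(data)
    r = hi - lo                                      # step 2: range R
    c_approx = r / k                                 # step 3: C' = R / K
    c = math.ceil(c_approx / round_to) * round_to    # step 4: class size C
    lcl = lo if first_lcl is None else first_lcl
    if lcl > lo:
        raise ValueError("the first LCL must not exceed the minimum value")
    classes = []
    for _ in range(k):                               # step 5: list K intervals
        classes.append((lcl, lcl + c - 1))           # UCL for whole-number data
        lcl += c
    if classes[-1][1] < hi:
        raise ValueError("the last class misses the maximum; adjust K or C")
    # step 6: tally the frequency of each class interval
    freqs = [sum(1 for x in data if a <= x <= b) for a, b in classes]
    if sum(freqs) != len(data):                      # step 7: check the total
        raise ValueError("frequencies do not add up to the number of observations")
    return c, classes, freqs


size, classes, freqs = frequency_table(children, k=3, round_to=1)
for (lcl, ucl), f in zip(classes, freqs):
    print(f"{lcl} - {ucl}: {f}")
# 0 - 1: 15
# 2 - 3: 25
# 4 - 5: 10
```

Rounding C' up rather than down is a design choice of this sketch. The explicit check after step 5 still catches the cases where the K classes fall short of the maximum.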
TABLE 3. Magnitude of Poor Population in the Philippines: 2000
Region total, followed by the magnitude of poor population in each province or district
NCR (National Capital Region)¹: 848,962
   1st District: 120,663
   2nd District: 229,301
   3rd District: 292,611
   4th District: 206,387
CAR (Cordillera Administrative Region): 536,169
   Abra: 110,937
   Apayao: 28,770
   Benguet: 122,762
   Ifugao: 113,719
   Kalinga: 83,844
   Mt. Province: 76,137
Region 1 (Ilocos Region): 1,447,638
   Ilocos Norte: 115,116
   Ilocos Sur: 190,297
   La Union: 253,382
   Pangasinan: 888,844
Region 2 (Cagayan Valley): 820,786
   Batanes: 2,535
   Cagayan: 251,222
   Isabela: 424,580
   Nueva Vizcaya: 82,895
   Quirino: 59,555
Region 3 (Central Luzon): 1,695,227
   Aurora: 59,985
   Bataan: 68,659
   Bulacan: 147,812
   Nueva Ecija: 532,961
   Pampanga: 331,739
   Tarlac: 360,109
   Zambales: 193,962
Region 4a (CALABARZON): 1,699,333
   Batangas: 440,603
   Cavite: 244,712
   Laguna: 207,184
   Quezon: 667,385
   Rizal: 139,449
Region 4b (MIMAROPA): 1,030,987
   Marinduque: 113,553
   Occidental Mindoro: 177,823
   Oriental Mindoro: 340,690
Region 5 (Bicol Region): 2,540,618
   Albay: 553,629
   Camarines Norte: 301,147
   Camarines Sur: 765,373
   Catanduanes: 116,866
   Masbate: 483,651
   Sorsogon: 319,952
Region 6 (Western Visayas): 2,765,055
   Aklan: 186,813
   Antique: 208,169
   Capiz: 328,635
   Guimaras: 37,838
   Iloilo: 690,639
   Negros Occidental: 1,312,961
Region 7 (Central Visayas): 2,017,162
   Bohol: 590,926
   Cebu: 973,490
   Negros Oriental: 427,509
   Siquijor: 25,237
Region 8 (Eastern Visayas): 1,646,371
   Biliran: 58,135
   Eastern Samar: 202,680
   Leyte: 680,536
   Northern Samar: 240,228
   Southern Leyte: 116,738
   Western Samar: 348,054
Region 9 (Zamboanga Peninsula): 1,254,884
   Zamboanga del Norte: 433,091
   Zamboanga del Sur: 821,793
   Zamboanga Sibugay²: (included in Zamboanga del Sur)
   Isabela City³: (included in Basilan)
Region 10 (Northern Mindanao): 1,580,249
   Bukidnon: 449,647
   Camiguin: 41,017
   Lanao del Norte: 424,819
   Misamis Occidental: 260,764
   Misamis Oriental: 404,002
Region 11 (Davao Region): 1,222,367
   Compostela Valley⁴: (included in Davao del Norte)
   Davao del Norte: 637,298
   Davao del Sur: 412,442
   Davao Oriental: 172,627
Region 12 (SOCCSKSARGEN): 1,596,785
   North Cotabato: 509,463
   Sarangani: 223,279
   South Cotabato: 469,874
   Sultan Kudarat: 344,172
   Cotabato City: 49,997
Region 13 (Caraga): 1,071,005
   Agusan del Norte: 259,475
   Agusan del Sur: 353,825
   Surigao del Norte: 232,065
   Surigao del Sur: 225,640
ARMM (Autonomous Region in Muslim Mindanao): 1,648,441
   Basilan: 123,825
   Lanao del Sur: 432,307
   Maguindanao: 534,628
   Sulu: 397,119
   Tawi-tawi: 160,562
¹ Districts of NCR cover the following: 1st District (Manila); 2nd District (Mandaluyong, Marikina, Pasig, Quezon City and San Juan); 3rd District (Valenzuela, Kaloocan City, Malabon and Navotas); and 4th District (Las Piñas, Makati, Muntinlupa, Parañaque, Pasay City, Pateros and Taguig).
² Zamboanga Sibugay was part of Zamboanga del Sur in 2000. Thus, the 2000 estimate for Zamboanga del Sur includes Zamboanga Sibugay.
³ Isabela City was part of Basilan in 2000. Thus, the 2000 estimate for Basilan still includes Isabela City.
⁴ The 2000 estimate for Davao del Norte includes Compostela Valley.
Source: National Statistical Coordination Board
TABLE 4. Sorted Data (Array) of Magnitude of Poor Population for the 82 Provinces of the Philippines: 2000
(values in ascending order, read across each row)
2,535      25,237     28,770     37,838     41,017     49,997     58,135     59,555     59,985     68,659
76,137     82,895     83,844     110,937    113,553    113,719    115,116    116,738    116,866    120,663
122,762    123,825    139,449    147,812    160,562    170,917    172,627    177,823    186,813    190,297
193,962    202,680    206,387    207,184    208,169    223,279    225,640    228,004    229,301    232,065
240,228    244,712    251,222    253,382    259,475    260,764    292,611    301,147    319,952    328,635
331,739    340,690    344,172    348,054    353,825    360,109    397,119    404,002    412,442    424,580
424,819    427,509    432,307    433,091    440,603    449,647    469,874    483,651    509,463    532,961
534,628    553,629    590,926    637,298    667,385    680,536    690,639    765,373    821,793    888,844
973,490    1,312,961
TABLE 5. Frequency Distribution Tables on Magnitude of Poor Population for the 82 Provinces of the Philippines: 2000
TABLE 5a (class size 150,000)
Class Limits
LCL          UCL           f
2,500        152,499      24
152,500      302,499      24
302,500      452,499      18
452,500      602,499       7
602,500      752,499       4
752,500      902,499       3
902,500      1,052,499     1
1,052,500    1,202,499     0
1,202,500    1,352,499     1
Total                     82
TABLE 5b (class size 200,000)
Class Limits
LCL          UCL           f
2,500        202,499      31
202,500      402,499      26
402,500      602,499      16
602,500      802,499       5
802,500      1,002,499     3
1,002,500    1,202,499     0
1,202,500    1,402,499     1
Total                     82
TABLE 5c (class size 190,000)
Class Limits
LCL          UCL           f
2,500        192,499      30
192,500      382,499      26
382,500      572,499      16
572,500      762,499       5
762,500      952,499       3
952,500      1,142,499     1
1,142,500    1,332,499     1
Total                     82
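As a check on Tables 5a to 5c, the Python sketch below recounts the 82 values of Table 4 for each class size, starting every table at a lower class limit of 2,500. The helper `grouped_counts` is a hypothetical name. The sketch also prints C' = R/K for K = 9 and K = 7. The slides do not say which K was used, so these lines only show one way of arriving at a class size of 150,000 (from about 145,603) or of 190,000 and 200,000 (from about 187,204).

```python
# Magnitude of poor population of the 82 provinces, sorted (Table 4)
poor = [
    2535, 25237, 28770, 37838, 41017, 49997, 58135, 59555, 59985, 68659,
    76137, 82895, 83844, 110937, 113553, 113719, 115116, 116738, 116866, 120663,
    122762, 123825, 139449, 147812, 160562, 170917, 172627, 177823, 186813, 190297,
    193962, 202680, 206387, 207184, 208169, 223279, 225640, 228004, 229301, 232065,
    240228, 244712, 251222, 253382, 259475, 260764, 292611, 301147, 319952, 328635,
    331739, 340690, 344172, 348054, 353825, 360109, 397119, 404002, 412442, 424580,
    424819, 427509, 432307, 433091, 440603, 449647, 469874, 483651, 509463, 532961,
    534628, 553629, 590926, 637298, 667385, 680536, 690639, 765373, 821793, 888844,
    973490, 1312961,
]


def grouped_counts(data, first_lcl, size, k):
    """Return (LCL, UCL, f) for k equal classes of whole-number data."""
    rows = []
    for i in range(k):
        lcl = first_lcl + i * size
        ucl = lcl + size - 1
        rows.append((lcl, ucl, sum(1 for x in data if lcl <= x <= ucl)))
    return rows


r = max(poor) - min(poor)  # range R = 1,310,426
for k in (9, 7):
    print(f"K = {k}: C' = R/K = {r / k:,.0f}")
# K = 9: C' = R/K = 145,603
# K = 7: C' = R/K = 187,204

for name, size, k in (("5a", 150_000, 9), ("5b", 200_000, 7), ("5c", 190_000, 7)):
    freqs = [f for _, _, f in grouped_counts(poor, 2_500, size, k)]
    print(f"Table {name}: {freqs} total {sum(freqs)}")
# Table 5a: [24, 24, 18, 7, 4, 3, 1, 0, 1] total 82
# Table 5b: [31, 26, 16, 5, 3, 0, 1] total 82
# Table 5c: [30, 26, 16, 5, 3, 1, 1] total 82
```

The three tables describe the same data. Only the class size, and with it the number of classes, changes the shape of the summary.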
Example: This illustrates the use of appropriate column
labels in a frequency distribution table.
TABLE 6. Frequency Distribution Table of the Magnitude of Poor Population in the Phils: 2000
Magnitude of Poor Population    No. of Provinces
2,500 - 192,499                 30
192,500 - 382,499               26
382,500 - 572,499               16
572,500 - 762,499               5
762,500 - 952,499               3
952,500 - 1,142,499             1
1,142,500 - 1,332,499           1
Total                           82
TABLE 7. Frequency Distribution Table with Class Boundaries and Class Marks
Class Limits            Class Boundaries            Class Mark    f
LCL - UCL               LCB - UCB
2,500 - 192,499         2,499.5 - 192,499.5         97,499.5      30
192,500 - 382,499       192,499.5 - 382,499.5       287,499.5     26
382,500 - 572,499       382,499.5 - 572,499.5       477,499.5     16
572,500 - 762,499       572,499.5 - 762,499.5       667,499.5     5
762,500 - 952,499       762,499.5 - 952,499.5       857,499.5     3
952,500 - 1,142,499     952,499.5 - 1,142,499.5     1,047,499.5   1
1,142,500 - 1,332,499   1,142,499.5 - 1,332,499.5   1,237,499.5   1
Total                                                             82
Relative Frequency and Relative Frequency Percentage
Relative frequency
* divide the class frequency of a class interval by the number of observations
* the sum of the relative frequency column is one
Relative frequency percentage
* multiply the relative frequency by 100
* the sum of the relative frequency percentage column is one hundred percent
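The two rules can be checked with a short Python sketch on the class frequencies of Table 5c. The function name is illustrative. Rounding is applied only for display, so the unrounded column sums stay at one and 100, apart from floating-point error.

```python
def relative_frequencies(freqs):
    """Relative frequency (f/n) and relative frequency percentage (100 f/n)."""
    n = sum(freqs)
    rel = [f / n for f in freqs]
    pct = [100 * r for r in rel]
    return rel, pct


freqs = [30, 26, 16, 5, 3, 1, 1]  # class frequencies from Table 5c
rel, pct = relative_frequencies(freqs)
print([round(r, 3) for r in rel])  # [0.366, 0.317, 0.195, 0.061, 0.037, 0.012, 0.012]
print([round(p, 1) for p in pct])  # [36.6, 31.7, 19.5, 6.1, 3.7, 1.2, 1.2]
print(round(sum(rel), 10), round(sum(pct), 10))  # 1.0 100.0
```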
TABLE 8. Frequency Distribution Table with Relative Frequency and Relative Frequency Percentage
Class Limits (LCL - UCL)  f     Relative Frequency   Relative Frequency Percentage
2,500 - 192,499           30    0.366                36.6
192,500 - 382,499         26    0.317                31.7
382,500 - 572,499         16    0.195                19.5
572,500 - 762,499         5     0.061                6.1
762,500 - 952,499         3     0.037                3.7
952,500 - 1,142,499       1     0.012                1.2
1,142,500 - 1,332,499     1     0.012                1.2
Total                     82    1.000                100.0
TABLE 9. Frequency Distribution Table with Less than Cumulative Frequency and Greater than Cumulative Frequency Distributions
Class Limits (LCL - UCL)  f     Less than Cumulative Frequency   Greater than Cumulative Frequency
2,500 - 192,499           30    30                               82
192,500 - 382,499         26    56                               52
382,500 - 572,499         16    72                               26
572,500 - 762,499         5     77                               10
762,500 - 952,499         3     80                               5
952,500 - 1,142,499       1     81                               2
1,142,500 - 1,332,499     1     82                               1
Total                     82
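The two cumulative columns of Table 9 are running totals of f. The less than column adds frequencies from the first class down to the current one, giving the number of provinces at or below the class's upper limit. The greater than column adds them from the last class up, giving the number at or above its lower limit. Below is a short Python sketch using the standard library's `itertools.accumulate`; the function name is illustrative.

```python
from itertools import accumulate


def cumulative_frequencies(freqs):
    """Return (less than, greater than) cumulative frequencies.

    Less than: running total starting from the first (lowest) class.
    Greater than: running total starting from the last (highest) class.
    """
    less_than = list(accumulate(freqs))
    greater_than = list(accumulate(reversed(freqs)))[::-1]
    return less_than, greater_than


freqs = [30, 26, 16, 5, 3, 1, 1]  # class frequencies from Table 9
less, greater = cumulative_frequencies(freqs)
print(less)     # [30, 56, 72, 77, 80, 81, 82]
print(greater)  # [82, 52, 26, 10, 5, 2, 1]
```

The last less than value and the first greater than value both equal the total number of observations, which gives a quick check on the table.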
Graphical Representation of the Frequency Distribution
* Frequency Histogram
- use the class frequency on the vertical axis and the class boundaries on the horizontal axis
* Frequency Polygon
- use the class frequency on the vertical axis and the class mark on the horizontal axis
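Both graphs can be described by coordinates before anything is drawn. The minimal Python sketch below uses the seven classes of Table 5c, with boundaries and class marks worked out from the definitions of class boundary and class mark (first LCB 2,499.5, first class mark 97,499.5). The function names are hypothetical. The optional `close_ends` flag, off by default, adds a zero-frequency point one class size beyond each end; this is a common textbook convention that the slide does not state.

```python
def histogram_bars(boundaries, freqs):
    """Bars for a frequency histogram: (left edge = LCB, width = UCB - LCB, height = f)."""
    return [(lcb, ucb - lcb, f) for (lcb, ucb), f in zip(boundaries, freqs)]


def polygon_points(marks, freqs, close_ends=False):
    """Vertices of a frequency polygon: (class mark, f).

    Assumption: close_ends=True adds a zero-frequency point one class size
    beyond the first and last class marks.
    """
    points = list(zip(marks, freqs))
    if close_ends and len(marks) > 1:
        size = marks[1] - marks[0]
        points = [(marks[0] - size, 0)] + points + [(marks[-1] + size, 0)]
    return points


# Seven classes of size 190,000 (Table 5c)
size = 190_000
boundaries = [(2_499.5 + i * size, 2_499.5 + (i + 1) * size) for i in range(7)]
marks = [97_499.5 + i * size for i in range(7)]
freqs = [30, 26, 16, 5, 3, 1, 1]

print(histogram_bars(boundaries, freqs)[0])  # (2499.5, 190000.0, 30)
print(polygon_points(marks, freqs)[:2])      # [(97499.5, 30), (287499.5, 26)]
```

To draw the graphs, these lists could be passed to a plotting library. In matplotlib, for example, `pyplot.bar(lefts, heights, width=widths, align="edge")` draws the histogram and `pyplot.plot(xs, ys)` draws the polygon.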
Thank you.