CHAPTER 2: ORGANIZING DATA
RAW DATA
Definition
Data recorded in the sequence in which
they are collected and before they are
processed or ranked are called raw data.
Table 2.1 Ages of 50 Students
21 18 25 22 25 19 20 19 28 23
24 19 31 21 18 25 22 19 20 37
29 19 23 22 27 34 19 18 22 23
26 25 23 21 21 27 22 19 20 25
37 25 23 19 21 33 23 26 21 24
Table 2.2 Status of 50 Students
J F J SO SE F F F SE SE
SO J SE J F SE F SO SO SE
J F SO SO J J F F J SO
SE SE J J F J SO F SO J
J SE SE F SO J J SE SO SO
ORGANIZING AND GRAPHING QUALITATIVE DATA
Frequency Distributions
Relative Frequency and Percentage Distributions
Graphical Presentation of Qualitative Data
  Bar Graphs
  Pie Charts
TABLE 2.3 Type of Employment Students Intend to Engage In
Type of Employment              Number of Students
(variable)                      (frequency column)
Private companies/businesses    44
Federal government              16
State/local government          23
Own business                    17
                                Sum = 100
Each row is a category; the number beside it is that category's frequency.
Frequency Distributions
Definition
A frequency distribution for qualitative
data lists all categories and the number of
elements that belong to each of the
categories.
Example 2-1
A sample of 30 employees from large
companies was selected, and these
employees were asked how stressful their
jobs were. The responses of these
employees are recorded next where very
represents very stressful, somewhat means
somewhat stressful, and none stands for
not stressful at all.
Somewhat None Very Somewhat Very Very
None Somewhat Somewhat Very Somewhat Somewhat
Very Somewhat None None Somewhat Somewhat
Very Somewhat Somewhat Very None Somewhat
Very Very Somewhat Very Somewhat None
Construct a frequency distribution table for these
data.
Solution 2-1
Table 2.4 Frequency Distribution of Stress on Job
Stress on Job    Tally               Frequency (f)
Very             |||| ||||           10
Somewhat         |||| |||| ||||      14
None             |||| |              6
                                     Sum = 30
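The tallying behind Table 2.4 can also be done by program. The following is a minimal sketch in Python, assuming the 30 responses of Example 2-1 are typed in as strings; the variable names are illustrative and not part of the original example.

```python
from collections import Counter

# The 30 responses from Example 2-1 (their order does not affect the counts).
responses = (
    "Somewhat None Very Somewhat Very Very "
    "None Somewhat Somewhat Very Somewhat Somewhat "
    "Very Somewhat None None Somewhat Somewhat "
    "Very Somewhat Somewhat Very None Somewhat "
    "Very Very Somewhat Very Somewhat None"
).split()

# A frequency distribution lists each category with the number of
# elements that belong to it.
frequency = Counter(responses)

for category in ("Very", "Somewhat", "None"):
    print(f"{category:<10}{frequency[category]:>3}")
print(f"Sum = {sum(frequency.values())}")
```

Counter does the tallying in one pass; the loop only fixes the row order to match the table.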
Relative Frequency and Percentage Distributions
Calculating Relative Frequency of a Category
Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies)
Calculating Percentage
Percentage = (Relative frequency) · 100
Example 2-2
Determine the relative frequency and
percentage for the data in Table 2.4.
Solution 2-2
Table 2.5 Relative Frequency and Percentage Distributions of Stress on Job
Stress on Job    Relative Frequency    Percentage
Very             10/30 = .333          .333(100) = 33.3
Somewhat         14/30 = .467          .467(100) = 46.7
None             6/30 = .200           .200(100) = 20.0
                 Sum = 1.00            Sum = 100
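The two formulas can be applied to the frequencies of Table 2.4 in a few lines. This is a sketch only; the dictionary names are assumptions made for illustration.

```python
# Frequencies from Table 2.4.
frequency = {"Very": 10, "Somewhat": 14, "None": 6}
total = sum(frequency.values())

# Relative frequency = frequency of the category / sum of all frequencies.
relative = {c: f / total for c, f in frequency.items()}

# Percentage = relative frequency * 100.
percentage = {c: 100 * r for c, r in relative.items()}

for c in frequency:
    print(f"{c:<10}{relative[c]:>6.3f}{percentage[c]:>8.1f}")
```

The values are kept unrounded internally and rounded only when printed, so the relative frequencies sum to 1 up to floating-point error.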
Graphical Presentation of
Qualitative Data
Definition
A graph made of bars whose heights
represent the frequencies of respective
categories is called a bar graph.
Figure 2.1 Bar graph for the frequency distribution of Table 2.4.
[Bar graph: horizontal axis "Stress on Job" (Very, Somewhat, None); vertical axis "Frequency", 0 to 16.]
Graphical Presentation of
Qualitative Data cont.
Definition
A circle divided into portions that represent
the relative frequencies or percentages of a
population or a sample belonging to
different categories is called a pie chart.
Table 2.6 Calculating Angle Sizes for the Pie Chart
Stress on Job    Relative Frequency    Angle Size
Very             .333                  360(.333) = 119.88
Somewhat         .467                  360(.467) = 168.12
None             .200                  360(.200) = 72.00
                 Sum = 1.00            Sum = 360
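The angle sizes of Table 2.6 follow from multiplying each relative frequency by 360. The sketch below reproduces them from the rounded relative frequencies and, as an assumption-labeled variant, from the unrounded frequencies of Table 2.4; all names are illustrative.

```python
# Frequencies from Table 2.4 and rounded relative frequencies from Table 2.5.
frequency = {"Very": 10, "Somewhat": 14, "None": 6}
relative = {"Very": 0.333, "Somewhat": 0.467, "None": 0.200}

# Angle size = 360 * relative frequency, as in Table 2.6.
angles = {c: round(360 * r, 2) for c, r in relative.items()}

# Variant (not in the table): working from the unrounded frequencies
# avoids the small drift introduced by rounding to three decimals.
total = sum(frequency.values())
exact_angles = {c: 360 * f / total for c, f in frequency.items()}

for c in frequency:
    print(f"{c:<10}{angles[c]:>8.2f}{exact_angles[c]:>8.2f}")
```

With unrounded inputs the "Very" slice is exactly 120 degrees rather than 119.88; either set of angles sums to 360.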
Figure 2.2 Pie chart for the percentage distribution of Table 2.5.
[Pie chart slices: Very, 33.3%; Somewhat, 46.7%; None, 20.0%.]
ORGANIZING AND GRAPHING QUANTITATIVE DATA
Frequency Distributions
Constructing Frequency Distribution Tables
Relative and Percentage Distributions
Graphing Grouped Data
  Histograms
  Polygons
Frequency Distributions
Table 2.7 Weekly Earnings of 100 Employees of a Company
Weekly Earnings (dollars)    Number of Employees (f)
401 to 600                   9
601 to 800                   22
801 to 1000                  39
1001 to 1200                 15
1201 to 1400                 9
1401 to 1600                 6
Labels: weekly earnings is the variable; the f column is the frequency column; 801 to 1000 is the third class, with frequency 39; 1401 and 1600 are the lower and upper limits of the sixth class.
Frequency Distributions cont.
Definition
A frequency distribution for quantitative
data lists all the classes and the number of
values that belong to each class. Data
presented in the form of a frequency
distribution are called grouped data.
Frequency Distributions cont.
Definition
The class boundary is given by the
midpoint of the upper limit of one class and
the lower limit of the next class.
Frequency Distributions cont.
Finding Class Width
Class width = Upper boundary – Lower boundary
Frequency Distributions cont.
Calculating Class Midpoint or Mark
Class midpoint or mark = (Lower limit + Upper limit) / 2
Constructing Frequency Distribution Tables
Calculation of Class Width
Approximate class width = (Largest value – Smallest value) / Number of classes
Table 2.8 Class Boundaries, Class Widths, and Class Midpoints for Table 2.7
Class Limits    Class Boundaries              Class Width    Class Midpoint
401 to 600      400.5 to less than 600.5      200            500.5
601 to 800      600.5 to less than 800.5      200            700.5
801 to 1000     800.5 to less than 1000.5     200            900.5
1001 to 1200    1000.5 to less than 1200.5    200            1100.5
1201 to 1400    1200.5 to less than 1400.5    200            1300.5
1401 to 1600    1400.5 to less than 1600.5    200            1500.5
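The boundaries, widths, and midpoints in Table 2.8 all follow from the class limits of Table 2.7. The sketch below assumes the gap between consecutive classes is the same throughout (1 dollar here) and applies half of it on each side of every class, including the first and last; the names are illustrative.

```python
# Class limits from Table 2.7 (weekly earnings in dollars).
limits = [(401, 600), (601, 800), (801, 1000),
          (1001, 1200), (1201, 1400), (1401, 1600)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = limits[1][0] - limits[0][1]

rows = []
for lower, upper in limits:
    # A boundary is the midpoint between an upper limit and the next lower limit.
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    width = upper_boundary - lower_boundary   # upper boundary - lower boundary
    midpoint = (lower + upper) / 2            # (lower limit + upper limit) / 2
    rows.append((lower_boundary, upper_boundary, width, midpoint))

for (lower, upper), (lb, ub, w, m) in zip(limits, rows):
    print(f"{lower} to {upper}: {lb} to less than {ub}, width {w}, midpoint {m}")
```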
Example 2-3
Table 2.9 gives the total home runs hit by all
players of each of the 30 Major League
Baseball teams during the 2002 season.
Construct a frequency distribution table.
Table 2.9 Home Runs Hit by Major League Baseball Teams During the 2002 Season
Team                 Home Runs    Team                 Home Runs
Anaheim              152          Milwaukee            139
Arizona              165          Minnesota            167
Atlanta              164          Montreal             162
Baltimore            165          New York Mets        160
Boston               177          New York Yankees     223
Chicago Cubs         200          Oakland              205
Chicago White Sox    217          Philadelphia         165
Cincinnati           169          Pittsburgh           142
Cleveland            192          St. Louis            175
Colorado             152          San Diego            136
Detroit              124          San Francisco        198
Florida              146          Seattle              152
Houston              167          Tampa Bay            133
Kansas City          140          Texas                230
Los Angeles          155          Toronto              187
Solution 2-3
Using 5 classes, with 230 as the largest value and 124 as the smallest:
Approximate width of each class = (230 – 124) / 5 = 21.2
Now we round this approximate width to a convenient
number, say 22.
The lower limit of the first class can be taken as
124 or any number less than 124. Suppose we
take 124 as the lower limit of the first class. Then
our classes will be
124 – 145, 146 – 167, 168 – 189, 190 – 211,
and 212 – 233
Table 2.10 Frequency Distribution for the Data of Table 2.9
Total Home Runs    Tally             f
124 – 145          |||| |            6
146 – 167          |||| |||| |||     13
168 – 189          ||||              4
190 – 211          ||||              4
212 – 233          |||               3
                                     ∑f = 30
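The steps of Solution 2-3 (approximate width, rounded width, class limits, tally) can be sketched in Python as follows. The choice of 5 classes, a width of 22, and a first lower limit of 124 comes from the solution; the variable names are assumptions for illustration.

```python
# Home runs from Table 2.9, team by team.
home_runs = [152, 165, 164, 165, 177, 200, 217, 169, 192, 152,
             124, 146, 167, 140, 155, 139, 167, 162, 160, 223,
             205, 165, 142, 175, 136, 198, 152, 133, 230, 187]

num_classes = 5
# Approximate class width = (largest value - smallest value) / number of classes.
approx_width = (max(home_runs) - min(home_runs)) / num_classes

width = 22               # approximate width rounded to a convenient number
lower = min(home_runs)   # 124 taken as the lower limit of the first class

# Class limits: 124-145, 146-167, and so on.
classes = [(lower + i * width, lower + (i + 1) * width - 1)
           for i in range(num_classes)]

# Frequency of a class = number of values between its limits, inclusive.
frequency = [sum(lo <= x <= hi for x in home_runs) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequency):
    print(f"{lo} - {hi}: {f}")
```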
Relative Frequency and Percentage Distributions
Relative frequency of a class = (Frequency of that class) / (Sum of all frequencies) = f / ∑f
Percentage = (Relative frequency) · 100
Example 2-4
Calculate the relative frequencies and
percentages for Table 2.10
Solution 2-4
Table 2.11 Relative Frequency and Percentage Distributions for Table 2.10
Total Home Runs    Class Boundaries            Relative Frequency    Percentage
124 – 145          123.5 to less than 145.5    .200                  20.0
146 – 167          145.5 to less than 167.5    .433                  43.3
168 – 189          167.5 to less than 189.5    .133                  13.3
190 – 211          189.5 to less than 211.5    .133                  13.3
212 – 233          211.5 to less than 233.5    .100                  10.0
                                               Sum = .999            Sum = 99.9%
The sums fall short of 1.00 and 100% only because of rounding.
Graphing Grouped Data
Definition
A histogram is a graph in which classes are
marked on the horizontal axis and the frequencies,
relative frequencies, or percentages are marked on
the vertical axis. The frequencies, relative
frequencies, or percentages are represented by the
heights of the bars. In a histogram, the bars are
drawn adjacent to each other.
Figure 2.3 Frequency histogram for Table 2.10.
[Histogram: horizontal axis "Total home runs" with classes 124 – 145, 146 – 167, 168 – 189, 190 – 211, 212 – 233; vertical axis "Frequency", 0 to 15.]
Figure 2.4 Relative frequency histogram for Table 2.10.
[Histogram: same classes on the horizontal axis "Total home runs"; vertical axis "Relative Frequency", 0 to .50.]
Graphing Grouped Data cont.
Definition
A graph formed by joining the midpoints of
the tops of successive bars in a histogram
with straight lines is called a polygon.
Figure 2.5 Frequency polygon for Table 2.10.
[Polygon: horizontal axis with classes 124 – 145, 146 – 167, 168 – 189, 190 – 211, 212 – 233; vertical axis "Frequency", 0 to 15.]
Figure 2.6 Frequency distribution curve.
[Curve: horizontal axis x; vertical axis "Frequency".]
Example 2-5
The following data give the average travel
time from home to work (in minutes) for 50
states. The data are based on a sample
survey of 700,000 households conducted by
the Census Bureau (USA TODAY, August 6,
2001).
22.4 19.7 21.6 15.4 21.1 18.2 27.0 21.9 22.1 25.4
23.7 21.7 23.2 19.6 24.9 19.8 17.6 16.0 21.4 25.5
26.7 17.7 16.1 23.8 20.1 23.4 22.5 22.3 21.9 17.1
23.5 23.7 24.4 21.9 22.5 21.2 28.7 15.6 24.3 29.2
19.9 22.7 26.7 26.1 31.2 23.6 24.2 22.7 22.6 20.8
Construct a frequency distribution table. Calculate
the relative frequencies and percentages for all
classes.
Solution 2-5
Approximate width of each class = (31.2 – 15.4) / 6 = 2.63
Rounding this width to 3 and taking 15 as the lower boundary of the first class gives the classes in Table 2.12.
Table 2.12 Frequency, Relative Frequency, and Percentage Distributions of Average Travel Time to Work
Class Boundaries       f          Relative Frequency    Percentage
15 to less than 18     7          .14                   14
18 to less than 21     7          .14                   14
21 to less than 24     23         .46                   46
24 to less than 27     9          .18                   18
27 to less than 30     3          .06                   6
30 to less than 33     1          .02                   2
                       Σf = 50    Sum = 1.00            Sum = 100%
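Table 2.12 uses the "less than" method, in which a value equal to a boundary belongs to the class that starts there. A minimal sketch of that rule, assuming a width of 3 and a first lower boundary of 15 as in the table (names are illustrative):

```python
# Average travel times to work (minutes) for the 50 states, from Example 2-5.
travel_times = [
    22.4, 19.7, 21.6, 15.4, 21.1, 18.2, 27.0, 21.9, 22.1, 25.4,
    23.7, 21.7, 23.2, 19.6, 24.9, 19.8, 17.6, 16.0, 21.4, 25.5,
    26.7, 17.7, 16.1, 23.8, 20.1, 23.4, 22.5, 22.3, 21.9, 17.1,
    23.5, 23.7, 24.4, 21.9, 22.5, 21.2, 28.7, 15.6, 24.3, 29.2,
    19.9, 22.7, 26.7, 26.1, 31.2, 23.6, 24.2, 22.7, 22.6, 20.8,
]

lower, width, num_classes = 15, 3, 6
boundaries = [(lower + i * width, lower + (i + 1) * width)
              for i in range(num_classes)]

# "a to less than b" means a <= value < b.
frequency = [sum(a <= t < b for t in travel_times) for a, b in boundaries]

total = sum(frequency)
relative = [f / total for f in frequency]
percentage = [100 * r for r in relative]

for (a, b), f, r, p in zip(boundaries, frequency, relative, percentage):
    print(f"{a} to less than {b}: f={f}, relative={r:.2f}, percentage={p:.0f}")
```

The half-open comparison is what puts 27.0 in the class "27 to less than 30" rather than in the class before it.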
Example 2-6
The administration in a large city wanted to know the
distribution of vehicles owned by households in that city. A
sample of 40 randomly selected households from this city
produced the following data on the number of vehicles
owned:
5 1 1 2 0 1 1 2 1 1
1 3 3 0 2 5 1 2 3 4
2 1 2 2 1 2 2 1 1 1
4 2 1 1 2 1 1 4 1 3
Construct a frequency distribution table for these data, and
draw a bar graph.
Solution 2-6
Table 2.13 Frequency Distribution of Vehicles Owned
Vehicles Owned    Number of Households (f)
0                 2
1                 18
2                 11
3                 4
4                 3
5                 2
                  Σf = 40
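Because the variable takes only a few distinct values, each value can serve as its own class. The sketch below counts the values of Example 2-6 and prints a rough text version of the bar graph; the text rendering is an illustration only, not the figure from the solution.

```python
from collections import Counter

# Vehicles owned by the 40 sampled households (Example 2-6).
vehicles = [
    5, 1, 1, 2, 0, 1, 1, 2, 1, 1,
    1, 3, 3, 0, 2, 5, 1, 2, 3, 4,
    2, 1, 2, 2, 1, 2, 2, 1, 1, 1,
    4, 2, 1, 1, 2, 1, 1, 4, 1, 3,
]

frequency = Counter(vehicles)

# One row per value; the bar length equals the frequency.
for value in range(min(vehicles), max(vehicles) + 1):
    f = frequency[value]
    print(f"{value} | {'#' * f} {f}")
```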
Figure 2.7 Bar graph for Table 2.13.
[Bar graph: horizontal axis "Vehicles owned" (No Car, 1 Car, 2 Cars, 3 Cars, 4 Cars, 5 Cars); vertical axis "Frequency", 0 to 20.]
SHAPES OF HISTOGRAMS
1. Symmetric
2. Skewed
3. Uniform or rectangular
Figure 2.8 Symmetric histograms.
Figure 2.9 (a) A histogram skewed to the right. (b) A histogram skewed to the left.
Figure 2.10 A histogram with uniform distribution.
Figure 2.11 (a) and (b) Symmetric frequency curves. (c) Frequency curve skewed to the right. (d) Frequency curve skewed to the left.
CUMULATIVE FREQUENCY
DISTRIBUTIONS
Definition
A cumulative frequency distribution gives
the total number of values that fall below the
upper boundary of each class.
Example 2-7
Using the frequency distribution of Table
2.10, reproduced in the next slide, prepare
a cumulative frequency distribution for the
home runs hit by Major League Baseball
teams during the 2002 season.
Total Home Runs    f
124 – 145          6
146 – 167          13
168 – 189          4
190 – 211          4
212 – 233          3
Solution 2-7
Table 2.14 Cumulative Frequency Distribution of Home Runs by Baseball Teams
Class Limits    Class Boundaries            Cumulative Frequency
124 – 145       123.5 to less than 145.5    6
124 – 167       123.5 to less than 167.5    6 + 13 = 19
124 – 189       123.5 to less than 189.5    6 + 13 + 4 = 23
124 – 211       123.5 to less than 211.5    6 + 13 + 4 + 4 = 27
124 – 233       123.5 to less than 233.5    6 + 13 + 4 + 4 + 3 = 30
CUMULATIVE FREQUENCY DISTRIBUTIONS cont.
Calculating Cumulative Relative Frequency and Cumulative Percentage
Cumulative relative frequency = (Cumulative frequency of a class) / (Total observations in the data set)
Cumulative percentage = (Cumulative relative frequency) · 100
Table 2.15 Cumulative Relative Frequency and Cumulative Percentage Distributions for Home Runs Hit by Baseball Teams
Class Limits    Cumulative Relative Frequency    Cumulative Percentage
124 – 145       6/30 = .200                      20.0
124 – 167       19/30 = .633                     63.3
124 – 189       23/30 = .767                     76.7
124 – 211       27/30 = .900                     90.0
124 – 233       30/30 = 1.00                     100.0
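The cumulative columns are running totals of the class frequencies. A minimal sketch, assuming the frequencies of Table 2.10 and the upper class boundaries used for the cumulative distribution; names are illustrative.

```python
from itertools import accumulate

# Class frequencies from Table 2.10 and the matching upper class boundaries.
frequency = [6, 13, 4, 4, 3]
upper_boundaries = [145.5, 167.5, 189.5, 211.5, 233.5]

# Cumulative frequency: number of values below each upper boundary.
cumulative = list(accumulate(frequency))
total = cumulative[-1]

# Cumulative relative frequency = cumulative frequency / total observations.
cum_relative = [c / total for c in cumulative]
# Cumulative percentage = cumulative relative frequency * 100.
cum_percentage = [100 * r for r in cum_relative]

for b, c, r, p in zip(upper_boundaries, cumulative, cum_relative, cum_percentage):
    print(f"below {b}: {c:>2}  {r:.3f}  {p:.1f}")
```

The pairs of upper boundary and cumulative frequency printed here are also the points that an ogive joins.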
CUMULATIVE FREQUENCY
DISTRIBUTIONS cont.
Definition
An ogive is a curve drawn for the
cumulative frequency distribution by joining
with straight lines the dots marked above
the upper boundaries of classes at heights
equal to the cumulative frequencies of
respective classes.
Figure 2.12 Ogive for the cumulative frequency distribution in Table 2.14.
[Ogive: horizontal axis "Total home runs" marked at the class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, 233.5; vertical axis "Cumulative frequency", 5 to 30.]
STEM-AND-LEAF DISPLAYS
Definition
In a stem-and-leaf display of quantitative
data, each value is divided into two
portions – a stem and a leaf. The leaves for
each stem are shown separately in a display.
Example 2-8
The following are the scores of 30 college
students on a statistics test:
75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98
Construct a stem-and-leaf display.
Solution 2-8
To construct a stem-and-leaf display for
these scores, we split each score into two
parts. The first part contains the first digit,
which is called the stem. The second part
contains the second digit, which is called the
leaf.
We observe from the data that the stems
for all scores are 5, 6, 7, 8, and 9 because
all the scores lie in the range 50 to 98.
Figure 2.13 Stem-and-leaf display.
Stems
5 | 2    (leaf for 52)
6 |
7 | 5    (leaf for 75)
8 |
9 |
After we have listed the stems, we read the
leaves for all scores and record them next
to the corresponding stems on the right
side of the vertical line.
Figure 2.14 Stem-and-leaf display of test scores.
5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8
Figure 2.15 Ranked stem-and-leaf display of test scores.
5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
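The splitting and ranking steps of Solution 2-8 can be sketched in Python. The function name `stem_and_leaf` is hypothetical, and the scores are listed in the order that reproduces the leaf order of Figure 2.14.

```python
from collections import defaultdict

# Test scores from Example 2-8.
scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
          69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
          83, 84, 77, 64, 71, 87, 72, 92, 57, 98]

def stem_and_leaf(values, ranked=False):
    """Split each two-digit value into a stem (tens digit) and a leaf (units digit)."""
    display = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    if ranked:
        # A ranked display lists the leaves of each stem in increasing order.
        for leaves in display.values():
            leaves.sort()
    return dict(sorted(display.items()))

unranked = stem_and_leaf(scores)
ranked = stem_and_leaf(scores, ranked=True)

for stem, leaves in ranked.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```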
Example 2-9
The following data are monthly rents paid by
a sample of 30 households selected from a
small city.
880 1081 721 1075 1023 775 1235 750 965 960
1210 985 1231 932 850 825 1000 915 1191 1035
1151 630 1175 952 1100 1140 750 1140 1370 1280
Construct a stem-and-leaf display for these
data.
Solution 2-9
Figure 2.16 Stem-and-leaf display of rents.
 6 | 30
 7 | 75 50 21 50
 8 | 80 25 50
 9 | 32 52 15 60 85 65
10 | 23 81 35 75 00
11 | 91 51 40 75 40 00
12 | 10 31 35 80
13 | 70
Example 2-10
The following stem-and-leaf display is
prepared for the number of hours that 25
students spent working on computers
during the last month.
0 | 6
1 | 1 7 9
2 | 2 6
3 | 2 4 7 8
4 | 1 5 6 9 9
5 | 3 6 8
6 | 2 4 4 5 7
7 |
8 | 5 6
Prepare a new stem-and-leaf display by
grouping the stems.
Solution 2-10
Figure 2.17 Grouped stem-and-leaf display.
0–2 | 6 * 1 7 9 * 2 6
3–5 | 2 4 7 8 * 1 5 6 9 9 * 3 6 8
6–8 | 2 4 4 5 7 * * 5 6
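Grouping stems amounts to placing the leaves of several consecutive stems on one row, with an asterisk marking where one stem ends and the next begins. A minimal sketch, assuming groups of three stems as in Solution 2-10; `group_stems` is a hypothetical helper, and the row labels use a plain hyphen.

```python
# Stem-and-leaf display from Example 2-10 (stem 7 has no leaves).
display = {
    0: [6], 1: [1, 7, 9], 2: [2, 6],
    3: [2, 4, 7, 8], 4: [1, 5, 6, 9, 9], 5: [3, 6, 8],
    6: [2, 4, 4, 5, 7], 7: [], 8: [5, 6],
}

def group_stems(display, size):
    """Combine every `size` consecutive stems into one row."""
    stems = sorted(display)
    rows = []
    for i in range(0, len(stems), size):
        group = stems[i:i + size]
        tokens = []
        for k, stem in enumerate(group):
            if k > 0:
                tokens.append("*")  # separator between the leaves of two stems
            tokens.extend(str(leaf) for leaf in display[stem])
        rows.append(f"{group[0]}-{group[-1]} | {' '.join(tokens)}")
    return rows

rows = group_stems(display, 3)
for row in rows:
    print(row)
```

An empty stem still contributes its separator, which is why the last row shows two asterisks side by side.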