Chapter 2
Presenting Data in Tables and Charts
2.1 Tables and Charts for Categorical Data
• Mutual Funds
• Variables? Measurement scales?
Four Techniques
1. The summary table
2. The bar chart
3. The pie chart
4. The Pareto Diagram (and Pareto Principle)
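The summary table behind a bar or pie chart is a count and a percentage for each category. A Pareto diagram uses the same table sorted from most to least frequent, with a cumulative percentage added. The minimal Python sketch below assumes the function names `summary_table` and `pareto_table` (hypothetical) and uses made-up risk ratings for ten funds, not data from the text.

```python
from collections import Counter

def summary_table(values):
    """Return (category, count, percent) rows, most frequent first."""
    counts = Counter(values)
    n = len(values)
    return [(cat, cnt, 100 * cnt / n) for cat, cnt in counts.most_common()]

def pareto_table(values):
    """Summary table in descending order plus a cumulative-percent column."""
    rows = []
    cumulative = 0.0
    for cat, cnt, pct in summary_table(values):
        cumulative += pct
        rows.append((cat, cnt, pct, cumulative))
    return rows

# Hypothetical risk ratings for ten mutual funds (made-up data).
risk = ["Low", "Average", "High", "Average", "Low",
        "Average", "High", "Average", "Low", "Average"]
for cat, cnt, pct, cum in pareto_table(risk):
    print(f"{cat:<8} {cnt:>3} {pct:6.1f}% {cum:6.1f}%")
```

The count and percent columns feed a bar chart or pie chart; in a Pareto diagram the bars keep this descending order and the cumulative column is drawn as a line.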
2.2: Organizing Numerical Data
• Big tables of data are difficult to fit into our minds. Two basic techniques:
1. Ordered Array
– For each variable, arrange the data points in order (lowest to highest, or highest to lowest). Table 2.5 shows the data unordered; Table 2.6 shows them ordered. Interpret.
2. Stem and Leaf
– For each variable, separate each data point into leading digits (stems) and trailing digits (leaves).
– E.g., “49” = “4” for the stem and “9” for the leaf.
– Plot the display with the smallest stem on top.
– The example on page 30 is awful (rounding).
– Figure 2.7 is awful.
– The interpretation in Example 2.5 is quite good.
– Our problem…
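Both techniques are mechanical enough to sketch in a few lines. The minimal Python sketch below assumes non-negative integer data, takes the units digit as the leaf and everything before it as the stem (so 49 becomes stem 4, leaf 9), and builds the display from the ordered array. The function names and the scores are hypothetical, made-up values.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: [leaves]} for non-negative integers.

    Stem = all digits except the last, leaf = units digit (49 -> 4 | 9).
    """
    ordered = sorted(data)            # step 1: the ordered array
    display = defaultdict(list)
    for x in ordered:
        stem, leaf = divmod(x, 10)    # split into leading and trailing digits
        display[stem].append(leaf)
    return dict(display)

def print_stem_and_leaf(data):
    display = stem_and_leaf(data)
    # Smallest stem on top; empty stems are printed so gaps stay visible.
    for stem in range(min(display), max(display) + 1):
        leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
        print(f"{stem} | {leaves}")

scores = [49, 52, 38, 61, 45, 49, 57, 33, 70]   # made-up values
print_stem_and_leaf(scores)
```

Because the leaves come from the ordered array, each row's leaves are already sorted, which makes the shape of the distribution easy to read.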
2.3: Tables and Charts for Numerical Data
• Draw conclusions from a large set of data through summarization.
• Frequency = the number of times
something occurs.
• Frequency distribution = a presentation of
frequencies where the data set has been
arranged in groups or categories.
• The presentation may be a formula, a chart,
a rule, or a table.
Frequency Distributions (FD)
• How many categories or groups, usually
known as classes?
• How “wide” is each class, usually known as
the class interval or class width?
• What are the boundaries of the classes?
FDs
There will be many alternative ways to make a correct FD: judgment is required.
• # of class intervals: between 5 and 15.
More data points means more intervals.
• Class width = data range / # of intervals (all
class widths are equal). Formula 2.1, p 33.
• Class boundaries must not overlap. Use
judicious rounding to make the data easy to
work with and easy to interpret.
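The width and boundary rules above can be turned into a small helper. The Python sketch below is one possible "judicious rounding" rule, assumed here rather than taken from the text: round the raw width up to a multiple of `unit`, round the lowest value down to a multiple of `unit`, and treat each class as including its lower boundary but not its upper one so classes never overlap. The name `class_boundaries` and the `unit` parameter are hypothetical.

```python
import math

def class_boundaries(lowest, highest, num_classes, unit=5):
    """Suggest an equal class width and the lower class boundaries.

    Assumed rounding rule (one of many reasonable choices):
    - width = range / num_classes, rounded UP to a multiple of `unit`
    - first lower boundary = lowest value rounded DOWN to a multiple of `unit`
    Classes are treated as [lower, lower + width), so they never overlap.
    """
    raw_width = (highest - lowest) / num_classes
    width = math.ceil(raw_width / unit) * unit
    first = math.floor(lowest / unit) * unit
    lowers = [first]
    # Keep adding classes until the highest value falls inside the last one.
    while lowers[-1] + width <= highest:
        lowers.append(lowers[-1] + width)
    return width, lowers

width, lowers = class_boundaries(14, 63, 10)
print(width, lowers[:3])  # 5 [10, 15, 20]
```

With the text example's figures (lowest 14, highest 63, 10 intervals) the raw width 4.9 rounds to 5 and the first boundary is 10. The length of `lowers` shows how many classes that width actually needs to cover the highest value, which is one input to deciding whether a different number of intervals works better.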
Text Example of FD
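• n = 50
• # of intervals = 10
• Range = 63 - 14 = 49
• Width = 49/10 = 4.9, or approximately 5
• Start the first class at 10 (14 rounded down to a convenient value); the next boundary is 10 + 5 = 15, and so on.
• Result is Table 2.7. Notice side-by-side.
• Class midpoint: the value halfway between a class's lower and upper boundaries.
• Pick a different # of intervals if it improves the FD.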
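Once the boundaries are chosen, building the FD is a counting exercise. The Python sketch below assumes classes that include the lower boundary and exclude the upper one, and reports each class's midpoint alongside its frequency. The function name and the ten data values are hypothetical, made up for illustration (they are not the text's 50 data points).

```python
def frequency_distribution(data, start, width, num_classes):
    """Count values in classes [lower, lower + width).

    Returns rows of (lower, upper, midpoint, frequency).
    """
    rows = []
    for i in range(num_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, (lower + upper) / 2, freq))
    # Every data point must land in exactly one class.
    if sum(row[3] for row in rows) != len(data):
        raise ValueError("some values fall outside the classes")
    return rows

data = [14, 18, 22, 23, 31, 35, 36, 41, 48, 63]   # made-up values
for lower, upper, mid, f in frequency_distribution(data, 10, 10, 6):
    print(f"{lower} to under {upper}  midpoint {mid}  frequency {f}")
```

The `ValueError` check is a design choice: if the classes do not cover the full range, it is better to find out immediately than to publish an FD whose frequencies do not add up to n.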
Relative FD
• Relative FDs divide each frequency by the number of points in the data set; percentage FDs multiply that proportion by 100.
• Often more intuitively useful than plain old
frequencies.
• Very useful for comparing data sets.
Requirements for comparison?
• Table 2.9, p 35.
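The conversion is a single division per class. The Python sketch below (function name hypothetical) turns two made-up frequency distributions of different sizes, assumed to share the same classes, into percentages; the raw counts look very different, but the percentage distributions turn out identical.

```python
def percentage_distribution(frequencies):
    """Convert class frequencies to relative and percentage frequencies."""
    n = sum(frequencies)
    relative = [f / n for f in frequencies]
    percentage = [100 * f / n for f in frequencies]
    return relative, percentage

# Made-up frequencies for two data sets of different sizes, same classes.
_, pct_a = percentage_distribution([2, 6, 2])      # n = 10
_, pct_b = percentage_distribution([10, 30, 10])   # n = 50
print(pct_a)
print(pct_b)
```

Computing `100 * f / n` directly, rather than multiplying the rounded proportion by 100, avoids small floating-point errors in the percentages.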
Cumulative Distribution
• Table 2.11, p 36.
• Successive addition of frequencies or
percentage frequencies.
• In other words, keep a running total of the
number or percentage of the data points that
have been used in the table.
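The running total can be sketched with `itertools.accumulate` from the Python standard library. The function name and frequencies below are hypothetical, made up for illustration. The last cumulative percentage is always 100, since every data point has been used by then.

```python
from itertools import accumulate

def cumulative_distribution(frequencies):
    """Running totals of frequencies and of percentage frequencies."""
    n = sum(frequencies)
    cumulative = list(accumulate(frequencies))
    cumulative_pct = [100 * c / n for c in cumulative]
    return cumulative, cumulative_pct

cum, cum_pct = cumulative_distribution([2, 2, 3, 2, 0, 1])   # made-up
print(cum)      # [2, 4, 7, 9, 9, 10]
print(cum_pct)  # [20.0, 40.0, 70.0, 90.0, 90.0, 100.0]
```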
Histogram
• Graphical version of a FD.
• Bar height (or bar length) represents the
frequency or percentage frequency.
• Bar widths are equal.
• Variable of interest on the horizontal axis.
• See Figure 2.8, p 37.
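To see how a histogram is laid out without a charting tool, the Python sketch below draws one in text. The classes run left to right along the horizontal axis (labeled here by class midpoints), bar height equals frequency, and every bar has the same width, with adjacent bars touching. The function name, column width, and frequencies are hypothetical choices for illustration.

```python
def text_histogram(midpoints, frequencies, col_width=5):
    """Return the lines of a vertical text histogram.

    Each class gets an equal-width column; bar height = frequency.
    Adjacent bars touch, as they do in a histogram.
    """
    lines = []
    for level in range(max(frequencies), 0, -1):
        row = "".join("#" * col_width if f >= level else " " * col_width
                      for f in frequencies)
        lines.append(row.rstrip())
    # Horizontal axis labels: the class midpoints.
    lines.append("".join(str(m).center(col_width) for m in midpoints))
    return lines

for line in text_histogram([15, 25, 35, 45, 55, 65], [2, 2, 3, 2, 0, 1]):
    print(line)
```

For percentage histograms, pass percentages rounded to whole numbers instead of raw frequencies.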
Frequency Polygons
• Plot the frequencies or percentage
frequencies (at the class midpoints) and
connect with lines. The polygon is the
shape created by this procedure.
• Variable of interest on the horizontal axis.
• Very useful for graphically comparing FDs.
• See Figure 2.10, p 39.
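The polygon's vertices are just (class midpoint, percentage) pairs. The Python sketch below also adds an empty class at each end. That is a common convention, assumed here rather than taken from the text, so the polygon starts and ends on the horizontal axis. Two made-up percentage distributions on the same classes give two vertex lists that could be plotted on one set of axes for comparison; the function name is hypothetical.

```python
def polygon_points(midpoints, percentages, class_width):
    """(x, y) vertices of a percentage polygon.

    Assumed convention: an empty class is added at each end so the
    polygon starts and ends at 0 on the horizontal axis.
    """
    xs = [midpoints[0] - class_width, *midpoints, midpoints[-1] + class_width]
    ys = [0.0, *percentages, 0.0]
    return list(zip(xs, ys))

# Made-up percentage distributions for two data sets using the same classes.
poly_a = polygon_points([15, 25, 35], [20.0, 60.0, 20.0], 10)
poly_b = polygon_points([15, 25, 35], [50.0, 30.0, 20.0], 10)
print(poly_a)
print(poly_b)
```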
Cumulative Frequency Polygon
• “ogive”
• Same basic structure:
– Variable values on the x-axis (use the class
midpoints)
– Cumulative frequencies or cumulative
percentage frequencies on the y-axis. Y-axis
should start at “0”.
– Connect the points
• Best use is for comparing FDs of 2 or more
variables.
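The ogive's points follow directly from the cumulative distribution. The Python sketch below places each cumulative percentage at the class midpoint, as in these notes. Many texts plot cumulative values at the upper class boundaries instead, so treat the x-coordinates as a convention choice. The function name and frequencies are hypothetical, made up for illustration.

```python
from itertools import accumulate

def ogive_points(midpoints, frequencies):
    """(x, cumulative percent) vertices, with x at the class midpoints."""
    n = sum(frequencies)
    running = accumulate(frequencies)
    return [(x, 100 * c / n) for x, c in zip(midpoints, running)]

# Made-up frequencies for two variables measured on the same classes.
a = ogive_points([15, 25, 35], [2, 6, 2])
b = ogive_points([15, 25, 35], [1, 2, 7])
print(a)
print(b)
```

Comparing the two at x = 25, A has already reached 80% while B has reached only 30%, so A's values are concentrated in the lower classes.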
2.4 Cross Tabulations
• Cross-tab tables or contingency tables or
cross-classification tables.
• Two or more CATEGORICAL variables.
• Pivot Table is your best friend.
• Tables 2.14 and 2.15 are the best.
• Don’t use Tables 2.16 and 2.17 in this class.
• “Chartify” the table as a side-by-side bar chart.
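A pivot table does the counting for you, but the underlying tally is simple. The Python sketch below counts each (row category, column category) pair with `collections.Counter` and prints row totals. The function name, fund types, and risk levels are hypothetical, made-up data.

```python
from collections import Counter

def cross_tab(row_var, col_var):
    """Contingency table of counts for two categorical variables."""
    counts = Counter(zip(row_var, col_var))
    row_levels = sorted(set(row_var))
    col_levels = sorted(set(col_var))
    table = {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}
    return table, row_levels, col_levels

# Made-up data: fund type and risk level for eight funds.
fund_type = ["Growth", "Value", "Growth", "Growth",
             "Value", "Value", "Growth", "Value"]
risk = ["High", "Low", "High", "Average", "Low", "Average", "Low", "Low"]

table, row_levels, col_levels = cross_tab(fund_type, risk)
print(" " * 8 + "".join(c.rjust(9) for c in col_levels) + "Total".rjust(9))
for r in row_levels:
    cells = [table[r][c] for c in col_levels]
    print(r.ljust(8) + "".join(str(n).rjust(9) for n in cells)
          + str(sum(cells)).rjust(9))
```

Each row of the printed table corresponds to one cluster of bars in a side-by-side bar chart.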
2.5 Scatter Diagrams and Time-Series Plots
Scatter Diagram
• Two NUMERICAL variables.
• “… examine possible relationships….”
• Anatomy of graphs and relationships
Time-Series Plot
• The variable on the X-axis (horizontal axis) is time.