Chapter 2
Presenting Data in Tables and Charts

2.1 Tables and Charts for Categorical Data
• Mutual Funds
• Variables? Measurement scales?

Four Techniques
1. The summary table
2. The bar chart
3. The pie chart
4. The Pareto Diagram (and Pareto Principle)

2.2 Organizing Numerical Data
• Big tables of data are difficult to fit into our minds.

Two basic techniques:
1. Ordered Array
   – For each variable, arrange the data points in order (lowest to highest, etc.). Table 2.5 shows unarranged. Table 2.6 shows arranged. Interpret.
2. Stem and Leaf
   – For each variable, separate each data point into leading digits (stems) and trailing digits (leaves)
   – E.g. “49” = “4” for stem and “9” for leaf
   – Plot (smallest is on top)
   – Example on page 30 is awful (rounding).
   – Figure 2.7 awful.
   – Example 2.5 interpretation is quite good.
   – Our problem…

2.3 Tables and Charts for Numerical Data
• Draw conclusions from a large set of data. Summarization.
• Frequency = the number of times something occurs.
• Frequency distribution = a presentation of frequencies where the data set has been arranged in groups or categories.
• The presentation may be a formula, a chart, a rule, or a table.

Frequency Distributions (FD)
• How many categories or groups, usually known as classes?
• How “wide” is each class, usually known as the class interval or class width?
• What are the boundaries of the classes?

FDs
There will be many alternate ways to make a correct FD: judgment is required.
• # of class intervals: between 5 and 15. More data points means more intervals.
• Class width = data range / # of intervals (all class widths are equal). Formula 2.1, p 33.
• Class boundaries must not overlap. Use judicious rounding to make the data easy to work with and easy to interpret.

Text Example of FD
• n = 50
• # of intervals = 10
• Range = 63 - 14 = 49
• Width = 49/10 = 4.9 or approximately 5
• 14 is approximately 10. 10 + 5 = 15. Etc.
• Result is Table 2.7.
• Notice side-by-side.
• Class Midpoint.
• Pick a different # of intervals if it improves FD.

Relative FD
• Relative and Percentage FDs are possible by dividing the frequency by the number of points in the data set.
• Often more intuitively useful than plain old frequencies.
• Very useful for comparing data sets. Requirements for comparison?
• Table 2.9, p 35.

Cumulative Distribution
• Table 2.11, p 36.
• Successive addition of frequencies or percentage frequencies.
• In other words, keep a running total of the number or percentage of the data points that have been used in the table.

Histogram
• Graphical version of a FD.
• Bar height (or bar length) represents the frequency or percentage frequency.
• Bar widths are equal.
• Variable of interest on the horizontal axis.
• See Figure 2.8, p 37.

Frequency Polygons
• Plot the frequencies or percentage frequencies (at the class midpoints) and connect with lines. The polygon is the shape created by this procedure.
• Variable of interest on the horizontal axis.
• Very useful for graphically comparing FDs.
• See Figure 2.10, p 39.

Cumulative Frequency Polygon
• “ogive”
• Same basic structure:
   – Variable values on the x-axis (use the class midpoints)
   – Cumulative frequencies or cumulative percentage frequencies on the y-axis. Y-axis should start at “0”.
   – Connect the points
• Best use is for comparing FDs of 2 or more variables.

2.4 Cross Tabulations
• Cross-tab tables or contingency tables or cross-classification tables.
• Two or more CATEGORICAL variables.
• Pivot Table is your best friend.
• Tables 2.14 and 2.15 are the best.
• Don’t use Tables 2.16 and 2.17 in this class.
• “Chartify” in side-by-side chart.

2.5 Scatter Diagrams and Time-Series Plots

Scatter Diagram
• Two NUMERICAL variables.
• “… examine possible relationships….”
• anatomy of graphs and relationships

Time-Series Plot
• Variable on X-axis or horizontal-axis is time.
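
Worked sketch for Section 2.3 (frequency, percentage, and cumulative distributions)
The steps from Section 2.3 (class width = range / # of intervals, rounded boundaries, class midpoints, percentage frequencies, running totals) can be sketched in Python as below. This is only an illustration under stated assumptions: the data values are invented and are not the textbook's data set, the function names class_width and frequency_table are hypothetical, and classes are taken as "lower but less than upper" so that boundaries do not overlap.

```python
def class_width(data, n_classes):
    """Raw class width: data range divided by the number of intervals."""
    return (max(data) - min(data)) / n_classes


def frequency_table(data, start, width):
    """Tabulate data into equal-width classes [lower, lower + width)."""
    if start > min(data):
        raise ValueError("start must not exceed the smallest data point")
    n = len(data)
    rows = []
    cumulative = 0
    lower = start
    while lower <= max(data):
        upper = lower + width
        # Count points in this class; the upper boundary belongs to the next class.
        freq = sum(1 for x in data if lower <= x < upper)
        cumulative += freq  # running total for the cumulative distribution
        rows.append({
            "lower": lower,
            "upper": upper,
            "midpoint": (lower + upper) / 2,
            "freq": freq,
            "pct": 100 * freq / n,
            "cum_freq": cumulative,
            "cum_pct": 100 * cumulative / n,
        })
        lower = upper
    return rows


# Hypothetical data, invented for illustration (NOT the textbook's data set).
data = [14, 22, 25, 27, 31, 33, 34, 36, 38, 39,
        41, 42, 44, 45, 47, 48, 51, 53, 57, 63]

raw_width = class_width(data, 10)  # (63 - 14) / 10 = 4.9
width = 5                          # judicious rounding of 4.9
start = 10                         # first boundary, chosen below the minimum of 14

rows = frequency_table(data, start, width)

print(f"{'Class':<22}{'Mid':>6}{'Freq':>6}{'Pct':>7}{'CumF':>6}{'CumPct':>8}")
for r in rows:
    label = f"{r['lower']} but less than {r['upper']}"
    print(f"{label:<22}{r['midpoint']:>6.1f}{r['freq']:>6}"
          f"{r['pct']:>7.1f}{r['cum_freq']:>6}{r['cum_pct']:>8.1f}")
```

With these invented values, the rounded boundaries produce 11 classes even though 10 intervals were targeted, which illustrates why judgment is required when choosing the # of intervals. The midpoint, pct, and cum_pct columns are the values one would plot for a frequency polygon or a cumulative frequency polygon.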