Displaying & Describing Categorical Data

To Analyze Data, Make a Picture!
• Pictures show patterns in your data that you might not otherwise see.
• Pictures make it easier for others to understand your data.
What kind of picture? It depends on the data…

First Step – Frequency Table
• Records each category name together with its count (the number of cases in that category)
• Associated Terms:
  – Proportion: Count of a given category divided by the total number of cases
  – Relative Frequency Table: Displays percentages (or proportions) instead of counts
  – Distribution: The list of categories together with the frequency (or relative frequency) of each

Frequency Table Example
• Suppose Target sells 50 stuffed animals, Walmart sells 75 stuffed animals, and Toys’R’Us sells 150 stuffed animals in one week (275 in total). The counts form the frequency table, and the percentages form the relative frequency table.

STORE        Count    Relative Freq. (%)
Target          50    18.18
Walmart         75    27.27
Toys’R’Us      150    54.55
Total          275    100.00

The Area Principle
• The area occupied by a part of the graph should correspond to the magnitude of the value it represents.
• Our eyes tend to respond more to area than to other features such as length, so…
• Neglecting the area principle is a very common way to “lie” with statistics.

Example of Abuse of Area
[Figure: pictograph of weekly sales: Target 50, Walmart 75, Toys’R’Us 150]
With these areas, Toys’R’Us looks at least 8 times as large as Target, but in reality it sells only 3 times as many stuffed animals. Deceptive? I’d say so! 10 for 1!

How could you fix it? Easy!
[Figure: the same sales (Target 50, Walmart 75, Toys’R’Us 150) redrawn so that each area is proportional to its count]

Bar Charts
• An accurate visual representation of counts of categorical data
• All bars should be exactly the same width & equally spaced
• Can be vertical or horizontal

[Figure: Bar Chart of Stuffed Animal Data]

Relative Frequency Bar Chart
• All bars have the same width and spacing
• Draws attention to proportions rather than actual counts
• Looks the same as the bar chart of counts, but the axis shows percentages instead of frequencies

[Figure: Stuffed Animal Relative Frequency Bar Chart]

Pie Charts
• The whole group of cases is represented as one circle.
• The size of each slice is proportional to the fraction of the whole in that category.

[Figure: pie chart of stuffed animal sales: Target 50, Walmart 75, Toys’R’Us 150]

Contingency Table
• Shows how individuals are distributed across two categorical variables
• The margins of the table give the totals
• Marginal Distribution: The frequency distribution of a single variable, found in the margins (row or column totals) of a contingency table

Example of Contingency Table (Reagan Class)
                      GRADE
SEX        9th    10th    11th    12th    Total
Male       540     420     352     267     1579
Female     505     400     321     275     1501
Total     1045     820     673     542     3080

Conditional Distribution
• Shows the distribution of one variable for just the individuals who satisfy some condition on another variable
• Independent: Two variables are independent when the distribution of one variable is the same for all categories of the other variable

Example of Conditional Distributions
SEX        9th      10th     11th     12th     Total
Male       540      420      352      267      1579
           34.2%    26.6%    22.3%    16.9%    100%
Female     505      400      321      275      1501
           33.6%    26.6%    21.4%    18.3%    100%

Segmented Bar Chart
• Each bar represents an entire group (100%), so the bars compare percentages instead of counts
• The bars are of equal height & divided into segments proportional to the percentage of the group in each category

Example of Segmented Bar Chart
[Figure: segmented bar chart of class (Freshmen, Sophomores, Juniors, Seniors) for Male and Female students; vertical axis 0% to 100%]

Simpson’s Paradox
• Combining (averaging) data across groups of unequal size can give misleading results: a comparison that holds within every group can reverse when the groups are pooled.
• It is usually better to compare rates within each category, since the overall average can be misleading.
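The relative frequencies in the Frequency Table Example can be checked with a few lines of Python. This is a minimal sketch: `relative_frequencies` is a hypothetical helper written for this example, not a library function. It divides each store's count by the total (275) and scales the result to a percentage.

```python
# Weekly stuffed-animal sales from the Frequency Table Example.
counts = {"Target": 50, "Walmart": 75, "Toys'R'Us": 150}

def relative_frequencies(counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

rel = relative_frequencies(counts)
for store, pct in rel.items():
    # Print count and percentage side by side, like the two tables.
    print(f"{store:<10} {counts[store]:>4} {pct:6.2f}%")
```

The percentages always add to 100 (up to floating-point rounding), which is a quick sanity check on any relative frequency table.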
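The Example of Abuse of Area can be quantified with a short sketch. It assumes (the original picture is not available here) that the pictograph enlarged each image in both height and width by the ratio of sales. Under that assumption the area ratio is the square of the sales ratio, which is in line with the slide's "at least 8 times". `apparent_area_ratio` is a hypothetical helper for illustration only.

```python
def apparent_area_ratio(value, baseline, scale_both_dimensions=True):
    """Ratio of picture areas when a picture is scaled by value / baseline.

    Scaling height AND width multiplies area by the square of the factor
    (the pictograph mistake); scaling only one dimension, as a bar chart
    does, keeps area proportional to the value.
    """
    scale = value / baseline
    return scale ** 2 if scale_both_dimensions else scale

true_ratio = 150 / 50                        # Toys'R'Us sales vs Target sales
shown_ratio = apparent_area_ratio(150, 50)   # area ratio if both dimensions triple
print(true_ratio, shown_ratio)  # 3.0 9.0
```

Scaling only one dimension (for example, bar height with a fixed bar width) keeps the area ratio equal to the true ratio of 3, which is why bar charts respect the area principle.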
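For the pie chart of the stuffed animal data, "proportional to the fraction of the whole" means each slice's angle is 360 degrees times that category's proportion. A minimal sketch, with `slice_angles` as a hypothetical helper name:

```python
# Same weekly sales as the Frequency Table Example.
counts = {"Target": 50, "Walmart": 75, "Toys'R'Us": 150}

def slice_angles(counts):
    """Degrees of the circle given to each category: 360 * (count / total)."""
    total = sum(counts.values())
    return {name: 360 * n / total for name, n in counts.items()}

angles = slice_angles(counts)
for store, deg in angles.items():
    print(f"{store}: {deg:.1f} degrees")
```

Because angle (and therefore slice area) is proportional to the count, the Toys’R’Us slice is exactly three times the Target slice, so a pie chart drawn this way satisfies the area principle.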
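The margins of the Reagan class contingency table are just row and column sums. The sketch below stores the table as a plain dictionary (a representation chosen for this example) and computes both margins plus the marginal distribution of grade as percentages of all 3080 students.

```python
# Reagan class counts from the contingency table example:
# rows are sex, columns are grade level.
grades = ["9th", "10th", "11th", "12th"]
table = {
    "Male": [540, 420, 352, 267],
    "Female": [505, 400, 321, 275],
}

# Row margin: total students of each sex.
row_totals = {sex: sum(row) for sex, row in table.items()}
# Column margin: total students in each grade (zip pairs up the columns).
col_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

# Marginal distribution of grade, as a percentage of all students.
grade_marginal = {g: 100 * t / grand_total for g, t in zip(grades, col_totals)}

print("Row totals:", row_totals)
print("Column totals:", dict(zip(grades, col_totals)), "Grand total:", grand_total)
for g, pct in grade_marginal.items():
    print(f"{g}: {pct:.1f}%")
```

Both margins must add up to the same grand total, which is an easy way to catch a typo in a hand-built table.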
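The Example of Conditional Distributions divides each row of the contingency table by its own row total rather than by the grand total. This sketch reproduces those percentages and, as a rough look at independence, reports the largest gap between the male and female distributions. Independence strictly means the distributions are identical; with real data, deciding whether a gap is "small enough" is a judgment call, and the code makes no such decision. `conditional_distribution` is a hypothetical helper name.

```python
grades = ["9th", "10th", "11th", "12th"]
table = {
    "Male": [540, 420, 352, 267],
    "Female": [505, 400, 321, 275],
}

def conditional_distribution(row):
    """Percentages within one row: the grade distribution for one sex."""
    total = sum(row)
    return [100 * n / total for n in row]

conditional = {sex: conditional_distribution(row) for sex, row in table.items()}
for sex, pcts in conditional.items():
    print(sex, [f"{p:.1f}%" for p in pcts])

# Independent variables would give matching rows; report the largest gap.
max_gap = max(abs(m - f) for m, f in zip(conditional["Male"], conditional["Female"]))
print(f"largest difference: {max_gap:.1f} percentage points")
```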
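A segmented bar chart stacks each group's conditional distribution into one bar of height 100%. The sketch below computes where each segment starts and ends on the 0% to 100% axis, using running totals from the standard-library `itertools.accumulate`. It only computes the geometry; drawing the chart (for example with a plotting library) is left out, and `segment_bounds` is a hypothetical helper name.

```python
from itertools import accumulate

grades = ["9th", "10th", "11th", "12th"]
table = {
    "Male": [540, 420, 352, 267],
    "Female": [505, 400, 321, 275],
}

def segment_bounds(row):
    """(start, end) percentages for each stacked segment of one 100% bar."""
    total = sum(row)
    ends = list(accumulate(100 * n / total for n in row))  # running totals
    starts = [0.0] + ends[:-1]                             # each segment starts where the last ended
    return list(zip(starts, ends))

for sex, row in table.items():
    bounds = segment_bounds(row)
    print(sex, [(g, round(s, 1), round(e, 1)) for g, (s, e) in zip(grades, bounds)])
```

Every bar ends at 100%, so the bars have equal height no matter how many students are in each group, which is what lets the chart compare percentages rather than counts.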
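Simpson’s Paradox is easiest to see with numbers. The counts below are HYPOTHETICAL, invented only to show the effect; they are not from this lesson's data. Option A has the higher success rate within each group, yet the lower rate once the groups are pooled, because most of A's attempts fall in the harder group.

```python
# HYPOTHETICAL (successes, attempts) for options A and B within two groups,
# invented only to illustrate Simpson's paradox.
data = {
    "small": {"A": (9, 10), "B": (80, 100)},
    "large": {"A": (30, 100), "B": (2, 10)},
}

def rate(successes, attempts):
    return successes / attempts

# Within each group, A has the higher success rate.
for group, options in data.items():
    print(group, {opt: f"{rate(*c):.0%}" for opt, c in options.items()})

# Pooling the groups (adding the counts) reverses the comparison,
# because A's attempts are concentrated in the harder "large" group.
pooled = {
    opt: tuple(sum(data[g][opt][i] for g in data) for i in (0, 1))
    for opt in ("A", "B")
}
print("overall", {opt: f"{rate(*c):.1%}" for opt, c in pooled.items()})
```

Comparing within each group gives the fair answer here, which is the point of the slide's advice to compare within categories.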