Categorical Data
AP Statistics: Chapter 3

MAKE A PICTURE!

First, create a frequency table.

Example: number of students at CB South in each grade:

Grade    10     11     12     TOTAL
Count    534    552    515    1601

Proportion = decimal: .30, .05
Percent = %: 30%, 5%
Frequency = # of things (count)
Relative frequency = % of things

Distribution (of a variable): shows the values of the variable and how often the sample takes each value
Examples: bar chart, pie chart, histogram, stemplot, etc.

Categorical Distributions:

1. Bar Chart
[Figure: two bar charts by grade (9th, 10th, 11th, 12th). "Number of Students Taking AP Exam" has frequency (#) on the y-axis; "Percent of Students Taking AP Exam" has relative frequency (%) on the y-axis.]
Notice the spaces in between bars.

2. Pie Chart
[Figure: pie chart, "Percent of Students Taking AP Exam by Grade Level": 9th 6%, 10th 19%, 11th 32%, 12th 43%.]
Be sure to use labels and percents!

3. Contingency tables (aka 2-way tables)

          Frosh    Soph    Junior    Senior    Total
Male
Female
Total

The inner entries are the cells; the Total row and Total column are the margins.

Identify:
Row variable: gender
Column variable: grade
Values of the variable: the different rows/columns
Total (n): bottom right of chart
# of cells: 8 (don't count totals)
Totals: margins

Example: Hospitals

            Hospital A    Hospital B    Total
Died              63            16         79
Survived        2037           784       2821
Total           2100           800       2900

What percent of people died?
P(D) = 79/2900 ≈ .0272 = 2.72%

Notation:
Probability: P(event)
Given/Of: P(A | B), the probability of A given B
And: P(A ∩ B) (overlap)
Or: P(A ∪ B)

Of those people that went to Hospital A, what percent died?
P(D | A) = 63/2100 = .03 = 3%

Given that someone went to Hospital B, what is the chance that they died?
P(D | B) = 16/800 = .02 = 2%

Of those people who died, what percent went to Hospital A?
P(A | D) = 63/79 ≈ .7975 = 79.75%

What percent of people died and went to Hospital B?
P(D ∩ B) = 16/2900 ≈ .0055 = .55%

What percent of people survived or went to Hospital B?
P(S ∪ B) = (2821 + 16)/2900 = 2837/2900 ≈ .9783 = 97.83%

2 types of Distributions for Categorical Variables

1) MARGINAL DISTRIBUTIONS
How to make: Convert totals into percentages

Example: Hair color vs. Gender

          Brown    Blonde    Black    Red    Total
MALE        26       24        10      3       63
FEMALE      20       35        12      6       73
TOTALs      46       59        22      9      136

- Find the marginal distribution for the HAIR COLOR variable
Brown:  46/136 = 33.82%
Blonde: 59/136 = 43.38%
Black:  22/136 = 16.18%
Red:     9/136 =  6.62%

- Find the marginal distribution for the GENDER variable
Male:   63/136 = 46.32%
Female: 73/136 = 53.68%

MAKE A PICTURE! BAR CHART
[Figure: two bar charts of the marginal distributions (taken from the margins), y-axis in %. One has Hair Color on the x-axis (Brown, Blonde, Black, Red); the other has Gender on the x-axis (Male, Female).]

2) CONDITIONAL DISTRIBUTIONS
Look at … one variable
Then look at … each value of the variable individually
Break down … each value into its pieces
ALWAYS … in %

Example: Hair Color vs. Gender (same table as above)

- Find the conditional distribution for the HAIR COLOR variable (split each hair color by gender)

          Brown             Blonde            Black             Red
Male      26/46 = 56.52%    24/59 = 40.68%    10/22 = 45.45%    3/9 = 33.33%
Female    20/46 = 43.48%    35/59 = 59.32%    12/22 = 54.55%    6/9 = 66.67%

- Find the conditional distribution for the GENDER variable (split each gender by hair color)

          Male              Female
Brown     26/63 = 41.27%    20/73 = 27.40%
Blonde    24/63 = 38.10%    35/73 = 47.95%
Black     10/63 = 15.87%    12/73 = 16.44%
Red        3/63 =  4.76%     6/73 =  8.22%

Represented visually: SEGMENTED (or STACKED) BAR GRAPH
o Each bar = 100%
o Values of variable on the x-axis
o Bars are segmented into parts of each value
[Figure: two segmented bar graphs, y-axis in %. One has Hair Color on the x-axis (Brown, Blonde, Black, Red) with each bar segmented into Male and Female; the other has Gender on the x-axis (Male, Female) with each bar segmented into Brown, Blonde, Black, and Red.]

Independence: when one variable does not affect the other variable.
How do we tell independence? Independence exists when the conditional distributions look the same across all values of the variable (when the sections look approximately the same). As a rule of thumb, there is generally less than a 5% difference between percentages. When categorical variables are dependent, they are said to be associated.
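The marginal and conditional calculations above, and the 5% rule of thumb for independence, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table is stored as a dictionary of row counts, and the names COUNTS, HAIR_COLORS, marginal_by_row, marginal_by_column, conditional_within_rows, and max_gap are made up for this sketch, not part of the notes. The gap check is the informal rule of thumb from the notes, not a formal statistical test.

```python
# Counts from the hair color vs. gender table in the notes.
HAIR_COLORS = ["Brown", "Blonde", "Black", "Red"]
COUNTS = {
    "Male": [26, 24, 10, 3],
    "Female": [20, 35, 12, 6],
}


def grand_total(counts):
    """n: the sum of every cell (bottom-right corner of the table)."""
    return sum(sum(row) for row in counts.values())


def marginal_by_row(counts):
    """Marginal distribution of the row variable: row totals as % of n."""
    n = grand_total(counts)
    return {label: 100 * sum(row) / n for label, row in counts.items()}


def marginal_by_column(counts, columns):
    """Marginal distribution of the column variable: column totals as % of n."""
    n = grand_total(counts)
    return {
        col: 100 * sum(row[j] for row in counts.values()) / n
        for j, col in enumerate(columns)
    }


def conditional_within_rows(counts, columns):
    """For each row value, the % of that row falling in each column."""
    return {
        label: {col: 100 * c / sum(row) for col, c in zip(columns, row)}
        for label, row in counts.items()
    }


def max_gap(conditional):
    """Largest difference, in percentage points, between rows for any column."""
    rows = list(conditional.values())
    return max(
        max(r[col] for r in rows) - min(r[col] for r in rows)
        for col in rows[0]
    )


hair_marginal = marginal_by_column(COUNTS, HAIR_COLORS)
gender_marginal = marginal_by_row(COUNTS)
by_gender = conditional_within_rows(COUNTS, HAIR_COLORS)

for color, pct in hair_marginal.items():
    print(f"{color}: {pct:.2f}%")
for gender, pct in gender_marginal.items():
    print(f"{gender}: {pct:.2f}%")
for gender, dist in by_gender.items():
    print(gender, {color: f"{pct:.2f}%" for color, pct in dist.items()})

# Rule of thumb from the notes: gaps under about 5 points suggest independence.
gap = max_gap(by_gender)
print(f"Largest gap: {gap:.2f} points ->",
      "looks independent" if gap < 5 else "looks associated")
```

With the hair color table, the printed percentages match the hand calculations above (for example, 41.27% of males and 27.40% of females have brown hair). That difference is well over 5 percentage points, so by the rule of thumb the two variables look associated.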
[Figures: one example labeled "Independent" and one labeled "Dependent".]

AP Stat - Worksheet 3A - Categorical Variables practice

In a survey of adult Americans, people were asked to indicate their age and to categorize their political preference (liberal, moderate, conservative). The results are as follows:

            Liberal    Moderate    Conservative    Total
under 30       83         140           73           296
30 - 50       119         280          161           560
over 50        88         284          214           586
total         290         704          448          1442

1. What are the row and column variables?
2. What percent of Liberals are under 30?
3. Of those over 50, what percent are Liberals?
4. Of those that are moderates, what percent are 30-50?
5. What percent of respondents are moderate and under 30?
6. Calculate the marginal distribution for the AGE variable. Write these down. Then make a bar graph of the marginal distribution for age.
7. Calculate the marginal distribution for the PREFERENCE variable. Write these down. Then make a bar graph of this marginal distribution.
8. Calculate the conditional distribution of the AGE variable. Write these down. Then make a segmented bar graph of this conditional distribution.
9. Calculate the conditional distribution of the PREFERENCE variable. Write these down. Then make a segmented bar graph of this conditional distribution.
10. Are the two variables independent?

AP Stat - Worksheet 3B - Categorical Variables practice

A 4-year study of men more than 70 years old, reported in The New York Times, analyzed blood cholesterol and noted how many men with different cholesterol levels suffered nonfatal or fatal heart attacks.

Nonfatal heart attacks Fatal heart attacks Low cholesterol Medium cholesterol High cholesterol 29 17 18 19 20 9

a. Calculate the marginal distribution for cholesterol level and make a bar graph.
b. Calculate the marginal distribution for severity of heart attack and make a bar graph.
c. Calculate three conditional distributions for the three levels of cholesterol and make a stacked bar graph.
d. Calculate the conditional distributions for the type of heart attack and make a stacked bar graph.
e. Are the two variables independent?