+ Chapter 1: Exploring Data
Section 1.1 Analyzing Categorical Data

+ Categorical Variables
Categorical variables place individuals into one of several groups or categories.
The values of a categorical variable are labels for the different categories.
The distribution of a categorical variable lists the count or percent of individuals who fall into each category.

Example, page 8

Frequency Table
Format                Count of Stations
Adult Contemporary     1556
Adult Standards        1196
Contemporary Hit        569
Country                2066
News/Talk              2179
Oldies                 1060
Religious              2014
Rock                    869
Spanish Language        750
Other Formats          1579
Total                 13838

Relative Frequency Table
Format                Percent of Stations
Adult Contemporary     11.2
Adult Standards         8.6
Contemporary Hit        4.1
Country                14.9
News/Talk              15.7
Oldies                  7.7
Religious              14.6
Rock                    6.3
Spanish Language        5.4
Other Formats          11.4
Total                  99.9

Here the variable is the station format, its values are the format names, and the tables give the count and the percent for each value.

+ Displaying Categorical Data
Frequency tables can be difficult to read. Sometimes it is easier to analyze a distribution by displaying it with a bar graph or pie chart.

+ Graphs: Good and Bad
Bar graphs compare several quantities by comparing the heights of bars that represent those quantities.
Our eyes react to the area of the bars as well as their height. Be sure to make your bars equally wide.
Avoid the temptation to replace the bars with pictures for greater appeal.
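As a rough illustration of how the relative frequency table follows from the frequency table, the sketch below divides each station count by the total and prints one text bar per format. The counts are copied from the radio-format example; the function names `relative_frequencies` and `text_bar_graph` and the `scale` parameter are illustrative choices, not part of the original slides.

```python
# Station counts by radio format, copied from the frequency table in the example.
counts = {
    "Adult Contemporary": 1556,
    "Adult Standards": 1196,
    "Contemporary Hit": 569,
    "Country": 2066,
    "News/Talk": 2179,
    "Oldies": 1060,
    "Religious": 2014,
    "Rock": 869,
    "Spanish Language": 750,
    "Other Formats": 1579,
}


def relative_frequencies(counts):
    """Return each category's share of the total, in percent (unrounded)."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


def text_bar_graph(percents, scale=2):
    """Build one text bar per category.

    Every bar starts at zero and is one text row thick, so only its
    length changes with the percent. `scale` is characters per percent
    (a hypothetical display setting).
    """
    lines = []
    for category, pct in percents.items():
        bar = "#" * round(pct * scale)
        lines.append(f"{category:<20}{bar} {pct:.1f}%")
    return lines


percents = relative_frequencies(counts)
for line in text_bar_graph(percents):
    print(line)
```

Rounding each percent to one decimal place gives values that add to 99.9 rather than 100, which is the roundoff visible in the Total row of the relative frequency table.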
+ Graphs: Good and Bad
The scale of a bar graph should always start at zero.

+ Two-Way Tables and Marginal Distributions
When a dataset involves two categorical variables, we begin by examining the counts or percents in various categories for one of the variables.

Definition: A two-way table describes two categorical variables, organizing counts according to a row variable and a column variable.

What are the variables described by this two-way table? How many young adults were surveyed?

Definition: The marginal distribution of one of the categorical variables in a two-way table of counts is the distribution of values of that variable among all individuals described by the table.

Note: Percents are often more informative than counts, especially when comparing groups of different sizes.

To examine a marginal distribution,
1) Use the data in the table to calculate the marginal distribution (in percents) of the row or column totals.
2) Make a graph to display the marginal distribution.

Examine the marginal distribution of chance of getting rich.

Chance of being wealthy by age 30
Response             Percent
Almost no chance      194/4826 =  4.0%
Some chance           712/4826 = 14.8%
A 50-50 chance       1416/4826 = 29.3%
A good chance        1421/4826 = 29.4%
Almost certain       1083/4826 = 22.4%

[Bar graph: percent (vertical axis, 0 to 35) by survey response (Almost none, Some chance, 50-50 chance, Good chance, Almost certain)]

+ Check Your Understanding, page 14
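The first step for examining a marginal distribution can be sketched in a few lines: total each row of the two-way table, then divide by the grand total. The male and female counts below are the numerators shown in this section's conditional distribution example; the dictionary layout and the function name `marginal_distribution` are assumptions made for illustration.

```python
# Two-way table of counts: survey response (rows) by gender (columns).
# Counts are taken from the conditional distribution example in this section.
two_way = {
    "Almost no chance": {"Male": 98, "Female": 96},
    "Some chance": {"Male": 286, "Female": 426},
    "A 50-50 chance": {"Male": 720, "Female": 696},
    "A good chance": {"Male": 758, "Female": 663},
    "Almost certain": {"Male": 597, "Female": 486},
}


def marginal_distribution(table):
    """Return the marginal distribution of the row variable, in percent."""
    # Row totals ignore the column variable entirely.
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    grand_total = sum(row_totals.values())
    return {row: 100 * total / grand_total for row, total in row_totals.items()}


marginal = marginal_distribution(two_way)
for response, pct in marginal.items():
    print(f"{response:<18}{pct:5.1f}%")
```

Because the row totals add the two columns together, the result carries no information about how the responses differ between the groups.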
+ Relationships Between Categorical Variables
Marginal distributions tell us nothing about the relationship between two variables.

Definition: A conditional distribution of a variable describes the values of that variable among individuals who have a specific value of another variable.

To examine or compare conditional distributions,
1) Select the row(s) or column(s) of interest.
2) Use the data in the table to calculate the conditional distribution (in percents) of the row(s) or column(s).
3) Make a graph to display the conditional distribution.
   • Use a side-by-side bar graph or segmented bar graph to compare distributions.

+ Two-Way Tables and Conditional Distributions
Calculate the conditional distribution of opinion among males. Examine the relationship between gender and opinion.

Chance of being wealthy by age 30
Response             Male                 Female
Almost no chance      98/2459 =  4.0%      96/2367 =  4.1%
Some chance          286/2459 = 11.6%     426/2367 = 18.0%
A 50-50 chance       720/2459 = 29.3%     696/2367 = 29.4%
A good chance        758/2459 = 30.8%     663/2367 = 28.0%
Almost certain       597/2459 = 24.3%     486/2367 = 20.5%

[Graphs: Chance of being wealthy by age 30. A side-by-side bar graph (percent by response for males and females) and a segmented bar graph (0% to 100%, one bar each for males and females)]

Can we say there is an association between gender and opinion in the population of young adults? Making this determination requires formal inference, which will have to wait a few chapters.
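The conditional distributions in the table above can be recomputed with a short sketch: pick one column, then divide each of its counts by that column's total. The counts come from the table; the function name `conditional_distribution` and the difference column are illustrative additions, and the comparison is purely descriptive for this sample.

```python
# Two-way table of counts: survey response (rows) by gender (columns).
two_way = {
    "Almost no chance": {"Male": 98, "Female": 96},
    "Some chance": {"Male": 286, "Female": 426},
    "A 50-50 chance": {"Male": 720, "Female": 696},
    "A good chance": {"Male": 758, "Female": 663},
    "Almost certain": {"Male": 597, "Female": 486},
}


def conditional_distribution(table, column):
    """Distribution of the row variable among individuals in one column, in percent."""
    column_total = sum(cells[column] for cells in table.values())
    return {row: 100 * cells[column] / column_total for row, cells in table.items()}


male = conditional_distribution(two_way, "Male")
female = conditional_distribution(two_way, "Female")

# Side-by-side comparison; "Diff" is male percent minus female percent.
print(f"{'Response':<18}{'Male':>8}{'Female':>8}{'Diff':>8}")
for response in two_way:
    diff = male[response] - female[response]
    print(f"{response:<18}{male[response]:>7.1f}%{female[response]:>7.1f}%{diff:>+8.1f}")
```

Each conditional distribution uses its own column total (2459 males, 2367 females) as the denominator, so the two sets of percents can be compared even though the groups differ in size.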
+ Two-Way Tables and Conditional Distributions
No association means the conditional distributions of opinion about becoming rich would be the same for males and females.

+ Check Your Understanding, page 18
12, 19, 21

+ Section 1.1 Homework, pages 20-24