DID YOU SIGN UP FOR MY STAT LAB?
  1. Yes
  2. No

ANNOUNCEMENTS
  Homework #1 due Sunday at 11:59pm
  Quiz #1 in class Jan. 24th
  Part 1 of the Data Project due Jan. 31st

REVIEW FROM LAST CLASS
A categorical (or qualitative) variable names categories and answers questions about how cases fall into those categories.
A quantitative variable is a measured variable (with units) that answers questions about the quantity of what is being measured.
Quantitative examples: income ($), height (inches), weight (pounds)

REVIEW FROM LAST CLASS
Ordinal variables: there are no natural units for the variable (for example, interest in teaching), but the order of the values reveals information.
Identifier variables are categorical variables with exactly one individual in each category.

HOMEWORK PROBLEM
We want to study the law of demand and whether it applies to hot dogs.
Compile a list of 20 hot dogs, giving the brand, price, size in ounces, type (beef, pork, turkey, vegetarian), and overall taste rating (good, fair, bad).
Implement the survey on Monday and Wednesday at 5 different grocery stores and also collect the daily sales.

WHAT TYPE OF VARIABLE IS BRAND?
  1. Categorical
  2. Quantitative
  3. Ordinal
  4. Identifier

WHAT TYPE OF VARIABLE IS PRICE?
  1. Categorical
  2. Quantitative
  3. Ordinal
  4. Identifier

WHAT TYPE OF VARIABLE IS OVERALL TASTE RATING (GOOD, FAIR, BAD)?
  1. Categorical
  2. Quantitative
  3. Ordinal
  4. Identifier

WHAT TYPE OF VARIABLE IS DAILY SALES?
  1. Categorical
  2. Quantitative
  3. Ordinal
  4. Identifier

CHAPTER 3
Displaying and Describing Categorical Data
Two datasets:
  Students currently in my class
  Passengers on the Titanic

METHODS OF DISPLAYING DATA
  Frequency Table
  Relative Frequency table
  Bar Chart
  Relative Frequency bar chart
  Pie Chart
  Contingency table
  Contingency tables and Conditional Distributions
  Segmented Bar charts

DATA ON STUDENTS
             Gender   Year in School   Major       My Class
  Kim B.     Female   Sr.              Elem. Ed.   ECO 138 Section 1
  Stacie M.  Female   So.              Math        ECO 138 Section 1
  Tom A.     Male     Gr.              Econ        ECO 435 Section 1
  Tim B.     Male     Gr.              Econ        ECO 435 Section 1
  Kelly Y.   Male     Gr.              Econ        ECO 435 Section 2
  …

FREQUENCY TABLES: MAKING PILES
We can “pile” the data by counting the number of data values in each category of interest.
We can organize these counts into a frequency table, which records the totals and the category names.

           ECO 138
  Male     22
  Female   11
  Total    33

FREQUENCY TABLES: MAKING PILES (CONT.)
A relative frequency table is similar, but gives the percentages (instead of counts) for each category.

           ECO 138
  Male     22 / 33 * 100 = 66.67%
  Female   11 / 33 * 100 = 33.33%
  Total    33 / 33 * 100 = 100%

BAR CHARTS
A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.
A bar chart stays true to the area principle.
Thus, a better display for the ship data is:

BAR CHARTS (CONT.)
A relative frequency bar chart displays the relative proportion of counts for each category.
A relative frequency bar chart also stays true to the area principle.
Replacing counts with percentages in the ship data:

WHAT YEAR IN SCHOOL ARE YOU?
  1. Freshman    61%
  2. Sophomore   17%
  3. Junior      17%
  4. Senior       6%

PIE CHARTS
When you are interested in parts of the whole, a pie chart might be your display of choice.
Pie charts show the whole group of cases as a circle.
They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.

METHODS OF DISPLAYING DATA
  Frequency Table (How much?)
  Relative Frequency table (What percentage?)
  Bar Chart (How much?)
  Relative Frequency bar chart (What percentage?)
  Pie Chart (What percentage? Or How much?)
  Contingency table and Marginal Distributions
  Contingency tables and Conditional Distributions

CONTINGENCY TABLES
A contingency table allows us to look at two categorical variables together.
It shows how individuals are distributed along each variable, contingent on the value of the other variable.
Example: we can examine the class of ticket and whether a person survived the Titanic:

CONTINGENCY TABLES (CONT.)
Each cell of the table gives the count for a combination of values of the two variables.
For example, the second cell in the crew column tells us that 673 crew members died when the Titanic sank.

CONTINGENCY TABLES
The two variables in this contingency table are gender and class/section number.

           ECO 138 – Section 1   ECO 435 – Section 1   Total
  Male     22                    4                     26
  Female   11                    3                     14
  Total    33                    7                     40

CONTINGENCY TABLES (CONT.)
The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables.
Each frequency distribution is called a marginal distribution of its respective variable.

MARGINAL DISTRIBUTIONS
The two variables in this contingency table are gender and class/section number.

           ECO 138 – Section 1    ECO 435 – Section 1   Total
  Male     22/40*100 = 55%        4/40*100 = 10%        26/40*100 = 65%
  Female   11/40*100 = 27.5%      3/40*100 = 7.5%       14/40*100 = 35%
  Total    33/40*100 = 82.5%      7/40*100 = 17.5%      40/40*100 = 100%

CONDITIONAL DISTRIBUTIONS
A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable.
The following is the conditional distribution of ticket Class, conditional on having survived:

CONDITIONAL DISTRIBUTIONS (CONT.)
The following is the conditional distribution of ticket Class, conditional on having perished:

CONDITIONAL DISTRIBUTIONS – CONDITIONED UPON GENDER
The two variables in this contingency table are gender and class/section number.

           ECO 138 – Section 1    ECO 435 – Section 1   Total
  Male     22/26*100 = 84.6%      4/26*100 = 15.4%      26/26*100 = 100%

CONDITIONAL DISTRIBUTIONS – CONDITIONED UPON GENDER
The two variables in this contingency table are gender and class/section number.

           ECO 138 – Section 1    ECO 435 – Section 1   Total
  Female   11/14*100 = 78.6%      3/14*100 = 21.4%      14/14*100 = 100%

CONDITIONAL DISTRIBUTIONS – CONDITIONED UPON CLASS
The two variables in this contingency table are gender and class/section number.

           ECO 138 – Section 1
  Male     22/33*100 = 66.7%
  Female   11/33*100 = 33.3%
  Total    33/33*100 = 100%

CONDITIONAL DISTRIBUTIONS – CONDITIONED UPON CLASS
The two variables in this contingency table are gender and class/section number.

           ECO 435 – Section 1
  Male     4/7*100 = 57.1%
  Female   3/7*100 = 42.9%
  Total    7/7*100 = 100%

WHAT CAN GO WRONG? (CONT.)
Don’t confuse similar-sounding percentages; pay particular attention to the wording of the context.
  The percentage of students that are female & in ECO 138 Section 1 (cell distribution)
  The percentage of females that are in ECO 138 Section 1 (conditioned upon females)
  The percentage of ECO 138 Section 1 students that are female (conditioned upon ECO 138 Section 1)

CONDITIONAL DISTRIBUTIONS (CONT.)
The conditional distributions tell us that there is a difference in class for those who survived and those who perished.
This is better shown with pie charts of the two distributions:

IF YOU ARE MALE, WHAT YEAR IN SCHOOL ARE YOU?
  1. Fr.
  2. So.
  3. Jr.
  4. Sr.

IF YOU ARE FEMALE, WHAT YEAR IN SCHOOL ARE YOU?
  1. Fr.
  2. So.
  3. Jr.
  4. Sr.

CONDITIONAL DISTRIBUTIONS (CONT.)
We see that the distribution of Class/Section for males is different from that for females.
This leads us to believe that Class/Section and Gender are associated; that is, they are not independent.
The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.

SEGMENTED BAR CHARTS
A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles.
Here is the segmented bar chart for ticket Class by Survival status:

WHICH OF THE COMPARISONS DO YOU CONSIDER MOST VALID?
  1. Overall average, because it does not differentiate between the four programs.
  2. Individual program comparisons, because they take into account the different number of applicants and admission rates for each of the four programs.
  3. Overall average, because it takes into account the differences in number of applicants and admission rates for each of the four programs.
  Poll percentages shown on the slide: 93%, 7%, 0%

DID YOU SIGN UP FOR MYLAB AND WHAT IS YOUR GENDER?
  1. Female – Yes   31%
  2. Female – No    31%
  3. Male – Yes     23%
  4. Male – No      15%

NEXT TIME…
Chapter 4 – Displaying Quantitative Data
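
WORKED RECAP (SKETCH)
As a recap of the gender-by-class example from this chapter, the following Python sketch recomputes the marginal and conditional distributions from the four cell counts in the contingency table (22, 4, 11, 3). It is only an illustration: the names counts, pct, section_given_gender, and so on are assumptions made for this sketch and do not come from the slides, and the rounding to one decimal place simply mirrors how the slides report percentages.

```python
# Cell counts copied from the gender-by-class contingency table in the slides.
# All variable and function names below are illustrative, not from the slides.
counts = {
    "Male": {"ECO 138 - Section 1": 22, "ECO 435 - Section 1": 4},
    "Female": {"ECO 138 - Section 1": 11, "ECO 435 - Section 1": 3},
}


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


genders = list(counts)
sections = list(counts["Male"])

# Margins of the table: row totals, column totals, and the grand total.
row_totals = {g: sum(counts[g].values()) for g in genders}
col_totals = {s: sum(counts[g][s] for g in genders) for s in sections}
grand_total = sum(row_totals.values())

# Marginal distributions: each margin total as a percentage of all students.
marginal_gender = {g: pct(row_totals[g], grand_total) for g in genders}
marginal_section = {s: pct(col_totals[s], grand_total) for s in sections}

# Conditional distribution of section, conditioned upon gender (row percentages).
section_given_gender = {
    g: {s: pct(counts[g][s], row_totals[g]) for s in sections} for g in genders
}

# Conditional distribution of gender, conditioned upon section (column percentages).
gender_given_section = {
    s: {g: pct(counts[g][s], col_totals[s]) for g in genders} for s in sections
}

# Three similar-sounding percentages, all built from the same cell (11 students).
cell = counts["Female"]["ECO 138 - Section 1"]
female_and_138 = pct(cell, grand_total)                   # cell distribution
female_in_138 = pct(cell, row_totals["Female"])           # conditioned upon females
eco138_female = pct(cell, col_totals["ECO 138 - Section 1"])  # conditioned upon ECO 138

# The slides read differing conditional distributions as a sign of association.
distributions_differ = (
    section_given_gender["Male"] != section_given_gender["Female"]
)

print("Marginal (gender):", marginal_gender)
print("Marginal (section):", marginal_section)
print("Section given gender:", section_given_gender)
print("Gender given section:", gender_given_section)
print("Female and in ECO 138:", female_and_138)
print("Females who are in ECO 138:", female_in_138)
print("ECO 138 students who are female:", eco138_female)
print("Conditional distributions differ:", distributions_differ)
```

The three percentages at the end share the same numerator (11) and differ only in the denominator (40, 14, or 33), which is the distinction the WHAT CAN GO WRONG? slide warns about. A plain dictionary of counts is used here to keep the sketch dependency-free; a data-frame library would be a reasonable alternative for larger tables.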