DID YOU SIGN UP FOR MY STAT LAB?

1. Yes
2. No

ANNOUNCEMENTS

Homework #1 due Sunday at 10:00 pm
Quiz #1 in class August 28th
Part 1 of the Data Project due September 4th

DATA PROJECT

Objective: Ask a question and try to answer it using statistics.
Step 1: DATA COLLECTION – Due Wednesday September 4th in class
Step 2: DESCRIPTION OF DATA – Due Monday September 16th in class
Step 3: QUESTIONS – Due Monday October 28th in class
Step 4: FINAL DATA PROJECT – Due by Thursday December 5th, 5 PM

COLLECT DATA

Bureau of Labor Statistics (BLS): http://bls.gov/
Energy Information Administration (EIA): http://www.eia.gov/
Bureau of Economic Analysis (BEA): http://www.bea.gov/
Environmental Protection Agency (EPA): http://epa.gov/
U.S. Census Bureau: http://www.census.gov/
Google Data: http://www.google.com/publicdata/directory

REVIEW FROM LAST CLASS

A categorical (or qualitative) variable names categories and answers questions about how cases fall into those categories.
A quantitative variable is a measured variable (with units) that answers questions about the quantity of what is being measured.
Quantitative examples: income ($), height (inches), weight (pounds)

REVIEW FROM LAST CLASS (CONT.)

Ordinal variables: there are no natural units for a variable such as interest in teaching, but the order of the values carries information.
Identifier variables are categorical variables with exactly one individual in each category.

HOMEWORK PROBLEM

We want to study the law of demand and whether it applies to hot dogs.
Compile a list of 20 hot dogs, giving the brand, price, size in ounces, type (beef, pork, turkey, vegetarian), and overall taste rating (good, fair, bad).
Conduct the survey on Monday and Wednesday at 5 different grocery stores and also collect the daily sales.

WHAT TYPE OF VARIABLE IS BRAND?

1. Categorical
2. Quantitative
3. Ordinal
4. Identifier

WHAT TYPE OF VARIABLE IS PRICE?
1. Categorical
2. Quantitative
3. Ordinal
4. Identifier

WHAT TYPE OF VARIABLE IS OVERALL TASTE RATING (GOOD, FAIR, BAD)?

1. Categorical
2. Quantitative
3. Ordinal
4. Identifier

WHAT TYPE OF VARIABLE IS DAILY SALES?

1. Categorical
2. Quantitative
3. Ordinal
4. Identifier

WHERE, WHEN, AND HOW

When and Where give us useful information about the context.
Example: Values recorded at a large public university may mean something different than similar values recorded at a small private college.

WHERE, WHEN, AND HOW (CONT.)

Class grade of Econ 101 classes:
Class 1 – 2.56
Class 2 – 3.34
Where – Washington State University
When – during the fall and spring semesters

WHERE, WHEN, AND HOW (CONT.)

How the data are collected can make the difference between insight and nonsense.
Example: Results from voluntary Internet surveys are often useless.
Example: Data collection for "Who will win the Republican primary?"
Survey ISU students on campus
Run a Facebook survey
Rasmussen Reports national telephone survey

IDENTIFY THE WHO IN THE FOLLOWING DATASET

Are physically fit people less likely to die of cancer?
Suppose an article in a sports medicine journal reported results of a study that followed 22,563 men aged 30 to 87 for 5 years. The physically fit men had a 57% lower risk of death from cancer than the least fit group.

WHO ARE THEY STUDYING?

1. The cause of death for 22,563 men in the study
2. The fitness level of the 22,563 men in the study
3. The age of each of the 22,563 men in the study
4. The 22,563 men in the study

ARE FIT PEOPLE LESS LIKELY TO DIE OF CANCER? WHO IS THE POPULATION OF INTEREST?

1. All people
2. All men who exercise
3. All men who die of cancer
4. All men

CHAPTER 3

Displaying and Describing Categorical Data

Two datasets:
Students currently in my class
Passengers on the Titanic
METHODS OF DISPLAYING DATA

Frequency table
Relative frequency table
Bar chart
Relative frequency bar chart
Pie chart
Contingency table
Contingency tables and conditional distributions
Segmented bar charts

DATA ON STUDENTS

Student      Gender   Year in School   Major       My Class
Kim B.       Female   Sr.              Elem. Ed.   ECO 138 Section 1
Stacie M.    Female   So.              Math        ECO 138 Section 1
Tom A.       Male     Gr.              Econ        ECO 435 Section 1
Tim B.       Male     Gr.              Econ        ECO 435 Section 1
Kelly Y.     Male     Gr.              Econ        ECO 435 Section 2
…

FREQUENCY TABLES: MAKING PILES

We can "pile" the data by counting the number of data values in each category of interest.
We can organize these counts into a frequency table, which records the totals and the category names.

ECO 138
Male     22
Female   11
Total    33

FREQUENCY TABLES: MAKING PILES (CONT.)

A relative frequency table is similar, but gives the percentages (instead of counts) for each category.

ECO 138
Male     22 / 33 * 100 = 66.67%
Female   11 / 33 * 100 = 33.33%
Total    33 / 33 * 100 = 100%

BAR CHARTS

A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.
A bar chart stays true to the area principle.
Thus, a better display for the ship data is:

BAR CHARTS (CONT.)

A relative frequency bar chart displays the relative proportion of counts for each category.
A relative frequency bar chart also stays true to the area principle.
Replacing counts with percentages in the ship data:

WHAT YEAR IN SCHOOL ARE YOU?

1. Freshman
2. Sophomore
3. Junior
4. Senior

Poll results shown on the slide: 61%, 17%, 17%, 6%

PIE CHARTS

When you are interested in parts of the whole, a pie chart might be your display of choice.
Pie charts show the whole group of cases as a circle.
They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.

METHODS OF DISPLAYING DATA

Frequency table (How much?)
Relative frequency table (What percentage?)
Bar chart (How much?)
Relative frequency bar chart (What percentage?)
Pie chart (What percentage? Or how much?)
Contingency table and marginal distributions
Contingency tables and conditional distributions

CONTINGENCY TABLES

A contingency table allows us to look at two categorical variables together.
It shows how individuals are distributed along each variable, contingent on the value of the other variable.
Example: we can examine the class of ticket and whether a person survived the Titanic:

CONTINGENCY TABLES (CONT.)

Each cell of the table gives the count for a combination of values of the two variables.
For example, the second cell in the crew column tells us that 673 crew members died when the Titanic sank.

CONTINGENCY TABLES (CONT.)

The two variables in this contingency table are gender and class/section number.

          ECO 138 – Section 1   ECO 435 – Section 1   Total
Male      22                    4                     26
Female    11                    3                     14
Total     33                    7                     40

CONTINGENCY TABLES (CONT.)

The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables.
Each frequency distribution is called a marginal distribution of its respective variable.

MARGINAL DISTRIBUTIONS

The two variables in this contingency table are gender and class/section number. Each entry is a percentage of all 40 students; the Total row and Total column hold the marginal distributions.

          ECO 138 – Section 1     ECO 435 – Section 1    Total
Male      22/40*100 = 55%         4/40*100 = 10%         26/40*100 = 65%
Female    11/40*100 = 27.5%       3/40*100 = 7.5%        14/40*100 = 35%
Total     33/40*100 = 82.5%       7/40*100 = 17.5%       40/40*100 = 100%

CONDITIONAL DISTRIBUTIONS

A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable.
The following is the conditional distribution of ticket Class, conditional on having survived:

CONDITIONAL DISTRIBUTIONS (CONT.)
The following is the conditional distribution of ticket Class, conditional on having perished:

CONDITIONAL DISTRIBUTIONS – CONDITIONED UPON GENDER

The two variables in this contingency table are gender and class/section number.

Male
ECO 138 – Section 1   22/26*100 = 84.6%
ECO 435 – Section 1   4/26*100 = 15.4%
Total                 26/26*100 = 100%

CONDITIONAL DISTRIBUTIONS – CONDITIONED UPON GENDER (CONT.)

The two variables in this contingency table are gender and class/section number.

Female
ECO 138 – Section 1   11/14*100 = 78.6%
ECO 435 – Section 1   3/14*100 = 21.4%
Total                 14/14*100 = 100%

CONDITIONAL DISTRIBUTIONS – CONDITIONED UPON CLASS

The two variables in this contingency table are gender and class/section number.

ECO 138 – Section 1
Male     22/33*100 = 66.7%
Female   11/33*100 = 33.3%
Total    33/33*100 = 100%

CONDITIONAL DISTRIBUTIONS – CONDITIONED UPON CLASS (CONT.)

The two variables in this contingency table are gender and class/section number.

ECO 435 – Section 1
Male     4/7*100 = 57.1%
Female   3/7*100 = 42.9%
Total    7/7*100 = 100%

WHAT CAN GO WRONG? (CONT.)

Don't confuse similar-sounding percentages; pay particular attention to the wording of the context.
The percentage of students that are female and in ECO 138 Section 1 (cell distribution): 11/40 = 27.5%
The percentage of females that are in ECO 138 Section 1 (conditioned upon females): 11/14 = 78.6%
The percentage of ECO 138 Section 1 students that are female (conditioned upon ECO 138 Section 1): 11/33 = 33.3%

CONDITIONAL DISTRIBUTIONS (CONT.)

The conditional distributions tell us that there is a difference in class for those who survived and those who perished.
This is better shown with pie charts of the two distributions:

IF YOU ARE MALE, WHAT YEAR IN SCHOOL ARE YOU?

1. Fr.
2. So.
3. Jr.
4. Sr.

IF YOU ARE FEMALE, WHAT YEAR IN SCHOOL ARE YOU?

1. Fr.
2. So.
3. Jr.
4. Sr.

CONDITIONAL DISTRIBUTIONS (CONT.)

We see that the distribution of Class/Section for the male students is different from that for the female students.
This leads us to believe that Class/Section and Gender are associated; that is, they are not independent.
The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.

SEGMENTED BAR CHARTS

A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles.
Here is the segmented bar chart for ticket Class by Survival status:

WHICH OF THE COMPARISONS DO YOU CONSIDER MOST VALID?

1. Overall average, b/c it does not differentiate between the four programs.
2. Individual program comparisons, b/c they take into account the different number of applicants and admission rates for each of the four programs.
3. Overall average, b/c it takes into account the differences in number of applicants and admission rates for each of the four programs.

Poll results shown on the slide: 93%, 7%, 0%

DID YOU SIGN UP FOR MYLAB AND WHAT IS YOUR GENDER?

1. Female – Yes
2. Female – No
3. Male – Yes
4. Male – No

Poll results shown on the slide: 31%, 31%, 23%, 15%

NEXT TIME…

Chapter 4 – Displaying Quantitative Data
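
SUPPLEMENT: CHECKING THE PERCENTAGES

The marginal and conditional distributions in the gender by class/section slides can be recomputed from the four cell counts (22, 4, 11, 3). The following is a minimal Python sketch, not part of the original slides; the names counts, percent, section_given_gender, and gender_given_section are illustrative choices, and percentages are rounded to one decimal place to match the slides.

```python
# Cell counts from the gender by class/section contingency table in the slides.
counts = {
    "Male": {"ECO 138 Section 1": 22, "ECO 435 Section 1": 4},
    "Female": {"ECO 138 Section 1": 11, "ECO 435 Section 1": 3},
}


def percent(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


sections = list(next(iter(counts.values())))

# Margins of the table: row totals, column totals, and the grand total.
row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {s: sum(counts[g][s] for g in counts) for s in sections}
grand_total = sum(row_totals.values())

# Marginal distributions: each margin divided by the grand total.
marginal_gender = {g: percent(n, grand_total) for g, n in row_totals.items()}
marginal_section = {s: percent(n, grand_total) for s, n in col_totals.items()}

# Conditional on gender: each cell divided by its row total.
section_given_gender = {
    g: {s: percent(counts[g][s], row_totals[g]) for s in sections}
    for g in counts
}

# Conditional on class/section: each cell divided by its column total.
gender_given_section = {
    s: {g: percent(counts[g][s], col_totals[s]) for g in counts}
    for s in sections
}

# The three similar-sounding percentages for females in ECO 138 Section 1.
cell = counts["Female"]["ECO 138 Section 1"]
print("Of all students:", percent(cell, grand_total))
print("Of females:", section_given_gender["Female"]["ECO 138 Section 1"])
print("Of ECO 138 Section 1:", gender_given_section["ECO 138 Section 1"]["Female"])
```

The only thing that changes between the three printed percentages is the denominator (40, 14, or 33), which is why the wording of the question matters.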