Chapter 1
Categorical versus Measurement Data

BPS - 5th Ed. Chapter 1

Individuals and Variables

Individuals
  – the objects described by a set of data
  – may be people, animals, or things
Variable
  – any characteristic of an individual
  – can take different values for different individuals

Variables

Categorical (Qualitative)
  – places an individual into one of several groups or categories
Quantitative (Measurement)
  – takes numerical values for which arithmetic operations such as adding and averaging make sense

Case Study
The Effect of Hypnosis on the Immune System
Reported in Science News, Sept. 4, 1993, p. 153

Case Study
65 college students
  – 33 easily hypnotized
  – 32 not easily hypnotized
White blood cell counts measured

Case Study
Students randomly assigned to one of three conditions
  – subjects hypnotized, given mental exercise
  – subjects relaxed in sensory deprivation tank
  – control group (no treatment)

A-Measurement or B-Categorical?
1. Pre-study white blood cell count
2. Group assignment
3. Easy or difficult to achieve hypnotic trance
4. Post-study white blood cell count

Case Study: Variables measured
Categorical
  – Easy or difficult to achieve hypnotic trance
  – Group assignment
Quantitative
  – Pre-study white blood cell count
  – Post-study white blood cell count

Case Study
Weight Gain Spells Heart Risk for Women
“Weight, weight change, and coronary heart disease in women.” W.C. Willett, et al., Journal of the American Medical Association, vol. 273(6), Feb. 8, 1995. (Reported in Science News, Feb. 4, 1995, p. 108)

Case Study
Study started in 1976 with 115,818 women aged 30 to 55 years and without a history of previous CHD.
Each woman’s weight (body mass) was determined.
Each woman was asked her weight at age 18.
Case Study
The cohort of women was followed for 14 years.
The number of CHD cases (fatal and nonfatal) was counted (1292 cases).

A-Measurement or B-Categorical?
5. Incidence (count) of coronary heart disease
6. Family history of heart disease
7. Weight in 1976
8. Smoker or nonsmoker
9. Age (in 1976)
10. Weight at age 18

Case Study: Variables measured
Quantitative
  – Age (in 1976)
  – Weight in 1976
  – Weight at age 18
  – Incidence of coronary heart disease
Categorical
  – Smoker or nonsmoker
  – Family history of heart disease

Distribution
  – Tells what values a variable takes and how often it takes these values
  – Can be a table, graph, or function

Displaying Distributions
Categorical variables
  – Pie charts
  – Bar graphs
Quantitative variables
  – Histograms
  – Stemplots (stem-and-leaf plots)

Class Make-up on First Day: Data Table

Year        Count   Percent
Freshman      18     41.9%
Sophomore     10     23.3%
Junior         6     14.0%
Senior         9     20.9%
Total         43    100.1%

11. A-Measurement or B-Categorical?
[Pie chart: Freshman 41.9%, Sophomore 23.3%, Junior 14.0%, Senior 20.9%]

12. A-Measurement or B-Categorical?
[Bar graph: Percent (vertical axis, 0.0% to 45.0%) by Year in School: Freshman 41.9%, Sophomore 23.3%, Junior 14.0%, Senior 20.9%]

13. A-Measurement or B-Categorical?

14. A-Measurement or B-Categorical?
Data Table

Material                     Weight (million tons)   Percent of total
Food scraps                          25.9                  11.2%
Glass                                12.8                   5.5%
Metals                               18.0                   7.8%
Paper, paperboard                    86.7                  37.4%
Plastics                             24.7                  10.7%
Rubber, leather, textiles            15.8                   6.8%
Wood                                 12.7                   5.5%
Yard trimmings                       27.7                  11.9%
Other                                 7.5                   3.2%
Total                               231.9                 100.0%

15. A-Measurement or B-Categorical?
[Pie chart]

16. A-Measurement or B-Categorical?

17. A-Measurement or B-Categorical?

18. A-Measurement or B-Categorical?

19. A-Measurement or B-Categorical?
[Bar graph]

20. A-Measurement or B-Categorical?
[Figure: one value per student.
Labels as extracted: kaitlin s, michelle a, Katrina, megan, Rachel, laquanda, eric, katie, tracy, allie Q, tiffany, kristina, christain, amanda, shannon, latoya, aona.
Values as extracted: 68.3, 70.3, 70.7, 71.0, 71.0, 71.3, 71.7, 72.0, 72.7, 73.0, 74.0, 74.0, 74.3, 74.7, 76.7, 76.7, 77.0, 81.3, 84.3]

The End

Examining the Distribution of Quantitative Data
  – Overall pattern of graph
  – Deviations from overall pattern
  – Shape of the data
  – Center of the data
  – Spread of the data (Variation)
  – Outliers

Shape of the Data
Symmetric
  – bell shaped
  – other symmetric shapes
Asymmetric
  – right skewed
  – left skewed
Unimodal, bimodal

[Figure: Symmetric, Bell-Shaped]
[Figure: Symmetric, Mound-Shaped]
[Figure: Symmetric, Uniform]
[Figure: Asymmetric, Skewed to the Left]
[Figure: Asymmetric, Skewed to the Right]

Outliers
Extreme values that fall outside the overall pattern
  – May occur naturally
  – May occur due to error in recording
  – May occur due to error in measuring
  – Observational unit may be fundamentally different

Histograms
  – For quantitative variables that take many values
  – Divide the possible values into class intervals (we will only consider equal widths)
  – Count how many observations fall in each interval (may change to percents)
  – Draw picture representing distribution

Histograms: Class Intervals
How many intervals?
  – One rule is to calculate the square root of the sample size, and round up.
Size of intervals?
  – Divide the range of the data (max - min) by the number of intervals desired, and round to a convenient number.
Pick intervals so that each observation falls in exactly one interval (no overlap).
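As a rough illustration of the class-interval rule above, the following Python sketch computes the number of intervals and the raw class width. Python and the function names `choose_intervals` and `interval_index` are assumptions made for this example and are not part of the slides. The inputs (53 observations, minimum 100, maximum 260) are those of the weight data case study.

```python
import math


def choose_intervals(n, minimum, maximum):
    """Square-root rule: return (number of intervals, raw class width)."""
    k = math.ceil(math.sqrt(n))      # square root of the sample size, rounded up
    width = (maximum - minimum) / k  # range of the data divided by the interval count
    return k, width


def interval_index(x, start, width):
    """Zero-based index of the interval [start + i*width, start + (i+1)*width) holding x."""
    return int((x - start) // width)


k, width = choose_intervals(53, 100, 260)
print(k, width)

# Left endpoints are included, right endpoints are not, so no value
# can land in two intervals at once.
print(interval_index(119, 100, width), interval_index(120, 100, width))
```

The raw width still has to be rounded to a convenient number by hand. Note also that when right endpoints are excluded, an observation equal to the maximum may fall one interval past the count the rule suggests.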
Case Study
Weight Data
Introductory Statistics class
Spring, 1997
Virginia Commonwealth University

Weight Data

192 152 135 110 128 180 260 170 165 150
110 120 185 165 212 119 165 210 186 100
195 170 120 185 175 203 185 123 139 106
180 130 155 220 140 157 150 172 175 133
170 130 101 180 187 148 106 180 127 124
215 125 194

Weight Data: Frequency Table

Weight Group   Count
100 - <120        7
120 - <140       12
140 - <160        7
160 - <180        9
180 - <200       12
200 - <220        4
220 - <240        1
240 - <260        0
260 - <280        1

sqrt(53) is about 7.3, rounded up to 8 intervals; range (260 - 100 = 160) / 8 = 20 = class width.
Because the left endpoint of each group is included and the right endpoint is not, the maximum value 260 starts a ninth group.

Weight Data: Histogram
[Histogram: Frequency (number of students, axis 0 to 14) against Weight (axis 100 to 280, class width 20)]
* Left endpoint is included in the group, right endpoint is not.

Stemplots (Stem-and-Leaf Plots)
  – For quantitative variables
  – Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number)
  – Write the stems in a vertical column; draw a vertical line to the right of the stems
  – Write each leaf in the row to the right of its stem; order leaves if desired

Weight Data: Stemplot (Stem & Leaf Plot)
Key: 20|3 means 203 pounds
Stems = 10’s, Leaves = 1’s

First three observations (192, 152, 135) placed:

10 |
11 |
12 |
13 | 5
14 |
15 | 2
16 |
17 |
18 |
19 | 2
20 |
21 |
22 |
23 |
24 |
25 |
26 |
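The frequency table and the stemplot for the weight data can both be reproduced with a short Python sketch. This is only one way to do it, not a method from the slides. Python and the helper names `frequency_table` and `stemplot` are assumptions for illustration; the data values are the 53 weights listed in the case study.

```python
from collections import Counter, defaultdict

# The 53 weights from the case study, in the order they are listed.
weights = [
    192, 152, 135, 110, 128, 180, 260, 170, 165, 150,
    110, 120, 185, 165, 212, 119, 165, 210, 186, 100,
    195, 170, 120, 185, 175, 203, 185, 123, 139, 106,
    180, 130, 155, 220, 140, 157, 150, 172, 175, 133,
    170, 130, 101, 180, 187, 148, 106, 180, 127, 124,
    215, 125, 194,
]


def frequency_table(data, start, width):
    """Count observations per class [lower, lower + width); left endpoint included."""
    counts = Counter((x - start) // width for x in data)
    return {start + i * width: counts.get(i, 0) for i in range(max(counts) + 1)}


def stemplot(data):
    """Map each stem (tens and above) to its ordered leaves (ones digit)."""
    leaves = defaultdict(list)
    for x in sorted(data):
        leaves[x // 10].append(x % 10)
    # Include empty stems so gaps in the data stay visible.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}


for lower, count in frequency_table(weights, 100, 20).items():
    print(f"{lower} - <{lower + 20}: {count}")

for stem, row in stemplot(weights).items():
    print(f"{stem:>2} | {''.join(map(str, row))}")
```

Keeping the empty stems (23 through 25) in the output is a deliberate choice: dropping them would hide the gap between 220 and the largest weight, 260.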
All 53 observations placed, leaves ordered:

10 | 0166
11 | 009
12 | 0034578
13 | 00359
14 | 08
15 | 00257
16 | 555
17 | 000255
18 | 000055567
19 | 245
20 | 3
21 | 025
22 | 0
23 |
24 |
25 |
26 | 0

Extended Stem-and-Leaf Plots
If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems.

Extended Stem-and-Leaf Plots
Example: if all of the data values were between 150 and 179, then we may choose to use the following stems:

15
15
16
16
17
17

Leaves 0-4 would go on each upper stem (first “15”), and leaves 5-9 would go on each lower stem (second “15”).

Time Plots
  – A time plot shows behavior over time.
  – Time is always on the horizontal axis, and the variable being measured is on the vertical axis.
  – Look for an overall pattern (trend), and deviations from this trend. Connecting the data points by lines may emphasize this trend.
  – Look for patterns that repeat at known regular intervals (seasonal variations).

Class Make-up on First Day (Fall Semesters: 1985-1993)
[Time plot: Percent of Class That Are Freshmen (vertical axis, 0% to 70%) by Year of Fall Semester (1985 to 1993)]

Average Tuition (Public vs. Private)
[Figure]
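Returning to the extended stem-and-leaf plots described earlier, the splitting of stems can be sketched in Python as follows. The function name `split_stemplot` is hypothetical and the code is an illustration under that assumption, not part of the slides. The input is the subset of the case-study weights that lie between 150 and 179.

```python
from collections import defaultdict


def split_stemplot(data):
    """Stemplot with each stem split in two rows:
    leaves 0-4 on the upper row, leaves 5-9 on the lower row."""
    rows = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        rows[(stem, leaf >= 5)].append(leaf)
    stems = range(min(s for s, _ in rows), max(s for s, _ in rows) + 1)
    # False (leaves 0-4) comes before True (leaves 5-9) for every stem.
    return [(s, rows.get((s, lower), [])) for s in stems for lower in (False, True)]


# Case-study weights between 150 and 179.
subset = [152, 170, 165, 150, 165, 165, 170, 175, 155, 157, 150, 172, 175, 170]

for stem, leaves in split_stemplot(subset):
    print(f"{stem} | {''.join(map(str, leaves))}")
```

Each stem appears twice in the output even when one of its rows is empty, matching the six stems listed in the example.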