Lecture 3. Data Compression for Two Variables: Scatterplots, Cross-Tabulations, and Correlation

David R. Merrell
90-786 Intermediate Empirical Methods for Public Policy and Management

Lecture 3: Agenda
  Review of Lecture 2
  Cross-Tabulations
  Comparison Bar Charts
  Parallel Box Plots
  Scatterplots
  Correlation Coefficients

Review of Lecture 2
  Mean or Median
  Models for Data

Mean or Median
  Complaints have reached the city manager that Tardy City is taking too long to pay its bills.
  Data are days taken to pay seven bills: 34 27 64 31 30 26 35
  Calculate the mean and median. What do you conclude?

Models for Data
  Data = Fit + Residual
  Fit as a Center
    Mean
    Median
    Mode

Example: Number of Stat Courses Taken by Students in 90-786

  Bin     Frequency   Cumulative %
  0        1            5.26%
  1       15           84.21%
  2        2           94.74%
  3        1          100.00%
  More     0          100.00%

  [Figure: histogram of Frequency and Cumulative % by Bin (0, 1, 2, 3, More)]
  [Figure: histogram of Frequency by C1 (0 to 3)]

Summary Statistics (Excel)

  Mean                       1.157894737
  Standard Error             0.138140489
  Median                     1
  Mode                       1
  Standard Deviation         0.602140432
  Sample Variance            0.362573099
  Kurtosis                   4.885489992
  Skewness                   1.659166502
  Range                      3
  Minimum                    0
  Maximum                    3
  Sum                        22
  Count                      19
  Confidence Level(95.0%)    0.290222623

Summary Statistics (Minitab)

  Descriptive Statistics

  Variable     N    Mean   Median   TrMean   StDev   SE Mean
  C1          19   1.158    1.000    1.118   0.602     0.138

  Variable     Min     Max      Q1      Q3
  C1         0.000   3.000   1.000   1.000

Measures of Error
  Sum of Squared Residuals: $\sum_i (X_i - a)^2$
  Sum of Absolute Residuals: $\sum_i |X_i - a|$
  Here $a$ is the value used as the fit (mean, median, or mode).

  Fit      Sum Squared Residuals   Sum Absolute Residuals   Percent Misses
  Mean     6.50                    7.05                     100.00
  Median   7.00                    5.00                      21.05
  Mode     7.00                    5.00                      21.05

Data Compression for Two Variables...And More
  Two-Variable Description
  Cross-Tabulations
  Comparison Bar Charts
  Parallel Box Plots
  Scatterplots
  Scatterplot Matrix
  Correlation Coefficients

Two-Variable Description
  Columns give the level of measurement of the independent variable; rows give that of the dependent variable.

  Dependent Variable     Independent: Nominal or Ordinal   Independent: Interval
  Nominal or Ordinal     Cross-tabulation                  Cross-tabulation (group interval data)
  Interval               Table or chart                    Scatterplot

Structure of a Cross-Tabulation

                         Independent Variable
  Dependent Variable     Group 1    Group 2    Row Total
  0                      a          b          a+b
  1                      c          d          c+d
  2                      e          f          e+f
  Column Total           a+c+e      b+d+f      a+b+c+d+e+f

Street Repair Practices
  Study street repair practices of local government.
  Cities and counties handle street repairs:
    using their own public employees exclusively
    by contracting out part of the work
    by contracting out all the work

Table 1. Street Repair: Counts
  Street Repair Practices by Type of Government: Public Employees and Contracting by Cities and Counties in the United States

                                Type of Local Government
  Street Repair Practice        City No.   County No.   Total
  Only Public                     966        172        1,138
  Public and Contracting out      396         61          457
  Only Contracting out             36          8           44
  Total                         1,398        241        1,639

Table 2. Street Repair: Percents
  Street Repair Practices by Type of Government: Public Employees and Contracting by Cities and Counties in the United States

                                Type of Local Government
  Street Repair Practice        City %    County %   Total %   Total Number
  Only Public                   69.1%     71.2%      69.4%     1,138
  Public and Contracting out    28.3%     25.3%      27.9%       457
  Only Contracting out           2.6%      3.3%       2.7%        44
  Total %                       100%      100%       100%
  Total Number                  1,398     241        1,639

Educational Achievement
  Residents of Allegheny County that are in labor force
  Random sample survey of Allegheny County residents in labor force in 199?
  Variables: gender and highest educational achievement

Educational Achievement: Coding of Ordinal Variables
  1 if grade 4 or less
  2 if grades 5-7
  3 if grade 8
  4 if high school incomplete (9-11)
  5 if high school graduate (12)
  6 if technical, trade, or business after high school
  7 if college/university incomplete
  8 if college/university graduate or more

Educational Achievement Table

              Female             Male               Total
  Education   No.    %           No.    %           No.    %
  3             1     0.21%        1     0.21%        2     0.21%
  4            25     5.27%       29     6.00%       54     5.64%
  5           173    36.50%      137    28.36%      310    32.39%
  6            49    10.34%       32     6.63%       81     8.46%
  7            76    16.03%       88    18.22%      164    17.14%
  8           150    31.65%      196    40.58%      346    36.15%
  Total       474   100.00%      483   100.00%      957   100.00%

Bar Chart
  [Figure: comparison bar chart of percent (0% to 45%) by education code (3 to 8), Female vs. Male]

Job Satisfaction and Income for Postal Employees

                     Income
  Job Satisfaction   Low             Medium          High
  Low                50%             20%             13.3%
  Medium             30%             53.3%           20%
  High               20%             26.7%           66.7%
  Total              100% (n=200)    100% (n=150)    100% (n=75)

Five Number Summary
  Age of Allegheny County residents by location: individuals in labor force in 199?.

                     Location
  Age                Mon Valley   Pittsburgh   Other
  Maximum            69           71           77
  Upper quartile     45           43.5         47
  Median             36           33           37
  Lower quartile     27           26           29
  Minimum            17           16           16

Parallel Box Plots
  [Figure: parallel box plots of age (10 to 80) for The Mon Valley, Pittsburgh, and Other, with outliers marked "o"]

Scatterplots
  Creating via Excel ChartWizard
  Transformation of Variables
  Scatterplot Matrices

Scatterplot 1
  [Figure: Salary ($0 to $100,000) against Years employed (0 to 30)]

Scatterplot 2
  [Figure: Salary ($15,000 to $45,000) against Years employed (0 to 30)]

Scatterplot 3
  [Figure: Salary ($15,000 to $45,000) against Years employed (0 to 30), with female and male employees marked separately]

Scatterplot Matrix
  [Figure: scatterplot matrix of Years, Salary, and Age Hired]

Correlation Coefficient, r
  $r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{(n - 1)\, S_X S_Y}$
  where $S_X$ and $S_Y$ are the sample standard deviations of X and Y.

Properties of r
  $-1 \le r \le 1$
  r = -1: data all on negatively sloping straight line
  r = 0: data in "shotgun" pattern
  r = +1: data all on positively sloping straight line

International Adoption Visas: 1991 vs 1988
  r:/academic/90-786/Chatterjee/Adopt.dat

International Adoption Visas

  Country     1988   1991   1992
  Africa        28     41     63
  Belize         6      4      8
  Bolivia       21     51     74
  Brazil       164    178    139
  Cambodia       0     59     16
  Canada        12     12      6
  Chile        252    263    176
  China         52     62    201
  Etc.

  [Figure: International Adoption Visas, scatterplot of log 1992 (0 to 3.5) against log 1988 (0 to 4)]
  [Figure: scatterplot of log 1992 (2 to 8) against log 1988 (0 to 9)]

Excel Calculation of r
  Use statistical function, CORREL
  Eliminate missing data values
  Identify X data
  Identify Y data
  Finish
  Value: r = 0.879098 (.88)

Minitab Calculation of r
  Correlations (Pearson)
  Correlation of log 1988 and log 1992 = 0.873

Next Time ...
  Ethics and the Value of Data
    Social Value of Data
    Privacy Issues
    Confidentiality
    Applications in Health Care
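
Two calculations from this lecture, the Tardy City review exercise and the correlation coefficient r, can be sketched in Python as a cross-check on the Excel and Minitab steps. This is only an illustrative sketch: the names `pearson_r`, `visas`, and `pairs` are hypothetical, the base-10 logarithm is an assumption (the slides do not state the base), and only the eight rows printed on the adoption visas slide are used. The value it prints for r therefore should not be expected to match the 0.879 or 0.873 reported for the full Adopt.dat file.

```python
import math
import statistics

# Review exercise: days Tardy City took to pay seven bills (from the slide).
days = [34, 27, 64, 31, 30, 26, 35]
mean_days = statistics.mean(days)      # pulled upward by the 64-day bill
median_days = statistics.median(days)  # resistant to that one extreme value


def pearson_r(xs, ys):
    """Pearson r = sum((x - xbar) * (y - ybar)) / ((n - 1) * s_x * s_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation (n - 1 divisor)
    s_y = statistics.stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * s_x * s_y)


# Only the rows shown on the slide: country -> (1988 visas, 1992 visas).
visas = {
    "Africa": (28, 63),
    "Belize": (6, 8),
    "Bolivia": (21, 74),
    "Brazil": (164, 139),
    "Cambodia": (0, 16),
    "Canada": (12, 6),
    "Chile": (252, 176),
    "China": (52, 201),
}

# "Eliminate missing data values": log(0) is undefined, so drop zero counts.
pairs = [(a, b) for a, b in visas.values() if a > 0 and b > 0]
log_1988 = [math.log10(a) for a, _ in pairs]  # base 10 is an assumption
log_1992 = [math.log10(b) for _, b in pairs]
r_subset = pearson_r(log_1988, log_1992)

print(f"mean = {mean_days:.2f} days, median = {median_days} days")
print(f"r on the {len(pairs)} usable rows shown = {r_subset:.3f}")
```

For the bill data the mean is about 35.3 days and the median is 31 days; the single 64-day bill accounts for most of the gap. The choice of logarithm base does not affect r, because changing the base only rescales each variable by a positive constant.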