Introduction to Statistics

Measures of Central Tendency and Dispersion
• The phrase “descriptive statistics” is used as a generic term for measures of central tendency and dispersion, which inferential statistics build on.
• These statistics describe or summarize the qualities of data.
• Another name is “summary statistics”, which are univariate:
– Mean, Median, Mode, Range, Standard Deviation, Variance, Min, Max, etc.

Measures of Central Tendency
• These measures describe the center, or typical value, of a set of scores or values in the data.
– Mean
– Median
– Mode

What do you “Mean”?
The “mean” of some data is the average score or value, such as the average age of an MPA student or the average weight of professors who like to eat donuts.
Mean of a sample (inferential): \bar{X} = \frac{\sum X}{n}
Mean of a population: \mu = \frac{\sum X}{N}

Problem of being “mean”
• The main problem with the mean value of some data is that it is sensitive to outliers.
• For example, the average weight of political science professors would be pulled upward if one professor in the department weighed 600 pounds.

Donut-Eating Professors
Professor        Weight   Weight (with two outliers)
Schmuggles       165      165
Bopsey           213      213
Pallitto         189      410
Homer            187      610
Schnickerson     165      165
Levin            148      148
Honkey-Doorey    251      251
Zingers          308      308
Boehmer          151      151
Queenie          132      132
Googles-Boop     199      199
Calzone          227      227
Mean             194.6    248.3

The Median (not the cement in the middle of the road)
• Because the mean can be sensitive to extreme values, the median is sometimes more useful and more representative of a typical case.
• The median is simply the middle value among the ranked scores of a variable (there is no standard formula for its computation).

What is the Median?
Rank order and choose the middle value.
Professor        Weight   Weight (ranked)
Schmuggles       165      132
Bopsey           213      148
Pallitto         189      151
Homer            187      165
Schnickerson     165      165
Levin            148      187
Honkey-Doorey    251      189
Zingers          308      199
Boehmer          151      213
Queenie          132      227
Googles-Boop     199      251
Calzone          227      308
Mean             194.6
If the number of values is even, average the two in the middle: here (187 + 189) / 2 = 188.

Percentiles
• If we know the median (the 50th percentile), then we can go up or down and rank the data as being above or below certain thresholds.
• You may be familiar with percentiles from standardized tests: at the 90th percentile, your score was higher than 90% of the rest of the sample.

The Mode (hold the pie and the ala)
(What does ‘ala’ taste like anyway??)
• The most frequent response or value for a variable.
• Multiple modes are possible: bimodal or multimodal.

Figuring the Mode
Professor        Weight
Schmuggles       165
Bopsey           213
Pallitto         189
Homer            187
Schnickerson     165
Levin            148
Honkey-Doorey    251
Zingers          308
Boehmer          151
Queenie          132
Googles-Boop     199
Calzone          227
What is the mode? Answer: 165
The mode is important descriptive information that may help inform your research and diagnose problems such as a lack of variability.

Measures of Dispersion (not something you cast…)
• Measures of dispersion tell us about variability in the data. They are also univariate.
• Basic question: how much do the values of a variable differ, from the minimum to the maximum, and how far apart are the scores in between? We use:
– Range
– Standard Deviation
– Variance (standard deviation squared)
• To glean information from data, i.e. to make an inference, we need to see variability in our variables.
• Measures of dispersion tell us how much our variables vary from the mean; if they do not vary, it is difficult to infer anything from the data. Dispersion is also known as the spread or range of variability.

The Range (no Buffalo roaming!!)
• r = h - l
– Where h is the high (maximum) value and l is the low (minimum) value
• In other words, the range is the difference between the minimum and maximum values of a variable.
• Understanding this statistic is important in understanding your data, especially for management and diagnostic purposes.
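As a rough check on the professor weights used in these slides, the measures of central tendency and dispersion named above can be sketched in Python. This is a minimal sketch that assumes only the standard library statistics module; the variable names are illustrative and do not come from the slides. Note that statistics.variance and statistics.stdev are the sample versions, which divide by n - 1.

```python
from statistics import mean, median, multimode, stdev, variance

# Professor weights in pounds, copied from the slides.
weights = {
    "Schmuggles": 165, "Bopsey": 213, "Pallitto": 189, "Homer": 187,
    "Schnickerson": 165, "Levin": 148, "Honkey-Doorey": 251, "Zingers": 308,
    "Boehmer": 151, "Queenie": 132, "Googles-Boop": 199, "Calzone": 227,
}
values = list(weights.values())

avg = mean(values)                   # sum of X divided by n
mid = median(values)                 # averages the two middle values when n is even
modes = multimode(values)            # list of every most-frequent value
spread = max(values) - min(values)   # range: r = h - l
var = variance(values)               # sample variance (n - 1 denominator)
sd = stdev(values)                   # sample standard deviation (square root of var)

# Same data with the two outliers from the second weight column.
outlier_values = list({**weights, "Pallitto": 410, "Homer": 610}.values())
avg_out = mean(outlier_values)
mid_out = median(outlier_values)

print("mean", avg, "median", mid, "mode", modes, "range", spread)
print("variance", var, "std dev", sd)
print("with outliers: mean", avg_out, "median", mid_out)
```

On the original column the mean is about 194.6 and the median is 188; with the two outliers the mean becomes 248.25 (shown as 248.3 on the slide) while the median only moves to 206. The range is 176 pounds, and the sample standard deviation is about 49.8 pounds.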
The Normal Curve
• Bell-shaped distribution or curve
• Perfectly symmetrical about the mean. Mean = median = mode
• Tails are asymptotic: they get closer and closer to the horizontal axis but never reach it.

Sample Distribution
• What does Andre do to the sample distribution?
• What is the probability of finding someone like Andre in the population?
• Are you ready for more inferential statistics?

Normal curves and probability
[Figure: normal curve with two positions marked, “Dr. Boehmer would be here” and “Andre would be here”.]

The Standard Deviation
• A standardized measure of distance from the mean.
• In other words, it allows you to know how far some cases are located from the mean. How extreme are your data?
• In a normal distribution, about 68% of cases fall within one standard deviation of the mean, and about 95% within two.

Formula for Standard Deviation
S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}
\sqrt{\ } = square root
\sum = sum (sigma)
X = score for each point in data
\bar{X} = mean of scores for the variable
n = sample size (number of observations or cases)

Professor        X       X - mean   (X - mean) squared
Schmuggles       165     -29.6      875.2
Bopsey           213     18.4       339.2
Pallitto         189     -5.6       31.2
Homer            187     -7.6       57.5
Schnickerson     165     -29.6      875.2
Levin            148     -46.6      2170.0
Honkey-Doorey    251     56.4       3182.8
Zingers          308     113.4      12863.3
Boehmer          151     -43.6      1899.5
Queenie          132     -62.6      3916.7
Googles-Boop     199     4.4        19.5
Calzone          227     32.4       1050.8
Mean             194.6
Variance (sum of squared deviations divided by n - 1): 2480.1
Standard deviation (square root of the variance): 49.8

We can see that the standard deviation equals 49.8 pounds. The weight of Zingers is still likely skewing this calculation (indirectly through the mean).

Std. Deviation practice
• What is the value of Democracy one std. deviation above and below the mean?
Descriptive Statistics
                     N     Minimum   Maximum   Mean     Std. Deviation
Democ                319   -10.00    10.00     3.4859   6.71282
Valid N (listwise)   319
The answer is 10.19872 and -3.22692.
What percentage of all the cases fall between -3.2 and 10.2? Roughly 68%

Std. Deviation practice
• What is the value of Urban population one std. deviation above and below the mean?
Descriptive Statistics
                     N     Minimum   Maximum   Mean      Std. Deviation
Urbanpop             139   19.77     97.12     66.1166   17.74849
Valid N (listwise)   139
The answer is 83.86509 and 48.36811.
What percentage of all the cases fall between 48.36 and 83.86? Roughly 68%

Organizing and Graphing Data

Goal of Graphing?
1. Presentation of Descriptive Statistics
2. Presentation of Evidence
3. Some people understand subject matter better with visual aids
4. Provide a sense of the underlying data generating process (scatterplots)

What is the Distribution?
• Gives us a picture of the variability and central tendency.
• Can also show the amount of skewness and kurtosis.

Graphing Data: Types

Creating Frequencies
• We create frequencies by sorting data by value or category and then counting the cases that fall into each value or category.
• How often do certain scores occur? This is a basic descriptive data question.

Ranking of Donut-eating Profs. (most to least)
Zingers          308
Honkey-Doorey    251
Calzone          227
Bopsey           213
Googles-Boop     199
Pallitto         189
Homer            187
Schnickerson     165
Schmuggles       165
Boehmer          151
Levin            148
Queenie          132

Here we have placed the professors into weight classes and depicted the counts with a histogram in columns.
[Figure: column histogram, “Weight Class Intervals of Donut-Munching Professors”; vertical axis: Number (0 to 3.5); weight classes: 130-150, 151-185, 186-210, 211-240, 241-270, 271-310, 311+.]

Here is the same histogram depicted as a horizontal bar graph.
[Figure: bar graph, “Weight Class Intervals of Donut-Munching Professors”; horizontal axis: Number (0 to 3.5); same weight classes.]

Pie Charts
[Figure: pie chart, “Proportions of Donut-Eating Professors by Weight Class”; same weight classes.]

Actually, why not use a donut graph. Duh!
[Figure: donut chart, “Proportions of Donut-Eating Professors by Weight Class”; same weight classes.]

See Excel for other options!!!!
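The counting step behind these weight-class charts can be sketched in Python. This is only an illustration: the class boundaries are taken from the chart labels, while the function name frequency_table and the (label, low, high) layout are assumptions introduced here, not part of the slides.

```python
# Professor weights in pounds, copied from the ranking in the slides.
weights = [308, 251, 227, 213, 199, 189, 187, 165, 165, 151, 148, 132]

# Weight classes as (label, low, high), both bounds inclusive.
# The open-ended top class uses None for its upper bound.
classes = [
    ("130-150", 130, 150), ("151-185", 151, 185), ("186-210", 186, 210),
    ("211-240", 211, 240), ("241-270", 241, 270), ("271-310", 271, 310),
    ("311+", 311, None),
]


def frequency_table(values, bins):
    """Count how many values fall into each class (hypothetical helper)."""
    counts = {label: 0 for label, _, _ in bins}
    for value in values:
        for label, low, high in bins:
            if value >= low and (high is None or value <= high):
                counts[label] += 1
                break  # each value belongs to at most one class
    return counts


freq = frequency_table(weights, classes)

# Crude text histogram: one '#' per professor in the class.
for label, count in freq.items():
    print(f"{label:>8} {'#' * count} ({count})")
```

With these boundaries the classes hold 2, 3, 3, 2, 1, 1 and 0 professors, which adds up to the 12 cases in the data. The same counts could feed a column chart, a bar graph, or a pie chart.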
Line Graphs: A Time Series
[Figure: line graph of Approval and Economic approval by month, 1981 to 2001; vertical axis: Approval (0 to 100); horizontal axis: Month.]

Scatter Plot (Two variable)
[Figure: scatter plot, “Presidential Approval and Unemployment”; vertical axis: Approval (0 to 100); horizontal axis: Unemployment (0 to 12).]