Name that tune. Song title? Performer(s)?

R.G. Bias | rbias@ischool.utexas.edu

Descriptive Statistics
“Finding New Information”
3/23/2011

Standard Deviation
$\sigma = \sqrt{\sum (X - \mu)^2 / N}$
(Does that give you a headache?)

Statistics: The only science that enables different experts using the same figures to draw different conclusions.
– Evan Esar (1899-1995), US humorist

USA Today has come out with a new survey: apparently, three out of every four people make up 75% of the population.
– David Letterman

The last 2 lectures . . .
. . . we’ve been talking about the scientific method. When you conduct an experiment, at some point you’ll have some data. “Statistics” is the field of study that addresses how we deal with, manipulate, and interpret those data.

How to talk about a set of numbers
We can list ’em.
– Can get WAY unwieldy.
– Plus, it’s hard to make any sense out of them.
First step: put ’em in order.
Second step:
– Graph ’em, and/or
– Calculate percentiles/deciles.

Frequency Distributions, Histograms
# of pets ever owned:
13 2 1 4 0 1 3 0 5 1
Put ’em in order:
0 0 1 1 1 2 3 4 5 13

Frequency Distribution
Raw scores (in order): 0 0 1 1 1 2 3 4 5 13

Raw Score   Frequency   Cumulative Frequency
0           2           2
1           3           5
2           1           6
3           1           7
4           1           8
5           1           9
13          1           10

Histogram
[Figure: histogram of the frequency distribution above; x-axis: # of pets (0, 1, 2, 3, 4, 5, 13); y-axis: frequency (0 to 3)]

Percentiles
LOCATION of the 25th percentile: $X_{.25} = (N+1)(.25)$
LOCATION of the 50th percentile: $X_{.50} = (N+1)(.50)$
LOCATION of the 75th percentile: $X_{.75} = (N+1)(.75)$
Example: If we had 10 scores:
– The 25th percentile would be the (11)(.25) = 2.75th score, or three-quarters of the way between the 2nd and 3rd scores.
– The 50th percentile would be the (11)(.50) = 5.5th score, or halfway between the 5th and 6th scores.

Note . . .
With an odd number of scores, the 50th percentile will be an actual score:
Raw scores (in order): 0 0 1 1 1 2 3 4 5 13 100
$X_{.50} = (N+1)(.50) = (12)(.50) = 6$, so the 50th percentile is the 6th score, which is 2.

Earlier . . .
We learned about frequency distributions. I asserted that a frequency distribution and/or a histogram (a graphical representation of a frequency distribution) was a good way to summarize a collection of data. There’s another, even shorter-hand way.

Measures of Central Tendency
Mode
– Most frequent score (or scores: a distribution can have multiple modes)
Median
– “Middle score”
– 50th percentile
Mean, µ (“mu”)
– “Arithmetic average”
– $\mu = \sum X / N$

Let’s calculate some “averages”
Here’s a distribution of scores: 2 2 5
Mode? Median? Mean?

Let’s calculate some “averages” (cont’d.)
Here’s a distribution of scores: 0 0 0 1 1 10
Mode? Median? Mean?

A quiz about averages
1 – If one score in a distribution changes, will the mode change? __Yes __No __Maybe
2 – How about the median? __Yes __No __Maybe
3 – How about the mean? __Yes __No __Maybe
4 – True or false: In a normal distribution (bell curve), the mode, median, and mean are all the same. __True __False

More quiz questions about measures of central tendency
5 – (This one is tricky.) True or false: If mode = mean = median, then the distribution is necessarily a bell curve. __True __False
6 – I have a distribution of 10 scores. There was an error: the highest score is really 5 points HIGHER than previously thought.
a) What does this do to the mode?
__Increases it __Decreases it __Nothing __Can’t tell
b) What does this do to the median?
__Increases it __Decreases it __Nothing __Can’t tell
c) What does this do to the mean?
__Increases it __Decreases it __Nothing __Can’t tell
7 – Which of the following must be an actual score from the distribution?
a) Mean b) Median c) Mode d) None of the above

OK, so which do we use?
Means allow further arithmetic/statistical manipulation. But . . .
It depends on:
– The type of data
• Can’t use means with nominal or ordinal scale data (more on that Monday)
• With nominal data, must use the mode
– The distribution of your data
• Tend to use medians with distributions bounded at one end but not the other (e.g., salary)
– The question you want to answer
• “Most popular score” vs. “middle score” vs. “middle of the see-saw”
• “Statistics can tell us which measures are technically correct. It cannot tell us which are ‘meaningful’” (Tal, 2001, p. 52).

Mean – “see-saw” (from Tal, 2001)

We have sidled up to SHAPES of distributions
– Symmetrical
– Skewed (positive and negative)
– Flat

“Pulling up the mean”

Why . . .
. . . isn’t a “measure of central tendency” all we need to characterize a distribution of scores/numbers/data/stuff?
“The price for using measures of central tendency is loss of information” (Tal, 2001, p. 49).

Didja hear the one about . . .
. . . the Aggies who were on a march and came to a river? The Aggie captain asked a farmer how deep the river was. “Oh, it averages two feet deep.” All the Aggies drowned.

Note . . .
We started with a bunch of specific scores. We put them in order. We drew their distribution. Now we can report their central tendency.
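The note above retraces the pipeline: order the scores, tabulate their distribution, then summarize it. Here is a minimal sketch of those steps in Python, assuming the “# of pets ever owned” scores from the frequency-distribution slides. The function names (`frequency_table`, `percentile`, `modes`, `mean`) are hypothetical. `percentile` follows the slides' (N + 1)p location rule and interpolates between neighboring ordered scores when the location is fractional.

```python
from collections import Counter

# Scores from the "# of pets ever owned" slide.
# Hypothetical helper names below: a sketch, not the course's own code.
PETS = [13, 2, 1, 4, 0, 1, 3, 0, 5, 1]


def frequency_table(scores):
    """Return (score, frequency, cumulative frequency) rows, lowest score first."""
    counts = Counter(scores)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], running))
    return rows


def percentile(scores, p):
    """Score at location (N + 1) * p, interpolating between neighbors.

    p is a proportion, e.g. 0.25 for the 25th percentile. Locations outside
    1..N are clamped to the lowest/highest score (an assumption; the slides
    do not cover that case).
    """
    ordered = sorted(scores)
    n = len(ordered)
    location = min(max((n + 1) * p, 1), n)  # 1-based, may be fractional
    lower = int(location)                   # whole-number part of the location
    fraction = location - lower             # how far toward the next score
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


def modes(scores):
    """All most-frequent scores (a distribution can have several modes)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


def mean(scores):
    """Arithmetic average: sum(X) / N."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print("in order:", sorted(PETS))
    for row in frequency_table(PETS):
        print(row)
    print("25th percentile:", percentile(PETS, 0.25))
    print("median (50th percentile):", percentile(PETS, 0.50))
    print("mode(s):", modes(PETS), "mean:", mean(PETS))
```

On the pets data this sketch gives a 25th percentile of 0.75, a median of 1.5, a single mode of 1, and a mean of 3.0. Python's standard `statistics` module also provides `mean`, `median`, and `multimode`, which are handy for cross-checking.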
So, we’ve moved AWAY from specifics to a summary. But with central tendency alone, we’ve ignored the specifics altogether.
– Why isn’t a measure of central tendency, alone, satisfactory?
– Note that MANY distributions could have a particular central tendency!
If we went back to ALL the specifics, we’d be back at square one.

Measures of Dispersion (or Spread)
– Range
– Semi-interquartile range
– Standard deviation, σ (sigma)

Range
Highest score minus the lowest score.
Like the mode . . .
– Easy to calculate
– Potentially misleading
– Doesn’t take EVERY score into account.
What we need to do is calculate one number that will capture HOW spread out our numbers are from that measure of central tendency.
– ’Cause MANY different distributions of scores can have the same central tendency!
– “Standard Deviation”: $\sigma = \sqrt{\sum (X - \mu)^2 / N}$

Let’s do a short example
What if I asked four undergraduates how many cars they’ve owned in their lives, and I got the following answers: 1 1 1 1
There would be NO variance: σ = 0.
But what if the answers were 0 0 1 3?
What’s the mode? Median? Mean?
Go with the mean.
So, how much do the actual scores deviate from the mean?

So . . .
Add up all the deviations, and we should have a feel for how dispersed, how spread, how deviant, our distribution is.
Let’s calculate the standard deviation. As always, start inside the parentheses:
$\sum (X - \mu)$

Standard Deviation
Score (X)   Mean (µ)   X - µ
0           1          -1
0           1          -1
1           1           0
3           1           2
Total                   0   (damn)

Damn!
OK, let’s try it on another set of numbers.
X       X - µ
2       -2
3       -1
5        1
6        2
Σ = 16  Σ = 0
µ = 4
Hmm.

OK . . .
. . . so mathematicians at this point do one of two things: take the absolute value, or square ’em. We square ’em.
$\sum (X - \mu)^2$

X       X - µ   (X - µ)²
2       -2      4
3       -1      1
5        1      1
6        2      4
Σ = 16  Σ = 0   Σ = 10
µ = 4

Standard Deviation (cont’d.)
Then take the average of the squared deviations: $\sum (X - \mu)^2 / N$
– Remember, dividing by N was the way we took the average of the original scores.
– 10/4 = 2.5.
But this number is so BIG!

OK . . .
. . . take the square root (to make up for squaring the deviations earlier).
$\sigma = \sqrt{\sum (X - \mu)^2 / N}$
$\sqrt{2.5} \approx 1.58$
Now this doesn’t give you a headache, right? I said, “right”?

Hmmm . . .
Central tendency   Dispersion
Mode               Range
Median             ?????
Mean               Standard Deviation

We need . . .
. . . a measure of spread that is NOT sensitive to every little score, just as the median is not.
SIQR: Semi-interquartile range
$\mathrm{SIQR} = (Q_3 - Q_1)/2$, where $Q_1$ and $Q_3$ are the 25th and 75th percentiles.

To summarize
Mode, Range
– Easy to calculate.
– May be misleading.
Median, SIQR
– Capture the center.
– Not influenced by extreme scores.
Mean (µ), SD (σ)
– Take every score into account.
– Allow later manipulations.

Practice Problems
I’ll send you some tonight.

http://highered.mcgrawhill.com/sites/0072494468/student_view0/statistics_primer.html
Click on Statistics Primer.
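The dispersion measures worked through above can be sketched the same way. This is a minimal Python version that assumes the four-score example (2, 3, 5, 6) from the slides. The names `score_range`, `population_sd`, `percentile`, and `siqr` are hypothetical. `population_sd` divides by N, following the slides' formula. `siqr` takes Q1 and Q3 from the (N + 1)p location rule with interpolation, which is one of several quartile conventions.

```python
import math

# Four scores from the standard-deviation example on the slides.
# Hypothetical helper names below: a sketch, not the course's own code.
SCORES = [2, 3, 5, 6]


def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


def population_sd(scores):
    """sigma = sqrt(sum((X - mu) ** 2) / N), dividing by N as on the slides."""
    n = len(scores)
    mu = sum(scores) / n
    deviations = [x - mu for x in scores]   # these sum to 0 (up to rounding), so...
    squared = [d ** 2 for d in deviations]  # ...square them before adding
    variance = sum(squared) / n             # average squared deviation
    return math.sqrt(variance)              # undo the squaring


def percentile(scores, p):
    """Score at location (N + 1) * p, interpolating between neighbors.

    Locations outside 1..N are clamped to the lowest/highest score
    (an assumption; the slides do not cover that case).
    """
    ordered = sorted(scores)
    n = len(ordered)
    location = min(max((n + 1) * p, 1), n)
    lower = int(location)
    fraction = location - lower
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


def siqr(scores):
    """Semi-interquartile range: (Q3 - Q1) / 2."""
    return (percentile(scores, 0.75) - percentile(scores, 0.25)) / 2


if __name__ == "__main__":
    mu = sum(SCORES) / len(SCORES)
    print("deviations:", [x - mu for x in SCORES])
    print("range:", score_range(SCORES))
    print("sigma:", round(population_sd(SCORES), 2))
    print("SIQR:", siqr(SCORES))
```

For 2 3 5 6, this sketch reproduces the slides' numbers. The squared deviations sum to 10, 10/4 = 2.5, and σ = √2.5 ≈ 1.58. Under the stated quartile convention, the range is 4 and the SIQR is (5.75 - 2.25)/2 = 1.75. The standard library's `statistics.pstdev` (population standard deviation) should agree with `population_sd`.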