INF 397C
Introduction to Research in Information Studies
Fall, 2009
Day 2
R. G. Bias | School of Information | UTA 5.424 | Phone: 512 471 7046 | rbias@ischool.utexas.edu

Standard Deviation
σ = SQRT(Σ(X - µ)²/N)
(Does that give you a headache?)

• USA Today has come out with a new survey - apparently, three out of every four people make up 75% of the population.
 – David Letterman

• Statistics: The only science that enables different experts using the same figures to draw different conclusions.
 – Evan Esar (1899 - 1995), US humorist

Didja hear the one about . . .
• the three statisticians who went hunting?

Critical Skepticism
• Remember the Rabbit Pie example from last week?
• The “critical consumer” of statistics asked, “What do you mean by ‘50/50’?”

Remember . . .
• I do NOT want you to become cynical.
• Not all “media bias” (or bad research) is intentional.
• Just be sensible, critical, skeptical.
• As you “consume” statistics, ask some questions . . .

Ask yourself . . .
• Who says so? (A Zest commercial is unlikely to tell you that Irish Spring is best.)
• How does he/she know? (That Zest is “the best soap for you.”)
• What’s missing? (One year, 33% of female grad students at Johns Hopkins married faculty.)
• Did somebody change the subject? (“Camrys are bigger than Accords.” “Accords are bigger than Camrys.”)
• Does it make sense?
 (“Study in NYC: Working woman with family needed $40.13/week for adequate support.”)

What were . . .
• . . . some claims you all heard this week?

Last week . . .
• We learned about frequency distributions.
• I asserted that a frequency distribution and/or a histogram (a graphical representation of a frequency distribution) was a good way to summarize a collection of data.
• And I asserted there’s another, even shorter-hand way.

Measures of Central Tendency
• Mode
 – Most frequent score (or scores – a distribution can have multiple modes)
• Median
 – “Middle score”
 – 50th percentile
• Mean - µ (“mu”)
 – “Arithmetic average”
 – ΣX/N

A quiz about averages
1 – If one score in a distribution changes, will the mode change? __Yes __No __Maybe
2 – How about the median? __Yes __No __Maybe
What if we ADDED one score?
3 – How about the mean? __Yes __No __Maybe
4 – True or false: In a normal distribution (bell curve), the mode, median, and mean are all the same. __True __False

More quiz
5 – (This one is tricky.) If the mode = mean = median, then the distribution is necessarily a bell curve. __True __False
6 – I have a distribution of 10 scores. There was an error, and really the highest score is 5 points HIGHER than previously thought.
 a) What does this do to the mode?
  __Increases it __Decreases it __Nothing __Can’t tell
 b) What does this do to the median?
  __Increases it __Decreases it __Nothing __Can’t tell
 c) What does this do to the mean?
  __Increases it __Decreases it __Nothing __Can’t tell
7 – Which of the following must be an actual score from the distribution?
 a) Mean b) Median c) Mode d) None of the above

OK, so which do we use?
• Means allow further arithmetic/statistical manipulation. But . . .
• It depends on:
 – The type of scale of your data
  • Can’t use means with nominal or ordinal scale data
  • With nominal data, must use mode
 – The distribution of your data
  • Tend to use medians with distributions bounded at one end but not the other (e.g., salary). (Look at our “Number of MLB games” distribution.)
 – The question you want to answer
  • “Most popular score” vs. “middle score” vs. “middle of the see-saw”
• “Statistics can tell us which measures are technically correct. It cannot tell us which are ‘meaningful’” (Tal, 2001, p. 52).

Name          X (# of MLB games seen)   µ
Wenbin         0
Daniel         0
Stephen        0
Christopher    2
Geoff          3
Clarke         3
Justin         4
Erik          15
Randolph      27
Total

Scales (last week): which measure of CT?
Scale      Properties                      Examples              Measure of CT
Nominal    Name; mutually exclusive        Gender, Yes/No        Mode
Ordinal    = nominal + ordered             Class rank, ratings   Mode, median
Interval   = ordinal + equal intervals     Days of wk., temp.    Any
Ratio      = interval + absolute zero      Inches, dollars       Any

Mean – “see saw” (from Tal, 2001)
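The three measures of central tendency can be checked on the class MLB data with a minimal sketch. It assumes Python 3.8+ (the course itself worked in an xls spreadsheet; Python is only an illustration) and uses the standard-library `statistics` module. `multimode` returns every most-frequent score, which matches the point that a distribution can have more than one mode.

```python
import statistics

# Number of MLB games seen, copied from the class table (Wenbin ... Randolph).
mlb_games = [0, 0, 0, 2, 3, 3, 4, 15, 27]

modes = statistics.multimode(mlb_games)  # every most-frequent score
median = statistics.median(mlb_games)    # middle score (50th percentile)
mean = statistics.mean(mlb_games)        # ΣX/N

print("mode(s):", modes)
print("median:", median)
print("mean:", mean)
```

Here the mean (6) sits well above the median (3) because the two large scores, 15 and 27, pull it up; this is the kind of distribution, bounded at one end but not the other, where the median is often the better summary.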
We’ve sidled up to SHAPES of distributions
• Symmetrical
• Skewed – positive and negative
• Flat
• Multi-modal

Now, let’s add to the frequency distribution
Raw Score   Freq   Cumu Freq   Rel Freq   Cumu Rel Freq
 0          2       2          .2          .2
 1          3       5          .3          .5
 2          1       6          .1          .6
 3          1       7          .1          .7
 4          1       8          .1          .8
 5          1       9          .1          .9
13          1      10          .1         1.0

When you . . .
• . . . add relative frequency and cumulative relative frequency to your frequency distribution, it will help you calculate percentiles (and, therefore, the median).

“Pulling up the mean”

Why . . .
• . . . isn’t a “measure of central tendency” all we need to characterize a distribution of scores/numbers/data/stuff?
• “The price for using measures of central tendency is loss of information” (Tal, 2001, p. 49).
 – Remember the see-saw example. Same measure of central tendency – widely varying distribution of scores.

Didja hear the one about . . .
• the Aggies who were on a march and came to a river? The Aggie captain asked the farmer how deep the river was.
• “Oh, it averages two feet deep.”
• All the Aggies drowned.

Note . . .
• We started with a bunch of specific scores.
• We put them in order.
• We drew their distribution.
• Now we can report their central tendency.
• So, we’ve moved AWAY from specifics, to a summary.
• But with central tendency alone, we’ve ignored the specifics altogether.
 – Note: MANY distributions could have a particular central tendency!
• If we went back to ALL the specifics, we’d be back at square one.

Measures of Dispersion
• Range
• Semi-interquartile range
• Standard deviation – σ (sigma)

Range
• Highest score minus the lowest score.
• Like the mode . . .
 – Easy to calculate
 – Potentially misleading
 – Doesn’t take EVERY score into account.
• What we need to do is calculate one number that will capture HOW spread out our numbers are from that measure of central tendency.
 – ’Cause MANY different distributions of scores can have the same central tendency!
 – “Standard Deviation”

Back to our data – MLB games
• Let’s take just the men in this class (xls spreadsheet).
• Measures of central tendency.
• Go with mean. (’Cause we can – ratio scale data!)
• So, how much do the actual scores deviate from the mean?

First – just for grins – mode, median, mean?
Name          X (# of MLB games seen)   µ
Wenbin         0
Daniel         0
Stephen        0
Christopher    2
Geoff          3
Clarke         3
Justin         4
Erik          15
Randolph      27
Total         54

So . . .
• Add up all the deviations and we should have a feel for how dispersed, how spread out, how deviant our distribution is.
• Let’s calculate the Standard Deviation.
• As always, start inside the parentheses.
• (X - µ)
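For anyone who wants to check the hand calculation step by step, here is a minimal Python sketch of σ = SQRT(Σ(X - µ)²/N) on the same MLB data. It is an illustration only (the variable names are hypothetical, not from the course materials); it divides by N, as the slides do, and cross-checks the result against the standard library’s population standard deviation, `statistics.pstdev`.

```python
import math
import statistics

# Number of MLB games seen by the men in the class (from the table above).
mlb_games = [0, 0, 0, 2, 3, 3, 4, 15, 27]
n = len(mlb_games)
mu = sum(mlb_games) / n                   # µ = ΣX/N

deviations = [x - mu for x in mlb_games]  # X - µ for each score
print("sum of deviations:", sum(deviations))  # positives and negatives cancel

squared = [d ** 2 for d in deviations]    # (X - µ)², so nothing cancels
sum_sq = sum(squared)                     # Σ(X - µ)²
variance = sum_sq / n                     # Σ(X - µ)²/N, average squared deviation
sigma = math.sqrt(variance)               # square root: back to "games" units

print("sum of squared deviations:", sum_sq)
print("variance:", round(variance, 1))
print("sigma:", round(sigma, 1))

# Cross-check against the standard library's population standard deviation.
assert math.isclose(sigma, statistics.pstdev(mlb_games))
```

Running it prints a sum of deviations of 0.0, a sum of squared deviations of 668.0, a variance of 74.2, and σ = 8.6.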
So, find the distance of each score from the mean
Name          X (# of MLB games seen)   µ    X - µ
Wenbin         0                        6     -6
Daniel         0                        6     -6
Stephen        0                        6     -6
Christopher    2                        6     -4
Geoff          3                        6     -3
Clarke         3                        6     -3
Justin         4                        6     -2
Erik          15                        6      9
Randolph      27                        6     21
Total         54                              0

Damn!
• OK, let’s try it on a smaller set of numbers.
X     X - µ
2     -2
3     -1
5      1
6      2
Σ = 16   Σ = 0
µ = 4
Hmm.

OK . . .
• . . . so mathematicians at this point do one of two things.
• Take the absolute value or square ’em.
• We square ’em: Σ(X - µ)²

So, find the distance of each score from the mean (and square it)
Name          X (# of MLB games seen)   µ    X - µ   (X - µ)²
Wenbin         0                        6     -6       36
Daniel         0                        6     -6       36
Stephen        0                        6     -6       36
Christopher    2                        6     -4       16
Geoff          3                        6     -3        9
Clarke         3                        6     -3        9
Justin         4                        6     -2        4
Erik          15                        6      9       81
Randolph      27                        6     21      441
Total         54                              0      668

Standard Deviation (cont’d.)
• Then take the average of the squared deviations: Σ(X - µ)²/N
 – 668/9 = 74.2
• But this number is so BIG!

Remember . . .
• We had to SQUARE all the deviation scores (X - µ) to get around the addin’-up-to-zero problem . . .
• So now we take the square root, to get us back in the same ballpark:
• SQRT(74.2) = 8.6.

Sooooo . . .
• How many MLB games have the males in our class seen live?
• 0, 3, 0, 27, 15, 0, 3, 4, 2 (ugh)
• 0, 0, 0, 2, 3, 3, 4, 15, 27 (hmm)
• 50th percentile (median) = 3 (now we’re talkin’)
• µ = 6 (I’m with ya’)
• µ = 6, σ = 8.6 (NOW I have a pretty clear picture. I know YOU don’t, yet!)

OK . . .
• . . . take the square root (to make up for squaring the deviations earlier).
• σ = SQRT(Σ(X - µ)²/N)
• Now this doesn’t give you a headache, right?
• I said “right”?

Hmmm . . .
Mode     Range
Median   ?????
Mean     Standard Deviation

We need . . .
• A measure of spread that is NOT sensitive to every little score, just as the median is not.
• SIQR: Semi-interquartile range.
• (Q3 - Q1)/2, where Q1 is the 25th percentile and Q3 is the 75th percentile.

To summarize
• Mode / Range: easy to calculate; may be misleading.
• Median / SIQR: capture the center; not influenced by extreme scores.
• Mean (µ) / SD (σ): take every score into account; allow later manipulations.

Who wants to guess . . .
• . . . what I think is the most important sentence in S, Z, & Z (2006), Chapter 2?
p. 32
• Penultimate paragraph, first sentence:
• “Scientists seek to determine whether any differences in their observations of the dependent variable are caused by the different conditions of the independent variable.”

• http://highered.mcgrawhill.com/sites/0072494468/student_view0/statistics_primer.html
• Click on Statistics Primer.

Practice Problems

Homework
• LOTS of reading. See syllabus.
• Send a table/graph/chart that you’ve read this past week. Send email to Garrett by noon, Friday.

See you next week.