INF 397C
Introduction to Research in Information
Studies
Fall, 2009
Day 2
R. G. Bias | School of Information | UTA 5.424 | Phone: 512 471 7046 | rbias@ischool.utexas.edu
Standard Deviation
σ = SQRT(Σ(X - µ)²/N)
(Does that give you a headache?)
• USA Today has come out with a new
survey - apparently, three out of every
four people make up 75% of the
population.
– David Letterman
• Statistics: The only science that enables
different experts using the same figures
to draw different conclusions.
– Evan Esar (1899 - 1995), US humorist
Didja hear the one about . . .
• the three statisticians who went hunting?
Critical Skepticism
• Remember the Rabbit Pie example from
last week?
• The “critical consumer” of statistics
asked “what do you mean by ‘50/50’?”
Remember . . .
• I do NOT want you to become cynical.
• Not all “media bias” (nor bad research) is
intentional.
• Just be sensible, critical, skeptical.
• As you “consume” statistics, ask some
questions . . .
Ask yourself . . .
• Who says so? (A Zest commercial is unlikely to tell
you that Irish Spring is best.)
• How does he/she know? (That Zest is “the best
soap for you.”)
• What’s missing? (One year, 33% of female grad
students at Johns Hopkins married faculty.)
• Did somebody change the subject? (“Camrys
are bigger than Accords.” “Accords are bigger than
Camrys.”)
• Does it make sense? (“Study in NYC: Working
woman with family needed $40.13/week for adequate
support.”)
What were . . .
• . . . some claims you all heard this week?
Last week . . .
• We learned about frequency
distributions.
• I asserted that a frequency distribution,
and/or a histogram (a graphical
representation of a frequency
distribution), was a good way to
summarize a collection of data.
• And I asserted there’s another, even
shorter-hand way.
Measures of Central Tendency
• Mode
– Most frequent score (or scores – a
distribution can have multiple modes)
• Median
– “Middle score”
– 50th percentile
• Mean - µ (“mu”)
– “Arithmetic average”
– ΣX/N
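The three definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the scores are made up here, not taken from the class data.

```python
from collections import Counter


def modes(scores):
    """Return every score tied for the highest frequency (there may be several)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(s for s, c in counts.items() if c == top)


def median(scores):
    """Middle score (50th percentile); average the two middle scores if N is even."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean(scores):
    """Arithmetic average: sum of X divided by N."""
    return sum(scores) / len(scores)


# Hypothetical scores, invented for illustration.
scores = [1, 2, 2, 3, 5, 5, 10]
print("modes: ", modes(scores))    # two modes: 2 and 5
print("median:", median(scores))   # the 4th of 7 ordered scores
print("mean:  ", mean(scores))     # 28 / 7
```

Note that this made-up distribution has two modes, which is why `modes` returns a list rather than a single number.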
A quiz about averages
1 – If one score in a distribution changes, will the mode change?
__Yes __No __Maybe
2 – How about the median?
__Yes __No __Maybe
3 – How about the mean?
__Yes __No __Maybe
(What if we ADDED one score?)
4 – True or false: In a normal distribution (bell curve), the mode, median, and mean are all the same. __True __False
More quiz
5 – (This one is tricky.) If the mode=mean=median, then the distribution is
necessarily a bell curve?
__True __False
6 – I have a distribution of 10 scores. There was an error, and really the
highest score is 5 points HIGHER than previously thought.
a) What does this do to the mode?
__ Increases it __Decreases it __Nothing __Can’t tell
b) What does this do to the median?
__ Increases it __Decreases it __Nothing __Can’t tell
c) What does this do to the mean?
__ Increases it __Decreases it __Nothing __Can’t tell
7 – Which of the following must be an actual score from the distribution?
a) Mean
b) Median
c) Mode
d) None of the above
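One way to explore question 6 is to try it on a concrete set of scores. The sketch below uses ten invented scores (an assumption, not class data) and Python's standard `statistics` module. It shows what happens for this one set only; in particular, the mode result could differ for a set in which the highest score is itself a mode.

```python
import statistics

# Hypothetical set of 10 scores, invented for illustration.
before = [1, 2, 2, 3, 4, 5, 6, 7, 8, 10]

# Correct the error: the highest score is really 5 points higher.
after = sorted(before)
after[-1] += 5

for label, data in (("before", before), ("after", after)):
    print(
        label,
        "mode(s):", statistics.multimode(data),
        "median:", statistics.median(data),
        "mean:", statistics.mean(data),
    )
```

With N = 10, adding 5 points to a single score moves the mean by 5/10 = 0.5, while the two middle scores that determine the median are untouched.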
OK, so which do we use?
• Means allow further arithmetic/statistical manipulation. But . . .
• It depends on:
– The type of scale of your data
• Can’t use means with nominal or ordinal scale data
• With nominal data, must use mode
– The distribution of your data
• Tend to use medians with distributions bounded at one
end but not the other (e.g., salary). (Look at our “Number
of MLB games” distribution.)
– The question you want to answer
• “Most popular score” vs. “middle score” vs. “middle of the
see-saw”
• “Statistics can tell us which measures are technically
correct. It cannot tell us which are ‘meaningful’” (Tal,
2001, p. 52).
Name          X (# of MLB games seen)   µ
Wenbin        0
Daniel        0
Stephen       0
Christopher   2
Geoff         3
Clarke        3
Justin        4
Erik          15
Randolph      27
Total
Scales (last week)
Nominal              Ordinal               Interval             Ratio
Name                 =                     =                    =
Mutually exclusive   =                     =                    =
                     Ordered               =                    =
                                           Equal interval       =
                                                                + abs. 0
Gender, Yes/No       Class rank, ratings   Days of wk., temp.   Inches, dollars
Scales (which measure of CT?)
Nominal (mode)       Ordinal (mode, median)   Interval (any)       Ratio (any)
Name                 =                        =                    =
Mutually exclusive   =                        =                    =
                     Ordered                  =                    =
                                              Equal interval       =
                                                                   + abs. 0
Gender, Yes/No       Class rank, ratings      Days of wk., temp.   Inches, dollars
Mean – “see saw” (from Tal, 2001)
Have sidled up to SHAPES of distributions
• Symmetrical
• Skewed – positive and negative
• Flat
• Multi-modal
Now, let’s add to freq dist
Raw Score   Freq   Cumu Freq   Rel Freq   Cumu Rel Freq
0           2      2           .2         .2
1           3      5           .3         .5
2           1      6           .1         .6
3           1      7           .1         .7
4           1      8           .1         .8
5           1      9           .1         .9
13          1      10          .1         1.0
When you . . .
• add relative frequency and cumulative
relative frequency to your frequency
distribution it will help you calculate
percentiles (and, therefore, the median).
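As a rough sketch, the extended frequency distribution can be built from the raw scores like this. The ten raw scores are the ones implied by the Freq column of the table; the helper `score_at_percentile` is a hypothetical name, and it uses one simple convention (the first score whose cumulative relative frequency reaches the percentile). Other conventions exist: with an even N, many texts average the two middle scores for the median, which here would give 1.5 rather than 1.

```python
from collections import Counter

# Raw scores implied by the Freq column of the frequency distribution.
scores = [0, 0, 1, 1, 1, 2, 3, 4, 5, 13]
n = len(scores)

# Each row: (raw score, freq, cumu freq, rel freq, cumu rel freq).
rows = []
cumu = 0
for score, freq in sorted(Counter(scores).items()):
    cumu += freq
    rows.append((score, freq, cumu, freq / n, cumu / n))

print("Score  Freq  CumuFreq  RelFreq  CumuRelFreq")
for score, freq, cumu_freq, rel, cumu_rel in rows:
    print(f"{score:>5}  {freq:>4}  {cumu_freq:>8}  {rel:>7.1f}  {cumu_rel:>11.1f}")


def score_at_percentile(rows, p):
    """First score whose cumulative relative frequency reaches p (0 < p <= 1)."""
    for score, _freq, _cumu_freq, _rel, cumu_rel in rows:
        if cumu_rel >= p:
            return score
    return rows[-1][0]


print("50th percentile (this convention):", score_at_percentile(rows, 0.5))
```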
“Pulling up the mean”
Why . . .
• . . . isn’t a “measure of central tendency”
all we need to characterize a distribution
of scores/numbers/data/stuff?
• “The price for using measures of central
tendency is loss of information” (Tal,
2001, p. 49).
– Remember the see-saw example. Same
measure of central tendency – widely
varying distribution of scores.
Didja hear the one about . . .
• the Aggies who were on a march and
came to a river? The Aggie captain
asked the farmer how deep the river
was.
• “Oh, it averages two feet deep.”
• All the Aggies drowned.
Note . . .
• We started with a bunch of specific scores.
• We put them in order.
• We drew their distribution.
• Now we can report their central tendency.
• So, we’ve moved AWAY from specifics, to a
summary. But with Central Tendency, alone,
we’ve ignored the specifics altogether.
– Note MANY distributions could have a particular
central tendency!
• If we went back to ALL the specifics, we’d be
back at square one.
Measures of Dispersion
• Range
• Semi-interquartile range
• Standard deviation
– σ (sigma)
Range
• Highest score minus the lowest score.
• Like the mode . . .
– Easy to calculate
– Potentially misleading
– Doesn’t take EVERY score into account.
• What we need to do is calculate one number
that will capture HOW spread out our numbers
are from that measure of Central Tendency.
– ‘Cause MANY different distributions of scores can
have the same central tendency!
– “Standard Deviation”
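A small sketch of the point about the range, using three invented distributions (hypothetical data, not from the class). All three have the same mean. Two of them also share the same range even though their middle scores are arranged very differently, which is what "doesn't take EVERY score into account" means in practice.

```python
def score_range(scores):
    """Highest score minus the lowest score."""
    return max(scores) - min(scores)


def mean(scores):
    """Arithmetic average: sum of X divided by N."""
    return sum(scores) / len(scores)


# Three made-up distributions, each with a mean of 6.
tight = [5, 6, 6, 6, 7]
spread = [0, 1, 6, 11, 12]
bunched = [0, 6, 6, 6, 12]  # same extremes as `spread`, middle scores bunched up

for name, data in (("tight", tight), ("spread", spread), ("bunched", bunched)):
    print(f"{name:>8}: mean = {mean(data)}, range = {score_range(data)}")
```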
Back to our data – MLB games
• Let’s take just the men in this class
• xls spreadsheet.
• Measures of central tendency.
• Go with mean. (‘Cause we can – ratio scale
data!)
• So, how much do the actual scores
deviate from the mean?
First – just for grins – mode, median, mean?
Name          X (# of MLB games seen)   µ
Wenbin        0
Daniel        0
Stephen       0
Christopher   2
Geoff         3
Clarke        3
Justin        4
Erik          15
Randolph      27
Total         54
So . . .
• Add up all the deviations and we should
have a feel for how disperse, how
spread, how deviant, our distribution is.
• Let’s calculate the Standard Deviation.
• As always, start inside the parentheses.
• (X - µ)
So, find distance of each score from the mean
Name          X (# of MLB games seen)   µ   X-µ
Wenbin        0                         6   -6
Daniel        0                         6   -6
Stephen       0                         6   -6
Christopher   2                         6   -4
Geoff         3                         6   -3
Clarke        3                         6   -3
Justin        4                         6   -2
Erik          15                        6   9
Randolph      27                        6   21
Total         54
So, find distance of each score from the mean
Name          X (# of MLB games seen)   µ   X-µ
Wenbin        0                         6   -6
Daniel        0                         6   -6
Stephen       0                         6   -6
Christopher   2                         6   -4
Geoff         3                         6   -3
Clarke        3                         6   -3
Justin        4                         6   -2
Erik          15                        6   9
Randolph      27                        6   21
Total         54                            0
Damn!
• OK, let’s try it on a
smaller set of
numbers.
X
2
3
5
6
Damn! (cont’d.)
• OK, let’s try it on a
smaller set of
numbers.
X        X-µ
2        -2
3        -1
5        1
6        2
Σ = 16   Σ = 0
µ = 4    Hmm.
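The "Hmm." is not a coincidence: deviations from the mean always add up to zero. A minimal sketch, with `deviations` as a hypothetical helper name, checked on both the small set and the MLB data:

```python
def deviations(scores):
    """Distance of each score from the mean: X - mu."""
    mu = sum(scores) / len(scores)
    return [x - mu for x in scores]


small = [2, 3, 5, 6]
mlb = [0, 0, 0, 2, 3, 3, 4, 15, 27]

print(deviations(small), "sum =", sum(deviations(small)))
print(deviations(mlb), "sum =", sum(deviations(mlb)))
```

Both means here are whole numbers, so the sums come out as exactly zero. With a non-integer mean, floating-point rounding may leave a tiny residue very close to zero.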
OK . . .
• . . . so mathematicians at this point do
one of two things.
• Take the absolute value or square ‘em.
• We square ‘em. Σ(X - µ)²
So, find distance of each score from the mean
Name          X (# of MLB games seen)   µ   X - µ   (X - µ)²
Wenbin        0                         6   -6      36
Daniel        0                         6   -6      36
Stephen       0                         6   -6      36
Christopher   2                         6   -4      16
Geoff         3                         6   -3      9
Clarke        3                         6   -3      9
Justin        4                         6   -2      4
Erik          15                        6   9       81
Randolph      27                        6   21      441
Total         54                            0       668

Standard Deviation (cont’d.)
• Then take the average of the squared
deviations. Σ(X - µ)²/N
– 668/9 = 74.2
• But this number is so BIG!

Remember . . .
• We had to SQUARE all the deviation
scores (X - µ) to get around the addin’-up-to-zero problem . . .
• So now we take the square root, to get
us back in the same ballpark:
• SQRT(74.2) = 8.6.

Sooooo . . .
• How many MLB games have the males
in our class seen live:
• 0, 3, 0, 27, 15, 0, 3, 4, 2 (ugh)
• 0, 0, 0, 2, 3, 3, 4, 15, 27 (hmm)
• 50th percentile (median) = 3 (now we’re talkin’)
• µ = 6 (I’m with ya’)
• µ = 6, σ = 8.6 (NOW I have a pretty clear picture. I know YOU
don’t, yet!)
OK . . .
• . . . take the square root (to make up for
squaring the deviations earlier).
• σ = SQRT(Σ(X - µ)²/N)
• Now this doesn’t give you a headache,
right?
• I said “right”?
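If the formula still gives you a headache, it may help to see it one step at a time. The sketch below applies σ = SQRT(Σ(X - µ)²/N) to the MLB scores, working from the inside of the parentheses outward. The variable names are illustrative, and the last line cross-checks against `statistics.pstdev`, the population standard deviation in Python's standard library.

```python
import math
import statistics

# MLB games seen live by the men in the class.
mlb = [0, 0, 0, 2, 3, 3, 4, 15, 27]

n = len(mlb)
mu = sum(mlb) / n                        # mean: 54 / 9 = 6
squared = [(x - mu) ** 2 for x in mlb]   # each (X - mu) squared; Erik's 9 becomes 81
sum_sq = sum(squared)                    # sum of the squared deviations
variance = sum_sq / n                    # average squared deviation
sigma = math.sqrt(variance)              # back to the original units

print("mu       =", mu)
print("sum sq   =", sum_sq)
print("variance =", round(variance, 1))
print("sigma    =", round(sigma, 1))
print("pstdev   =", round(statistics.pstdev(mlb), 1))
```

Dividing by N (rather than N - 1) treats these nine scores as the whole population, which is what the formula on the slide does.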
Hmmm . . .
Mode      Range
Median    ?????
Mean      Standard Deviation
We need . . .
• A measure of spread that is NOT
sensitive to every little score, just as
median is not.
• SIQR: Semi-interquartile range.
• (Q3 – Q1)/2
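A hedged sketch of (Q3 – Q1)/2 on the MLB scores. Textbooks and software define quartiles in slightly different ways, so the exact value depends on the convention. This example assumes the "inclusive" method of Python's `statistics.quantiles`, and a different method can give a different SIQR for the same data.

```python
import statistics


def siqr(scores):
    """Semi-interquartile range: (Q3 - Q1) / 2, using the 'inclusive' quartile method."""
    q1, _q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2


mlb = [0, 0, 0, 2, 3, 3, 4, 15, 27]
print("SIQR:", siqr(mlb))

# Make the most extreme score ten times larger: the SIQR does not move.
exaggerated = [0, 0, 0, 2, 3, 3, 4, 15, 270]
print("SIQR with an extreme score:", siqr(exaggerated))
```

Like the median, the SIQR ignores how far out the extreme scores sit, so it stays put when Randolph's 27 becomes 270.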
To summarize
Mode       Range     -Easy to calculate.
                     -May be misleading.
Median     SIQR      -Capture the center.
                     -Not influenced by extreme scores.
Mean (µ)   SD (σ)    -Take every score into account.
                     -Allow later manipulations.
Who wants to guess . . .
• . . . What I think is the most important
sentence in S, Z, & Z (2006), Chapter 2?
p. 32
• Penultimate paragraph, first sentence:
• “Scientists seek to determine whether
any differences in their observations of
the dependent variable are caused by
the different conditions of the
independent variable.”
• http://highered.mcgrawhill.com/sites/0072494468/student_view0/statistics_primer.html
• Click on Statistics Primer.
Practice Problems
Homework
• LOTS of reading. See syllabus.
• Send a table/graph/chart that you’ve
read this past week. Send email to
Garrett by noon, Friday.
See you next week.