Name that tune.
Song title? Performer(s)?
R.G. Bias | rbias@ischool.utexas.edu |
Descriptive Statistics
“Finding New Information”
4/5/2010
Standard Deviation
$\sigma = \sqrt{\sum (X - \mu)^2 / N}$
(Does that give you a headache?)
• USA Today has come out with a new survey - apparently, three out of every four people make up 75% of the population.
– David Letterman
• Statistics: The only science that enables different experts using the same figures to draw different conclusions.
– Evan Esar (1899 - 1995), US humorist
Scales
• The data we collect can be represented on one of FOUR types of scales:
– Nominal
– Ordinal
– Interval
– Ratio
• “Scale” in the sense that an individual score is placed at some point along a continuum.
Nominal Scale
• Describe something by giving it a name. (Name – Nominal. Get it?)
• Mutually exclusive categories.
• For example:
– Gender: 1 = Female, 2 = Male
– Marital status: 1 = single, 2 = married, 3 = divorced, 4 = widowed
– Make of car: 1 = Ford, 2 = Chevy . . .
• The numbers are just names.
Ordinal Scale
• An ordered set of objects.
• But no implication about the relative SIZE of the steps.
• Example:
– The 50 states in order of population:
  • 1 = California
  • 2 = Texas
  • 3 = New York
  • . . . 50 = Wyoming
Interval Scale
• Ordered, like an ordinal scale.
• Plus, the intervals are equal: the distance between any two adjacent values is the same everywhere on the scale.
• With interval data, we can calculate means (averages).
• However, the zero point is arbitrary.
• Examples:
– Temperature in Fahrenheit or Centigrade
– IQ scores
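To see why an arbitrary zero matters, here is a minimal Python sketch (the two temperature readings are hypothetical). It converts two Celsius readings to Fahrenheit: the difference between them scales by the same factor as every other interval, but the ratio of the readings themselves changes with the unit.

```python
# Minimal sketch: interval scales (like temperature in degrees) have an
# arbitrary zero, so ratios of values depend on the unit chosen.

def celsius_to_fahrenheit(c: float) -> float:
    """Convert degrees Celsius (Centigrade) to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Hypothetical readings
warm_c, cool_c = 20.0, 10.0
warm_f = celsius_to_fahrenheit(warm_c)  # 68.0
cool_f = celsius_to_fahrenheit(cool_c)  # 50.0

# Intervals convert consistently: every difference is multiplied by 9/5.
diff_ratio = (warm_f - cool_f) / (warm_c - cool_c)  # 1.8

# Ratios do not: 20 C looks like "twice" 10 C, but 68 F is not twice 50 F.
ratio_c = warm_c / cool_c  # 2.0
ratio_f = warm_f / cool_f  # 1.36

print(f"Celsius ratio:    {ratio_c:.2f}")
print(f"Fahrenheit ratio: {ratio_f:.2f}")
```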
Ratio Scale
• Interval scale, plus an absolute (true) zero.
• Examples:
– Distance, weight, height, elapsed time (but not calendar years: e.g., the year 2002 isn’t “twice” 1001).
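A companion sketch, again with hypothetical numbers: for a ratio-scale quantity such as elapsed time, changing units only multiplies every value by a constant, so ratios survive. Calendar years behave like an interval scale instead; the shifted origin in the last lines is a made-up calendar used only for illustration.

```python
# Minimal sketch: on a ratio scale (true zero), changing units multiplies
# every value by the same constant, so ratios are preserved.

# Hypothetical elapsed times for two trips
trip_a_minutes, trip_b_minutes = 90.0, 45.0
trip_a_hours, trip_b_hours = trip_a_minutes / 60, trip_b_minutes / 60

ratio_minutes = trip_a_minutes / trip_b_minutes  # 2.0
ratio_hours = trip_a_hours / trip_b_hours        # 2.0: same answer in any unit

# Calendar years are interval data: year 0 is just a convention.
# Counting from a made-up origin (year 1000) changes the "ratio" completely,
# while the difference between the two years stays the same.
origin = 1000  # hypothetical alternative starting point
ratio_usual = 2002 / 1001                          # 2.0
ratio_shifted = (2002 - origin) / (1001 - origin)  # 1002.0
difference = 2002 - 1001                           # 1001 years either way

print(ratio_minutes, ratio_hours, ratio_usual, ratio_shifted, difference)
```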
Scales (cont’d.)
It’s possible to measure the same attribute on different scales. Take, for instance, your midterm test. I could:
• Give you a “1” if you don’t finish, and a “2” if you finish.
• Give a “1” for the highest grade in class, a “2” for the second-highest grade, . . .
• Give a “1” for the first quarter of the class, a “2” for the second quarter of the class, . . .
• Use the raw test score (100, 99, . . .).
– (NOTE: A score of 100 doesn’t mean the person “knows” twice as much as a person who scores 50; he/she just gets twice the score.)
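The four options above can be sketched in Python for a hypothetical class of eight students (names, scores, and finish status are invented for illustration). Each dictionary records the same test performance on a different kind of scale.

```python
# Hypothetical midterm results: (student, finished the test?, raw score out of 100)
results = [
    ("Ana", True, 92), ("Ben", True, 85), ("Cy", False, 61), ("Dee", True, 78),
    ("Eli", True, 99), ("Fay", False, 50), ("Gus", True, 70), ("Hal", True, 88),
]

# 1) Nominal: 1 = didn't finish, 2 = finished. The numbers are just names.
finish_code = {name: 2 if finished else 1 for name, finished, _ in results}

# 2) Ordinal: 1 = highest score, 2 = second highest, ... (ties not handled here)
by_score = sorted(results, key=lambda r: r[2], reverse=True)
rank = {name: position for position, (name, _, _) in enumerate(by_score, start=1)}

# 3) Ordinal, coarser: 1 = top quarter of the class, 2 = second quarter, ...
quarter_size = len(results) / 4
quarter = {name: int((r - 1) // quarter_size) + 1 for name, r in rank.items()}

# 4) Raw score: equal one-point steps, but no claim that 100 means
#    "twice the knowledge" of 50.
raw = {name: score for name, _, score in results}

for name, _, _ in by_score:
    print(f"{name}: finish={finish_code[name]} rank={rank[name]} "
          f"quarter={quarter[name]} raw={raw[name]}")
```

Each step throws away some information: the finish code ignores the score entirely, the rank keeps order but not gaps, and the quarter keeps even less order than the rank.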
Scales (cont’d.)
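Property | Nominal | Ordinal | Interval | Ratio
Name | yes | yes | yes | yes
Mutually exclusive | yes | yes | yes | yes
Ordered | no | yes | yes | yes
Equal interval | no | no | yes | yes
Absolute zero | no | no | no | yes
Examples | Gender, Yes/No | Class rank, Survey ans. | Days of wk., Temp. | Inches, Dollars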
Earlier . . .
• We learned about frequency distributions.
• I asserted that a frequency distribution, and/or a histogram (a graphical representation of a frequency distribution), was a good way to summarize a collection of data.
• There’s another, even more compact way.
Measures of Central Tendency
• Mode
– Most frequent score (or scores – a distribution can have multiple modes)
• Median
– “Middle score”
– 50th percentile
• Mean: µ (“mu”)
– “Arithmetic average”
– $\sum X / N$
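These three measures map directly onto functions in Python's standard `statistics` module (`multimode` needs Python 3.8 or later). A minimal sketch on a hypothetical set of scores, with the mean also computed by hand as the sum of the scores divided by N:

```python
import statistics

# Hypothetical scores; 3 and 7 each appear twice, so there are two modes.
scores = [2, 3, 3, 5, 7, 7, 9]

modes = statistics.multimode(scores)  # [3, 7]: the most frequent score(s)
median = statistics.median(scores)    # 5: middle of the sorted scores
mean = sum(scores) / len(scores)      # sum of X divided by N: 36 / 7

print("modes:", modes)
print("median:", median)
print(f"mean: {mean:.2f}")            # mean: 5.14
```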
More quiz questions about
measures of central tendency
4 – True or false: In a normal distribution (bell curve), the mode, median, and mean
are all the same? __True __False
5 – (This one is tricky.) If the mode=mean=median, then the distribution is necessarily
a bell curve?
__True __False
6 – I have a distribution of 10 scores. There was an error, and really the highest score
is 5 points HIGHER than previously thought.
a) What does this do to the mode?
__Increases it __Decreases it __Nothing __Can’t tell
b) What does this do to the median?
__Increases it __Decreases it __Nothing __Can’t tell
c) What does this do to the mean?
__Increases it __Decreases it __Nothing __Can’t tell
7 – Which of the following must be an actual score from the distribution?
a) Mean
b) Median
c) Mode
d) None of the above
OK, so which do we use?
• Means allow further arithmetic/statistical manipulation. But . . .
• It depends on:
– The type of scale of your data
  • Can’t use means with nominal or ordinal scale data
  • With nominal data, must use the mode
– The distribution of your data
  • Tend to use medians with skewed distributions, e.g., ones bounded at one end but not the other (such as salary)
– The question you want to answer
  • “Most popular score” (mode) vs. “middle score” (median) vs. “middle of the see-saw” (mean)
  • “Statistics can tell us which measures are technically correct. It cannot tell us which are ‘meaningful’” (Tal, 2001, p. 52).
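A small Python sketch of the salary point, using hypothetical salaries: a single very large value pulls the mean far above the typical salary while the median barely moves. The last lines show the nominal case, where only the mode is meaningful (the list of car makes is also invented).

```python
import statistics

# Hypothetical annual salaries in thousands of dollars. Salaries are bounded
# below (no one earns less than 0) but not above, so one huge value is possible.
salaries = [38, 42, 45, 47, 50, 52, 55, 900]

mean_salary = statistics.mean(salaries)      # 153.625: pulled up by the 900
median_salary = statistics.median(salaries)  # 48.5: average of 47 and 50

# Nominal data (hypothetical car makes): no order, no arithmetic,
# so the mode is the only measure of central tendency that applies.
cars = ["Ford", "Chevy", "Ford", "Honda", "Ford", "Chevy"]
typical_make = statistics.mode(cars)         # 'Ford'

print(mean_salary, median_salary, typical_make)
```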
Mean – “see-saw” (from Tal, 2001)
We’ve sidled up to the SHAPES of distributions
• Symmetrical
• Skewed – positive and negative
• Flat
“Pulling up the mean”
Why . . .
• . . . isn’t a “measure of central tendency” all we need to characterize a distribution of scores/numbers/data/stuff?
• “The price for using measures of central tendency is loss of information” (Tal, 2001, p. 49).
Didja hear the one about . . .
• the Aggies who were on a march and came to a river? The Aggie captain asked a farmer how deep the river was.
• “Oh, it averages two feet deep.”
• All the Aggies drowned.
Note . . .
• We started with a bunch of specific scores.
• We put them in order.
• We drew their distribution.
• Now we can report their central tendency.
• So, we’ve moved AWAY from specifics to a summary. But with central tendency alone, we’ve ignored the specifics altogether.
– Note: MANY distributions could have a particular central tendency!
• If we went back to ALL the specifics, we’d be back at square one.
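The farmer's answer and the point above can be illustrated together. In this hypothetical Python sketch, two river cross-sections both average two feet deep, yet one hides a channel far deeper than that; the mean alone cannot tell them apart.

```python
import statistics

# Two hypothetical river cross-sections: depth in feet at four evenly spaced points.
rivers = {
    "shallow everywhere": [2.0, 2.0, 2.0, 2.0],
    "hidden channel": [0.5, 0.5, 0.5, 6.5],
}

for name, depths in rivers.items():
    # Same average depth, very different deepest point.
    print(f"{name}: average {statistics.mean(depths)} ft, deepest {max(depths)} ft")
```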
Measures of Dispersion
• Range
• Semi-interquartile range
• Standard deviation
– σ (sigma)
Range
• Highest score minus the lowest score.
• Like the mode . . .
– Easy to calculate
– Potentially misleading
– Doesn’t take EVERY score into account.
• What we need to do is calculate one number that will capture HOW spread out our numbers are around that measure of central tendency.
– ’Cause MANY different distributions of scores can have the same central tendency!
– “Standard Deviation”: $\sigma = \sqrt{\sum (X - \mu)^2 / N}$
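A minimal Python sketch of the range, with two hypothetical sets of scores that share the same range but are spread quite differently. The last lines use `statistics.pstdev` from the standard library, which divides by N as in the formula above, to show that the standard deviation does distinguish them.

```python
import statistics

def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

# Hypothetical scores: same lowest and highest values, different spread in between.
clustered = [1, 5, 5, 5, 5, 9]
spread_out = [1, 1, 3, 7, 9, 9]

print(score_range(clustered), score_range(spread_out))  # 8 8

# pstdev divides by N, matching sigma = sqrt(sum((X - mu)^2) / N).
print(round(statistics.pstdev(clustered), 2))   # 2.31
print(round(statistics.pstdev(spread_out), 2))  # 3.46
```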
Let’s do a short example
• What if I asked four undergraduates how many cars they’ve owned in their lives and I got the following answers: 1, 1, 1, 1?
• There would be NO variance. σ = 0.
• But what if the answers were 0, 0, 1, 3? What’s the mode? Median? Mean?
• Go with the mean.
• So, how much do the actual scores deviate from the mean?
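The mode, median, and mean asked about above, plus each answer's deviation from the mean, can be checked with a few lines of Python (the standard `statistics` module is used here as one convenient option):

```python
import statistics

cars_owned = [0, 0, 1, 3]  # the four answers from the slide

mode = statistics.mode(cars_owned)      # 0: the most frequent answer
median = statistics.median(cars_owned)  # 0.5: halfway between the middle two (0 and 1)
mean = statistics.mean(cars_owned)      # 1: 4 cars / 4 students

deviations = [x - mean for x in cars_owned]  # how far each answer is from the mean
print(mode, median, mean, deviations)        # 0 0.5 1 [-1, -1, 0, 2]

# With identical answers there is no spread at all.
print(statistics.pstdev([1, 1, 1, 1]))       # 0.0
```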
So . . .
• Add up all the deviations and we should have a feel for how dispersed, how spread out, how deviant our distribution is.
• Let’s calculate the Standard Deviation.
• As always, start inside the parentheses.
• $\sum (X - \mu)$
Standard Deviation
Score (X) | Mean (µ) | X - µ
0 | 1 | -1
0 | 1 | -1
1 | 1 | 0
3 | 1 | 2
Total | | 0 (damn)
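The zero total is not bad luck. Because µ = ΣX/N, the sum Σ(X - µ) equals ΣX - Nµ = ΣX - ΣX = 0 for any set of scores. A short Python check using exact fractions (so floating-point rounding cannot hide or create a nonzero total); the third data set is hypothetical:

```python
from fractions import Fraction

def sum_of_deviations(scores):
    """Sum of (X - mu), computed exactly with fractions (no rounding error)."""
    values = [Fraction(x) for x in scores]
    mu = sum(values) / len(values)      # mu = sum(X) / N
    return sum(x - mu for x in values)  # = sum(X) - N*mu = 0

# The two sets from the slides, plus a hypothetical third set.
for data in ([0, 0, 1, 3], [2, 3, 5, 6], [7, 1, 4, 4, 10]):
    print(data, "->", sum_of_deviations(data))  # always 0
```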
Damn!
• OK, let’s try it on another set of numbers.
X
2
3
5
6
Damn! (cont’d.)
• OK, let’s try it on another set of numbers.
X | X - µ
2 | -2
3 | -1
5 | 1
6 | 2
Σ = 16 | Σ = 0
µ = 4 | Hmm.
OK . . .
• . . . so mathematicians at this point do one of two things:
• Take the absolute value, or square ’em.
• We square ’em: $\sum (X - \mu)^2$
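Both options from the slide, side by side for the scores 2, 3, 5, 6. This Python sketch computes the absolute and the squared deviations; averaging the absolute deviations would give the mean absolute deviation, the road not taken here.

```python
data = [2, 3, 5, 6]         # the scores from the slide
mu = sum(data) / len(data)  # 4.0

abs_devs = [abs(x - mu) for x in data]   # [2.0, 1.0, 1.0, 2.0]
sq_devs = [(x - mu) ** 2 for x in data]  # [4.0, 1.0, 1.0, 4.0]

# Either way, the signs no longer cancel out.
print(sum(abs_devs))              # 6.0
print(sum(abs_devs) / len(data))  # 1.5: mean absolute deviation
print(sum(sq_devs))               # 10.0: the sum of squares used next
```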
X | X - µ | (X - µ)²
2 | -2 | 4
3 | -1 | 1
5 | 1 | 1
6 | 2 | 4
Σ = 16, µ = 4 | Σ = 0 | Σ = 10
Standard Deviation (cont’d.)
• Then take the average of the squared deviations: $\sum (X - \mu)^2 / N$
– Remember, dividing by N was the way we took the average of the original scores.
– 10/4 = 2.5.
• But this number is so BIG!
OK . . .
• . . . take the square root (to make up for squaring the deviations earlier).
• $\sigma = \sqrt{\sum (X - \mu)^2 / N}$
• $\sqrt{2.5} \approx 1.58$
• Now this doesn’t give you a headache, right?
• I said “right”?
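The whole recipe (deviations, squaring, averaging, square root) fits in one short Python function. This is a sketch of the population formula used on these slides; the standard library's `statistics.pstdev` implements the same divide-by-N formula and serves as a cross-check.

```python
import math
import statistics

def population_sd(scores):
    """sigma = sqrt(sum((X - mu)^2) / N), one step at a time."""
    n = len(scores)
    mu = sum(scores) / n                            # mean: sum(X) / N
    squared_devs = [(x - mu) ** 2 for x in scores]  # (X - mu)^2 for each score
    variance = sum(squared_devs) / n                # average squared deviation
    return math.sqrt(variance)                      # undo the squaring

data = [2, 3, 5, 6]
sigma = population_sd(data)
print(round(sigma, 2))  # 1.58

# Cross-check: statistics.pstdev uses the same divide-by-N formula.
print(math.isclose(sigma, statistics.pstdev(data)))  # True
```

Note that `statistics.stdev` divides by N - 1 instead (the sample standard deviation), so for these four scores it returns about 1.83 rather than 1.58.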
Hmmm . . .
Mode | Range
Median | ?????
Mean | Standard Deviation
We need . . .
• A measure of spread that is NOT sensitive to every little score, just as the median is not.
• SIQR: Semi-interquartile range.
• $(Q_3 - Q_1)/2$, where $Q_1$ and $Q_3$ are the 25th and 75th percentiles (the first and third quartiles).
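A minimal SIQR sketch using Python's `statistics.quantiles` (Python 3.8 or later). One caveat: there are several conventions for locating quartiles, and `quantiles` supports two of them through its `method` argument, so hand calculations from a textbook may differ slightly. The example data are hypothetical; note that replacing the top score with an extreme value leaves the SIQR unchanged.

```python
import statistics

def siqr(scores, method="exclusive"):
    """Semi-interquartile range: (Q3 - Q1) / 2.

    statistics.quantiles(..., n=4) returns the three quartile cut points
    [Q1, Q2 (the median), Q3]; 'exclusive' is its default method.
    """
    q1, _median, q3 = statistics.quantiles(scores, n=4, method=method)
    return (q3 - q1) / 2

# Hypothetical scores, then the same scores with the top one replaced by an extreme value.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 80]

print(siqr(scores), siqr(with_outlier))  # 2.25 2.25: the extreme score has no effect
print(siqr(scores, method="inclusive"))  # 1.75: a different quartile convention
```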
To summarize
Central tendency | Dispersion | Properties
Mode | Range | Easy to calculate; may be misleading.
Median | SIQR | Capture the center; not influenced by extreme scores.
Mean (µ) | SD (σ) | Take every score into account; allow later manipulations.
Practice Problems
• I’ll send you some tonight.
• http://highered.mcgrawhill.com/sites/0072494468/student_view0/statistics_primer.html
• Click on Statistics Primer.
References
• Hinton, P. R. Statistics explained.
• Shaughnessy, Zechmeister, and Zechmeister. Experimental methods in psychology.