Name that tune.
Song title? Performer(s)?
R.G. Bias | rbias@ischool.utexas.edu |
Descriptive Statistics
“Finding New Information”
3/23/2011
Standard Deviation
σ = SQRT(Σ(X - µ)²/N)
(Does that give you a
headache?)
 Statistics: The only science that enables
different experts using the same figures to
draw different conclusions.
– Evan Esar (1899 - 1995), US humorist
 USA Today has come out with a new
survey - apparently, three out of every four
people make up 75% of the population.
– David Letterman
The last 2 lectures . . .
 . . . we’ve been talking about the scientific
method.
 When you conduct an experiment, at
some point you’ll have some data.
 “Statistics” is the field of study that
addresses how we deal with, manipulate,
and interpret those data.
How to talk about a set of numbers
 We can list ‘em.
– Can get WAY unwieldy.
– Plus hard to make any sense out of them.
 First step – put ‘em in order.
 Second step –
– Graph ‘em, and/or
– Calculate percentiles/deciles
Frequency Distributions Histograms
 # of pets ever owned
– 13, 2, 1, 4, 0, 1, 3, 0, 5, 1
 Put ‘em in order
– 0, 0, 1, 1, 1, 2, 3, 4, 5, 13
Freq Dist
 Raw Scores (in order)
– 0, 0, 1, 1, 1, 2, 3, 4, 5, 13

Raw Score   Freq   Cumu Freq
0           2      2
1           3      5
2           1      6
3           1      7
4           1      8
5           1      9
13          1      10
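The frequency table can be rebuilt from the raw scores in a few lines. This is a minimal sketch, not part of the original slides; the variable names (`pets`, `counts`, `cumulative`) are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

# Raw scores from the slide: number of pets ever owned.
pets = [13, 2, 1, 4, 0, 1, 3, 0, 5, 1]

# Step 1: put 'em in order.
ordered = sorted(pets)

# Step 2: count how often each raw score occurs.
freq = Counter(ordered)
scores = sorted(freq)
counts = [freq[s] for s in scores]

# Step 3: a running total of the counts is the cumulative frequency.
cumulative = list(accumulate(counts))

print(f"{'Raw Score':>9} {'Freq':>5} {'Cumu Freq':>10}")
for s, f, c in zip(scores, counts, cumulative):
    print(f"{s:>9} {f:>5} {c:>10}")
```

The last cumulative frequency should always equal N, the number of scores, which is a quick check that nothing was dropped.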
Histogram
[Figure: histogram of # of pets. x-axis: raw scores 0, 1, 2, 3, 4, 5, 13; y-axis: frequency, 0 to 3.]
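A histogram is just the frequency distribution drawn as bars. As a rough sketch (the helper `text_histogram` is a made-up name, and a plotting library would normally do this job), a text version can be printed like so:

```python
from collections import Counter

# Ordered raw scores from the pets example.
pets = [0, 0, 1, 1, 1, 2, 3, 4, 5, 13]
freq = Counter(pets)

def text_histogram(freq):
    """Return one line per observed raw score, with one '#' per occurrence."""
    return [f"{score:>2} | {'#' * count}" for score, count in sorted(freq.items())]

for line in text_histogram(freq):
    print(line)
```

Like the slide's chart, this only lists scores that actually occur, so the gap between 5 and 13 is not drawn to scale.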
Percentiles
 LOCATION of 25th percentile:
– X.25 = (N+1) .25
 LOCATION of 50th percentile:
– X.50 = (N+1) .50
 LOCATION of 75th percentile:
– X.75 = (N+1) .75
 Example: If we had 10 scores,
– the 25th percentile would be the (11).25=2.75th score or part way
(half way!) between the 2nd and 3rd scores.
– The 50th percentile would be the (11).50=5.5th score, or half way
between the 5th and 6th scores.
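The location rule above can be sketched in code. The function names are hypothetical, and the way a fractional location is turned into a score (linear interpolation between the two neighbouring scores) is an assumption: it is one common convention, whereas the slide's example settles for "half way" between neighbours.

```python
def percentile_location(n, p):
    """Location (1-based rank) of the p-th percentile among n ordered scores."""
    return (n + 1) * p

def score_at(ordered, location):
    """Score at a possibly fractional 1-based location.

    Assumption: linear interpolation between neighbouring scores, and a
    location that lies between 1 and len(ordered).
    """
    lower = int(location)         # rank of the score just below the location
    fraction = location - lower   # how far toward the next score
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

pets = [0, 0, 1, 1, 1, 2, 3, 4, 5, 13]   # 10 ordered scores
for p in (0.25, 0.50, 0.75):
    loc = percentile_location(len(pets), p)
    print(p, loc, score_at(pets, loc))
```

With 10 scores the locations come out as 2.75, 5.5, and 8.25, matching the (N+1) rule; only the interpolation step depends on the convention chosen.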
Note . . .
 With an odd number of scores, the 50th percentile will
be an actual score:
 Raw Scores (in order)
– 0, 0, 1, 1, 1, 2, 3, 4, 5, 13, 100
 50th percentile = (N+1).50 = (12).5 = 6th score = 2.
Earlier . . .
 We learned about frequency
distributions.
 I asserted that a frequency distribution,
and/or a histogram (a graphical
representation of a frequency distribution),
was a good way to summarize a collection
of data.
 There’s another, even shorter-hand way.
Measures of Central Tendency
 Mode
– Most frequent score (or scores – a distribution
can have multiple modes)
 Median
– “Middle score”
– 50th percentile
 Mean - µ (“mu”)
– “Arithmetic average”
– ΣX/N
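All three measures are available in Python's standard `statistics` module. A small sketch using the pets data from the earlier slides (the second list, `two_modes`, is made up to show a distribution with more than one mode):

```python
import statistics

# Ordered raw scores from the pets example.
pets = [0, 0, 1, 1, 1, 2, 3, 4, 5, 13]

modes = statistics.multimode(pets)   # a list, because there can be several modes
median = statistics.median(pets)     # middle score; averages the two middle scores when N is even
mean = statistics.mean(pets)         # ΣX/N

print(modes, median, mean)

# Hypothetical data with two modes.
two_modes = [0, 0, 1, 1]
print(statistics.multimode(two_modes))
```

Note how the single score of 13 drags the mean well above the median.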
Let’s calculate some “averages”
Here’s a distribution of scores
 2
 2
 5
Measures of Central Tendency
 Mode?
 Median?
 Mean?
Let’s calculate some “averages”
Here’s a distribution of scores
 0
 0
 0
 1
 1
 10
Measures of Central Tendency
 Mode?
 Median?
 Mean?
A quiz about averages
1 – If one score in a distribution changes, will the mode change?
__Yes __No __Maybe
2 – How about the median?
__Yes __No __Maybe
3 – How about the mean?
__Yes __No __Maybe
4 – True or false: In a normal distribution (bell curve), the mode,
median, and mean are all the same? __True __False
More quiz questions about
measures of central tendency
5 – (This one is tricky.) If the mode=mean=median, then the distribution is necessarily
a bell curve?
__True __False
6 – I have a distribution of 10 scores. There was an error, and really the highest score
is 5 points HIGHER than previously thought.
a) What does this do to the mode?
__ Increases it __Decreases it __Nothing __Can’t tell
b) What does this do to the median?
__ Increases it __Decreases it __Nothing __Can’t tell
c) What does this do to the mean?
__ Increases it __Decreases it __Nothing __Can’t tell
7 – Which of the following must be an actual score from the distribution?
a) Mean
b) Median
c) Mode
d) None of the above
OK, so which do we use?
 Means allow further arithmetic/statistical manipulation. But . . .
 It depends on:
– The type of data
• Can’t use means with nominal or ordinal scale data (more on
this Monday)
• With nominal data, must use mode
– The distribution of your data
• Tend to use medians with distributions bounded at one end
but not the other (e.g., salary).
– The question you want to answer
• “Most popular score” vs. “middle score” vs. “middle of the
see-saw”
• “Statistics can tell us which measures are technically correct.
It cannot tell us which are ‘meaningful’” (Tal, 2001, p. 52).
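To see why medians are favored for distributions like salary, here is a small sketch. The salary figures are hypothetical, invented purely for illustration (in thousands of dollars).

```python
import statistics

# Hypothetical salaries, bounded below by zero but with one very large value.
salaries = [30, 32, 35, 38, 40, 45, 400]

mean_salary = statistics.mean(salaries)      # pulled up by the 400
median_salary = statistics.median(salaries)  # the middle score, unaffected by how big the top score is

print(round(mean_salary, 1), median_salary)

# Drop the extreme score and the two measures land close together.
typical = salaries[:-1]
print(round(statistics.mean(typical), 1), statistics.median(typical))
```

Which number is "meaningful" depends on the question: the mean answers "what if everyone were paid the same total?", the median answers "what does the person in the middle earn?".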
Mean – “see saw” (from Tal, 2001)
Have sidled up to SHAPES of
distributions
 Symmetrical
 Skewed – positive and negative
 Flat
“Pulling up the mean”
Why . . .
 . . . isn’t a “measure of central tendency”
all we need to characterize a distribution of
scores/numbers/data/stuff?
 “The price for using measures of central
tendency is loss of information” (Tal, 2001,
p. 49).
Didja hear the one about . . .
 the Aggies who were on a march and
came to a river? The Aggie captain asked
the farmer how deep the river was.
 “Oh, it averages two feet deep.”
 All the Aggies drowned.
Note . . .
 We started with a bunch of specific scores.
 We put them in order.
 We drew their distribution.
 Now we can report their central tendency.
 So, we’ve moved AWAY from specifics, to a summary.
 But with Central Tendency, alone, we’ve ignored the
specifics altogether.
– Why isn’t a Measure of Central Tendency, alone,
satisfactory?
– Note MANY distributions could have a particular
central tendency!
 If we went back to ALL the specifics, we’d be back at
square one.
Measures of Dispersion (or
Spread)
 Range
 Semi-interquartile range
 Standard deviation
– σ (sigma)
Range
 Highest score minus the lowest score.
 Like the mode . . .
– Easy to calculate
– Potentially misleading
– Doesn’t take EVERY score into account.
 What we need to do is calculate one number
that will capture HOW spread out our numbers
are from that measure of Central Tendency.
– ‘Cause MANY different distributions of scores can
have the same central tendency!
– “Standard Deviation” -- σ = SQRT(Σ(X - µ)²/N)
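The claim that many different distributions can share a central tendency is easy to check. In this sketch the lists `a` and `c` are made up for illustration, `b` reuses the 2, 3, 5, 6 example from these slides, and `score_range` is a hypothetical helper name.

```python
import statistics

def score_range(scores):
    """Highest score minus the lowest score."""
    return max(scores) - min(scores)

# Three distributions that all have a mean of 4.
a = [4, 4, 4, 4]   # hypothetical: no spread at all
b = [2, 3, 5, 6]   # the example set used in these slides
c = [0, 0, 8, 8]   # hypothetical: scores far from the mean

for scores in (a, b, c):
    print(scores,
          statistics.mean(scores),      # same for all three
          score_range(scores),          # uses only two of the scores
          statistics.pstdev(scores))    # population SD, uses every score
```

The mean alone cannot tell these three apart; the range and the standard deviation can.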
Let’s do a short example
 What if I asked four undergraduates how
many cars they’ve owned in their lives and
I got the following answers: 1 1 1 1
 There would be NO variance. σ = 0.
 But what if the answers were 0 0 1 3
What’s the mode? Median? Mean?
 Go with mean.
 So, how much do the actual scores
deviate from the mean?
So . . .
 Add up all the deviations and we should
have a feel for how dispersed, how
spread, how deviant, our distribution is.
 Let’s calculate the Standard Deviation.
 As always, start inside the parentheses.
 Σ(X - µ)
Standard Deviation
Score (X)   Mean (µ)   X-µ
0           1          -1
0           1          -1
1           1          0
3           1          2
Total                  0 (damn)
Damn!
 OK, let’s try it on
another set of
numbers.
X: 2, 3, 5, 6
Damn! (cont’d.)
 OK, let’s try it on
another set of
numbers.
X        X-µ
2        -2
3        -1
5        1
6        2
Σ = 16   Σ = 0
µ = 4    Hmm.
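Both tables hit the same wall: the deviations from the mean sum to zero, because the negative and positive deviations cancel out. A quick sketch (the helper name `deviations` is made up):

```python
def deviations(scores):
    """Return each score's deviation from the mean, X - µ."""
    mu = sum(scores) / len(scores)
    return [x - mu for x in scores]

cars = [0, 0, 1, 3]    # first example, µ = 1
other = [2, 3, 5, 6]   # second example, µ = 4

for scores in (cars, other):
    devs = deviations(scores)
    print(scores, devs, sum(devs))
```

This is not bad luck with these particular numbers: the mean is the balance point, so the deviations around it always cancel.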
OK . . .
 . . . so mathematicians at this point do one
of two things.
 Take the absolute value or square ‘em.
 We square ‘em. Σ(X - µ)²
X        X - µ    (X - µ)²
2        -2       4
3        -1       1
5        1        1
6        2        4
Σ = 16   Σ = 0    Σ = 10
µ = 4
Standard Deviation (cont’d.)
 Then take the average of the squared
deviations. Σ(X - µ)²/N
– Remember, dividing by N was the way we
took the average of the original scores.
– 10/4 = 2.5.
 But this number is so BIG!
OK . . .
 . . . take the square root (to make up for
squaring the deviations earlier).
 σ = SQRT(Σ(X - µ)²/N)
 SQRT(2.5) = 1.58
 Now this doesn’t give you a headache,
right?
 I said “right”?
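The whole calculation, start to finish, fits in one short function. This is a sketch that follows the slide formula step by step; `population_sd` is a made-up name.

```python
import math
import statistics

def population_sd(scores):
    """σ = SQRT(Σ(X - µ)²/N), computed one step at a time."""
    n = len(scores)
    mu = sum(scores) / n                         # mean, ΣX/N
    squared = [(x - mu) ** 2 for x in scores]    # (X - µ)² for each score
    variance = sum(squared) / n                  # average squared deviation
    return math.sqrt(variance)                   # undo the squaring

sigma = population_sd([2, 3, 5, 6])
print(round(sigma, 2))

# The standard library's population SD uses the same formula.
print(round(statistics.pstdev([2, 3, 5, 6]), 2))
```

One caution when checking against software: `statistics.pstdev` divides by N, as the slides do, while `statistics.stdev` divides by N - 1 and gives a slightly larger number.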
Hmmm . . .
Mode      Range
Median    ?????
Mean      Standard Deviation
We need . . .
 A measure of spread that is NOT sensitive
to every little score, just as median is not.
 SIQR: Semi-interquartile range.
 (Q3 – Q1)/2
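A sketch of the SIQR using the pets data. It relies on `statistics.quantiles` with `method="exclusive"`, which places quartiles at the (N+1)p locations used on the percentile slide; the linear interpolation between neighbouring scores is the library's convention and an assumption here, as is the helper name `siqr`.

```python
import statistics

def siqr(scores):
    """Semi-interquartile range, (Q3 - Q1) / 2."""
    q1, _q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")
    return (q3 - q1) / 2

pets = [0, 0, 1, 1, 1, 2, 3, 4, 5, 13]
print(siqr(pets), statistics.pstdev(pets))

# Make the top score far more extreme (hypothetical change): the SIQR does
# not move, because Q1 and Q3 do not depend on the highest score.
extreme = [0, 0, 1, 1, 1, 2, 3, 4, 5, 100]
print(siqr(extreme), statistics.pstdev(extreme))
```

The standard deviation jumps when the top score changes, while the SIQR stays put, which is exactly the property the median has among the measures of central tendency.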
To summarize
Mode       Range    -Easy to calculate. -May be misleading.
Median     SIQR     -Capture the center. -Not influenced by extreme scores.
Mean (µ)   SD (σ)   -Take every score into account. -Allow later manipulations.
Practice Problems
 I’ll send you some, tonight.
 http://highered.mcgrawhill.com/sites/0072494468/student_view0/statistics_primer.html
 Click on Statistics Primer.