Descriptive Statistics: Measures of Central Tendency and Variability

Statistics for Everyone Workshop
Fall 2010
Part 2
Descriptive Statistics:
Measures of Central Tendency and Variability
Workshop presented by Linda Henkel and Laura McSweeney of Fairfield University
Funded by the Core Integration Initiative and the Center for Academic Excellence at
Fairfield University
Descriptive Statistics
Once you know what type of measurement
scale the data were measured on, you can
choose the most appropriate statistics to
summarize them:
Measures of central tendency: Most
representative score
Measures of dispersion: How far spread out
scores are
Measures of Central Tendency
Central tendency = Typical or
representative value of a group of
scores
• Mean: Average score
• Median: middlemost score; score at 50th
percentile; half the scores are above, half
are below
• Mode: Most frequently occurring score(s)
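The three definitions above can be sketched in a few lines of Python. This is only an illustration: the scores are made up, and the standard-library `statistics` module is one of several tools that could be used.

```python
import statistics

# Hypothetical scores, invented for illustration (not from the workshop)
scores = [2, 3, 3, 5, 7, 8, 14]

mean = statistics.mean(scores)        # sum of the scores divided by how many there are
median = statistics.median(scores)    # middle value once the scores are sorted
modes = statistics.multimode(scores)  # list of every most frequently occurring score

print(f"Mean = {mean}, Median = {median}, Mode(s) = {modes}")
```

`multimode` returns a list because a set of scores can have more than one mode.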
Measures of Central Tendency
| Measure | Definition | Takes Every Value Into Account? | When to Use |
| --- | --- | --- | --- |
| Mean | $M = \sum X / n$ | Yes | Numerical data. BUT… can be heavily influenced by outliers, so it can give an inaccurate view if the distribution is not (approximately) symmetric |
| Median | Middle value | No | Ordinal data, or numerical data that are skewed |
| Mode | Most frequent data value | No | Nominal data |
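A short sketch of why the mean can mislead when an outlier is present, while the median barely moves. The response times below are hypothetical values chosen for illustration.

```python
import statistics

# Hypothetical response times in minutes (invented for illustration)
times = [1, 2, 2, 3, 4]
times_with_outlier = times + [30]  # add one extreme value

mean_before = statistics.mean(times)
median_before = statistics.median(times)
mean_after = statistics.mean(times_with_outlier)
median_after = statistics.median(times_with_outlier)

# The single outlier pulls the mean far more than the median
print(f"Without outlier: mean = {mean_before}, median = {median_before}")
print(f"With outlier:    mean = {mean_after}, median = {median_after}")
```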
Three Different Distributions That
Have the Same Mean
| | Sample A | Sample B | Sample C |
| --- | --- | --- | --- |
| | 0 | 8 | 6 |
| | 2 | 7 | 6 |
| | 6 | 6 | 6 |
| | 10 | 5 | 6 |
| | 12 | 4 | 6 |
| Mean | 6 | 6 | 6 |
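As a quick check, the three samples can be summarized in Python. The sketch below uses the sample values from the slide; the choice of the `statistics` module and the variable names are assumptions made for illustration.

```python
import statistics

# Sample values taken from the slide
samples = {
    "A": [0, 2, 6, 10, 12],
    "B": [8, 7, 6, 5, 4],
    "C": [6, 6, 6, 6, 6],
}

summary = {}
for name, values in samples.items():
    summary[name] = {
        "mean": statistics.mean(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample SD (divides by N - 1)
    }
    s = summary[name]
    print(f"Sample {name}: mean = {s['mean']}, range = {s['range']}, SD = {s['sd']:.2f}")
```

All three samples share the same mean, but their ranges and standard deviations differ, which is why a measure of center alone is not enough.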
Measures of Variability
Knowing what the center of a set of scores is
is useful but….
How far spread out are all the scores?
Were all scores the same or did they have
some variability?
Range, Standard deviation, Interquartile
range
Variability = extent to which scores in a
distribution differ from each other (how spread
out they are)
The Range as a Measure of Variability
Difference between lowest score in the set
and highest score
• Ages ranged from 27 to 56 years of age
• There was a 29-year age range
• The number of calories ranged from 256 to 781
Sample Standard Deviation
Standard deviation = How far on “average”
do the scores deviate around the mean?
$$ s = SD = \sqrt{\frac{\sum (X - M)^2}{N - 1}} $$
• In a normal distribution, about 68% of the scores
fall within 1 standard deviation of the mean
(M ± SD)
• The bigger the SD is, the more spread out
the scores are around the mean
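The sample standard deviation can be computed step by step to see what the formula is doing. The sketch below is illustrative only: the function name `sample_sd` and the scores are assumptions, and the result is compared with the standard library's `statistics.stdev`, which uses the same N - 1 denominator.

```python
import math
import statistics

def sample_sd(scores):
    """Sample standard deviation: sqrt( sum((X - M)^2) / (N - 1) )."""
    n = len(scores)
    m = sum(scores) / n                                  # the mean, M
    squared_deviations = [(x - m) ** 2 for x in scores]  # (X - M)^2 for each score
    return math.sqrt(sum(squared_deviations) / (n - 1))

# Hypothetical scores, invented for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

by_hand = sample_sd(scores)
from_library = statistics.stdev(scores)
print(f"By hand: {by_hand:.3f}, statistics.stdev: {from_library:.3f}")
```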
Variations of the Normal Curve
(larger SD = wider spread)
Interquartile Range
Quartile 1: 25th percentile
Quartile 2: 50th percentile (median)
Quartile 3: 75th percentile
Quartile 4: 100th percentile
Interquartile range = IQR =
Score at 75th percentile – Score at 25th percentile
The IQR spans the middle 50% of the scores
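A minimal sketch of finding the quartiles and the IQR in Python. The scores are hypothetical, and the "inclusive" method of `statistics.quantiles` is only one of several quartile conventions; other software may give slightly different cut points for the same data.

```python
import statistics

# Hypothetical right-skewed scores, invented for illustration
scores = [1, 2, 2, 3, 3, 4, 5, 9, 20]

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1
score_range = max(scores) - min(scores)

print(f"Q1 = {q1}, Median (Q2) = {q2}, Q3 = {q3}")
print(f"IQR = {iqr}, Range = {score_range}")
```

Here the single extreme score stretches the range but leaves the IQR unaffected, because the IQR depends only on the 25th and 75th percentiles.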
Interquartile Range on Positively (Right)
Skewed Distribution
IQR is often used for interval or ratio data that are skewed
(do not want to consider ALL the scores)
Measures of Variability
| Measure | Definition | Takes Every Value Into Account? | When to Use |
| --- | --- | --- | --- |
| Range | Highest score − lowest score | No, only based on two most extreme values | To give crude measure of spread |
| Standard Deviation | 68% of the data fall within 1 SD of the mean | Yes, but describes majority | For numerical data that are approximately symmetric or normal |
| Interquartile Range | Middle 50% of the data fall within the IQR | No, but describes most | When numerical data are skewed |
Presenting Measures of Central Tendency and
Variability in Text
The number of fruit flies observed each day ranged
from 0 to 57 (M = 25.32, SD = 5.08).
Plants exposed to moderate amounts of sunlight were
taller (M = 6.75 cm, SD = 1.32) than plants exposed
to minimal sunlight (M = 3.45 cm, SD = 0.95).
The response time to a patient’s call ranged from 0 to 8
minutes (M = 2.1, SD = 0.8).
→ Sentences should always be grammatical and
sensible. Do not just list a bunch of numbers. Use
the statistical information to supplement what you
are saying.
Presenting Measures of Central Tendency
and Variability in Tables
(Symmetric Data)
| Variable | Range | M | SD |
| --- | --- | --- | --- |
| Number of Fruit Flies | 0 to 57 | 25.32 | 5.08 |
| Weight (lbs) | 118 to 208 | 160.31 | 10.97 |
| Response time to Patient’s Call (mins) | 0 to 8 | 2.1 | 0.8 |
Be sure to include the units of measurement! You can include an
additional column to put the sample size (N)
Presenting Measures of Central Tendency and
Variability in Tables
(Skewed Data)
| Variable | Range | Median | IQR |
| --- | --- | --- | --- |
| Number of Fruit Flies | 0 to 57 | 27 | 9 |
| Weight (lbs) | 118 to 208 | 155.6 | 12 |
| Response time to Patient’s Call (mins) | 0 to 8 | 1.5 | 1 |
Be sure to include the units of measurement! You can include an
additional column to put the sample size (N)
What’s the Difference Between SD and SE?
Sometimes instead of standard deviation, people report the
standard error of the mean (SE or SEM) in text, tables, and
figures
Standard deviation (SD) = “Average” deviation of individual
scores around mean of scores
• Used to describe the spread of your (one) sample
Standard error ($SE = SD / \sqrt{N}$) = How much on average sample
means would vary if you sampled more than once from the
same population (we do not expect the particular mean we got
to be an exact reflection of the population mean)
• Used to describe the spread of all possible sample means and
used to make inferences about the population mean
→ Standard error usually looks better in figures because it is not
as large
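The relationship between SD and SE can be sketched as follows. The function name `standard_error` and the scores are assumptions made for illustration; the calculation itself is SE = SD / √N.

```python
import math
import statistics

def standard_error(scores):
    """Standard error of the mean: sample SD divided by sqrt(N)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))

# Hypothetical scores, invented for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sd = statistics.stdev(scores)
se = standard_error(scores)
print(f"N = {len(scores)}, SD = {sd:.2f}, SE = {se:.2f}")
```

Because SD is divided by √N, the SE is smaller than the SD whenever N is greater than 1, and it shrinks as the sample size grows.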
Teaching Tips
• Dangers of low N: Be sure to emphasize to
students that with a small sample size, data
may not be representative of the population at
large and they should take care in drawing
conclusions
• Dangers of Outliers: Be sure your students look
for outliers (extreme values) in their data and
discuss appropriate strategies for dealing with
them (e.g., eliminating data because the researcher
assumes it is a mistake, rather than part of the natural
variability in the population, is subjective science)
Time to Practice
Finding descriptive statistics
Teaching tips:
• Hands-on practice is important for your
students
• Sometimes working with a partner helps