Lecture 12 - Hunter College, Department of Geography

Intro to Descriptive Statistics
GTECH 201
Lecture 12
Topics for Today

Measures of Central Tendency
    Mean, Median, Mode
    Sample and Population Mean
    Weighted Means
    Selecting Appropriate Measures of Central Tendency
Measures of Dispersion
    Variance
    Standard Deviation
Descriptive vs. Inferential

Descriptive Statistics
    Methods for organizing and summarizing information
Inferential Statistics
    Methods for drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population
Looking at This Data Set…
Student Performance in Class Tests
Row   ID     Test 1   Test 2   Test 3   Test 4
1     2463   B+       A        95       10
2     4140   A        A        90       9.5
3     1210   D        F        0        0
4     O649   D+       B+       80.5     9
5     2925   B        ?        86       8.5
6     4194   A        A        86.5     9
7     4266   B+       F        90.5     8.5
8     2517   A-       A        83.5     10
Overview







Mean
Median
Mode
Sample and Population Mean
Weighted Means
Selecting Appropriate Measures of Central Tendency
Applying these measures
Mean
The mean of a set of n observations is the
arithmetic average
The mean of n observations x_1, x_2, x_3, ..., x_n is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
In Excel, =AVERAGE(insert range)
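Outside Excel, the same calculation can be sketched in a few lines of Python. The scores are the Test 3 column of the class data set; the variable names are only illustrative, and `statistics.mean` from the standard library is used here just to confirm the hand calculation.

```python
from statistics import mean

# Test 3 scores from the class data set (one score per student)
test3 = [95, 90, 0, 80.5, 86, 86.5, 90.5, 83.5]

# Mean = sum of the observations divided by the number of observations
n = len(test3)
x_bar = sum(test3) / n

print(x_bar)        # 76.5
print(mean(test3))  # 76.5, same result from the library function
```

Note how the single score of 0 pulls the mean well below most of the individual scores.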
Median



The data value that is exactly in the
middle of an ordered list if the number of
pieces of data is odd
The mean of the two middle pieces of
data in an ordered list if the number of
pieces of data is even
The median is a typical value; it is the
midpoint of observations when they are
arranged in an ascending or descending
order
Mode




The most frequent data value; i.e., any
value having the highest frequency among
the observations
In Excel, use the functions
=MEDIAN(insert range)
=MODE(insert range)
Unimodal, Bimodal, Multimodal data sets
Outliers
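A minimal Python sketch of the median and mode, assuming the Test 4 column of the class data set as input. `median` and `multimode` come from Python's standard `statistics` module (`multimode` needs Python 3.8 or later); the variable names are illustrative.

```python
from statistics import median, multimode

# Test 4 scores from the class data set
test4 = [10, 9.5, 0, 9, 8.5, 9, 8.5, 10]

# Even number of observations: the median is the mean of the two
# middle values of the ordered list (9 and 9 here)
mid = median(test4)

# multimode returns every value tied for the highest frequency
modes = multimode(test4)

print(mid)    # 9.0
print(modes)  # [10, 9, 8.5]
```

Three values are tied at a frequency of two, so this column is multimodal, and the score of 0 is an outlier that barely moves the median.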
Sample and Population Means

Mean of a data set

Population mean if data set includes entire
population
$$\mu = \frac{\sum X_i}{N}$$

Sample mean if data set is only a sample
of the population
$$\bar{x} = \frac{\sum x_i}{n}$$
Weighted Means

To calculate the mean when your
information is available only in the
form of summary data
Class Interval   Freq
25 – 29.9        4
30 – 34.9        5
35 – 39.9        12
$$\bar{x} = \frac{\sum x_j f_j}{n}$$
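A minimal Python sketch of the grouped-data calculation, using the three class intervals and frequencies from the table. One assumption is labeled in the code: each class is represented by the midpoint of its stated limits, which the slide does not spell out. If the class boundaries were read as 25 to 30, 30 to 35 and so on, the midpoints (and the result) would shift slightly.

```python
# Grouped data from the table: (lower limit, upper limit, frequency)
classes = [(25, 29.9, 4), (30, 34.9, 5), (35, 39.9, 12)]

# Assumption: each class is represented by the midpoint of its stated limits
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# n is the total number of observations, i.e. the sum of the frequencies
n = sum(freqs)

# Weighted mean: each midpoint counts as many times as its frequency
weighted_mean = sum(x * f for x, f in zip(midpoints, freqs)) / n

print(n)                        # 21
print(round(weighted_mean, 2))  # 34.35
```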
Skewed Distributions

When there is one mode and the distribution is symmetric
    mean, median, mode are the same
Positive skew
    mean moves towards the positive tail
    median also pulls towards the positive tail
Negative skew
    mean moves towards the negative tail
    median also moves towards the negative tail
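As a rough illustration (not a formal test of skewness), the three measures can be compared in Python. The data are the twelve observations used on the Calculating Percentiles slide; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Twelve observations from the Calculating Percentiles slide
data = [2, 4, 7, 11, 11, 11, 11, 14, 16, 16, 24, 29]

m = mean(data)      # 156 / 12 = 13
med = median(data)  # mean of the two middle values, 11 and 11
mo = mode(data)     # 11 occurs four times

# The large values 24 and 29 pull the mean above the median and mode,
# which is the pattern described for a positive skew
if m > med:
    print("mean > median: pulled towards the positive tail")
elif m < med:
    print("mean < median: pulled towards the negative tail")
else:
    print("mean = median: consistent with a symmetric distribution")
```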
Selecting Appropriate Measures

Mean
    affected by extreme values
    includes all observations, therefore comprehensive (useful for interval/ratio data)
Median
    not affected by the number of observations
    reveals typical situations (used for ordinal data)
Mode
    useful for nominal variables
Other Useful Calculations

In addition to the sum of data, $\sum x$, we need to be able to calculate:
$$\sum x^2 ; \quad \sum (x - \bar{x}) ; \quad \sum (x - \bar{x})^2$$
$$\sum x^2 \neq \left(\sum x\right)^2$$
$$\sum xy \neq \sum x \sum y$$
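These sums are easy to confuse, so a small Python sketch may help. It uses the five basketball heights from the worked example later in the lecture; the variable names are illustrative. In particular it shows that the sum of the squares and the square of the sum are different numbers.

```python
# Heights (inches) from the basketball example later in the lecture
heights = [67, 72, 76, 76, 84]
n = len(heights)

sum_x = sum(heights)                     # sum of the data: 375
x_bar = sum_x / n                        # mean: 75.0

sum_x_sq = sum(x ** 2 for x in heights)  # sum of the squares: 28281
sq_sum_x = sum_x ** 2                    # square of the sum: 140625

sum_dev = sum(x - x_bar for x in heights)            # deviations sum to 0.0
sum_dev_sq = sum((x - x_bar) ** 2 for x in heights)  # squared deviations: 156.0

print(sum_x_sq, sq_sum_x)   # 28281 140625
print(sum_dev, sum_dev_sq)  # 0.0 156.0
```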
Variability or Spread



The mean and the median have limits: neither describes how spread out the data are
Range – coarse measure of variability
Percentiles
    The kth percentile is the point at which k percent of the numbers fall below it and the rest fall above it
    25th percentile (lower quartile)
    50th percentile (median)
    75th percentile (upper quartile)
    Interquartile range (difference between the 75th percentile value and the 25th percentile value)
Describing the Spread

A five number summary
    Median
    Quartiles
    Extremes
Variance and Standard Deviation
    Measures spread about the mean
    Standard deviation cannot be discussed without the mean
Calculating Percentiles
In the list of twelve observations
2 4 7 11 11 11 11 14 16 16 24 29
Compute the median and the 25th and 75th percentiles
$$\text{Median} = \frac{11 + 11}{2} = 11$$
The lower quartile is the median of the 6 observations that fall below the median
$$\frac{7 + 11}{2} = 9$$
The upper quartile is the median of the 6 observations that fall above the median
$$\frac{16 + 16}{2} = 16$$
Five Number Summary






Median = 11
Lower Quartile = 9
Upper Quartile = 16
Extremes are 2 and 29
Can compute the range = 27
In a symmetric distribution, the lower
and upper quartiles are equally distant
from the median
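The quartile calculation and the five number summary can be sketched in Python as follows. The helper `five_number_summary` is a hypothetical name, not a library function; it takes the quartiles as the medians of the lower and upper halves of the ordered list, as in the lecture. For an odd number of observations it leaves the middle value out of both halves, which is an assumption, since the lecture only shows an even-sized list. Spreadsheet and library percentile functions often interpolate instead and may return slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    half = len(data) // 2
    lower = data[:half]   # observations below the median position
    upper = data[-half:]  # observations above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

obs = [2, 4, 7, 11, 11, 11, 11, 14, 16, 16, 24, 29]
lo, q1, med, q3, hi = five_number_summary(obs)

print(lo, q1, med, q3, hi)
print("range =", hi - lo)  # range = 27
print("IQR =", q3 - q1)    # IQR = 7.0
```

Here the lower quartile is 2 below the median and the upper quartile is 5 above it, so the two quartiles are not equally distant from the median.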
Variance


Is the mean of the squares of the deviations of the observations from their mean
Population variance
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$
Sample variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Example
The heights, in inches, of the five starting players on a men’s college basketball team are:
67  72  76  76  84
Compute the mean and standard deviation.
$$\bar{x} = \frac{\sum x}{n} = \frac{375}{5} = 75$$

        x      x - x̄    (x - x̄)^2
        67     -8       64
        72     -3       9
        76     1        1
        76     1        1
        84     9        81
Sum     375    0        156
Standard Deviation


Standard deviation is the positive square root of the variance
Variance in our basketball example:
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{156}{4} = 39$$
Formulas – Standard Deviation
Standard deviation of a sample
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
Standard deviation of a population
$$\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$$
Example (Continued)
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
$$s = \sqrt{39}$$
$$s \approx 6.24$$
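The basketball calculation can be reproduced step by step in Python. This is a sketch with illustrative variable names; `variance`, `stdev` and `pstdev` are standard-library functions from the `statistics` module and are included only to compare against the hand calculation.

```python
from math import sqrt
from statistics import pstdev, stdev, variance

heights = [67, 72, 76, 76, 84]
n = len(heights)
x_bar = sum(heights) / n  # 75.0

# Sum of the squared deviations from the mean
ss = sum((x - x_bar) ** 2 for x in heights)  # 156.0

s2 = ss / (n - 1)  # sample variance, divides by n - 1
s = sqrt(s2)       # sample standard deviation

print(s2, round(s, 2))             # 39.0 6.24
print(variance(heights))           # same sample variance from the library
print(round(stdev(heights), 2))    # 6.24
print(round(pstdev(heights), 2))   # 5.59, population formula divides by N
```

Treating the five players as a whole population rather than a sample gives the smaller value, because the sum of squares is divided by N = 5 instead of n - 1 = 4.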
Short Cut – Simpler Formula
Standard deviation of a sample
$$s = \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$$
$\sum x^2$: sum of the squares of data values, i.e., you square each data value and then sum those squared values
$\left(\sum x\right)^2$: square of the sum of data values, i.e., you sum all the data values and then square that sum
Example (using the short cut)
        x      x^2
        67     4489
        72     5184
        76     5776
        76     5776
        84     7056
Sum     375    28281

$$\left(\sum x\right)^2 = (375)^2 = 140625$$
$$s = \sqrt{\frac{5(28281) - (375)^2}{5(4)}}$$
$$s = \sqrt{\frac{780}{20}}$$
$$s = \sqrt{39}$$
$$s \approx 6.24$$
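A short Python sketch comparing the shortcut formula with the definitional one on the same five heights. The variable names are illustrative.

```python
from math import sqrt

heights = [67, 72, 76, 76, 84]
n = len(heights)

sum_x = sum(heights)                    # 375
sum_x_sq = sum(x * x for x in heights)  # 28281

# Shortcut: n * (sum of squares) - (square of the sum), over n(n - 1)
numerator = n * sum_x_sq - sum_x ** 2   # 5 * 28281 - 140625 = 780
s_shortcut = sqrt(numerator / (n * (n - 1)))

# Definitional formula, for comparison
x_bar = sum_x / n
s_definition = sqrt(sum((x - x_bar) ** 2 for x in heights) / (n - 1))

print(numerator)             # 780
print(round(s_shortcut, 2))  # 6.24
print(round(s_definition, 2))
```

The shortcut avoids computing each deviation by hand. In floating-point software, though, subtracting two large, nearly equal numbers can lose precision when the values are large relative to their spread, so the definitional form is usually the safer choice in code.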
Interpreting Std. Deviation


s and s^2 will be small when all the data are close together
The deviations from the mean
    Will be both positive and negative
    Sum will always be 0
s is always 0 or a positive number
s = 0 means no spread; as s value increases, the spread of the data increases
The units of s are the same as the original observations
s is heavily influenced by outliers
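Two of these points can be checked with a small Python sketch: the deviations from the mean sum to zero, and a single outlier inflates s. The data are the Test 3 scores from the class data set, with and without the score of 0; the variable names are illustrative.

```python
from statistics import mean, stdev

# Test 3 scores from the class data set
test3 = [95, 90, 0, 80.5, 86, 86.5, 90.5, 83.5]

# Deviations from the mean are both positive and negative and sum to 0
# (with other data, floating-point rounding may leave a tiny nonzero remainder)
x_bar = mean(test3)
deviations = [x - x_bar for x in test3]
print(sum(deviations))  # 0.0

# Dropping the one outlier (the score of 0) shrinks s considerably
without_outlier = [x for x in test3 if x != 0]
s_all = stdev(test3)
s_trimmed = stdev(without_outlier)
print(s_all > s_trimmed)  # True
```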
Coefficient of Variation
CV is the standard deviation described
as a percent of the mean
$$CV = \frac{s}{\bar{x}} \times 100$$
CV is useful when comparing different sets
of data where sample size and standard
deviation are different
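A minimal Python sketch of the CV, applied to the basketball heights. The function name `coefficient_of_variation` is hypothetical (it is not a standard-library function), and the sketch assumes the sample standard deviation and a nonzero mean.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

heights = [67, 72, 76, 76, 84]
cv = coefficient_of_variation(heights)

print(round(cv, 2))  # 8.33
```

Because the CV is a percentage with no units, it can be compared across data sets measured on different scales.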