Descriptive statistics: mean, median, standard deviation, variance

Descriptive Statistics
In this section we’ll cover descriptive statistics: measures of central tendency,
measures of the variation or dispersion of the values around the central tendency, and
the shape of the distribution. Descriptive statistics are numbers used to summarize and
describe data.
Central Tendency. The term central tendency refers to the center or middle value of a
distribution. Measures of central tendency include the mean (or average), the median,
and the mode.

Mean. The mean is calculated by dividing the sum of all scores by the number of
observations. The mean is the most commonly used statistic for describing
distributions but can be overly influenced by extremely high or low values. The
formula for calculating the mean is

\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i

where n is the number of observations in the data set, x_1 is the value for the
first observation, x_2 is the value for the second observation, and so on.
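As a minimal Python sketch of the formula, the code below sums the observations and divides by n. The function name `mean` and both data lists are hypothetical illustrations, not part of the original; the second list replaces one value with an extreme one to show how strongly a single outlier can pull the mean.

```python
def mean(values):
    """Arithmetic mean: the sum of all observations divided by n."""
    if not values:
        raise ValueError("mean() needs at least one observation")
    return sum(values) / len(values)


# Hypothetical data, used only to illustrate sensitivity to extreme values.
typical = [50, 52, 55, 58, 60]
with_outlier = [50, 52, 55, 58, 1000]  # last value replaced by an extreme one

print(mean(typical))       # 55.0
print(mean(with_outlier))  # 243.0
```

Changing one of five values moves the mean from 55 to 243, even though the other four observations are unchanged.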

Median. The central tendency of income data, for example, is more appropriately
described by the median than by the mean, because a small number of very high
incomes pulls the mean upward. The median is the middle of the distribution. If
there is an odd number of observations, the median is the center value: half of the
remaining observations are below it and half are above it. If there is an even
number of observations, the median is the average of the two middle values. State
income data are frequently reported in terms of median values, as in the following table.
State                   2008 Median Income
Maryland                $70,545
New Jersey              $70,378
Connecticut             $68,595
Alaska                  $68,460
Hawaii                  $67,214
Massachusetts           $65,401
New Hampshire           $63,731
Virginia                $61,233
California              $61,021
Washington              $58,078
Delaware                $57,989
District of Columbia    $57,936
Minnesota               $57,288
Colorado                $56,993
Utah                    $56,633
Nevada                  $56,361
Illinois                $56,235
New York                $56,033
Rhode Island            $55,701
Wyoming                 $53,207
Vermont                 $52,104
Wisconsin               $52,094
Arizona                 $50,958
Georgia                 $50,861
Pennsylvania            $50,713
Kansas                  $50,177
Oregon                  $50,169
Texas                   $50,043
Nebraska                $49,693
Iowa                    $48,980
In the following list of the 25 richest Americans in 2009, as reported in Forbes
Magazine on September 30, 2009, the median is the 13th of the 25 ranked values:
$14,500 million, the net worth of Michael Dell, the founder of Dell Computers.
Name                      Net Worth ($mil)   Source
William Gates III         50,000             Microsoft
Warren Buffett            40,000             Berkshire Hathaway
Lawrence Ellison          27,000             Oracle
Christy Walton & family   21,500             Wal-Mart
Jim C. Walton             19,600             Wal-Mart
Alice Walton              19,300             Wal-Mart
S. Robson Walton          19,000             Wal-Mart
Michael Bloomberg         17,500             Bloomberg
Charles Koch              16,000             manufacturing, energy
David Koch                16,000             manufacturing, energy
Sergey Brin               15,300             Google
Larry Page                15,300             Google
Michael Dell              14,500             Dell
Steven Ballmer            13,300             Microsoft
George Soros              13,000             hedge funds
Donald Bren               12,000             real estate
Paul Allen                11,500             Microsoft, investments
Abigail Johnson           11,500             Fidelity
Forrest Edward Mars       11,000             candy, pet food
John Mars                 11,000             candy, pet food
Jacqueline Mars           11,000             candy, pet food
Carl Icahn                10,500             leveraged buyouts
Ronald Perelman           10,000             leveraged buyouts
George B. Kaiser          9,500              oil & gas, banking
Philip Knight             9,500              Nike
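A short Python sketch of the odd/even median rule, applied to the net worth figures (in $ millions) from the Forbes list. The helper name `median` is illustrative; Python's standard library also provides `statistics.median`, which follows the same rule. The mean is printed alongside to show how the largest fortunes pull it above the median.

```python
import statistics


def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    if not values:
        raise ValueError("median() needs at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:  # odd n: a single center value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values


# Net worth in $ millions for the 25 people in the Forbes 2009 list.
net_worth = [50000, 40000, 27000, 21500, 19600, 19300, 19000, 17500,
             16000, 16000, 15300, 15300, 14500, 13300, 13000, 12000,
             11500, 11500, 11000, 11000, 11000, 10500, 10000, 9500, 9500]

print(median(net_worth))                # 14500
print(statistics.median(net_worth))     # 14500
print(sum(net_worth) / len(net_worth))  # 16992.0
print(median([1, 3, 5, 7]))             # 4.0 (even n)
```

The mean ($16,992 million) sits well above the median ($14,500 million) because a few very large values dominate the sum, which is why the median is preferred for income and wealth data.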
Mode. The mode is the value that occurs most frequently in a list of numbers.
For example, among the top forty home run hitters in the American League for the
2009 season, listed below, the mode is 25: six players hit exactly 25 home runs,
more than hit any other total. Note that a distribution can have more than one
mode (bimodal, trimodal, etc. distributions).
American League Player      2009 Home Runs
Carlos Pena                 39
Mark Teixeira               39
Jason Bay                   36
Aaron Hill                  36
Adam Lind                   35
Miguel Cabrera              34
Kendry Morales              34
Nelson Cruz                 33
Evan Longoria               33
Michael Cuddyer             32
Russell Branyan             31
Ian Kinsler                 31
Alex Rodriguez              30
Justin Morneau              30
Curtis Granderson           30
Nick Swisher                29
Paul Konerko                28
David Ortiz                 28
Hideki Matsui               28
Joe Mauer                   28
Jason Kubel                 28
Jermaine Dye                27
Brandon Inge                27
Kevin Youkilis              27
Ben Zobrist                 27
Jack Cust                   25
Juan Rivera                 25
Hank Blalock                25
Jose Lopez                  25
Robinson Cano               25
Luke Scott                  25
Johnny Damon                24
J.D. Drew                   24
Jim Thome                   23
Victor Martinez             23
Miguel Olivo                23
Jorge Posada                22
Torii Hunter                22
Michael Young               22
Carlos Quentin              21
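The mode can be found by counting how often each value occurs. The Python sketch below (the helper name `modes` is illustrative) returns every value tied for the highest count, so a bimodal or trimodal list yields more than one result; the standard library's `statistics.multimode` (Python 3.8+) behaves similarly. The list holds the 40 home run totals from the table above, and the mean and median are printed as a cross-check of the other measures of central tendency.

```python
import statistics
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency (more than one => multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


# 2009 American League home run totals for the 40 players listed above.
home_runs = [39, 39, 36, 36, 35, 34, 34, 33, 33, 32, 31, 31, 30, 30, 30, 29,
             28, 28, 28, 28, 28, 27, 27, 27, 27, 25, 25, 25, 25, 25, 25, 24,
             24, 23, 23, 23, 22, 22, 22, 21]

print(modes(home_runs))                 # [25]
print(statistics.multimode(home_runs))  # [25]
print(modes([1, 1, 2, 2, 3]))           # [1, 2] (a bimodal list)
print(statistics.mean(home_runs))       # 28.35
print(statistics.median(home_runs))     # 28.0
```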
Please look at the DescriptiveStatisticsCentralTendency video clip to learn how to
calculate these measures of central tendency using Excel.
Dispersion. Dispersion refers to the variation or spread of the values around the central
tendency in a distribution. The most commonly cited measures of dispersion are the
range, the variance, and the standard deviation.

Range. The range is simply the difference between the highest and lowest values
in a distribution. For example, in the home run data above, the highest number of
home runs is 39 and the lowest is 21, so the range is 39 – 21 = 18.
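A one-line Python sketch of the range, checked against the home run totals listed above. The function is named `value_range` (a hypothetical name) to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Range: the highest value minus the lowest value."""
    if not values:
        raise ValueError("value_range() needs at least one observation")
    return max(values) - min(values)


# 2009 American League home run totals for the 40 players listed above.
home_runs = [39, 39, 36, 36, 35, 34, 34, 33, 33, 32, 31, 31, 30, 30, 30, 29,
             28, 28, 28, 28, 28, 27, 27, 27, 27, 25, 25, 25, 25, 25, 25, 24,
             24, 23, 23, 23, 22, 22, 22, 21]

print(max(home_runs), min(home_runs), value_range(home_runs))  # 39 21 18
```

Because it uses only the two most extreme values, the range is easy to compute but ignores how the other 38 observations are spread.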

Variance. The variance of a sample indicates how the observations are spread
around the mean. Variance is calculated with the following formula:

s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

where x_i is the value of the variable for a particular observation, \bar{x} is the mean of
the distribution, and n is the number of observations in the sample. The variance is
used in a number of statistical calculations that we’ll go over later in this course,
but by itself it says little about the distribution in intuitive terms, partly because it
is expressed in squared units (for example, home runs squared). A more useful statistic
related to the dispersion around the mean is a simple extension of the variance, the
standard deviation.
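A Python sketch of the sample variance, applied to the 40 home run totals listed earlier. It assumes the sample form with n − 1 in the denominator, the usual choice when the data are a sample; dividing by n instead gives the population variance, which the standard library exposes as `statistics.pvariance` (the sample version, used here as a cross-check, is `statistics.variance`). The helper name `sample_variance` is illustrative.

```python
import statistics


def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# 2009 American League home run totals for the 40 players listed above.
home_runs = [39, 39, 36, 36, 35, 34, 34, 33, 33, 32, 31, 31, 30, 30, 30, 29,
             28, 28, 28, 28, 28, 27, 27, 27, 27, 25, 25, 25, 25, 25, 25, 24,
             24, 23, 23, 23, 22, 22, 22, 21]

s2 = sample_variance(home_runs)
print(round(s2, 2))                              # 23.21
print(round(statistics.variance(home_runs), 2))  # 23.21
```

Here the mean is 28.35 and the squared deviations sum to 905.1, so s² = 905.1 / 39, about 23.21 "home runs squared".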
Standard Deviation. The standard deviation s is the square root of the variance
s^2. The formula for the standard deviation is:

s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
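The square-root step can be sketched in Python as follows, using a small hypothetical data set whose mean is 5 and whose squared deviations sum to 32. The helper name `sample_std_dev` is illustrative; `statistics.stdev` (sample, n − 1) and `statistics.pstdev` (population, n) are standard-library functions shown for comparison.

```python
import math
import statistics


def sample_std_dev(values):
    """s = square root of the sample variance (n - 1 in the denominator)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two observations")
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


# Small hypothetical data set: mean 5, sum of squared deviations 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(round(sample_std_dev(data), 3))    # 2.138
print(round(statistics.stdev(data), 3))  # 2.138
print(statistics.pstdev(data))           # 2.0 (population form, divides by n)
```

Unlike the variance, the result is in the same units as the data, which is what makes it easier to interpret.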
The standard deviation is particularly useful because it is expressed in the same
units as the data. As depicted below, in a normal distribution approximately 68% of
the observations fall within one standard deviation of the mean; that is,
between one standard deviation below the mean and one standard deviation above
the mean. Approximately 95% of observations lie within ±2 standard deviations of
the mean, and about 99.7% (almost all) lie within ±3 standard deviations of the
mean.
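The 68/95/99.7 percentages follow from the normal curve itself. The Python sketch below computes them exactly with `math.erf` (the share of a normal distribution within k standard deviations of the mean is erf(k/√2)), then cross-checks on simulated data. The helper name `share_within`, the seed, and the simulated mean of 100 and SD of 15 are arbitrary assumptions for illustration.

```python
import math
import random


def share_within(k):
    """Exact share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7

# Cross-check on simulated data: draw normal values and count the share within k SDs.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = [rng.gauss(100, 15) for _ in range(100_000)]  # hypothetical mean 100, SD 15
m = sum(sample) / len(sample)
s = math.sqrt(sum((x - m) ** 2 for x in sample) / (len(sample) - 1))
for k in (1, 2, 3):
    share = sum(abs(x - m) <= k * s for x in sample) / len(sample)
    print(k, round(100 * share, 1))  # close to the exact values above
```

The simulated shares vary slightly from run to run with a different seed, which is why the text says "approximately".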
Please look at the StandardDeviation video clip to learn how to calculate the
variance and standard deviation using Excel.
Shape. The shape of a curve (distribution of scores) may be symmetrical or
asymmetrical. If it is symmetrical, the two halves of the curve are approximately
mirror images of each other.
Normal curve. The best-known symmetrical distribution is the bell-shaped normal
curve, presented below.
Skewed. If the numbers tend to cluster at the lower end of the scale and are
fewer at the higher end, the distribution is referred to as right skewed
(positively skewed), because its long tail extends to the right. On the other
hand, if the numbers tend to cluster at the higher end and are fewer at the lower
end, the distribution is referred to as left skewed (negatively skewed). Skewed
distributions are graphed below.
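The original gives no formula for skewness. The sketch below assumes one common definition, the moment coefficient g1 (third central moment divided by the cube of the population standard deviation); Excel's SKEW applies an extra small-sample adjustment, so its values differ slightly. A positive value indicates right skew and a negative value left skew. The Forbes net worth data, with a few very large fortunes forming a long right tail, give a positive coefficient. The helper name `skewness` is illustrative.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5, using population moments (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Net worth in $ millions for the 25 people in the Forbes 2009 list.
net_worth = [50000, 40000, 27000, 21500, 19600, 19300, 19000, 17500,
             16000, 16000, 15300, 15300, 14500, 13300, 13000, 12000,
             11500, 11500, 11000, 11000, 11000, 10500, 10000, 9500, 9500]

print(skewness(net_worth) > 0)                # True: right (positive) skew
print(skewness([-x for x in net_worth]) < 0)  # True: the mirror image is left skewed
print(skewness([1, 2, 3, 4, 5]))              # 0.0: symmetric
```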
Kurtosis. A final property of shape is how peaked or flat a distribution is
compared with the normal curve. This property is known as kurtosis and is graphed below.
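The original also gives no formula for kurtosis. A common choice, assumed here, is excess kurtosis: the fourth central moment divided by the squared population variance, minus 3, so that a normal curve scores about 0. Positive values indicate a sharper peak with heavier tails than the normal curve, negative values a flatter shape. Excel's KURT uses a small-sample-adjusted version. Both data sets and the helper name `excess_kurtosis` are hypothetical.

```python
def excess_kurtosis(values):
    """g2 = m4 / m2**2 - 3, using population moments; about 0 for a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]           # evenly spread, no peak
peaked = [-10] + [0] * 8 + [10]  # tall center with a few far-out values

print(round(excess_kurtosis(flat), 2))    # -1.3 (flatter than normal)
print(round(excess_kurtosis(peaked), 2))  # 2.0 (more peaked, heavier tails)
```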