Descriptive Statistics - Tamalpais Union High School District

Descriptive
Statistics
Ernesto Diaz
Faculty – Mathematics
Redwood High School
Basic Concepts
In statistics, a population includes all of
the items of interest, and a sample
includes some of the items in the
population.
The study of statistics can be divided into
two main areas. Descriptive statistics
has to do with collecting, organizing,
summarizing, and presenting data
(information). Inferential statistics has
to do with drawing inferences or
conclusions about populations based on
information from samples.
Basic Concepts
Information that has been collected but
not yet organized or processed is called
raw data. It is often quantitative (or
numerical), but can also be qualitative
(or nonnumerical).
Basic Concepts
Quantitative data: The number of siblings
in ten different families: 3, 1, 2, 1, 5, 4, 3,
3, 8, 2
Qualitative data: The makes of five
different automobiles: Toyota, Ford,
Nissan, Chevrolet, Honda
Quantitative data can be sorted in
mathematical order. The number of siblings
can appear as 1, 1, 2, 2, 3, 3, 3, 4, 5, 8.
Measures of Central Tendency




Mean
Median
Mode
Symmetry in Data Sets
© 2008 Pearson Addison-Wesley. All rights reserved
Mean
The mean (more properly called the
arithmetic mean) of a set of data items is
found by adding up all the items and then
dividing the sum by the number of items.
(The mean is what most people associate
with the word “average.”)
The mean of a sample is denoted \(\bar{x}\)
(read “x bar”), while the mean of a complete
population is denoted \(\mu\) (the lowercase
Greek letter mu).
Mean
The mean of n data items \(x_1, x_2, \ldots, x_n\)
is given by the formula
\[ \bar{x} = \frac{\sum x}{n}. \]
We use the symbol \(\Sigma\) (the Greek letter sigma)
for “summation”:
\[ \sum x = x_1 + x_2 + \cdots + x_n. \]
Example: Find the Mean

Find the mean amount of money
parents spent on new school supplies
and clothes if 5 parents randomly
surveyed replied as follows:
$327, $465, $672, $150, $230
\[ \bar{x} = \frac{\sum x}{n} = \frac{\$327 + \$465 + \$672 + \$150 + \$230}{5} = \frac{\$1844}{5} = \$368.80 \]
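The arithmetic in this example can be checked with a short Python sketch. The variable names are illustrative only; the amounts are the five survey replies from the slide.

```python
# Amounts (in dollars) reported by the five surveyed parents.
amounts = [327, 465, 672, 150, 230]

# Mean = (sum of the items) / (number of items).
mean = sum(amounts) / len(amounts)

print(f"Sum = ${sum(amounts)}, n = {len(amounts)}, mean = ${mean:.2f}")
# Sum = $1844, n = 5, mean = $368.80
```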
Weighted Mean
The weighted mean of n numbers
\(x_1, x_2, \ldots, x_n\) that are weighted by the
respective factors \(f_1, f_2, \ldots, f_n\) is given
by the formula
\[ \bar{w} = \frac{\sum (x \cdot f)}{\sum f}. \]
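As a rough illustration of the weighted mean formula, the sketch below multiplies each value by its weight, sums the products, and divides by the sum of the weights. The function name and the grade-point data are hypothetical and do not come from the slides.

```python
def weighted_mean(values, weights):
    """Return sum(x * f) / sum(f) for paired values x and weights f."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * f for x, f in zip(values, weights)) / total_weight

# Hypothetical data (not from the slides): grade points weighted by credit hours.
grade_points = [4, 3, 2]
credit_hours = [3, 4, 3]
print(weighted_mean(grade_points, credit_hours))  # (12 + 12 + 6) / 10 = 3.0
```

When every weight is 1, this reduces to the ordinary arithmetic mean.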
Median
Another measure of central tendency,
which is not so sensitive to extreme
values, is the median. This measure
divides a group of numbers into two
parts, with half the numbers below
the median and half above it.
Median
To find the median of a group of items:
Step 1  Rank the items.
Step 2  If the number of items is odd, the
        median is the middle item in the list.
Step 3  If the number of items is even, the
        median is the mean of the two middle
        numbers.
Example: Median
Ten students in a math class were polled as
to the number of siblings in their individual
families and the results were: 3, 2, 2, 1, 1,
6, 3, 3, 4, 2.
Find the median number of siblings for the
ten students.
Solution
In order: 1, 1, 2, 2, 2, 3, 3, 3, 4, 6
Median = (2+3)/2 = 2.5
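The three-step median procedure can be sketched in Python as follows. The function is an illustrative implementation written for this example, not part of the original slides; Python's standard library also offers `statistics.median`.

```python
def median(items):
    """Median by ranking the items and taking the middle (or the mean of the two middles)."""
    if not items:
        raise ValueError("median of an empty list is undefined")
    ranked = sorted(items)           # Step 1: rank the items
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:                   # Step 2: odd count, take the middle item
        return ranked[middle]
    # Step 3: even count, take the mean of the two middle numbers
    return (ranked[middle - 1] + ranked[middle]) / 2

siblings = [3, 2, 2, 1, 1, 6, 3, 3, 4, 2]
print(sorted(siblings))  # [1, 1, 2, 2, 2, 3, 3, 3, 4, 6]
print(median(siblings))  # 2.5
```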
Mode
The mode of a data set is the value
that occurs the most often.
Sometimes, a distribution is bimodal
(literally, “two modes”). In a large
distribution, this term is commonly applied
even when the two modes do not have
exactly the same frequency.
Example: Mode for a Set
Ten students in a math class were polled as
to the number of siblings in their individual
families and the results were: 3, 2, 2, 1, 3,
6, 3, 3, 4, 2.
Find the mode for the number of siblings.
Solution
3, 2, 2, 1, 3, 6, 3, 3, 4, 2
The mode for the number of siblings
is 3.
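One way to count occurrences is with `collections.Counter` from the Python standard library. The helper below is an illustrative sketch; it returns a list so that a bimodal data set yields both modes.

```python
from collections import Counter

def modes(items):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(items)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

siblings = [3, 2, 2, 1, 3, 6, 3, 3, 4, 2]
print(Counter(siblings)[3])  # 4 (the value 3 occurs four times)
print(modes(siblings))       # [3]
```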
Example: Mode for Distribution
Find the mode for the distribution.
Value       1   2   3   4   5
Frequency   4   3   2   6   8
Solution
The mode is 5 since it has the
highest frequency (8).
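For a frequency distribution, the mode is the value whose frequency is largest. A minimal sketch, assuming the table is stored as a Python dictionary mapping each value to its frequency:

```python
# Frequency distribution from the example: value -> frequency.
frequency = {1: 4, 2: 3, 3: 2, 4: 6, 5: 8}

# The mode is the key with the largest frequency.
mode = max(frequency, key=frequency.get)
print(mode, frequency[mode])  # 5 8
```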
Measures of Position


Measures of position are often used
to make comparisons.
Two measures of position are
percentiles and quartiles.
To Find the Quartiles of a Set of Data


Order the data from smallest to
largest.
Find the median, or 2nd quartile, of
the set of data. If there are an odd
number of pieces of data, the
median is the middle value. If there
are an even number of pieces of
data, the median will be halfway
between the two middle pieces of
data.
To Find the Quartiles of a Set of Data
continued


The first quartile, Q1, is the median
of the lower half of the data; that is,
Q1 is the median of the data less
than Q2.
The third quartile, Q3, is the median
of the upper half of the data; that is,
Q3 is the median of the data greater
than Q2.
Example: Quartiles

The weekly grocery bills for 23
families are as follows. Determine
Q1, Q2, and Q3.
170  330  225   75   95  210   80  225
160  172  270  170  215  130  190  270
240  310   74  280  270   50   81
Example: Quartiles continued

Order the data:
 50   74   75   80   81   95  130  160
170  170  172  190  210  215  225  225
240  270  270  270  280  310  330
Q2 is the median of the entire data set which is
190.
Q1 is the median of the numbers from 50 to 172
which is 95.
Q3 is the median of the numbers from 210 to
330 which is 270.
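The quartile procedure can be sketched in Python as below. This is an illustrative implementation of the median-of-halves method described in these slides, assuming the middle item is excluded from both halves when the number of items is odd. Other conventions, such as the default method of `statistics.quantiles`, can give slightly different quartiles.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                       # items below the middle position
    # For odd n, skip the middle item; for even n, the upper half starts at `half`.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

bills = [170, 330, 225, 75, 95, 210, 80, 225, 160, 172, 270, 170,
         215, 130, 190, 270, 240, 310, 74, 280, 270, 50, 81]
print(quartiles(bills))  # (95, 190, 270)
```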
Measures of Dispersion
Measures of Dispersion
Sometimes we want to look at a
measure of dispersion, or spread, of
data. Two of the most common
measures of dispersion are the range
and the standard deviation.
Measures of Dispersion


Measures of dispersion are used to
indicate the spread of the data.
The range is the difference between
the highest and lowest values; it
indicates the total spread of the
data.
Example: Range


Nine different employees were
selected and their salaries were
recorded. Find the range
of the salaries.
$24,000  $32,000  $26,500  $56,000  $48,000
$27,000  $28,500  $34,500  $56,750
Range = $56,750 – $24,000 = $32,750
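The range calculation amounts to subtracting the lowest value from the highest. A short Python sketch, with illustrative variable names:

```python
salaries = [24_000, 32_000, 26_500, 56_000, 48_000,
            27_000, 28_500, 34_500, 56_750]

# Range = highest value - lowest value.
salary_range = max(salaries) - min(salaries)
print(f"Range = ${max(salaries):,} - ${min(salaries):,} = ${salary_range:,}")
# Range = $56,750 - $24,000 = $32,750
```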
Standard Deviation
One of the most useful measures of
dispersion, the standard deviation,
is based on deviations from the
mean of the data.
Example: Deviations from the Mean
Find the deviations from the mean for all data
values of the sample 1, 2, 8, 11, 13.
Solution
The mean is 7. Subtract to find deviation.
Data Value    1    2    8   11   13
Deviation    –6   –5    1    4    6
For example, 13 – 7 = 6.
The sum of the deviations for a set is always 0.
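The deviations and their zero sum can be checked with a few lines of Python. This is a small sketch using the sample from the example. With non-integer data, floating-point rounding may leave a sum that is only very close to zero.

```python
sample = [1, 2, 8, 11, 13]

mean = sum(sample) / len(sample)          # 35 / 5 = 7.0
deviations = [x - mean for x in sample]   # each data value minus the mean

print(deviations)       # [-6.0, -5.0, 1.0, 4.0, 6.0]
print(sum(deviations))  # 0.0
```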
Standard Deviation
The variance is found by summing the squares of
the deviations and dividing that sum by n – 1
(since it is a sample instead of a population). The
square root of the variance gives a kind of average
of the deviations from the mean, which is called a
sample standard deviation. It is denoted by the
letter s. (The standard deviation of a population
is denoted by \(\sigma\), the lowercase Greek letter sigma.)
Standard Deviation

The standard deviation measures
how much the data differ from the
mean.
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
Calculation of Standard Deviation
The individual steps involved in this calculation
are as follows:
Step 1  Calculate the mean of the numbers.
Step 2  Find the deviations from the mean.
Step 3  Square each deviation.
Step 4  Sum the squared deviations.
Step 5  Divide the sum in Step 4 by n – 1.
Step 6  Take the square root of the quotient
        in Step 5.
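The six steps map directly onto a short Python function. The sketch below is illustrative only, and the function name is an assumption made for this example. It is applied to the sample 1, 2, 8, 11, 13, which has five values, so the divisor n – 1 is 4.

```python
from math import sqrt

def sample_standard_deviation(data):
    """Sample standard deviation, following the six steps one at a time."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two data values")
    mean = sum(data) / n                      # Step 1: mean of the numbers
    deviations = [x - mean for x in data]     # Step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]    # Step 3: square each deviation
    total = sum(squared)                      # Step 4: sum the squared deviations
    variance = total / (n - 1)                # Step 5: divide by n - 1
    return sqrt(variance)                     # Step 6: square root of the quotient

sample = [1, 2, 8, 11, 13]
# Squared deviations sum to 114; 114 / 4 = 28.5; sqrt(28.5) is about 5.34.
print(round(sample_standard_deviation(sample), 2))  # 5.34
```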
Example
Find the standard deviation of the sample
1, 2, 8, 11, 13.
Solution
The mean is 7.
Data Value   Deviation   (Deviation)²
     1          –6            36
     2          –5            25
     8           1             1
    11           4            16
    13           6            36
Sum = 36 + 25 + 1 + 16 + 36 = 114
Example
Solution (continued)
Divide by n – 1 with n = 5 (the sample has five values):
\[ \frac{114}{5 - 1} = \frac{114}{4} = 28.5. \]
Take the square root:
\[ \sqrt{28.5} \approx 5.34. \]
Example

Find the standard deviation of the
following prices of selected washing
machines:
$280, $217, $665, $684, $939, $299
Find the mean.
\[ \bar{x} = \frac{\sum x}{n} = \frac{665 + 217 + 684 + 280 + 939 + 299}{6} = \frac{3084}{6} = 514 \]
Example continued, mean = 514
Data    Data – Mean    (Data – Mean)²
217        –297        (–297)² = 88,209
280        –234        54,756
299        –215        46,225
665         151        22,801
684         170        28,900
939         425        180,625
Sum           0        421,516
Example continued, mean = 514


\[ s = \sqrt{\frac{421{,}516}{6 - 1}} = \sqrt{\frac{421{,}516}{5}} \approx 290.35 \]
The standard deviation is $290.35.
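As a cross-check, Python's standard `statistics` module provides `mean` and `stdev`, where `stdev` is the sample standard deviation with the n – 1 divisor. A brief sketch using the washing machine prices:

```python
from statistics import mean, stdev

prices = [280, 217, 665, 684, 939, 299]

# Sum of squared deviations from the mean, as in the table.
sum_of_squares = sum((p - mean(prices)) ** 2 for p in prices)

print(f"mean = {mean(prices):.2f}")             # mean = 514.00
print(f"sum of squares = {sum_of_squares:.0f}")  # sum of squares = 421516
print(f"s = {stdev(prices):.2f}")               # s = 290.35
```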
Interpreting Measures of Dispersion
A main use of dispersion is to
compare the amounts of spread in
two (or more) data sets. A common
technique in inferential statistics is
to draw comparisons between
populations by analyzing samples
that come from those populations.
Example: Interpreting Measures
Two companies, A and B, sell small packs of
sugar for coffee. The mean and standard
deviation for samples from each company
are given below. Which company
consistently provides more sugar in its
packs? Which company fills its packs more
consistently?
Company A: \(\bar{x}_A = 1.013\) tsp, \(s_A = .0021\) tsp
Company B: \(\bar{x}_B = 1.007\) tsp, \(s_B = .0018\) tsp
Example: Interpreting Measures
Solution
We infer that Company A most likely
provides more sugar than Company B
(greater mean).
We also infer that Company B is more
consistent than Company A (smaller
standard deviation).
Symmetry in Data Sets
The most useful way to analyze a data
set often depends on whether the
distribution is symmetric or
non-symmetric. In a “symmetric”
distribution, as we move out from a
central point, the pattern of
frequencies is the same (or nearly so)
to the left and right. In a
“non-symmetric” distribution, the patterns
to the left and right are different.
Some Symmetric Distributions
Non-symmetric Distributions
A non-symmetric distribution with
a tail extending out to the left,
shaped like a J, is called skewed
to the left. If the tail extends out
to the right, the distribution is
skewed to the right.
Some Non-symmetric Distributions
Chebyshev’s Theorem
For any set of numbers, regardless of
how they are distributed, the fraction
of them that lie within k standard
deviations of their mean (where k > 1)
is at least
\[ 1 - \frac{1}{k^2}. \]
Example: Chebyshev’s Theorem
What is the minimum percentage of the items in a
data set which lie within 3 standard deviations of
the mean?
Solution
With k = 3, we calculate
\[ 1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9} \approx .889 = 88.9\%. \]
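The bound from Chebyshev's theorem is easy to tabulate for different values of k. The function below is an illustrative sketch, and its name is an assumption made for this example.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of items within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is stated here for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%}")
# k = 2: at least 75.0%
# k = 3: at least 88.9%
```

The bound holds for any distribution, so it is often much lower than the fraction actually observed in a particular data set.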