Numerical Descriptions of the Data

Sections 2.1 & 2.2
Cathy Poliak, Ph.D.
cathy@math.uh.edu
Department of Mathematics
University of Houston
January 21, 2016
Outline
1. Introduction
2. Range
3. Variance and Standard Deviation
4. Calculating the Standard Deviation
5. Coefficient of Variation
6. Percentiles
7. Quartiles
8. The 1.5 × IQR Rule
Different Test Scores?
The following table shows exam scores for two different sections, from a
random sample of 10 students selected from each section.
Section A   65   66   67   68   71   73   74   77   77   77
Section B   42   54   58   62   67   77   77   85   93  100
Types of Measurements for Spread
The simplest useful numerical description of a distribution consists of
both a measure of center and a measure of spread.
Range = largest value - smallest value
Variance
Standard deviation
Coefficient of variation
Percentiles
Quartiles
IQR (interquartile range)
Range
The range is the difference between the largest value and the
smallest value in your dataset.
Range of Section A = 77 − 65 = 12
Range of Section B = 100 − 42 = 58
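The two range calculations above can be reproduced with a short script. This is a minimal sketch in Python for illustration only (the course itself uses R); the helper name `data_range` is hypothetical, and the scores are the ones from the Section A / Section B table.

```python
# Exam scores for the two sections, copied from the table of test scores
section_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
section_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]

def data_range(values):
    """Hypothetical helper: range = largest value - smallest value."""
    return max(values) - min(values)

print("Range of Section A:", data_range(section_a))  # 12
print("Range of Section B:", data_range(section_b))  # 58
```

Because the range uses only the two most extreme values, a single unusual score can change it a lot.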
Measuring Spread: The Standard Deviation
Measures spread by looking at how far the observations are from
their mean.
Most common numerical description for the spread of a
distribution.
A larger standard deviation implies that the values have a wider
spread from the mean.
Denoted s when used with a sample. This is the one we calculate
from a list of values.
Denoted σ when used with a population. This is the "idealized"
standard deviation.
The standard deviation has the same units of measurement as
the original observations.
Definition of the Standard Deviation
Informally, the standard deviation is roughly the average distance of
the observations from the mean.
Using this list of values from a sample: 3, 3, 9, 15, 15
The mean is 9.
Four of the values lie 6 units from the mean and one lies exactly at
the mean. Computed with the sample formula given later, the standard
deviation is s = 6.
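The "average distance" description is an approximation, and a quick numerical check makes the difference concrete. In this Python sketch (Python is an illustrative choice, not the course software), the literal average distance from the mean, known as the mean absolute deviation, comes out to 4.8, while the sample standard deviation, which averages squared distances with an n − 1 divisor and then takes a square root, is exactly 6.

```python
import math

values = [3, 3, 9, 15, 15]
n = len(values)
mean = sum(values) / n  # 9.0

# Literal average distance from the mean (mean absolute deviation):
# (6 + 6 + 0 + 6 + 6) / 5 = 4.8
mad = sum(abs(x - mean) for x in values) / n

# Sample standard deviation: squared deviations summed, divided by n - 1,
# then square-rooted: sqrt(144 / 4) = 6.0
s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

print(mean, mad, s)
```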
Values of the Standard Deviation
The standard deviation is a value that is greater than or equal to
zero.
It is equal to zero only when all of the observations have the same
value.
Using the definition of the standard deviation, determine s for the
following lists of values.
2, 2, 2, 2: standard deviation = 0
125, 125, 125, 125, 125: standard deviation = 0
Adding a Value to or Subtracting a Value from the Observations
Adding the same value to (or subtracting it from) every observation
does not change the standard deviation of the list.
Using this list of values: 3, 3, 9, 15, 15; mean = 9, standard
deviation = 6.
If we add 4 to all the values: 7, 7, 13, 19, 19;
mean = 13, standard deviation = 6.
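The shift example can be checked with Python's standard-library `statistics` module (used here only for illustration; `statistics.stdev` is the sample standard deviation with an n − 1 divisor). Adding 4 moves every value, and therefore the mean, by 4, but leaves every distance from the mean unchanged.

```python
import statistics

values = [3, 3, 9, 15, 15]
shifted = [x + 4 for x in values]  # 7, 7, 13, 19, 19

# The mean moves from 9 to 13 ...
print(statistics.mean(values), statistics.mean(shifted))
# ... but the sample standard deviation stays at 6
print(statistics.stdev(values), statistics.stdev(shifted))
```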
Multiplying or Dividing the Observations by a Value
Multiplying (or dividing) every observation by the same value
multiplies (or divides) the standard deviation by the absolute value of
that factor.
Using this list of values: 3, 3, 9, 15, 15; mean = 9, standard
deviation = 6.
If we double all the values: 6, 6, 18, 30, 30;
mean = 18, standard deviation = 12.
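A small Python sketch (illustrative only) confirms the scaling rule, including the case of a negative factor: since a standard deviation is never negative, multiplying by −2 scales it by |−2| = 2. The variance, being a squared quantity, scales by the square of the factor.

```python
import statistics

values = [3, 3, 9, 15, 15]
doubled = [2 * x for x in values]   # 6, 6, 18, 30, 30
negated = [-2 * x for x in values]  # a negative factor, for comparison

print(statistics.stdev(values))      # 6.0
print(statistics.stdev(doubled))     # 12.0, multiplied by 2
print(statistics.stdev(negated))     # 12.0, multiplied by |-2| = 2
print(statistics.variance(doubled))  # 144, i.e. 36 multiplied by 2 squared
```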
Population Variance and Standard Deviation
If N is the number of values in a population with mean μ, and x_i
represents each individual in the population, then the population
variance is found by:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
and the population standard deviation is the square root, $\sigma = \sqrt{\sigma^2}$.
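The population formula translates directly into code. This is a minimal Python sketch under one stated assumption: the small list is treated as if it were an entire population, purely for illustration. The helper `population_variance` is hypothetical; its result can be cross-checked against the standard library's `statistics.pvariance` and `statistics.pstdev`, which divide by N.

```python
import math
import statistics

def population_variance(values):
    """Hypothetical helper: sigma^2 = sum((x_i - mu)^2) / N."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N

# Assumption for illustration: treat this list as a whole population
population = [3, 3, 9, 15, 15]
pop_var = population_variance(population)  # 144 / 5 = 28.8
sigma = math.sqrt(pop_var)                  # about 5.37

print(pop_var, sigma)
print(statistics.pvariance(population), statistics.pstdev(population))
```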
Sample Variance and Standard Deviation
Most of the time we are working with a sample instead of a population.
So the sample variance is found by:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
and the sample standard deviation is the square root, $s = \sqrt{s^2}$,
where n is the number of observations in the sample (the sample size),
x_i is the value of the ith observation, and x̄ is the sample mean.
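The sample formula differs from the population one only in its divisor, n − 1 instead of N. A minimal Python sketch (the helper name `sample_variance` is hypothetical) computes it from the definition and compares the result with `statistics.variance` and `statistics.stdev`, which also use n − 1.

```python
import math
import statistics

def sample_variance(values):
    """Hypothetical helper: s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

sample = [3, 3, 9, 15, 15]
s2 = sample_variance(sample)  # 144 / 4 = 36.0
s = math.sqrt(s2)             # 6.0

print(s2, s)
# The standard-library functions should agree with the hand-coded formula
print(statistics.variance(sample), statistics.stdev(sample))
```

With the same five values, dividing by n − 1 = 4 gives a larger variance (36) than dividing by N = 5 (28.8).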
Calculating the Standard Deviation By Hand
When calculating by hand, the following equivalent formulas are easier to use:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$
$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$$
Then the standard deviation is the square root: $\sigma = \sqrt{\sigma^2}$ or $s = \sqrt{s^2}$.
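The shortcut formulas give the same values as the definitions. The Python sketch below (function names are hypothetical) applies both shortcuts to the list 3, 3, 9, 15, 15, where the sum of squares is 549 and the mean is 9.

```python
def sample_variance_shortcut(values):
    """Hypothetical helper: s^2 = (sum(x_i^2) - n * xbar^2) / (n - 1)."""
    n = len(values)
    xbar = sum(values) / n
    return (sum(x * x for x in values) - n * xbar ** 2) / (n - 1)

def population_variance_shortcut(values):
    """Hypothetical helper: sigma^2 = (1/N) * sum(x_i^2) - mu^2."""
    N = len(values)
    mu = sum(values) / N
    return sum(x * x for x in values) / N - mu ** 2

values = [3, 3, 9, 15, 15]
print(sample_variance_shortcut(values))      # (549 - 405) / 4 = 36.0
print(population_variance_shortcut(values))  # 109.8 - 81, about 28.8
```

The shortcut is convenient by hand, but in floating-point arithmetic it subtracts two large, nearly equal numbers when the values are large relative to their spread. For that reason, software generally computes the variance from the deviations x_i − x̄ instead.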
Example: Section A
Determine the sample standard deviation of the following test scores.
Section A   65   66   67   68   71   73   74   77   77   77
Sample Standard Deviation of Section A test scores
The sample standard deviation is s ≈ 4.77 points.
This means that, for the sample of 10 students from Section A, the
test scores lie roughly 4.77 points from the mean of 71.50 points on
average.
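Running the same calculation on both sections shows why a measure of center alone is not enough. In this Python sketch (illustrative; `statistics.stdev` uses the n − 1 divisor, as in the sample formula), both sections have a mean of 71.50. Section B's sample standard deviation, however, is about 18.22 points, compared with 4.77 for Section A.

```python
import statistics

section_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
section_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]

for name, scores in [("A", section_a), ("B", section_b)]:
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)  # sample standard deviation
    print(f"Section {name}: mean = {mean:.2f}, s = {s:.2f}")
```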
In Class Work Example
A statistics teacher wants to decide whether or not to curve an exam.
From her class of 300 students, she chose a sample of 10 students,
whose grades were:
72, 88, 85, 81, 60, 54, 70, 72, 63, 43
Determine the sample mean.
What is the variance?
What is the standard deviation?
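After working the exercise by hand, you can check your answers with a short script. This Python sketch is an illustrative stand-in for the course software; `statistics.variance` and `statistics.stdev` both use the n − 1 divisor, so they match the sample formulas.

```python
import statistics

grades = [72, 88, 85, 81, 60, 54, 70, 72, 63, 43]

xbar = statistics.mean(grades)    # sample mean
s2 = statistics.variance(grades)  # sample variance (divisor n - 1)
s = statistics.stdev(grades)      # sample standard deviation
print(f"mean = {xbar:.2f}, variance = {s2:.2f}, sd = {s:.2f}")
```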
Add 10
Suppose the statistics instructor decides to curve the grades by adding
10 points to each score. What are the new mean, variance, and standard
deviation?
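One way to check the curve is to recompute all three statistics before and after adding 10. The Python sketch below is illustrative, with `curved` as a hypothetical name for the shifted list. It should show the mean rising by exactly 10 while the variance and standard deviation stay the same.

```python
import statistics

grades = [72, 88, 85, 81, 60, 54, 70, 72, 63, 43]
curved = [g + 10 for g in grades]  # add 10 points to every score

for label, data in [("original", grades), ("curved", curved)]:
    print(label,
          round(statistics.mean(data), 2),
          round(statistics.variance(data), 2),
          round(statistics.stdev(data), 2))
```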
Multiply by 2
For the following dataset, the mean is x̄ = 4.5, the variance is s² = 3.5,
and the standard deviation is s ≈ 1.870829.
3, 6, 2, 7, 4, 5
Now multiply each value by 2. What are the new variance and the new
standard deviation?
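A Python sketch (illustrative only) for checking the doubling exercise. Multiplying every value by 2 should multiply the variance by 2² = 4 and the standard deviation by 2.

```python
import statistics

data = [3, 6, 2, 7, 4, 5]
doubled = [2 * x for x in data]  # 6, 12, 4, 14, 8, 10

print(statistics.variance(data), statistics.stdev(data))
print(statistics.variance(doubled), statistics.stdev(doubled))
```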
Calculating Standard Deviation
For larger data sets use a calculator or computer software.
Each calculator is different; if you cannot determine how to
compute the standard deviation on your calculator, ask your
instructor.
For this course we will be using R as the software.
The function for the sample standard deviation in R is
sd(data name$variable name).
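For readers working outside R, Python's standard-library `statistics.stdev` computes the same quantity as R's `sd()`: the sample standard deviation, with divisor n − 1. In this sketch, the dict `mydata` and its column `x` are hypothetical stand-ins for `data name$variable name`. The data are the six values from the doubling exercise. `statistics.pstdev`, by contrast, divides by N and is the population version.

```python
import statistics

# Hypothetical data set stored as named columns, loosely mirroring an R data frame
mydata = {"x": [3, 6, 2, 7, 4, 5]}

# Counterpart of R's sd(mydata$x): sample standard deviation, divisor n - 1
s = statistics.stdev(mydata["x"])
# Population version (divisor N) gives a smaller value for the same data
sigma = statistics.pstdev(mydata["x"])
print(round(s, 6), round(sigma, 6))
```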