A statistic is a characteristic or measure obtained by using the data

Parameters and statistics
DEFINITION:
The population is the entire group of objects or individuals under study, about which
information is wanted.
A unit is an individual object or person in the population. The units are often called
subjects if the population consists of people.
A sample is a part of the population that is actually used to get information.
A variable is a characteristic of interest to be measured for each unit in the sample.
The size of the population is denoted by the capital letter N.
The size of the sample is denoted by the small letter n.
[Figure: a population of N = 16 units, with one unit labeled, and a sample of
n = 4 units drawn from it.]
DEFINITION:
A parameter is a numerical value that would be calculated using all of the values of the
units in the population.
A statistic is a numerical value that is calculated using all of the values of the units in a
sample.
Tip: One way to remember this distinction is this: The letter p is for population and
parameter, while the letter s is for statistic and sample.
Let's Do It! 1 (1min.) Parameter or Statistic?
According to the Campus Housing Fact Sheet at a Big-Ten
University, 60% of the students living in campus housing are in-state
residents. In a sample of 200 students living in campus housing,
56.5% were found to be in-state residents. Circle your answer.
(a) In this particular situation, the value of 60% is a (parameter, statistic).
(b) In this particular situation, the value of 56.5% is a (parameter, statistic).
DEFINITIONS:
A unit is the item or object we observe. When the object is a person, we refer to the unit
as a subject.
An observation is the information or characteristic recorded for each unit.
A characteristic that can vary from unit to unit is called a variable.
A collection of observations on one or more variables is called a data set.
A discrete variable can only take on a finite (or countable) number of
possible values. For example, the number of correct answers on a
five-question, multiple-choice test is a discrete variable; its possible
values are 0, 1, 2, 3, 4, 5.
A continuous variable can take on any value in an interval (or collection of
intervals). For example, the amount of water poured into a 50-mL glass
container can be any value between 0 mL and 50 mL.
DEFINITIONS:
Qualitative variables are those which classify the units into
categories. The categories may or may not have a natural ordering
to them. Qualitative variables are also called categorical variables.
Quantitative variables have numerical values that are
measurements (length, weight, and so on) or counts (of how
many). Arithmetic operations on such numerical values do have
meaning. We further distinguish quantitative variables based on
whether or not the values fall on a continuum. A discrete variable
is one for which you can count the number of possible values. A
continuous variable can take on any value within a given interval.
[Diagram: a variable is either qualitative (examples: type of religion, zip
code) or quantitative; a quantitative variable is either continuous (example:
length) or discrete (example: number of children).]
Let's Do It! 2 What Type of Variable?
Hurricane Charley, in August 2004, has been blamed for at least 16
deaths. Listed below is information on other major storms and
hurricanes that occurred from 1994 to 2003.
Storm Name               Date     Category   Estimated Damage/Cost*   Deaths
Tropical Storm Alberto   Jul-94   n/a        $1.2 billion             32
Hurricane Marilyn        Sep-95   2          $2.5 billion             13
Hurricane Opal           Oct-95   3          $3.6 billion             27
Hurricane Fran           Sep-96   3          $5.8 billion             37
Hurricane Bonnie         Aug-98   3          $1.1 billion             3
Hurricane Georges        Sep-98   2          $6.5 billion             16
Hurricane Floyd          Sep-99   2          $6.5 billion             77
Tropical Storm Allison   Jun-01   n/a        $5.1 billion             43
Hurricane Isabel         Sep-03   2          $4.0 billion             47
For each variable, determine whether it is qualitative or quantitative. If
the variable is quantitative, state whether it is discrete or continuous.
(a) The name of the storm.
(b) The date the storm occurred.
(c) The category of the storm.
(d) The estimated amount of damage or cost of the storm.
(e) The number of deaths that occurred.
Think About It!
A number of packages are brought to a mailing center. The packages
are weighed and the results are recorded as 9 pounds, 5 pounds, 4
pounds, 12 pounds, 20 pounds, and so on. These values are all
whole numbers. Does this imply that the variable “weight” is discrete?
The variable “weight” is continuous. We have just measured weight to
the nearest pound. A package having a value for weight of 12 pounds
could actually weigh 12.2 pounds, or 11.9975 pounds, or any value in
the interval from 11.5 to 12.5.
Key Point: Don’t let the appearance of the data after they are recorded be
misleading as to their type.
Consider again the variable “weight.” Packages weighing under 5
pounds are classified as light and cost a fixed amount to ship.
Packages weighing over 20 pounds are classified as heavy and cost
a fixed amount to ship. Packages weighing between 5 and 20 pounds
are classified as medium and cost a fixed amount to ship. We record
the variable “weight,” which takes on the values light, medium, or
heavy. Now the variable “weight” is qualitative.
Key Point: The type of variable depends mainly on the measuring
process, not on the property being measured. It is important to ask
many questions about the data and how they were obtained, as
discussed in the next section.
Central tendency of a set
DATA SET 1
Suppose you had to give a single number that
would represent the most typical age for the 20
subjects.
What number would you choose?
Measures of center are numerical values that
tend to report in some sense the middle of a
set of data -- we will focus on the mean and
the median.
If the data are a sample, the mean and median
would be called statistics. If the data form an
entire population then these measures of
center would be called parameters.
Subject #   Gender   Age
1001        M        45
1002        M        41
1003        F        51
1004        F        46
1005        F        47
1006        F        42
1007        M        43
1008        F        50
1009        M        39
1010        M        32
1011        M        41
1012        F        44
1013        F        47
1014        F        49
1015        M        45
1016        F        42
1017        M        41
1018        F        40
1019        M        45
1020        M        37
Mean
DEFINITION:
The mean of a set of n observations is simply the sum of the
observations divided by the number of observations, n.
Mean age of the 20 subjects in the medical study: add the 20 ages up and
divide by 20:
\[ \frac{45 + 41 + 51 + 46 + 47 + \cdots + 45 + 37}{20} = 43.35 \text{ years} \]
Special notation:
If \(x_1, x_2, \ldots, x_n\) denote a sample of n observations,
then the mean of the sample is called "x-bar" and is denoted by:
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
If you have all of the population values ... the mean of the population
= add all of the values up and divide by how many there are.
The mean of a population is denoted by the Greek letter μ.
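As a quick check of the arithmetic, the sample mean of the 20 ages in DATA SET 1
can be computed in a few lines of Python. This is only a sketch; the names
`ages` and `mean` are chosen here for illustration.

```python
# Ages of the 20 subjects from DATA SET 1, in subject order (1001 to 1020).
ages = [45, 41, 51, 46, 47, 42, 43, 50, 39, 32,
        41, 44, 47, 49, 45, 42, 41, 40, 45, 37]

def mean(values):
    """Sum of the observations divided by the number of observations, n."""
    return sum(values) / len(values)

x_bar = mean(ages)
print(f"n = {len(ages)}, sum = {sum(ages)}, mean = {x_bar:.2f} years")
```

If the 20 subjects were the entire population rather than a sample, the same
calculation would give the parameter μ instead of the statistic x-bar.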
Example: Mean Number of Children per Household
Problem
Suppose that the number of children in a simple random sample of 10
households is as follows: 2, 3, 0, 2, 1, 0, 3, 0, 1, 4
(a) Calculate the sample mean number of children per household.
(b) Interpret your answer.
(c) Suppose that the observation for the last household in the above list
was incorrectly recorded as 40 instead of 4. What would happen to the mean?
Solution
(a) The sample mean number of children per household is given by:
\[ \bar{x} = \frac{2 + 3 + 0 + 2 + 1 + 0 + 3 + 0 + 1 + 4}{10} = \frac{16}{10} = 1.6 \]
(b) We expect about 1.6 children per household, on average. We
report 1.6 even though it is not possible to have 1.6 children in
any one given household; that is, the 1.6 is not rounded up to
say 2. We are reporting a value that we would expect on
average, over many samples of 10 households.
(c) The sample mean would now be given by:
\[ \bar{x} = \frac{2 + 3 + 0 + 2 + 1 + 0 + 3 + 0 + 1 + 40}{10} = \frac{52}{10} = 5.2 \]
Note that 9 of the 10 observations are less than the mean. The
mean is sensitive to extreme observations. Most graphical
displays would have detected this outlying observation.
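The effect of the miscoded value can be reproduced with a short Python sketch.
The variable names are illustrative only; the data are the 10 household counts
from the example.

```python
children = [2, 3, 0, 2, 1, 0, 3, 0, 1, 4]
# Same list with the last household recorded as 40 instead of 4.
miscoded = children[:-1] + [40]

mean_ok = sum(children) / len(children)
mean_bad = sum(miscoded) / len(miscoded)

# Count how many observations fall below the distorted mean.
below = sum(1 for x in miscoded if x < mean_bad)
print(mean_ok, mean_bad, below)
```

A single extreme observation moves the mean from 1.6 to 5.2, so that 9 of the
10 observations end up below it.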
Think About It! Is the Mean Always the Center? Be Careful!
Suppose a sample of size n=10 observations is obtained.
Can the mean, \(\bar{x}\), be larger than the maximum value or less than the
minimum value? If yes, give an example.
Can the mean, \(\bar{x}\), be the minimum value? Give an example.
Can the mean, \(\bar{x}\), be the maximum value? If yes, give an example.
Can the mean, \(\bar{x}\), be exactly the midpoint between the minimum and
maximum value (when the minimum does not equal the maximum)?
If yes, give an example.
Can the mean, \(\bar{x}\), be exactly the second smallest value (out of the
10, not all equal observations, when they are ordered from smallest
to largest)? If yes, give an example.
Can the mean, \(\bar{x}\), be not equal to any value in the sample? If yes,
give an example.
Let's Do it! 3 (1min.)
A Mean Is Not Always Representative
Kim's biology test scores are 7, 98, 25, 19, and 26.
Calculate Kim's mean test score. Explain why the mean does not do
a very good job at summarizing Kim's test scores.
Let's Do It! 4 (2 min.)
Combining Means
We have seven students. The mean score for three of these students
is 54 and the mean score for the four other students is 76.
What is the mean score for all seven students?
The mean = the point of equilibrium, the point where the
distribution would balance.
If the distribution is symmetric, as in the first picture,
the mean would be exactly at the center of the distribution.
As the largest observation is moved further to the right, making this
observation somewhat extreme, the mean shifts towards the
extreme observation.
[Figure: three dot plots balanced at their means. With the largest observation
at 3, the mean is 2; with it moved to 5, the mean is 2.5; with it moved to 11,
the mean is 4.]
If a distribution appears to be skewed, we may wish also to
report a more resistant measure of center.
The Mean of Grouped Data / Frequency Tables
The procedure for finding the mean for grouped data uses the
midpoints of the classes.
This procedure is shown next.
Example
The data represent the number of miles
run during one week for a sample of 20
runners.
Solution
The procedure for finding the mean for
grouped data is given here.
Step 1 Make a table as
shown.
Step 2 Find the midpoints
of each class and enter them
in column C.
Step 3 For each class, multiply the frequency by the midpoint, as
shown, and place the product in column D.
1 · 8 = 8, 2 · 13 = 26, etc.
The completed table is shown here.
Step 4 Find the sum of column D.
Step 5 Divide the sum by n to get the mean.
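The five steps above can be sketched in Python as follows. The function name
`grouped_mean` and the small frequency table are made up for illustration; they
are not the runners' table from the example.

```python
def grouped_mean(classes):
    """Mean for grouped data.

    classes: list of (lower_limit, upper_limit, frequency) tuples.
    """
    n = sum(f for _, _, f in classes)           # total number of observations
    total = 0.0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2          # Step 2: class midpoint
        total += f * midpoint                   # Steps 3 and 4: sum of f * midpoint
    return total / n                            # Step 5: divide the sum by n

# Hypothetical frequency table, invented for illustration only.
example = [(0, 10, 2), (10, 20, 6), (20, 30, 2)]
print(grouped_mean(example))   # midpoints 5, 15, 25 -> (10 + 90 + 50) / 10
```

Because every observation in a class is replaced by the class midpoint, the
result is an approximation to the mean of the raw data.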
Let's Do It! 5:
Eighty randomly selected light bulbs were tested to determine their
lifetime in hours. The frequency table of the results is shown in table.
Find the average lifetime of a light bulb.
Life interval in hours   Frequency
53-63                    6
64-74                    12
75-85                    25
86-96                    18
97-107                   14
108-118                  5
Let's Do It! 6
The cost per load (in cents) of 35 laundry detergents tested by a
consumer organization is given below.
Class limit   Frequency
13-19         2
20-26         7
27-33         12
34-40         5
41-47         6
48-54         1
55-61         0
62-68         2
A measure of center that is more resistant to extreme values is the
median.
Median
DEFINITION:
The median of a set of n observations, ordered from smallest to
largest, is a value such that half of the observations are less than or
equal to that value and half the observations are greater than or
equal to that value.
If the number of observations is odd, the median is the middle
observation.
If the number of observations is even, the median is any number
between the two middle observations, including either of the two
middle observations.
To be consistent, we will define the median as the mean or average
of the two middle observations.
Location of the median: (n+1)/2, where n is the number of
observations.
The ages of the n = 20 subjects...
Calculating (n+1)/2 we get (20+1)/2 = 10.5. So the two middle
observations are the 10th and 11th observations, namely 43 and 44.
The median is the mean of these two middle observations,
(43+44)/2=43.5 years.
32 37 39 40 41 41 41 42 42 43 | 44 45 45 45 46 47 47 49 50 51
(10th obs = 43, 11th obs = 44; median = 43.5)
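The location rule (n+1)/2 can be written out as a small Python function. This
is a sketch with illustrative names; it follows the convention stated above of
averaging the two middle observations when n is even.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) / 2                  # e.g. 10.5 when n = 20
    if n % 2 == 1:
        # Odd n: the location is a whole number; take that observation.
        return ordered[int(location) - 1]
    # Even n: average the observations on either side of the location.
    lower = ordered[int(location) - 1]      # 10th observation when n = 20
    upper = ordered[int(location)]          # 11th observation when n = 20
    return (lower + upper) / 2

ages = [45, 41, 51, 46, 47, 42, 43, 50, 39, 32,
        41, 44, 47, 49, 45, 42, 41, 40, 45, 37]
print(median(ages))
```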
Let's Do It! 7 (2 min.)
Median Number of Children per Household
Find the median number of children in a household from this sample
of 10 households, that is, find the median of

Observation Number:   1   2   3   4   5   6   7   8   9   10
Number of Children:   2   3   0   1   4   0   3   0   1   2

(a) Order the observations from smallest to largest:
(b) Median = ______________
(c) What happens to the median if the fifth observation in the first
list was incorrectly recorded as 40 instead of 4?
(d) What happens to the median if the third observation in the
first list was incorrectly recorded as -20 instead of 0?
Note:
The median is resistant—that is, it does not change,
or changes very little, in response to extreme
observations.
Another Measure—The Mode
DEFINITION:
The mode of a set of observations is the most frequently occurring
value; it is the value having the highest frequency among the
observations.
The mode of the values: {0, 0, 0, 0, 1, 1, 2, 2, 3, 4} is 0.
For {0, 0, 0, 1, 1, 2, 2, 2, 3, 4} there are two modes, 0 and 2 (bimodal).
What would be the mode for { 0, 1, 2, 4, 5, 8 } ?
For {0, 0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4, 5 } ?
The mode is not often used as a measure of center for quantitative
data.
The mode can be computed for qualitative data.
The modal race category is “white.” If the categories were coded
as:
1=White, 2=Asian, 3=African-American, 4=Hispanic, 5=American
Indian, 6=No category listed,
then the mode would be the value 1.
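Finding the mode amounts to counting how often each value occurs. The sketch
below uses a hypothetical helper `modes` that returns every value tied for the
highest frequency, so that a bimodal set reports both modes.

```python
from collections import Counter

def modes(values):
    """Return all values that share the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([0, 0, 0, 0, 1, 1, 2, 2, 3, 4]))   # one mode
print(modes([0, 0, 0, 1, 1, 2, 2, 2, 3, 4]))   # two modes (bimodal)
```

The same counting works for coded categories, which is why the mode can be
reported for qualitative data even though a mean or median cannot.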
[Figure: bar chart of percent (vertical axis, 0 to 80) by race category:
American Indian, No Category, Hispanic, African-American, Asian, White.]
Example
Different Measures Can Give Different Impressions
Problem : The famous trio—the mean, the median, and the mode—
represent three different methods for finding a so-called “center”
value. These three values may be the same but are more likely going
to be different. When they are different, they can lead to different
interpretations of the data being summarized.
Consider the annual incomes of five families in a neighborhood:
$12,000
$12,000
$30,000
$90,000
$100,000
(a) Calculate the average income.
(b) Calculate the median income.
(c) Calculate the modal income.
(d) If you were trying to promote that this is an affluent
neighborhood, which measure might you prefer to present?
(e) If you were trying to argue against a tax increase, which
measure might you prefer to present?
(f) If you want to represent these values with the income that is in
the middle, which measure might you prefer to present?
The mean income is:
\[ \bar{x} = \frac{100{,}000 + 90{,}000 + 30{,}000 + 12{,}000 + 12{,}000}{5} = \$48{,}800 \]
The median income is: $30,000
The modal income is: $12,000
If you were trying to promote that this is an affluent neighborhood,
you might prefer to report the mean income.
If you were trying to argue against a tax increase, you might argue
that income is too low to afford a tax increase and report the mode.
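The three measures for the five incomes can be checked with Python's standard
`statistics` module. This is just one convenient way to verify the figures
above.

```python
import statistics

incomes = [12_000, 12_000, 30_000, 90_000, 100_000]

mean_income = statistics.mean(incomes)       # sum divided by n
median_income = statistics.median(incomes)   # middle of the ordered values
modal_income = statistics.mode(incomes)      # most frequently occurring value

print(mean_income, median_income, modal_income)
```

The same five numbers yield $48,800, $30,000, or $12,000 depending on which
"center" is reported, which is the point of the example.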
Effect of the Shape of the Distribution on the Mean, Median, Mode
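[Figure: four distribution shapes with the mean, median, and mode marked.
Bell-shaped, symmetric: mean = median = mode. Bimodal: mean = median, two
modes. Skewed right and skewed left: the mode, median, and mean are marked
separately. In each panel the median splits the area 50%/50%.]
HW 1 page 33: 1, 4, 5, 8, 12, 15, 19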
MEASURING VARIATION OR SPREAD
Both sets of data have the same mean, median and mode but the
values obviously differ in another respect -- the variation or spread of
the values.
The values in List 1 are much more tightly clustered around the
center value of 60. The values in List 2 are much more dispersed or
spread out.
List 1: 55, 56, 57, 58, 59, 60, 60, 60, 61, 62, 63, 64, 65
mean = median = mode = 60
[Figure: dot plot of List 1 on a scale from 35 to 85; the values cluster
between 55 and 65.]
List 2: 35, 40, 45, 50, 55, 60, 60, 60, 65, 70, 75, 80, 85
mean = median = mode = 60
[Figure: dot plot of List 2 on the same scale; the values spread from 35 to
85.]
Range
The range is the simplest measure of variability or spread.
Range is just the difference between the largest value and the
smallest value.
Range can give a distorted picture of the actual pattern of variation.
Two distributions: same range but different patterns of variation.
The first distribution has most of its values far from the center, while
the second distribution has most of its values closer to the center.
[Figure: two dot plots on a scale from 20 to 30 with the same range. In the
first, most values are stacked toward the ends of the scale; in the second,
most values are stacked near the center.]
Interquartile Range
The interquartile range measures the spread of the middle 50% of the
data. You first find the median (represented by Q2—the value that
divides the data into two halves), and then find the median for each
half. The three values that divide the data into four parts are called
the quartiles, represented by Q1, Q2, and Q3.
The difference
between the third quartile and the first quartile is called the
interquartile range, denoted by IQR=Q3-Q1.
Finding the Quartiles
1. Find the median of all of the observations.
2. First Quartile = Q1 = median of observations that fall below the
median.
3. Third Quartile = Q3 = median of observations that fall above the
median.
Notes
• When the number of observations is odd, the middle observation is
the median. This observation is not included in either of the two
halves when computing Q1 and Q3.
• Although different books, calculators, and computers may use
slightly different ways to compute the quartiles, they are all based
on the same idea.
• In a left-skewed distribution, the first quartile will be farther from the
median than the third quartile is. If the distribution is symmetric, the
quartiles should be the same distance from the median.
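The three-step procedure can be sketched in Python. The helper names are
illustrative, and the halves are taken by position in the ordered list, with
the middle observation left out when n is odd, as described in the notes.
Software packages may use other conventions and give slightly different
quartiles.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-each-half method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # observations below the median
    upper = ordered[half + (n % 2):]      # skips the middle value when n is odd
    return median(lower), median(ordered), median(upper)

ages = [45, 41, 51, 46, 47, 42, 43, 50, 39, 32,
        41, 44, 47, 49, 45, 42, 41, 40, 45, 37]
q1, q2, q3 = quartiles(ages)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```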
Example Quartiles for Age
The ages of the 20 subjects in the medical study are listed below in
order.
32, 37, 39, 40, 41, 41, 41, 42, 42, 43, 44, 45, 45, 45, 46, 47, 47, 49, 50, 51
The histogram of the ages is also provided.
(a) Calculate the median age.
(b) Calculate the first quartile Q1 for this age data.
(c) Calculate the third quartile Q3 for this age data.
(d) Calculate the range for this age data.
32 37 39 40 41 | 41 41 42 42 43 | 44 45 45 45 46 | 47 47 49 50 51
median = 43.5, Q1 = 41, Q3 = 46.5
[Figure: histogram of age; horizontal axis from 30 to 55, vertical axis
"Count" from 2 to 8.]
We see that the distribution of age is approximately symmetric and that the
quartiles are about the same distance from the median.
The quartiles are actually the 25th, 50th, and 75th percentiles.
DEFINITION:
The pth percentile is the value such that p% of the observations fall at or
below that value and (100 - p)% of the observations fall at or above that
value.
Five-number summary: Minimum, Q1, Median, Q3, Maximum
To Build a Basic Boxplot
• List the data values in order from smallest to largest.
• Find the five-number summary: minimum, Q1, median, Q3, and
maximum.
• Locate the values for Q1, the median and Q3 on the scale. These
values determine the “box” part of the boxplot. The quartiles
determine the ends of the box, and a line is drawn inside the box
to mark the value of the median.
• Draw lines (called whiskers) from the midpoints of the ends of the
box out to the minimum and maximum.
Example: Five-Number Summary and Boxplot for Age
Problem
Consider the (ordered) ages of the 20 subjects in a medical study :
32, 37, 39, 40, 41, 41, 41, 42, 42, 43,
44, 45, 45, 45, 46, 47, 47, 49, 50, 51
The five-number summary for the age data is given by:
min = 32, Q1 = 41, median = 43.5, Q3 = 46.5, and max = 51.
Draw the basic boxplot.
Side-by-side boxplots are helpful for comparing two or more
distributions with respect to the five-number summary.
Although the median of the first process
is closer to the target value of 20.000
cm, the second process produces a
less variable distribution.
Using the 1.5 x IQR Rule to Identify Outliers and Build a Modified Boxplot
• List the data values in order from smallest to largest.
• Find the five-number summary: minimum, Q1, median, Q3, and
maximum.
• Locate the values for Q1, the median and Q3 on the scale. These
values determine the “box” part of the boxplot. The quartiles
determine the ends of the box, and a line is drawn inside the box
to mark the value of the median.
• Find the IQR = Q3 - Q1.
• Compute the quantity STEP = 1.5 x (IQR).
• Find the location of the inner fences by taking 1 step out from
each of the quartiles:
lower inner fence = Q1 - STEP;
upper inner fence = Q3 + STEP.
• Draw the lines (whiskers) from the midpoints of the ends of the box
out to the smallest and largest values WITHIN the inner fences.
• Observations that fall OUTSIDE the inner fences are considered
potential outliers. If there are any outliers, plot them individually
along the scale using a solid dot.
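Five-number summary: Min = 1, Q1 = 21, Median = 32, Q3 = 66, Max = 325
[Figure: modified boxplot for this summary, with labels marking the inner
fences, the potential outliers (an outside value and a far outside value), and
the farthest observations that are not potential outliers.]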
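The 1.5 x IQR rule can be sketched in Python and applied to the quartiles in
the five-number summary above (Q1 = 21, Q3 = 66). The function names are
illustrative. Only the minimum and maximum are checked here because the full
data set behind that summary is not listed.

```python
def inner_fences(q1, q3):
    """Return (lower inner fence, upper inner fence) for the 1.5 x IQR rule."""
    step = 1.5 * (q3 - q1)          # STEP = 1.5 x IQR
    return q1 - step, q3 + step

def potential_outliers(values, q1, q3):
    """Values that fall outside the inner fences."""
    low, high = inner_fences(q1, q3)
    return [x for x in values if x < low or x > high]

low, high = inner_fences(21, 66)
print(low, high)
print(potential_outliers([1, 325], 21, 66))
```

With IQR = 45 the step is 67.5, so the fences sit at -46.5 and 133.5. The
minimum of 1 is inside the fences, while the maximum of 325 is a potential
outlier.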
Example Any Age Outlier?
Let’s apply the "rule of thumb" to our age data set to assess if there
are any outliers.
(a) Construct the fences for the modified boxplot based on the
1.5 * IQR rule.
(b)
Are there any outliers using the 1.5 * IQR rule?
(c)
Construct the modified boxplot.
Let's Do It! 8 (3 min.)
Let's Do It! 9 (5 min.)
Comparing Ages—Antibiotic Study
Variable = age for 23 children randomly assigned to one of two
treatment groups.
(a) Give the five-number summary for each of the two
treatment groups. Comment on your results.
Amoxicillin Group (n=11): 8 9 9 10 10 11 11 12 14 14 17
Five-number summary:
Cefadroxil Group (n=12): 7 8 9 9 9 10 10 11 12 13 14 16
Five-number summary:
(b) Make side-by-side Boxplots for the antibiotic study data in part a.
(c)
Using our “rule of thumb,” are there any outliers for the
Amoxicillin group? If so, modify your Boxplot above.
(d)
Using our “rule of thumb,” are there any outliers for the
Cefadroxil group? If so, modify your Boxplot above.
Standard Deviation
... a measure of the spread of the observations from the mean.
... think of the standard deviation as an “average (or standard)
distance of the observations from the mean.”
Example Standard Deviation: What Is It?
Deviations: -4, 1, 3
Squared Deviations: 16, 1, 9
--------------------------------------------------------------
Observation     Deviation             Squared Deviation
x               \(x - \bar{x}\)       \((x - \bar{x})^2\)
--------------------------------------------------------------
0               0 - 4 = -4            16
5               5 - 4 =  1             1
7               7 - 4 =  3             9
--------------------------------------------------------------
mean = 4        sum always = 0        sum = 26
\[ \text{sample variance} = \frac{(-4)^2 + (1)^2 + (3)^2}{3 - 1} = \frac{16 + 1 + 9}{2} = \frac{26}{2} = 13 \]
\[ \text{sample standard deviation} = \sqrt{13} \approx 3.6 \]
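The table can be reproduced step by step in Python for the three observations
0, 5, and 7. The variable names are illustrative.

```python
import math

values = [0, 5, 7]
n = len(values)
x_bar = sum(values) / n                          # mean = 4

deviations = [x - x_bar for x in values]         # -4, 1, 3
squared = [d ** 2 for d in deviations]           # 16, 1, 9

sample_variance = sum(squared) / (n - 1)         # 26 / 2 = 13
sample_sd = math.sqrt(sample_variance)           # about 3.6

print(sum(deviations), sum(squared), sample_variance, round(sample_sd, 2))
```

The deviations always sum to zero, which is why they are squared before being
averaged.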
Interpretation of the Standard Deviation
Think of the standard deviation as roughly an average distance of the
observations from their mean. If all of the observations are the same,
then the standard deviation will be 0 (i.e. no spread). Otherwise the
standard deviation is positive and the more spread out the
observations are about their mean, the larger the value of the
standard deviation.
If \(x_1, x_2, \ldots, x_n\) denote a sample of n observations, the sample
variance is denoted by:
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]
Sample standard deviation, denoted by s, is the square root of the
variance: \( s = \sqrt{s^2} \).
The population standard deviation, denoted by the Greek letter \(\sigma\)
(sigma), is the square root of the population variance and is
computed as:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \]
Remarks:
• The variance is measured in squared units. By taking the
square root of the variance we bring this measure of spread
back into the original units.
• The mean is not a resistant measure of center. Because the
standard deviation uses the mean in its definition, it is not a
resistant measure of spread either: it is heavily influenced by
extreme values.
• There are statistical arguments that support why we divide
by n - 1 instead of n in the denominator of the sample
standard deviation.
Shortcut formulas for computing the variance and standard deviation
are presented next and will be used in the remainder of the chapter
and in the exercises. These formulas are mathematically equivalent
to the preceding formulas and do not involve using the mean. They
save time when repeated subtracting and squaring occur in the
original formulas. They are also more accurate when the mean has
been rounded.
Example
Consistency of Weight Loss Program
In a recent study of the effect of a certain diet on weight reduction, 11
subjects were put on the diet for two weeks and their weight loss/gain
in lbs was measured (positive values indicate weight loss).
1, 1, 2, 2, 3, 2, 1, 1, 3, 2.5, -23.
What is the standard deviation of the weight loss?
Solution
\[ \sum x = 1 + 1 + \cdots + 2.5 + (-23) = -4.5, \qquad \sum x^2 = 1^2 + 1^2 + \cdots + 2.5^2 + (-23)^2 = 569.25 \]
The standard deviation of this sample is
\[ s = \sqrt{\frac{569.25 - (-4.5)^2 / 11}{10}} = 7.5327 \]
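The calculation above uses the shortcut form, in which only the sum of the
values and the sum of their squares are needed. A minimal Python sketch, with
illustrative names, compares it against the standard library's definition-based
result.

```python
import math
import statistics

def shortcut_sd(values):
    """Sample standard deviation from sum(x) and sum(x^2) only."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return math.sqrt(variance)

# Weight loss in lbs for the 11 subjects (positive values indicate loss).
losses = [1, 1, 2, 2, 3, 2, 1, 1, 3, 2.5, -23]

print(sum(losses), sum(x * x for x in losses))
print(round(shortcut_sd(losses), 4), round(statistics.stdev(losses), 4))
```

The single value of -23 dominates the sum of squares, which illustrates how
strongly one extreme observation inflates the standard deviation.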
Let's Do It! 10
Emergency Room Patients
The following are the ages of a sample of 20 patients seen in the
emergency room of a hospital on a Friday night.
35 32 21 43 39 60 36 12 54 45
37 53 45 23 64 10 34 22 36 55
Find the standard deviation of the ages.
Variance and Standard Deviation for Grouped Data
The procedure for finding the variance and standard deviation for
grouped data is similar to that for finding the mean for grouped data,
and it uses the midpoints of each class.
Example
The data represent the number of
miles that 20 runners ran during one
week. Find the variance and the
standard deviation for the frequency
distribution of the data.
Solution
Step 1 Make a table as shown, and find the midpoint of each class.
Step 2 Multiply the frequency by the midpoint for each class, and
place the products in column D.
\(1 \cdot 8 = 8\), \(2 \cdot 13 = 26\), . . . , \(2 \cdot 38 = 76\)
Step 3 Multiply the frequency by the square of the midpoint, and
place the products in column E.
\(1 \cdot 8^2 = 64\), \(2 \cdot 13^2 = 338\), . . . , \(2 \cdot 38^2 = 2888\)
Step 4 Find the sums of columns B, D, and E. The sum of column
B is n, the sum of column D is \(\sum f \cdot x_m\), and the sum of column E
is \(\sum f \cdot x_m^2\). The completed table is shown.
Step 5 Substitute in the formula and solve for \(s^2\) to get the variance.
Step 6 Take the square root to get the standard deviation.
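The six steps can be sketched in Python. The formula itself is not printed in
these notes, so the sketch assumes the usual shortcut form for grouped data,
s^2 = [sum(f * x_m^2) - (sum(f * x_m))^2 / n] / (n - 1). The function name and
the small frequency table are hypothetical.

```python
import math

def grouped_variance_sd(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    n = 0            # sum of column B (frequencies)
    sum_fx = 0.0     # sum of column D (f * midpoint)
    sum_fx2 = 0.0    # sum of column E (f * midpoint^2)
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2
        n += f
        sum_fx += f * midpoint
        sum_fx2 += f * midpoint ** 2
    # Assumed shortcut formula for grouped data (not printed in the source).
    variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return variance, math.sqrt(variance)

# Hypothetical frequency table, invented for illustration only.
example = [(0, 10, 2), (10, 20, 6), (20, 30, 2)]
variance, sd = grouped_variance_sd(example)
print(round(variance, 2), round(sd, 2))
```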
Let's Do It! 11
The data show the distribution of the birth weights (in oz.) of 100
consecutive deliveries. Find the variance and the standard deviation.
Interval         Frequency
29.50-69.45      5
69.50-89.45      10
89.50-99.45      11
99.50-109.45     19
109.50-119.45    17
119.50-129.45    20
129.50-139.45    12
139.50-169.45    6
HW page 33: 2, 3, 9, 13, 35, 37
Let's Do It! 12 (8 min.) A Transformation
Data on number of children for 10 households in a neighborhood:
2, 3, 0, 2, 1, 0, 3, 0, 1, 4
Mean = 1.6 and standard deviation = 1.43.
We wish to summarize the number of people in a household. Each
household has two adults so we can simply add the value 2 to each
number in the list.
4, 5, 2, 4, 3, 2, 5, 2, 3, 6
(a) Find the mean and the standard deviation of this new set of
observations and compare them to those for the original
observations. How did the mean change? How did the
standard deviation change?
(b) Summarize how adding the same constant to each observation
affects the mean and standard deviation of the observations.
Knowing how the standard deviation is computed, does this
make sense?
Suppose each child receives a weekly allowance of $3. The total
allowance expense in a household can be obtained by multiplying
every number in the original list by 3.
6, 9, 0, 6, 3, 0, 9, 0, 3, 12
(c) Find the mean and the standard deviation of this new set of
observations and compare them to those for the original
observations. How did the mean change? How did the
standard deviation change?
(d) Summarize how multiplying each observation by the same constant
affects the mean and standard deviation of the observations.
Knowing how the standard deviation is computed, does this
make sense?
(e) Suppose that for a local recreation program, 3 credits are
deducted for each child in the household. The adjustment in
credit hours can be obtained by multiplying every number in the
original list by -3. Note the multiplier is now negative.
Determine the new values and find the sample mean and the
sample standard deviation for these new values.
New values: ____________
Mean for the new values: ____________
Standard deviation of the new values: ____________
Note: Even though the multiplier was negative, the sign for the
new standard deviation is positive.
(f) The cost is $5 for each child to enter a children's indoor play
park. Adults are free. Each household also has a coupon to
save $2.
Without doing arithmetic, can you state what the mean and
standard deviation would be if Y = 5(X) - 2?
Linear Transformation Rules
If X represents the original values, \(\bar{x}\) is the average of the original
values, and \(s_X\) is the standard deviation of the original values, and
the new values, represented by Y, are a linear transformation of X,
Y = aX + b, then the mean for Y is given by:
\[ \bar{y} = a\bar{x} + b \]
and the standard deviation for Y is given by:
\[ s_Y = |a| \, s_X \]
Example Temperature Transformation
In a recent letter from one of your cousins in Europe, he stated that
this past summer had been very hot. In particular, the high
temperature of each day for a week was
X = Temp (Celsius):
Monday   Tuesday   Wednesday   Thursday   Friday   Saturday   Sunday
40       41        39          41         41       40         38
The mean and standard deviation are:
\[ \bar{x} = 40 \text{ degrees Celsius}, \qquad s_X = 1.15 \text{ degrees Celsius} \]
Temperature in the Fahrenheit scale, Y = Temperature in Fahrenheit,
is related to temperature in the Celsius scale, X, by the following
linear transformation:
\[ F = \frac{9}{5} C + 32, \quad \text{or in terms of Y and X:} \quad Y = \frac{9}{5} X + 32 \]
So the mean and standard deviation for temperature in the
Fahrenheit scale are:
\[ \bar{y} = \frac{9}{5}(40) + 32 = 72 + 32 = 104 \text{ degrees Fahrenheit} \]
\[ s_Y = \frac{9}{5}(1.15) = 2.07 \text{ degrees Fahrenheit} \]
Now you can understand just how hot it was! Better than that, you
did not need to transform each value. You just used what you
learned in statistics.
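The transformation rules can be checked numerically by converting each daily
temperature and recomputing the summary directly. The sketch below uses
Python's `statistics` module; the variable names are illustrative.

```python
import statistics

celsius = [40, 41, 39, 41, 41, 40, 38]          # Monday through Sunday
a, b = 9 / 5, 32                                # Y = aX + b

fahrenheit = [a * c + b for c in celsius]       # transform every value

x_bar, s_x = statistics.mean(celsius), statistics.stdev(celsius)
y_bar, s_y = statistics.mean(fahrenheit), statistics.stdev(fahrenheit)

# Direct results agree with the rules: y_bar = a*x_bar + b, s_y = |a|*s_x.
print(round(y_bar, 2), round(a * x_bar + b, 2))
print(round(s_y, 4), round(abs(a) * s_x, 4))
```

Computed without intermediate rounding, the Fahrenheit standard deviation is
about 2.08; the 2.07 above comes from rounding the Celsius standard deviation
to 1.15 before multiplying.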
Let's Do It! 13
Standardization: A Special Transformation
Let’s perform a special transformation of the original data on the
number of children in a household:
2, 3, 0, 2, 1, 0, 3, 0, 1, 4
(a) The first step: subtract the mean \(\bar{x}\) from each number in the list,
\(x - \bar{x}\).
(b) The second step: divide the difference \(x - \bar{x}\) by the standard
deviation \(s_X\).
(c) Calculate the mean and standard deviation of the resulting values.
Mean = _________
Standard Deviation = _______
A variable X is said to be standardized if the variable has a mean of
zero and a standard deviation of 1.
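The two steps can be carried out in Python to confirm that the standardized
values have mean 0 and standard deviation 1, apart from tiny floating-point
error. The names are illustrative.

```python
import statistics

children = [2, 3, 0, 2, 1, 0, 3, 0, 1, 4]

x_bar = statistics.mean(children)        # 1.6
s_x = statistics.stdev(children)         # about 1.43

# Step (a): subtract the mean; step (b): divide by the standard deviation.
z = [(x - x_bar) / s_x for x in children]

z_mean = statistics.mean(z)
z_sd = statistics.stdev(z)
print(round(z_mean, 10), round(z_sd, 10))
```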
Note that the standardized variable \(\frac{x - \bar{x}}{s_X}\) can be
expressed in the form of a linear transformation,
\[ \frac{x - \bar{x}}{s_X} = \left(\frac{1}{s_X}\right) x + \left(-\frac{\bar{x}}{s_X}\right), \quad \text{with } a = \frac{1}{s_X} \text{ and } b = -\frac{\bar{x}}{s_X}. \]