Transforms transform

Transforms
• What does the word transform mean?
– Changing something into another thing
• In statistics it refers to changing a
distribution into a different distribution
– How can you change a distribution?
Linear Transforms
• A linear transform happens when you add or
multiply a constant with EACH number in a
distribution
– Usually has the form $z_i = a(x_i) + b$
• Where z is the new “transformed” number, x is the
old “untransformed” number, and a and b are any
constants (including zero!)
Linear Transforms
• What would happen to the mean, the
variance, the standard deviation if you
applied a linear transform?
Linear Transforms
• adding or multiplying by a constant affects
the mean
• multiplying by a constant affects the
variance and standard deviation but adding
a constant does not!
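A quick numerical check of the two bullets above, as a minimal sketch using Python's standard `statistics` module. The scores and the constants (10 and 3) are made up for illustration, and the n - 1 sample formulas are an assumption; the same pattern holds if you divide by n instead.

```python
import statistics

# Hypothetical scores, chosen only for illustration.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

def describe(scores):
    """Return (mean, variance, standard deviation) using the n - 1 sample formulas."""
    return (statistics.mean(scores),
            statistics.variance(scores),
            statistics.stdev(scores))

added = [xi + 10 for xi in x]       # add a constant to EACH score
multiplied = [3 * xi for xi in x]   # multiply EACH score by a constant

print("original:", describe(x))
print("plus 10: ", describe(added))       # mean shifts, variance and SD do not
print("times 3: ", describe(multiplied))  # mean, variance and SD all change
```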
Linear Transforms
• More formally, when adding a constant:
$z_i = x_i + c$
• the mean of a new distribution z is the mean
of the old distribution x plus the constant c
$\bar{z} = \bar{x} + c$
Linear Transforms
• More formally, when adding a constant:
$z_i = x_i + c$
• the variance of the distribution z is the same
as the variance of the distribution x
$S_z^2 = S_x^2$
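One way to see why adding a constant moves the mean but leaves the variance alone is to write both out. This is a sketch that assumes the usual n - 1 sample variance; the same cancellation happens with a divisor of n.

```latex
\begin{aligned}
\bar{z} &= \frac{1}{n}\sum_{i=1}^{n}(x_i + c) = \bar{x} + c \\
S_z^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(z_i - \bar{z})^2
       = \frac{1}{n-1}\sum_{i=1}^{n}\bigl((x_i + c) - (\bar{x} + c)\bigr)^2
       = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2
       = S_x^2
\end{aligned}
```

The constant c appears in every score and in the mean, so it cancels out of every deviation.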
Linear Transforms
• More formally, when multiplying by a
constant:
$z_i = c(x_i)$
• the mean of the distribution z is the mean of
x multiplied by the constant c
$\bar{z} = c(\bar{x})$
Linear Transforms
• More formally, when multiplying by a
constant:
$z_i = c(x_i)$
• the variance of the distribution z is the
variance of x multiplied by the square of the
constant c
$S_z^2 = c^2(S_x^2)$
Linear Transforms
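• More formally, when multiplying by a
constant:
$z_i = c(x_i)$
• And the standard deviation of z is the
standard deviation of x multiplied by c
(strictly, by the absolute value of c, since a
standard deviation is never negative)
$S_z = c(S_x)$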
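The three multiplication results can be derived in a few lines. This is a sketch assuming the n - 1 sample variance; the divisor does not affect the argument.

```latex
\begin{aligned}
\bar{z} &= \frac{1}{n}\sum_{i=1}^{n} c\,x_i = c\,\bar{x} \\
S_z^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(c\,x_i - c\,\bar{x})^2
       = c^2 \cdot \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2
       = c^2 S_x^2 \\
S_z &= \sqrt{c^2 S_x^2} = |c|\,S_x
\end{aligned}
```

For a positive constant, $|c| = c$, which gives the form $S_z = c(S_x)$ used on the slides.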
Linear Transforms
• If these features of transforms aren’t
intuitive for you, work through pages 34
and 35!
Linear Transforms
• Notice that you can work backward from
what you want the mean or standard
deviation to be because:
– Add or subtract a constant to change the mean
– Multiply or divide by a constant to change the
standard deviation
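Working backward can be sketched in code: pick the multiplier that gives the standard deviation you want, then pick the additive constant that puts the mean where you want it. The function name `rescale`, the raw scores, and the targets (mean 100, standard deviation 15) are all hypothetical choices for illustration.

```python
import statistics

def rescale(scores, target_mean, target_sd):
    """Linearly transform scores so they have the requested mean and SD."""
    old_mean = statistics.mean(scores)
    old_sd = statistics.stdev(scores)
    a = target_sd / old_sd           # multiply to change the standard deviation
    b = target_mean - a * old_mean   # then add to move the mean
    return [a * xi + b for xi in scores]

raw = [52.0, 61.0, 70.0, 74.0, 83.0]   # hypothetical scores
scaled = rescale(raw, target_mean=100.0, target_sd=15.0)
print(statistics.mean(scaled), statistics.stdev(scaled))
```

The multiplier has to be chosen first because multiplying also moves the mean; the additive constant then corrects for that.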
The Z Transform
The Z Transform
• What if you wanted a very specific
distribution - one with a mean of zero and a
standard deviation of one
• Why on earth would you want THAT?
Normal Distributions
• Often we can assume that a set of numbers
is normally distributed
• Normally distributed numbers have
interesting characteristics
Normal Distributions
• Importantly, the probability of any number
being of a particular value can be computed
using the Gaussian function:
$F(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$
• Which you will almost certainly never need
to compute yourself!
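For the curious, the Gaussian function is only a few lines of code. This is a minimal sketch in which `mu` stands for the mean and `sigma` for the standard deviation; the function name and the default values (0 and 1) are assumptions made for illustration.

```python
import math

def gaussian(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)

# The curve is highest at the mean and symmetric around it.
print(gaussian(0.0))
print(gaussian(1.0), gaussian(-1.0))
```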
Normal Distribution
• Probability of a score is the height on the curve
[Figure: Gaussian (Normal) Distribution. Vertical axis: probability; horizontal axis: score. The mean and the standard deviation are marked.]
The Normal Distribution
• 34% of scores fall between the mean and 1
standard deviation above the mean
[Figure: Gaussian (Normal) Distribution. Vertical axis: probability; horizontal axis: score in standard deviations, from -4 to 4. One region under the curve is labeled 34%.]
The Normal Distribution
• 34% of scores fall between the mean and 1
standard deviation below the mean
[Figure: Gaussian (Normal) Distribution. Vertical axis: probability; horizontal axis: score in standard deviations, from -4 to 4. One region under the curve is labeled 34%.]
The Normal Distribution
• 68% of scores fall between 1 standard
deviation below and 1 standard deviation
above the mean
[Figure: Gaussian (Normal) Distribution. Vertical axis: probability; horizontal axis: score in standard deviations, from -4 to 4. Two regions under the curve are each labeled 34%.]
The Normal Distribution
• About 95.4% of scores fall between 2 standard
deviations below and 2 standard deviations
above the mean (roughly 48% on each side)
[Figure: Gaussian (Normal) Distribution. Vertical axis: probability; horizontal axis: score in standard deviations, from -4 to 4. Two regions under the curve are each labeled 48%.]
The Normal Distribution
• 95% of scores fall between 1.96 standard
deviations below and 1.96 standard
deviations above the mean
[Figure: Gaussian (Normal) Distribution. Vertical axis: probability; horizontal axis: score in standard deviations, from -4 to 4. Two regions under the curve are each labeled 48%.]
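The percentages on the preceding slides can be checked numerically. This sketch uses the identity that the fraction of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)); the function name `fraction_within` is an assumption made for illustration.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1.0, 1.96, 2.0):
    print(f"within +/-{k} SD: {fraction_within(k):.4f}")
# within +/-1.0 SD: 0.6827
# within +/-1.96 SD: 0.9500
# within +/-2.0 SD: 0.9545
```

Half of each value is the area on one side of the mean, which is where the 34% and 48% labels in the figures come from.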
The Normal Distribution
• The Normal distribution reveals the
proportions (i.e. probabilities) of scores that
fall within certain ranges when the ranges
are expressed in terms of standard
deviations
• If only there was some way to transform
scores into units of standard deviation…
The Z Transform
$z_i = \frac{x_i - \bar{x}}{S_x}$
The Z Transform
• Break that down:
– Remember that $\bar{z} = \bar{x} + c$
– And that $S_z = c(S_x)$
The Z Transform
• Break that down:
– If we used this $\bar{z} = \bar{x} + c$ to make the
new mean zero by plugging in the negative of
the old mean for c
– And we use this $S_z = c(S_x)$ to make the
standard deviation equal 1 by plugging in 1 /
the old standard deviation for c
The Z Transform
$z_i = \frac{1}{S_x}(x_i - \bar{x}) = \frac{x_i - \bar{x}}{S_x}$
– Subtract the mean from each score
– Divide each difference by the standard deviation
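The two steps translate directly into code. This is a minimal sketch: the function name `z_scores` and the raw scores are hypothetical, and the n - 1 sample standard deviation is an assumption (use `statistics.pstdev` if your course divides by n).

```python
import statistics

def z_scores(scores):
    """Convert raw scores to z scores: subtract the mean, divide by the SD."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    return [(xi - mean) / sd for xi in scores]

raw = [12.0, 15.0, 9.0, 20.0, 14.0]   # hypothetical scores
z = z_scores(raw)
print(z)
# The transformed scores have mean 0 and standard deviation 1 (up to rounding).
print(statistics.mean(z), statistics.stdev(z))
```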
The Z Transform
– Then any score that was exactly the mean
would be zero standard deviations from the
mean (Z = 0.0)
– A score that was 1 standard deviation from the
mean would now be Z = 1.0
– 2 standard deviations from the mean would be
Z = 2.0
– Half way between 1 and 2 std. dev. from the
mean would be 1.5, etc.
The Z Transform
• Z scores are standardized
• Z scores are in units of standard deviation
• One can think of Z scores as the ratio of a
score’s difference from the mean to the
average difference from the mean
• Or one can think of Z score as “what
percentage of one standard deviation from
the mean is this score’s distance from the
mean?”
The Z Transform
• Uses of Z scores:
– allows comparison across different samples
(e.g. 25 degrees in Vancouver vs. 25 degrees in
Lethbridge)
– If one assumes that scores are normally
distributed, the Z score reveals the probability
of that particular score occurring by chance
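To make the comparison bullet concrete, here is a minimal sketch. The two cities are replaced by placeholders, and every mean and standard deviation below is invented for illustration; none of it is real climate data.

```python
def z_score(value, mean, sd):
    """How many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd

# Invented summary statistics, for illustration only (not real climate data).
city_a = {"mean": 22.0, "sd": 2.0}
city_b = {"mean": 19.0, "sd": 6.0}

reading = 25.0   # the same raw temperature in both places
z_a = z_score(reading, city_a["mean"], city_a["sd"])
z_b = z_score(reading, city_b["mean"], city_b["sd"])
print(z_a, z_b)  # 1.5 1.0
```

With these made-up numbers the same 25 degrees is more unusual in the first city (1.5 standard deviations above its mean) than in the second (1.0), which is the kind of comparison raw scores alone cannot show.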
The Standard Normal
Distribution or Z Distribution
• For any distribution:
– probability for each number (on x axis) is given
by height of curve
– probability for getting one out of a range of
numbers is given by the area under the curve
Standard Normal Distribution
• For the Standard Normal (a.k.a. Z
distribution), the area under the curve for a
given range is found in a Z table
• e.g. pg 111
Standard Normal Distribution
• Table shows areas
between μ and any z
score you wish
• What’s μ!?
• μ is the mean of the
population of possible
scores (more on that
later)
Standard Normal Distribution
• Using the Z table
– note that negative z scores yield the same
probabilities because the curve is symmetric
– total area under the curve = 1.0 (probability
that something will happen is 1 ! )
– Examples
• probability of getting a z score between 0 and 1 is
.3413
• probability of getting a z score within 1 std. dev. of
the mean is .3413 + .3413 = .6826 or ~ 68%
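If no Z table is at hand, the same areas can be computed. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_mean_to_z` is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def area_mean_to_z(z):
    """Area under the standard normal curve between the mean (0) and z."""
    return abs(standard_normal.cdf(z) - 0.5)

print(round(area_mean_to_z(1.0), 4))       # 0.3413
print(round(area_mean_to_z(-1.0), 4))      # 0.3413, same by symmetry
print(round(2 * area_mean_to_z(1.0), 4))   # 0.6827
```

The last value differs from .6826 in the final digit only because the slide doubles the already-rounded table entry.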
Standard Normal Distribution
• What range above and below the mean
contains 95% of all the z scores?
– Z table tells you the positive half of the curve
– 1/2 of 95% = 47.5% or .475 is on each side of
the mean
– .475 corresponds to a z score of 1.96 (or
negative 1.96! )
– Thus .475 + .475 or 95% of z scores fall
between + / - 1.96 standard deviations of the
mean
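The table lookup above runs from an area to a z score, which is the inverse of the usual direction. A minimal sketch of that inverse lookup, using `statistics.NormalDist.inv_cdf` from the standard library (Python 3.8 or later); the function name `symmetric_cutoff` is an assumption made for illustration.

```python
from statistics import NormalDist

def symmetric_cutoff(coverage):
    """z such that `coverage` of the standard normal lies within +/- z of the mean."""
    upper_tail = (1.0 - coverage) / 2.0          # area left over above +z
    return NormalDist().inv_cdf(1.0 - upper_tail)

print(round(symmetric_cutoff(0.95), 2))   # 1.96
```

For 95% coverage the leftover 5% is split into 2.5% in each tail, so the cutoff is the z score with 97.5% of the area below it.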
Standard Normal Distribution
• What do we do with this knowledge?
• Knowing the probability of getting
particular z scores helps us to know what
population a given sample came from
Standard Normal Distribution
• For example:
• Blood Doping in Cross-Country Skiing
• At the World Championships in Lahti
Finland, 13% of the athletes were found to
have a red blood cell count between 3.5 and
5.5 standard deviations from the presumed
population mean for all athletes (that was
measured in previous IOC study)
Standard Normal Distribution
• What percentage of athletes would you
expect to be greater than 3.5 standard
deviations from the mean?
– look up z = 3.5
– the associated probability is .4998, so 2 x .4998 =
.9996 or 99.96% should fall within +/- 3.5!
– about 1.0 - .9996 = .0004 or .04% should
have red blood cell z scores beyond +/- 3.5!
– only half of that (.02%) should be above +3.5!
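The tail areas can also be computed directly rather than read from a table. This is a sketch using `statistics.NormalDist` from the standard library, under the same assumption the slides make: that red blood cell counts in the reference population are normally distributed.

```python
from statistics import NormalDist

z = 3.5
upper_tail = 1.0 - NormalDist().cdf(z)   # area above +3.5
both_tails = 2.0 * upper_tail            # area beyond +/- 3.5

print(f"above +{z}: {upper_tail:.4%}")
print(f"beyond +/-{z}: {both_tails:.4%}")
```

The computed values come out slightly larger than the table-based .02% and .04% because the table entry .4998 is rounded, but they are the same order of magnitude.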
Standard Normal Distribution
• .02% is what you’d expect - 13% is what they
observed!
• What this tells us is that the sample of athletes at
the Lahti World Championships was almost
certainly not taken from the same population as
the “normal” athletes in the IOC study
• At least some of the athletes sampled in Lahti had
done something to artificially elevate their red
blood cell count
Problem:
• What good is it to know about normal
distributions if there’s no guarantee that
your scores will be normally distributed?