Lecture #8 - notes - for Dr. Jason P. Turner

Hypothesis Testing III
MARE 250
Dr. Jason Turner
To ASSUME is to make an…
Four assumptions for means-test hypothesis testing:
1. Random Samples
2. Independent Samples
3. Normal Populations (or large samples)
4. Variances (std. dev.) are equal
Significance Level
The probability of making a TYPE I Error
(rejection of a true null hypothesis) is
called the significance level (α) of a
hypothesis test
TYPE II Error Probability (β) – nonrejection
of a false null hypothesis
For a fixed sample size, the smaller we
specify the significance level (α), the larger
will be the probability (β) of not rejecting a
false null hypothesis
Significance Level
                         If H0 is true     If H0 is false
If H0 is not rejected    No Error          TYPE II ERROR
If H0 is rejected        TYPE I ERROR      No Error
I have the POWER!!!
The power of a hypothesis test is the
probability of not making a TYPE II error,
i.e., the probability of rejecting a false null
hypothesis when there is evidence to
support the alternative hypothesis
POWER = 1 – β
Produces a power curve
We need more POWER!!!
For a fixed significance level, increasing
the sample size increases the power
Therefore, you can run a test to determine if your
sample size HAS THE POWER!!!
By using a
sufficiently large
sample size, we can
obtain a hypothesis
test with as much
power as we want
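The claim above can be sketched numerically: for a two-sided, two-sample t-test, power can be computed from the noncentral t distribution, and for a fixed α it climbs toward 1 as the sample size grows. The effect size (0.5) and α (0.05) below are illustrative assumptions, not values from the lecture.

```python
# Sketch: power of a two-sample t-test as a function of sample size,
# using the noncentral t distribution. Effect size and alpha are
# illustrative assumptions, not values from the lecture.
from scipy import stats


def two_sample_power(d, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test with equal group sizes."""
    df = 2 * n_per_group - 2                  # degrees of freedom
    nc = d * (n_per_group / 2) ** 0.5         # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # Probability the test statistic lands beyond the critical value
    return (1 - stats.nct.cdf(t_crit, df, nc)
            + stats.nct.cdf(-t_crit, df, nc))


# For a fixed alpha, power rises toward 1 as n grows (the power curve)
for n in (10, 20, 40, 80, 160):
    print(n, round(two_sample_power(0.5, n), 3))
```

Plotting power against n gives the power curve mentioned above.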
Increasing the power of the test
There are four factors that can increase the
power of a means test:
1. Larger effect size (difference) - The greater the real
difference between data for the two populations, the
more likely it is that the sample means will also be
different.
2. Higher α-level (the level of significance) - If you
choose a higher value for α, you increase the probability
of rejecting the null hypothesis, and thus the power of
the test. (However, you also increase your chance of
type I error.)
Increasing the power of the test
There are four factors that can increase the
power of a means test:
3. Less variability - When the standard deviation is
smaller, smaller differences can be detected.
4. Larger sample sizes - The more observations there
are in your samples, the more confident you can be that
the sample means represent μ for the two populations.
Thus, the test will be more sensitive to smaller
differences.
Calculating Power
Power - the probability of being able to detect
an effect of a given size
Sample size - the number of observations in
each sample
Difference (effect) - the difference between μ
for one population and μ for the other
Calculating Power
For a t-test – provide the difference (between means)
and the standard deviation (larger of the two)
If you enter the Sample size – you get the Power
If you enter the Power – you get the required Sample size
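A rough stand-in for this MINITAB calculation, assuming a two-sided, two-sample t-test with an illustrative difference of 5, standard deviation of 10, and α = 0.05 (the function names are made up for this sketch):

```python
# Sketch of the two directions of a t-test power calculation:
# enter a sample size to get power, or enter a target power to get the
# required sample size. Difference, SD, and alpha are illustrative.
from scipy import stats


def t_test_power(diff, sd, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample t-test (equal n, equal SD)."""
    d = diff / sd                             # standardized effect size
    df = 2 * n_per_group - 2
    nc = d * (n_per_group / 2) ** 0.5         # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(t_crit, df, nc)
            + stats.nct.cdf(-t_crit, df, nc))


def t_test_sample_size(diff, sd, power=0.80, alpha=0.05):
    """Smallest n per group reaching the target power."""
    n = 2
    while t_test_power(diff, sd, n, alpha) < power:
        n += 1
    return n


print(t_test_power(5, 10, 64))     # enter sample size -> get power
print(t_test_sample_size(5, 10))   # enter power -> get sample size
```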
Calculating Power
For an ANOVA – provide the # of levels, the
difference (between means), and the standard
deviation (largest)
If you enter the Sample size – you get the Power
If you enter the Power – you get the required Sample size
Calculates sample
size per level
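A similar sketch for the ANOVA case, taking the number of levels, the maximum difference between means, and the (largest) standard deviation, as on the slide. It assumes the common "least favorable" configuration (two means a maximum difference apart, the rest in the middle) and illustrative numbers; this is a stand-in for the MINITAB calculation, not its exact algorithm.

```python
# Sketch of a one-way ANOVA power calculation: supply # of levels,
# the maximum difference between means, and the (largest) SD.
# Assumes the "least favorable" means configuration; numbers are
# illustrative.
from scipy import stats


def anova_power(k, diff, sd, n_per_level, alpha=0.05):
    """Power of a one-way ANOVA with k levels and equal n per level."""
    n_total = k * n_per_level
    df1, df2 = k - 1, n_total - k
    lam = n_per_level * diff**2 / (2 * sd**2)   # noncentrality
    f_crit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(f_crit, df1, df2, lam)


def anova_sample_size(k, diff, sd, power=0.80, alpha=0.05):
    """Smallest n per level (not total n) reaching the target power."""
    n = 2
    while anova_power(k, diff, sd, n, alpha) < power:
        n += 1
    return n


print(anova_power(4, 5, 10, 30))    # enter sample size -> get power
print(anova_sample_size(4, 5, 10))  # enter power -> get n per level
```

Note that, as on the slide, the result is a sample size per level, not a total.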
Increasing the power of the test
The most practical way to increase power is
often to increase the sample size
However, you can also try to decrease the
standard deviation by making
improvements in your process or
measurement
Sample size
Increasing the size of your samples increases the
power of your test
However, in the real world this is also a function of:
1. Time
2. Money
3. Logistics
4. Reality
Data Transformations
One advantage of using parametric statistics is that
it makes it much easier to describe your data
If you have established that it follows a normal
distribution you can be sure that a particular set of
measurements can be properly described by its
mean and standard deviation
If your data are not normally distributed you
cannot use any of the tests that assume
normality (e.g. ANOVA, t-test, regression analysis)
Data Transformations
If your data are not normally distributed it
is often possible to normalize it by
transforming it
Transforming data to allow you to use
parametric statistics is completely legitimate
Data Transformations
People often feel uncomfortable when they
transform data because it seems to artificially
improve their results, but this is only because they
feel more comfortable with linear or arithmetic scales
However, there is no reason not to use other
scales (e.g. logarithms, square roots, reciprocals
or angles) where appropriate (See Chapter 13)
Data Transformations
Different transformations work for different data
types:
Logarithms : Growth rates are often
exponential and log transforms will often
normalize them. Log transforms are particularly
appropriate if the variance increases with the
mean.
Reciprocal : If a log transform does not
normalize your data you could try a reciprocal
(1/x) transformation. This is often used for
enzyme reaction rate data.
Data Transformations
Square root : This transform is often of value
when the data are counts, e.g. # urchins, # Honu.
Carrying out a square root transform will make
data with a Poisson distribution approximately
normal.
Arcsine : This transformation is also known as
the angular transformation and is especially
useful for percentages and proportions
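The four transformations above can be sketched in Python; the data values are made up for illustration.

```python
# Sketch of the four transformations from the lecture, applied with
# NumPy. All data values are made up for illustration.
import numpy as np

counts = np.array([1.0, 4.0, 9.0, 16.0])    # e.g. # urchins, # Honu
rates = np.array([0.5, 2.0, 8.0, 32.0])     # e.g. growth rates
props = np.array([0.10, 0.25, 0.50, 0.90])  # proportions (0-1)

log_t = np.log10(rates)        # log: exponential growth rates
recip_t = 1.0 / rates          # reciprocal: enzyme reaction rates
sqrt_t = np.sqrt(counts)       # square root: count (Poisson) data
arcsine_t = np.arcsin(np.sqrt(props))  # arcsine (angular): proportions

print(log_t, recip_t, sqrt_t, arcsine_t)
```

The arcsine transform is conventionally applied to the square root of the proportion, as shown.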
Which Transformation?
Johnson Transformation is useful when the
collected data are non-normal, but you want to
apply a methodology that requires a normal
distribution
It is a MINITAB program – not a TEST!
Which Transformation?
Johnson Transformation should be used as a
first step before you transform data “by hand”
Why?
1) It's quick and easy (point and click)
2) It runs a variety of very complex data
transformation functions
3) However, it only runs LOG- and ARCSINE-based
equations
How To?
STAT – Quality Tools – Johnson
Transformation
Enter what variable to be transformed, and
what the “new” transformed variable will be
called
Places transformed data into a new column
in your MINITAB datasheet – can copy this
into Excel and save FOREVER…
Johnson Transformation
How do I know if it worked?
If the Johnson transformation program is
successful it will:
1) Transform data and provide info on what
transformation it ran (formula)
2) Run normality test to verify
3) Provide you with transformed data (if you ask
for it)
4) Output has 3 graphs
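The "run a normality test to verify" step can be sketched outside MINITAB too. MINITAB uses an Anderson-Darling test; the Shapiro-Wilk test stands in below only because SciPy returns its p-value directly. The lognormal data are made up for illustration.

```python
# Sketch of verifying a transformation with a normality test.
# MINITAB uses Anderson-Darling; Shapiro-Wilk stands in here because
# scipy returns a p-value directly. Data are made-up lognormal values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=100)  # skewed, non-normal

p_raw = stats.shapiro(raw).pvalue          # small p: fails normality
p_log = stats.shapiro(np.log(raw)).pvalue  # log transform normalizes it

print(f"raw p = {p_raw:.4f}, log-transformed p = {p_log:.4f}")
```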
Johnson Transformation
[Output: Johnson Transformation for Males – 3 graphs]
- Probability Plot for Original Data: N = 31, AD = 0.876, P-Value = 0.022
  (P-Value = 0.005 means <= 0.005)
- Select a Transformation: P-Value for AD test vs. Z value;
  Z for Best Fit: 0.96; P-Value for Best Fit: 0.915249; Ref P marked
- Probability Plot for Transformed Data: N = 31, AD = 0.176, P-Value = 0.915
Best Transformation Type: SL
Transformation function equals -10.8734 + 2.58295 * Log(X + 5.34335)
Did a LOG Transformation
Johnson Transformation
How do I know if it worked?
If the Johnson transformation program is NOT
successful it will:
1) Tell you it failed to find a data transformation
that passed the normality test
2) Output has 2 graphs
Johnson Transformation
2 Graphs
Did not transform data
Then What?
If the Johnson Transformation program was
unsuccessful at transforming your data to
meet parametric assumptions, then run data
transformations “by hand”
There are several, I am teaching you 4:
1) Log
2) Reciprocal
3) Square Root
4) Arcsine
Then What?
Calculate these in your working Excel file
1) Make new column headers
Then What?
Calculate these in your working Excel file
2) Insert Function in cell below header
Then What?
Calculate these in your working Excel file
3) Enter cell number for first datapoint
Then What?
Calculate these in your working Excel file
4) Copy cell and paste/fill down
Then What?
Calculate these in your working Excel file
5) Wash, Rinse, Repeat…for other 3
variables
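The same five Excel steps can be mirrored in pandas, where the paste/fill-down happens automatically because column operations are vectorized. Column names and data are illustrative, not from the lecture's dataset.

```python
# Sketch of the Excel workflow in pandas: new column headers, a
# formula per column, and "fill down" is automatic (vectorized).
# Column names and data values are illustrative.
import numpy as np
import pandas as pd

df = pd.DataFrame({"Urchins": [1.0, 4.0, 9.0, 25.0]})

df["LOG_Urchins"] = np.log10(df["Urchins"])   # new header + formula
df["RECIP_Urchins"] = 1.0 / df["Urchins"]
df["SQRT_Urchins"] = np.sqrt(df["Urchins"])
# Arcsine expects proportions, so scale to 0-1 first (illustrative)
df["ASIN_Urchins"] = np.arcsin(np.sqrt(df["Urchins"] / df["Urchins"].max()))

print(df)
```

The finished DataFrame can be written back out with `df.to_excel(...)` to keep the transformed columns alongside the originals.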