
Two-way ANOVA Lesson: Analysis of Variance

College of Arts and Sciences
Two-way Analysis of Variance (ANOVA)
Lesson 4
I. INTRODUCTION
In the previous section, you studied how to compare three or more means and apply the post hoc test.
A one-way ANOVA uses just one independent variable, for example when comparing how much patients'
cholesterol improves under four different types of diet. The independent variable (IV) is the type of
diet and the dependent variable (DV) is the cholesterol level. Adding another variable, such as two
types of medication, gives two independent variables: the type of diet and the type of medication.
Comparing means across two independent variables is called two-way ANOVA. This lesson also shows how
to interpret the findings from the statistical results produced by the Minitab software (Minitab 17)
and how to draw contextualized conclusions.
II. OBJECTIVES
At the end of this lesson, you are expected to:
a. differentiate one-way and two-way ANOVA;
b. apply two-way ANOVA appropriately using statistical software; and
c. write findings and conclusions based on the statistical results of two-way ANOVA and post hoc tests.
III. LESSON PROPER
There are several types of Analysis of Variance (ANOVA), such as one-way ANOVA and two-way
ANOVA. They differ in the number of independent variables they involve. The most commonly used
are one-way and two-way ANOVA.
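In model terms, the two designs differ by the second factor and the interaction term. The notation below is the usual textbook effects model rather than anything defined in this lesson: the symbols μ (overall mean), α and β (effects of the first and second IV), (αβ) (interaction) and ε (random error) are introduced here only for illustration.

```latex
% One-way ANOVA: one IV with levels i = 1, ..., a
y_{ij} = \mu + \alpha_i + \varepsilon_{ij}

% Two-way ANOVA: two IVs with levels i = 1, ..., a and j = 1, ..., b
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
```

Each F-test in a two-way ANOVA table corresponds to one of the three added terms: the first IV, the second IV, and their interaction.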
Several assumptions must be satisfied before using two-way ANOVA.
Assumptions of the F-test according to Bluman (2009):
1. The populations from which the samples were obtained must be normally or
approximately normally distributed.
2. The samples must be independent of one another.
3. The variances of the populations must be equal.
4. The dependent variable must be interval data.
5. The groups must be equal in sample size.
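Assumption 5 can be checked before running the analysis by counting the observations in each combination of IV levels. The snippet below is a minimal sketch in Python (the lesson itself uses Minitab); the data and the helper names `cell_counts` and `is_balanced` are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical (sex, hours of sleep) labels, one pair per subject.
# These values are made up for illustration; they are not the lesson's data.
subjects = [
    (1, 6), (1, 6), (1, 7), (1, 7), (1, 8), (1, 8),
    (2, 6), (2, 6), (2, 7), (2, 7), (2, 8), (2, 8),
]

def cell_counts(labels):
    """Count how many observations fall in each combination of IV levels."""
    return Counter(labels)

def is_balanced(labels):
    """Return True when every observed cell holds the same number of observations.

    Only cells that appear in the data are compared; a combination with no
    observations at all is not detected by this check.
    """
    return len(set(cell_counts(labels).values())) == 1

counts = cell_counts(subjects)
print(counts[(1, 6)])         # 2
print(is_balanced(subjects))  # True
```

The other assumptions (normality, equal variances) need dedicated tests, which this sketch does not attempt.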
What is the difference between one-way ANOVA and two-way ANOVA?
A one-way ANOVA has just one independent variable (IV), whereas a two-way ANOVA has two IVs.
For instance, a researcher wants to know how the length of a patient's hospital stay is affected by
different types of post-operative treatment (treatment 1, treatment 2, treatment 3). The type of
post-operative treatment is the IV and the length of hospital stay is the DV. However, if there are two
Anton A. Romero
independent variables, two-way ANOVA is the appropriate statistical test to use. For instance, in
addition to IV₁ (post-operative treatment), there may be a second variable, IV₂, the form of
medication taken (Medicine 1, Medicine 2).
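To make the two-IV case concrete, the sketch below works through the sums of squares of a balanced two-way ANOVA by hand in Python. It is not the Minitab procedure used in this lesson, and the lengths of stay are made-up numbers for a hypothetical 2 x 2 layout (treatments T1 and T2, medicines M1 and M2, two patients per cell). The function name `two_way_anova` is likewise an assumption.

```python
from statistics import mean

# Hypothetical lengths of stay (days); made-up numbers for illustration only.
# Keys are (post-operative treatment, medicine); each cell has 2 patients.
stay = {
    ("T1", "M1"): [4, 6],
    ("T1", "M2"): [6, 8],
    ("T2", "M1"): [8, 10],
    ("T2", "M2"): [14, 16],
}

def two_way_anova(cells):
    """Sums of squares, df and F for a balanced two-way layout with replication."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # replicates per cell
    assert all(len(v) == n for v in cells.values()), "design must be balanced"

    grand = mean(y for v in cells.values() for y in v)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    ss = {
        "A": len(b_levels) * n * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": len(a_levels) * n * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "A*B": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a in a_levels for b in b_levels
        ),
        "Error": sum((y - cell_mean[k]) ** 2 for k, v in cells.items() for y in v),
    }
    df = {
        "A": len(a_levels) - 1,
        "B": len(b_levels) - 1,
        "A*B": (len(a_levels) - 1) * (len(b_levels) - 1),
        "Error": len(a_levels) * len(b_levels) * (n - 1),
    }
    ms_error = ss["Error"] / df["Error"]
    # Each F-value is the term's mean square divided by the error mean square.
    f = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "A*B")}
    return ss, df, f

ss, df, f = two_way_anova(stay)
print(ss)
print(f)
```

For this made-up data the sums of squares are 72 (treatment), 32 (medicine), 8 (interaction) and 8 (error), giving F-values of 36, 16 and 4. Statistical software such as Minitab then converts each F-value into the p-value that is compared with α.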
Example 1. Investigate whether hours of sleep and sex affect health score. State your findings and
conclusions. Assume that all the assumptions are satisfied. Use α = .05.
Solutions:
In the given problem, there are two IVs (sex and hours of sleep) and one DV (health score).
Step 1: Input the data for sleep, sex and health score in separate columns. The codes 1 and 2 stand for
male and female, respectively.
Step 2: Click General Linear Model, click Model, highlight the IVs, click Add, then click OK twice.
Step 3: Draw findings from the ANOVA table.
Since the p-value for sleep (0.003) is less than .05, there is a significant difference in
health score among the different sleep durations.
Since the p-value for sex (0.002) is less than .05, there is a significant difference in
health score between males and females.
Since the p-value for the interaction (sleep*sex) (.012) is less than .05, there is a significant
interaction between sleep and sex on health score.
Note: A post hoc test must be run on the variables with significant results
(sleep, sex, interaction). However, the post hoc test for the interaction is not covered in this
discussion.
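The decision rule applied in Step 3 can be written out as a short Python sketch. The p-values are the ones reported for Example 1 (0.002 is the value the lesson attaches to the male/female comparison); the function name `is_significant` is hypothetical.

```python
ALPHA = 0.05

# P-values reported for Example 1 in this lesson.
p_values = {"sleep": 0.003, "sex": 0.002, "sleep*sex": 0.012}

def is_significant(p, alpha=ALPHA):
    """Decision rule used in the lesson: significant when p < alpha."""
    return p < alpha

# Terms that call for a follow-up (post hoc) test under the lesson's note.
follow_up = [term for term, p in p_values.items() if is_significant(p)]
print(follow_up)  # ['sleep', 'sex', 'sleep*sex']
```

All three terms pass the rule here, which is why the note lists sleep, sex and the interaction.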
Step 4: Run the post hoc test for the variables with significant results. Click Comparisons, select Tukey,
double-click sleep and sex, click Results, select Tests and confidence intervals, then click OK.
Step 5: Draw findings from the post hoc results.
In the first column of the post hoc result, the difference of means for 7 – 6 is 5.50
(positive), which shows that the mean for 7 hours of sleep is higher than the mean for 6
hours of sleep.
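The sign rule for reading a difference of means can be sketched as a small Python helper. The label format "X - Y" (read as mean of X minus mean of Y) and the function name `higher_level` are assumptions made for this illustration; check how your own output labels each row.

```python
def higher_level(label, difference):
    """Return the level with the larger mean for a row labelled 'X - Y'.

    The difference is read as mean(X) - mean(Y), as in the lesson's example.
    Returns None when the difference is exactly zero.
    """
    first, second = [part.strip() for part in label.split(" - ")]
    if difference > 0:
        return first
    if difference < 0:
        return second
    return None

print(higher_level("7 - 6", 5.50))  # 7
```

The sign only tells which mean is higher; whether the difference is significant still depends on the p-value for that pair.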
The findings:
Since the p-value (.009) is less than .05, there is a significant difference between the
health scores of people who sleep 6 and 7 hours.
Since the p-value (.004) is less than .05, there is a significant difference between the
health scores of people who sleep 8 and 6 hours.
Since the p-value (.898) is greater than .05, there is no significant difference between
the health scores of people who sleep 8 and 7 hours.
Other findings:
Since the p-value (.002) is less than .05, there is a significant difference between
male and female health scores.
Note: If the means for males and females are available, there is no need to include the
variable sex in the post hoc test. With only two levels, the ANOVA result already shows that the
two groups differ, and the means show which one is higher.
Step 6: Draw your conclusions.
The following are possible conclusions and other perspectives (some are mandatory, others are
alternative conclusions).
(1) Females are healthier than males. Alternatively: males are more prone to sickness than females.
(2) Sleeping 7 or 8 hours is better than sleeping 6 hours. Alternatively: people who sleep at least
7 hours per day are healthier.
(3) Hours of sleep affect the health of both males and females. Specifically, being female is no
assurance of being healthier than males, or vice versa; it also depends on hours of sleep
(e.g., females can be unhealthy because they slept for only 6 hours).
(4) Sex affects the health of people who sleep for 6, 7 and 8 hours. Specifically, sleeping 7 or
8 hours a day is no guarantee of good health; it also depends on sex (e.g., people who sleep 7 or
8 hours a day may still be unhealthy because they are male).
Note: There are at least four mandatory conclusions in this problem. More specific
findings can be obtained if a post hoc test is done for the interaction.
Example 2. A psychiatrist studied the effects of three antidepressants on subjects in three
different age groups. Each subject was rated on a scale of 0 to 100, with higher numbers indicating
greater relief from depression. Assume that all the assumptions are satisfied. Use α = .05. The following
table presents the results.
Solutions:
In the given problem, there are two IVs (types of drugs and age group) and one DV (relief of
depression).
Step 1: Input the data for types of drugs, age group and depression relief score in separate
columns.
Step 2: Click General Linear Model, transfer the DV (depression relief) to Responses and the IVs
(types of drugs and age group) to Factors, click Model, highlight the IVs, click Add, then click OK twice.
Step 3: Draw findings from the ANOVA table.
Analysis of Variance
Source                     DF    Adj SS    Adj MS   F-Value  P-Value
Types of Drugs              2   5386.50   2693.25   2879.91    0.000
Age Group                   2   1362.67    681.33    728.55    0.000
Types of Drugs*Age Group    4      2.33      0.58      0.62    0.650
Error                      27     25.25      0.94
Total                      35   6776.75
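The columns of the ANOVA table are linked by simple arithmetic: each Adj MS is Adj SS divided by DF, and each F-value is the term's Adj MS divided by the Error Adj MS. The Python sketch below recomputes these from the DF and Adj SS values in the table as a consistency check; the variable and function names are hypothetical, and the p-values are not recomputed because that requires the F distribution.

```python
# DF and Adj SS copied from the Example 2 ANOVA table.
table = {
    "Types of Drugs":           (2, 5386.50),
    "Age Group":                (2, 1362.67),
    "Types of Drugs*Age Group": (4, 2.33),
    "Error":                    (27, 25.25),
}

def mean_square(df, ss):
    """Adj MS = Adj SS / DF."""
    return ss / df

ms_error = mean_square(*table["Error"])

for source, (df, ss) in table.items():
    ms = mean_square(df, ss)
    if source == "Error":
        print(f"{source:26s} MS = {ms:8.2f}")
    else:
        # F-value = term mean square / error mean square
        print(f"{source:26s} MS = {ms:8.2f}  F = {ms / ms_error:8.2f}")

# The Total row is the sum of the rows above it.
total_df = sum(df for df, _ in table.values())
total_ss = sum(ss for _, ss in table.values())
print(total_df, round(total_ss, 2))
```

The recomputed values agree with the table to within rounding, since the Adj SS entries are themselves rounded to two decimals.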
Since the p-value for types of drugs (0.000) is less than .05, there is a significant
difference in depression relief scores among the three antidepressant drugs.
Since the p-value for age group (0.000) is less than .05, there is a significant
difference in depression relief scores among the age groups.
Since the p-value for interaction (types of drugs*age group) (.650) is greater than .05,
there is no significant interaction between types of drugs and age group on scores on
depression relief.
Note: A post hoc test must be run on the variables with significant results
(types of drugs and age group).
Step 4: Run the post hoc test for the variables with significant results. Click Comparisons, select Tukey,
double-click types of drugs and age group, click Results, select Tests and confidence intervals, then click OK.
Step 5: Draw findings from the post hoc results.
In the first column of the post hoc result, the difference of means for drug B and
drug A is 9.00 (positive), which shows that the mean for drug B is higher than the mean for drug A.
The findings:
Since the p-value (.000) is less than .05, there is a significant difference between the
drug B and drug A depression relief scores.
Since the p-value (.000) is less than .05, there is a significant difference between the
drug C and drug A depression relief scores.
Since the p-value (.000) is less than .05, there is a significant difference between the
drug C and drug B depression relief scores.
Other findings:
Since the p-value (.000) is less than .05, there is a significant difference between the
depression relief scores of age groups 31-50 and 18-30.
Since the p-value (.000) is less than .05, there is a significant difference between the
depression relief scores of age groups 51-80 and 18-30.
Since the p-value (.000) is less than .05, there is a significant difference between the
depression relief scores of age groups 51-80 and 31-50.
Step 6: Draw your conclusions.
The following are possible conclusions and other perspectives (some are mandatory, others are
alternative conclusions).
(1) Among the three drugs, drug C is the most effective antidepressant. Alternatively: drug A is
the least effective antidepressant.
(2) Drug B is more effective than drug A in relieving depression.
(3) The antidepressant drugs are most effective in the age group 51-80. Alternatively: the
antidepressant drugs are least effective in the age group 18-30.
(4) The antidepressant drugs are more effective for the age group 31-50 than for the age group
18-30.
(5) The effectiveness of the different antidepressant drugs is the same across the
different age groups.
Note: There are at least five mandatory conclusions in this problem.
IV. ASSESSMENT
A. Pre-class Activity: Answer the following questions and refer to the scoring rubric below:
Scoring Rubric
Criterion: Analyzing and Interpreting Data (Reading between and beyond the data)
1 (Idiosyncratic Reasoning): The findings based on the statistical result are incorrect.
2 (Transitional Reasoning): The findings based on the statistical result are correct, or the
conclusion is based primarily on the data and may be only partially reasonable.
3 (Quantitative Reasoning): Makes a reasonable conclusion based on the findings and the context.
4 (Analytical Reasoning): Makes a reasonable conclusion based on the findings and the context, and
provides an additional reasonable perspective.
1. Researchers have sought to examine the effect of various types of music on agitation levels in
patients who are in the early and middle stages of Alzheimer's disease. Patients were selected
to participate in the study based on their stage of Alzheimer's disease. Three forms of music
were tested: easy listening, Mozart, and piano interludes. While the patients listened to music,
their agitation levels were recorded, with a high score indicating a higher level of agitation.
Assume that all the assumptions are satisfied. Use α = .05. Below are the results of the ANOVA and
the post hoc test.
B. Grouping Activity
Direction: Given the problem and data, compute using the Minitab software. Assume that all the
assumptions for using a parametric test are satisfied. Refer to the scoring rubric below. One item
will be assigned to each group.
SCORING RUBRIC
Criterion: Analyzing and Interpreting Data (Reading between and beyond the data)
1 (Idiosyncratic Reasoning): The findings based on the statistical result are incorrect.
2 (Transitional Reasoning): The findings based on the statistical result are correct, or the
conclusion is based primarily on the data and may be only partially reasonable.
3 (Quantitative Reasoning): Makes a reasonable conclusion based on the findings and the context.
4 (Analytical Reasoning): Makes a reasonable conclusion based on the findings and the context, and
provides an additional reasonable perspective.
Criterion: Statistical Result (descriptors listed from lowest to highest)
Significant errors or deficiencies in the statistical analysis indicate a need for substantial
revision and improvement to meet acceptable standards of accuracy.
While there may be minor areas for improvement, the statistical analysis is generally accurate and
thorough, with few errors or omissions.
The statistical analysis is highly accurate, with complete results.
Criterion: Presentation (descriptors listed from lowest to highest)
Significant problems with conciseness and clarity reduce the presentation's effectiveness.
While there may be some areas for improvement, the presentation generally achieves clarity and
conciseness, effectively conveying key information to the audience.
The presentation is exceptionally clear and concise, engaging the audience with well-structured
content and minimal unnecessary detail.
1. A medical researcher wishes to test the effects of two different diets and two different
exercise programs on the glucose level in a person’s blood. The glucose level is measured
in milligrams per deciliter (mg/dl). Three subjects are randomly assigned to each group.
Assume that all the assumptions are satisfied. Use α = .05.
2. Vermont maple sugar producers sponsored a testing program to determine the benefit of a new
fertilizer. A random sample of 27 maple trees in Vermont was chosen, and each tree was treated with
one of three levels of fertilizer. In this experimental setup, nine trees (three in each of three
climatic zones) were treated with each fertilizer level, and the amount of sap produced (in
milliliters) by the trees was measured. Sap is a body fluid (such as blood) essential to life,
health, or vigor. Assume that all the assumptions are satisfied. Use α = .05. The results are as follows.
3. An agricultural scientist wants to determine how the type of fertilizer and the type of soil
affect the yield of oranges in an orange grove. He has two types of fertilizer and three types
of soil. For each of the six combinations of fertilizer and soil, the scientist plants four
stands of trees and measures the yield of oranges (in tons per acre) from each stand.
Assume that all the assumptions are satisfied. Use α = .05. The data are shown in the following
table.
4. A hospital doctor wished to compare the effectiveness of 4 brands of painkillers A, B, C and D.
She arranged that when patients on a surgical ward requested painkillers they would be asked
if their pain is mild, severe or very severe. The first patient who said mild would be given brand
A, the second who said mild would be given brand B, the third brand C and the fourth brand D.
Painkillers would be allocated in the same way to the first four patients who said their pain was
very severe. The patients were then asked to record the time, in minutes, for which the painkillers
were effective. The following data were collected. Assume that all the assumptions are satisfied.
Use α = .05.
5. A botanist wants to know whether or not plant growth is influenced by sunlight exposure and
watering frequency. She plants 40 seeds and lets them grow for two months under different
conditions for sunlight exposure and watering frequency. After two months, she records the height
of each plant. The results are shown at the right. Assume that all the assumptions are satisfied.
Use α = .05.
V. REFERENCES
Books
Abbott, M. L. (2017). Using Statistics in the Social and Health Sciences with SPSS® and
Excel®. John Wiley & Sons, Inc.
Bluman, A. G. (2009). Elementary Statistics: A Step by Step Approach (Eighth Edition). McGraw-Hill.
Chaudhary, K. (2020). Introduction to Biotechnology and Biostatistics. Delve Publishing.
Ho, R. (2018). Understanding Statistics for the Social Sciences with IBM SPSS. Taylor & Francis
Group, LLC.
Navidi, W. & Monk, B. (2019). Elementary Statistics (Third Edition). McGraw-Hill Education.
Internet Sources and Related Studies
ANOVA Examples. (n.d.).
https://www.people.vcu.edu/~wsstreet/courses/314_20033/Examples.ANOVA.pdf
ANOVA Test Types, Table, Formula, Examples. (2021). Cuemath.
https://www.cuemath.com/anova-formula/
http://eagri.org/eagri50/STAM101/pdf/pract07.pdf
https://www.cimt.org.uk/projects/mepres/alevel/fstats_ch7.pdf
Indoria, A. K., Sharma, K. L., Reddy, K. S., & Rao, C. S. (2017). Role of soil physical properties
in soil health management and crop productivity in rainfed systems-I: Soil physical
constraints and scope. Current Science, 2405-2414.
https://www.kaggle.com/
Mathew, T. K., & Tadi, P. (2020). Blood glucose monitoring.
Utah State University. (2024). What is Iron Chlorosis and What Causes it? | Forestry | Extension.
Usu.edu.
https://extension.usu.edu/forestry/trees-cities-towns/tree-care/causes-ironchlorosis#:~:text=The%20primary%20symptom%20of%20iron,as%20the%20plant%20cells%20die.
Prepared by:
ANTON A. ROMERO
Mathematics Instructor