Richard Baldy, College of Agriculture
One-Click Anovas – Analysis of Variance with Microsoft Excel
Overview of Presentation
• Demonstration of “One-Click ANOVA” with an
experiment set out in a Latin-square design.
• Goal of project: develop data analysis software that agriculture undergraduates would use in college and in their post-college careers
• Why the goal?
• Software creation
Demonstration of “One-Click Anova”
Latin square randomization
Latin-square design with 1 experimental factor
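The randomization step can be sketched in Python. This is one common textbook procedure and an assumption here, not necessarily what the spreadsheet does. It starts from a cyclic square, then randomly shuffles the treatment labels, the rows, and the columns. The treatments A–D and the function name `random_latin_square` are hypothetical.

```python
import random

def random_latin_square(treatments, seed=None):
    """Return a randomized Latin square as a list of rows (one list per row)."""
    rng = random.Random(seed)
    n = len(treatments)
    labels = list(treatments)
    rng.shuffle(labels)                    # random treatment-to-symbol assignment
    # Cyclic starting square: row r is the label list shifted by r positions
    square = [[labels[(r + c) % n] for c in range(n)] for r in range(n)]
    rng.shuffle(square)                    # random row order
    cols = list(range(n))
    rng.shuffle(cols)                      # random column order
    return [[row[c] for c in cols] for row in square]

# Hypothetical treatments A-D; rows and columns are the two blocking factors
for row in random_latin_square(["A", "B", "C", "D"], seed=1):
    print(" ".join(row))
```

Shuffling whole rows and whole columns keeps each treatment exactly once per row and once per column. For squares of order 4 and above, this does not sample uniformly from all Latin squares. A fuller randomization first picks a standard square at random from published tables.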
Background for Developing Data Analysis Software That Agriculture Undergraduates Would Use in College and After Graduation
• Snapshot of California’s agriculture
• Industry input into curriculum development
• Our students
Features of the Industry
America’s Top Agricultural States
State        Rank   Value ($ in billions)
California   1      $26.80
Texas        2      $15.90
Sophisticated producers of 350 different crops and commodities in the nation’s most populous state
Features of the Industry (Continued)
• The public demands fewer chemical inputs
• Yet the public expects the industry to do the research
– The industry is economically powerful and has the money
– The public cannot support research on 350 commodities
– Research needs a local ecosystem focus to advance the goal of ecologically sound agricultural production
• Much on-farm research yields results once/yr
• It is easy to forget involved data analysis procedures between analyses
Characteristics of our students
• Most will not attend graduate school
• They will enter industry; some will teach high school agriculture
Industry Input
• Continue to give hands-on experience
• Integrate curriculum around ecosystem concepts
• Teach problem solving:
– Working in emotionally “charged” settings
– Leadership
– Experimentation, including data analysis
Experimentation, Data Analysis Course
“Dick, you are a plant physiologist, develop
and teach this course.”
Expectations:
• Undergraduates experiment and analyze data
• They report research via scientific papers, posters, web pages, and seminars
• And they continue as farmer-researchers, consultant-researchers, and high school teacher-researchers
Faculty Wish List
• Completely randomized design
• Randomized complete block design
• Latin-square design
• Split-plot design
• Ancova
• Simple and multiple regression
• Analysis of count data
Why Excel?
• Students are familiar with Excel, so we teach data analysis, not a program – saves time
• Students’ computers come with Excel – saves $
• Graduates use Excel for other purposes, not just data analysis
• Templates for specific experimental designs are friendlier than general-purpose programs, e.g., SYSTAT
• Excel is OK for regression
Why Not SYSTAT?
• Expensive. The cheaper student version lacks required capabilities, such as split-plot analysis
• My experience: used it daily for weeks, left it for 3 months, and had to relearn it
– Number codes for treatments are confusing
– For split-plot designs, the Anova has to be recalculated
– Remember: ag research data are analyzed once/yr
Excel’s Limits as an Anova Platform
• “Out of the box” Anova procedures handle few designs and do not handle missing data
• No mean separation tests
• No orthogonal contrasts
• No automatic charting of treatment means
Overcoming Excel’s Limitations
Textbook formulas
• Such formulas are not for a world where experimental cows die and become missing data
• They give no residuals for testing normality and equal variance
Instead, Use a Method That Gives Residuals
For missing data, use iteration to find values that give a residual total of zero
The Right Model
• Example: randomized block design, single experimental factor
• Need to solve for three sums of squares:
– Block
– Treatment
– Error
Model for Error Term
• Estimate = treatment mean + block mean – grand mean
• Subtract the estimate from the datum to obtain the residual
• Square the residuals
• Sum of squared residuals = Error SS
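The error-term model above can be sketched in Python for complete data, with no missing cells. The data values, the treatment and block labels, and the function name `error_ss` are hypothetical. A dict keyed by (treatment, block) stands in for the worksheet rows.

```python
from statistics import fmean

def error_ss(data):
    """Error SS for a randomized complete block design with no missing cells.

    `data` maps (treatment, block) -> response, one observation per cell.
    """
    treatments = {t for t, _ in data}
    blocks = {b for _, b in data}
    t_mean = {t: fmean(y for (ti, _), y in data.items() if ti == t) for t in treatments}
    b_mean = {b: fmean(y for (_, bi), y in data.items() if bi == b) for b in blocks}
    grand = fmean(data.values())
    total = 0.0
    for (t, b), y in data.items():
        estimate = t_mean[t] + b_mean[b] - grand   # treatment mean + block mean - grand mean
        residual = y - estimate                    # datum - estimate
        total += residual ** 2                     # accumulate squared residuals
    return total

# Hypothetical data: treatments A-C in blocks I-III
data = {
    ("A", "I"): 10, ("A", "II"): 12, ("A", "III"): 14,
    ("B", "I"): 13, ("B", "II"): 15, ("B", "III"): 20,
    ("C", "I"): 7,  ("C", "II"): 9,  ("C", "III"): 8,
}
print(error_ss(data))  # 12.0
```

For these numbers the textbook partition gives Total SS 132 = Treatment SS 96 + Block SS 24 + Error SS 12, so the residual method agrees.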
Model for Block Term
• Datum – treatment mean = residual
• Square and sum the residuals = Error SS confounded with Block SS
Diagram: the sum of squares is partitioned into Treatment SS and Error SS confounded with Block SS. From the confounded sum of squares, subtract the error term’s sum of squares; what remains is Block SS.
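Here is a sketch of the block-term model under the same assumptions: complete hypothetical data, and illustrative function names `level_means` and `block_ss`. The SS of datum – treatment mean contains Block SS plus Error SS, so subtracting the error-term SS leaves Block SS.

```python
from statistics import fmean

def level_means(data, pos):
    """Mean response per level of the factor at key position pos (0 = treatment, 1 = block)."""
    levels = {key[pos] for key in data}
    return {lv: fmean(y for key, y in data.items() if key[pos] == lv) for lv in levels}

def block_ss(data):
    """Block SS = SS(datum - treatment mean) - Error SS, for complete data."""
    t_mean, b_mean = level_means(data, 0), level_means(data, 1)
    grand = fmean(data.values())
    # Datum - treatment mean, squared and summed: Error SS confounded with Block SS
    confounded = sum((y - t_mean[t]) ** 2 for (t, _), y in data.items())
    # Error SS from the error-term model (treatment + block - grand mean)
    error = sum((y - (t_mean[t] + b_mean[b] - grand)) ** 2 for (t, b), y in data.items())
    return confounded - error

# Hypothetical data: treatments A-C in blocks I-III
data = {
    ("A", "I"): 10, ("A", "II"): 12, ("A", "III"): 14,
    ("B", "I"): 13, ("B", "II"): 15, ("B", "III"): 20,
    ("C", "I"): 7,  ("C", "II"): 9,  ("C", "III"): 8,
}
print(block_ss(data))  # 24.0
```

For complete data this matches the textbook formula, number of treatments × Σ(block mean – grand mean)² = 3 × (4 + 0 + 4) = 24.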
Model for Treatment Term
• Datum – block mean = residual
• Square and sum the residuals = Error SS confounded with Treatment SS
Diagram: the sum of squares is partitioned into Block SS and Error SS confounded with Treatment SS. From the confounded sum of squares, subtract the error term’s sum of squares; what remains is Treatment SS.
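The three models can be written in symbols for complete data. The notation is assumed here, not taken from the slides: $y_{ij}$ is the response of treatment $i$ in block $j$, with $t$ treatments and $b$ blocks. $\bar y_{i\cdot}$, $\bar y_{\cdot j}$, and $\bar y_{\cdot\cdot}$ are the treatment, block, and grand means.

```latex
\begin{aligned}
SS_{\mathrm{Error}} &= \sum_{i=1}^{t}\sum_{j=1}^{b}\left(y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot}\right)^2 \\
SS_{\mathrm{Block}} &= \sum_{i=1}^{t}\sum_{j=1}^{b}\left(y_{ij} - \bar y_{i\cdot}\right)^2 - SS_{\mathrm{Error}}
  = t\sum_{j=1}^{b}\left(\bar y_{\cdot j} - \bar y_{\cdot\cdot}\right)^2 \\
SS_{\mathrm{Treatment}} &= \sum_{i=1}^{t}\sum_{j=1}^{b}\left(y_{ij} - \bar y_{\cdot j}\right)^2 - SS_{\mathrm{Error}}
  = b\sum_{i=1}^{t}\left(\bar y_{i\cdot} - \bar y_{\cdot\cdot}\right)^2
\end{aligned}
```

The second equality on each line follows from splitting the residual into two parts. For example, $y_{ij} - \bar y_{\cdot j} = (\bar y_{i\cdot} - \bar y_{\cdot\cdot}) + (y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot})$, and the cross-product of the two parts sums to zero over all cells.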
Finding Correct Substitute Data
• The pivot table gives the means to be used in estimates
• Set the substitute datum to some value, e.g., 0
• Substitute datum – estimate = residual
• New substitute datum = former substitute datum – residual
• Refresh the pivot table to obtain new means for the estimates
Circular argument: careful control of calculation order is needed to avoid crashes
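The loop above can be sketched in Python. The sketch assumes the error-term estimate (treatment mean + block mean – grand mean) and uses the Bonus/Express/Stander responses from the example that follows, with the Bonus/North datum missing. Recomputing the means on every pass stands in for refreshing the pivot table. `estimate` and `fill_missing` are hypothetical names. As a cross-check that is not part of the slides, the result is compared with the classical closed-form missing-plot formula for a randomized complete block design (Yates).

```python
from statistics import fmean

def estimate(data, t, b):
    """Error-term estimate: treatment mean + block mean - grand mean."""
    t_mean = fmean(y for (ti, _), y in data.items() if ti == t)
    b_mean = fmean(y for (_, bi), y in data.items() if bi == b)
    return t_mean + b_mean - fmean(data.values())

def fill_missing(observed, cell, start=0.0, tol=1e-10, max_iter=1000):
    """Iterate a substitute datum for `cell` until its residual is ~0."""
    data = dict(observed)
    data[cell] = start                                  # e.g., start at 0
    for _ in range(max_iter):                           # bounded loop guards the circular update
        residual = data[cell] - estimate(data, *cell)   # means recomputed each pass
        data[cell] -= residual                          # new = former - residual
        if abs(residual) < tol:
            break
    return data[cell]

observed = {
    ("Express", "North"): 112, ("Stander", "North"): 80,
    ("Bonus", "South"): 78, ("Express", "South"): 98, ("Stander", "South"): 67,
    ("Bonus", "East"): 54, ("Express", "East"): 66, ("Stander", "East"): 48,
}
x = fill_missing(observed, ("Bonus", "North"))

# Cross-check (not from the slides): Yates' missing-plot formula for an RCBD,
# x = (t*T + b*B - G) / ((t - 1) * (b - 1)), using observed totals only.
t, b = 3, 3           # treatments, blocks
T = 78 + 54           # observed Bonus total
B = 112 + 80          # observed North total
G = sum(observed.values())
print(round(x, 4), (t * T + b * B - G) / ((t - 1) * (b - 1)))  # 92.25 92.25
```

Each pass moves the substitute 4/9 of the way to the fixed point, so the loop converges in a few dozen iterations.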
Treatment   Block   Response
Bonus       North   .
Express     North   112
Stander     North   80
Bonus       South   78
Express     South   98
Stander     South   67
Bonus       East    54
Express     East    66
Stander     East    48
(“.” marks the missing datum.)
Let’s see how a pivot table summarizes these data.
Average of Response   Treatment
Block                 Bonus   Express   Stander   Grand Total
East                  54      66        48        56
North                 0       112       80        64
South                 78      98        67        81
Grand Total           44      92        65        67
The body cells of this pivot table are just the data and are not used in computing estimates. If this were a Factor A x Factor B summary, these numbers would also be means and would be used in estimates.
Residual = Response – Estimate. Thus, 0.0 – 44 = -44
The estimate is looked up in the pivot table; then the residuals are calculated.
Treatment   Block   Response   Estimate   Residual
Bonus       North   0.0        44.0       -44.0
Express     North   112.0      92.0       20.0
Stander     North   80.0       65.0       15.0
Bonus       South   78.0       44.0       34.0
Express     South   98.0       92.0       6.0
Stander     South   67.0       65.0       2.0
Bonus       East    54.0       44.0       10.0
Express     East    66.0       92.0       -26.0
Stander     East    48.0       65.0       -17.0
New response = Response – Residual. Thus, new response = 0 – (-44) = 44
Treatment   Block   Response   Estimate   Residual
Bonus       North   44.0       44.0       0.0
Express     North   112.0      92.0       20.0
Refresh the pivot table:
Average of Response   Treatment
Block                 Bonus
East                  54
North                 44
South                 78
Grand Total           58.7
Calculate the estimates and residuals again:
Treatment   Block   Response   Estimate   Residual
Bonus       North   44.0       58.7       -14.7
Express     North   112.0      92.0       20.0
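After further iterations:
Sum of residuals = -2.842170943040400E-14 (zero within floating-point rounding)
Sum of squared residuals = 1918
Previous sum of squared residuals = 1918
When the sum of squared residuals matches the previous value, the substitute datum has converged.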
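The demonstration can be reproduced in a short Python sketch. It assumes that the estimate is the treatment mean, as in the tables above, and that iteration stops when the sum of squared residuals stops changing. The 1e-9 tolerance and the 200-pass limit are arbitrary choices.

```python
from statistics import fmean

# Example data from the slides; the Bonus/North datum is missing and starts at 0.
data = {
    ("Bonus", "North"): 0.0, ("Express", "North"): 112, ("Stander", "North"): 80,
    ("Bonus", "South"): 78, ("Express", "South"): 98, ("Stander", "South"): 67,
    ("Bonus", "East"): 54, ("Express", "East"): 66, ("Stander", "East"): 48,
}
cell = ("Bonus", "North")

def treatment_mean(data, t):
    """Grand Total row of the pivot table: mean response for treatment t."""
    return fmean(y for (ti, _), y in data.items() if ti == t)

previous_ss = None
for _ in range(200):
    # Estimate = treatment mean; residual = response - estimate
    residuals = {k: y - treatment_mean(data, k[0]) for k, y in data.items()}
    ss = sum(r * r for r in residuals.values())
    if previous_ss is not None and abs(ss - previous_ss) < 1e-9:
        break                              # sum of squares unchanged: stop
    previous_ss = ss
    data[cell] -= residuals[cell]          # new response = response - residual

print(round(data[cell], 2), round(ss, 2))  # 66.0 1918.0
```

The first two passes give 44 and then 58.7, as in the tables, and the substitute settles at 66. That substitute's residual ends at zero, so 1918 is the residual SS from the treatment means of the eight observed values alone.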
How to handle different-sized data sets
• Have the pivot table summarize 65,000+ rows
– Simple to program
– Takes 5–10 minutes per analysis
• Rodney’s autofill code + ASSUME page tip on the Offset function: cut run times 70–90%
• Neville’s time-trimming suggestion:
– Refreshing pivot tables for each iteration = 1–2 minutes/analysis
– Use the DAVERAGE function
Count of Response   Treatment
Block               Bonus   Express   Stander   Grand Total
East                1       1         1         3
North               0       1         1         2
South               1       1         1         3
Grand Total         2       3         3         8
The Bonus, Express, and Stander means and the East, North, and South block means are each computed with a formula of the form
DAVERAGE(DataRange, ResponseColumn, INDIRECT(HC1))
where the third argument supplies the criteria for selecting the responses to average.
Treatment   Block   Response   Estimate   Residual
Bonus       North   44.0       44.0       0.0
Express     North   112.0      92.0       20.0
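The idea behind the DAVERAGE suggestion can be sketched in Python. Each mean is computed directly from the data rows with a criteria filter, instead of refreshing a whole pivot table on every iteration. `daverage` is a hypothetical stand-in for Excel's DAVERAGE(database, field, criteria). Here a dict of exact-match criteria replaces the worksheet criteria range that the slides supply through INDIRECT(HC1).

```python
from statistics import fmean

def daverage(rows, field, criteria):
    """Mean of `field` over rows matching every criteria entry exactly."""
    return fmean(row[field] for row in rows
                 if all(row[col] == val for col, val in criteria.items()))

rows = [
    {"Treatment": "Bonus", "Block": "North", "Response": 0},   # substitute datum
    {"Treatment": "Express", "Block": "North", "Response": 112},
    {"Treatment": "Stander", "Block": "North", "Response": 80},
    {"Treatment": "Bonus", "Block": "South", "Response": 78},
    {"Treatment": "Express", "Block": "South", "Response": 98},
    {"Treatment": "Stander", "Block": "South", "Response": 67},
    {"Treatment": "Bonus", "Block": "East", "Response": 54},
    {"Treatment": "Express", "Block": "East", "Response": 66},
    {"Treatment": "Stander", "Block": "East", "Response": 48},
]
print(daverage(rows, "Response", {"Treatment": "Bonus"}))  # 44.0
print(daverage(rows, "Response", {"Block": "North"}))      # 64.0
```

With the substitute datum at 0, these values match the pivot table's Bonus mean (44) and North block mean (64).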
Further Topics
• Experiment planning
– Formulate additional, orthogonal hypotheses
– Estimate number of replicates
• Additional work
Experiment planning: Formulate additional
hypotheses. Test with single degree of freedom
orthogonal contrasts
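A single-degree-of-freedom contrast can be sketched with the standard formula SS = (Σ cᵢTᵢ)² / (r Σ cᵢ²). Here Tᵢ are treatment totals over r replicates. Two contrasts are orthogonal when Σ c₁ᵢc₂ᵢ = 0. The treatment totals, the contrasts, and the function names are hypothetical and not from the slides.

```python
def contrast_ss(totals, coeffs, reps):
    """Sum of squares for one single-df contrast of treatment totals."""
    if sum(coeffs) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    value = sum(c * total for c, total in zip(coeffs, totals))
    return value ** 2 / (reps * sum(c * c for c in coeffs))

def orthogonal(c1, c2):
    """Two contrasts are orthogonal when their coefficient products sum to zero."""
    return sum(a * b for a, b in zip(c1, c2)) == 0

# Hypothetical totals for treatments A, B, C, each in 3 blocks
totals = [36, 48, 24]
a_vs_b = [1, -1, 0]     # A vs. B
ab_vs_c = [1, 1, -2]    # mean of A and B vs. C
print(orthogonal(a_vs_b, ab_vs_c))            # True
print(contrast_ss(totals, a_vs_b, reps=3))    # 24.0
print(contrast_ss(totals, ab_vs_c, reps=3))   # 72.0
```

With equal replication, t – 1 mutually orthogonal contrasts partition the treatment SS. Here 24 + 72 = 96, the treatment SS for these totals.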
An Example of Estimating the Number of Replicates, for a Randomized Complete Block Design
# reps: RBD, 1 experimental factor
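The slides do not state the method used. One common approach, shown here only as an assumed sketch, sizes an experiment to detect a difference δ between two treatment means given an error standard deviation σ. The normal approximation is r ≈ 2(z₁₋α/₂ + z_power)²σ²/δ². For an RCBD, σ could be taken as the square root of the error mean square from an earlier trial. The normal approximation understates r when r is small. Iterating with t quantiles on the resulting error degrees of freedom refines it. `reps_needed` is a hypothetical name.

```python
import math
from statistics import NormalDist

def reps_needed(sigma, delta, alpha=0.05, power=0.80):
    """Approximate replicates per treatment to detect a mean difference `delta`.

    Normal approximation: r = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_power = z.inv_cdf(power)           # desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)

print(reps_needed(sigma=10, delta=15))  # 7
```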
Additional Work
• Simplify mean separation tests for factorial designs
• Example using split-plot design
Split-plot design with blocks
Additional Work (Continued)
• Replace the trial-and-error method of developing models
• Open development to others
• Share programs