Richard Baldy
College of Agriculture
One-Click Anovas – Analysis of Variance with Microsoft Excel

Overview of Presentation
• Demonstration of “One-Click Anova” with an experiment set out in a Latin-square design
• Goal of project: develop data analysis software agriculture undergraduates would use in college and in their post-college careers
• Why the goal?
• Software creation

Demonstration of “One-Click Anova”
• Latin-square randomization
• Latin-square design with 1 experimental factor

Background for Developing Data Analysis Software Agriculture Undergraduates Would Use in College and After Graduation
• Snapshot of California’s agriculture
• Industry input into curriculum development
• Our students

Features of the Industry
America’s Top Agricultural States

State        Rank   Value ($ in billions)
California   1      $26.80
Texas        2      $15.90

Sophisticated producers of 350 different crops and commodities in the nation’s most populous state

Features of the Industry (Continued)
• Public demands fewer chemical inputs
• Yet, public expects industry to do the research
  – Industry is economically powerful and has the money
  – Public cannot support research on 350 commodities
  – Research needs local ecosystem focus to advance goal of ecologically sound agricultural production
• Much on-farm research yields results once/yr
• Easy to forget involved data analysis procedures

Characteristics of Our Students
• Most will not attend graduate school
• They will enter industry
• Some will teach high school agriculture

Industry Input
• Continue to give hands-on experience
• Integrate curriculum around ecosystem concepts
• Teach problem solving:
  – Working in emotionally “charged” settings
  – Leadership
  – Experimentation including data analysis
    – Experimentation, Data Analysis Course

“Dick, you are a plant physiologist, develop and teach this course.”
Expectations:
• Undergraduates experiment and analyze data
• They report research via scientific papers, posters, web pages and seminars
• And continue as farmer-researchers, consultant-researchers, high school teacher-researchers

Faculty Wish List
• Completely randomized design
• Randomized complete block design
• Latin-square design
• Split-plot design
• Ancova
• Simple and multiple regression
• Analysis of count data

Why Excel?
• Students are familiar with Excel. Therefore, teach data analysis, not a program – saves time
• Students’ computers come with Excel – saves $
• Graduates use Excel for other purposes, not just data analysis
• Specific experimental design templates are friendlier than general-purpose programs, e.g., SYSTAT
• OK for regression

Why Not SYSTAT?
• Expensive. Cheaper student version lacks required capability such as split-plot
• My experience: used it daily for weeks, left it for 3 months, had to relearn
  – Number codes for treatments are confusing
  – For split-plot, have to recalculate Anova
  – Remember, ag research data are analyzed once/yr

Excel’s Limits As an Anova Platform
• “Out of the box” Anova procedures handle few designs and do not handle missing data
• No mean separation tests
• No orthogonal contrasts
• No automatic charting of treatment means

Overcoming Excel’s Limitations
Textbook formulas
• Such formulas are not for a world where experimental cows die – become missing data.
• No residuals for testing normality, equal variance

Instead Use Method That Gives Residuals
For missing data, use iteration to find substitute values that give a residual total of zero

The Right Model
• Example: randomized block design, single experimental factor
• Need to solve for three sums of squares:
  – Block
  – Treatment
  – Error

Model for Error Term
• Estimate = treatment mean + block mean - grand mean
• Subtract the above estimate from datum to obtain residual
• Square residuals
• Sum of squared residuals = Error SS

Model for Block Term
• Datum - treatment mean = residual
• Square and sum residuals = confounded error and block SS
• From this sum of squares subtract the error term’s sum of squares to obtain block SS
[Diagram: error SS confounded with block SS, next to treatment SS; after the subtraction, block SS, error SS and treatment SS shown separately]

Model for Treatment Term
• Datum - block mean = residual
• Square and sum residuals = confounded error and treatment SS
• From this sum of squares subtract the error term’s sum of squares to obtain treatment SS
[Diagram: error SS confounded with treatment SS, next to block SS; after the subtraction, treatment SS, block SS and error SS shown separately]

Finding Correct Substitute Data
• Pivot table gives means to be used in estimates
• Set substitute datum to some value, e.g., 0
• Substitute datum - estimate = residual
• New substitute datum = former substitute datum - residual
• Refresh pivot table, obtain new means for estimates
Circular argument – need careful control of calculation order to avoid crashes

Treatment   Block   Response
Bonus       North   .
Express     North   112
Stander     North   80
Bonus       South   78
Express     South   98
Stander     South   67
Bonus       East    54
Express     East    66
Stander     East    48

Let’s see how a pivot table summarizes these data

Average of Response
              Treatment
Block         Bonus   Express   Stander   Grand Total
East          54      66        48        56
North         0       112       80        64
South         78      98        67        81
Grand Total   44      92        65        67

These are just data in this pivot table and are not used in computing estimates.
If this were a Factor A x Factor B summary, these numbers would also be means and used in estimates.

Calculate
Estimate is looked up in the pivot table. Residual = Response - Estimate. Thus, 0.0 - 44 = -44

Treatment   Block   Response   Estimate   Residuals
Bonus       North   0.0        44.0       -44.0
Express     North   112.0      92.0       20.0
Stander     North   80.0       65.0       15.0
Bonus       South   78.0       44.0       34.0
Express     South   98.0       92.0       6.0
Stander     South   67.0       65.0       2.0
Bonus       East    54.0       44.0       10.0
Express     East    66.0       92.0       -26.0
Stander     East    48.0       65.0       -17.0

New response = Response - Residual. Thus, new response = 0 - (-44) = 44

Treatment   Block   Response   Estimate   Residuals
Bonus       North   44.0       44.0       0.0
Express     North   112.0      92.0       20.0

Refresh

Average of Response
              Treatment
Block         Bonus
East          54
North         44
South         78
Grand Total   58.7

Calculate

Treatment   Block   Response   Estimate   Residuals
Bonus       North   44.0       58.7       -14.7
Express     North   112.0      92.0       20.0

Sum of residuals                    -2.842170943040400E-14
Sum of residuals squared            1918
Previous sum of residuals squared   1918

How to Handle Different Sized Data Sets
• Have pivot table summarize 65,000+ rows
  – Simple to program
  – Takes 5-10 minutes for analysis
• Rodney’s autofill code + ASSUME page tip on Offset function cut run times 70-90%

Neville’s Time Trimming Suggestion
• Refreshing pivot tables for each iteration = 1-2 minutes/analysis
• Use DAVERAGE function

Count of Response
              Treatment
Block         Bonus   Express   Stander   Grand Total
East          1       1         1         3
North         0       1         1         2
South         1       1         1         3
Grand Total   2       3         3         8

DAVERAGE(DataRange, ResponseColumn, INDIRECT(HC1))
The third argument is the criteria for selecting responses to average.
[Diagram: one DAVERAGE cell per mean: Bonus mean, Express mean, Stander mean; East block mean, North block mean, South block mean]

Treatment   Block   Response   Estimate   Residuals
Bonus       North   44.0       44.0       0.0
Express     North   112.0      92.0       20.0

Further Topics
• Experiment planning
  – Formulate additional, orthogonal hypotheses
  – Estimate number of replicates
• Additional work

Experiment Planning: Formulate Additional Hypotheses
Test with single degree of freedom orthogonal contrasts

An Example of Estimating Number of Replicates
Example will be for randomized complete block design.
[Demonstration: # reps, RBD, 1 experimental factor]

Additional Work
• Simplify mean separation tests for factorial designs
• Example using split-plot design
[Demonstration: split-plot design with blocks]

Additional Work (Continued)
• Replace trial and error method to develop models
• Open development to others
• Share programs
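
Appendix sketch: the substitute-data iteration outside Excel
The iteration described under “Finding Correct Substitute Data” and the three residual models can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workbook's actual macro code: the function names (estimates, residual_ss), the model labels and the stopping tolerance are hypothetical, and the marginal means are computed directly instead of through a pivot table or DAVERAGE. The data are the nine plots from the slides, with the missing Bonus/North response marked as None.

```python
TREATMENTS = ["Bonus", "Express", "Stander"]
BLOCKS = ["North", "South", "East"]

# Responses from the slides; None marks the missing Bonus/North plot.
DATA = {
    ("Bonus", "North"): None, ("Express", "North"): 112.0, ("Stander", "North"): 80.0,
    ("Bonus", "South"): 78.0, ("Express", "South"): 98.0, ("Stander", "South"): 67.0,
    ("Bonus", "East"): 54.0, ("Express", "East"): 66.0, ("Stander", "East"): 48.0,
}


def estimates(values, model):
    """Estimate every plot from marginal means (the pivot-table step)."""
    grand = sum(values.values()) / len(values)
    t_mean = {t: sum(values[(t, b)] for b in BLOCKS) / len(BLOCKS) for t in TREATMENTS}
    b_mean = {b: sum(values[(t, b)] for t in TREATMENTS) / len(TREATMENTS) for b in BLOCKS}
    if model == "treatment":  # datum - treatment mean (block SS stays in the residuals)
        return {(t, b): t_mean[t] for (t, b) in values}
    if model == "block":      # datum - block mean (treatment SS stays in the residuals)
        return {(t, b): b_mean[b] for (t, b) in values}
    if model == "full":       # treatment mean + block mean - grand mean (error term)
        return {(t, b): t_mean[t] + b_mean[b] - grand for (t, b) in values}
    raise ValueError("unknown model: " + model)


def residual_ss(data, model, start=0.0, tol=1e-10, max_iter=10_000):
    """Iterate substitute data until their residuals vanish, then return
    (sum of squared residuals, completed data)."""
    values = {k: (start if v is None else v) for k, v in data.items()}
    missing = [k for k, v in data.items() if v is None]
    for _ in range(max_iter):
        est = estimates(values, model)      # "refresh" the means
        largest = 0.0
        for k in missing:
            residual = values[k] - est[k]   # substitute datum - estimate
            values[k] -= residual           # new substitute = former - residual
            largest = max(largest, abs(residual))
        if largest < tol:
            break
    est = estimates(values, model)
    ss = sum((values[k] - est[k]) ** 2 for k in values)
    return ss, values


error_ss, _ = residual_ss(DATA, "full")
block_plus_error_ss, filled = residual_ss(DATA, "treatment")
treatment_plus_error_ss, _ = residual_ss(DATA, "block")

print("Error SS:", round(error_ss, 2))
print("Block SS:", round(block_plus_error_ss - error_ss, 2))
print("Treatment SS:", round(treatment_plus_error_ss - error_ss, 2))
```

With the slide data, the treatment-mean model converges to a substitute of 66 for the missing plot and a sum of squared residuals of 1918, the figure shown on the convergence slide. Each model is iterated separately, so the substitute value differs between models.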