Statistical Analysis & Design in Research
Structure in the Experimental Material
PGRM 10
Statistics in Science

Blocking – the idea
Detecting differences between treatments depends on the background noise (BN)
• BN is:
  – caused by inherent differences between the experimental units
  – measured by the residual (error) mean square, RMS (alternatively MSE)
• Comparing treatments on similar units would reduce background noise
• With blocks of units that differ in contributing characteristics, we measure the variation due to blocks and reduce residual variation

Blocking – the benefit
Reducing background noise:
• Gives more precise estimates
• Allows a reduction in replication, without loss of power (the probability of detecting an effect of a specified size)
• Reduces cost!

Blocking and experimental material
Examples
1. A field with fertility increasing from top to bottom. With 3 treatments, group plots into BLOCKS of 3, starting at the top and continuing to the bottom. Randomise treatments within each block.

Block Design
          Treat
Blk    A     B     C
1      T1    T3    T2
2      T3    T2    T1
3      T2    T1    T3
4      T1    T2    T3
5      T3    T1    T2
6      T1    T2    T3
What is the experimental unit? How many replicates per treatment? What is the block?

Example
• 2 drugs (A, B) to control blood pressure
• 100 subjects – randomly assign 50 each to A and B
• Valid, but is it efficient?
• If subjects are heterogeneous, there is likely to be a large variation (σ²) in the responses within each group.
• The design may not be very efficient.

Factors affecting BP variation

Blocking and experimental material
1.
100 subjects are selected to compare a new drug to control BP with a Control. Block into pairs by age & weight (believed to affect BP). In each pair one is selected at random to receive the new drug; the other receives the Control.
Alternatively – see next slide

Groups (Blocks)
Age    Sex      Weight    #     T1    T2
>50    Male     H         15    8     7
>50    Male     N         11    5     6
>50    Male     L         12    6     6
>50    Female   H         11    5     6
>50    Female   N         9     5     4
>50    Female   L         13    6     7
<50    Male     H         7     4     3
<50    Male     N         2     1     1
<50    Male     L         5     2     3
<50    Female   H         4     2     2
<50    Female   N         8     4     4
<50    Female   L         3     2     1
Total                     100   50    50

Blocking and experimental material
Examples
1. A field with fertility increasing from top to bottom. With 3 treatments, group plots into BLOCKS of 3, starting at the top and continuing to the bottom. Randomise treatments within each block.
2. 100 subjects are selected to compare a new drug to control BP with a Control. Block into pairs by age & weight (believed to affect BP). In each pair one is selected at random to receive the new drug; the other receives the Control.
3. 3 products to be compared in 15 supermarkets. All 3 are compared in each supermarket; the supermarkets are regarded as BLOCKS.

Blocking and experimental material
Examples (contd)
4. A crop experiment will take 5 days to harvest. The material is blocked into 5 sets of plots, and treatments are assigned at random within each set. A BLOCK of plots is harvested each day. Here, day effects such as rain will be allowed for in the ANOVA table, so they do not cloud the estimation of treatment effects, and residual variation is reduced.

Blocking factors in your work area?

Reasons to BLOCK
1. Reduce BN (as above)
2.
Material is naturally blocked (e.g. identical twins), so using this as part of the design may reduce BN
3. To protect against factors that may influence the experimental outcomes, and so cloud comparison of treatments
4. To assess block variation itself, e.g. large day-to-day variation may indicate a process that is not well controlled.

Typical Randomised Block Design (RBD) Layout
4 treatments T1 – T4
BLOCKS of size 4
Example of random allocation within blocks:
Block
1      T3    T1    T2    T4
2      T2    T3    T1    T4
3      T1    T2    T3    T4
4      T2    T4    T1    T3
5      T4    T2    T3    T1
6      T3    T1    T4    T2

ANOVA table
Each treatment occurs once in each block
t treatments, b blocks, tb experimental units
Source        DF            SS     MS     F          Pr > F
Treatments    t–1           TSS    TMS    TMS/RMS    Small?
Blocks        b–1           BSS    BMS    BMS/RMS    Small?
Residual      (t–1)(b–1)    RSS    RMS
Total         tb–1
MS = SS/DF

Example PGRM pg 10-2
Compare the effect of washing solutions used to retard bacterial growth in food processing containers. Only 3 trials can be run each day, and temperature is not controlled, so day-to-day variability is expected.
BLOCKS: day
Treatments: 2%, 4%, 6% of active ingredient
Randomisation: 3 containers randomly allocated to the 3 treatments on each of 4 days.
Response: bacterial count on each container each day (low score = cleaner)

Example (contd)
Excel
Day    Solution(%)    Count
1      2              13
1      4              10
1      6              5
2      2              18
2      4              20
2      6              6
3      2              18
3      4              17
3      6              7
4      2              30
4      4              31
4      6              10

csv
Day,Solution(%),Count
1,2,13
1,4,10
1,6,5
2,2,18
2,4,20
...
Note:
• Response values in a single column
• Extra columns to identify BLOCK (day) and TREATMENT (solution)

SAS GLM code
  /* score is the bacterial count; solution = treatment, day = block */
  proc glm data = randb;
    class solution day;
    model score = solution day;
    lsmeans solution;
    lsmeans day;
    /* contrasts among the solution means (2%, 4%, 6%) */
    estimate '2-6' solution 1 0 -1;
    estimate 'linear ok?' solution 1 -2 1;
  run;
  quit;

GLM OUTPUT: ANOVA
Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              5     748.08            149.6          11.68      0.0048
Error              6     76.8              12.8
Corrected Total    11    824.9

Source      DF    Type I SS    Mean Square    F Value    Pr > F
solution    2     425.17       212.58         16.60      0.0036
Day         3     322.92       107.64         8.41       0.0144

425.17 + 322.92 = 748.09
So the Model SS has been partitioned into TREATMENT (solution) and BLOCK (Day)

GLM OUTPUT: means
solution    score LSMEAN
2           19.75
4           19.5
6           7.0

Parameter     Estimate    Standard Error    t Value    Pr > |t|
2-6           12.75       2.530             5.04       0.0024
linear ok?    -12.25      4.383             -2.80      0.0314

ANOVA table
Source      SS      df    MS      F        P
Solution    425     ?     213     16.60    0.004
Days        323     ?     108     8.41     0.014
Residual    76.8    ?     12.8

Solution    2       4       6               SED
            19.8    19.5    7.0             2.53
Day         1       2       3       4       SED
            9.3     14.7    14.0    23.7    2.92

More Blocking – Latin square designs

Latin Square design – blocking by 2 sources of variation
• Variation in milk yield among cows is large (CV% = 25)
• Variation in yield across lactation is large
• Use different treatments in sequence on each cow
• Need to allow for a standardisation period (12) weeks between treatments
[Figure: Lactation yield pattern, Yield (kg) against Month]

Data
Treatment layout
          Period
Cow    1     2     3     4
1      T2    T4    T3    T1
2      T1    T2    T4    T3
3      T3    T1    T2    T4
4      T4    T3    T1    T2

Milk yield (kg/day)
            Cow
Period    1       2       3       4
1         9.7     14.0    20.2    20.9
2         15.1    20.3    17.8    24.3
3         16.4    20.1    21.3    21.5
4         11.8    19.1    21.3    20.6

Period    1       2       3       4       1       2       ….
Cow       1       1       1       1       2       2
Treat     2       4       3       1       1       2
yield     9.7     15.1    16.4    11.8    14.0    20.3
Columns for period, cow and treatment codes

SAS GLM code
  proc glm data = latinsq;
    class period cow treat;
    model yield = period cow treat;
    lsmeans treat;
    lsmeans period;
    lsmeans cow;
    estimate '1v2' treat 1 -1 0 0;
  run;

Results
Source    DF    SS       MS       F        P
Period    3     31.2     10.41    8.68     0.013
Cow       3     165.8    55.28    46.06    0.000
Treat     3     32.5     10.82    9.01     0.012
Error     6     7.2      1.20
Cow and Period removed much variation

Means
          1        2        3        4        SED
Treat     16.28    17.98    20.01    19.33    0.775
Period    16.21    19.37    19.82    18.18    0.775
Cow       13.24    18.38    20.16    21.82    0.775

Conclusions on Latin square design
• CV greatly reduced to 6%: when the effect of period is allowed for, repeated measurements within a cow are not very variable.
• Periods and cows are nuisance variables.
• Sometimes the row and column variables are of interest in themselves, and so the design is very efficient – information on 3 factors (e.g. treatments, machines, operators).
• Useful for screening, but it is questionable whether short-term results would apply in the long term.
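
The Latin square partition of variation can be checked by hand. The following is a minimal Python sketch, not the SAS analysis used in the slides, that computes the Period, Cow, Treat and Error sums of squares from the yields and treatment layout printed on the Data slide. The function and variable names are illustrative assumptions. Because the printed yields are rounded to one decimal, the sums of squares come out close to, but not identical with, those on the Results slide; the CV still comes out near 6%.

```python
import math

# Yields (kg/day) as printed on the Data slide: rows = periods 1-4, columns = cows 1-4.
YIELDS = [
    [9.7, 14.0, 20.2, 20.9],
    [15.1, 20.3, 17.8, 24.3],
    [16.4, 20.1, 21.3, 21.5],
    [11.8, 19.1, 21.3, 20.6],
]
# Treatment number in each (period, cow) cell, read from the treatment layout table.
TREATS = [
    [2, 1, 3, 4],
    [4, 2, 1, 3],
    [3, 4, 2, 1],
    [1, 3, 4, 2],
]


def latin_square_anova(y, trt):
    """Sums of squares, df, mean squares and F ratios for an n x n Latin square."""
    n = len(y)
    cells = [(p, c) for p in range(n) for c in range(n)]
    grand_total = sum(y[p][c] for p, c in cells)
    cf = grand_total ** 2 / (n * n)  # correction factor

    def ss_for(label):
        # Sum of squares for a factor whose level in cell (p, c) is label(p, c).
        totals = {}
        for p, c in cells:
            key = label(p, c)
            totals[key] = totals.get(key, 0.0) + y[p][c]
        return sum(t * t for t in totals.values()) / n - cf

    ss = {
        "Period": ss_for(lambda p, c: p),
        "Cow": ss_for(lambda p, c: c),
        "Treat": ss_for(lambda p, c: trt[p][c]),
    }
    ss_total = sum(y[p][c] ** 2 for p, c in cells) - cf
    ss["Error"] = ss_total - sum(ss.values())
    df = {"Period": n - 1, "Cow": n - 1, "Treat": n - 1, "Error": (n - 1) * (n - 2)}
    ms = {k: ss[k] / df[k] for k in ss}
    f = {k: ms[k] / ms["Error"] for k in ("Period", "Cow", "Treat")}
    cv = 100 * math.sqrt(ms["Error"]) / (grand_total / (n * n))  # residual CV%
    sed = math.sqrt(2 * ms["Error"] / n)  # SED between two means of n values
    return {"ss": ss, "df": df, "ms": ms, "f": f, "cv": cv, "sed": sed}


result = latin_square_anova(YIELDS, TREATS)
for source in ("Period", "Cow", "Treat", "Error"):
    print(f"{source:7s} df={result['df'][source]}  "
          f"SS={result['ss'][source]:7.2f}  MS={result['ms'][source]:6.2f}")
print(f"CV = {result['cv']:.1f}%   SED = {result['sed']:.3f}")
```

The Error sum of squares is obtained by subtraction, which is valid only because every treatment occurs exactly once in each period and on each cow.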
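
Returning to the washing-solution example (PGRM pg 10-2), the randomised block ANOVA and the two ESTIMATE contrasts can be reproduced from the twelve counts on the data slide. This is a minimal Python sketch under the usual balanced-RBD formulas, not the SAS GLM run itself; the names rbd_anova and contrast are illustrative assumptions.

```python
import math

# (day, solution %, bacterial count) as listed on the data slide
DATA = [
    (1, 2, 13), (1, 4, 10), (1, 6, 5),
    (2, 2, 18), (2, 4, 20), (2, 6, 6),
    (3, 2, 18), (3, 4, 17), (3, 6, 7),
    (4, 2, 30), (4, 4, 31), (4, 6, 10),
]


def rbd_anova(rows):
    """ANOVA for a randomised block design with one unit per block x treatment."""
    days = sorted({d for d, _, _ in rows})
    sols = sorted({s for _, s, _ in rows})
    b, t = len(days), len(sols)
    total = sum(y for _, _, y in rows)
    cf = total ** 2 / (b * t)  # correction factor
    sol_tot = {s: sum(y for _, s2, y in rows if s2 == s) for s in sols}
    day_tot = {d: sum(y for d2, _, y in rows if d2 == d) for d in days}
    ss_sol = sum(v * v for v in sol_tot.values()) / b - cf
    ss_day = sum(v * v for v in day_tot.values()) / t - cf
    ss_res = sum(y * y for _, _, y in rows) - cf - ss_sol - ss_day
    rms = ss_res / ((t - 1) * (b - 1))  # residual mean square
    return {
        "ss_sol": ss_sol, "ss_day": ss_day, "ss_res": ss_res, "rms": rms,
        "f_sol": (ss_sol / (t - 1)) / rms,
        "f_day": (ss_day / (b - 1)) / rms,
        "sol_means": [sol_tot[s] / b for s in sols],  # ordered 2%, 4%, 6%
        "blocks": b,
    }


def contrast(fit, coefs):
    """Estimate and standard error of a contrast among the solution means."""
    estimate = sum(c * m for c, m in zip(coefs, fit["sol_means"]))
    se = math.sqrt(fit["rms"] * sum(c * c for c in coefs) / fit["blocks"])
    return estimate, se


fit = rbd_anova(DATA)
print(f"Solution SS = {fit['ss_sol']:.2f}, Day SS = {fit['ss_day']:.2f}, "
      f"Residual SS = {fit['ss_res']:.2f}")
print(f"F(solution) = {fit['f_sol']:.2f}, F(day) = {fit['f_day']:.2f}")
print("2-6:        estimate %.2f, SE %.3f" % contrast(fit, [1, 0, -1]))
print("linear ok?: estimate %.2f, SE %.3f" % contrast(fit, [1, -2, 1]))
```

The contrast coefficients are the same as in the ESTIMATE statements: (1, 0, -1) compares the 2% and 6% solutions, and (1, -2, 1) measures departure from a straight-line trend across the three equally spaced concentrations.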
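
The random allocation within blocks shown in the RBD layout slide can be generated mechanically. The sketch below is one possible way to do it in Python; the function name randomise_within_blocks and the seed value are assumptions for illustration, and the layout it produces will not match the slide's example.

```python
import random


def randomise_within_blocks(treatments, n_blocks, seed=None):
    """Return {block number: treatments in random order}, one of each per block."""
    rng = random.Random(seed)  # a fixed seed makes the layout reproducible
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)  # independent randomisation in every block
        layout[block] = order
    return layout


layout = randomise_within_blocks(["T1", "T2", "T3", "T4"], n_blocks=6, seed=2024)
for block, order in layout.items():
    print(block, " ".join(order))
```

Each block receives its own shuffle, so every treatment appears exactly once per block while the order of treatments differs from block to block.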