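Single-Factor Studies
KNNL – Chapter 16

Single-Factor Models
• Independent variable can be qualitative or quantitative
• If quantitative, we typically assume a linear, polynomial, or no "structural" relation
• If qualitative, we typically have no "structural" relation
• Balanced designs have equal numbers of replicates at each level of the independent variable
• When no structure is assumed, we refer to models as "Analysis of Variance" models, and use indicator variables for treatments in the regression model

Single-Factor ANOVA Model
• Model Assumptions for Model Testing
  • All probability distributions are normal
  • All probability distributions have equal variance
  • Responses are random samples from their probability distributions, and are independent
• Analysis Procedure
  • Test for differences among factor level means
  • Follow-up (post-hoc) comparisons among pairs or groups of factor level means

Cell Means Model
$r$ ≡ # of levels of the study factor
$n_i$ ≡ # of replicates (cases, units) for the $i$th level of the study factor
$n_T = n_1 + \dots + n_r = \sum_{i=1}^{r} n_i$ ≡ overall sample size (number of cases)
$$Y_{ij} = \mu_i + \varepsilon_{ij} \qquad i = 1,\dots,r \quad j = 1,\dots,n_i$$
$Y_{ij}$ ≡ Response for the $j$th case within the $i$th level of the study factor
$\mu_i$ ≡ Population mean for the $i$th level of the study factor
$\varepsilon_{ij} \sim NID(0, \sigma^2)$, where NID ≡ Normally and Independently Distributed
$$\Rightarrow \quad E\{Y_{ij}\} = \mu_i \qquad \sigma^2\{Y_{ij}\} = \sigma^2 \qquad Y_{ij} \text{ are independent } N(\mu_i, \sigma^2)$$

Cell Means Model – Regression Form
Suppose $r = 3$ and $n_1 = n_2 = n_3 = 2$:
$$\mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix} \qquad \mathbf{X} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix} \qquad \boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{bmatrix} \qquad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}$$
$$E\{\mathbf{Y}\} = \mathbf{X}\boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \mu_1 \\ \mu_2 \\ \mu_2 \\ \mu_3 \\ \mu_3 \end{bmatrix} \qquad \sigma^2\{\boldsymbol{\varepsilon}\} = \sigma^2 \mathbf{I}$$
$$\mathbf{X'X} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} \qquad \mathbf{X'Y} = \begin{bmatrix} Y_{11} + Y_{12} \\ Y_{21} + Y_{22} \\ Y_{31} + Y_{32} \end{bmatrix}$$
$$\hat{\boldsymbol{\beta}} = (\mathbf{X'X})^{-1}\mathbf{X'Y} = \begin{bmatrix} 0.5 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.5 \end{bmatrix} \begin{bmatrix} Y_{11} + Y_{12} \\ Y_{21} + Y_{22} \\ Y_{31} + Y_{32} \end{bmatrix} = \begin{bmatrix} \bar{Y}_{1\cdot} \\ \bar{Y}_{2\cdot} \\ \bar{Y}_{3\cdot} \end{bmatrix} = \begin{bmatrix} \hat{\mu}_1 \\ \hat{\mu}_2 \\ \hat{\mu}_3 \end{bmatrix}$$

Model Interpretations
• Factor Level Means
  • Observational Studies – The $\mu_i$ represent the population means among units from the populations of factor levels
  • Experimental Studies – The $\mu_i$ represent the means of the various factor levels, had they been assigned to a population of experimental units
• Fixed and Random Factors
  • Fixed Factors – All levels of interest are observed in the study
  • Random Factors – Factor levels included in the study represent a sample from a population of factor levels

Fitting ANOVA Models
Notation:
$$Y_{i\cdot} = \sum_{j=1}^{n_i} Y_{ij} \qquad \bar{Y}_{i\cdot} = \frac{Y_{i\cdot}}{n_i} = \frac{\sum_{j=1}^{n_i} Y_{ij}}{n_i}$$
$$Y_{\cdot\cdot} = \sum_{i=1}^{r} \sum_{j=1}^{n_i} Y_{ij} \qquad \bar{Y}_{\cdot\cdot} = \frac{Y_{\cdot\cdot}}{n_T} = \frac{\sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}}{n_T}$$
Least Squares and Maximum Likelihood Estimation
Error Sum of Squares:
$$Q = \sum_{i=1}^{r} \sum_{j=1}^{n_i} \varepsilon_{ij}^2 = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2 \qquad \frac{\partial Q}{\partial \mu_k} = -2 \sum_{j=1}^{n_k} (Y_{kj} - \mu_k)$$
Setting $\partial Q / \partial \mu_k = 0$:
$$\sum_{j=1}^{n_k} Y_{kj} = n_k \hat{\mu}_k \quad \Rightarrow \quad \hat{\mu}_k = \frac{\sum_{j=1}^{n_k} Y_{kj}}{n_k} = \bar{Y}_{k\cdot} \qquad k = 1,\dots,r$$
Likelihood:
$$L(\mu_1,\dots,\mu_r, \sigma^2 \mid Y_{11},\dots,Y_{r n_r}) = \frac{1}{(2\pi\sigma^2)^{n_T/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2 \right]$$
⇒ maximizing the likelihood wrt $\mu_1,\dots,\mu_r$ ≡ minimizing $\sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2$
Fitted values: $\hat{Y}_{ij} = \bar{Y}_{i\cdot}$
Residuals: $e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$

Analysis of Variance
$$\underbrace{Y_{ij} - \bar{Y}_{\cdot\cdot}}_{\text{Total deviation}} = \underbrace{(Y_{ij} - \bar{Y}_{i\cdot})}_{\text{Deviation from trt mean (residual)}} + \underbrace{(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})}_{\text{Deviation of trt mean from overall mean}}$$
$$\sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 + \sum_{i=1}^{r} \sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$$
since the cross-product term vanishes:
$$\sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) = \sum_{i=1}^{r} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = 0$$
Total (Corrected) Sum of Squares: $SSTO = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$, with $df_{TO} = n_T - 1$
Treatment Sum of Squares: $SSTR = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$, with $df_{TR} = r - 1$
Error Sum of Squares: $SSE = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$, with $df_E = n_T - r$
Note: $SSTO = SSTR + SSE$ and $df_{TO} = df_{TR} + df_E$
Useful result:
$$s_i^2 = \frac{\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}{n_i - 1} \;\Rightarrow\; (n_i - 1) s_i^2 = \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 \;\Rightarrow\; SSE = \sum_{i=1}^{r} (n_i - 1) s_i^2 \qquad df_E = \sum_{i=1}^{r} (n_i - 1) = n_T - r$$
Mean Squares: $MSTR = \dfrac{SSTR}{r - 1}$ and $MSE = \dfrac{SSE}{n_T - r}$

ANOVA Table
Source | df | SS | MS | E{MS}
Treatments | $r - 1$ | $SSTR = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$ | $MSTR = \dfrac{SSTR}{r-1}$ | $\sigma^2 + \dfrac{\sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2}{r - 1}$
Error | $n_T - r$ | $SSE = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2$ | $MSE = \dfrac{SSE}{n_T - r}$ | $\sigma^2$
Total | $n_T - 1$ | $SSTO = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$ | |
where $\mu_{\cdot} = \sum_{i=1}^{r} n_i \mu_i / n_T$.
Note:
$$SSTR = \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2 - n_T \bar{Y}_{\cdot\cdot}^2 \qquad SSE = \sum_{i=1}^{r} \sum_{j=1}^{n_i} Y_{ij}^2 - \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2$$
$$E\{Y_{ij}\} = \mu_i, \;\; \sigma^2\{Y_{ij}\} = \sigma^2 \;\Rightarrow\; E\{Y_{ij}^2\} = \sigma^2 + \mu_i^2 \;\Rightarrow\; E\left\{ \sum_{i=1}^{r} \sum_{j=1}^{n_i} Y_{ij}^2 \right\} = n_T \sigma^2 + \sum_{i=1}^{r} n_i \mu_i^2$$
$$E\{\bar{Y}_{i\cdot}\} = \mu_i, \;\; \sigma^2\{\bar{Y}_{i\cdot}\} = \frac{\sigma^2}{n_i} \;\Rightarrow\; E\{\bar{Y}_{i\cdot}^2\} = \frac{\sigma^2}{n_i} + \mu_i^2 \;\Rightarrow\; E\left\{ \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot}^2 \right\} = r \sigma^2 + \sum_{i=1}^{r} n_i \mu_i^2$$
$$E\{\bar{Y}_{\cdot\cdot}\} = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T} = \mu_{\cdot}, \;\; \sigma^2\{\bar{Y}_{\cdot\cdot}\} = \frac{\sigma^2}{n_T} \;\Rightarrow\; E\{\bar{Y}_{\cdot\cdot}^2\} = \frac{\sigma^2}{n_T} + \mu_{\cdot}^2 \;\Rightarrow\; E\{n_T \bar{Y}_{\cdot\cdot}^2\} = \sigma^2 + n_T \mu_{\cdot}^2$$
$$\Rightarrow\; E\{SSE\} = (n_T - r)\sigma^2 \qquad E\{SSTR\} = (r - 1)\sigma^2 + \sum_{i=1}^{r} n_i \mu_i^2 - n_T \mu_{\cdot}^2 = (r - 1)\sigma^2 + \sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2$$

F-Test for $H_0: \mu_1 = \dots = \mu_r$
$H_0: \mu_1 = \dots = \mu_r$ versus $H_A$: Not all $\mu_i$ are equal
Test Statistic: $F^* = \dfrac{MSTR}{MSE}$
Under the null hypothesis (and independence and normality of errors):
$$\frac{SSTR}{\sigma^2} \sim \chi^2_{r-1} \qquad \frac{SSE}{\sigma^2} \sim \chi^2_{n_T - r} \qquad \text{and are independent (independent even if } H_0 \text{ false)}$$
$$\Rightarrow\; F^* = \frac{\left( \dfrac{SSTR}{\sigma^2} \right) \Big/ (r - 1)}{\left( \dfrac{SSE}{\sigma^2} \right) \Big/ (n_T - r)} = \frac{MSTR}{MSE} \sim F(r - 1, n_T - r)$$
Decision Rule: Reject $H_0$ if $F^* = \dfrac{MSTR}{MSE} \ge F(1 - \alpha;\, r - 1,\, n_T - r)$

General Linear Test of Equal Means
$H_0: \mu_1 = \dots = \mu_r = \mu_c$, where $\mu_c$ ≡ Common Mean (Reduced Model)
$H_A$: Not all $\mu_i$ are equal (Complete Model)
Reduced Model: $\hat{\mu}_c = \bar{Y}_{\cdot\cdot} = \hat{Y}_{ij}$
$$SSE(R) = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \hat{Y}_{ij})^2 = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = SSTO \qquad df_R = n_T - 1$$
Complete (Full) Model: $\hat{\mu}_i = \bar{Y}_{i\cdot} = \hat{Y}_{ij}$
$$SSE(F) = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \hat{Y}_{ij})^2 = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 = SSE \qquad df_F = n_T - r$$
Test Statistic:
$$F^* = \frac{\dfrac{SSE(R) - SSE(F)}{df_R - df_F}}{\dfrac{SSE(F)}{df_F}} = \frac{\dfrac{SSTO - SSE}{(n_T - 1) - (n_T - r)}}{\dfrac{SSE}{n_T - r}} = \frac{\dfrac{SSTR}{r - 1}}{\dfrac{SSE}{n_T - r}} = \frac{MSTR}{MSE}$$

Factor Effects Model
Alternative form of the model (necessary for interactions in multi-factor models):
$$\mu_i = \mu_{\cdot} + (\mu_i - \mu_{\cdot}) = \mu_{\cdot} + \tau_i \;\Rightarrow\; Y_{ij} = \mu_{\cdot} + \tau_i + \varepsilon_{ij}$$
$\tau_i = \mu_i - \mu_{\cdot}$ ≡ "Effect" of the $i$th factor level, and $\varepsilon_{ij} \sim NID(0, \sigma^2)$
Defining $\mu_{\cdot}$:
Unweighted Mean: $\mu_{\cdot} = \dfrac{\sum_{i=1}^{r} \mu_i}{r}$ s.t. $\sum_{i=1}^{r} \tau_i = 0$
Weighted Mean: $\mu_{\cdot} = \sum_{i=1}^{r} w_i \mu_i$ s.t. $\sum_{i=1}^{r} w_i = 1 \;\Rightarrow\; \sum_{i=1}^{r} w_i \tau_i = 0$
Weights may represent the population sizes in observational studies
Note: $\mu_1 = \dots = \mu_r \;\Leftrightarrow\; \tau_1 = \dots = \tau_r = 0$

Regression Approach – Factor Effects Model
Suppose $r = 3$ and $n_1 = n_2 = n_3 = 2$ and the Unweighted Mean Model: $\tau_1 + \tau_2 + \tau_3 = 0 \;\Rightarrow\; \tau_3 = -\tau_1 - \tau_2$
$$\mathbf{Y} = \begin{bmatrix} Y_{11} \\ Y_{12} \\ Y_{21} \\ Y_{22} \\ Y_{31} \\ Y_{32} \end{bmatrix} \qquad \mathbf{X} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \end{bmatrix} \qquad \boldsymbol{\beta} = \begin{bmatrix} \mu_{\cdot} \\ \tau_1 \\ \tau_2 \end{bmatrix} \qquad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{31} \\ \varepsilon_{32} \end{bmatrix}$$
$$E\{\mathbf{Y}\} = \mathbf{X}\boldsymbol{\beta} = \begin{bmatrix} \mu_{\cdot} + \tau_1 \\ \mu_{\cdot} + \tau_1 \\ \mu_{\cdot} + \tau_2 \\ \mu_{\cdot} + \tau_2 \\ \mu_{\cdot} - \tau_1 - \tau_2 \\ \mu_{\cdot} - \tau_1 - \tau_2 \end{bmatrix} = \begin{bmatrix} \mu_{\cdot} + \tau_1 \\ \mu_{\cdot} + \tau_1 \\ \mu_{\cdot} + \tau_2 \\ \mu_{\cdot} + \tau_2 \\ \mu_{\cdot} + \tau_3 \\ \mu_{\cdot} + \tau_3 \end{bmatrix}$$
$$\mathbf{X'X} = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 4 & 2 \\ 0 & 2 & 4 \end{bmatrix} \qquad \mathbf{X'Y} = \begin{bmatrix} Y_{11} + Y_{12} + Y_{21} + Y_{22} + Y_{31} + Y_{32} \\ Y_{11} + Y_{12} - Y_{31} - Y_{32} \\ Y_{21} + Y_{22} - Y_{31} - Y_{32} \end{bmatrix}$$
$$\hat{\boldsymbol{\beta}} = (\mathbf{X'X})^{-1}\mathbf{X'Y} = \begin{bmatrix} 1/6 & 0 & 0 \\ 0 & 1/3 & -1/6 \\ 0 & -1/6 & 1/3 \end{bmatrix} \mathbf{X'Y} = \begin{bmatrix} \bar{Y}_{\cdot\cdot} \\ \bar{Y}_{1\cdot} - \bar{Y}_{\cdot\cdot} \\ \bar{Y}_{2\cdot} - \bar{Y}_{\cdot\cdot} \end{bmatrix} = \begin{bmatrix} \hat{\mu}_{\cdot} \\ \hat{\tau}_1 \\ \hat{\tau}_2 \end{bmatrix}$$

Factor Effects Model with Weighted Mean
Weights are relative sample sizes: $w_i = \dfrac{n_i}{n_T}$
$$\sum_{i=1}^{r} w_i \tau_i = 0 \;\Rightarrow\; \sum_{i=1}^{r} \frac{n_i}{n_T} \tau_i = 0 \;\Rightarrow\; \sum_{i=1}^{r} n_i \tau_i = 0 \;\Rightarrow\; n_r \tau_r = -\sum_{i=1}^{r-1} n_i \tau_i \;\Rightarrow\; \tau_r = -\sum_{i=1}^{r-1} \frac{n_i}{n_r} \tau_i$$
$$Y_{ij} = \mu_{\cdot} + \tau_1 X_{ij1} + \dots + \tau_{r-1} X_{ij,r-1} + \varepsilon_{ij}$$
$$X_{ij1} = \begin{cases} 1 & \text{if } i = 1 \\ -\dfrac{n_1}{n_r} & \text{if } i = r \\ 0 & \text{otherwise} \end{cases} \qquad \dots \qquad X_{ij,r-1} = \begin{cases} 1 & \text{if } i = r - 1 \\ -\dfrac{n_{r-1}}{n_r} & \text{if } i = r \\ 0 & \text{otherwise} \end{cases}$$

Regression for Cell Means Model
$$Y_{ij} = \mu_i + \varepsilon_{ij} = \mu_1 X_{ij1} + \dots + \mu_r X_{ijr} + \varepsilon_{ij}$$
$$X_1 = \begin{cases} 1 & \text{if } i = 1 \\ 0 & \text{if } i \ne 1 \end{cases} \qquad \dots \qquad X_r = \begin{cases} 1 & \text{if } i = r \\ 0 & \text{if } i \ne r \end{cases} \qquad \boldsymbol{\beta} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_r \end{bmatrix} \qquad \hat{\boldsymbol{\beta}} = \begin{bmatrix} \bar{Y}_{1\cdot} \\ \vdots \\ \bar{Y}_{r\cdot} \end{bmatrix}$$
When fitting with a regression package, no intercept is used.
Under $H_0: \mu_1 = \dots = \mu_r = \mu_c$: $\mathbf{X} = \mathbf{1}$ (a single column of ones), $\boldsymbol{\beta} = \mu_c$, and $\hat{\boldsymbol{\beta}} = \bar{Y}_{\cdot\cdot}$

Randomization (aka Permutation) Tests
• Treats the units in the study as a finite population of units, each with a fixed error term $\varepsilon_{ij}$
• When the randomization procedure assigns the unit to treatment $i$, we observe $Y_{ij} = \mu_{\cdot} + \tau_i + \varepsilon_{ij}$
• When there are no treatment effects (all $\tau_i = 0$), $Y_{ij} = \mu_{\cdot} + \varepsilon_{ij}$
• We can compute a test statistic, such as $F^*$, under all (or in practice, many) potential treatment arrangements of the observed units (responses)
• The p-value is measured as the proportion of observed test statistics as or more extreme than the original
• Total number of potential permutations = $n_T! / (n_1! \cdots n_r!)$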
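The randomization test described above can be sketched in R as follows. This is a minimal illustration, not the slides' own code: the helper names `f_stat` and `perm_test`, the toy data, and the choice of 5000 random rearrangements (rather than full enumeration) are all assumptions made for the example.

```r
# F* = MSTR / MSE for a response vector y and treatment labels g
f_stat <- function(y, g) {
  g      <- factor(g)
  n_i    <- tapply(y, g, length)            # replicates per treatment
  ybar_i <- tapply(y, g, mean)              # treatment means
  r      <- nlevels(g)
  n_T    <- length(y)
  sstr   <- sum(n_i * (ybar_i - mean(y))^2) # treatment sum of squares
  sse    <- sum((y - ave(y, g))^2)          # error sum of squares
  (sstr / (r - 1)) / (sse / (n_T - r))
}

# Approximate randomization test: reshuffle the treatment labels n_perm
# times and record the proportion of F* values >= the observed F*
perm_test <- function(y, g, n_perm = 5000) {
  f_obs  <- f_stat(y, g)
  f_perm <- replicate(n_perm, f_stat(y, sample(g)))
  list(f_obs = f_obs, p_value = mean(f_perm >= f_obs))
}

# Hypothetical data: r = 3 treatments, 3 replicates each
y <- c(10, 12, 11, 14, 15, 13, 20, 19, 21)
g <- rep(c("A", "B", "C"), each = 3)

set.seed(16)
res <- perm_test(y, g)
res$f_obs     # 63 for these data (SSTR = 126 on 2 df, SSE = 6 on 6 df)
res$p_value   # Monte Carlo estimate of the randomization p-value
```

With only 9!/(3!3!3!) = 1680 distinct arrangements here, full enumeration would also be feasible; random sampling of arrangements is shown because it scales to larger designs.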
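Power Approach to Sample Size Choice - Tables
When the means are not all equal, the $F$-statistic is non-central $F$:
$$F^* \sim F(r - 1, n_T - r, \phi) \qquad \text{where } \phi = \frac{1}{\sigma} \sqrt{\frac{\sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2}{r}} \qquad \mu_{\cdot} = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T}$$
When all sample sizes are equal:
$$\phi = \frac{1}{\sigma} \sqrt{\frac{n}{r} \sum_{i=1}^{r} (\mu_i - \mu_{\cdot})^2} \qquad \text{where } \mu_{\cdot} = \frac{\sum_{i=1}^{r} \mu_i}{r}$$
The power of the test, when conducted at the $\alpha$ significance level:
$$1 - \beta = \Pr\{ F^* > F(1 - \alpha;\, r - 1,\, n_T - r) \mid \phi \}$$
See Table B.11
Choose sample sizes so that the power is sufficiently high for specific mean levels of interest $\mu_1,\dots,\mu_r$ or effects levels of interest $\tau_1,\dots,\tau_r$
Table B.12 is simple to use for equal sample sizes and a range of mean levels of interest, $\max(\mu_i) - \min(\mu_i)$

Power Approach to Sample Size Choice – R Code
When the means are not all equal, the $F$-statistic is non-central $F$:
$$F^* \sim F(r - 1, n_T - r, \lambda) \qquad \text{where } \lambda = \frac{\sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2}{\sigma^2} \qquad \mu_{\cdot} = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T}$$
When all sample sizes are equal:
$$\lambda = \frac{n \sum_{i=1}^{r} (\mu_i - \mu_{\cdot})^2}{\sigma^2} \qquad \text{where } \mu_{\cdot} = \frac{\sum_{i=1}^{r} \mu_i}{r}$$
The power of the test, when conducted at the $\alpha$ significance level:
$$1 - \beta = \Pr\{ F^* > F(1 - \alpha;\, r - 1,\, n_T - r) \mid F^* \sim F(r - 1, n_T - r, \lambda) \}$$
In R:
  $F(1 - \alpha;\, r - 1,\, n_T - r)$ = qf(1 - alpha, r - 1, nT - r)
  Power = $1 - \beta$ = 1 - pf(qf(1 - alpha, r - 1, nT - r), r - 1, nT - r, lambda)

Power Approach to Finding "Best" Treatment
Goal: Determining the best treatment (one with highest or lowest mean):
$1 - \alpha$ ≡ Probability the treatment with the highest (lowest) sample mean has the highest (lowest) population mean
$\lambda$ ≡ Difference between the highest (lowest) mean and the 2nd highest (lowest) mean
$r$ ≡ Number of treatments
Table B.13 gives $\lambda \sqrt{n} / \sigma$ for various $r$, $1 - \alpha$
Solve for $n$ for given $\lambda$, $\sigma$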
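As a rough check on the "best treatment" criterion, the probability of a correct selection can be estimated by simulation. The sketch below is an assumption-laden illustration rather than the method behind Table B.13: the function name `prob_correct_selection` is hypothetical, the best mean is placed exactly a hypothesized difference `lambda` above all the others (assumed here to be the hardest configuration that still meets the difference requirement), and the sample means are drawn directly from their normal sampling distributions with equal `n` per treatment.

```r
# Estimate Pr{treatment with the highest sample mean has the highest
# population mean} by Monte Carlo.
#   r      = number of treatments
#   n      = replicates per treatment (equal sample sizes assumed)
#   lambda = difference between the best and second-best population means
#   sigma  = error standard deviation
prob_correct_selection <- function(r, n, lambda, sigma, n_sim = 10000) {
  mu <- c(lambda, rep(0, r - 1))      # assumed configuration: treatment 1 is best
  se <- sigma / sqrt(n)               # standard error of each sample mean
  # n_sim x r matrix of simulated sample means; column i has mean mu[i]
  ybar <- matrix(rnorm(n_sim * r, mean = rep(mu, each = n_sim), sd = se),
                 nrow = n_sim, ncol = r)
  # proportion of simulations in which treatment 1 has the largest sample mean
  mean(max.col(ybar, ties.method = "first") == 1)
}

# Hypothetical inputs: r = 3 treatments, lambda = 1, sigma = 1
set.seed(16)
sapply(c(5, 10, 20), function(n) prob_correct_selection(3, n, 1, 1))
```

Increasing `n` until the estimate reaches the target $1 - \alpha$ gives an approximate sample size under these assumptions, which can be compared against the tabled value.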