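732A35

Regression analysis: observational study.
Analysis of variance (ANOVA): experimental study.

We still have a normally distributed response variable, but the explanatory variables are now qualitative variables called factors. The values of a factor are called levels.

Example: $Y$ = number of sold TVs. Is there a difference in the number of sold TVs between a newspaper advertisement and a TV advertisement? The type of advertisement is determined in advance, so this is an experimental study, not an observational study.

If the observations at level $i$ are $N(\mu_i, \sigma^2)$, $i = 1, 2$, then under $H_0: \mu_1 = \mu_2$

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t(n_1 + n_2 - 2), \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

Assume now that we want to compare three levels:
1) newspaper advertisement
2) TV advertisement
3) no advertisement
We need a new model for this.

Model in terms of level means:

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, r, \quad j = 1, 2, \ldots, n_i$$

where $r$ is the number of levels, $n_i$ is the number of observations for level $i$, and $\varepsilon_{ij} \sim N(0, \sigma^2)$ iid is the random component. Hence $E(Y_{ij}) = \mu_i$.

Equivalent model in terms of factor effects:

$$Y_{ij} = \mu_\cdot + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, r, \quad j = 1, 2, \ldots, n_i$$

where $\tau_i$ is the factor effect for level $i$, subject to

$$\sum_{i=1}^{r} \tau_i = 0, \quad \text{that is} \quad \tau_1 + \tau_2 + \cdots + \tau_{r-1} = -\tau_r$$

This constraint holds if $n_i = n$ for all $i$. Otherwise:

$$\frac{\sum_i n_i \tau_i}{n_T} = 0$$

Example of the data layout with $r = 3$ levels:

$$\begin{array}{llll}
\text{Factor level} & \text{Observations} & n_i & \text{Level mean} \\
\hline
1 & y_{11}, y_{12}, y_{13}, y_{14} & n_1 = 4 & \bar{Y}_{1\cdot} \\
2 & y_{21}, y_{22}, y_{23}, y_{24}, y_{25}, y_{26} & n_2 = 6 & \bar{Y}_{2\cdot} \\
3 = r & y_{31}, y_{32}, y_{33} & n_3 = 3 & \bar{Y}_{3\cdot}
\end{array}$$

Totals and means:

$$Y_{i\cdot} = \sum_{j=1}^{n_i} Y_{ij}, \qquad \bar{Y}_{i\cdot} = \frac{Y_{i\cdot}}{n_i}, \qquad n_T = \sum_{i=1}^{r} n_i$$

$$Y_{\cdot\cdot} = \sum_{i,j} Y_{ij}, \qquad \bar{Y}_{\cdot\cdot} = \frac{Y_{\cdot\cdot}}{n_T}$$

Ordinary least squares (OLS) estimates: minimize

$$Q = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \mu_i)^2$$

Result: $\hat{\mu}_i = \bar{Y}_{i\cdot}$. Or, if $\mu_i = \mu_\cdot + \tau_i$, then $\hat{\mu}_\cdot = \bar{Y}_{\cdot\cdot}$ and $\hat{\tau}_i = \bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}$.

Residuals:

$$e_{ij} = Y_{ij} - \hat{Y}_{ij} = Y_{ij} - \bar{Y}_{i\cdot}$$

The residuals sum to zero within each level and are checked for the same properties as in regression analysis: normally distributed, constant variance, independent. Plot them against the fitted values and, if needed, in observational order.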
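The least squares results above can be illustrated with a small sketch. The sales figures below are hypothetical (they are not from the course material) and only mirror the unbalanced layout $n_1 = 4$, $n_2 = 6$, $n_3 = 3$; the function name `fit_level_means` is likewise an illustrative choice.

```python
# Hypothetical data: TVs sold per store under each advertisement type.
sales = {
    "newspaper": [12, 15, 11, 14],
    "tv": [18, 20, 17, 21, 19, 19],
    "none": [9, 8, 10],
}

def fit_level_means(groups):
    """Return level means, overall mean, factor effects and residuals."""
    n_total = sum(len(ys) for ys in groups.values())  # n_T
    # mu_hat_i = Ybar_i. : the OLS estimate of each level mean
    level_means = {k: sum(ys) / len(ys) for k, ys in groups.items()}
    # mu_hat_. = Ybar_.. : mean of all n_T observations
    overall_mean = sum(sum(ys) for ys in groups.values()) / n_total
    # tau_hat_i = Ybar_i. - Ybar_..
    effects = {k: m - overall_mean for k, m in level_means.items()}
    # e_ij = Y_ij - Ybar_i.
    residuals = {k: [y - level_means[k] for y in ys] for k, ys in groups.items()}
    return level_means, overall_mean, effects, residuals

level_means, overall_mean, effects, residuals = fit_level_means(sales)
for level in sales:
    print(level, level_means[level], round(effects[level], 3))
```

Because the design is unbalanced, the estimated effects satisfy the weighted constraint $\sum_i n_i \hat{\tau}_i = 0$ rather than the unweighted one, and the residuals sum to zero within each level.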
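Decomposition of the total sum of squares: $SSTO = SSTR + SSE$, where

$$SSTO = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2$$

$$SSTR = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2$$

$$SSE = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 = \sum_{i=1}^{r} \sum_{j=1}^{n_i} e_{ij}^2$$

ANOVA table:

$$\begin{array}{lllll}
\text{Source} & \text{df} & \text{SS} & \text{MS} & F \\
\hline
\text{Treatment} & r - 1 & SSTR & MSTR = \frac{SSTR}{r - 1} & F^* = \frac{MSTR}{MSE} \\
\text{Error} & n_T - r & SSE & MSE = \frac{SSE}{n_T - r} & \\
\text{Total} & n_T - 1 & SSTO & &
\end{array}$$

Estimate of the error variance: $\hat{\sigma}^2 = MSE$, with

$$MSE = \frac{\sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}{n_T - r} = \frac{\sum_{i=1}^{r} \sum_{j=1}^{n_i} e_{ij}^2}{n_T - r}$$

Easier to calculate:

$$MSE = \frac{\sum_i (n_i - 1) s_i^2}{n_T - r}, \qquad s_i^2 = \frac{1}{n_i - 1} \sum_{j} (Y_{ij} - \bar{Y}_{i\cdot})^2$$

Hypotheses:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_r \qquad H_a: \text{not all means are equal}$$

or equivalently

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_r = 0 \qquad H_a: \text{not all effects are zero}$$

Test statistic:

$$F^* = \frac{MSTR}{MSE} \sim F(r - 1, n_T - r) \quad \text{under } H_0$$

Reject $H_0$ at level $\alpha$ if $F^* > F(1 - \alpha; r - 1, n_T - r)$.

$E(MSE) = \sigma^2$, so MSE is an unbiased estimate of $\sigma^2$.

$$E(MSTR) = \sigma^2 + \frac{\sum_i n_i (\mu_i - \mu_\cdot)^2}{r - 1}, \qquad \text{where } \mu_\cdot = \frac{\sum_i n_i \mu_i}{n_T}$$

$E(MSTR) = \sigma^2$ when the null hypothesis is true, so large values of $F^*$ speak against $H_0$.

If we reject the null hypothesis, we would like to investigate where the differences are. Calculate confidence intervals for:

a level mean $\mu_i$
a difference $D = \mu_i - \mu_{i'}$, $i \neq i'$
a contrast $L = \sum_{i=1}^{r} c_i \mu_i$ with $\sum_{i=1}^{r} c_i = 0$

Example of a contrast:

$$L = \frac{\mu_1 + \mu_2}{2} - \mu_3$$

Estimates and standard errors:

$$\hat{\mu}_i = \bar{Y}_{i\cdot}, \qquad s(\bar{Y}_{i\cdot}) = \sqrt{\frac{MSE}{n_i}}$$

$$\hat{D} = \bar{Y}_{i\cdot} - \bar{Y}_{i'\cdot}, \qquad s(\hat{D}) = \sqrt{MSE \left( \frac{1}{n_i} + \frac{1}{n_{i'}} \right)}$$

$$\hat{L} = \sum_{i=1}^{r} c_i \bar{Y}_{i\cdot}, \qquad s(\hat{L}) = \sqrt{MSE \sum_{i=1}^{r} \frac{c_i^2}{n_i}}$$

Denote one of the parameters $\mu_i$, $D$ or $L$ by $\theta$. For a single confidence interval, use the t-distribution.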
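The sums of squares and the ANOVA table entries can be computed directly from their definitions. The following is a minimal sketch on hypothetical sales figures (not from the course material); the function name `one_way_anova` and the dictionary keys are illustrative choices. It also checks the "easier to calculate" pooled-variance form of MSE.

```python
from statistics import mean, variance

# Hypothetical data: TVs sold per store under each advertisement type.
sales = {
    "newspaper": [12, 15, 11, 14],
    "tv": [18, 20, 17, 21, 19, 19],
    "none": [9, 8, 10],
}

def one_way_anova(groups):
    """Return the entries of the one-way ANOVA table as a dict."""
    r = len(groups)                                    # number of levels
    n_total = sum(len(ys) for ys in groups.values())   # n_T
    overall = sum(sum(ys) for ys in groups.values()) / n_total  # Ybar_..
    level_means = {k: mean(ys) for k, ys in groups.items()}     # Ybar_i.

    # SSTR = sum_i n_i (Ybar_i. - Ybar_..)^2
    sstr = sum(len(ys) * (level_means[k] - overall) ** 2
               for k, ys in groups.items())
    # SSE = sum_i sum_j (Y_ij - Ybar_i.)^2
    sse = sum((y - level_means[k]) ** 2
              for k, ys in groups.items() for y in ys)
    # SSTO = sum_i sum_j (Y_ij - Ybar_..)^2
    ssto = sum((y - overall) ** 2 for ys in groups.values() for y in ys)

    mstr = sstr / (r - 1)
    mse = sse / (n_total - r)
    return {
        "df_treatment": r - 1, "df_error": n_total - r, "df_total": n_total - 1,
        "SSTR": sstr, "SSE": sse, "SSTO": ssto,
        "MSTR": mstr, "MSE": mse, "F": mstr / mse,
    }

table = one_way_anova(sales)

# Pooled-variance form: MSE = sum_i (n_i - 1) s_i^2 / (n_T - r)
pooled_mse = (sum((len(ys) - 1) * variance(ys) for ys in sales.values())
              / table["df_error"])

print(table)
print(pooled_mse)
```

The resulting `table["F"]` is the observed $F^*$; it would still have to be compared with the critical value $F(1 - \alpha; r - 1, n_T - r)$ from a table or statistical software, which this sketch deliberately leaves out.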
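The multiplier for a single interval is

$$t = t\left(1 - \frac{\alpha}{2};\, n_T - r\right)$$

CI for $\theta$: $\hat{\theta} \pm t \cdot s(\hat{\theta})$

Test statistic for testing $\theta = \theta_0$:

$$\frac{\hat{\theta} - \theta_0}{s(\hat{\theta})}$$

Compare its absolute value with the critical value $t$ above.

For $g$ confidence intervals, use the Bonferroni method:

$$B = t\left(1 - \frac{\alpha}{2g};\, n_T - r\right)$$

Family confidence: at least $1 - \alpha$.

CI for $\theta$: $\hat{\theta} \pm B \cdot s(\hat{\theta})$

Test statistic for testing $\theta = \theta_0$:

$$\frac{\hat{\theta} - \theta_0}{s(\hat{\theta})}$$

Compare its absolute value with the critical value $B$ above.

For all pairwise differences, use Tukey's method:

$$T = \frac{1}{\sqrt{2}}\, q(1 - \alpha;\, r,\, n_T - r)$$

where $q$ is the studentized range distribution. Family confidence: $1 - \alpha$.

CI for $D$: $\hat{D} \pm T \cdot s(\hat{D})$

Test statistic for testing $D = D_0$ (most often $D_0 = 0$):

$$\frac{\hat{D} - D_0}{s(\hat{D})}$$

Compare its absolute value with the critical value $T$ above.

For contrasts, use Scheffé's method:

$$S^2 = (r - 1)\, F(1 - \alpha;\, r - 1,\, n_T - r)$$

Family confidence: $1 - \alpha$.

CI for $L$: $\hat{L} \pm S \cdot s(\hat{L})$

Test statistic for testing $L = 0$:

$$\frac{\hat{L}}{s(\hat{L})}$$

Compare its absolute value with the critical value $S$ above.

Reading: Chapters 15, 16 and 17.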
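To see how the multipliers enter the intervals, here is a minimal sketch using only the Python standard library. The summary statistics are hypothetical (not from the course material), and the three critical values are approximate table look-ups for $\alpha = 0.05$, $r = 3$ and $n_T - r = 10$ that should be verified against a table or statistical software before use. The helper names `estimate`, `std_error` and `interval` are illustrative.

```python
from math import sqrt

# Hypothetical summary statistics (not from the course material).
means = [13.0, 19.0, 9.0]   # Ybar_i. for newspaper, TV, no advertisement
sizes = [4, 6, 3]           # n_i
mse = 2.2                   # MSE with n_T - r = 10 degrees of freedom
r = len(means)

# Approximate table values for alpha = 0.05; verify before relying on them.
T_CRIT = 2.228   # t(0.975; 10)
Q_CRIT = 3.88    # q(0.95; 3, 10), studentized range
F_CRIT = 4.10    # F(0.95; 2, 10)

def estimate(coeffs, means):
    """Estimate sum_i c_i * Ybar_i. of a linear combination of level means."""
    return sum(c * m for c, m in zip(coeffs, means))

def std_error(coeffs, sizes, mse):
    """Standard error sqrt(MSE * sum_i c_i^2 / n_i)."""
    return sqrt(mse * sum(c * c / n for c, n in zip(coeffs, sizes)))

def interval(est, se, multiplier):
    """Interval est +/- multiplier * se."""
    half_width = multiplier * se
    return est - half_width, est + half_width

tukey_T = Q_CRIT / sqrt(2)            # T = q / sqrt(2)
scheffe_S = sqrt((r - 1) * F_CRIT)    # S = sqrt((r - 1) F)

# A pairwise difference is a contrast with coefficients -1, 1, 0.
diff = [-1, 1, 0]                     # D = mu_2 - mu_1
contrast = [0.5, 0.5, -1]             # L = (mu_1 + mu_2) / 2 - mu_3

d_hat, s_d = estimate(diff, means), std_error(diff, sizes, mse)
l_hat, s_l = estimate(contrast, means), std_error(contrast, sizes, mse)

single_ci = interval(d_hat, s_d, T_CRIT)      # one pre-planned difference
tukey_ci = interval(d_hat, s_d, tukey_T)      # all pairwise differences
scheffe_ci = interval(l_hat, s_l, scheffe_S)  # any number of contrasts

print(single_ci, tukey_ci, scheffe_ci)
```

The Bonferroni interval follows the same pattern with $B = t(1 - \alpha/(2g); n_T - r)$ passed as the multiplier. With these particular table values the simultaneous multipliers are larger than the single-interval $t$, which is the price paid for family-wise coverage.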