Course: A0392 - Economic Statistics
Year: 2010

Meeting 12
One-Way and Two-Way Analysis of Variance

Outline
• The one-way classification ANOVA table model
• ANOVA with equal replication (equal sample sizes)
• ANOVA with unequal replication (unequal sample sizes)

Analysis of Variance
• Analysis of variance (ANOVA) is a method for testing the hypothesis that the means of three or more populations are equal.
• Assumptions
– The samples are drawn randomly and independently.
– The populations are normally distributed.
– The populations have equal variances.
• Hypotheses
H0: μ1 = μ2 = … = μk
H1: at least two of the means are not equal

Analysis of Variance: Data Layout

| Observation | Population 1 | Population 2 | … | Population i | … | Population k |
|---|---|---|---|---|---|---|
| 1 | x11 | x21 | … | xi1 | … | xk1 |
| 2 | x12 | x22 | … | xi2 | … | xk2 |
| ⋮ | ⋮ | ⋮ | | ⋮ | | ⋮ |
| n | x1n | x2n | … | xin | … | xkn |
| Total | T1 | T2 | … | Ti | … | Tk |

Grand total: T = T1 + T2 + … + Tk.
Ti is the total of all observations from population i; T is the total of all observations from all populations.

Computing Formulas for the Sums of Squares
To test the hypothesis above, the sum of squares for each source of variation must be determined.

Total sum of squares:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij}^2 - \frac{T^2}{nk}$$

Treatment sum of squares:
$$SSTR = \sum_{i=1}^{k}\frac{T_i^2}{n} - \frac{T^2}{nk}$$

Error sum of squares:
$$SSE = SST - SSTR$$

ANOVA Table and Rejection Region

| Source of variation | Degrees of freedom | Sum of squares | Mean square | F statistic |
|---|---|---|---|---|
| Treatments | k - 1 | SSTR | MSTR = SSTR/(k - 1) | F = MSTR/MSE |
| Error | k(n - 1) | SSE | MSE = SSE/[k(n - 1)] | |
| Total | nk - 1 | SST | | |

H0 is rejected if F > f(α; k - 1, k(n - 1)).

Example 1
As production manager, you want to compare the mean filling times of three filling machines. The data are shown below. At the 0.05 significance level, is there a difference in mean filling time?

| Machine 1 | Machine 2 | Machine 3 |
|---|---|---|
| 25.40 | 23.40 | 20.00 |
| 26.31 | 21.80 | 22.20 |
| 24.10 | 23.50 | 19.75 |
| 23.74 | 22.75 | 20.60 |
| 25.10 | 21.60 | 20.40 |

Solution
Hypotheses: H0: μ1 = μ2 = μ3; H1: at least one mean differs.
Significance level: α = 0.05.
Since df1 = treatment degrees of freedom = 2 and df2 = error degrees of freedom = 12, the critical value is f(0.05; 2, 12) = 3.89.
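Because the numerator has exactly 2 degrees of freedom here, the table value can be checked without an F table: for an F distribution with 2 and ν₂ degrees of freedom the upper tail has the closed form P(F > x) = (1 + 2x/ν₂)^(-ν₂/2), which inverts to x = (ν₂/2)(α^(-2/ν₂) - 1). The sketch below assumes df1 = 2 and nothing else; the function names are hypothetical, and for other numerator degrees of freedom (as in Example 2) a table or a statistics library is still needed.

```python
# Closed-form tail probability and critical value of an F(2, df2) distribution.
# Valid ONLY when the numerator degrees of freedom equal 2.


def f_tail_df1_2(x: float, df2: int) -> float:
    """Upper-tail probability P(F > x) for an F(2, df2) variable."""
    return (1.0 + 2.0 * x / df2) ** (-df2 / 2.0)


def f_crit_df1_2(alpha: float, df2: int) -> float:
    """Critical value f(alpha; 2, df2), i.e. the x with P(F > x) = alpha."""
    return (df2 / 2.0) * (alpha ** (-2.0 / df2) - 1.0)


crit = f_crit_df1_2(0.05, 12)          # Example 1: df1 = 2, df2 = 12
print(f"f(0.05; 2, 12) = {crit:.2f}")
```

The printed value, 3.89, agrees with the table value used in the solution.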
Hence the rejection region is: reject H0 if F > 3.89.

Data

| | Machine 1 | Machine 2 | Machine 3 |
|---|---|---|---|
| | 25.40 | 23.40 | 20.00 |
| | 26.31 | 21.80 | 22.20 |
| | 24.10 | 23.50 | 19.75 |
| | 23.74 | 22.75 | 20.60 |
| | 25.10 | 21.60 | 20.40 |
| Total | 124.65 | 113.05 | 102.95 |

Grand total: T = 340.65

Total Sum of Squares
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij}^2 - \frac{T^2}{nk} = \left(25.40^2 + 26.31^2 + 24.10^2 + 23.74^2 + 25.10^2 + 23.40^2 + 21.80^2 + 23.50^2 + 22.75^2 + 21.60^2 + 20.00^2 + 22.20^2 + 19.75^2 + 20.60^2 + 20.40^2\right) - \frac{340.65^2}{(5)(3)} = 58.2172$$

Treatment Sum of Squares and Error Sum of Squares
$$SSTR = \sum_{i=1}^{k}\frac{T_i^2}{n} - \frac{T^2}{nk} = \frac{124.65^2 + 113.05^2 + 102.95^2}{5} - \frac{340.65^2}{(5)(3)} = 47.1640$$
$$SSE = SST - SSTR = 58.2172 - 47.1640 = 11.0532$$

ANOVA Table and Conclusion

| Source of variation | Degrees of freedom | Sum of squares | Mean square | F statistic |
|---|---|---|---|---|
| Treatments | 3 - 1 = 2 | 47.1640 | 23.5820 | F = 25.60 |
| Error | 15 - 3 = 12 | 11.0532 | 0.9211 | |
| Total | 15 - 1 = 14 | 58.2172 | | |

Since F = 25.60 > 3.89, H0 is rejected: not all mean filling times are equal.

Computing Formulas for Unequal Sample Sizes

Total sum of squares:
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \frac{T^2}{N}$$

Treatment sum of squares:
$$SSTR = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{T^2}{N}$$

Error sum of squares:
$$SSE = SST - SSTR$$

where $N = \sum_{i=1}^{k} n_i$.

ANOVA Table for Unequal Sample Sizes

| Source of variation | Degrees of freedom | Sum of squares | Mean square | F statistic |
|---|---|---|---|---|
| Treatments | k - 1 | SSTR | MSTR = SSTR/(k - 1) | F = MSTR/MSE |
| Error | N - k | SSE | MSE = SSE/(N - k) | |
| Total | N - 1 | SST | | |

Example 2
• In a biology experiment, four concentrations of a chemical are used to stimulate the growth of a certain type of plant over a given period. The following growth data, in centimeters, were recorded for the plants that survived.
• Is there a significant difference in mean growth attributable to the four concentrations?
• Use a significance level of 0.05.

| Concentration 1 | Concentration 2 | Concentration 3 | Concentration 4 |
|---|---|---|---|
| 8.2 | 7.7 | 6.9 | 6.8 |
| 8.7 | 8.4 | 5.8 | 7.3 |
| 9.4 | 8.6 | 7.2 | 6.3 |
| 9.2 | 8.1 | 6.8 | 6.9 |
| | 8.0 | 7.4 | 7.1 |
| | | 6.1 | |

Solution
Hypotheses: H0: μ1 = μ2 = μ3 = μ4; H1: at least one mean differs.
Significance level: α = 0.05.
Since df1 = treatment degrees of freedom = 3 and df2 = error degrees of freedom = 16, the critical value is f(0.05; 3, 16) = 3.24.
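The unequal-size computing formulas can be cross-checked numerically. The following minimal sketch in plain Python applies them to the Example 2 growth data, grouped by concentration as in the data table; the function name `one_way_anova` is hypothetical, and the critical value 3.24 is the table value quoted in the text rather than something the code computes.

```python
# One-way ANOVA for possibly unequal group sizes via the computing formulas
#   SST  = sum of x_ij^2 - T^2/N
#   SSTR = sum of T_i^2/n_i - T^2/N
#   SSE  = SST - SSTR
# Plain Python, no third-party libraries.


def one_way_anova(groups):
    """Return (SST, SSTR, SSE, F) for a list of samples."""
    k = len(groups)                              # number of treatments
    n_total = sum(len(g) for g in groups)        # N
    grand_total = sum(sum(g) for g in groups)    # T
    correction = grand_total ** 2 / n_total      # T^2 / N
    sst = sum(x * x for g in groups for x in g) - correction
    sstr = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    sse = sst - sstr
    mstr = sstr / (k - 1)                        # treatment mean square
    mse = sse / (n_total - k)                    # error mean square
    return sst, sstr, sse, mstr / mse


# Example 2: growth (cm) under four concentrations, n_i = 4, 5, 6, 5
growth = [
    [8.2, 8.7, 9.4, 9.2],
    [7.7, 8.4, 8.6, 8.1, 8.0],
    [6.9, 5.8, 7.2, 6.8, 7.4, 6.1],
    [6.8, 7.3, 6.3, 6.9, 7.1],
]
sst, sstr, sse, f_stat = one_way_anova(growth)
F_CRIT = 3.24  # f(0.05; 3, 16), table value quoted in the text
print(f"SST={sst:.3f}  SSTR={sstr:.3f}  SSE={sse:.3f}  F={f_stat:.2f}")
print("reject H0" if f_stat > F_CRIT else "do not reject H0")
```

With all n_i equal, the same function reduces to the equal-size formulas used in Example 1.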
Hence the rejection region is: reject H0 if F > 3.24.

Data

| | Concentration 1 | Concentration 2 | Concentration 3 | Concentration 4 |
|---|---|---|---|---|
| | 8.2 | 7.7 | 6.9 | 6.8 |
| | 8.7 | 8.4 | 5.8 | 7.3 |
| | 9.4 | 8.6 | 7.2 | 6.3 |
| | 9.2 | 8.1 | 6.8 | 6.9 |
| | | 8.0 | 7.4 | 7.1 |
| | | | 6.1 | |
| n_i | 4 | 5 | 6 | 5 |
| Total T_i | 35.5 | 40.8 | 40.2 | 34.4 |

Grand total: T = 150.9, N = 20

Total Sum of Squares
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - \frac{T^2}{N} = \left(8.2^2 + 8.7^2 + 9.4^2 + 9.2^2 + 7.7^2 + 8.4^2 + 8.6^2 + 8.1^2 + 8.0^2 + 6.9^2 + 5.8^2 + 7.2^2 + 6.8^2 + 7.4^2 + 6.1^2 + 6.8^2 + 7.3^2 + 6.3^2 + 6.9^2 + 7.1^2\right) - \frac{150.9^2}{20} = 19.350$$

Treatment Sum of Squares and Error Sum of Squares
$$SSTR = \sum_{i=1}^{k}\frac{T_i^2}{n_i} - \frac{T^2}{N} = \frac{35.5^2}{4} + \frac{40.8^2}{5} + \frac{40.2^2}{6} + \frac{34.4^2}{5} - \frac{150.9^2}{20} = 15.462$$
$$SSE = SST - SSTR = 19.350 - 15.462 = 3.888$$

ANOVA Table and Conclusion

| Source of variation | Degrees of freedom | Sum of squares | Mean square | F statistic |
|---|---|---|---|---|
| Treatments | 4 - 1 = 3 | 15.462 | 5.154 | F = 21.21 |
| Error | 20 - 4 = 16 | 3.888 | 0.243 | |
| Total | 20 - 1 = 19 | 19.350 | | |

Since F = 21.21 > 3.24, H0 is rejected: not all mean growths are equal.

Exercise 1
A haulage contractor wants to know whether there is a significant difference in the load capacity of three truck brands: Mitsubishi, Toyota and Honda. The contractor sampled 5 trucks of each brand and obtained the data below. Assuming the three populations are normally distributed with equal variances, test at the 5% significance level whether the load capacity differs among the three brands.

Load capacity, Mitsubishi (A), Toyota (B), Honda (C): 44 42 46 43 45 47 48 44 45 45 45 44 46 44 43

Exercise 2
A high-school teacher studied the effectiveness of several teaching methods. Using the data in the table, test at the 5% significance level whether the four teaching methods produce the same results.
(Assume the four populations are normally distributed with equal variances.)

Scores for methods A, B, C and D: 70 68 76 67 76 75 87 66 77 74 78 78 78 67 77 57 67 57 68 89

Two-Way Analysis of Variance (Randomized Block Design)

The ANOVA Procedure
• The ANOVA procedure for the randomized block design partitions the total sum of squares (SST) into three components: the sum of squares due to treatments (SSTR), the sum of squares due to blocks (SSBL), and the sum of squares due to error (SSE).
• The partition is
SST = SSTR + SSBL + SSE
• With k treatments, b blocks and n_T = kb observations, the total degrees of freedom, n_T - 1, are partitioned so that k - 1 go to treatments, b - 1 go to blocks, and (k - 1)(b - 1) go to the error term.

ANOVA Table for a Randomized Block Design

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Treatments | SSTR | k - 1 | MSTR = SSTR/(k - 1) | MSTR/MSE |
| Blocks | SSBL | b - 1 | MSBL = SSBL/(b - 1) | |
| Error | SSE | (k - 1)(b - 1) | MSE = SSE/[(k - 1)(b - 1)] | |
| Total | SST | n_T - 1 | | |

Randomized Block Design Example: Crescent Oil Co.
Crescent Oil has developed three new blends of gasoline and must decide which blend or blends to produce and distribute. A study of the miles-per-gallon ratings of the three blends is being conducted to determine whether the mean ratings are the same for all three blends.
Five automobiles have been tested with each of the three gasoline blends; the miles-per-gallon ratings are shown below.

| Automobile (block) | Blend X | Blend Y | Blend Z | Block mean |
|---|---|---|---|---|
| 1 | 31 | 30 | 30 | 30.333 |
| 2 | 30 | 29 | 29 | 29.333 |
| 3 | 29 | 29 | 28 | 28.667 |
| 4 | 33 | 31 | 29 | 31.000 |
| 5 | 26 | 25 | 26 | 25.667 |
| Treatment mean | 29.8 | 28.8 | 28.4 | |

Mean Square Due to Treatments
The overall sample mean is 29. Thus
$$SSTR = 5\left[(29.8 - 29)^2 + (28.8 - 29)^2 + (28.4 - 29)^2\right] = 5.2$$
$$MSTR = \frac{5.2}{3 - 1} = 2.6$$

Mean Square Due to Blocks
$$SSBL = 3\left[(30.333 - 29)^2 + \dots + (25.667 - 29)^2\right] = 51.33$$
$$MSBL = \frac{51.33}{5 - 1} = 12.83$$

Mean Square Due to Error
The total sum of squares, the sum of squared deviations of all 15 ratings from the overall mean of 29, is SST = 62. Thus
$$SSE = 62 - 5.2 - 51.33 = 5.47$$
$$MSE = \frac{5.47}{(3 - 1)(5 - 1)} = 0.68$$

Randomized Block Design ANOVA Table

| Source of variation | Sum of squares | Degrees of freedom | Mean square | F |
|---|---|---|---|---|
| Treatments | 5.20 | 2 | 2.60 | 3.82 |
| Blocks | 51.33 | 4 | 12.83 | |
| Error | 5.47 | 8 | 0.68 | |
| Total | 62.00 | 14 | | |

Randomized Block Design: Rejection Rule
• p-value approach: reject H0 if p-value < .05.
• Critical value approach: reject H0 if F > 4.46.
For α = .05, F.05 = 4.46 (2 numerator and 8 denominator degrees of freedom).

Randomized Block Design: Test Statistic
F = MSTR/MSE = 2.6/.68 = 3.82
• Conclusion
Because 3.11 < 3.82 < 4.46, the p-value is greater than .05 (where F = 4.46) and less than .10 (where F = 3.11); Excel gives a p-value of .07. Therefore we cannot reject H0: there is insufficient evidence to conclude that the mean miles-per-gallon ratings differ among the three gasoline blends.

• Happy studying, and good luck!

Supplementary Material

Analysis of Variance
• The completely randomized design: one-way analysis of variance
– ANOVA assumptions
– F test for the difference in c means
– The Tukey-Kramer procedure

General Experimental Setting
• The investigator controls one or more independent variables
– Called treatment variables or factors
– Each treatment factor contains two or more groups (or levels)
• The investigator observes the effects on the dependent variable
– The response to the groups (or levels) of the independent variable
• Experimental design: the plan used to test a hypothesis

Completely Randomized Design
• Experimental units (subjects) are assigned randomly to groups
– Subjects are assumed to be homogeneous
• Only one factor or independent variable
– With two or more groups (or levels)
• Analyzed by one-way analysis of variance (ANOVA)

Randomized Design Example
Factor: training method. Factor levels: groups. Units: randomly assigned. Dependent variable (response), in hours: 21, 17, 31, 27, 25, 28, 29, 20, 22.

One-Way Analysis of Variance F Test
• Evaluate the difference among the
mean responses of two or more (c) populations
– E.g., several types of tires, or several oven temperature settings
• Assumptions
– Samples are randomly and independently drawn
• This condition must be met
– Populations are normally distributed
• The F test is robust to moderate departures from normality
– Populations have equal variances
• The test is less sensitive to this requirement when the samples from each population are of equal size

Why ANOVA?
• The means could be compared pair by pair using Z or t tests for the difference of means.
• Each Z or t test carries its own Type I error risk α.
• With k pairwise comparisons, treated as independent, the total Type I error is 1 - (1 - α)^k
– E.g., with 5 means and α = .05:
• 10 comparisons must be performed
• The total Type I error is 1 - (.95)^10 ≈ .40
• That is, about 40% of the time you would reject the null hypothesis of equal means in favor of the alternative even though the null is true!

Hypotheses of One-Way ANOVA
• H0: μ1 = μ2 = … = μc
– All population means are equal
– No treatment effect (no variation in means among groups)
• H1: not all μi are the same
– At least one population mean is different (others may be the same!)
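– There is a treatment effect
– This does not mean that all population means are different

One-Way ANOVA (No Treatment Effect)
H0: μ1 = μ2 = … = μc; H1: not all μi are the same.
The null hypothesis is true (figure: μ1 = μ2 = μ3).

One-Way ANOVA (Treatment Effect Present)
H0: μ1 = μ2 = … = μc; H1: not all μi are the same.
The null hypothesis is NOT true (figures: two configurations in which μ1, μ2, μ3 are not all equal).

One-Way ANOVA (Partition of Total Variation)
Total variation (SST) = variation due to group (SSA) + variation due to random sampling (SSW)
• SSA is commonly referred to as: among-group variation, sum of squares among, sum of squares between, sum of squares model, sum of squares explained, or sum of squares treatment.
• SSW is commonly referred to as: within-group variation, sum of squares within, sum of squares error, or sum of squares unexplained.

Total Variation
$$SST = \sum_{j=1}^{c}\sum_{i=1}^{n_j}\left(X_{ij} - \bar{\bar{X}}\right)^2$$
where X_ij is the i-th observation in group j, n_j is the number of observations in group j, n is the total number of observations in all groups, c is the number of groups, and
$$\bar{\bar{X}} = \frac{\sum_{j=1}^{c}\sum_{i=1}^{n_j} X_{ij}}{n}$$
is the overall or grand mean.

Total Variation (continued)
$$SST = \left(X_{11} - \bar{\bar{X}}\right)^2 + \left(X_{21} - \bar{\bar{X}}\right)^2 + \dots + \left(X_{n_c c} - \bar{\bar{X}}\right)^2$$
(Figure: response X for Groups 1, 2 and 3, with the grand mean marked.)

Among-Group Variation
$$SSA = \sum_{j=1}^{c} n_j\left(\bar{X}_j - \bar{\bar{X}}\right)^2, \qquad MSA = \frac{SSA}{c - 1}$$
where X̄_j is the sample mean of group j and the double-barred X is the overall or grand mean. SSA measures the variation due to differences among groups.

Among-Group Variation (continued)
$$SSA = n_1\left(\bar{X}_1 - \bar{\bar{X}}\right)^2 + n_2\left(\bar{X}_2 - \bar{\bar{X}}\right)^2 + \dots + n_c\left(\bar{X}_c - \bar{\bar{X}}\right)^2$$
(Figure: group means X̄1, X̄2, X̄3 for Groups 1-3 compared with the grand mean.)

Within-Group Variation
$$SSW = \sum_{j=1}^{c}\sum_{i=1}^{n_j}\left(X_{ij} - \bar{X}_j\right)^2, \qquad MSW = \frac{SSW}{n - c}$$
where X̄_j is the sample mean of group j and X_ij is the i-th observation in group j. The variation within each group is summed, and the results are then added over all groups.

Within-Group Variation (continued)
$$SSW = \left(X_{11} - \bar{X}_1\right)^2 + \left(X_{21} - \bar{X}_1\right)^2 + \dots + \left(X_{n_c c} - \bar{X}_c\right)^2$$
(Figure: observations in Groups 1-3 scattered around their own group means X̄1, X̄2, X̄3.)

Within-Group Variation (continued)
$$MSW = \frac{SSW}{n - c} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2 + \dots + (n_c - 1)S_c^2}{(n_1 - 1) + (n_2 - 1) + \dots + (n_c - 1)}$$
For c = 2, this is the pooled variance of the pooled-variance t test.
• If there are more than 2 groups, use the F test.
• For 2 groups, use the t test: the F test gives the same result as the two-sided pooled-variance t test (F = t²) but is more limited, because it cannot test one-sided alternatives.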
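The two-group case can be illustrated numerically. This sketch uses only the Python standard library and, purely as an example, takes Machine 1 and Machine 2 from the filling-time data; the helper names `pooled_t` and `anova_f` are hypothetical. It computes the pooled-variance t statistic and the one-way ANOVA F statistic and shows that F equals t².

```python
# For c = 2 groups, the one-way ANOVA F statistic equals the square of the
# pooled-variance t statistic, because MSW is exactly the pooled variance.
import math
from statistics import fmean, variance


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (fmean(a) - fmean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def anova_f(groups):
    """One-way ANOVA F = MSA / MSW."""
    n = sum(len(g) for g in groups)
    c = len(groups)
    grand = fmean([x for g in groups for x in g])
    ssa = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ssw = sum((len(g) - 1) * variance(g) for g in groups)
    return (ssa / (c - 1)) / (ssw / (n - c))


m1 = [25.40, 26.31, 24.10, 23.74, 25.10]
m2 = [23.40, 21.80, 23.50, 22.75, 21.60]
t = pooled_t(m1, m2)
f = anova_f([m1, m2])
print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
```

The equality holds for any two samples; only the t test, however, can be used directly for a one-sided alternative.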
One-Way ANOVA F Test Statistic
• Test statistic: F = MSA/MSW
– MSA is the mean square among groups
– MSW is the mean square within groups
• Degrees of freedom
– df1 = c - 1
– df2 = n - c

One-Way ANOVA Summary Table

| Source of variation | Degrees of freedom | Sum of squares | Mean squares (variance) | F statistic |
|---|---|---|---|---|
| Among (factor) | c - 1 | SSA | MSA = SSA/(c - 1) | MSA/MSW |
| Within (error) | n - c | SSW | MSW = SSW/(n - c) | |
| Total | n - 1 | SST = SSA + SSW | | |

Features of the One-Way ANOVA F Statistic
• The F statistic is the ratio of the among-group estimate of variance to the within-group estimate of variance
– The ratio can never be negative
– df1 = c - 1 will typically be small
– df2 = n - c will typically be large
• The ratio should be close to 1 if the null hypothesis is true

Features of the One-Way ANOVA F Statistic (continued)
• If the null hypothesis is false
– The numerator should be greater than the denominator
– The ratio should be larger than 1

One-Way ANOVA F Test Example
As production manager, you want to see whether 3 filling machines have different mean filling times. You assign 15 similarly trained and experienced workers, 5 per machine, to the machines. At the .05 significance level, is there a difference in mean filling times?
| Machine1 | Machine2 | Machine3 |
|---|---|---|
| 25.40 | 23.40 | 20.00 |
| 26.31 | 21.80 | 22.20 |
| 24.10 | 23.50 | 19.75 |
| 23.74 | 22.75 | 20.60 |
| 25.10 | 21.60 | 20.40 |

One-Way ANOVA Example: Scatter Diagram
Sample means: X̄1 = 24.93, X̄2 = 22.61, X̄3 = 20.59; grand mean = 22.71.
(Figure: scatter diagram of the filling times for Machines 1-3 on a vertical axis from 19 to 27, with each machine's mean and the grand mean marked.)

One-Way ANOVA Example Computations
n_j = 5, c = 3, n = 15
$$SSA = 5\left[(24.93 - 22.71)^2 + (22.61 - 22.71)^2 + (20.59 - 22.71)^2\right] = 47.164$$
$$SSW = 4.2592 + 3.112 + 3.682 = 11.0532$$
$$MSA = \frac{SSA}{c - 1} = \frac{47.16}{2} = 23.5820$$
$$MSW = \frac{SSW}{n - c} = \frac{11.0532}{12} = 0.9211$$

Summary Table

| Source of variation | Degrees of freedom | Sum of squares | Mean squares (variance) | F statistic |
|---|---|---|---|---|
| Among (factor) | 3 - 1 = 2 | 47.1640 | 23.5820 | MSA/MSW = 25.60 |
| Within (error) | 15 - 3 = 12 | 11.0532 | 0.9211 | |
| Total | 15 - 1 = 14 | 58.2172 | | |

One-Way ANOVA Example Solution
H0: μ1 = μ2 = μ3
H1: not all equal
α = .05, df1 = 2, df2 = 12
Critical value: F = 3.89
Test statistic: F = MSA/MSW = 23.5820/0.9211 = 25.6
Decision: reject H0 at α = 0.05.
Conclusion: there is evidence that at least one μi differs from the rest.

The Tukey-Kramer Procedure
• Tells which population means are significantly different
– E.g., μ1 = μ2 ≠ μ3: two groups whose means may be significantly different
• Post hoc (a posteriori) procedure
– Done after rejection of equal means in ANOVA
• Pairwise comparisons
– Compare the absolute mean differences with a critical range

The Tukey-Kramer Procedure: Example

| Machine1 | Machine2 | Machine3 |
|---|---|---|
| 25.40 | 23.40 | 20.00 |
| 26.31 | 21.80 | 22.20 |
| 24.10 | 23.50 | 19.75 |
| 23.74 | 22.75 | 20.60 |
| 25.10 | 21.60 | 20.40 |

1. Compute the absolute mean differences:
|X̄1 - X̄2| = |24.93 - 22.61| = 2.32
|X̄1 - X̄3| = |24.93 - 20.59| = 4.34
|X̄2 - X̄3| = |22.61 - 20.59| = 2.02
2. Compute the critical range:
$$\text{Critical range} = Q_{U(c,\,n-c)}\sqrt{\frac{MSW}{2}\left(\frac{1}{n_j} + \frac{1}{n_{j'}}\right)} = 1.618$$
where Q_U(c, n - c) is the upper-tail critical value of the studentized range distribution with c and n - c degrees of freedom (here Q_U(3, 12) = 3.77 at α = .05).
3. All of the absolute mean differences are greater than the critical range.
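The three Tukey-Kramer steps can be sketched in plain Python for the filling-machine data. Assumptions: MSW = 0.9211 is taken from the ANOVA table above, and Q_U = 3.77 is a studentized-range table value for α = .05, c = 3 and n - c = 12 entered by hand, not computed. The loop simply forms every pair, compares its absolute mean difference with the critical range, and records the result.

```python
# Tukey-Kramer pairwise comparisons for the filling-machine example.
import math
from itertools import combinations
from statistics import fmean

machines = {
    "Machine1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Machine2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Machine3": [20.00, 22.20, 19.75, 20.60, 20.40],
}
MSW = 0.9211   # within-group mean square from the ANOVA table
Q_U = 3.77     # studentized range table value (assumed: alpha = .05, c = 3, df = 12)

results = {}
for (name_a, a), (name_b, b) in combinations(machines.items(), 2):
    diff = abs(fmean(a) - fmean(b))                           # step 1
    crit = Q_U * math.sqrt(MSW / 2 * (1 / len(a) + 1 / len(b)))  # step 2
    significant = diff > crit                                   # step 3
    results[(name_a, name_b)] = (diff, crit, significant)
    print(f"{name_a} vs {name_b}: |diff| = {diff:.2f}, "
          f"critical range = {crit:.3f}, "
          f"{'significant' if significant else 'not significant'}")
```

Because the formula keeps 1/n_j and 1/n_j' separate, the same loop also handles unequal group sizes, which is what distinguishes Tukey-Kramer from the equal-n Tukey procedure.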
Therefore, there is a significant difference between each pair of means at the 5% level of significance.

Levene's Test for Homogeneity of Variance
• The null hypothesis
– H0: σ1² = σ2² = … = σc²
– The c population variances are all equal
• The alternative hypothesis
– H1: not all σj² are equal (j = 1, 2, …, c)
– Not all of the c population variances are equal

Levene's Test for Homogeneity of Variance: Procedure
1. For each observation in each group, obtain the absolute value of the difference between the observation and the median of its group.
2. Perform a one-way analysis of variance on these absolute differences.

Levene's Test for Homogeneity of Variance: Example
As production manager, you want to see whether 3 filling machines have different variances in filling times. You assign 15 similarly trained and experienced workers, 5 per machine, to the machines. At the .05 significance level, is there a difference in the variance of filling times?

| Machine1 | Machine2 | Machine3 |
|---|---|---|
| 25.40 | 23.40 | 20.00 |
| 26.31 | 21.80 | 22.20 |
| 24.10 | 23.50 | 19.75 |
| 23.74 | 22.75 | 20.60 |
| 25.10 | 21.60 | 20.40 |

Levene's Test: Absolute Difference from the Median
Group medians: Machine1 = 25.10, Machine2 = 22.75, Machine3 = 20.40.

| abs(Time - median(Time)) | Machine1 | Machine2 | Machine3 |
|---|---|---|---|
| | 0.30 | 0.65 | 0.40 |
| | 1.21 | 0.95 | 1.80 |
| | 1.00 | 0.75 | 0.65 |
| | 1.36 | 0.00 | 0.20 |
| | 0.00 | 1.15 | 0.00 |

Summary Table

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Machine1 | 5 | 3.87 | 0.774 | 0.35208 |
| Machine2 | 5 | 3.5 | 0.7 | 0.19 |
| Machine3 | 5 | 3.05 | 0.61 | 0.5005 |

| Source of variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between groups | 0.067453 | 2 | 0.033727 | 0.097048 | 0.908218 | 3.88529 |
| Within groups | 4.17032 | 12 | 0.347527 | | | |
| Total | 4.237773 | 14 | | | | |

Levene's Test Example: Solution
H0: σ1² = σ2² = σ3²
H1: not all equal
α = .05, df1 = 2, df2 = 12
Critical value: F = 3.89
Test statistic: F = MSA/MSW = 0.0337/0.3475 = 0.0970
Decision: do not reject H0 at α = 0.05.
Conclusion: there is no evidence that at least one σj² differs from the rest.
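The two-step Levene procedure can be sketched in plain Python. Centring on the group median, as the procedure above does, is the variant often called the Brown-Forsythe test. The helper names `one_way_f` and `levene_median` are hypothetical, and the p-value line uses the closed form P(F > x) = (1 + 2x/df2)^(-df2/2), which holds only because df1 = 2 here; the code checks that assumption explicitly.

```python
# Median-centred Levene test: absolute deviations from each group's median,
# followed by a one-way ANOVA on those deviations.
from statistics import fmean, median


def one_way_f(groups):
    """One-way ANOVA F statistic together with (df1, df2)."""
    n = sum(len(g) for g in groups)
    c = len(groups)
    grand = fmean([x for g in groups for x in g])
    ssa = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ssa / (c - 1)) / (ssw / (n - c)), c - 1, n - c


def levene_median(groups):
    """Step 1: |x - group median|; step 2: one-way ANOVA on the result."""
    deviations = [[abs(x - median(g)) for x in g] for g in groups]
    return one_way_f(deviations)


machines = [
    [25.40, 26.31, 24.10, 23.74, 25.10],
    [23.40, 21.80, 23.50, 22.75, 21.60],
    [20.00, 22.20, 19.75, 20.60, 20.40],
]
f_stat, df1, df2 = levene_median(machines)
assert df1 == 2, "the closed-form p-value below requires df1 == 2"
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)
print(f"F = {f_stat:.4f}, df = ({df1}, {df2}), p-value = {p_value:.4f}")
```

The F statistic and p-value should match the Excel summary table (0.0970 and 0.908), leading to the same decision not to reject H0.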