Hypothesis testing involves making a conjecture (assumption) about some aspect of our world, collecting data from a sample, performing a specific mathematical test, and then deciding whether the data give enough evidence to reject the conjecture. A conjecture must be stated in two parts:
• The null hypothesis (H0) states that there is no significant difference between the two parameters being tested (they are "not related to" each other, i.e. independent).
• The alternative hypothesis (H1) states that there is a significant difference (they are "related" in some way, i.e. dependent).
The only hypothesis test covered by the Studies SL course is the chi-squared (χ²) test. The test itself is quite straightforward. Your GDC can do it in two steps, but you must also know the formula and be able to do it by hand.
The hypothesis test that uses χ² determines whether or not two variables are related. It follows a general pattern:
(1) Make a conjecture.
(2) Write the null hypothesis using "is not related to" or "independent", and write the alternative hypothesis using "is related to" or "dependent".
(3) Calculate the χ² statistic.
(4) Determine the reference values (degrees of freedom and the critical value, or the p-value).
(5) Compare the two and either accept or reject the null hypothesis.
You can find χ² on your GDC in the statistics menu:
• Press Menu, then 6: Statistics.
• Press 7: Stat Tests.
• Select 8: χ² 2-way Test.
• Enter the name of the observed matrix.
Note: you must first enter the data as a matrix in Matrix mode and give the matrix a name!
The table shows the results of a sample of 400 randomly selected adults classified according to gender and regular exercise.
The observed table, called a 2 by 2 contingency table, is:

                 Regular exercise   No regular exercise   Sum
Male                   110                  106            216
Female                  98                   86            184
Sum                    208                  192            400

The χ² test is used when we deal with two categorical variables and wish to determine whether the variables are dependent (for example, females may tend to exercise more regularly) or independent (there is no evidence that a person's gender has an effect on whether they exercise regularly).
H0: gender and regular exercise are independent.
H1: gender and regular exercise are dependent.
Enter the observed table as a matrix (the four observed counts only, not the sums) and save it as a. Then run the χ² 2-way test on a; the results are shown once you press enter.

How can we obtain the expected frequency table by hand? For each cell, multiply the row sum by the column sum and divide by the total:
$f_e = \dfrac{\text{row sum} \times \text{column sum}}{\text{total}}$
Male, regular exercise: 216 × 208 / 400 = 112.32
Male, no regular exercise: 216 × 192 / 400 = 103.68
Female, regular exercise: 184 × 208 / 400 = 95.68
Female, no regular exercise: 184 × 192 / 400 = 88.32

Expected table:

                 Regular exercise   No regular exercise   Sum
Male                 112.32               103.68           216
Female                95.68                88.32           184
Sum                    208                  192            400

How do we calculate χ²? The χ² test examines the difference between the observed values we obtained from the original sample and the expected values we have calculated. This value is obtained from your GDC; in this case χ² = 0.217.
The critical value of χ² depends on the significance level and the degrees of freedom (which depend on the size of the table). For a contingency table with r rows and c columns, the degrees of freedom are:
df = (r - 1)(c - 1)
This table has 2 rows and 2 columns of observed data (the sum row and column are not counted), so df = (2 - 1)(2 - 1) = 1, and the critical value at the 5% significance level is 3.841.
As 0.217 is less than 3.841, we accept the null hypothesis and conclude that gender and regular exercise are independent.
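The by-hand procedure above (expected frequency = row sum × column sum / total, then summing the χ² contributions) can be checked with a short script. This is a minimal sketch in plain Python, assuming the observed counts are entered as a list of rows without the sum row and column; expected_table, chi_square and degrees_of_freedom are hypothetical helpers written for this example, not part of any GDC or library. The critical value 3.841 is copied from the text rather than computed.

```python
# Minimal sketch (standard library only): expected frequencies and the
# chi-square statistic for a contingency table given as a list of rows.
# Helper names are hypothetical, chosen for this example.

def expected_table(observed):
    """Expected frequency for each cell = row sum * column sum / total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def chi_square(observed):
    """Sum of (f_o - f_e)^2 / f_e over every cell of the table."""
    expected = expected_table(observed)
    return sum(
        (f_o - f_e) ** 2 / f_e
        for obs_row, exp_row in zip(observed, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(observed):
    """df = (r - 1)(c - 1) for r rows and c columns (sums not included)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Observed counts only: rows are Male, Female;
# columns are Regular exercise, No regular exercise.
exercise = [[110, 106],
            [98, 86]]

stat = chi_square(exercise)
critical_value = 3.841  # 5% critical value for df = 1, taken from the text

print(expected_table(exercise))  # [[112.32, 103.68], [95.68, 88.32]]
print(round(stat, 3), degrees_of_freedom(exercise))  # 0.217 1
print("accept H0" if stat < critical_value else "reject H0")  # accept H0
```

Keeping the expected table as a separate function mirrors the two hand steps in the notes, so each intermediate table can be compared with the worked values.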
Example: favourite colour and gender. Use a critical χ² value of 7.815.

Observed:

            Black   White   Red   Blue   Total
Male          51      22     33     24     130
Female        45      36     22     27     130
Total         96      58     55     51     260

H0: favourite colour is independent of gender.
H1: favourite colour is dependent on gender.

Expected frequencies (row total × column total / 260):
Black: 130 × 96 / 260 = 48
White: 130 × 58 / 260 = 29
Red: 130 × 55 / 260 = 27.5
Blue: 130 × 51 / 260 = 25.5
Both rows have a total of 130, so the expected frequencies are the same for males and females; the GDC shows both rows of the expected matrix as [48., 29., 27.5, 25.5].

Calculate degrees of freedom: number of rows r = 2, number of columns c = 4, so df = (2 - 1)(4 - 1) = 3.

The GDC gives χ² = 6.13. As 6.13 is less than 7.815, we accept the null hypothesis.
Conclusion: favourite colour is independent of gender.

Hand calculation of χ²
We use the formula, as given in your formula booklet:
$\chi^2_{calc} = \sum \dfrac{(f_o - f_e)^2}{f_e}$
We need to construct the table:

Cell              fo     fe     fo - fe   (fo - fe)²   (fo - fe)²/fe
Male, Black       51     48       3          9            0.1875
Male, White       22     29      -7         49            1.6897
Male, Red         33     27.5     5.5       30.25         1.1000
Male, Blue        24     25.5    -1.5        2.25         0.0882
Female, Black     45     48      -3          9            0.1875
Female, White     36     29       7         49            1.6897
Female, Red       22     27.5    -5.5       30.25         1.1000
Female, Blue      27     25.5     1.5        2.25         0.0882
Total                                                     6.1308

From what Lauren observed, she believes that the number of hours exercised per week is dependent on gender. She collected data randomly and organised the results in the table shown. Determine whether there is enough evidence to accept or reject the null hypothesis:
a) for α = 0.01
b) for α = 0.05
c) for α = 0.10

            Hours exercised per week
Male           5     10     12
Female         9      8      4

Write the null and alternative hypotheses:
H0: the number of hours exercised each week is independent of gender.
H1: the number of hours exercised each week is dependent on gender.
Calculate χ² and the p-value (χ² 2-way test):
χ² = 4.69 (3 s.f.), p = 0.0959 (3 s.f.), df = (2 - 1)(3 - 1) = 2
Whilst it is not technically correct to say "accept H0" (strictly, we "do not reject H0"), it is still accepted in the IB.
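To see where the GDC's p-value for Lauren's data comes from, the sketch below computes χ² and then the upper-tail probability P(χ² > calculated value) for the given degrees of freedom. It is a minimal, standard-library-only sketch: chi_square_test and chi2_sf are hypothetical helpers, and chi2_sf uses the closed-form chi-square tail for whole-number degrees of freedom (a swapped-in technique; the GDC may compute it differently). It also re-checks the favourite-colour hand calculation.

```python
import math

# Minimal sketch: chi-square statistic plus its p-value (upper-tail area),
# standard library only. Helper names are hypothetical.

def chi_square_test(observed):
    """Return (chi-square statistic, degrees of freedom) for a list of rows."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_sums[i] * col_sums[j] / total  # expected frequency
            stat += (f_o - f_e) ** 2 / f_e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for a whole-number df >= 1."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


# Lauren's data: rows are Male, Female; columns are the three hours-per-week groups.
lauren = [[5, 10, 12],
          [9, 8, 4]]
stat, df = chi_square_test(lauren)
p = chi2_sf(stat, df)
print(round(stat, 2), df, round(p, 4))  # 4.69 2 0.0959

# Favourite-colour data: should agree with the hand-calculation total.
colour = [[51, 22, 33, 24],
          [45, 36, 22, 27]]
stat, df = chi_square_test(colour)
print(round(stat, 2), df)  # 6.13 3
```

For df = 2 the tail reduces to exp(-χ²/2), which is why Lauren's p-value can be checked even on a basic calculator: exp(-4.688/2) ≈ 0.0959.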
Compare the p-value to each significance level:
a) 0.0959 > 0.01, hence accept the null hypothesis.
b) 0.0959 > 0.05, hence accept the null hypothesis.
c) 0.0959 < 0.10, hence reject the null hypothesis: at the 10% level, the number of hours exercised per week is dependent on gender.

• This formula is on the IB formula sheet:
$\chi^2_{calc} = \sum \dfrac{(f_o - f_e)^2}{f_e}$
where $f_o$ is the observed frequencies (i.e. the raw data) and $f_e$ is the expected frequencies. It is easiest to perform this sum using a table, one step at a time.
• If you are comparing the p-value with the α level:
  p > α: accept the null hypothesis
  p < α: reject the null hypothesis
• If you are comparing χ² with the critical value (CV):
  χ² < CV: accept the null hypothesis
  χ² > CV: reject the null hypothesis
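The two decision rules above can be written as a pair of small functions. This is a sketch with hypothetical function names. The notes do not say what to do when p equals α (or χ² equals the CV), so this version assumes the common convention of rejecting H0 in that case. The values plugged in are the ones worked out earlier in these notes.

```python
def decide_by_p_value(p_value, alpha):
    """Compare a p-value with the significance level alpha.

    p > alpha -> accept H0; otherwise reject H0.
    Assumption: p == alpha is treated as 'reject H0' (not covered in the notes).
    """
    return "accept H0" if p_value > alpha else "reject H0"


def decide_by_critical_value(chi_sq, critical_value):
    """Compare the chi-square statistic with the critical value (CV).

    chi_sq < CV -> accept H0; otherwise reject H0.
    Assumption: chi_sq == CV is treated as 'reject H0' (not covered in the notes).
    """
    return "accept H0" if chi_sq < critical_value else "reject H0"


# Lauren's data: p = 0.0959 compared with three significance levels.
for alpha in (0.01, 0.05, 0.10):
    print(alpha, decide_by_p_value(0.0959, alpha))
# 0.01 accept H0
# 0.05 accept H0
# 0.1 reject H0

# Exercise example (CV 3.841) and favourite-colour example (CV 7.815).
print(decide_by_critical_value(0.217, 3.841))  # accept H0
print(decide_by_critical_value(6.13, 7.815))   # accept H0
```

Both functions encode the same decision: a small p-value and a large χ² are two views of the same evidence against H0.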