BFN2254 FINANCIAL STATISTICAL ANALYSIS
TRIMESTER 1 SESSION 2021/2022

▪ The chi-square goodness-of-fit test is a nonparametric test.
▪ It is also known as Pearson's chi-square goodness-of-fit test.
▪ A goodness-of-fit (GOF) test is used to determine whether the observed frequencies of a given phenomenon differ significantly from the expected frequencies.
▪ The term goodness of fit refers to how well the observed sample distribution matches the expected probability distribution.
▪ The proportion of cases expected in each group of the categorical variable can be EQUAL or UNEQUAL. This distinction is critical: not only is it an important aspect of your research design, but from a practical perspective it also determines how you carry out the test in SPSS and how you interpret the results.
▪ The null hypothesis assumes that there is no significant difference between the observed and the expected values.
▪ The alternative hypothesis assumes that there is a significant difference between the observed and the expected values.

Objectives – Equal vs Unequal Proportions:
▪ Equal proportions: to test whether the proportions differ among two or more categories (responses) of the categorical variable; refer to Example 1.
▪ Unequal proportions: to compare the observed (actual) frequencies of the categories with the expected (theoretical) frequencies from a specified population distribution; refer to Example 2.

The test gives a valid result only if the data meet the following assumptions:
▪ It involves ONE CATEGORICAL variable only (binary, nominal or ordinal), together with the frequencies of occurrence of its categories.
▪ The sample was randomly and independently drawn from the population.
▪ The groups of the categorical variable are mutually exclusive.
▪ Each category has an expected frequency of at least five. SPSS reports this in a footnote to the output when we run the test.

Hypotheses (equal proportions):
H0: $\pi_1 = \pi_2 = \cdots = \pi_k$ (the proportions are equal)
H1: Not all $\pi_i$ are equal (at least one proportion is different)

Hypotheses (unequal, specified proportions):
H0: $\pi_1 = p_1,\ \pi_2 = p_2,\ \ldots,\ \pi_k = p_k$ (the specified values)
H1: Not all $\pi_i$ are equal to the specified values.

▪ Example: the frequencies of 1000 data values for three (3) types of gifts can be set up in two ways:
1. The frequency data has already been summated for the various categories.
2. The data is raw (the frequencies for each group have not yet been summated).

Example 1:
The records of an investment banking firm show that the numbers of clients primarily interested in the stock market, the bond market and the futures market are identical. A recent sample of 200 clients showed that 132 were primarily interested in stocks, 52 in the bond market and 16 in the futures market. Test whether there is a significant difference in the proportions of clients' primary interest across the three types of investment funds.

H0: $\pi_{\text{stock}} = \pi_{\text{bonds}} = \pi_{\text{futures}}$
H1: Not all $\pi$'s are equal

***This procedure is necessary only when you have summated your categories.

Category   Observed N   Expected N   Residual
Stock      132          66.7         65.3
Bond       52           66.7         -14.7
Future     16           66.7         -50.7
Total      200

Test Statistics
               Category
Chi-Square     105.760a
df             2
Asymp. Sig.    .000
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 66.7.

Conclusion:
❑ Since the χ² test statistic = 105.760 with p-value = .000 (i.e. p < 0.001) < α = 0.05,
❑ reject H0.
❑ It can be concluded that at least one proportion of clients' interest differs across the three investment funds.

Example 2:
The records of an investment banking firm show that, historically, 60% of its clients were primarily interested in the stock market, 36% in the bond market and 4% in the futures market. A recent sample of 200 clients showed that 132 were primarily interested in stocks, 52 in the bond market and 16 in the futures market. Is there sufficient evidence to conclude, at the 1% level of significance, that there has been a shift in the primary interest of clients?
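The χ² value that SPSS reports for a goodness-of-fit test is the Pearson statistic χ² = Σ (O − E)² / E, where each expected count is E = n × (hypothesised proportion), with k − 1 degrees of freedom. As a cross-check on the SPSS output, here is a minimal Python sketch (standard library only) applied to the 132/52/16 client sample, first under equal proportions (Example 1) and then under the specified 60%/36%/4% proportions (Example 2). The function names are hypothetical, and the p-value helper uses the closed-form chi-square tail that holds only for an even number of degrees of freedom, which covers df = 2 here.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an EVEN number of degrees of
    freedom: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!.
    (Odd df needs the incomplete gamma function, which this sketch omits.)"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles even, positive df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total


def chi_square_gof(observed, proportions=None):
    """Pearson goodness-of-fit test. Returns (expected, chi2, df).
    proportions=None means H0 of equal proportions across the k categories."""
    n, k = sum(observed), len(observed)
    if proportions is None:
        proportions = [1.0 / k] * k
    if len(proportions) != k or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("need one proportion per category, summing to 1")
    expected = [n * p for p in proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2, k - 1


observed = [132, 52, 16]  # stock, bond, futures (n = 200)

for label, props in [("Example 1 (equal)", None),
                     ("Example 2 (60/36/4)", [0.60, 0.36, 0.04])]:
    expected, chi2, df = chi_square_gof(observed, props)
    p = chi2_sf_even_df(chi2, df)
    print(label, [round(e, 1) for e in expected], round(chi2, 3), df, p)
```

The printed values should agree with the SPSS tables (χ² = 105.760 and 14.756, both with df = 2; SPSS rounds the very small p-values to .000 and .001). Any expected count below 5 in the printed list would signal that the minimum-expectation assumption is violated.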
H0: $\pi_1 = 0.60,\ \pi_2 = 0.36,\ \pi_3 = 0.04$
H1: Not all $\pi_i$ are equal to the specified values

Descriptive Statistics:
Category   Observed N   Expected N   Residual
Stock      132          120.0        12.0
Bond       52           72.0         -20.0
Future     16           8.0          8.0
Total      200

Test Statistics:
               Category
Chi-Square     14.756a
df             2
Asymp. Sig.    .001
a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell frequency is 8.0.

Conclusion:
▪ Since the χ² test statistic = 14.756 with p-value = .001 < α = 0.01,
▪ reject H0.
▪ It can be concluded that there has been a significant shift in the proportions of clients' preferred investments.

More shoppers do the majority of their grocery shopping on Saturday than on any other day of the week. According to last year's records, the proportions of people who do the majority of their grocery shopping on Saturday were 40% for under 35 years old, 35% for 35 to 54 years old and 25% for over 54 years old. A recent study of 128 shoppers showed that 48 (under 35 years old), 56 (35 to 54 years old) and 24 (over 54) do their grocery shopping on Saturday. Is there sufficient evidence to conclude, at the 5% level of significance, that the percentages doing their grocery shopping on Saturday have shifted from last year?

▪ The chi-square test of independence is also a nonparametric test.
▪ It is also known as a test of association.
▪ It analyzes the relationship between two categorical variables.
▪ The test uses a contingency table (crosstab) to analyze the data.
▪ Examples:
  ▪ Gender vs. Methods of Payment
  ▪ Age Group vs. Types of Sports
  ▪ Geographical Region vs. Size of Company

The test gives a valid result only if the data meet the following assumptions:
▪ It involves TWO CATEGORICAL variables only (binary, nominal or ordinal), together with their frequencies of occurrence.
▪ Independence of observations.
▪ A relatively large sample size.
▪ The expected frequency of every cell is at least 1.
▪ The expected frequency is at least 5 in the majority (at least 80%) of the cells.
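The two expected-frequency rules above can be checked before running the test, because each expected count under H0 is E = (row total × column total) / n. A minimal Python sketch, assuming the crosstab is given as a list of rows of observed counts; the function names are hypothetical, and the data are the gender × method-of-payment counts from the credit card example in this section.

```python
def expected_counts(table):
    """Expected count of every cell under H0 (independence):
    E_ij = row_total_i * col_total_j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_count_rules_met(expected, min_all=1.0, min_most=5.0, share=0.80):
    """Rules of thumb from the assumption list: every expected count >= 1
    and at least 80% of cells with an expected count >= 5.
    Returns (rules_met, smallest_expected, share_of_cells_at_least_5)."""
    cells = [e for row in expected for e in row]
    smallest = min(cells)
    share_ok = sum(e >= min_most for e in cells) / len(cells)
    return smallest >= min_all and share_ok >= share, smallest, share_ok


# Rows: male, female; columns: cash, cheque, credit card
payments = [[30, 105, 180],
            [36, 114, 135]]
expected = expected_counts(payments)
met, smallest, share5 = expected_count_rules_met(expected)
print(met, round(smallest, 2), share5)
```

The smallest expected count for this table is the same 31.35 that SPSS reports in the footnote of its Chi-Square Tests table.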
H0: There is no significant association between A and B (i.e. A and B are independent)
H1: There is a significant association between A and B (i.e. A and B are dependent)

▪ There are two different ways in which your data can be set up.
▪ The format of the data determines how to proceed with the test procedure.

1. Summated frequency data:
▪ The frequency data has already been summated for the various categories.
▪ Each row in the dataset represents a distinct combination of the categories.
▪ The value in the "frequencies" column for a given row is the number of unique subjects with that combination of categories.
▪ You should have three (3) variables: the two categorical variables and the frequency variable.
***Before running the test, we must activate "Weight Cases" and set the frequency variable as the weight.

2. Raw data (each row is a subject):
• Cases represent subjects.
• Each row represents an observation from a unique subject.
• The dataset contains at least two categorical variables (string or numeric).

A credit card company carried out a study to determine whether there is any association between gender and preferred method of payment. A sample of 600 respondents was selected and their responses were classified in the two-way cross-tabulation shown below. At the 5% significance level, is there enough evidence to indicate a significant association between gender and preferred mode of payment?

                 Method of Payment
Gender    Cash   Cheque   Credit card   Total
Male      30     105      180           315
Female    36     114      135           285
Total     66     219      315           600

Hypothesis:
H0: There is no significant association between gender and preferred mode of payment
H1: There is a significant association between gender and preferred mode of payment

▪ In SPSS, the Chi-Square Test of Independence is an option within the Crosstabs procedure.
▪ To create a crosstab and perform a chi-square test of independence, click Analyze > Descriptive Statistics > Crosstabs.
▪ To produce the main test statistic and its significance value.
▪ Optional: to examine the effect size (strength of association).

CROSSTABULATION TABLE:
❑ Shows the distribution of the dataset for each combination of one level from each categorical variable.

Gender * Method Crosstabulation
                                  Method
                                  Cash     Cheque   Credit Card   Total
Gender  Male    Count             30       105      180           315
                % within Gender   9.5%     33.3%    57.1%         100.0%
                % within Method   45.5%    47.9%    57.1%         52.5%
        Female  Count             36       114      135           285
                % within Gender   12.6%    40.0%    47.4%         100.0%
                % within Method   54.5%    52.1%    42.9%         47.5%
Total           Count             66       219      315           600
                % within Gender   11.0%    36.5%    52.5%         100.0%
                % within Method   100.0%   100.0%   100.0%        100.0%

Chi-Square Tests
                               Value    df   Asymptotic Significance (2-sided)
Pearson Chi-Square             5.859a   2    .053
Likelihood Ratio               5.866    2    .053
Linear-by-Linear Association   5.357    1    .021
N of Valid Cases               600
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 31.35.

❑ The value of the test statistic is 5.859.
❑ From the footnote, no cells had an expected count less than 5, so this assumption was met.
❑ The p-value is 0.053 > α = 0.05; thus, do not reject H0.
❑ It can be concluded that there is insufficient evidence of a significant association between gender and preferred mode of payment.

Symmetric Measures
                                  Value   Approximate Significance
Nominal by Nominal   Phi          .099    .053
                     Cramer's V   .099    .053
N of Valid Cases                  600

❑ Phi and Cramér's V are both measures of the strength of association.
❑ They are intended for nominal-by-nominal (or nominal-by-ordinal) variables.
❑ The coefficient value of 0.099 shows that the association between the variables is very weak.
❑ Additionally, this finding is NOT statistically significant; p-value = 0.053 > α = 0.05.

▪ An educator believes that the grades high school students obtain depend on the amount of time they spend listening to music. The data from his survey of 400 students are recorded.
▪ Using a 5% significance level, test whether grades and time spent listening to music are independent.
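For the gender and method-of-payment example, the Pearson statistic, its p-value and Cramér's V in the SPSS output can be reproduced from the observed counts alone, which makes their definitions concrete: χ² = Σ (O − E)² / E over all cells with E = row total × column total / n, df = (r − 1)(c − 1), and V = √(χ² / (n · (min(r, c) − 1))). A minimal standard-library Python sketch; the function names are hypothetical, and the p-value uses exp(−χ²/2), which is the exact chi-square upper tail for df = 2 only.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.
    Returns (chi2, df, n)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V: sqrt(chi2 / (n * (min(r, c) - 1))). For a 2 x c table this
    equals Phi = sqrt(chi2 / n), which is why SPSS shows .099 for both."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


payments = [[30, 105, 180],   # male:   cash, cheque, credit card
            [36, 114, 135]]   # female: cash, cheque, credit card
chi2, df, n = chi_square_independence(payments)
assert df == 2                     # exp(-chi2/2) is the exact tail only for df = 2
p_value = math.exp(-chi2 / 2)
v = cramers_v(chi2, n, len(payments), len(payments[0]))
print(round(chi2, 3), df, round(p_value, 3), round(v, 3))
```

The rounded results should match the SPSS values of 5.859, .053 and .099. For tables with df other than 2, a general chi-square tail function (for example from a statistics library) would be needed in place of exp(−χ²/2).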
Hypothesis:
H0: There is no significant association between grade and the amount of time spent listening to music
H1: There is a significant association between grade and the amount of time spent listening to music

▪ In SPSS, the Chi-Square Test of Independence is an option within the Crosstabs procedure.
▪ To create a crosstab and perform a chi-square test of independence, click Analyze > Descriptive Statistics > Crosstabs.

Hours * Grade Crosstabulation
Count
                      Grade
Hours            A    B    C     D    E    Total
< 5 hours        13   10   11    16   5    55
5 to 10 hrs      20   27   27    19   2    95
11 to 20 hrs     9    27   71    16   32   155
> 20 hours       8    11   41    24   11   95
Total            50   75   150   75   50   400

Chi-Square Tests
                               Value     df   Asymptotic Significance (2-sided)
Pearson Chi-Square             63.830a   12   .000
Likelihood Ratio               67.534    12   .000
Linear-by-Linear Association   13.160    1    .000
N of Valid Cases               400
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 6.88.

❑ The value of the test statistic is 63.830.
❑ From the footnote, no cells had an expected count less than 5, so this assumption was met.
❑ The p-value is .000 (i.e. p < 0.001) < α = 0.05; thus, reject H0.
❑ It can be concluded that there is a significant association between grades and time spent listening to music.

Symmetric Measures
                           Value   Asymptotic Standard Error(a)   Approximate T(b)   Approximate Significance
Ordinal by Ordinal  Gamma  .207    .056                           3.687              .000
N of Valid Cases           400
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

❑ Gamma is a measure of the strength of association, intended for ordinal-by-ordinal variables.
❑ The coefficient value of 0.207 shows that the association between the variables is weak.
❑ Additionally, this finding is statistically significant; p-value = .000 < α = 0.05. Thus, it supports the main chi-square test result.

▪ Students at MMU were surveyed to evaluate the effect of gender and price on purchasing pizza from Pizza Hut.
The students had to decide between ordering from Pizza Hut at a reduced price of RM8.49 and ordering from a different pizzeria. The results are summarized in the following contingency table:

               Pizzeria
Gender    Pizza Hut   Other
Female    5           13
Male      6           12

▪ Using the 0.05 level of significance, is there evidence of a significant difference between males and females in their pizzeria selection?
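For the grades and music-listening example, the Pearson statistic (63.830, df = 12) and Goodman-Kruskal Gamma (0.207) can likewise be reproduced from the Hours * Grade counts. Gamma compares concordant pairs C (two students ordered the same way on hours and on grade) with discordant pairs D (ordered in opposite ways): γ = (C − D) / (C + D). It therefore only makes sense when both the row and column categories are in their natural order, here hours from least to most and grades from A to E as laid out in the SPSS table. A minimal standard-library Python sketch; the function names are hypothetical, and it does not reproduce the asymptotic standard error or the T value.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and df for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum((table[i][j] - r * c / n) ** 2 / (r * c / n)
               for i, r in enumerate(row_totals)
               for j, c in enumerate(col_totals))
    return chi2, (len(row_totals) - 1) * (len(col_totals) - 1)


def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table whose rows and columns are both
    in ascending order. Pairs tied on either variable are ignored."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # only rows strictly below
                for m in range(n_cols):
                    if m > j:                   # higher on both variables
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                 # higher on one, lower on other
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Rows: <5, 5 to 10, 11 to 20, >20 hours; columns: grades A, B, C, D, E
hours_by_grade = [[13, 10, 11, 16, 5],
                  [20, 27, 27, 19, 2],
                  [9, 27, 71, 16, 32],
                  [8, 11, 41, 24, 11]]
chi2, df = pearson_chi_square(hours_by_grade)
gamma = goodman_kruskal_gamma(hours_by_grade)
print(round(chi2, 3), df, round(gamma, 3))
```

With the columns ordered A to E, the positive Gamma means that longer listening times tend to go with grades further down that ordering; reversing the column order would flip the sign but not the strength.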