Uploaded by Yong Ler Kee

BFN2254 Topic 5

BFN2254
FINANCIAL STATISTICAL ANALYSIS
TRIMESTER 1 SESSION 2021/2022
▪ The chi-square goodness-of-fit is a nonparametric test.
▪ It is also known as Pearson’s chi-square goodness-of-fit.
▪ A goodness-of-fit (GOF) test is used to find out whether the observed
values of a given phenomenon are significantly different from the
expected values.
▪ The term goodness of fit refers to comparing the observed
sample distribution with the expected probability distribution.
▪ The proportion of cases expected in each group of the categorical
variable can be EQUAL or UNEQUAL. This distinction is critical: it is an
important aspect of your research design and, from a practical
perspective, it determines how you carry out the test in SPSS and
how you interpret the results.
▪ The null hypothesis assumes that there is no
significant difference between the observed and
the expected values.
▪ The alternative hypothesis assumes that there is a
significant difference between the observed and
the expected values.
Objectives (Equal vs Unequal Proportions):
▪ To test for differences in proportion among
two or more categorical responses (variables) of
independent populations; refer to Example 1.
▪ To compare the expected (theoretical) frequencies of
categories from a population distribution with the
observed (actual) frequencies from a sample distribution;
refer to Example 2.
This test gives a valid result if the data meet the following
assumptions:
▪ The data involve ONE CATEGORICAL variable (binary, nominal or ordinal)
only, together with its frequencies of occurrence.
▪ The sample was randomly and independently drawn from the
population.
▪ The groups of the categorical variable are mutually exclusive.
▪ Each category has an expected frequency of at least five; this is
shown in the SPSS output when the test is run.
H0: π₁ = π₂ = … = πₖ
(The proportions are equal)
H1: Not all π's are equal
(At least one proportion is different)
H0: π₁ = a; π₂ = b; …; πᵢ = k
H1: Not all π's are equal to the specified values.
▪ Example: The frequencies of 1000 data values based on three (3) types of gifts
can be set up in two ways:
1. The frequency data have already been summarized for the various
categories.
2. The data are raw (the frequencies for each group have not yet been
summarized).
Example 1:
▪ The records of an investment banking firm show that its clients are
equally distributed among those primarily interested in the stock market,
the bonds market, and the futures market.
▪ A recent sample of 200 clients showed that 132 were primarily
interested in stocks, 52 in the bonds market and 16 in the futures
market.
▪ Test whether there is a significant difference in the proportions of
clients' primary interest across the three types of investment funds.
H0: π(stock) = π(bonds) = π(futures)
H1: Not all π's are equal
***This procedure is necessary only when the frequencies have already
been summarized for each category.
Category    Observed N    Expected N    Residual
Stock          132           66.7         65.3
Bond            52           66.7        -14.7
Future          16           66.7        -50.7
Total          200
Test Statistics (Category)
Chi-Square     105.760(a)
df             2
Asymp. Sig.    .000
a. 0 cells (0.0%) have expected frequencies less than 5. The
minimum expected cell frequency is 66.7.
Conclusion:
❑ Since the χ² test statistic = 105.760 with p-value = 0.000
(as displayed by SPSS, i.e. p < 0.001) < α = 0.05, reject H0.
❑ It can be concluded that at least one proportion of clients'
interest is different across the three investment funds.
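The SPSS figures above can be checked by hand. The following is a minimal sketch in Python (the slides themselves use SPSS only), assuming the usual Pearson statistic, χ² = Σ (O − E)² / E with df = k − 1. The function name gof_chi_square is made up for illustration. For df = 2 the chi-square upper-tail probability is exactly exp(−χ²/2), so no statistics library is needed.

```python
import math

def gof_chi_square(observed, proportions):
    """Pearson goodness-of-fit statistic for hypothesised category proportions."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected

# Example 1: 200 clients, equal proportions under H0 (1/3 each).
observed = [132, 52, 16]             # stocks, bonds, futures
statistic, expected = gof_chi_square(observed, [1 / 3] * 3)
df = len(observed) - 1
p_value = math.exp(-statistic / 2)   # exact chi-square tail, valid only for df = 2

print(round(statistic, 3), df, p_value)
```

The p-value is far below 0.001, which is why SPSS displays it as .000.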
Example 2:
The records of an investment banking firm show that, historically,
60% of its clients were primarily interested in the stock market,
36% in the bonds market, and 4% in the futures market.
A recent sample of 200 clients showed that 132 were primarily
interested in stocks, 52 in the bonds market and 16 in the futures
market.
Is there sufficient evidence to conclude, at the 1% level of
significance, that there has been a shift in the primary interest of
clients?
H0: π₁ = 0.60, π₂ = 0.36, π₃ = 0.04
H1: Not all π's are equal to the
specified values
Descriptive Statistics:
Category    Observed N    Expected N    Residual
Stock          132          120.0         12.0
Bond            52           72.0        -20.0
Future          16            8.0          8.0
Total          200
Test Statistics (Category):
Chi-Square     14.756(a)
df             2
Asymp. Sig.    .001
a. 0 cells (0.0%) have expected frequencies less than 5. The
minimum expected cell frequency is 8.0.
Conclusion:
▪ Since the χ² test statistic = 14.756 with p-value = 0.001 < α = 0.01,
reject H0.
▪ It can be concluded that there is a significant change in the
proportions of clients' preferred investments.
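When the hypothesised proportions are unequal, each expected count is simply n times its own proportion. The sketch below, written in Python for illustration and not taken from the course material, rebuilds the residuals for Example 2 and shows how much each category contributes to the statistic. As before, exp(−χ²/2) is the exact tail probability only because df = 2.

```python
import math

categories = ["Stock", "Bond", "Future"]
observed = [132, 52, 16]
hypothesised = [0.60, 0.36, 0.04]    # historical proportions under H0

n = sum(observed)
expected = [n * p for p in hypothesised]
residuals = [o - e for o, e in zip(observed, expected)]
contributions = [r ** 2 / e for r, e in zip(residuals, expected)]

statistic = sum(contributions)
p_value = math.exp(-statistic / 2)   # exact tail probability for df = 2 only

for row in zip(categories, observed, expected, residuals, contributions):
    print(row)
print(round(statistic, 3), round(p_value, 4), p_value < 0.01)
```

The futures category has the smallest expected count and makes the largest single contribution to the statistic.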
▪ More shoppers do the majority of their grocery shopping on Saturday
than on any other day of the week. According to last year's records, the
people who do the majority of their grocery shopping on Saturday were
distributed across three age groups as follows: 40% under 35 years old,
35% aged 35 to 54, and 25% over 54 years old.
▪ A recent study of 128 shoppers showed that 48 (under 35 years
old), 56 (35 to 54 years old) and 24 (over 54) do their grocery shopping
on Saturday.
▪ Is there sufficient evidence to conclude, at the 5% level of significance,
that there has been a shift in the percentages of Saturday grocery
shoppers since last year?
▪ The chi-square test of independence is a nonparametric test.
▪ It is also known as a Test of Association.
▪ It is used to analyze the relationship between two categorical
variables.
▪ This test uses a contingency table (crosstab) to analyze the
data.
▪ Examples:
▪ Gender vs. Methods of Payment
▪ Age Group vs. Types of Sports
▪ Geographical Region vs. Size of Company
This test gives a valid result if the data meet the following
assumptions:
▪ The data involve TWO CATEGORICAL variables (binary, nominal or
ordinal) only, together with their frequencies of occurrence.
▪ The observations are independent.
▪ The sample size is relatively large.
▪ The expected frequency for each cell is at least 1.
▪ The expected frequencies are at least 5 for the majority (80%) of
the cells.
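The two expected-frequency rules can be checked before running the test. The sketch below is in Python for illustration and assumes the standard formula for an expected cell count under independence (row total × column total / n). The helper names and the small 2 × 2 table are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def check_expected_counts(table):
    """Check both rules of thumb: every cell >= 1 and at least 80% of cells >= 5."""
    cells = [e for row in expected_counts(table) for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return min(cells) >= 1 and share_at_least_5 >= 0.80

# Hypothetical 2 x 2 table, made up for illustration only.
small_table = [[3, 2], [2, 40]]
print(expected_counts(small_table))
print(check_expected_counts(small_table))
```

SPSS reports the same information in the footnote under the Chi-Square Tests table.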
H0: There is no significant association between
A and B (i.e. independent)
H1: There is a significant association between A
and B (i.e. dependent)
▪ There are two different ways in which your data can be set up.
▪ The format of the data determines how to proceed with the test procedure.
1. Summarized data (each row is a combination of categories):
▪ The frequency data have already been summarized for the various
categories.
▪ Each row in the dataset represents a distinct combination of the
categories.
▪ The value in the “frequencies” column for a given row is the number
of unique subjects with that combination of categories.
▪ You should have three (3) variables: the two categorical variables
and the frequency.
***Before running the test, we must activate “Weight cases” and set the
frequency variable as the weight.
2. Raw data (each row is a subject):
• Cases represent subjects.
• Each row represents an observation from a unique subject.
• The dataset contains at least two categorical variables (string or
numeric).
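The relationship between the two layouts can be sketched in a few lines of Python (illustration only; the six respondents below are hypothetical). Counting the raw rows gives the summarized layout, and repeating each combination by its frequency recovers the raw layout, which is in effect what weighting cases by the frequency variable achieves.

```python
from collections import Counter

# Hypothetical raw data: one (gender, payment) pair per respondent.
raw = [
    ("Male", "Cash"), ("Female", "Cheque"), ("Male", "Credit card"),
    ("Female", "Cash"), ("Male", "Credit card"), ("Female", "Cheque"),
]

# Summarized format: one row per combination plus its frequency
# (the column that would be used as the weight).
frequencies = Counter(raw)
for (gender, payment), count in sorted(frequencies.items()):
    print(gender, payment, count)

# Expanding the summarized rows again recovers one row per subject.
expanded = [combo for combo, count in frequencies.items() for _ in range(count)]
```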
A credit card company carried out a study to determine if there is any
association between customers' gender and their preferred method of payment. A
sample of 600 respondents was selected and their responses were
classified in the two-way cross tabulation shown below.
At the 5% significance level, is there enough evidence to indicate a
significant association between gender and preferred mode of payment?
            Method of Payment
Gender      Cash    Cheque    Credit card    Total
Male          30      105         180          315
Female        36      114         135          285
Total         66      219         315          600
Hypothesis:
H0: There is no significant association between gender
and preferred mode of payment
H1: There is a significant association between gender
and preferred mode of payment
▪ In SPSS, the Chi-Square Test of Independence is an option within the Crosstabs
procedure.
▪ To create a crosstab and perform a chi-square test of independence,
click Analyze > Descriptive Statistics > Crosstabs.
▪ In the Statistics dialog, select Chi-square to produce the main test
statistic and its significance value.
▪ Optional: select Phi and Cramer's V to examine the effect
size (strength of association).
CROSSTABULATION TABLE:
❑ Shows the distribution of the data for each combination of
one level from each categorical variable.
Gender * Method Crosstabulation
                              Method
Gender                        Cash     Cheque    Credit Card    Total
Male      Count                30       105         180          315
          % within Gender      9.5%     33.3%       57.1%       100.0%
          % within Method     45.5%     47.9%       57.1%        52.5%
Female    Count                36       114         135          285
          % within Gender     12.6%     40.0%       47.4%       100.0%
          % within Method     54.5%     52.1%       42.9%        47.5%
Total     Count                66       219         315          600
          % within Gender     11.0%     36.5%       52.5%       100.0%
          % within Method    100.0%    100.0%      100.0%       100.0%
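The two percentage rows differ only in the denominator. A short Python sketch (for illustration; SPSS produces these figures through the Cells options) makes the distinction explicit using the counts from the example.

```python
counts = {
    "Male":   {"Cash": 30, "Cheque": 105, "Credit card": 180},
    "Female": {"Cash": 36, "Cheque": 114, "Credit card": 135},
}
methods = ["Cash", "Cheque", "Credit card"]

row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {m: sum(counts[g][m] for g in counts) for m in methods}

# "% within Gender" divides by the row total; "% within Method" by the column total.
pct_within_gender = {
    g: {m: 100 * counts[g][m] / row_totals[g] for m in methods} for g in counts
}
pct_within_method = {
    g: {m: 100 * counts[g][m] / col_totals[m] for m in methods} for g in counts
}

print(round(pct_within_gender["Male"]["Credit card"], 1))
print(round(pct_within_method["Female"]["Cash"], 1))
```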
Chi-Square Tests
                                 Value       df    Asymptotic Significance (2-sided)
Pearson Chi-Square               5.859(a)     2    .053
Likelihood Ratio                 5.866        2    .053
Linear-by-Linear Association     5.357        1    .021
N of Valid Cases                 600
a. 0 cells (0.0%) have expected count less
than 5. The minimum expected count is 31.35.
❑ The value of the test statistic is 5.859.
❑ From the footnote, no cells had an expected count less than 5, so
this assumption was met.
❑ The p-value is 0.053 > α = 0.05; thus, do not reject H0.
❑ There is insufficient evidence to conclude that there is a
significant association between gender and preferred mode of payment.
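The Pearson value in the SPSS output can be reproduced from the observed counts. This is a minimal Python sketch, assuming the standard statistic Σ (O − E)² / E with E = row total × column total / n and df = (r − 1)(c − 1). The function name is hypothetical, and the exp(−χ²/2) shortcut for the p-value holds only because df = 2 here.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df

# Rows: Male, Female. Columns: Cash, Cheque, Credit card.
table = [[30, 105, 180], [36, 114, 135]]
statistic, df = chi_square_independence(table)
p_value = math.exp(-statistic / 2)   # exact chi-square tail, valid only for df = 2
print(round(statistic, 3), df, round(p_value, 3))
```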
Symmetric Measures
                                    Value    Approximate Significance
Nominal by Nominal    Phi           .099     .053
                      Cramer's V    .099     .053
N of Valid Cases                    600
❑ Phi and Cramer's V are both measures of the strength of association.
❑ They are used particularly for nominal vs nominal OR nominal vs ordinal variables.
❑ The coefficient value of 0.099 shows that the strength of
association between the variables is very weak.
❑ Additionally, this finding is NOT statistically significant; p-value = 0.053 > α = 0.05.
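Both coefficients follow directly from the chi-square statistic. A short Python check, assuming the standard definitions φ = √(χ²/n) and V = √(χ² / (n · min(r − 1, c − 1))), is sketched here; the variable names are chosen for illustration.

```python
import math

chi_square = 5.859     # Pearson chi-square from the SPSS output
n = 600                # number of valid cases
rows, cols = 2, 3      # gender (2 levels) by method of payment (3 levels)

phi = math.sqrt(chi_square / n)
cramers_v = math.sqrt(chi_square / (n * min(rows - 1, cols - 1)))
print(round(phi, 3), round(cramers_v, 3))
```

Because one variable has only two levels, min(r − 1, c − 1) = 1 and the two coefficients coincide, which is why SPSS reports the same value for both.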
▪ An educator has an opinion that the grades high
school students make depend on the amount of time
they spend listening to music. The data from his
survey on 400 students are recorded.
▪ Using a 5% significance level, test whether grades
and time spent listening to music are independent
or not.
Hypothesis:
H0: There is no significant association between grade and
amount of time spent listening to music
H1: There is a significant association between grade and
amount of time spent listening to music
▪ In SPSS, the Chi-Square Test of Independence is an option within the Crosstabs
procedure.
▪ To create a crosstab and perform a chi-square test of independence,
click Analyze > Descriptive Statistics > Crosstabs.
Hours * Grade Crosstabulation (Count)
                  Grade
Hours             A     B     C     D     E     Total
<5 hours         13    10    11    16     5      55
5to10hrs         20    27    27    19     2      95
11to20hrs         9    27    71    16    32     155
>20hours          8    11    41    24    11      95
Total            50    75   150    75    50     400
Chi-Square Tests
                                 Value        df    Asymptotic Significance (2-sided)
Pearson Chi-Square               63.830(a)    12    .000
Likelihood Ratio                 67.534       12    .000
Linear-by-Linear Association     13.160        1    .000
N of Valid Cases                 400
a. 0 cells (0.0%) have expected count less than
5. The minimum expected count is 6.88.
❑ The value of the test statistic is 63.830.
❑ From the footnote, no cells had an expected count less than 5, so
this assumption was met.
❑ The p-value is 0.000 (as displayed by SPSS, i.e. p < 0.001) < α = 0.05;
thus, reject H0.
❑ It can be concluded that there is a significant association between
grades and time spent listening to music.
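The same calculation scales to the larger 4 × 5 table. The Python sketch below is for illustration only. It uses a closed form for the chi-square upper tail that applies when df is even, exp(−x/2) · Σ (x/2)^i / i! for i < df/2, so that no external library is assumed. The helper name is hypothetical.

```python
import math

def chi_square_sf_even_df(x, df):
    """Upper-tail chi-square probability for even df (closed-form Poisson sum)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only covers even, positive df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# Rows: <5, 5 to 10, 11 to 20, >20 hours. Columns: grades A to E.
table = [
    [13, 10, 11, 16, 5],
    [20, 27, 27, 19, 2],
    [9, 27, 71, 16, 32],
    [8, 11, 41, 24, 11],
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all 20 cells.
statistic = sum(
    (obs - r * c / n) ** 2 / (r * c / n)
    for row, r in zip(table, row_totals)
    for obs, c in zip(row, col_totals)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
p_value = chi_square_sf_even_df(statistic, df)
print(round(statistic, 3), df, p_value)
```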
Symmetric Measures
                               Value    Asymptotic           Approximate    Approximate
                                        Standard Error(a)    T(b)           Significance
Ordinal by Ordinal    Gamma    .207     .056                 3.687          .000
N of Valid Cases               400
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
❑ Gamma is a measure of the strength of association, particularly for
ordinal vs ordinal variables.
❑ The coefficient value of 0.207 shows that the strength of
association between the variables is weak.
❑ Additionally, this finding is statistically significant; p-value = 0.000 <
α = 0.05. Thus, it can be used to support the chi-square main test result.
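Gamma can be computed from the crosstab by counting concordant and discordant pairs. The Python sketch below assumes the Goodman-Kruskal definition, gamma = (C − D) / (C + D), with rows ordered from fewest to most hours and columns ordered from grade A to grade E. The function name is made up for illustration.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) from concordant (C) and discordant (D) pairs."""
    concordant = discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair cell (i, j) with every cell in a later row.
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:        # later row and later column: concordant
                        concordant += table[i][j] * table[k][l]
                    elif l < j:      # later row but earlier column: discordant
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)

# Rows: <5, 5 to 10, 11 to 20, >20 hours. Columns: grades A to E.
table = [
    [13, 10, 11, 16, 5],
    [20, 27, 27, 19, 2],
    [9, 27, 71, 16, 32],
    [8, 11, 41, 24, 11],
]
gamma = goodman_kruskal_gamma(table)
print(round(gamma, 3))
```

The sign depends on how the categories are ordered. With the ordering assumed here, a positive gamma means that more listening hours tend to go with grades further along the A to E scale.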
▪ Students in MMU were surveyed in order to evaluate the effect of
gender and price on purchasing pizza from Pizza Hut. The students
had to decide between ordering from Pizza Hut at a reduced price of
RM8.49 or ordering from a different pizzeria. The results are
summarized in the following contingency table:
              Gender
Pizzeria      Female    Male
Pizza Hut        5        6
Other           13       12
▪ Using the 0.05 level of significance, is there evidence of a significant
difference between males and females in their pizzeria selection?