CHAPTER IV: CORRELATIONAL ANALYSIS
Topic Outline:
1. Introduction
2. Hypothesis Testing for Correlation
3. Pearson Product-Moment Correlation (Pearson r)
4. Spearman Rank Correlation (Spearman rho, rs)
5. Gamma Correlation (G)
6. Point-Biserial Correlation (rpbi)
7. Lambda Correlation (λ)
8. Chi-Square (χ²) Tests
Learning Outcomes:
At the end of the unit, the students must have:
1. discussed the conditions imposed by each measure of relationship/association;
2. computed and interpreted each measure of relationship;
3. performed hypothesis testing involving each of the measures of relationship; and
4. differentiated multiple from partial correlation.
STAT 201: Statistical Methods I
Prepared by:
Prof. Jeanne Valerie Agbayani-Agpaoa
Dr. Virgilio Julius P. Manzano, Jr.
Engr. Lawrence John C. Tagata
TOPIC 1: INTRODUCTION
A measure of correlation or relationship is used to find the amount and degree of relationship, or the absence of a relationship, between two sets of values, characteristics, or variables. This relationship is expressed by a factor called the coefficient of correlation. It may be expressed as an abstract number or as a percent, and it is the ratio of the two values, series of values, or variables being compared.

Correlation is a measure of the degree of relationship between paired data. Statistical research often aims to establish a relationship between paired variables so that the researcher can predict one variable in terms of the other. For example, high grades in Science and English tend to go with high grades in Mathematics. In other instances the relationship may be weak or absent altogether; the bulk of candy sales, for example, tends to be unrelated to the crime rate in a particular place. It must be remembered that correlation does not determine cause and effect; it merely measures the strength of the relationship between paired data.
Simple correlation is amenable to either ungrouped or grouped data on nominal, ordinal, or interval scales. Usually, however, rank correlation is aptly applied to ordinal data when the number of items or cases is rather small (less than 30).
The term correlation refers to the association between two or more statistical series of values. The coefficient of correlation shows the extent to which two variables are related and the extent to which variations in one group of data go with the variations in the other. It is a single number that tells us to what extent two values are related. It can vary from +1.00, which means perfect positive correlation, through 0, which means no correlation at all, to -1.00, which means perfect negative correlation.
Perfect positive correlation refers to a direct relationship between two sets of data in which any increase in the values of the first set is accompanied by a corresponding increase in the second set. When correlation is negative, an inverse behaviour of the data is observed: an increase in the values of the first set of data goes with a decrease in the values of the second set, or vice versa. When the two sets of data being correlated show minimal or no systematic change together, there is little or no correlation at all.
The coefficient of correlation does not directly give anything like a percentage of relationship. It cannot be concluded that a correlation value of 0.50 indicates twice the relationship indicated by a correlation value of 0.25. A coefficient of correlation is an index number, not a measurement on an interval scale. Moreover, we cannot compute a coefficient of correlation from just two measurements on one person alone.
The coefficient of correlation has several uses:
1. It indicates the amount of agreement between scores on any two sets of data. It is an index of the predictive value of a test.
2. It is a form of reliability coefficient, which can be obtained by correlating the scores on two alternate or parallel forms of the same test.
3. The correlation value is always relative to the situation under which it is obtained and should be interpreted in the light of those circumstances. Its size does not represent an absolute natural fact.
Below is a guide in interpreting the coefficient of correlation:

-1.00            Perfect Negative Correlation
-0.99 to -0.75   Very High Negative Correlation
-0.74 to -0.50   High Negative Correlation
-0.49 to -0.25   Moderately Small Negative Correlation
-0.24 to -0.01   Very Small Negative Correlation
 0               No Correlation
+0.01 to +0.24   Very Small Positive Correlation
+0.25 to +0.49   Moderately Small Positive Correlation
+0.50 to +0.74   High Positive Correlation
+0.75 to +0.99   Very High Positive Correlation
+1.00            Perfect Positive Correlation
Anybody who wants to interpret a coefficient of correlation should be guided by the following reminders:
1. A relationship between two variables does not necessarily mean that one is the cause or the effect of the other. Correlation does not imply a cause-and-effect relationship.
2. When the computed r is high, it does not necessarily mean that one factor is strongly dependent on the other. This is shown by the height and intelligence of people: computing a correlation here does not make any sense at all. On the other hand, when the computed r is small, it does not necessarily mean that one factor has no dependence on the other. This may be applicable to IQ and grades in school, where a low grade may simply suggest that a student did not make good use of his time in studying.
3. If there is reason to believe that the two variables are related and the computed r is high, the two variables are very likely associated. On the other hand, if the computed correlation is low even though the variables are theoretically related, other factors might be responsible for the small association.
4. Lastly, the correlation coefficient simply informs us that when two variables change together, the relationship between them may be strong or weak.
TOPIC 2: HYPOTHESIS TESTING FOR CORRELATION

It is often useful to test the hypotheses

Ho: ρ = 0 (There is no significant relationship)
Ha: ρ ≠ 0 (There is a significant relationship)

The test statistic for zero correlation is

t₀ = r√(n − 2) / √(1 − r²)

which has the t distribution with n − 2 degrees of freedom if Ho: ρ = 0 is true. Therefore, we reject the null hypothesis if

|t₀| > t(α/2, n−2)
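As an illustration (a minimal sketch, not part of the original module), this test can be carried out in Python. It assumes scipy is available, and the helper name correlation_t_test is made up for this example:

```python
import math
from scipy import stats

def correlation_t_test(r, n, alpha=0.05):
    """Test Ho: rho = 0 against Ha: rho != 0 for a sample correlation r
    computed from n paired observations."""
    # Test statistic with n - 2 degrees of freedom
    t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
    # Two-tailed critical value t(alpha/2, n-2)
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return t0, t_crit, abs(t0) > t_crit

# Example: the Pearson r = 0.9596 from Topic 3, with n = 10 pairs
t0, t_crit, reject = correlation_t_test(0.9596, 10)
print(f"t0 = {t0:.4f}, critical value = {t_crit:.4f}, reject Ho: {reject}")
```

For the Topic 3 data this gives t₀ ≈ 9.65 against a critical value of about 2.306, so the correlation is significant at the 0.05 level.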
TOPIC 3: PEARSON PRODUCT MOMENT COEFFICIENT OF CORRELATION

This is a linear correlation coefficient used to find the degree of association of two sets of variables, x and y. It is the most commonly used measure of correlation for determining the relationship between two sets of variables quantitatively.

For any two variables x and y, the correlation coefficient between them can be determined using the Pearson Product Moment Coefficient of Correlation:

r_xy = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}
Example: Consider the values of x and y in the descriptive problem, “What is the relationship between the NSAT percentile rank and the scholastic rating of BS Physics students in selected universities and colleges in a certain region?”

Student   NSAT Percentile Rank, x   Scholastic Rating, y   x²       y²       xy
1         60                        78                     3,600    6,084    4,680
2         73                        87                     5,329    7,569    6,351
3         61                        80                     3,721    6,400    4,880
4         70                        86                     4,900    7,396    6,020
5         75                        87                     5,625    7,569    6,525
6         79                        90                     6,241    8,100    7,110
7         65                        85                     4,225    7,225    5,525
8         67                        84                     4,489    7,056    5,628
9         77                        89                     5,929    7,921    6,853
10        80                        90                     6,400    8,100    7,200
Totals    707                       856                    50,459   73,420   60,772

r_xy = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}
     = [(10)(60,772) − (707)(856)] / √{[(10)(50,459) − 707²][(10)(73,420) − 856²]}

r_xy = 0.9596
Interpretation:
The obtained r_xy value of 0.9596 denotes a very high positive correlation. This means that the higher the NSAT percentile rank, the higher the scholastic rating of the BS Physics students.
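For readers who want to verify the computation by machine, here is a minimal Python sketch of the same calculation, assuming numpy and scipy are installed:

```python
import numpy as np
from scipy import stats

# NSAT percentile ranks (x) and scholastic ratings (y) from the example
x = np.array([60, 73, 61, 70, 75, 79, 65, 67, 77, 80])
y = np.array([78, 87, 80, 86, 87, 90, 85, 84, 89, 90])

# Direct application of the computational formula above
n = len(x)
r = (n * np.sum(x * y) - x.sum() * y.sum()) / np.sqrt(
    (n * np.sum(x**2) - x.sum()**2) * (n * np.sum(y**2) - y.sum()**2)
)
print(f"r_xy = {r:.4f}")   # 0.9596

# Cross-check against scipy's built-in, which also reports a p-value
r_scipy, p_value = stats.pearsonr(x, y)
print(f"scipy: r = {r_scipy:.4f}, p = {p_value:.6f}")
```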
TOPIC 4: SPEARMAN RANK CORRELATION COEFFICIENT OR SPEARMAN RHO (rs)

The Spearman rho coefficient of correlation is a statistic used to measure the relationship of paired ranks assigned to individual scores on two variables. It estimates the degree of association of two sets of variables measured in at least an ordinal scale (first, second, third, and so on), so that the subjects under study may be ranked in two ordered series. It is commonly computed from the disarray ΣD², where the coefficient of rank correlation has a value of +1 when paired ranks are in the same order and a value of -1 when paired ranks are in the reverse order.

Spearman rho is the most widely used of the rank correlation methods. It is much easier and, therefore, faster to compute. It is intended for 30 cases or fewer.

To obtain the Spearman rho (rs), consider the formula:

rs = 1 − (6ΣD²) / (n³ − n)

where:
rs   = Spearman rho
ΣD²  = sum of the squared differences between ranks
n    = number of cases/measurements
Example: Consider the specific problem: “What is the rank relationship between capital and profit of light bulbs?”

      Capital, x   Profit, y   Rx   Ry    D = |Rx − Ry|   D²
1     20,000       5,000       6    7     1               1
2     50,000       15,000      3    3.5   0.5             0.25
3     10,000       3,000       9    9.5   0.5             0.25
4     100,000      30,000      2    2     0               0
5     15,000       4,000       7    8     1               1
6     25,000       9,000       5    5     0               0
7     11,000       6,000       8    6     2               4
8     150,000      70,000      1    1     0               0
9     5,000        3,000       10   9.5   0.5             0.25
10    40,000       15,000      4    3.5   0.5             0.25
TOTAL                                                     7.0

rs = 1 − (6ΣD²)/(n³ − n) = 1 − (6 × 7)/(10³ − 10) = 0.9576
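A Python sketch of the same computation follows, assuming numpy and scipy are available. One caveat worth hedging: when ties are present, scipy's spearmanr applies Pearson's formula to the ranks rather than the 6ΣD² shortcut, so its answer differs slightly from the value above.

```python
import numpy as np
from scipy import stats

capital = [20000, 50000, 10000, 100000, 15000, 25000, 11000, 150000, 5000, 40000]
profit  = [5000, 15000, 3000, 30000, 4000, 9000, 6000, 70000, 3000, 15000]

# Rank from highest (rank 1) to lowest, averaging ties, as in the table.
# scipy ranks ascending, so ranking the negated values gives descending ranks.
rx = stats.rankdata([-c for c in capital])
ry = stats.rankdata([-p for p in profit])

d2 = np.sum((rx - ry) ** 2)          # sum of squared rank differences (7.0)
n = len(capital)
rs = 1 - 6 * d2 / (n**3 - n)         # the shortcut formula used above
print(f"sum D^2 = {d2}, r_s = {rs:.4f}")   # 0.9576

# With tied ranks, scipy.stats.spearmanr uses Pearson's r on the ranks
# instead of the shortcut, so it returns a slightly different ~0.9573.
rho, p_value = stats.spearmanr(capital, profit)
print(f"scipy spearmanr: {rho:.4f}")
```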
TOPIC 5: GAMMA (G)

An alternative to the rank-order correlation coefficient is Goodman and Kruskal's gamma (G). The value of one variable can be estimated or predicted from the other variable when you have knowledge of their values. Gamma can also be used when ties are found in the ranking of the data. The formula for gamma is:

G = (Ns − Ni) / (Ns + Ni)

where:
G   = the difference between the proportion of pairs ordered in the parallel direction and the proportion of pairs ordered in the opposite direction
Ns  = the number of pairs ordered in the parallel direction
Ni  = the number of pairs ordered in the opposite direction
Example: Compute the gamma for the data shown below.

                        Educational Status
Socio-Economic Status   Upper   Middle   Lower
Upper                   24      19       5
Middle                  12      54       29
Lower                   9       26       25
Solution:
Step 1. Arrange the ordering for one of the two characteristics from the highest to the lowest (or vice versa) from top to bottom through the rows, and for the other characteristic from the highest to the lowest (or vice versa) from left to right through the columns.
Step 2. Compute Ns by multiplying the frequency in every cell by the sum of the frequencies in all of the cells that are both to the right of and below the original cell, and then sum up the products obtained.

Ns = 24(54 + 29 + 26 + 25) + 19(29 + 25) + 12(26 + 25) + 54(25)
Ns = 6,204

Step 3. To solve for Ni, simply reverse the process described in Step 2. Multiply the frequency of every cell by the sum of the frequencies in all of the cells that are both to the left of and below the original cell, and then sum up the products obtained.

Ni = 19(12 + 9) + 5(12 + 54 + 9 + 26) + 54(9) + 29(9 + 26)
Ni = 2,405

Step 4. Apply the gamma formula.

G = (Ns − Ni)/(Ns + Ni) = (6,204 − 2,405)/(6,204 + 2,405) = 0.4413
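The pair-counting logic of Steps 2 and 3 can be sketched in Python, assuming numpy is available; the helper name goodman_kruskal_gamma is illustrative:

```python
import numpy as np

# 3x3 cross-tabulation: rows and columns both ordered from Upper to Lower
table = np.array([[24, 19, 5],
                  [12, 54, 29],
                  [ 9, 26, 25]])

def goodman_kruskal_gamma(t):
    """Count concordant (Ns) and discordant (Ni) pairs,
    then apply G = (Ns - Ni) / (Ns + Ni)."""
    rows, cols = t.shape
    ns = ni = 0
    for i in range(rows):
        for j in range(cols):
            # cells strictly below and to the right: concordant pairs
            ns += t[i, j] * t[i + 1:, j + 1:].sum()
            # cells strictly below and to the left: discordant pairs
            ni += t[i, j] * t[i + 1:, :j].sum()
    return ns, ni, (ns - ni) / (ns + ni)

ns, ni, g = goodman_kruskal_gamma(table)
print(f"Ns = {ns}, Ni = {ni}, G = {g:.4f}")   # Ns = 6204, Ni = 2405, G = 0.4413
```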
TOPIC 6: CORRELATION BETWEEN AN INTERVAL AND NOMINAL DATA: THE POINT BISERIAL COEFFICIENT OF CORRELATION (rpbi)

There are instances when you are interested in getting the degree of relationship between two variables where one variable is continuous (e.g., test scores) and the other is dichotomous (e.g., gender). A question, perhaps, is “Is gender related to intelligence?” In this case, the most appropriate statistical technique is the point biserial correlation, rpbi. The formula is:

rpbi = [(x̄₁ − x̄₂) / sdy] √(pq)

where:
rpbi = point biserial coefficient of correlation
x̄₁   = mean score of group 1
x̄₂   = mean score of group 2
p    = proportion of group 1
q    = proportion of group 2
sdy  = standard deviation of all the scores
Example:
A researcher wishes to determine if a significant relationship exists between the sex of workers performing an electronics assembly task and the length of time they have been performing it. The independent variable is the question which asks “What is your sex, male or female?” (dichotomous). The dependent variable is from the question that asks “How many years have you been performing the task?” (ratio).

Respondent   Sex   Number of Years
1            M     10
2            M     11
3            M     6
4            M     11
5            F     4
6            F     3
7            M     12
8            F     2
9            F     2
10           F     1

Mean (Males) = 10.0; Mean (Females) = 2.4; Standard Deviation of all scores = 4.37

rpbi = [(x̄₁ − x̄₂)/sdy] √(pq) = [(10 − 2.4)/4.37] √((5/10)(5/10)) = 0.8696
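A Python sketch of this computation follows, assuming numpy and scipy. Note a convention worth hedging: the worked example uses the sample standard deviation (ddof = 1, about 4.37), while scipy's pointbiserialr, which is Pearson's r on a 0/1 coding, corresponds to the population standard deviation and therefore returns a somewhat larger value.

```python
import numpy as np
from scipy import stats

sex   = np.array(['M', 'M', 'M', 'M', 'F', 'F', 'M', 'F', 'F', 'F'])
years = np.array([10, 11,  6, 11,  4,  3, 12,  2,  2,  1])

males, females = years[sex == 'M'], years[sex == 'F']
p, q = len(males) / len(years), len(females) / len(years)

# The worked example uses the sample standard deviation of all scores (~4.37)
sd_y = years.std(ddof=1)
r_pbi = (males.mean() - females.mean()) / sd_y * np.sqrt(p * q)
print(f"r_pbi = {r_pbi:.4f}")   # ~0.8703 (0.8696 with the SD rounded to 4.37)

# scipy's point-biserial equals Pearson's r on a 0/1 coding, which amounts
# to using the population standard deviation; it gives a larger ~0.9173
print(stats.pointbiserialr((sex == 'M').astype(int), years).correlation)
```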
TOPIC 7: CORRELATION BETWEEN NOMINAL DATA: LAMBDA CORRELATION (λ)

This coefficient is represented by the lower-case Greek letter λ and is also known as Guttman's coefficient of predictability. It is defined as a proportionate-reduction-in-error measure: an index of how much the error in predicting one variable is reduced by knowledge of another. It is also another way of measuring the degree to which the accuracy of a prediction can be improved. If you have a lambda of 0.80, you have reduced the error of your prediction about the values of the dependent variable by 80%; if your lambda is 0.30, you have reduced the error of your prediction by only 30%. The lambda coefficient is a measure of association for comparing several groups or categories at the nominal level.

If the dependent variable is regarded as the column variable, the formula is:

λc = (ΣFbi − Mbc) / (N − Mbc)

where:
Fbi = the biggest cell frequency in the ith row (with the sum taken over all of the rows)
Mbc = the biggest of the column totals
N   = the number of observations

However, if the dependent variable is regarded as the row variable, the formula to be used is:

λr = (ΣFbj − Mbr) / (N − Mbr)

where:
Fbj = the biggest cell frequency in the jth column (with the sum taken over all of the columns)
Mbr = the biggest of the row totals
N   = the number of observations
Example: Compute λc and λr for the data in the table below.

A Segment of the Filipino Electorate according to Religion and Political Party

                     Political Party
Religion             PPC    LDP    Independent   TOTAL
Catholic             49     25     18            92
Iglesia ni Cristo    34     72     21            127
Protestant           26     25     20            71
TOTAL                109    122    59            290

λc = (ΣFbi − Mbc)/(N − Mbc) = [(49 + 72 + 26) − 122]/(290 − 122) = 0.1488

λr = (ΣFbj − Mbr)/(N − Mbr) = [(49 + 72 + 21) − 127]/(290 − 127) = 0.0920
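Both coefficients reduce to simple row/column maxima and totals, so they are short to compute by machine. A sketch, assuming numpy:

```python
import numpy as np

# Religion (rows) by political party (columns)
table = np.array([[49, 25, 18],
                  [34, 72, 21],
                  [26, 25, 20]])
N = table.sum()

# Column variable as dependent: sum of the biggest frequency in each row,
# minus the biggest column total
m_bc = table.sum(axis=0).max()
lambda_c = (table.max(axis=1).sum() - m_bc) / (N - m_bc)
print(f"lambda_c = {lambda_c:.4f}")   # (49+72+26-122)/(290-122) = 0.1488

# Row variable as dependent: sum of the biggest frequency in each column,
# minus the biggest row total
m_br = table.sum(axis=1).max()
lambda_r = (table.max(axis=0).sum() - m_br) / (N - m_br)
print(f"lambda_r = {lambda_r:.4f}")   # (49+72+21-127)/(290-127) = 0.0920
```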
TOPIC 8: CHI-SQUARE DISTRIBUTION, χ²

The chi-square distribution was introduced by Karl Pearson to determine whether or not discrepancies between observed and theoretical counts are significant. The test used to find out how well an observed frequency distribution conforms to, or fits, some theoretical frequency distribution is referred to as a “goodness of fit test”.

The chi-square distribution can also be used to test the normality of a distribution, and hypotheses about several population proportions are sometimes tested with it. In this section, testing normality with the chi-square is emphasized.

On the other hand, tables presenting rows and columns are often called contingency tables. This topic is equally important: it helps us determine whether two classifications of variables are independent. The value of chi-square varies with the number of degrees of freedom, and one of the assumptions that apply to a contingency table is that every category has an expected frequency of at least 5.
USES OF CHI-SQUARE
1. Chi-square is used in descriptive research if the researcher wants to determine the significant difference between the observed and the expected (theoretical) frequencies of independent variables.
2. It is used to test goodness of fit, where a theoretical distribution is fitted to some data, e.g., the fitting of a normal curve.
3. It is used to test the hypothesis that the variance of a normal population is equal to a given value.
4. It is also used for the construction of confidence intervals for variances.
5. It is used to compare two uncorrelated or correlated proportions.
DEGREES OF FREEDOM FOR THE CHI-SQUARE

The degrees of freedom for the one-variable chi-square are determined by the formula df = k − 1, where k is the number of categories. The degrees of freedom for the two-variable chi-square are determined by the formula df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns.

Using the degrees of freedom, we can consult the table of chi-square values and compare the tabled entry with our obtained χ² value. If the computed χ² is equal to or greater than the table value for the required degrees of freedom and the chosen probability level, the chi-square value is significant and the null hypothesis earlier set is rejected.
TESTING GOODNESS OF FIT

A goodness-of-fit test can be used to test how well an observed frequency distribution fits some theoretical frequency distribution.

Example 1: Suppose we want to test the claim that fatal accidents occur at different rates at the different widths of the road.

Width of the Road     4.0 to 4.5 m   4.6 to 5.0 m   5.1 to 5.5 m   5.6 to 6.0 m
Number of Accidents   95             90             83             73

χ² = Σ (O − E)² / E

where:
χ² = chi-square
O  = observed frequency
E  = expected frequency

There are 341 accidents in all, so under the assumption of equal occurrence each expected frequency is 341/4 = 85.25.

Observed frequency   95      90      83      73
Expected frequency   85.25   85.25   85.25   85.25
Ho: Fatal accidents occur at the same rate at the different widths of the road.
Ha: Fatal accidents occur at different rates at the different widths of the road.

χ² = Σ(O − E)²/E = (95 − 85.25)²/85.25 + (90 − 85.25)²/85.25 + (83 − 85.25)²/85.25 + (73 − 85.25)²/85.25

χ² = 3.1994

The tabular value of χ² at the 0.05 level of significance with df = k − 1 = 4 − 1 = 3 is 7.815. Since the computed value is less than the critical value of χ², the null hypothesis is not rejected. Thus, we can say that fatal accidents do not occur at different rates across the different road widths.
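Example 1 can be reproduced with scipy's built-in goodness-of-fit routine; a minimal sketch, assuming scipy is installed:

```python
from scipy import stats

observed = [95, 90, 83, 73]
# Under Ho the accidents are spread evenly, so each expected count is 341/4
expected = [sum(observed) / len(observed)] * len(observed)

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}")   # chi2 = 3.1994

# Compare against the critical value at alpha = 0.05 with df = k - 1 = 3
print("reject Ho:", chi2 > stats.chi2.ppf(0.95, df=3))   # False (7.815)
```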
Example 2: Students from MMSU claim that among the four most popular flavors of ice cream, students have these preference rates: 58% prefer Double Dutch, 25% prefer Rocky Road, 12% prefer Chocolate Mocha, and 5% prefer Vanilla. A random sample of 300 students was chosen. Test the claim that the percentages given by the students are correct. Use the 0.01 significance level.

Flavor               Double Dutch   Rocky Road   Chocolate Mocha   Vanilla
Number of Students   123            72           55                50

Solution:
Ho: The claim of the students is correct, that is, P1 = 0.58, P2 = 0.25, P3 = 0.12, and P4 = 0.05.
Ha: At least one of the proportions is not equal to the value claimed.

Each expected frequency is the sample size multiplied by the claimed proportion, E = nP:

                     Double Dutch   Rocky Road   Chocolate Mocha   Vanilla
Observed frequency   123            72           55                50
Preference rate      58%            25%          12%               5%
Expected frequency   174            75           36                15

χ² = Σ(O − E)²/E = (123 − 174)²/174 + (72 − 75)²/75 + (55 − 36)²/36 + (50 − 15)²/15

χ² = 106.76

The critical value of χ² at the 0.01 level of significance with df = k − 1 = 4 − 1 = 3 is 11.345. Since the computed value is greater than the critical value of χ², the null hypothesis is rejected. Thus, we can say that at least one of the proportions is not equal to the value claimed.
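The same scipy routine handles unequal expected counts through its f_exp argument; a short sketch for Example 2, again assuming scipy:

```python
from scipy import stats

observed = [123, 72, 55, 50]
n = sum(observed)                      # 300 students
claimed = [0.58, 0.25, 0.12, 0.05]
expected = [n * p for p in claimed]    # 174, 75, 36, 15 (E = nP)

chi2, p_value = stats.chisquare(observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}")            # ~106.76, far above the 11.345 cutoff
```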
TESTING NORMALITY

Many statistical tests require normality in the distribution. Chi-square is one of the tests that can be used to determine whether a distribution is normal. A summary of the steps in applying the chi-square test for normality is listed below (see the sketch after the remarks):

Step 1: Use the mean and the standard deviation of the sample to estimate the mean and the standard deviation of the population if they are not known or assumed.
Step 2: Group the sample data into class intervals or categories.
Step 3: Calculate the z-values for the class boundaries.
Step 4: Determine the area under the standard normal curve between z-values to obtain the hypothesized proportion of the sample in each class.
Step 5: Multiply each proportion by the total number of observations to obtain the expected frequency FE.
Step 6: Compute χ².
Remarks:
1. The hypothesis being tested is that the sample came from a population that has a normal distribution.
2. The degrees of freedom for the chi-square test are k − 1 − m, where k is the number of classes and m is the number of population parameters estimated. If the sample mean and standard deviation have been used to estimate the population mean and standard deviation, then m = 2; thus, df = k − 3.
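A Python sketch of Steps 1 through 6, assuming numpy and scipy. The data set and class boundaries here are hypothetical, chosen only to make the sketch runnable:

```python
import numpy as np
from scipy import stats

# Illustrative only: a hypothetical sample of 200 measurements
rng = np.random.default_rng(1)
sample = rng.normal(loc=50, scale=10, size=200)

# Steps 1-2: estimate the population mean/SD from the sample and group the
# data into k = 6 classes via the chosen class boundaries
mu, sigma = sample.mean(), sample.std(ddof=1)
boundaries = [-np.inf, 35, 42.5, 50, 57.5, 65, np.inf]
observed = np.array([np.sum((sample > lo) & (sample <= hi))
                     for lo, hi in zip(boundaries[:-1], boundaries[1:])])

# Steps 3-5: z-values for the boundaries, then areas under the normal curve
# between them, then expected frequencies F_E
z = (np.array(boundaries) - mu) / sigma
proportions = np.diff(stats.norm.cdf(z))
expected = proportions * len(sample)

# Step 6: compute chi-square with df = k - 1 - m = 6 - 1 - 2 = 3
chi2 = np.sum((observed - expected) ** 2 / expected)
p_value = stats.chi2.sf(chi2, df=len(observed) - 3)
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}")
```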
CONTINGENCY TABLES

In contingency tables, we intend to test whether the row variable is independent of the column variable. The computation of the expected frequency for a contingency table is different from the one in the goodness-of-fit test. The expected frequency E can be computed with the formula:

E = (Row Total × Column Total) / Grand Total
Teenagers and young adults have their own styles of studying. Some prefer to study with music; others do not. A group of psychologists conducted a study to determine the particular ages of the students who like studying with music. At the 0.01 level of significance, test the claim that style of studying is independent of the listed age groups. The table below summarizes the information.

                  Age Groups
Study Habit       9-12   13-16   17-20   21-24
With Music        89     75      63      52
Without Music     28     20      34      39

The following website may be used for the chi-square test:
http://www.socscistatistics.com/tests/chisquare2/Default2.aspx
Contingency Table (each cell shows the observed frequency, the expected frequency in parentheses, and the cell's contribution (O − E)²/E in brackets):

                  Age Groups
Study Habit       9-12          13-16         17-20         21-24         Row Totals
With Music        89 (81.61)    75 (66.26)    63 (67.66)    52 (63.47)    279
                  [0.67]        [1.15]        [0.32]        [2.07]
Without Music     28 (35.39)    20 (28.74)    34 (29.34)    39 (27.53)    121
                  [1.54]        [2.66]        [0.74]        [4.78]
Column Totals     117           95            97            91            400

The critical value of χ² at the 0.01 level of significance with df = (r − 1)(c − 1) = (2 − 1)(4 − 1) = 3 is 11.345.
Interpretation:
At the 0.01 significance level, the computed χ² = 13.9373 exceeds the critical value of 11.345 and lies within the rejection region. Therefore, there is sufficient evidence to reject the null hypothesis. The result further implies that the type of study habit has something to do with age.
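In place of the website above, the same test can be run with scipy's contingency-table routine; a minimal sketch, assuming scipy:

```python
import numpy as np
from scipy import stats

# Observed frequencies: study habit (rows) by age group (columns)
observed = np.array([[89, 75, 63, 52],
                     [28, 20, 34, 39]])

chi2, p_value, df, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.4f}, df = {df}, p = {p_value:.4f}")   # 13.9373, df = 3
print(np.round(expected, 2))   # matches the values in parentheses above
```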
ONE-WAY CLASSIFICATION

Chi-square in one-way classification is applicable when the researcher is interested in determining the number of subjects, objects, or responses that fall in various categories.
Example:
The subjects are 30 women and 30 men, or a total of 60 subjects in all. When asked “Can divorce be applied in the Philippines?”, of the 30 women, 9 answered yes; 12, no; and 9, undecided; of the 30 men, 15 answered yes; 2, no; and 13, undecided. Test the significant difference in their responses.

                Sex
Responses       Women               Men                 Row Totals
Yes             9 (12.00) [0.75]    15 (12.00) [0.75]   24
No              12 (7.00) [3.57]    2 (7.00) [3.57]     14
Undecided       9 (11.00) [0.36]    13 (11.00) [0.36]   22
Column Totals   30                  30                  60

The critical value of χ² at the 0.05 level of significance with df = (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2 is 5.991.
Interpretation:
At the 0.05 significance level, the computed χ² = 9.3701 exceeds the critical value of 5.991 and lies within the rejection region. Therefore, there is sufficient evidence to reject the null hypothesis. The result further implies that the response to the survey question has something to do with sex.
INDEPENDENCE IN A 2X2 TABLE

The 2x2 (fourfold) table chi-square involves two variables and tests whether these variables are independent of each other. The values are usually nominal and are arranged in the form of a 2x2 table composed of two rows (R) and two columns (C).

Example:
The frequencies shown in the table below are observed frequencies. The specific question is “Is there a significant difference in the job performance of mentors who failed and mentors who passed the teacher's licensure examination?” Of the 100 subjects, 20 failed but had satisfactory job performance; 40 passed with satisfactory job performance; 25 failed with unsatisfactory job performance; and 15 passed with unsatisfactory job performance. Test the significant difference existing in the foregoing data.

                  Teachers Licensure Examination
Job Performance   Failed   Passed   Total
Satisfactory      20       40       60
Unsatisfactory    25       15       40
Total             45       55       100

Ho: There is no significant difference in the job performance of mentors who failed and mentors who passed the teacher's licensure examination.
Ha: There is a significant difference in the job performance of mentors who failed and mentors who passed the teacher's licensure examination.

With expected frequencies in parentheses and each cell's contribution (O − E)²/E in brackets:

                  Teachers Licensure Examination
Job Performance   Failed              Passed              Total
Satisfactory      20 (27.00) [1.81]   40 (33.00) [1.48]   60
Unsatisfactory    25 (18.00) [2.72]   15 (22.00) [2.23]   40
Total             45                  55                  100

The critical value of χ² at the 0.05 level of significance with df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1 is 3.841.
Interpretation:
At the 0.05 significance level, the computed χ² = 8.2492 exceeds the critical value of 3.841 and lies within the rejection region. Therefore, there is sufficient evidence to reject the null hypothesis. The result further implies that there is a significant difference in the job performance of mentors who failed and mentors who passed the teacher's licensure examination.
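One detail worth knowing when checking this by machine: for 2x2 tables, scipy applies Yates' continuity correction by default, so reproducing the uncorrected value above requires switching it off. A sketch, assuming scipy:

```python
import numpy as np
from scipy import stats

observed = np.array([[20, 40],
                     [25, 15]])

# correction=False disables Yates' continuity correction and reproduces
# the uncorrected chi-square computed in the example
chi2, p_value, df, expected = stats.chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.4f}, df = {df}, p = {p_value:.4f}")   # 8.2492, df = 1
```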
ASSESSMENT
Log in to the mVLE portal to access the assessment for Chapter IV.