CHAPTER IV: CORRELATIONAL ANALYSIS

Topic Outline:
1. Introduction
2. Hypothesis Testing for Correlation
3. Pearson Product-Moment Correlation (Pearson r)
4. Spearman Rank Correlation (Spearman rho, rs)
5. Gamma Correlation (G)
6. Point-Biserial Correlation (rpb)
7. Lambda Correlation (λ)
8. Chi-Square (χ²) Tests

Learning Outcomes: At the end of the unit, the students must have:
1. discussed the conditions imposed by each measure of relationship/association;
2. computed and interpreted each measure of relationship;
3. performed hypothesis testing involving each of the measures of relationship; and
4. differentiated multiple from partial correlation.

STAT 201: Statistical Methods I
Prepared by: Prof. Jeanne Valerie Agbayani-Agpaoa, Dr. Virgilio Julius P. Manzano, Jr., and Engr. Lawrence John C. Tagata

TOPIC 1: INTRODUCTION

A measure of correlation or relationship is used to find the amount and degree of relationship, or the absence of relationship, between two sets of values, characteristics, or variables. This relationship is expressed by a factor called the coefficient of correlation. It may be expressed as an abstract number, as the ratio of two values (or series of values, or variables) being compared, or as a percent.

Correlation is a measure of the degree of relationship between paired data. Statistical research often aims to establish a relationship between paired variables so that the researcher can predict one variable in terms of the other. For example, high grades in Science and English tend to be related to high grades in Mathematics. In other instances, the relationship may be weak or absent altogether: the bulk of candy sales, for example, tends to be unrelated to the crime rate in a particular place. It must be remembered that correlation does not determine cause and effect; it merely describes the strength of the relationship between paired data.
Simple correlation is amenable to either ungrouped or grouped data on nominal, ordinal, or interval scales. Rank correlation, however, is usually applied to ordinal data when the number of items or cases is rather small (fewer than 30).

The term correlation refers to the association between two or more statistical series of values. The coefficient of correlation shows the extent to which two variables are related, and to what extent variations in one group of data go with variations in the other. The coefficient of correlation is a single number that tells us to what extent two variables are related. It can vary from +1.00, which means perfect positive correlation, through 0, which means no correlation at all, to -1.00, which means perfect negative correlation.

Perfect positive correlation refers to a direct relationship between two sets of data: any increase in the values of the first set is accompanied by a corresponding increase in the second set. When correlation is negative, an inverse behaviour of the data is observed: a decrease in the values of the first set is accompanied by an increase in the second set, or vice versa. When changes in one set of data are accompanied by minimal or no change in the other, there is little or no correlation at all.

The coefficient of correlation does not directly give anything like a percentage of relationship. It cannot be concluded that a correlation of 0.50 indicates twice the relationship indicated by a correlation of 0.25. A coefficient of correlation is an index number, not a measurement on an interval scale. Moreover, we cannot compute a coefficient of correlation from just two measurements on one person alone.

The coefficient of correlation has some uses, which are as follows:
1. It indicates the amount of agreement between scores on any two sets of data and is an index of the predictive value of a test.
2. It is a form of reliability coefficient, which can be obtained by correlating scores on two alternate or parallel forms of the same test.
3. The correlation value is always relative to the situation under which it is obtained and should be interpreted in the light of those circumstances. Its size does not represent absolute natural facts.

Below is a guide in interpreting the coefficient of correlation:

-1.00            Perfect Negative Correlation
-0.99 to -0.75   Very High Negative Correlation
-0.74 to -0.50   High Negative Correlation
-0.49 to -0.25   Moderately Small Negative Correlation
-0.24 to -0.01   Very Small Negative Correlation
 0               No Correlation
+0.01 to +0.24   Very Small Positive Correlation
+0.25 to +0.49   Moderately Small Positive Correlation
+0.50 to +0.74   High Positive Correlation
+0.75 to +0.99   Very High Positive Correlation
+1.00            Perfect Positive Correlation

Anybody who wants to interpret a coefficient of correlation should be guided by the following reminders:
1. A relationship between two variables does not necessarily mean that one is the cause or the effect of the other. Correlation does not imply a cause-effect relationship.
2. When the computed r is high, it does not necessarily mean that one factor is strongly dependent on the other; consider the height and intelligence of people, where computing a correlation makes no sense at all. Conversely, when the computed r is small, it does not necessarily mean that one factor has no dependence on the other. This may apply to IQ and grades in school, where a low grade may simply suggest that a student did not make good use of study time.
3. If there is reason to believe that the two variables are related and the computed r is high, the two variables are really associated.
On the other hand, if the computed correlation is low (though the variables are theoretically related), other factors might be responsible for the small association.
4. Lastly, a correlation coefficient simply informs us that when two variables change together, there may be a strong or a weak relationship taking place.

TOPIC 2: HYPOTHESIS TESTING FOR CORRELATION

It is often useful to test the hypotheses
Ho: ρ = 0 (There is no significant relationship.)
Ha: ρ ≠ 0 (There is a significant relationship.)

Test Statistic for Zero Correlation:

t0 = R√(n − 2) / √(1 − R²)

which has the t distribution with n − 2 degrees of freedom if Ho: ρ = 0 is true. Therefore, we reject the null hypothesis if |t0| > t(α/2, n−2).

TOPIC 3: PEARSON PRODUCT-MOMENT COEFFICIENT OF CORRELATION

This is a linear correlation used to find the degree of association of two variables, x and y. It is the most commonly used measure of correlation for determining the relationship between two sets of variables quantitatively. For any two variables x and y, the correlation coefficient between them can be determined using the Pearson product-moment coefficient of correlation:

r_xy = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}

Consider the values of x and y in the descriptive problem, "What is the relationship between the NSAT percentile rank and the scholastic rating of BS Physics students in selected universities and colleges in a certain region?"
Example:

Student   NSAT Percentile Rank, x   Scholastic Rating, y     x²       y²       xy
1                  60                       78              3,600    6,084    4,680
2                  73                       87              5,329    7,569    6,351
3                  61                       80              3,721    6,400    4,880
4                  70                       86              4,900    7,396    6,020
5                  75                       87              5,625    7,569    6,525
6                  79                       90              6,241    8,100    7,110
7                  65                       85              4,225    7,225    5,525
8                  67                       84              4,489    7,056    5,628
9                  77                       89              5,929    7,921    6,853
10                 80                       90              6,400    8,100    7,200
Totals            707                      856             50,459   73,420   60,772

r_xy = [nΣxy − (Σx)(Σy)] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}
     = [(10)(60,772) − (707)(856)] / √{[(10)(50,459) − 707²][(10)(73,420) − 856²]}

r_xy = 0.9596

Interpretation: The r_xy value obtained is 0.9596, which denotes a very high positive relationship. This means that the higher the NSAT percentile rank, the higher the scholastic rating of the BS Physics students.

TOPIC 4: SPEARMAN RANK CORRELATION COEFFICIENT OR SPEARMAN RHO (rs)

The Spearman rho coefficient of correlation is a statistic used to measure the relationship of paired ranks assigned to individual scores on two variables. It estimates the degree of association of two variables measured on at least an ordinal scale (first, second, third, and so on), so that the subjects under study may be ranked in two ordered series. It is commonly computed from the disarray ΣD²; the coefficient of rank correlation takes the value +1 when paired ranks are in the same order and −1 when paired ranks are in reverse order. Spearman rho is the most widely used of the rank correlation methods, being much easier and therefore faster to compute. It is intended for 30 cases or fewer.
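Before turning to the rank-based methods, the Pearson r example above, together with the Topic 2 test statistic for zero correlation, can be reproduced with a short script. This is a plain-Python sketch; only the data from the NSAT table are assumed.

```python
import math

# NSAT percentile ranks (x) and scholastic ratings (y) from the example above.
x = [60, 73, 61, 70, 75, 79, 65, 67, 77, 80]
y = [78, 87, 80, 86, 87, 90, 85, 84, 89, 90]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

# Pearson product-moment correlation (Topic 3 formula).
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(r, 4))   # 0.9596

# Test statistic for zero correlation (Topic 2): t0 = r*sqrt(n-2)/sqrt(1-r^2).
t0 = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t0, 2))  # 9.64
```

Since |t0| ≈ 9.64 exceeds t(0.025, 8) = 2.306, Ho: ρ = 0 would be rejected at the 0.05 level, consistent with the "very high positive relationship" interpretation above.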
To obtain the Spearman rho (rs), consider the formula:

rs = 1 − [6ΣD² / (n³ − n)]

where:
rs = Spearman rho
ΣD² = sum of the squared differences between ranks
n = number of cases/measurements

Example: Consider the specific problem: "What is the rank relationship between capital and profit of light bulbs?"

      Capital, x   Profit, y    Rx     Ry    D = |Rx − Ry|    D²
1        20,000       5,000      6      7         1           1
2        50,000      15,000      3      3.5       0.5         0.25
3        10,000       3,000      9      9.5       0.5         0.25
4       100,000      30,000      2      2         0           0
5        15,000       4,000      7      8         1           1
6        25,000       9,000      5      5         0           0
7        11,000       6,000      8      6         2           4
8       150,000      70,000      1      1         0           0
9         5,000       3,000     10      9.5       0.5         0.25
10       40,000      15,000      4      3.5       0.5         0.25
TOTAL                                                         7.0

rs = 1 − [6ΣD² / (n³ − n)] = 1 − [(6)(7) / (10³ − 10)] = 0.9576

TOPIC 5: GAMMA (G)

An alternative to the rank-order correlation coefficient is Goodman and Kruskal's gamma (G). The value of one variable can be estimated or predicted from the other variable when their values are known. Gamma can also be used when ties are present in the ranking of the data. The formula for gamma is:

G = (Ns − Ni) / (Ns + Ni)

where:
G = the difference between the proportion of pairs ordered in the parallel direction and the proportion of pairs ordered in the opposite direction
Ns = the number of pairs ordered in the parallel direction
Ni = the number of pairs ordered in the opposite direction

Example: Compute the gamma for the data shown below.

                       Socio-Economic Status
Educational Status   Upper    Middle    Lower
Upper                  24       19        5
Middle                 12       54       29
Lower                   9       26       25

Solution:
Step 1. Arrange the ordering of one of the two characteristics from highest to lowest (or vice versa) from top to bottom through the rows, and of the other characteristic from highest to lowest (or vice versa) from left to right through the columns.
Step 2. Compute Ns by multiplying the frequency in every cell by the sum of the frequencies in all of the cells that lie both below and to the right of that cell, and then sum the products obtained.
Ns = 24(54 + 29 + 26 + 25) + 19(29 + 25) + 12(26 + 25) + 54(25)
Ns = 6,204

Step 3. To solve for Ni, partially reverse the process described in Step 2: multiply the frequency of every cell by the sum of the frequencies in all of the cells that lie below and to the left of that cell, and then sum the products obtained.
Ni = 19(12 + 9) + 5(12 + 54 + 9 + 26) + 54(9) + 29(9 + 26)
Ni = 2,405

Step 4. Apply the gamma formula.
G = (Ns − Ni) / (Ns + Ni) = (6,204 − 2,405) / (6,204 + 2,405) = 0.4413

TOPIC 6: CORRELATION BETWEEN INTERVAL AND NOMINAL DATA: THE POINT-BISERIAL COEFFICIENT OF CORRELATION (rpbi)

There are instances when you are interested in the degree of relationship between two variables where one variable is continuous (e.g., test scores) and the other is dichotomous (e.g., gender). A question might be, "Is gender related to intelligence?" In such cases, the most appropriate statistical technique is the point-biserial correlation, rpbi. The formula is:

rpbi = [(x̄1 − x̄2) / sdy] √(pq)

where:
rpbi = point-biserial coefficient of correlation
x̄1 = mean score of group 1
x̄2 = mean score of group 2
p = proportion of group 1
q = proportion of group 2
sdy = standard deviation of all the scores

Example: A researcher wishes to determine if a significant relationship exists between the sex of a worker and the number of years spent performing an electronics assembly task. The independent variable comes from the question "What is your sex, male or female?" (dichotomous). The dependent variable comes from the question "How many years have you been performing the task?" (ratio).
Respondent   Sex   Number of Years
1             M         10
2             M         11
3             M          6
4             M         11
5             F          4
6             F          3
7             M         12
8             F          2
9             F          2
10            F          1

Mean (Males): 10.0     Mean (Females): 2.4     Standard deviation of all scores: sdy = 4.37

rpbi = [(x̄1 − x̄2) / sdy] √(pq) = [(10 − 2.4) / 4.37] √[(5/10)(5/10)] = 0.8696

TOPIC 7: CORRELATION BETWEEN NOMINAL DATA: LAMBDA CORRELATION (λ)

This measure, represented by the lower-case Greek letter λ, is also known as Guttman's coefficient of predictability. It is defined as a proportionate-reduction-in-error measure: an index of how much error is reduced in predicting one variable from the values of another. It is another way of measuring the degree to which the accuracy of a prediction can be improved. If you have a lambda of 0.80, you have reduced the error of your prediction about the values of the dependent variable by 80%; if your lambda is 0.30, you have reduced the error of your prediction by only 30%. The lambda coefficient is a measure of association for comparing several groups or categories at the nominal level.

Formula (dependent variable regarded as the column variable):

λc = (ΣFbi − Mbc) / (N − Mbc)

where:
Fbi = the biggest cell frequency in the ith row (with the sum taken over all of the rows)
Mbc = the biggest of the column totals
N = the number of observations

However, if your dependent variable is regarded as the row variable, the formula to be used is:

λr = (ΣFbj − Mbr) / (N − Mbr)

where:
Fbj = the biggest cell frequency in the jth column (with the sum taken over all of the columns)
Mbr = the biggest of the row totals
N = the number of observations

Example: Compute λc and λr for the data in the table below.

A Segment of the Filipino Electorate according to Religion and Political Party

                           Political Party
Religion              PPC    LDP    Independent   TOTAL
Catholic               49     25        18          92
Iglesia ni Cristo      34     72        21         127
Protestant             26     25        20          71
TOTAL                 109    122        59         290

λc = (ΣFbi − Mbc) / (N − Mbc) = [(49 + 72 + 26) − 122] / (290 − 122) = 0.1488
λr = (ΣFbj − Mbr) / (N − Mbr) = [(49 + 72 + 21) − 127] / (290 − 127) = 0.0920

TOPIC 8: CHI-SQUARE DISTRIBUTION, χ²

The chi-square distribution was discovered by Karl Pearson. It was introduced to determine whether or not discrepancies between observed and theoretical counts are significant. The test used to find out how well an observed frequency distribution conforms to, or fits, some theoretical frequency distribution is referred to as a "goodness-of-fit test". The chi-square distribution can also be used to test the normality of a distribution, and to test hypotheses about several population proportions. In this section, testing normality with the chi-square is emphasized.

Tables of rows and columns are called contingency tables; they help us determine whether two classification variables are independent. The value of chi-square varies with the number of degrees of freedom; one assumption for a contingency table is that there are at least 5 expected frequencies in each category.

USES OF CHI-SQUARE
1. Chi-square is used in descriptive research if the researcher wants to determine the significant difference between the observed and the expected (theoretical) frequencies from independent variables.
2. It is used to test goodness of fit, where a theoretical distribution is fitted to some data, e.g., the fitting of a normal curve.
3. It is used to test the hypothesis that the variance of a normal population is equal to a given value.
4. It is used for the construction of confidence intervals for variances.
5. It is used to compare two uncorrelated and correlated proportions.

DEGREES OF FREEDOM FOR THE CHI-SQUARE

The degrees of freedom for the one-variable chi-square are determined by the formula df = k − 1, where k is the number of categories. The degrees of freedom for the two-variable chi-square are determined by the formula df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns. Using the degrees of freedom, we can consult the table of chi-square values to compare against our obtained χ² value. If the computed χ² is equal to or greater than the table value for the required degrees of freedom and the chosen probability level, the chi-square value is significant and the null hypothesis is rejected.

TESTING GOODNESS OF FIT

A goodness-of-fit test is used to test how well an observed frequency distribution fits some theoretical frequency distribution.

χ² = Σ (O − E)² / E

where:
χ² = chi-square
O = observed frequency
E = expected frequency

Example 1: Suppose we want to test the claim that the number of fatal accidents differs with the width of the road.

                          Width of the Road
                 4.0-4.5 m   4.6-5.0 m   5.1-5.5 m   5.6-6.0 m
Number of
Accidents           95          90          83          73

Observed frequency      95      90      83      73
Expected frequency    85.25   85.25   85.25   85.25
Ho: The number of fatal accidents does not differ across the road widths.
Ha: The number of fatal accidents differs across the road widths.

χ² = Σ(O − E)²/E = (95 − 85.25)²/85.25 + (90 − 85.25)²/85.25 + (83 − 85.25)²/85.25 + (73 − 85.25)²/85.25
χ² = 3.1994

The tabular value of χ² at the 0.05 level of significance with df = k − 1 = 4 − 1 = 3 is 7.815. Since the computed value is less than the critical value of χ², the null hypothesis is not rejected. Thus, we can say that the number of fatal accidents does not differ significantly across the road widths.

Example 2: Students from MMSU claim that among the four most popular flavors of ice cream, students have these preference rates: 58% prefer Double Dutch, 25% prefer Rocky Road, 12% prefer Chocolate Mocha, and 5% prefer Vanilla. A random sample of 300 students was chosen. Test the claim that the percentages given by the students are correct. Use the 0.01 significance level.

Flavor            Number of Students
Double Dutch            123
Rocky Road               72
Chocolate Mocha          55
Vanilla                  50

Solution:
Ho: The claim of the students is correct, that is, P1 = 0.58, P2 = 0.25, P3 = 0.12, and P4 = 0.05.
Ha: At least one of the proportions is not equal to the value claimed.

Each expected frequency is the claimed preference rate multiplied by the sample size, n = 300:

                  Observed frequency   Preference rate   Expected frequency
Double Dutch            123                 58%           0.58(300) = 174
Rocky Road               72                 25%           0.25(300) = 75
Chocolate Mocha          55                 12%           0.12(300) = 36
Vanilla                  50                  5%           0.05(300) = 15

χ² = Σ(O − E)²/E = (123 − 174)²/174 + (72 − 75)²/75 + (55 − 36)²/36 + (50 − 15)²/15
χ² = 106.76

The critical value of χ² at the 0.01 level of significance with df = k − 1 = 4 − 1 = 3 is 11.345. Since the computed value is greater than the critical value of χ², the null hypothesis is rejected. Thus, we can say that at least one of the proportions is not equal to the value claimed.

TESTING THE NORMALITY

Many statistical tests require normality of the distribution. Chi-square is one of the tests that can be used to determine whether a distribution is normal. The steps in applying the chi-square test of normality are summarized below:

Step 1: Use the mean and the standard deviation of the sample to estimate the mean and the standard deviation of the population, if these are not known or assumed.
Step 2: Group the sample data into class intervals or categories.
Step 3: Calculate the z-values of the class boundaries.
Step 4: Determine the area under the standard normal curve between successive z-values to obtain the hypothesized proportion of the sample in each class.
Step 5: Multiply each proportion by the total number of observations to obtain the expected frequency, FE.
Step 6: Compute χ².
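The six steps above can be sketched in Python with scipy. The sample below is simulated purely for illustration (any real data set would replace it), and the class boundaries are an assumed choice.

```python
import numpy as np
from scipy import stats

# Illustrative sample of n = 200 measurements (simulated for this sketch).
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=200)

# Step 1: estimate the population mean and SD from the sample.
mean, sd = sample.mean(), sample.std(ddof=1)

# Step 2: group the data into k = 6 class intervals (boundaries assumed here;
# they should be chosen so every expected frequency is at least 5).
edges = np.array([-np.inf, 35, 45, 50, 55, 65, np.inf])
observed, _ = np.histogram(sample, bins=edges)

# Steps 3-4: z-values of the class boundaries -> hypothesized proportions.
z = (edges - mean) / sd
props = np.diff(stats.norm.cdf(z))

# Step 5: expected frequencies FE = proportion * n.
expected = props * len(sample)

# Step 6: chi-square statistic; df = k - 1 - m with m = 2 estimated parameters.
chi2 = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 3
p = stats.chi2.sf(chi2, df)
print(round(chi2, 4), df, round(p, 4))
```

A small χ² (large p) means the normal curve fits the observed class frequencies well, so the normality hypothesis is not rejected.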
Remarks:
1. The hypothesis being tested is that the sample came from a population that has a normal distribution.
2. The degrees of freedom for this chi-square test are k − 1 − m, where k is the number of classes and m is the number of population parameters estimated. If the sample mean and standard deviation have been used to estimate the population mean and standard deviation, then m = 2; thus, df = k − 3.

CONTINGENCY TABLES

In contingency tables, we test whether the row variable is independent of the column variable. The computation of expected frequencies for a contingency table differs from that in the goodness-of-fit test. The expected frequency E can be computed with the formula:

E = (Row Total × Column Total) / Grand Total

Example: Teenagers and young adults have their own styles of studying: some prefer to study with music; others do not. A group of psychologists conducted a study to determine the particular ages of students who like studying with music. At the 0.01 level of significance, test the claim that style of studying is independent of the listed age groups. The table below summarizes the information.

                        Age Groups
Study Habit      9-12   13-16   17-20   21-24
With Music        89      75      63      52
Without Music     28      20      34      39

The following website may be used for the chi-square test: http://www.socscistatistics.com/tests/chisquare2/Default2.aspx

Contingency table, with expected frequencies in parentheses and each cell's (O − E)²/E contribution in brackets:

                              Age Groups
Study Habit       9-12         13-16         17-20         21-24       Row Totals
With Music     89 (81.61)   75 (66.26)    63 (67.66)    52 (63.47)       279
                 [0.67]       [1.15]        [0.32]        [2.07]
Without Music  28 (35.39)   20 (28.74)    34 (29.34)    39 (27.53)       121
                 [1.54]       [2.66]        [0.74]        [4.78]
Column Totals     117           95            97            91           400

The critical value of χ² at the 0.01 level of significance with df = (r − 1)(c − 1) = (2 − 1)(4 − 1) = 3 is 11.345.
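The expected frequencies and the χ² value for this contingency table can be checked with scipy.stats.chi2_contingency, sketched below (only the observed counts from the table above are assumed; scipy's Yates correction applies only to 2x2 tables, so it has no effect here).

```python
import numpy as np
from scipy import stats

# Observed frequencies from the study-habit contingency table above.
observed = np.array([[89, 75, 63, 52],
                     [28, 20, 34, 39]])

chi2, p_value, df, expected = stats.chi2_contingency(observed)
print(np.round(expected, 2))       # matches the values in parentheses above
print(round(chi2, 4), df)          # 13.9373 3

# Critical value at the 0.01 level with df = 3.
critical = stats.chi2.ppf(0.99, df)
print(round(critical, 3), chi2 > critical)  # 11.345 True
```

Because the computed χ² exceeds the critical value, the independence hypothesis is rejected, matching the interpretation that follows.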
Interpretation: At the 0.01 significance level, the computed χ² = 13.9373, which lies within the rejection region. Therefore, there is sufficient evidence to reject the null hypothesis. The result implies that the type of study habit has something to do with age.

ONE-WAY CLASSIFICATION

The chi-square test in a one-way classification is applicable when the researcher is interested in determining how many subjects, objects, or responses fall in various categories.

Example: The subjects are 30 women and 30 men, a total of 60 subjects. When asked "Can divorce be applied in the Philippines?", 9 of the 30 women answered yes; 12, no; and 9, undecided, while 15 of the 30 men answered yes; 2, no; and 13, undecided. Test the significant difference in their responses.

Observed frequencies, with expected frequencies in parentheses and each cell's (O − E)²/E contribution in brackets:

                          Sex
Responses        Women           Men        Row Totals
Yes            9 (12.00)     15 (12.00)        24
                [0.75]         [0.75]
No            12 (7.00)       2 (7.00)         14
                [3.57]         [3.57]
Undecided      9 (11.00)     13 (11.00)        22
                [0.36]         [0.36]
Column Totals     30             30             60

The critical value of χ² at the 0.05 level of significance with df = (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2 is 5.991.

Interpretation: At the 0.05 significance level, the computed χ² = 9.3701, which lies within the rejection region. Therefore, there is sufficient evidence to reject the null hypothesis. The result implies that the response to the survey question has something to do with sex.

INDEPENDENCE IN A 2X2 TABLE

A 2x2 (fourfold) chi-square table involves two variables and tests whether these variables are independent of each other. The values are usually nominal and are arranged in a table composed of two rows (R) and two columns (C).

Example: The frequencies shown in the table below are observed frequencies.
The specific question is, "Is there a significant difference in the job performance of mentors who failed and mentors who passed the teacher's licensure examination?" Of the 100 subjects, 20 failed the examination but showed satisfactory job performance; 40 passed with satisfactory job performance; 25 failed with unsatisfactory job performance; and 15 passed with unsatisfactory job performance. Test the significant difference in the foregoing data.

                      Teacher's Licensure Examination
Job Performance        Failed    Passed    Total
Satisfactory             20        40        60
Unsatisfactory           25        15        40
Total                    45        55       100

Ho: There is no significant difference in the job performance of mentors who failed and mentors who passed the teacher's licensure examination.
Ha: There is a significant difference in the job performance of mentors who failed and mentors who passed the teacher's licensure examination.

Observed frequencies, with expected frequencies in parentheses and each cell's (O − E)²/E contribution in brackets:

                      Teacher's Licensure Examination
Job Performance         Failed          Passed       Total
Satisfactory          20 (27.00)      40 (33.00)       60
                        [1.81]          [1.48]
Unsatisfactory        25 (18.00)      15 (22.00)       40
                        [2.72]          [2.23]
Total                    45              55            100

The critical value of χ² at the 0.05 level of significance with df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1 is 3.841.

Interpretation: At the 0.05 significance level, the computed χ² = 8.2492, which lies within the rejection region. Therefore, there is sufficient evidence to reject the null hypothesis. The result implies that there is a significant difference in the job performance of mentors who failed and mentors who passed the teacher's licensure examination.

ASSESSMENT

Login to the mVLE portal to access the assessment for Chapter IV.

REFERENCES:
• D.C. Montgomery and G.C. Runger, Applied Statistics and Probability for Engineers, 5th Edition, John Wiley & Sons, Inc., 2011.
• R.E. Walpole, R.H. Myers, S.L. Myers, and K. Ye, Probability and Statistics for Engineers and Scientists, 9th Edition, Pearson International Edition, 2012.
• Zulueta, F. M. and Nestor Edilberto B. Costales, Jr. (2005). Methods of Research: Thesis Writing and Applied Statistics. Mandaluyong City: National Bookstore, Inc.