A COMPARATIVE STUDY OF READING TEST ABOUT NEWS ITEM MATERIAL USING MULTIPLE-CHOICE AND ESSAY FORMAT AT THE FIRST GRADE STUDENTS OF SMA NEGERI 1 PADALARANG IN ACADEMIC YEAR 2012/2013 Rani Ranjani Permana ranz_myoung@yahoo.com English Education Study Program Language and Arts Departement STKIP Siliwangi Bandung Abstract The objectives of this research entitled “A Comparative Study of Reading Test about News Item Material Using Multiplechoice and Essay Format at the First Grade Students of SMA Negeri 1 Padalarang in Academic Year 2012/2013” were 1) to know whether or not there is a significant difference students’ score between multiple-choice and essay test; 2) to find out what factor that cause the significant difference; 3) to find out the index of difficulty or facility falue (FV); and 4) to find out the index of discrimination (D) of multiple-choice and essay tests. The design of this research was causal-comparative study and this research used quantitative research method. The instruments of this research were the tests, interview guide and questionnaire. The population of this research was 9 classes of students of the first grade and used random sampling technique to decide the samples of this research that consists of 38 students in 1 class. The collected data were analysing using t test for non-independent means. The results showed that: 1) the t-observed was 8.88 with the critical value df = 37 and significance level at 0,001. Then, the alternative hypothesis was accepted because the t-observed was bigger than t-critical value (8.88 > 3.33). 2) The factors that influenced different score of multiple-choice and essay tests were test items itself, time of conducting a test, and the students them self. 3) 45% of multiple-choice test items were easy and 55% of them were difficult to answer by students. In addition, 85% of essay test items were easy and 15% of them were difficult to answer by students. 5% items of multiple-choice belong to zero discrimination index, 15% items belong to negative discrimination index, 30% of items belong to middle discrimination index, and 50% items belong to low discrimination index. And 5% of essay items belong to negative discrimination index, 15% of items belong to middle discrimination index and 80% items belong to low discrimination index. Keywords: Comparative Study, Multiple-choice, Essay, Test common and widely used assessment tool for the measurement of knowledge, ability and complex learning outcomes” (Annie & Alan, 2009). Beside, “since a hundred years ago, all college course exams were essay items and today many teachers still consider essay questions the preferred method of assessment” (http://cfe.unc.edu, 2012). A. BACKGROUND Test is a specific way to measure performance because it will produce a number as a result of measuring individual’s performances. This numbers will be calculated because “measurement is a process of assigning numbers to qualities or characteristics of an object or person according to some rule or scale and analyzing that data based on psychometric and statistical theory” (Zimmaro, 2004:4). He also explained that: After the data assigned and analyzed in measurement process, the data will be assess in assessment process - process of gathering, describing or quantifying information about performance. Then, the data will be evaluated in evaluation process examining information about many components of the thing being evaluated and comparing or judging its quality, worth or effectiveness in order to make decisions. Finally, by passing testing, measurement, assessment and evaluation process, the learner’s performance can be described as student achievement. There are two kinds of test which is usually used in language teaching. They are multiple-choice and essay test. This fact reflected from many statements from experts. They are “today multiple-choice test is the most B. LITERATURE REVIEW 1. Test Douglas (2003) explained that: Test is a method because test is an instrument – a set of techniques, procedures or items – that requires performance on the part of the test-taker. Then, a test itself must measure general ability or very specific competencies or objectives from an individual’s ability, knowledge or performance. In language test, this performance tests is measure one’s ability to perform language like how to speak, write, read or listen to a subset of language. Finally, a test itself has to measure a given domain which is only measure the desired criterion and not include the other factors inadvertently. 1 whole test tended to do well or badly on each item in the test. The formula of item discrimination (D) shows in box: 2. Multiple-choice Items Definition of multiple-choice items was explained by Cunningham (2005:76), he argued that “multiplechoice items is one type of objective response items – conventional tests - which has four options although it may have three or five option provided”. Burton et al (1991) claim that “a multiple-choice items consists of two basic parts: they are a problem (stem) and the list of suggested solutions (alternatives)”. He also explains that “the stem may be in the form of either a question or an incomplete statement and the list of alternatives contains one correct or best alternative (answer) and a number of incorrect or inferior alternatives (distracters)”. 3. Essay Items According to Cunningham (2005:76) “essay items is one type of constructed response items which requires more emphasis on style, the mechanics of writing and other considerations that allow students to demonstrate their knowledge of a topic or their ability to perform the cognitive tasks”. He also claims that “essay test item requires the student to construct a longer response than is required for a short answer item. Its length can range from a paragraph to an answer that requires multiple pages”. Based on that definition that definitions, the writer wanted to know wheather or not significant different score between multiple-choice and essay score to find out the findings of Hickson & Reed (2009) research that claims “multiple-choice questions give higher scores than essay questions” and “multiple-choice scores have mean scores higher than essay scores”. 4. Item Analysis Item analysis is needed in analyzing a test questions because all items should be examined from the point of view of (a) their difficulty used the index of difficulty or facility value (FV) and (b) their level of discrimination used the index of discrimination (D) by Heaton (1990). a. The Index of Difficulty or Facility Value (FV) In his book, Heaton (1990:178) explains that “the index of difficulty (or facility value) of an item simply shows how easy or difficult the particular item proved in the test”. The formula of the index of difficulty or facility value (FV) shows in box: FV = D = Correct U – Correct L n *Note: ~ n = the number of candidates in either the upper or lower group (n = ½ N) ~ N = the number in the whole group as used previously in the facility value (FV) ~ U is Upper Groups of twenty students who got higher scores ~ L is Lower Groups of twenty students who got lower scores C. RESEARCH METHODOLOGY 1. Research Method Regarding the main objective of the study, the writer used quantitative research: causal-comparative study or ex post facto research because the writer systematically select two criterions of scores from the same groups of students (matched pairs subject) while answered the problems of this research to find out the cause for the significant differences between multiple-choice and essay scores in reading test about news item material in the same groups of students by directly observed and collected the data from the place that the research was placed in the first grade students class at SMA Negeri 1 Padalarang in academic year 2012/2013 consists of thirty eight students that conducted in one class – X1 class. Then, the other data took from all of students fill the questionnaire that given by the writer and also three of English teachers was visited by the writer to had interview sessions with them. 2. Instrument of the Research This research was analyzing the student comprehension by implemented achievement test. This test using multiple-choice and essay format in the first grade students of SMA Negeri 1 Padalarang conducted with reading test about news item material, then after collected the data from test, the scores that produced by students analyzed by using item analysis based on their difficulty and their level of discrimination also interview session used to collected data from English teachers and used questionnaire to collecting the data from the students to supported the other data from test and interview session. 3. Research Population and Samples The study of this research took place in SMA Negeri 1 Padalarang and covers the first grade students.And the samples for this research are first grade students at X1 class that consists of thirty eight students and three English teachers. 4. Data Collection The data were collected using three techniques, test, interview and questionnaire. This research employed test as an instrument to gain detail information. Each tests consists of twenty questions which was divided into four texts about news item n multiple-choice and essay format. Then, 𝑹 𝑵 * Note : ~ FV = Facility Value or Index of Difficulty ~ R = the number of correct answers ~ N = the number of students who are taking the test b. The Index of Discrimination (D) Still based on the same expert, Heaton (1990:179) explained that the index of discrimination (D) tells us whether those students who performed well on the 2 interview guide for English teachers consists of three questions about test in multiple-choice and essay. Also, the questionnaire for students who did tests consists of three questions about test in multiple-choice and essay format. 5. Data Analysis In this research, the data which had been collected from test, interview session and questionnaire was analyzed by many steps, they are: a. The data from test were analyzed using t-test for nonindependent means and item analysis (The Item Difficulty / FV and The Item Discrimination / D). b. The data from interview session was analyzed by generating natural units of meaning – limitation for the topic in categories, classifying, categorizing and ordering these units of meaning, structuring narratives to describe the interview and interpreting the interview that data. c. The data from questionnaire was analyzed by data reducing and data editing. 6. Research Procedures The writer divided the procedures of this research into three days of observation. 1. In the first day, the writer gave the multiple-choice test of news item material that consists of twenty questions to the thirty eights students in X1 class and questionnaire about multiple-choice and essay tests. 2. In the second day, the writer gave the essay test of news item material that consists of twenty questions to the thirty students in X1 class. 3. In the third day, the writer interviewed the three of English teacher with three questions about multiplechoice and essay tests. 2. The t Test The writer of this research employed t test for nonindependent or correlated means – in minitab 14 called paired t test – to determine whether there is a statistically significant difference between the means of two matched or non-independent samples – two groups consisting of the same people or matched pairs (Fraenkel et al, 2012:G-9; Crowl, 1996:432). The summary of that paired samples data presents in Table 2. Table 2 Statistical Summary of Paired Sample Test Paired Samples Test Paired Differences 95 % Confidence Interval of the Difference Mean Multiplechoice Scores Essay Scores Mean (M) Minimum Maximum Std. Dev (SD) SE Mean 38 44.60 25.00 70.00 10.48 1.70 38 60.42 28.75 81.25 11.30 1.83 1 & Essay Dev Mean -15.82 10.98 Lower 1.78 -19.43 Upper -12.21 t 8.88 df 37 Sig. (2tailed) 0.000 B. Research Discussion 1. Explanation of Significant Difference between Multiple-choice and essay scores based on Calculation of t Test for Non-independent or Correlated Means – in minitab paired t test Based on the Table 1 above, the mean of essay scores (M = 60.42; SD = 11.30) is significantly different from the mean of multiple-choice scores (M = 44.60; SD = 10.48) of the same group who had been tested in news item material using multiple-choice and essay format test. It was a fact that for each student who did tests, the scores of essay test is bigger than multiple- Table 1 Means and Standard Deviation in a Paired Samples t Test Observations Multiple-choice SE 3. Interview This research employed interview session for collecting the descriptive data from the respondents who are three teachers of English Major at SMA Negeri 1 Padalarang. All of the respondents was given three questions about test that specifically talked about multiple-choice and essay test. The writer tried to find out the preference test of each English teacher between multiple-choice and essay format in which is always used in their school environment. Then, the writer wanted to know the opinion of all of the respondents about significant different scores between multiplechoice and essay test which has been experienced by them in the class. And finally, the writer tried to find out types of test which always used by the respondents to evaluate their students but specifically to multiplechoice and essay test. 4. Questionnaire This research employed questionnaire for collecting the descriptive data from the respondents who are thirty eight students in X1 class at SMA Negeri 1 Padalarang. All of the respondents was given three questions about test that specifically talked about multiple-choice and essay test. D. RESEARCH FINDINGS AND DISCUSSIONS A. Research Findings Based on the data which was collected in this research, there are several findings that can be discussed as a result in this research. 1. Means and Standard Deviation in a Paired Samples t Test for Multiple-choice and Essay Scores The writer employed t test for non-independent – in minitab 14 is paired samples test. And The data then presented the statistical summary of the data in Table 1. Variable Pair Std. 3 choice. So, it was because the mean for essay scores is bigger than the mean for multiple-choice scores. Also, based on the Table 2 above, the difference mean of multiple-choice (M = 44.60; SD = 10.48) and essay scores (M = 60.42; SD = 11.30) is M = -15.82 and SD = 10.98. Besides, the t value for paired sample test between multiple-choice and essay test is -8.88. The negative t value means that the essay scores in row 2 is higher than multiple-choice scores in row 1. But, it isn’t a problem because “a t value may be either positive or negative” depend on the arrangement of the data in the table (Crowl, 1996:180). Finnally, it presented the evidence that there is a statistically significant difference between the two means of multiple-choice and essay scores which was tested in the same groups of samples – thirty eight students in paired sample – because the t value for df= 37 are 3.3 (more detail t value = 3.326) and 3.5 (more detail t value = 3.474) where the obtained t value of 8.88 is higher than 3.3 and 3.5. Based on Table 4, the writer classified four performances of items for multiple-choice questions. First, 1 item (5%) of multiple-choice item belong to zero discriminant index. Second, 3 items (15%) of multiple-choice items belong to negative discriminant index. Third, 6 items (30%) of multiple-choice items belong to middle discriminant index. And fourth, 10 items (50%) of multiple-choice items belong to low discriminant index. In addition, based on Table 4 the writer classified three performances of items for essay questions. First, 1 item (5%) of essay item belong to to negative discriminant index. Second, 3 items (15%) of essay items belong to middle discriminant index. And third, 16 items (80%) of essay items belong to low discriminant index. The detail information present in the Table 4. Table 4 Table of Item Difficulty or Index of Facility Value (FV) and Index of Discriminat (D) of Essay Items Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 2. Explanation of Item Difficulty or Index of Facility Value (FV) of Multiple-choice and Essay Items Based on the Table 5, the writer classified two performances of items for multiple-choice questions. First, the writer claims that only 9 items (45%) – the facility value (FV) with black and green colors – from 20 items were easy to answer by students. Second, 11 items (55%) –the facility value (FV) with red color – from 20 items were difficult to answer by students. In addition, based on Table 3 the writer classified two performances of items for essay questions. First, the writer claims that only 3 items (15%) – the facility value (FV) with red color – from 20 items were difficult to answer by students. And 17 items (85%) – the facility value (FV) with green and blue colors – from 20 items were easy to answer by students. The detail information present in the Table 3. U+L 3 31 9 10 5 7 20 26 3 37 13 12 1 30 28 9 27 33 26 11 FV 0,08 0,82 0,24 0,26 0,13 0,18 0,53 0,68 0,08 0,97 0,34 0,32 0,03 0,79 0,74 0,24 0,71 0,87 0,68 0,29 U-L 1 1 7 8 3 -1 8 0 1 1 7 4 -1 6 8 3 3 5 2 -1 FV 0,62 0,42 0,47 0,32 0,64 0,88 0,75 0,88 0,48 0,70 0,36 0,66 0,37 0,67 0,67 0,53 0,68 0,71 0,78 0,51 U-L 2,00 0,50 6,25 2,50 1,75 -0,25 1,00 4,75 2,75 4,25 4,50 6,25 1,50 5,50 2,50 0,50 2,00 8,00 2,25 2,75 D 0,11 0,03 0,33 0,13 0,09 -0,01 0,05 0,25 0,14 0,22 0,24 0,33 0,08 0,29 0,13 0,03 0,11 0,42 0,12 0,14 3. Interpretation of Result of Interview Session with Teachers This research employed interview session for collecting the descriptive data from the respondents who are three teachers of English Major at SMA Negeri 1 Padalarang. Then, all of the respondents were given three questions about test that specifically talked about multiple-choice and essay test. The writer tried to find out the preference test of each English teacher between multiple-choice and essay format in which is always used in their school environment. Then, the writer wanted to know the opinion of all of the respondents about significant different scores between multiplechoice and essay test which has been experienced by them in the class. And finally, the writer tried to find out types of test which always used by the respondents to evaluate their students but specific to multiple-choice and essay test. After transcribing, analyzing and verifying the data from interview with the three respondents, the detail information presented in the Table 5. Table 3 Table of Item Difficulty or Index of Facility Value (FV) and Index of Discriminant (D) of Multiple-choice Items Items 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 U+L 23,50 16,00 17,75 12,00 24,25 33,25 28,50 33,25 18,25 26,75 13,50 25,25 14,00 25,50 25,50 20,00 26,00 27,00 29,75 19,25 D 0,05 0,05 0,37 0,42 0,16 -0,05 0,42 0 0,05 0,05 0.37 0,21 -0,05 0,32 0,42 0,16 0,16 0,26 0,11 -0,05 4 Table 5 Descriptions of Interview Contents Multiple-choice Test 1. Multiple-choice was choosed by the respondents because this test can give advantages not only for the students but also the teacher as tester. 2. The advantage for the students itself is can help them with the optional answers if they don’t know the answer or may be feel confuse with the questions (although it’s also a disadvantage of multiplechoice test by doing guessing or choose the answer randomly) 3. Multiple-choice test also give an analytical thinking for the students because to answer the questions correctly, the students have to think analytically to choose the right answer between the optional answers that always given in multiple-choice test. 4. The advantage also can be felt by teacher while they are checking their students’ answer. To correcting ot checking the multiple-choice is not take a long time. 5. Multiple-choice also able to produce higher scores but it’s depend on the quantity of questions (how many questions are given in one test) and the quantity of scores for each question (how many points are given in each items) or when this scores calculated with essay or the other scores that given in one time of test. 6. Multiple-choice could give higher scores for students if the material in the test prevoiusly given to the students although in different format test, students able to answer because this material understood and known by students before. Students' Preference Tests Essay Test 1. Essay test was choosed by the respondent because this test can give advantages not only for the students but also the teacher as tester. 2. The advantage for the students itself is can help them to express their opinion or comprehension about material that have been studied by them freely (although it’s always not supported by students who unable to express their opinion in written text). 3. Essay test also give the opportunity for the teacher to measure their students’ ability in writing and make students can’t easily cheat while they do the test (although this test is take a long time to check and correct all of the students’ answer. 4. Essay test mostly belived can give higher scores by both of the students or teacher because the quantity of each item is higher than multiplechoice item (although essay test only consists of a few items). Multiplechoice Test 13% Essay Test 32% 55% Both Multiplechoice and Essay Test Chart 1 Percentage of Students Preference Tests b. Students’ Preference Tests that Produce Higher Scores Chart Based on the data of questionnaire from the students, the writer found that: First, 63% of the students in X1 class choose essay test as their preference test that can give higher scores (if multiplechoice and essay test compared). Second, 18% of the students in X1 class choose multiple-choice test as their preference test that can give higher scores (if multiplechoice and essay test compared). And third, 18% of the students in X1 class choose both of multiple-choice and essay tests as their preference test that can give higher scores (if multiple-choice and essay test compared). The detail interpretation present in the Chart 2. Students' Preference Tests that Produce Higher Scores Multiplechoice Test 18% 19% Essay Test Both Multiplechoice and Essay Test 63% Chart 2 Percentage of Tests that Produce Higher Scores c. Students’ Suggestion Tests for Teacher Chart Based on the data of questionnaire from the students, the writer found that: First, 50% of the students in X1 class choose both of multiple-choice and essay tests as their suggestion tests to teacher to be given for their students. Second, 37 % of the students in X1 class choose essay test as their suggestion test for teacher to be given to their students. And third, 13% of the students in X1 class choose multiple-choice test as their suggestion test for teacher to be given to their students. The detail interpretation present in the Chart 3. 4. Interpretation of the Result of Questionnaire from Students After reduced and edited by the writer, the data from questionnaire presents in the Table 4.12 but to summarized the data, the writer reduced the data and changed into pie charts that consists of three chart which is each chart presents a description data from questionnaire. a. Students’ Preference Tests Chart Based on the data of questionnaire from the students, the writer found that: First, 55% of the students in X1 class choose multiple-choice test as their preference test. Second, 32% of the students in X1 choose essay test as their preference test. And third, 13% of the students in X1 choose both of multiplechoice and essay tests as their preference test. The detail interpretation present in the Chart 1. Students Suggestion's Tests for Teacher Multiple-choice Test 13% Essay Test 50% 37% Both Multiplechoice and Essay Test Chart 3 Percentage of Students Suggestion’s Tests for Teacher 5 Research”. Journal the International Multi Conference of Engineers and Computer Scientists (Vol II), pp. 18 – 20. Brown, H. D. (2003). Language Assessment: Principles and Classroom Practices. California: Longman. Burton, S. J. et al. (1991). “How to Prepare Better Multiple-Choice Test Items: Guidelines for University Faculty”. Journal of Brigham Young University Testing Services and The Department of Instructional Science., pp. 1-33. Cohen, L. et al. (2007). Research Methods in Education (Sixth Edition). New York: the Taylor & Francis e-Library. Crowl, T. K. (1996). Fundamentals of Educational Research (Second Edition). Dubuque: Times Higher Education Group Inc. Cunningham, G. K. (2005). Assessment in the Classroom: Constructing and Interpreting Tests. London: Falmer Press. Doddy, A. et al. (2008). Developing English Competencies for Senior High School Grade X. Jakarta: Setia Purna Invest. Fraenkel, J. R. et al. (2012). How to design and evaluate research in education (Eighth edition).New York: McGraw Hill. Fulcher, G. (2010). Practical Language Testing. London: Hodder Education. Heaton, J. B. (1990). Writing English Language Tests. New York: Longman Inc. Hickson, S.And Reed, W. R. (2009).“Do Essay and Multiple-choice Questions measure the Same Thing?”. March (3rd), pp. 127. Hughes, A. (2003). Testing for Language Teachers (Second Edition). Cambridge: Cambridge University Press. Karimi, L. and Mehrdad, A. G. (2012).“Investigating Administered Essay and Multiple-choice Tests in the English Department of Islamic Azad University, Hamedan Branch”. Journal of Canadian Center of Science and Education Vol. 2 (3), pp. 69-76. Kuechler, W. L. and Simkin, M. G. “How Well Do Multiple Choice Tests Evaluate Student Understanding in Computer Programming Classes?”. Journal of Information Systems Education, Vol. 14(4), pp. 389-399. Kusumawardhani, S. (2012). “The Effect of Test using English on Students’ Achievement in Mathemathics Formative Test”. Apple3l of Journal vol. 1, pp. 66-79. Meyers, J. L. et al.(2010). “Performance of Ability Estimation Methods for Writing Assessments under Conditions of Multidimensionality”.Journal of National Council on Measurement in Education.May, pp. 1-45. Ramraje, S. N. (2011). “Comparison of the Effect of Post-instruction Multiple-choice and Short-answer Tests on Delayed Retention Learning”. Journal of Australasian Medical, 4, 6, pp. 332-339. University of North Carolina at Chapel Hill.(2012). “Writing and Grading Essay Questions”.March.Retrieved from http://cfe.unc.edu. Walstad, W. And Becker, W. (1994). “Achievement Differences on Multiple-Choice and Essay Tests in Economics”.Journal of CBA Faculty Publications.Retrieved from http://digitalcommons.unl.edu/cbafacpub/34. Webster, N. (1979). Webster’s Deluxe Unabridged Dictionary. USA: Simon & Schuster, a division of Gulf and Webster Corporation. Zimmaro, D. M. (2004). “Writing Good Multiple-Choice Exams”. Jornal of the Division of Instructional Innovation and Assessment”. August, pp. 1-40). E. CONCLUSIONS AND SUGGESTIONS A. Conclusions 1. There is significant difference in score based on multiple-choice mean (M = 44.60) and essay mean (M = 60.42) calculated by using t test for non-directional or correlated means – in minitab paired t test – and found that the mean difference of their means is -15.82, t observed is -8,88, the critical value with df = 37 and significant at level 0,001 (t table value = 3.32). 2. The factors that influenced different score of multiple-choice and essay tests were test items itself, time of conducting a test, and the students them self ( as subject of learning process). 3. Based on item difficulty (FV), 45% of multiplechoice test items were easy and 55% of them were difficult to answer by students. And in addition, 85% of essay test items were easy and 15% of them were difficult to answer by students. 4. Based on item discrimination (D), 5% items of multiple-choice belong to zero discrimination index, 15% items belong to negative index discrimination index, 30% of items belong to middle discrimination index, and 50% items belong to low discrimination index. In addition, 5% of essay items belong to negative discrimination index, 15% of items belong to middle discrimination index and 80% items belong to low discrimination index. B. Suggestion Based on the findings of this research, the writer proposes several suggestions as follows: 1. Teachers should produce a test with a good proportion that can be answered by the students correctly when they want to evaluate their students’ comprehension about the topic. 2. Teachers should use not only one type of test but at least combine two types of test to make valid measurements of their students’ ability. 3. Teachers should design test items with item difficulty (FV) in balanced proportion such as easy items = 25%, moderate items = 50% and difficult items = 25%. 4. Teachers should design the test with good discriminating by employing index of discrimination (D) to distinguish the ability of her/his students (between above-average student and below-average student). 5. For another future research, it is important to analyze each kind of tests as the focus of the research. BIBLIOGRAPHY Annie, W. Y. N. And Alan, H. S. C. (2009).“Different Methods of Multiple-Choice Test: Implications and Design for Further 6