Final Project of Assessment
Alfian Bagus H (100221404953)
Dwi Rudi H (10221404963)

Type of Test Based on the Purpose and Teacher's Scoring

Type of test based on the purpose

1. Aptitude test
   Purpose: to measure one's talent for language.
   Material: language as a general ability.
   Orientation: to predict a person's success on exposure to a second language.
   Example: MLAT & PLAB. MLAT dimensions and aspects: number learning; phonetic script; spelling cues; words in sentences; paired associates.

2. Screening test
   Purpose: to select or admit able candidates.
   Material: reflecting the requirements of future performance.
   Example: SNMPTN TPA.

3. Placement test
   Purpose: to place individuals at their appropriate level/class.
   Material: reflecting the materials in the instructional program.
   Orientation: to predict the future performance of the candidates; pupils will not be misplaced, so that their learning will be optimal.
   Example: the Second Language Placement Test at San Francisco, assessing comprehension and production, responding through written and oral performance, open-ended and limited responses, selection and gap-filling formats.

4. Achievement test
   Purpose: to assess the extent to which the students have achieved the pre-stated learning objectives.
   Material: reflecting the instructional materials in the syllabus/curriculum.
   Orientation: to know whether the standards of competence have been achieved, and to know the progress of the teaching and learning activity.
   Example: formative and summative tests.

5. Diagnostic test
   Purpose: identifying students' learning strengths and weaknesses in an instructional program.
   Material: covering the instructional materials of a learning program.
   Example: the GMAT diagnostic test — arithmetic (roots, powers, number properties, special characters, modules); statistics; word problems (min/max, overlapping sets, rate, work, mixture); geometry; probability; combinations; algebra.

6. Proficiency test
   Purpose: to know the current stage regardless of the education taken before.
   Material: language as a general ability.
   Orientation: for the academic future.
   Example: TOEFL, IELTS, TOEIC, BULATS, etc. The TOEFL covers reading, listening, speaking, and writing comprehension.

7. Research purpose
   Purpose: to answer the research questions; to measure the conceptualized trait.
   Material: conceptualized general language ability.

8. Program evaluation
   Purpose: to see the effectiveness of a certain program.
   Material: conceptualized general language ability.

Type of test based on the teacher's scoring

1. Subjective test
   A test in which the learner's ability or performance is judged by the examiner's opinion and judgment. It requires the examinees to create their own responses. No single wording (or set of actions) can be regarded as the only correct response, and a response may earn full or partial credit. Responses must be scored subjectively by content experts. Examples: essay writing and short answer.

2. Objective test
   A test in which the learner's ability or performance is measured against a specific set of answers. It consists of factual questions requiring extremely short answers that can be quickly and unambiguously scored by anyone with an answer key, thus minimizing subjective judgment by both the test taker and the scorer. Such tests tend to focus more on specific facts than on general ideas and concepts. Examples: multiple-choice tests, true-or-false tests, matching, and problem-based questions.

ATTACHMENT
The Example of the Questions

1. MLAT

PART I: NUMBER LEARNING
Part I of the MLAT has 43 possible points. This part of the MLAT tests auditory and memory abilities associated with sound-meaning relationships. In this part, you will learn the names of numbers in a new language. Subsequently, you will hear the names of numbers spoken aloud, and you will be asked to write down those numbers. For example, if you heard someone say the number "seventeen" in English, you would write down 17. But in this test, you will hear the numbers in a new language. Here's how it will work: you will hear some instructions read aloud.
The speaker will then teach you some numbers (not the same as these samples, of course). The speaker will say something like this (the quoted text represents the voice you will hear):

"Now I will teach you some numbers in the new language. First, we will learn some single-digit numbers: 'ba' is 'one'; 'baba' is 'two'; 'dee' is 'three'. Now I will say the name of the number in the new language, and you write down the number you hear. Try to do so before I tell you the answer: 'ba' -- that was 'one'; 'dee' -- that was 'three'; 'baba' -- that was 'two'. Now we will learn some two-digit numbers: 'tu' is 'twenty' and 'ti' is 'thirty'. 'tu-ba' is 'twenty-one' in this language, because 'tu' is twenty and 'ba' is one. 'ti-ba' is 'thirty-one', because 'ti' is thirty and 'ba' is one. Now let's begin. Write down the number you hear."

a. ti-ba [you have only about 5 seconds to write down your answer]
b. ti-dee
c. baba
d. tu-dee

PART II: PHONETIC SCRIPT
Part II of the MLAT is a test of your ability to learn a system for writing English sounds phonetically. There are 30 possible points in this section. First you will learn phonetic symbols for some common English sounds. For each question, you will see a set of four separate syllables, each spelled phonetically. A speaker will model the sounds for you by pronouncing each of the four syllables in a set, and then the sounds in the next set. After the speaker models the sounds in five sets, you will be asked to look back at the first set. The speaker will go through the groups again, but this time will say only one of the four syllables in a set. Your task is to select the syllable whose phonetic spelling matches the syllable you heard. For example, you would look at the first five sets. They would look something like this:

1. bot but bok buk
2. bok buk bov bof
3. geet gut beet but
4. beek beev but buv
5.
geeb geet buf but

[Remember, the quoted text represents the voice of the speaker that you will hear.] The speaker will then pronounce each of the four syllables in each of the five sets. You follow along:

1. "bot" "but" "bok" "buk"
2. "bok" "buk" "bov" "bof"
3. "geet" "gut" "beet" "but"
4. "beek" "beev" "but" "buv"
5. "geeb" "geet" "buf" "but"

Then the speaker will go back to number 1 and pronounce just one syllable from the set of four. So, you might hear:

1. "buk"

During the actual test, you must indicate which syllable you heard by darkening the corresponding space on the computer answer sheet. Then you hear the next question.

PART III: SPELLING CUES
Part III of the MLAT has 50 questions. This part requires the ability to associate sounds with symbols and depends somewhat on knowledge of English vocabulary. It is also somewhat speeded, and therefore much more challenging than the following exercise, which consists of only 4 practice questions. Nonetheless, trying these sample questions will give you a good idea of what Part III is like. Each question below has a group of words. The word at the top of the group is not spelled in the usual way; instead, it is spelled approximately as it is pronounced. Your task is to recognize the disguised word from the spelling. To show that you recognize the disguised word, look for the one of the five words beneath it that corresponds most closely in meaning to the disguised word. When you find this word or phrase, write down the letter that corresponds to your choice. Try all four samples; then check your answers.

NOW GO RIGHT AHEAD WITH THESE SAMPLE QUESTIONS. WORK RAPIDLY!

1. kloz
   A. attire  B. nearby  C. stick  D. giant  E. relatives
2. restrnt
   A. food  B. self-control  C. sleep  D. space explorer  E. drug

PART IV: WORDS IN SENTENCES
There are 45 questions in MLAT Part IV. The following exercise consists of only 4 practice questions.
The MLAT questions test recognition, analogy, and understanding of a far greater range of syntactic structures than the 4 sample questions shown here. In each of the following questions, we will call the first sentence the key sentence. One word in the key sentence will be underlined and printed in capital letters. Your task is to select the letter of the word in the second sentence that plays the same role in that sentence as the underlined word plays in the key sentence. Look at the following sample question:

Sample: JOHN took a long walk in the woods.
Children in blue jeans were singing and dancing in the park.
A B C D E

You would select "A" because the key sentence is about "John" and the second sentence is about "children."

NOW GO RIGHT AHEAD WITH THESE SAMPLE QUESTIONS. Write down your answers so that you can check them when you are finished.

1. MARY is happy.
   From the look on your face, I can tell that you must have had a bad day.
   A B C D E
2. We wanted to go out, BUT we were too tired.
   Because of our extensive training, we were confident when we were out sailing,
   A B C
   yet we were always aware of the potential dangers of being on the lake.
   D E

PART V: PAIRED ASSOCIATES
Part V of the MLAT focuses on the rote-memory aspect of learning foreign languages. On the actual test, you will have 2 minutes to memorize 24 words. You will then do a practice exercise. You can look back at the vocabulary during this practice exercise, but you will not be permitted to look at the vocabulary or at your practice sheet while you are doing the Part V questions that follow the exercise. Your task here is to MEMORIZE the Maya-English vocabulary below. There are only six words to memorize on this practice test. Keep in mind that the vocabulary list on Part V of the MLAT will be 4 times longer than this sample. Take 40 seconds to memorize this vocabulary, then go on to the questions.
Do not look back at the vocabulary until you have finished responding to the sample questions.

Vocabulary (Maya – English):
   c'on — gun
   si' — wood
   k'ab — hand
   kab — juice
   bat — ax
   pal — son

NOW GO RIGHT AHEAD WITH THESE SAMPLE QUESTIONS. Write down your answers so that you can check them when you are finished.

1. bat
   A. animal  B. stick  C. jump  D. ax  E. stone
2. kab
   A. juice  B. cart  C. corn  D. tool  E. run
3. c'on
   A. story  B. gun  C. eat  D. mix  E. bird

2. English Placement Test Practice

1 - Usage - Choose the best answer.
Mr. Smith ___________ to the store, bought some milk, gave the clerk $5.00 and _________ back $2.25 in change.
a) gone / got  b) went / had  c) went / got  d) gone / had

2 - Usage - Which sentence is punctuated properly?
a) The suspect broke, free ran through the street, turned the corner and escaped.
b) The suspect broke free, ran through the street, turned the corner and escaped
c) The suspect broke free, ran through the street, turned the corner and escaped.
d) The suspect broke, free ran through the street, turned the corner and, escaped.

3 - Comprehension - Read the short paragraph and choose the answer that must be true.
Jack had 3 sisters and 3 brothers. He was not the oldest and not the youngest. All of the girls had red hair. Jack had red hair. Everyone else had brown hair.
a) Jack had 3 older brothers and 3 younger sisters.
b) Jack's brothers all had brown hair.
c) Jack was one of 6 children.
d) Jack liked his oldest sister more than any other sibling.

3. Achievement Test for 3rd Grade of Elementary School

Directions: Carefully read each question. Fill in the circle next to the correct answer.
1. What word is an antonym for cool?
   O A. warm  O B. mild  O C. damp
2. The student misbehaved in class. What does the word misbehaved mean?
   O A. behaved well  O B. behaved quietly  O C. behaved badly

4. GMAT Diagnostic Test

ARITHMETIC (ROOTS)
1. √324 + √289 = ?
   (A) 32  (B) 33  (C) 34  (D) 35  (E) 36
2. 36 + 64 + 52 + 20 = ?
   A. 19 + 20  B. 19 20  C. 145  D. 5 100 + 20  E. 7 5
3.
If x is an integer and x × x − x = a, which of the following must be true?
   I. a is even
   II. a is positive
   III. a is an integer
   A. I only  B. II only  C. III only  D. I and II  E. None of the above

4. Proficiency Test (TOEFL)

SECTION 1: Reading Comprehension
1. According to the passage, how do memories get transferred to the STM?
   A) They revert from the long-term memory.
   B) They are filtered from the sensory storage area.
   C) They get chunked when they enter the brain.
   D) They enter via the nervous system.
2. The word "elapses" in paragraph 1 is closest in meaning to:
   A) passes  B) adds up  C) appears  D) continues

5. Objective Test
1. The respiratory center in the brainstem is NOT affected by which situation?
   A. high levels of carbon dioxide in the blood
   B. high levels of hydrogen ions in the blood
   C. low levels of oxygen molecules in the blood
2. What is a waste product normally excreted in the urine?
   A. excess glucose  B. excess protein  C. red blood cells  D. urea

Test Type

Test type based on test construction
1. Direct test
   An item that tests the students' ability to do something, such as write a letter or make a speech, rather than testing individual language points. Example: a writing test.
2. Indirect test
   An item that tests knowledge of the language (grammar and vocabulary) rather than the students' ability to do things such as write a letter or make a speech. Example: the "Structure & Written Expression" section of the TOEFL.

Test type based on score interpretation
1. Norm-referenced testing
   Norm-referenced tests (NRTs) compare an examinee's performance to that of other examinees. The goal is to rank the set of examinees so that decisions can be made about their opportunity for success. Examples: college entrance tests; the Stanford, Metropolitan, and California Achievement Tests (SAT, MAT, and CAT), as well as the Iowa and Comprehensive Tests of Basic Skills (ITBS and CTBS).
2.
Criterion-referenced testing
   Criterion-referenced tests (CRTs) differ in that each examinee's performance is compared to a pre-defined set of criteria or a standard (SK KD). The goal with these tests is to determine whether or not the candidate has demonstrated mastery of a certain skill or set of skills. The results are reported as exactly reaching the standard, below the standard, or beyond the standard. Example: a national board medical exam — either the examinee has the skills to practice the profession, in which case he or she is licensed, or does not.
3. Communicative testing
   Linguistic competence: the purpose of this test is to know the ability to use good and correct language; it is also called grammatical competence.
   Sociolinguistic competence: the purpose of this test is to know the ability to use good and correct language at the right time.
   Strategic competence: the ability to use communication strategies to make the message to be delivered easy to understand.
   Organizational competence: the purpose of this test is to know the ability to construct statements that are delivered well.
4. Performance-based testing
   With performance-based assessment, you may have a difficult time distinguishing between formal and informal assessment. A characteristic of many performance-based language assessments is the presence of interactive tasks. In such cases, the assessment involves learners in actually performing the behavior that we want to measure. Example: test takers are measured in the act of speaking, requesting, responding, or in combining listening and speaking, or in integrating reading and writing.

Test type based on approaches
1. Discrete-point test
   The test is based on mastering grammar only, vocabulary only, or both. The discrete-point test is called atomistic because the approach emphasizes dividing language into small parts. Another characteristic of this test is the use of multiple choice.
2. Integrative testing
   An item which tests more than one skill at a time (e.g.
a writing task tests the students' grammar, vocabulary, punctuation, and spelling).
3. Power test
   Measures the level of performance with items of sufficient difficulty and ample time to complete. This kind of test involves higher-order thinking; it is very difficult and demands much time and money. Example: research tests.
4. Speed test
   In a speed test the scope of the questions is limited and the method needed to answer them is clear. Taken individually, the questions appear relatively straightforward. Speed tests are concerned with how many questions you can answer correctly in the limited time. Examples: IQ tests, psychological tests.
5. Computer-adaptive testing
   Computerized adaptive testing (CAT) is a form of computer-based test that adapts to the examinee's ability level; for this reason, it has also been called tailored testing. The computer is programmed to fulfill the test design as it continuously adjusts to find questions of appropriate difficulty for test takers at all performance levels. In CATs, the test taker sees only one question at a time, and the computer scores each question before selecting the next one. As a result, test takers cannot skip questions, and once they have entered and confirmed their answer, they cannot return to the question or to any earlier part of the test. Example: GMAT (Graduate Management Admission Test).
6. Standardized test
   A standardized test is a test that is administered and scored in a consistent, or "standard", manner. Standardized tests are designed so that the questions, conditions for administering, scoring procedures, and interpretations are consistent, and they are administered and scored in a predetermined, standard manner. Examples: ACT (American College Testing), GRE (Graduate Record Examination).

Approaches to Language Testing
1. Classical approach
   This approach emphasizes grammatical rules, memorizing vocabulary, and translating classic texts. It is applied in the GTM approach to learning English (concentrating on how to teach grammar and drilling vocabulary).
   The classical approach will produce good translators.
   Attachment — Example:
      He . . . plays on the computer.
      1) He . . . listens to the radio.
      2) They . . . read books.
      3) Pete . . . gets angry.
      4) Tom is . . . very friendly.
      5) I . . . take sugar in my coffee.
      6) Ramon is . . . hungry.
      7) My grandmother . . . goes for a walk in the evening.
   Weaknesses:
      The students' language knowledge is not communicative.
      Students will feel bored, and the language feels difficult.
      Speech and listening are neglected.
      Not natural.
   Strengths:
      The students' knowledge of the target language is undoubted.
      Students' competence in writing and translating will develop.
2. Discrete approach
   The test is based on mastering grammar only, vocabulary only, or both. The discrete-point test is called atomistic because the approach emphasizes dividing language into small parts. Another characteristic of this test is the use of multiple choice. Examples: dialog completion, pair dialog performance.
   Weaknesses:
      Time-consuming and not natural.
      Success in taking the test does not mean that the students can use the language in everyday life.
   Strengths:
      The test can cover a wide range of material.
      Objective scoring and an efficient test.
      Allows quantification of students' responses.
3. Integrative Approach / Unitary Competence Hypothesis
   English cannot be broken into several components; it is a unity. The approach emphasizes mastering integrative skills.
   The test usually demands several skills at once. Examples: cloze test, dictation, interview, writing composition.
   Weaknesses:
      Complicated mode of test.
      The weakness is merely the weakness of the particular test used.
   Strengths:
      Students' ability to communicate will develop.
      Seems natural.
      Challenging, and can reveal students' integrative skills.

Approaches to language testing: the Communication-Based Movement Approach and the Performance-Based Movement Approach

1) Communication-Based Movement Approach
   Language is a means of communication. The emphasis is on the function of the language skills, that is, listening, speaking, reading, and writing. The test has no explicit testing of the language components grammar and vocabulary; it is applied according to the social context. Examples: at a bank between teller and customer; in a class between students and teacher.
   Example items:
      A writing test based on a real-life situation.
      Multiple choice using dialog, e.g.: "That woman over there looks confused." "Why don't you ask her . . . ?"
      a) Does she need help?
      b) if she needs help
      c) whether she needs help
      d) Do you need help?
      e) she needs help or not
   Advantages:
      Realistic in terms of formats.
      Widens the concept of language ability beyond that of grammatical ability.
      A predictive indicator of success in communicating in real life.
      The use of language skills in integration.
   Weaknesses:
      Generalizability of the test results.
      The influence of the native language. Example: if the teacher comes from the same region, they might correct the students' errors influenced by the native language.

2) The Performance-Based Movement Approach (Authentic Assessment)
   The belief of the performance-based movement approach is that language is the vehicle of content. The approach gives students the chance to demonstrate their knowledge and also discloses more in-depth information on students' academic needs. The emphasis is on tasks and language behaviors, and also on contents and learning outcomes with the present reference.
   Applied in CTL, performance-based assessment is linked to tasks and language behavior and relates to content and learning outcomes with the present reference. Testing practice in CTL is authentic assessment: class language tests, portfolios, projects, experiments, extended responses.
   Advantages:
      The pillars of CTL are similar to the seven components of effective learning.
      Students are under the guidance of the teacher.
      Students' speaking presentation is essentially a form of public speaking.
      The teacher has a vital role.
   Weaknesses:
      Fewer questions, calling for a greater degree of subjective judgment.
      There are no clear right and wrong answers.
      Different teachers might grade a student's work differently.

Characteristics of a Good Test: RELIABILITY
A good test needs to be reliable. So what is "reliable"? "Reliable" means stable or consistent. A reliable test is a test that produces stable or consistent scores: the scores demonstrate consistency or stability no matter who administers the test, or when or where the test is administered.
In mathematical terms, an obtained score is

   X = T + E

where X is the obtained (observed) score, T is the true score, and E is the error.

How to estimate reliability?
1. Test-retest
   Test-retest reliability estimation involves administering the same test to a number of test takers on different testing occasions. It is used to assess the consistency of a measure from one time to another.
   Weaknesses: it is not easy to create a similar condition on a different testing occasion, and it is not known exactly what the best time interval for the second administration is — too long or too close.
   Strength: only one set of test items has to be constructed, so less time and energy are needed.
2. Parallel forms
   The parallel-forms technique requires the construction of two or more sets of test items, made equal in every aspect of the test.
   Weaknesses: making tests that are equally similar in all aspects is not an easy task.
   It needs more energy and is time-consuming. It is not easy to keep the test takers' mental condition the same when they respond to two sets of tests administered almost at the same time.
   Strength: the two forms can be used independently.
3. Internal consistency
   The internal-consistency approach is based on the logic that if the items in the test are highly correlated, the test is said to be reliable. Two internal-consistency techniques:
   Split-half: in split-half estimation we randomly divide all items that purport to measure the same construct into two sets. Weakness: it does not fully reflect the true reliability of the test.
   Inter-item: inter-item estimation uses all of the items on our instrument that are designed to measure the same construct. (A related approach, inter-rater reliability, is used to assess the degree to which different raters/observers give consistent estimates of the same phenomenon.)

Characteristics of a Good Test: VALIDITY
According to Kline (1993: 15), a test is said to be valid if it measures what it claims to measure. Simply put, validity is the precision of the test in measuring what it is intended to measure. There are several aspects or dimensions of validity:
1. Face validity
   "The concept of face validity relates more to what a test appears to measure than what the test actually measures" (Cohen et al., 1988: 125). The face validity of a test is thus linked to what the test looks like. If a test appears to measure what it is intended to measure "on the face of it", the test can be said to be face valid. In brief, face validity refers to the extent to which the physical appearance of the test corresponds to what it is claimed to measure. For example, suppose a speaking test is constructed and claims to test speaking abilities. When test takers respond to the test and produce spoken language, as the test claims, we can say that the speaking test is face valid to the test takers.
2.
Content validity
   According to Wiersma and Jurs (1990: 184), content validity means the extent to which the test is representative of a defined body of content consisting of topics and processes. For instance, a grammar test contains the grammatical points to be tested, such as infinitives, gerunds, modals, tenses, etc.
3. Empirical validity
   Empirical validity describes how closely scores on a test correspond (correlate) with behavior as measured in other contexts; an instrument has empirical validity when it has been tested against such an external measure. Example: students' scores on a test of academic aptitude may be compared with their school grades (a commonly used criterion). Empirical evidence of this kind can be differentiated into two types based on when the data for the external measure are collected: concurrent and predictive validity.
   Concurrent (simultaneous) validity: the results are supported by other, concurrent performance beyond the assessment itself; in other words, the score on the test is related to another score obtained at about the same time.
   Predictive validity: the test assesses and predicts the test takers' prospects in their future life.
4. Construct validity
   According to Gronlund (1985: 72), construct validity is "the extent to which the test performance can be interpreted in terms of one or more psychological constructs." Examining construct validity requires a complex process. Wiersma and Jurs (1990: 193) state that two stages are required in the examination of construct validity: logical analysis and empirical analysis.

Washback
   Washback concerns the effect of the test on the students' learning and on the teacher's teaching activity. A test that positively influences the students' learning and the teacher's teaching has good washback.
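In practice, the reliability estimates discussed earlier (test-retest, split-half) and empirical validity all reduce to a correlation coefficient between two sets of scores. A minimal Python sketch — the score lists and function names here are invented for illustration, not part of any standard test battery:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman_brown(half_test_r):
    """Step a split-half correlation up to an estimate of full-test reliability."""
    return 2 * half_test_r / (1 + half_test_r)

# Hypothetical scores of five test takers on two administrations of the same test
first_administration  = [12, 15, 9, 18, 11]
second_administration = [13, 14, 10, 19, 12]

# Test-retest reliability estimate
print(round(pearson(first_administration, second_administration), 2))  # → 0.97

# If the two halves of a split test correlate at 0.60, the full test's
# estimated reliability under the Spearman-Brown correction is:
print(spearman_brown(0.60))  # → 0.75
```

The same `pearson` function would serve for concurrent validity: correlate the aptitude-test scores with school grades instead of with a second administration.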
Factors that influence the validity of instruments
According to Sukardi (2009: 38), some factors can make an evaluation test invalid: internal factors, external factors, and factors from the students themselves.
1. Internal factors of the test:
   The instructions are not clear, which can decrease the test's validity.
   The words used in the structure of the evaluation instrument are too difficult.
   The construction of the test items is not good.
   The difficulty level of the test items is not appropriate.
   The time allocated is not appropriate.
   The test items do not represent the content of the materials.
   The answers to the questions can be predicted by the students.
2. External factors:
   The time allocated is not enough for the students.
   The assessment is not consistent.
   Another person from outside helps the student during the test.
3. Factors from the students themselves:
   Wrong interpretation by the students.
   The students cannot concentrate well.

ITEM ANALYSIS
Item analysis is a process which examines student responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole.
Purposes:
1. To improve items which will be used again in later tests.
2. To eliminate ambiguous items in a single test administration.
3. To increase instructors' skills in test construction.
4. To identify specific areas of course content which need greater emphasis or clarity.
Methods:
1. Item difficulty
2. Item discrimination (item characteristic curve)
3. Item validity (point-biserial method)
4. Effectiveness of distractors

I. ITEM DIFFICULTY
Item difficulty is determined by the number of people who answer a particular test item correctly (p). For example, if the first question on a test was answered correctly by 76% of the class, then the difficulty level (p, or percentage passing) for that question is p = .76.
If the second question on a test was answered correctly by only 48% of the class, then the difficulty level for that question is p = .48. The higher the percentage of people who answer correctly, the easier the item; a difficulty level of .48 therefore indicates that question two was more difficult than question one, which had a difficulty level of .76.

Methods of computing item difficulty:
a) Method for dichotomously scored items
b) Method for polytomously scored items
c) Grouping method

a) Method for dichotomously scored items:

   P = R / N

   where P is the difficulty of a certain item, R is the number of examinees who get that item correct, and N is the total number of examinees.

   Example 1: 80 high school students take a science achievement test, and 61 students pass item 1. The difficulty of item 1 is P = 61 / 80 ≈ 0.76.

b) Method for polytomously scored items:

   P = X̄ / X_max

   where X̄ is the mean of the examinees' scores on one item and X_max is the perfect score on that item.

   Example 2: the perfect score on one open-ended item is 20 points, and the examinees' average score on this item is 11 points. The item difficulty is P = 11 / 20 = 0.55.

c) Grouping method (use of extreme groups): upper (U) and lower (L) criterion groups are selected from the extremes of the distribution of test scores or job ratings:

   P = (P_U + P_L) / 2

   where P_U is the proportion of examinees in the upper group who get the item correct, and P_L is the proportion of examinees in the lower group who get the item correct.

   Example 3: 370 examinees take a language test. 64 examinees of the 27% upper extreme group (about 100 examinees) pass item 5, and 33 examinees of the 27% lower extreme group pass the same item. The difficulty of item 5 is P = (0.64 + 0.33) / 2 ≈ 0.49.

II. ITEM DISCRIMINATION
Item discrimination refers to the degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure. Item discrimination determines whether those who did well on the entire test also did well on a particular item.
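The item-difficulty formulas above, and the discrimination index D = pH − pL treated next, are straightforward to compute. A minimal Python sketch, using the worked examples from the text as toy data (the function names are my own, not a standard API):

```python
def difficulty_dichotomous(num_correct, num_examinees):
    """P = R / N for right/wrong (dichotomous) items."""
    return num_correct / num_examinees

def difficulty_polytomous(mean_score, max_score):
    """P = X-bar / X_max for partial-credit (polytomous) items."""
    return mean_score / max_score

def difficulty_extreme_groups(p_upper, p_lower):
    """P = (P_U + P_L) / 2 using the upper and lower 27% groups."""
    return (p_upper + p_lower) / 2

def discrimination_index(p_upper, p_lower):
    """D = pH - pL; values range from -1.00 to 1.00."""
    return p_upper - p_lower

# Example 1: 61 of 80 students pass item 1
print(round(difficulty_dichotomous(61, 80), 2))        # ≈ 0.76

# Example 2: mean score 11 on a 20-point open-ended item
print(difficulty_polytomous(11, 20))                   # 0.55

# Example 3: 64/100 in the upper group, 33/100 in the lower group
print(round(difficulty_extreme_groups(0.64, 0.33), 2)) # ≈ 0.49

# World-history question: 27% of 140 examinees is about 38 per group;
# 18 of the upper group and 6 of the lower group answer item 5 correctly
print(round(discrimination_index(18 / 38, 6 / 38), 2)) # ≈ 0.32
```

A positive D means the item separates strong from weak examinees in the intended direction; a D near zero or negative flags an item worth revising.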
Methods of computing item discrimination:
a) Index of discrimination:

   D = pH − pL

   We set one or two cutoff scores to divide the examinees into an upper-scoring group and a lower-scoring group. pH is the proportion in the upper group who answer the item correctly, and pL is the proportion in the lower group who answer the item correctly. Values of D may range from −1.00 to 1.00. (Illustration by GGS Information Services, Cengage Learning, Gale.)

   Another example: 50 examinees' test data on an 8-item scale about job stress.

   Question: 140 students take a world history test. (1) If we use the ratio 27% to determine the upper and lower groups, how many examinees are there in each group? (2) If 18 examinees in the upper group answer item 5 correctly, and 6 examinees in the lower group answer it correctly, calculate the discrimination index for item 5.

b) Item validity: the point-biserial method.
   Another way to determine the discriminability of an item is to compute the correlation coefficient between performance on the item and performance on the test as a whole, i.e., the tendency of students selecting the correct answer to have high overall scores.

Distractors (incorrect alternatives)
   Analyzing the distractors (i.e., the incorrect alternatives) is useful in determining the relative usefulness of the decoys in each item. Items should be modified if students consistently fail to select certain multiple-choice alternatives; such alternatives are probably totally implausible and therefore of little use as decoys. A discrimination index or discrimination coefficient should be obtained for each option in order to determine each distractor's usefulness.

Stages of Test Construction
There is a general procedure for test construction.
1. Statement of the problem.
In constructing a test, the test maker has to be clear about what he/she wants to know and for what purpose. The following questions have to be answered:
a. What kind of test is it to be (achievement/proficiency/diagnostic/placement)?
b. What is the precise purpose?
c. What abilities are to be tested?
d. How detailed must the results be?
e. How accurate must the results be?
f. How important is backwash?
g. What constraints are set by the unavailability of expertise, facilities, and time (for construction, administration, and scoring)?
2. Providing a solution to the problem. Once the problems are clear, steps can be taken to solve them. Efforts should be made to gather information on tests that have been designed for similar situations.
3. Writing specifications for the test. The first form that the solution takes is a set of specifications for the test. This will include information on content, format and timing, criterial levels of performance, and scoring procedures.
a. Content. This refers not to the content of a single, particular version of a test, but to the entire potential content of any number of versions. Samples of this content will appear in individual versions of the test.
- Operations: the tasks that candidates should be able to carry out. For a reading test, these might include, for example: scan a text to locate specific information; guess the meaning of unknown words from context.
- Types of text: for a writing test, these might include letters, forms, and academic essays (up to a specified number of pages in length).
- Addressees: the kinds of people that the candidate is expected to be able to write or speak to (for example, native speakers of the same age and status), or the people for whom reading and listening materials are primarily intended (for example, native-speaker university students).
- Topics: selected according to their suitability for the candidate and the type of test.
b.
Format and Timing. This should specify the test structure (including the time allocated to components) and the item types/elicitation procedures, with examples. It should state what weighting is to be assigned to each component. It should also say how many passages will normally be presented (in the case of reading or listening) or required (in the case of writing), and how many items there will be in each component.
c. Criterial Levels of Performance. The required levels of performance for (different levels of) success should be specified. This may involve a simple statement to the effect that, to demonstrate "mastery", 80% of the items must be responded to correctly. It may be more complex: the Basic level oral interaction specifications of the Royal Society of Arts (RSA) Test of the Communicative Use of English as a Foreign Language will serve as an example. These refer to accuracy, appropriacy, range, flexibility, and size. Thus:
BASIC DESCRIPTORS
Accuracy: Pronunciation may be heavily influenced by the L1 and accented, though generally intelligible. Any confusion caused by grammatical/lexical errors can be clarified by the candidate.
Appropriacy: Use of language broadly appropriate to function, though no subtlety should be expected. The intention of the speaker can be perceived without excessive effort.
Range: A severely limited range of expression is acceptable. May often have to search for a way to convey the desired meaning.
Flexibility: Need not usually take the initiative in conversation. May take time to respond to a change of topic. The interlocutor may have to make considerable allowances and often adopt a supportive role.
Size: Contributions generally limited to one or two simple utterances are acceptable.
d. Scoring Procedures. These are most relevant where scoring will be subjective. The test constructors should be clear as to how they will achieve high scorer reliability.
4. Writing the test
a.
Sampling. It is most unlikely that everything found under the heading of "Content" in the specifications can be included in any one version of the test. Choices have to be made. For content validity and for beneficial backwash, the important thing is to choose widely from the whole area of content; one should not concentrate on those elements known to be easy to test. Succeeding versions of the test should also sample widely and unpredictably.
b. Item writing and moderation. The writing of successful items is extremely difficult. No one can expect to produce perfect items consistently; some items will have to be rejected, others reworked. The best way to identify items that have to be improved or abandoned is through teamwork. Colleagues must really try to find fault, and despite the seemingly inevitable emotional attachment that item writers develop to the items they have created, they must be open to, and ready to accept, the criticisms that are offered. Good personal relations are a desirable quality in any test-writing team. As part of moderation, an attempt should be made to administer the test to native speakers of a similar educational background to future test candidates. These native speakers should score 100%, or close to it; items that prove difficult for comparable native speakers almost certainly need revision or replacement.
c. Writing and moderation of the scoring key. Once the items have been agreed, the next task is to write the scoring key where this is appropriate. Where there is intended to be only one correct response, this is a perfectly straightforward matter. Where there are alternative acceptable responses, which may be awarded different scores, or where partial credit may be given for incomplete responses, greater care is necessary. Once again, the criticism of colleagues should be sought as a matter of course.
5. Pretesting. Pretesting is needed even when careful moderation has been carried out.
There are likely to be some problems with every test. It is obviously better if these problems can be identified before the test is administered to the group for which it is intended. The aim should be to administer it first to a group as similar as possible to the one for which it is really intended. Problems in administration and scoring are noted, the reliability coefficients of the whole test and of its components are calculated, and individual items are analyzed.
6. Validation of the test. Validity of a particular test usually refers to criterion-related validity: we are looking for empirical evidence that the test will perform well against some criterion. For example, an achievement test might be validated against the ratings of students by their current language teachers and by their future subject teachers soon after the beginning of their academic courses.
Scoring, Grading, Test-Score Interpretation
Scoring is the process of using a number to represent the responses made by the test taker. The score is initially a raw score; for it to be meaningful, further analyses are required.
Types of Scoring
Dichotomous vs. Continuous Scoring
Dichotomous scoring treats the response as belonging to one of two distinct, exclusive categories. Example: scoring in multiple-choice, true-false, and correct-incorrect items (1 is assigned to a correct answer; 0 to an incorrect one).
Continuous scoring treats the test taker's response as being graded in nature. Example: speaking and writing tests (a speaking test may be scored 1, 2, 3, 4, or 5 in terms of fluency).
Holistic, Primary Trait, and Analytic Scoring
Holistic scoring considers the test taker's response as a whole rather than as consisting of fragmented parts. Example: a speaking test scored with the Test of Spoken English (TSE) scoring guide.
Primary trait scoring focuses on one specific feature or trait that the test takers need to demonstrate.
For example, in writing, the teacher scores only the content of the writing product; in speaking, the teacher scores only the clarity of expressing ideas.
Analytic scoring emphasizes individual points or components of the test taker's response; both linguistic and non-linguistic features are scored. These scoring types are classified based on how the test taker's response is viewed and treated.
Grading
Grades reflect a standard of quality through a weighting system. In grading, you can include both achievement and non-achievement aspects. If you pre-specify standards of performance on a numerical point system, you are using an absolute system of grading: for example, the points established for a midterm test, a final exam, and the work accumulated over the semester are set by the institution. Relative grading is usually accomplished by ranking students in order of performance (percentile ranks) and assigning cut-off points for grades; it allows your own interpretation and adjustment for the unpredicted ease or difficulty of a test.
Test Interpretation
Steps: 1. Frequency distribution; 2. Measures of central tendency; 3. Measures of variability; 4. Item analysis.
Frequency distribution: shows the number of students who obtained each mark awarded. For example, out of 30 students, 6 students scored 100.
Measures of central tendency: indicators of how well most students perform in the group; in brief, indicators of group performance (mean, median, mode).
Measures of variability: indicators of the homogeneity or heterogeneity of a group (range, standard deviation, variance).
Item analysis: a process which examines students' responses to individual test items (questions) in order to assess the quality of those items and of the test as a whole.
Characteristics of Authentic Assessment
1. Includes a task to perform and a rubric for scoring the performance. In this approach, the teacher should give a task to be accomplished in every meeting.
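The first three interpretation steps can be sketched with Python's standard library. The marks below are invented, except that six of the thirty students score 100, matching the frequency-distribution example in the text.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev, pvariance

# Invented marks for a class of 30; six students obtain 100.
marks = [100] * 6 + [90, 90, 85, 85, 85, 80, 80, 75, 75, 70, 70, 70,
                     65, 65, 60, 60, 55, 55, 50, 50, 45, 40, 35, 30]

# Step 1: frequency distribution.
freq = Counter(marks)
print(freq[100])                              # 6 students obtained 100

# Step 2: measures of central tendency.
print(round(mean(marks), 2), median(marks), mode(marks))

# Step 3: measures of variability (population formulas).
score_range = max(marks) - min(marks)
print(score_range, round(pstdev(marks), 2), round(pvariance(marks), 2))
```

Step 4, item analysis, is covered by the difficulty and discrimination formulas earlier in this section.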
The teacher should also always prepare a scoring rubric for assessing the students' performance (group work and individual). There are three kinds of rubric: holistic, analytic, and primary trait.
2. Constructs the students' response. The teachers help/guide the students to construct their own way of thinking, not just analyzing, identifying, and stating something.
3. Higher-Order Thinking (HOT). The tasks given are expected to make the students use higher-order thinking (analyzing, evaluating, and creating); thus, the students will be more creative and will construct their own knowledge.
4. Integrated skills (not integrative). This combines several language skills in one assessment. It is different from the integrative approach, which integrates several language components in one assessment.
5. Process and product are considered equally important. In this approach, the important thing is not only the product, but also the process of reaching the product.
6. In-depth information for the teacher. By using the rubric, the teacher gets complete information about the students' skills and their weaknesses and strengths in facing the lesson. From that information, the teacher can work out the best way to help the students face the challenge.
The Benefits for the Students
The students will know how well they have mastered the learning material. They will know and strengthen their mastery of certain skills. They will connect their learning with their experiences, their world, and society in a wider scope. They will sharpen their higher-order thinking skills. They will have responsibilities and choices. They will cooperate with other students. They will learn to measure the level of their own performance.
Variety
1. Interview. This is actually the speaking test in performance-based assessment. The teacher can ask the students some questions while the lesson is in progress.
2.
Retelling. Different from the previous kind of test, this test is more casual and usually done as a daily activity. Interviewing the students is usually done at the end of every chapter (like a chapter test), while retelling is usually done in every lesson (if needed). This kind of test is usually used for assessing the students' activeness. It is done after the students observe something, hear an explanation, or listen to something, and draw a conclusion based on their own understanding. Three things are evaluated in retelling: the students' speaking skill, the organization of the text, and the students' response to the text.
3. Composing. Composing means writing an essay (narrative, descriptive, etc.) or any report in written form. There are two approaches to this assessment:
Global approach: the teacher focuses on the essay as a whole. This way of assessing is good, but the teacher should give detailed scores for each part and then accumulate them to reach the total score for the whole assignment.
Component approach: unlike the global approach, this assessment focuses on only one particular skill or aspect. For example, in assessing an essay, the teacher focuses only on the ideas being generated and pays no attention to the generic structure or the vocabulary used (diction).
4. Project/Exhibition. The teacher can ask students to make a project (group or individual) and hold an exhibition, so that they will feel appreciated and will work harder and more seriously on their own projects.
5. Experiment. This is actually a simulation of doing something. In an English class, the experiment is simply the teaching of a procedure text: the students are asked to do something (a simple, popular, and easy experiment, so that it will not cause trouble) and to explain it by demonstrating or performing the procedure.
This assessment is actually practice in conducting an experiment, since students in Indonesia rarely deal with experiments and consequently find them difficult at the end of their college years. An experiment assessment cannot stand alone; it is actually part of a larger activity.
6. Written/Spoken Response. The students are asked to write or present a report after doing an experiment or observation. Then the teacher gives some questions (observation or comprehension questions), and after the students finish answering all the questions, the teacher and students can discuss them. This assessment is also part of a larger activity.
7. Observation. There are two kinds of observation: spontaneous observation (observation without complicated preparation or planning) and structured observation (observation with a plan or preparation).
8. Portfolio. Collecting students' best work and compiling it. This assessment is used to gain information about students' development over time.