PERFECT NA FINALS CUTIE! REVIEWER IN ASL

UNIT 3: ITEM ANALYSIS, RELIABILITY AND VALIDITY

ITEM ANALYSIS
Item analysis is the process of examining the students' responses to each individual item in a test. It consists of different procedures for assessing the quality of the test items given to the students. Through item analysis, we can identify which of the given items are good and which are defective. Good items are retained; defective items are improved, revised, or rejected.

USES OF ITEM ANALYSIS
1. Item analysis data provide a basis for efficient class discussion of the test results.
2. Item analysis data provide a basis for remedial work.
3. Item analysis data provide a basis for general improvement of classroom instruction.
4. Item analysis data provide a basis for increased skills in test construction.
5. Item analysis procedures provide a basis for constructing a test bank.

TYPES OF QUANTITATIVE ITEM ANALYSIS

1. DIFFICULTY INDEX
It refers to the proportion of students in the upper and lower groups who answered an item correctly. The larger the proportion, the more students have learned the subject matter measured by the item. To compute the difficulty index of an item, use the formula:

$DF = \frac{n}{N}$

where:
DF = difficulty index;
n = number of students selecting the correct answer in the upper group and in the lower group; and
N = total number of students who answered the test.

LEVEL OF DIFFICULTY
To determine the level of difficulty of an item, first find the difficulty index using the formula, then identify the level of difficulty using the range given below. The higher the value of the index of difficulty, the easier the item is. Hence, more students got the correct answer and more students mastered the content measured by that item.

2. DISCRIMINATION INDEX
It is the power of the item to discriminate between the students who know the lesson and those who do not. It is computed as the number of students in the upper group who got the item correct minus the number of students in the lower group who got the item correct, divided by the number of students in either the upper group or the lower group (use the higher number if the groups are not equal). The discrimination index is the basis for measuring the validity of an item. It can be interpreted as an indication of the extent to which overall knowledge of the content area or mastery of the skills is related to the response on an item. The formula used to compute the discrimination index is:

$DI = \frac{CUG - CLG}{D}$

where:
DI = discrimination index value;
CUG = number of students selecting the correct answer in the upper group;
CLG = number of students selecting the correct answer in the lower group; and
D = number of students in either the lower group or the upper group.

TYPES OF DISCRIMINATION INDEX
1. Positive discrimination happens when more students in the upper group than in the lower group got the item correct.
2. Negative discrimination occurs when more students in the lower group than in the upper group got the item correct.
3. Zero discrimination happens when the number of students who answered the item correctly is equal in the upper group and the lower group; hence, the test item cannot distinguish the students who performed well in the overall test from the students whose performance was very poor.

LEVEL OF DISCRIMINATION
Ebel and Frisbie (1986), as cited by Hetzel (1997), recommended the use of the Level of Discrimination of an Item for easier interpretation.

3. ANALYSIS OF RESPONSE OPTIONS
Aside from identifying the difficulty index and discrimination index, another way to evaluate the performance of the entire test item is through the analysis of the response options. It is very important to examine the performance of each option in a multiple-choice item. Through this, you can determine whether the distracters or incorrect options are effective or attractive to those who do not know the correct answer. An incorrect option is attractive when more students in the lower group than in the upper group choose it. Analyzing the incorrect options allows teachers to improve the test items so that they can be used again in the future.

DISTRACTER ANALYSIS
1. Distracter. It is the term used for the incorrect options in a multiple-choice type of test, while the correct answer represents the key.
2. Miskeyed item. The test item is a potential miskey if more students from the upper group choose an incorrect option than the key.
3. Guessing item. Students from the upper group have an equal spread of choices among the given alternatives.
4. Ambiguous item. This happens when students from the upper group choose an incorrect option and the keyed answer about equally.

STEPS IN SOLVING DIFFICULTY INDEX AND DISCRIMINATION INDEX
1. Arrange the scores from highest to lowest.
2. Separate the scores into the upper group and the lower group. There are different methods to do this:
(a) If a class consists of 30 students who take an exam, arrange their scores from highest to lowest, then divide them into two groups. The highest scores belong to the upper group and the lowest scores belong to the lower group.
(b) Other literature suggests using 27%, 30%, or 33% of the students for the upper group and the lower group.
(c) In the Licensure Examination for Teachers (LET), however, the test developers always use 27% of the students who participated in the examination for the upper and lower groups.
3. Count the number of students in the upper and lower groups who chose each alternative for each item and record the information using the template. Note: Put an asterisk on the correct answer.
4. Compute the value of the difficulty index and the discrimination index, and analyze each response in the distracters.
5. Make an analysis for each item.
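The difficulty index and discrimination index described in this unit reduce to two short computations. The following is a minimal Python sketch, not a prescribed procedure: the function names are hypothetical, and the sample counts (10 correct in the upper group, 4 in the lower group, 40 examinees, 20 students per group) are the figures used in Example 1 of this reviewer.

```python
def difficulty_index(correct_upper, correct_lower, total_examinees):
    """DF = n / N, where n counts correct answers in both groups."""
    if total_examinees <= 0:
        raise ValueError("total_examinees must be positive")
    return (correct_upper + correct_lower) / total_examinees


def discrimination_index(correct_upper, correct_lower, group_size):
    """DI = (CUG - CLG) / D, where D is the size of one group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return (correct_upper - correct_lower) / group_size


def discrimination_type(di):
    """Label the index as positive, negative, or zero discrimination."""
    if di > 0:
        return "positive"
    if di < 0:
        return "negative"
    return "zero"


# Counts taken from Example 1: CUG = 10, CLG = 4, N = 40, D = 20.
df = difficulty_index(10, 4, 40)
di = discrimination_index(10, 4, 20)
print(f"DF = {df:.2f}, DI = {di:.2f}, {discrimination_type(di)} discrimination")
```

The sketch stops at the type of discrimination; mapping the values to a level of difficulty or level of discrimination still requires the interpretation ranges used by your reference.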
HOW TO IMPROVE THE TEST ITEM

Example 1. A class is composed of 40 students. Divide the group into two. Option B is the correct answer. Based on the data given in the table, as a teacher, what would you do with the test item?
1. Compute the difficulty index.
n = 10 + 4 = 14
N = 40
$DF = \frac{n}{N} = \frac{14}{40} = 0.35$ or 35%
2. Compute the discrimination index.
CUG = 10
CLG = 4
D = 20
$DI = \frac{CUG - CLG}{D} = \frac{10 - 4}{20} = 0.30$ or 30%
3. Make an analysis of the level of difficulty, discrimination, and distracters.
a. Only 35% of the examinees got the answer correct; hence, the item is difficult.
b. More students from the upper group got the answer correct; hence, it has positive discrimination.
c. Retain options A, C, and E because most of the students who did not perform well in the overall examination selected them. Those options attract most students from the lower group.
4. Conclusion: Retain the test item but change option D; make it more realistic so that it is effective for the upper and lower groups. At least 5% of the examinees should choose the incorrect option.

Example 2. Below is the result of an item analysis for a test item in Mathematics. Are you going to reject, revise, or retain the test item?
1. Compute the difficulty index.
n = 4 + 3 = 7
N = 39
$DF = \frac{n}{N} = \frac{7}{39} = 0.18$ or 18%
2. Compute the discrimination index.
CUG = 4
CLG = 3
D = 20
$DI = \frac{CUG - CLG}{D} = \frac{4 - 3}{20} = 0.05$ or 5%
3. Make an analysis of the level of difficulty, discrimination, and distracters.
a. Only 18% of the examinees got the answer correct; hence, the item is very difficult.
b. More students from the upper group got the answer correct; hence, it has a positive discrimination of 5%.
c. Students respond about equally to all alternatives, an indication that they are guessing.
d. If the test item is well written but too difficult, reteach the material to the class.
4. Conclusion: Reject the item because it is very difficult, the discrimination index is very poor, and options A and B are not effective distracters.

Example 3. A class is composed of 50 students. Use 27% to get the upper and the lower groups. Analyze the item given the following results. Option D is the correct answer. What will you do with the test item?
1. Compute the difficulty index.
n = 6 + 4 = 10
N = 28
$DF = \frac{n}{N} = \frac{10}{28} = 0.36$ or 36%
2. Compute the discrimination index.
CUG = 6
CLG = 4
D = 14
$DI = \frac{CUG - CLG}{D} = \frac{6 - 4}{14} = 0.14$ or 14%
3. Make an analysis of the level of difficulty, discrimination, and distracters.
a. Only 36% of the examinees got the answer correct; hence, the item is difficult.
b. More students from the upper group got the answer correct; hence, it has a positive discrimination of 14%.
c. Modify options B and E because more students from the upper group chose them compared with the lower group; hence, they are not effective distracters, because most of the students who performed well in the overall examination selected them as their answers.
d. Retain options A and C because most of the students who did not perform well in the overall examination selected them as the correct answers. Hence, options A and C are effective distracters.
4. Conclusion: Revise the item by modifying options B and E.

TEST RELIABILITY
Reliability refers to the consistency with which a test yields the same rank for individuals who take the test more than once (Kubiszyn and Borich, 2007). That is, it describes how consistent test results or other assessment results are from one measurement to another. A test is reliable when it yields practically the same scores when administered twice to the same group of students, with a reliability index of 0.60 or above. The reliability of a test can be determined by means of the Pearson Product Moment Correlation, the Spearman-Brown Formula, the Kuder-Richardson Formulas, Cronbach's Alpha, etc.

FACTORS AFFECTING RELIABILITY OF A TEST
1. Length of the test
2. Moderate item difficulty
3. Objective scoring
4. Heterogeneity of the student group
5. Limited time

METHODS OF ESTABLISHING RELIABILITY OF A TEST

1. TEST-RETEST METHOD
A type of reliability determined by administering the same test twice to the same group of students with any time interval between the tests. The two sets of scores are correlated using the Pearson Product Correlation Coefficient (r), and this correlation coefficient provides a measure of stability. It indicates how stable the test results are over a period of time.

2. EQUIVALENT/PARALLEL/ALTERNATE FORMS
A type of reliability determined by administering two different but equivalent forms of the test to the same group of students in close succession. The equivalent forms are constructed to the same set of specifications, that is, they are similar in content, type of items, and difficulty. The two sets of scores are correlated using the Pearson Product Correlation Coefficient, and this correlation coefficient provides a measure of the degree to which generalization about the performance of students from one assessment to another is justified. It measures the equivalence of the tests.

3. SPLIT-HALF METHOD
Administer the test once and score two equivalent halves of the test. To split the test into equivalent halves, the usual procedure is to score the even-numbered and the odd-numbered test items separately. This provides two scores for each student. The two half-test scores are correlated, and the Spearman-Brown formula is applied to the result; the resulting coefficient provides a measure of internal consistency. It indicates the degree to which consistent results are obtained from the two halves of the test.

4. KUDER-RICHARDSON FORMULA
Administer the test once. Score the total test and apply the Kuder-Richardson (KR) formula. The KR-20 formula is applicable only in situations where students' responses are scored dichotomously; therefore, it is most useful with traditional test items that are scored as right or wrong, true or false, or yes or no. KR-20 estimates of reliability indicate the degree to which the items in the test measure the same characteristic. Another formula for testing the internal consistency of a test is the KR-21 formula, a simplified form that assumes all items are of equal difficulty.

RELIABILITY COEFFICIENT
The reliability coefficient is a measure of the amount of error associated with the test scores. It has the following description:
(a) The range of the reliability coefficient is from 0 to 1.0;
(b) The acceptable value is 0.60 or higher;
(c) The higher the value of the reliability coefficient, the more reliable the overall test scores;
(d) Higher reliability indicates that the test items measure the same thing.

INTERPRETING RELIABILITY COEFFICIENT
1. Group variability affects the size of the reliability coefficient. Higher coefficients result from heterogeneous groups than from homogeneous groups. As group variability increases, reliability goes up.
2. Scoring reliability limits test score reliability. If tests are scored unreliably, error is introduced. This limits the reliability of the test scores.
3. Test length affects test score reliability. As the length increases, the reliability tends to go up.
4. Item difficulty affects test score reliability. As test items become very easy or very hard, the test's reliability goes down.

Example 1. Prof. Joel administered a test to his 10 students in an Elementary Statistics class twice, with a one-day interval. The test given after one day is exactly the same test given the first time. The scores below were gathered in the first test (FT) and the second test (ST). Using the test-retest method, is the test reliable? Show the complete solution.
Analysis: The reliability coefficient using the Pearson r = 0.91 means that the test has a very high reliability. The scores of the 10 students on the test given twice with a one-day interval are consistent. Hence, the test has a very high reliability.

Example 2. Prof. Glenn administered a test to his 10 students in his Chemistry class. The test was given only once. The scores of the students on the odd (O) and even (E) items below were gathered. Using the split-half method, is the test reliable? Show the complete solution.
Steps:
1. Use the Pearson Product Correlation Coefficient formula to solve for r, the correlation between the odd and even halves.
2. Find the reliability of the original (whole) test using the Spearman-Brown formula: $r_{whole} = \frac{2r}{1 + r}$
3. Analysis: The reliability coefficient using the Spearman-Brown formula is 0.50, which indicates questionable reliability. Hence, the test items should be revised.

Example 3. Ms. Tan administered a 40-item test in English to her Grade VI pupils in UEPLES. Below are the scores of 15 pupils. Find the reliability using the Kuder-Richardson formula.
Steps:
1. Solve for the mean and the standard deviation of the scores using the given table.
2. Solve for the standard deviation.
3. Solve for the mean.
4. Solve for the reliability coefficient using the Kuder-Richardson formula.
5. Analysis: The reliability coefficient using the KR-21 formula is 0.90, which means that the test has a very good reliability. The test is very good for a classroom test.

TEST VALIDITY
Validity is concerned with whether the information obtained from an assessment permits the teacher to make a correct decision about a student's learning. It refers to the appropriateness of the score-based inferences or decisions made on the basis of the students' test results. Validity is the extent to which a test measures what it is supposed to measure.

TYPES OF VALIDITY

1. FACE VALIDITY
It is the extent to which a measurement method appears "on its face" to measure the construct of interest. Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. One reason is that it is based on people's intuitions about human behavior, which are frequently wrong. It is also the case that many established measures in psychology work quite well despite lacking face validity.

2. CONTENT VALIDITY
A type of validation that refers to the relationship between the test and the instructional objectives; it establishes content so that the test measures what it is supposed to measure. Things to remember about content validity:
a. The evidence of the content validity of a test is found in the Table of Specifications.
b. This is the most important type of validity for a classroom teacher.
c. There is no coefficient for content validity. It is determined by experts judgmentally, not empirically.

3. CRITERION-RELATED VALIDITY
A type of validation that refers to the extent to which scores from a test relate to theoretically similar measures. It is a measure of how accurately a student's current test score can be used to estimate a score on a criterion measure, such as performance in courses, classes, or another measurement instrument. For example, classroom reading grades should indicate similar levels of performance as standardized reading test scores.
a. Concurrent validity. The criterion and the predictor data are collected at the same time. This type of validity is appropriate for tests designed to assess a student's criterion status or when you want to diagnose a student's status; it suits a good diagnostic screening test. It is established by correlating the criterion and the predictor using the Pearson Product Correlation Coefficient or other correlational statistical tools.
b. Predictive validity. A type of validation that refers to the extent to which a student's current test result can be used to accurately estimate the student's performance at a later time. It is appropriate for tests designed to assess students' future status on a criterion. Regression analysis can be used to predict the criterion from a single predictor or from multiple predictors.

4. CONSTRUCT VALIDITY
A type of validation that refers to the extent to which a test measures theoretical, unobservable qualities such as intelligence, math achievement, performance anxiety, and the like, over a period of time on the basis of gathered evidence. It is established through intensive study of the test or measurement instrument using convergent/divergent validation and factor analysis. There are other ways of assessing construct validity, such as the test's internal consistency, developmental change, and experimental intervention.
a. Convergent validity is a type of construct validation wherein a test has a high correlation with another test that measures the same construct.
b. Divergent validity is a type of construct validation wherein a test has a low correlation with a test that measures a different construct. In this case, high validity occurs only when there is a low correlation coefficient between the tests that measure different traits.
c. Factor analysis assesses the construct validity of a test using complex statistical procedures.

IMPORTANT THINGS TO REMEMBER ABOUT VALIDITY
1. Validity refers to the decisions we make, and not to the test itself or to the measurement.
2. Like reliability, validity is not an all-or-nothing concept; it is never totally absent or absolutely perfect.
3. A validity estimate, called a validity coefficient, refers to a specific type of validity. It ranges between 0 and 1.
4. Validity can never be finally determined; it is specific to each administration of the test.

FACTORS AFFECTING THE VALIDITY OF A TEST ITEM
1. The test itself.
2. The administration and scoring of a test.
3. Personal factors influencing how students respond to the test.
4. Validity is always specific to a particular group.

VALIDITY COEFFICIENT
The validity coefficient is the computed value of $r_{xy}$. In theory, the validity coefficient, like the correlation, ranges from 0 to 1. In practice, most validity coefficients are small, usually ranging from 0.3 to 0.5; few exceed 0.6 to 0.7. Hence, there is much room for improvement in most of our psychological measurement.
Another way of interpreting the findings is the squared correlation coefficient $(r_{xy})^2$, called the coefficient of determination. The coefficient of determination indicates how much of the variation in the criterion can be accounted for by the predictor.
Coefficient of determination $= (r)^2 = (0.94)^2 = 0.8836$ or 88.36%
Interpretation: The correlation coefficient is 0.94, which means that the validity of the test is high; 88.36% of the variance in the students' performance can be attributed to the test.

UNIT 4: ANALYSIS AND INTERPRETATION OF ASSESSMENT RESULTS

[Figure: Normal distribution or bell-shaped curve]

METHODS OF THE PRESENTATION OF DATA
Presentation of data is the organization of information such as measurements, numbers, names, observations, etc. in a certain way.

TEXTUAL PRESENTATION
This is the technique in paragraph form. It does not necessarily mean that the presentation contains words only; figures can also be used as part of the presentation.

TABULAR PRESENTATION
This is another way of presenting data. In this technique, the data are summarized using tables. A commonly used table is the Frequency and Percentage Distribution, which presents the frequency and percentage share of nominal data.

GRAPHICAL PRESENTATION
There are many types of graphs, such as the line graph, bar graph, pictograph, pie chart, etc.

DIFFERENT KINDS OF GRAPHICAL PRESENTATION
LINE GRAPH. It shows associations between two or more sets of quantities. In this technique, the values are plotted using dots called "markers", which are connected by line segments.
BAR GRAPH. It is a graphical method in which each value in the data is represented by a rectangular bar. The length of a bar shows the measure of a certain value, while its width has a fixed size.
PICTOGRAPH. This is a graphical technique that expresses its meaning through its pictorial similarity to a physical object. Each object used in a pictograph stands for a corresponding measure.
PIE CHART. This is the type of graphical presentation in which a circle (or sometimes a cylinder) is divided into several parts, with each part representing a category of the data.

SKEWED DISTRIBUTION OF TEST SCORE
Positively Skewed Distribution (mean > median > mode)
Negatively Skewed Distribution (mode > median > mean)
Skewness is the degree of asymmetry, or departure from symmetry, of a distribution.
Skewed to the right (positive skewness): the frequency curve of the distribution has a longer "tail" to the right of the central maximum than to the left. Most scores are below the mean and there are extremely high scores.
Skewed to the left (negative skewness): the frequency curve of the distribution has a longer "tail" to the left of the central maximum than to the right. Most scores are above the mean and there are extremely low scores.

KURTOSIS
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution.
Leptokurtic: a distribution having a relatively high peak.
Platykurtic: a distribution that is flat-topped.
Mesokurtic: a distribution that is moderately peaked.

B. QUANTITATIVE ANALYSIS AND INTERPRETATION
1. LEVELS OF MEASUREMENT
2. MEASURES OF CENTRAL TENDENCY
3. MEASURES OF VARIABILITY
4. MEASURES OF RELATIVE POSITION

DEFINITION OF TERMS
CATEGORICAL VARIABLES are also known as qualitative variables. Examples of categorical variables: GENDER (2 categories: male and female); MARITAL STATUS (3 categories: single, married, separated); PHYSICAL ACTIVITY LEVEL (e.g., 4 categories: sedentary, low, moderate, and high). A categorical variable can be further categorized as NOMINAL, DICHOTOMOUS, or ORDINAL.
NOMINAL VARIABLE: the categories cannot be ordered or ranked.
DICHOTOMOUS VARIABLES: nominal variables with only two categories.
ORDINAL VARIABLE: the categories can be ranked or ordered.
CONTINUOUS VARIABLES are also known as quantitative variables. Examples of continuous variables are height, mass, distance, weight, and temperature. Continuous variables can be further categorized as either INTERVAL or RATIO variables.
INTERVAL VARIABLES: variables whose central characteristic is that they can be measured along a continuum and have a numerical value.
RATIO VARIABLES: interval variables with the added condition that 0 (zero) of the measurement indicates that there is none of that variable.

MEASURES OF CENTRAL TENDENCY

MEAN
The most commonly used measure of central tendency is the mean. The arithmetic mean, or simply the mean, is the sum of all the given values or items in a distribution divided by the number of observations. The mean is commonly called the average. It is denoted by $\bar{X}$ (read "x bar").
There are some cases when some values are given more importance than others. The mean derived in this case is known as the weighted arithmetic mean or weighted mean.

MEDIAN
The median is the middle value in the distribution after arranging the data in either ascending or descending order. Half of the observations belong to the higher 50% of the group, while the other half belong to the lower 50% of the group. It is denoted by "md" (read "median") or $\tilde{x}$. It is the midpoint of a distribution; used when the distribution is skewed; most stable.

MODE
The mode is the simplest measure of central tendency. It can be easily identified by inspection, by getting the score or item that occurs most frequently. A set of data with only one mode is called unimodal. A set of data with two modes is bimodal, with three is trimodal, and with many modes, multimodal. In some instances, the mode may not exist at all. In contrast to the mean, it is unreliable/unstable; it is used as a quick description of the average/typical performance of the group.

COMPUTATION OF THE MEAN FOR UNGROUPED DATA

ARITHMETIC MEAN
Formula: $\bar{X} = \frac{\sum X}{n}$
Example: The following are the ages of six senior citizens: 67, 72, 63, 65, 70, 68
$\bar{X} = \frac{67 + 72 + 63 + 65 + 70 + 68}{6} = \frac{405}{6} = 67.5$

WEIGHTED MEAN
The formula is: $\bar{X} = \frac{\sum wX}{\sum w}$
where:
X = each of the item values
w = the weight of each item value
Example: Compute the weighted mean grade.
$\bar{X} = \frac{\sum wX}{\sum w} = \frac{3(1.5) + 5(1.25) + 3(1.75) + 3(2.0)}{3 + 5 + 3 + 3} = \frac{22}{14} = 1.57$
Interpretation: The average grade of the student for the four subjects is 1.57.

MEASURES OF VARIABILITY
Measures of variability indicate or describe how spread out the scores are. The larger the measure of variability, the more spread out the scores are, and the group is said to be heterogeneous; the smaller the measure, the less spread out the scores are, and the group is said to be homogeneous.
INTERPRETATION
The two sets of scores have the same mean, median, and mode. But if you look closely at the scores, you will notice that the values in set A are less spread out than the values in set B. This shows that computing the measures of central tendency will not give us all the features or characteristics of a given set of data. Other measures can provide other information about the data, and these are the measures of variability. The measures of variability tell us how the data are spread out or dispersed around the center. The values are more clustered around the center if the computed measure of variability is small. On the other hand, a high measure of variability indicates that the values fall farther from the center. Measures of variability are also called measures of variation and measures of dispersion.
The most common measures of variability are the following:
1. Range
2. Quartile deviation or semi-interquartile range
3. Mean or average deviation
4. Variance
5. Standard deviation

RANGE
The range is the difference between the highest and the lowest value. It is the simplest measure of variability but the most unstable, because its value quickly fluctuates when there is a change in either the lowest or the highest value. It is easily affected by outliers (extremely small or extremely large values). It does not give the dispersion or the spread of the values between the highest and the lowest value. The formula is:
$R = H - L$

INTERQUARTILE RANGE
A measure of variability that is not influenced by the presence of outliers in the data is the interquartile range. This measure considers the spread of the middle 50 percent of the data points, which fall between Q1 and Q3. It gives the spread of the scores around the median (Q2). The interquartile range (IQR) is simply the difference between the third and first quartiles, corresponding to the 75th and 25th percentiles. The formula is:
Interquartile Range = 3rd Quartile − 1st Quartile
$IQR = Q_3 - Q_1$ or $IQR = P_{75} - P_{25}$

QUARTILE DEVIATION
The quartile deviation is also called the semi-interquartile range. It is the dispersion of the middle 50% of the data. It tells something about how the data are dispersed around the median. It is computed as one half the difference between the third and first quartiles:
$QD = \frac{Q_3 - Q_1}{2} = \frac{IQR}{2}$ or $QD = \frac{P_{75} - P_{25}}{2} = \frac{IQR}{2}$

MEAN DEVIATION (MD)
The mean deviation (MD) or the average deviation (AD) is the sum of the absolute deviations of each value from the mean divided by the total number of observations in the distribution. This measure shows the spread of the distribution around the mean.

STANDARD SCORE
A measure of relative position that is appropriate when the data represent an interval or ratio scale. A z score expresses how far a score is from the mean in terms of standard deviation units, and it allows scores from different tests to be compared. In cases of negative values, transform z scores to T scores (multiply the z score by 10, then add 50).

Z-VALUE
A raw score by itself is rather meaningless. What gives the score meaning is its deviation from the mean. A z score expresses this value in standard deviation units.
$z = \frac{X - \bar{X}}{\sigma}$
where:
$X$ = score
$\bar{X}$ = mean
$\sigma$ = SD = standard deviation
EXAMPLE
The student who got a score of 76 is 1.5 SD below the mean (z = −1.5), and 6.68% of the students got a score below 76.

T-SCORES
A T score tells the location of a score in a normal distribution having a mean of 50 and a standard deviation of 10.
$T = 50 + 10z$
Example: For a z score of 1.27, $T = 50 + 10(1.27) = 50 + 12.70 = 62.70 \approx 63$

CORRELATING TWO VARIABLES
There are times when we want to know if a student who got a high score in reading will also obtain a high score in mathematics. When we are concerned with the relationship between two measures, we are talking about correlation. To determine the extent of such a relationship, we use a correlation coefficient (symbolized by $r$).
A correlation coefficient tells us the size (magnitude) and direction (positive or negative) of the relationship between two variables. It can range from $-1.0$ to $+1.0$. The closer a coefficient is to $-1.0$ or $+1.0$, the stronger the relationship.
Positive correlation means:
1. High scores in Test A are associated with high scores in Test B.
2. Low scores in Test A are associated with low scores in Test B.
Negative correlation means:
1. High scores in Test A are associated with low scores in Test B.
2. Low scores in Test A are associated with high scores in Test B.
The table below shows the coefficients of correlation between two variables as indicated by numerical values, with their corresponding interpretation.

KUDER-RICHARDSON FORMULA 20
One way to determine whether the test items are homogeneous or heterogeneous is through the use of the Kuder-Richardson Formula 20 (KR20). The formula is given below.
$r_{tt} = \frac{n}{n-1}\left(1 - \frac{\sum pq}{SD^2}\right)$
where:
$r_{tt}$ = reliability of the total test
$n$ = number of items
$p$ = proportion/percentage of students passing an item
$q$ = proportion/percentage of students failing an item
$\sum pq$ = sum of the products of $p$ and $q$
$SD$ = standard deviation of the total scores
EXAMPLE
Given $n = 10$, $\sum pq = 1.9117$, and $SD = 2.8$:
$r_{tt} = \frac{10}{10-1}\left(1 - \frac{1.9117}{2.8^2}\right)$
$r_{tt} = 1.11\left(1 - \frac{1.9117}{7.84}\right)$
$r_{tt} = 1.11(1 - 0.24)$
$r_{tt} = 1.11(0.76)$
$r_{tt} = 0.84$ (very high correlation)
The coefficient of correlation using KR20 suggests that the test items are homogeneous.

POINTS TO REMEMBER
1. Correlation is the relationship between two variables, characterized by its magnitude and direction.
2. A correlation coefficient is a numerical value that gives the size and direction of the relationship between two variables.
3. The direction of the correlation may be either positive or negative.
4. The magnitude (or size) of the correlation may range from a negligible to a perfect relationship.
5. Two statistical tools used to compute the coefficient of correlation are the Pearson Product Moment Correlation Coefficient and the Kuder-Richardson Formula 20.

FEEDBACKING AND COMMUNICATING ASSESSMENT RESULTS
A. Qualitative Evaluation
B. Constructive Feedbacking
1. Principles and characteristics
2. Strategies
- Written feedback
- Oral feedback
C. Self-assessment
D. Peer assessment

A. QUALITATIVE EVALUATION
Qualitative evaluation provides you with the ability to gain an in-depth understanding of a program or process. It involves the "why" and the "how", allowing a deeper look at issues of interest and an exploration of nuances.
Collecting qualitative evaluation data requires different tools than gathering only quantitative data. Some commonly used data collection methods for qualitative data include interviews, focus groups, document/material review, and ethnographic participation/observation.

B. CONSTRUCTIVE FEEDBACKING
Provide positive and constructive feedback to guide students' learning and behavior. The purpose of feedback is to guide student learning and behavior and to increase student motivation, engagement, and independence, leading to improved student learning and behavior. Effective feedback must be strategically delivered and goal directed. Feedback is most effective when the learner has a goal and the feedback informs the learner about areas needing improvement and ways to improve performance. Feedback may be verbal, nonverbal, or written, and should be timely, contingent, genuine, meaningful, age appropriate, and given at rates commensurate with the task and phase of learning (i.e., acquisition, fluency, maintenance). Teachers should provide ongoing feedback until learners reach their established learning goals.

PEER AND SELF-ASSESSMENT
Peer and self-assessment, where students assess each other and themselves, can encourage students to take greater responsibility for their learning, for example, by encouraging engagement with assessment criteria and reflection on their own performance and that of their peers. Through this, students can learn from their previous mistakes, identify their strengths and weaknesses, and learn to target their learning accordingly. Getting students to become more active in their learning in this way can help to alter the perception of learning as a passive process in which students listen to you and absorb the information in order to regurgitate it during a subsequent assignment.
If students are participants rather than 'spectators', they are more likely to engage with their learning.

IMPROVING TEST ITEMS: ITEM ANALYSIS

ASSESSMENT DEVELOPMENT CYCLE

ITEM ANALYSIS
Once the scores of the students/test takers are obtained, the items are ready for item analysis. Item analysis is a procedure used to identify good items by determining the index of difficulty and the index of discrimination. The index of difficulty is the proportion of students/test takers who answered the item correctly. The index of discrimination is the extent to which a test item can differentiate between good performers and poor performers.

One method that can be employed for item analysis is the U-L Index Method. The most commonly used U-L Index Method is the Upper and Lower 27%. The steps are as follows:
1. Score the test papers and arrange the total scores from highest to lowest.
2. Segregate the top and bottom 27% of the papers.
3. Tally the correct answers to each item by each student/test taker in the upper 27% group.
4. Repeat Step 3, but this time consider the lower 27% group.
5. Get the percentage of the upper group that obtained the correct answer and call this U.
6. Repeat Step 5, but this time consider the lower group and call this L.
7. Get the average of the U and L percentages. This is the index of difficulty.
8. Get the difference between the U and L percentages. This is the index of discrimination.

A good item, to be retained, must have both an acceptable index of difficulty and an acceptable index of discrimination. The acceptable index of difficulty ranges from 0.41 to 0.60, while the acceptable index of discrimination ranges from +0.20 to +1.00.
A fair item, to be revised, has either an unacceptable index of difficulty or an unacceptable index of discrimination.
A poor item, to be rejected, has both an unacceptable index of difficulty and an unacceptable index of discrimination. It has to be discarded right away.

The table below shows the result of a tryout test in Reading 6. Study the data presented in the table and answer the questions that follow. N = 50
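The U-L procedure and the retain/revise/reject rule described above can be sketched in Python as follows. This is only an illustrative sketch, not part of the reviewer: the function names `ul_index` and `classify_item` are hypothetical, the tallies are made up rather than taken from the Reading 6 tryout, and U and L are expressed as proportions so that they can be compared directly with the 0.41 to 0.60 and +0.20 to +1.00 ranges.

```python
def ul_index(upper_correct, lower_correct, group_size):
    """Return (difficulty, discrimination) for one item using the U-L method."""
    u = upper_correct / group_size  # Step 5: proportion correct in the upper 27% group (U)
    l = lower_correct / group_size  # Step 6: proportion correct in the lower 27% group (L)
    difficulty = (u + l) / 2        # Step 7: average of U and L
    discrimination = u - l          # Step 8: difference between U and L
    return difficulty, discrimination


def classify_item(difficulty, discrimination):
    """Apply the retain/revise/reject rule using the acceptable ranges in the text."""
    ok_difficulty = 0.41 <= difficulty <= 0.60
    ok_discrimination = 0.20 <= discrimination <= 1.00
    if ok_difficulty and ok_discrimination:
        return "retain"   # good item: both indices acceptable
    if ok_difficulty or ok_discrimination:
        return "revise"   # fair item: exactly one index acceptable
    return "reject"       # poor item: both indices unacceptable


# Hypothetical tallies (an assumption for illustration): 14 papers in each group,
# 10 correct answers in the upper group and 4 in the lower group.
difficulty, discrimination = ul_index(upper_correct=10, lower_correct=4, group_size=14)
print(round(difficulty, 2), round(discrimination, 2),
      classify_item(difficulty, discrimination))
```

With these made-up tallies the item has a difficulty of about 0.50 and a discrimination of about 0.43, so it falls in the "retain" category.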
The table below shows some fair items that need revision.

Table of Nonplausible Distracters from the result of a tryout test in Reading 6

It shows that option D in item number 4 was chosen by 7 percent of the students in the lower group but by none (0) in the upper group. If fewer than 3 percent of the students in either group or in both groups selected a particular option, then that alternative must be revised because it is not attractive or plausible.

MODIFIED ITEM ANALYSIS FOR CRITERION-REFERENCED TEST
There is another way to analyze test items: by comparing the percentage of students answering each item correctly on the pretest and on the posttest. It is assumed that most students should obtain a low score on the pretest and a high score on the posttest. If this happens, we can say that there is an improvement in the performance of the students in a certain subject or learning area.

The steps in comparing the percentage answering each item correctly on the pretest and the posttest are as follows:
1. Get the percentage of students passing each item on the pretest and the percentage of students passing each item on the posttest. Use the formula given below.

$$\text{percentage} = \frac{\text{No. of students who answered the item correctly}}{\text{Total number of students who took the test}} \times 100$$

2. Subtract the percentage obtained on the pretest from that obtained on the posttest. The more positive the difference, the more effective the instruction was on the particular objective measured by the item.

Take note that Item 2 shows no change in the performance of the students before and after instruction. Even without instruction, the students had already mastered the particular objective. Item 4 has the highest percentage of difference (87%). It registered a very marked improvement among the students in that particular objective between the pretest and the posttest.
Items 1 and 3 also register marked improvement in the performance of the students before and after instruction. Item 5 shows only a little improvement in the performance of the students.

Another approach in modified item analysis is to determine the percentage of items answered incorrectly on the pretest but correctly on the posttest. This means finding the number of items each student failed on the pretest but passed on the posttest. The steps in using this approach are as follows:
1. Determine the number of items each student answered incorrectly on the pretest but answered correctly on the posttest.
2. Get the sum of the counts in Step 1 for all examinees, then divide by the number of examinees.
3. Divide the result from Step 2 by the number of items in the test.
4. Multiply the quotient obtained from Step 3 by 100.

POINTS TO REMEMBER
1. Item analysis is one way to improve test items.
2. Item analysis is the process of determining good test items.
3. The U-L method employing 27 percent is the common way to analyze items quantitatively.
4. Moderately difficult items are those with an acceptable index of difficulty and an acceptable index of discrimination.
5. Moderately difficult items tend to increase test validity and reliability.
6. A modified item analysis determines the percentage of items answered by examinees in both the pretest and the posttest.

T-TEST FOR PRETEST AND POSTTEST