Validity
EDUC 307, Chapter 3: What does this test tell me?

Validity as defined by Chase: "A test is valid to the extent that it helps educators make an appropriate inference about a specified quality of a student."
Validity as defined by Mertler: "... the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests."
Validity as defined by Linn and Gronlund: "Validity is an evaluation of the adequacy and appropriateness of the interpretations and uses of assessment results."

What is Validity?
Validity is the soundness of your interpretations and uses of students' assessment results.
A. The validity of a test must be stated in terms of a specific trait; assessments do not have generic validity.
B. Validity is not an either/or matter; tests have varying degrees of validity.
C. To demonstrate validity, test makers (publishers and teachers alike) should develop assessment procedures that ask students to perform acts as similar as possible to the skill about which educators are making a decision.

Nature of Validity
When using the term validity in relation to testing and assessment, keep these cautions in mind:
A. Validity applies to the ways we interpret and use assessment results, not to the assessment procedure itself. We sometimes say "the validity of a test" when it is more correct to say the validity of the interpretation and use to be made of the results.
B. Validity is a matter of degree; it does not exist on an all-or-none basis. Consequently, we should avoid thinking of assessment results as simply valid or invalid. Validity is best considered in terms of categories that specify degree, such as high, moderate, or low validity.
C. Validity is always specific to some particular use or interpretation. No assessment is valid for all purposes. For example, the results of an arithmetic test may have a high degree of validity for indicating computational skill, a low degree of validity for indicating arithmetical reasoning, a moderate degree of validity for predicting success in future mathematics courses, and no validity for predicting success in art or music.

Evidence of Validity
There are standard procedures for determining the level of validity of an assessment. Three types of validity:
1. Content validity: how precisely the test samples a specified body of knowledge.
2. Criterion-related validity: how closely the test's results correspond with a defined criterion of performance.
3. Construct validity: how closely the test relates to behavioral characteristics described by a psychological theory.

Content Validity
Content validity is the planned sampling of the content of a specified instructional program.
1. To be valid, a classroom test should reflect the instructional objectives of the unit in proportion to the emphasis each received in teaching.
2. Validity is determined by the extent to which the test content samples the content of instruction.
3. Content validity must be considered for every assessment device.

Criterion-Related Validity
Criterion-related validity is shown in the relationship of the outcomes of an assessment device to a performance criterion. This relationship is typically shown by correlation procedures.
A. In assessment, correlation is used to show how closely test scores correspond to the ranking of the skill the test is designed to predict.
1. Correlation is a statistical procedure that shows how closely students' scores on one measure correspond in rank with their scores on another (a computational sketch follows this list).
a. Correlation is represented by a correlation coefficient, a number that shows how closely the two sets of data correspond (.00 to 1.0 for positive correlations; .00 to -1.0 for negative correlations).
b. Coefficients of 1.0 or -1.0 are almost nonexistent; most run between .25 and .80.
c. .85 to 1.0 indicates high correspondence between the test and the performance.
d. .00 to .29 indicates low correspondence between the test and the performance.
e. With negative correlations, as test scores get higher, performance scores get lower.
f. Errors in prediction: predicted scores typically do not hit the actual ability of a student but are likely to fall a little above or below it.
g. Tests with correlation coefficients above .50 are better predictors.
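To make the correlation procedure concrete, here is a minimal sketch, not taken from the chapter, of how a Pearson correlation coefficient between test scores and a later performance criterion might be computed. The function name and the sample data are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(test_scores, criterion_scores):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    # Numerator: how the two sets of scores vary together.
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    # Denominator terms: the spread of each set of scores on its own.
    var_x = sum((x - mean_x) ** 2 for x in test_scores)
    var_y = sum((y - mean_y) ** 2 for y in criterion_scores)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: six students' scores on a predictor test and on
# the performance criterion the test is designed to predict.
test = [72, 85, 60, 90, 78, 66]
performance = [70, 80, 58, 88, 82, 64]

print(f"r = {pearson_r(test, performance):.2f}")
```

Read the printed r against the bands above: .85 to 1.0 suggests high correspondence between test and performance, .00 to .29 suggests low correspondence, and coefficients above .50 mark the more useful predictors.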
Construct Validation
Construct validation evaluates the correspondence between how a test assesses a trait and how a psychological theory (construct) says it should be assessed.

Finding the Validity of Published Tests
Two main sources: publishers' test manuals and the Mental Measurements Yearbook.
A. Test manuals contain reports on how tests were developed and the results when they were tried out.
B. The Mental Measurements Yearbook is a multi-volume set of reports on published tests of all types.

Applying Validity Information
Teachers must make practical use of information about validity to become more intelligent assessment designers and users.
A. Guidelines for evaluating test validity as reported in test manuals and journals:
1. Does the validation procedure fit the use to which the test will be put?
2. How well was the validation carried out?
3. Does the evidence reported for the validation support the use of the test?
4. Without a reasonable amount of validity, the assessment procedure is of very limited use to educators.

Consideration of Consequences
Assessments are intended to contribute to improved student learning. The question is, do they? And, if so, to what extent? What impact do assessments have on teaching? What are the possibly negative, unintended consequences of a particular use of assessment results?
Teachers have an excellent vantage point for considering the likely effects of assessments. First, they know the learning objectives that they are trying to help their students achieve. Second, they are quite familiar with the instructional experiences that the students have had. Third, they have an opportunity to observe students while they are working on an assessment task and to talk to students about their performances. This first-hand awareness of learning objectives, instructional experiences, and students can be brought to bear on an analysis of the likely effects of assessments by systematically considering questions such as the following:
1. Do the tasks match important learning objectives? "WYTIWYG" (what you test is what you get) has become a popular slogan. Although it is an oversimplification, it is a good reminder that assessments need to reflect major learning outcomes. Problem-solving skills and complex thinking skills requiring integration, evaluation, and synthesis of information are more likely to be fostered by assessments that require the application of such skills than by assessments that require students merely to repeat what the teacher has said or what is stated in the textbook.
2. Is there reason to believe that students study harder in preparation for the assessment? Motivating student effort is a potentially important consequence of tests and assessments. The chances of achieving this goal are improved if students have a clear understanding of what to expect on the assessment, know how the results will be used, and believe that the assessment will be fair.
3. Does the assessment artificially constrain the focus of students' study? If it is judged important to be sure that students can solve a particular type of mathematics problem, for example, then it is reasonable to focus an assessment on that type of problem. However, much is missed if such an approach is the only mode of assessment. In many cases the identification of the nature of the problem may be at least as important as facility with the application of a particular formula or algorithm. Assessments that focus only on the latter skills are not likely to facilitate the development of problem-identification skills.
4. Does the assessment encourage or discourage exploration and creative modes of expression? Although it is important for students to know what to expect on an assessment and to have a sense of how to prepare for it, care should be taken to avoid overly narrow and artificial constraints that would discourage students from exploring new ideas and concepts.

Factors Influencing Validity
A careful examination of test items and assessment tasks will indicate whether the test or assessment appears to measure the subject-matter content and the mental functions that the teacher is interested in assessing. The following factors can prevent the test items or assessment tasks from functioning as intended and thereby lower the validity of the interpretations drawn from the assessment results.
1. Unclear directions. Directions that do not clearly indicate to the student how to respond to the tasks and how to record the responses tend to reduce validity.
2. Reading vocabulary and sentence structure too difficult (construct-irrelevant variance). Vocabulary and sentence structure that are too complicated for the students taking the assessment result in the assessment measuring reading comprehension and aspects of intelligence, which will distort the meaning of the assessment results.
3. Ambiguity. Ambiguous statements in assessment tasks contribute to misinterpretation and confusion. Ambiguity sometimes confuses the better students more than it does the poor students.
4. Inadequate time limits (construct-irrelevant variance). Time limits that do not provide students with enough time to consider the tasks and provide thoughtful responses can reduce the validity of interpretations of the results. Rather than measuring what a student knows about a topic or is able to do given adequate time, the assessment may become a measure of the speed with which the student can respond. For some content (e.g., typing), speed may be important; however, most assessments of achievement should minimize the effects of speed on student performance.
5. Overemphasis on easy-to-assess aspects of the domain at the expense of important but hard-to-assess aspects. It is easy to develop test questions that assess factual recall and generally harder to develop ones that tap conceptual understanding or higher-order thinking processes such as the evaluation of competing positions or arguments.
Hence, it is important to guard against underrepresentation of tasks that get at the important but more difficult-to-assess aspects of achievement.
6. Test items inappropriate for the outcomes being measured. Attempting to measure understanding, thinking skills, and other complex types of achievement with test forms that are appropriate only for measuring factual knowledge will invalidate the results.
7. Poorly constructed test items. Test items that unintentionally provide clues to the answer tend to measure the students' alertness in detecting clues as well as their mastery of the skills or knowledge the test is intended to measure.
8. Test too short. A test is only a sample of the many questions that might be asked. If a test is too short to provide a representative sample of the performance we are interested in, its validity will suffer accordingly.
9. Improper arrangement of items. Test items are typically arranged in order of difficulty, with the easiest items first. Placing difficult items early in the test may cause students to spend too much time on them and prevent them from reaching items they could easily answer. Improper arrangement may also influence validity by having a detrimental effect on student motivation. This influence is likely to be strongest with young students.
10. Identifiable pattern of answers. Placing correct answers in some systematic pattern (e.g., T T F F T T, or A B C D A B C D) enables students to guess the answers to some items more easily, and this lowers validity.
In short, any defect in the construction of the test or assessment that prevents it from functioning as intended will invalidate the interpretations drawn from the results.

Evaluating Your Classroom Assessment Methods
Content Representativeness:
1. Does my assessment procedure emphasize what I taught?
2. Do my assessment tasks accurately represent the outcomes specified in my school's and state's curriculum frameworks?
3. Are my assessment tasks in line with current thinking about what should be taught and how it should be assessed?
4. Is the content of my assessment important and worth learning?
Thinking Processes:
5. Do the tasks on my assessment instrument require students to use important skills and processes?
6. Does my assessment instrument represent the kinds of thinking skills that my school's curriculum framework and state's standards view as important?
7. During the assessment, do students actually use the types of thinking I expect them to use?
8. Do I allow enough time for students to demonstrate the type of thinking I am trying to assess?
Consistency:
9. Is the pattern of results in the class consistent with what I expect based on my other assessments of the students?
10. Do I make the assessment tasks too difficult or too easy for my students?
Reliability and Objectivity:
Reliability refers to the consistency of assessment results. Objectivity is the degree to which two or more qualified evaluators will agree on what quality rating or score to assign to a student's performance (a computational sketch of one agreement index follows question 12).
11. Do I use a scoring guide for obtaining quality ratings or scores for students' performance on the assessment?
12. Is my assessment instrument long enough to be a representative sample of the types of learning outcomes I am assessing?
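As a concrete illustration of objectivity, here is a minimal sketch assuming two hypothetical raters who have independently scored the same eight performances on a 1-4 rubric. It computes simple percent agreement; the chapter does not name a particular index, and chance-corrected statistics such as Cohen's kappa are also commonly used.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of performances given the identical rating by both raters."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Hypothetical 1-4 rubric scores assigned independently by two
# qualified evaluators to the same eight student performances.
rater_a = [3, 4, 2, 3, 1, 4, 3, 2]
rater_b = [3, 4, 2, 2, 1, 4, 3, 3]

print(f"Agreement = {percent_agreement(rater_a, rater_b):.0%}")  # 75% here
```

Higher agreement indicates greater objectivity; a shared scoring guide (question 11) is the usual way to raise it.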
Fairness to Different Types of Students:
13. Do I word the problems or tasks on my assessment so that students with different ethnic and socioeconomic backgrounds will interpret them in appropriate ways?
14. Do I modify the wording or the administrative conditions of the assessment tasks to accommodate students with disabilities or special learning problems?
15. Do the pictures, stories, verbal statements, or other aspects of my assessment procedure perpetuate racial, ethnic, or gender stereotypes?
Economy, Efficiency, Practicality, and Instructional Procedures:
16. Is the assessment relatively easy for me to construct and not too cumbersome to use to evaluate students?
17. Would the time needed to use this assessment be better spent directly teaching my students instead?
18. Does my assessment represent the best use of my time?
Multiple Assessment Usage:
19. Do I use one assessment result in conjunction with other assessment results?
Positive Consequences for Learning:
20. Do my assessments result in both the students and myself ...