EDUC 7705

I. Course Number: EDUC 7705
Course Title: Assessment and Evaluation in the Content Area
College: Bagwell College of Education
Semester:

II. Instructor:

III. Class Meetings:

IV. Required Texts:
Kubiszyn, T., & Borich, G. (2010). Educational testing and measurement: Classroom application and practice (9th ed.). Hoboken, NJ: Wiley.

V. Catalog Course Description:
This course focuses on planning, constructing, analyzing, and applying educational assessment to document graduate teacher candidate performance for instructional and accountability purposes. Specific topics include guidelines for developing traditional assessment questions, including the use of multiple-choice questions to measure critical thinking and problem-solving skills; guidelines and rubrics for developing and scoring performance, writing, and portfolio assessments; assessing affective outcomes; describing, analyzing, and refining data to improve assessment; and applying and interpreting standardized norm-referenced and criterion-referenced measures. Additionally, attention will be paid to multicultural assessment procedures and to concerns relevant to external assessment programs.

VI. Professional Portfolio Narrative:
A required element in each portfolio for the Graduate Program is the portfolio narrative. Its purpose is to ensure that every candidate reflects on each of the proficiencies on the CPI in light of the evidence he or she has selected for the portfolio. Your portfolio must include a narrative of descriptive, analytic, and reflective writing in which you reflect on each proficiency and make the case that the evidence you have selected supports that proficiency, using the Portfolio Narrative Rubric as a guide. The narrative should be comprehensive, documenting research-based best practices.
Each graduate candidate is required to compile both an online portfolio of evidence documenting his or her proficiencies as defined by the graduate CPI (the M.Ed. in Adolescent Education Capstone Portfolio) and a portfolio of assignments used to assess the program's effectiveness (the M.Ed. in Adolescent Education Program Portfolio). Your Test Construction Project (Assignment #1) is a required element from this course and must be added as evidence to your Program Portfolio in Chalk and Wire under Chapter #3: Planning, Implementing and Assessing for Learning. Additionally, you are required to complete, and document in your Program Portfolio, at least one diverse field experience each semester. You will likely also wish to add all of these assignments to your personal Capstone Portfolio.

An additional required element in each capstone portfolio for the Graduate Program is a description, analysis, and reflection for each piece of evidence you place under each proficiency. In Chalk and Wire, this means identifying the content and role of the evidence and then describing its importance, using fields such as the following:

Date: EVIDENCE TITLE: To what time period, approximately or exactly, does this presentation refer?

Context: EVIDENCE TITLE: "This/these artifact(s) were developed to…" Describe in one or two sentences the conditions under which the artifact(s) were created (part of a course requirement, a field placement requirement, or a purpose related to licensure).

Role: EVIDENCE TITLE: What was your role in the event(s) described? Were you acting as part of a collaborative team? Alone? As author? Editor? Researcher? Instructor? This field lets you indicate your contribution to the overall development of the artifact(s) presented. It is an ethical requirement if you collaborated with others who also made contributions.

Reflection/Importance: EVIDENCE TITLE: This is by far the most significant information. You should concisely and clearly explain: What is happening in this presentation? How does this artifact (or artifacts), used at that time, clearly illustrate your capacity to perform the standard you are presenting? What next? Upon reflection, what has this experience suggested as "next moves" for you as a developing professional?

Here's a simple example of the phrasing you might use for this section:

"I have included/associated/linked this….NAME OF ARTIFACT(S) with NAME OF A DOMAIN AND COMPONENT. I feel….NAME OF THE ARTIFACT(S) belongs under this standard because….PROVIDE RATIONALE IN TWO TO FOUR SENTENCES. This artifact(s) demonstrate(s) my ability/position/emerging skill/competence with regard to NAME A STANDARD/COMPONENT in that PROVIDE AN EXPLANATION OF HOW THE EXPERIENCE YOU HAVE HAD CLEARLY SHOWS YOUR CONFIDENCE/SKILL/CAPACITY RELATIVE TO THE DOMAIN OR COMPONENT. Given my experience, I am determined/intend/will/plan….IF APPROPRIATE, DESCRIBE SPECIFIC ACTIONS YOU WILL TAKE TO FURTHER DEVELOP YOUR PROFESSIONAL SKILLS IN THIS AREA."

VII. Purpose and Rationale:

KENNESAW STATE UNIVERSITY'S CONCEPTUAL FRAMEWORK: Collaborative development of expertise in teaching and learning

The Professional Teacher Education Unit (PTEU) at Kennesaw State University is committed to developing expertise among candidates in initial and advanced programs as teachers and leaders who possess the capability, intent, and expertise to facilitate high levels of learning in all of their students through effective, research-based practices in classroom instruction, and who enhance the structures that support all learning. To that end, the PTEU fosters the development of candidates as they progress through stages of growth from novice to proficient to expert and leader. Within the PTEU conceptual framework, expertise is viewed as a process of continued development, not an end-state.
To be effective, teachers and educational leaders must embrace the notion that teaching and learning are entwined and that only through the implementation of validated practices can all students construct meaning and reach high levels of learning. In that way, candidates are facilitators of the teaching and learning process. Finally, the PTEU recognizes, values, and demonstrates collaborative practices across the college and university and extends collaboration to the community at large. Through this collaboration with professionals in the university, the public and private schools, parents, and other professional partners, the PTEU meets the ultimate goal of assisting Georgia schools in bringing all students to high levels of learning.

Knowledge Base: Teacher development is generally recognized as a continuum that includes four phases: preservice, induction, in-service, and renewal (Odell, Huling, & Sweeny, 2000). Just as Sternberg (1996) believes that the concept of expertise is central to analyzing the teaching-learning process, the teacher education faculty at KSU believe that the concept of expertise is central to preparing effective classroom teachers and teacher leaders. Researchers describe how, during the continuum phases, teachers progress from being novices learning to survive in classrooms toward becoming experts who have achieved elegance in their teaching. We, like Sternberg (1998), believe that expertise is not an end-state but a process of continued development.

Use of Technology: Technology Standards for Educators are required by the Professional Standards Commission. Telecommunication and information technologies will be integrated throughout the master teacher preparation program, and all candidates must be able to use technology to improve student learning and meet the Georgia Technology Standards for Educators. During the courses, candidates will be provided with opportunities to explore and use instructional media.
They will master productivity tools, such as multimedia facilities, local networks, and the Internet, and will feel confident designing multimedia instructional materials and creating web resources.

Field-Based Activities: While completing your graduate program at Kennesaw State University, you are required to be involved in a variety of leadership and school-based activities directed at the improvement of teaching and learning. Appropriate activities may include, but are not limited to, attending and presenting at professional conferences, actively serving on or chairing school-based committees, attending PTA/school board meetings, leading or presenting professional development activities at the school or district level, and participating in education-related community events. As you continue your educational experiences, you are encouraged to explore every opportunity to learn by doing.

VIII. Goals and Objectives: As a result of satisfactorily fulfilling the requirements of this course, the candidate will accomplish the objectives listed below. Alignment links are given as KSU M.Ed. CPI; NBPTS Link; PSC/NCATE Link.

General Objective #1: The graduate teacher candidate will demonstrate an understanding of the components of assessment.
Alignment: Outcome 2; Core 3; Professional and pedagogical skills and knowledge
Specific Objectives: The graduate teacher candidate will
a. be able to recall basic definitions of assessment, measurement, evaluation, and test.
b. differentiate between criterion-referenced measurement and norm-referenced measurement.
c. construct adequate objectives, both general and specific.
Assessment: Formal Examination; Applied Assignment #1 (individually shared and collaboratively shared)

General Objective #2: The graduate teacher candidate will construct a traditional assessment instrument with particular attention paid to using sound guidelines for writing true/false, multiple-choice, and higher-order multiple-choice questions.
Alignment: Outcome 2; Core 3; Professional and pedagogical skills and knowledge
Specific Objectives: The graduate teacher candidate will
a. demonstrate the use and overall objective of a table of specifications.
b. apply the categories of the Taxonomy of Educational Objectives.
c. analyze the criteria for evaluating measurement instruments.
d. recall the guidelines for writing test items: supply, true-false, essay, multiple-choice, and matching items.
e. apply specific suggestions for writing test items: supply, true-false, essay, multiple-choice, and matching items.
f. calculate item discrimination indexes for the items of a test.
g. interpret item analysis data for distractors of multiple-choice items in terms of the direction and extent to which a distractor discriminates.
h. calculate item difficulty indexes for the items of a test.
i. interpret item difficulty indexes.
j. make appropriate suggestions for the revision or reuse of an item on the basis of item analysis data.
Assessment: Test Construction Project; Applied Homework Assignments #2, #3

General Objective #3: The graduate teacher candidate will be able to recognize and use fundamental statistical procedures.
Alignment: Outcome 1; Core 3; Professional and pedagogical skills and knowledge
Specific Objectives: The graduate teacher candidate will be able to
a. recognize the characteristics of a frequency distribution and frequency polygons.
b. recognize the characteristics of percentiles.
d. interpret percentiles.
e. recognize the characteristics of percentile ranks.
f. interpret percentile ranks.
g. calculate percentile ranks.
h. find ranks for a given set of scores.
j. recognize the characteristics of a mean.
l. recognize the relationship between the mean and the median for distributions of different shapes.
m. recognize the relationship of the range to the standard deviation.
o. recognize the characteristics of the standard deviation and its square, the variance.
p. interpret standard deviations as measures of dispersion.
q. recognize the characteristics of the normal curve.
r. recognize the characteristics of derived scores.
s. recognize the characteristics of standard scores.
u. interpret z-scores.
v. calculate standard scores that have a mean of 50 and a standard deviation of 10.
w. interpret standard scores that have a mean of 50 and a standard deviation of 10.
x. recognize the characteristics of a correlation coefficient.
y. interpret a correlation coefficient.
Assessment: Formal Examination; Homework Applications #4–5; Presentation

General Objective #4: The graduate teacher candidate will be able to identify and use various methods of estimating instrument reliability and the factors that influence it.
Specific Objectives: The graduate teacher candidate will be able to
a. recognize the role of the concept of reliability in evaluating tests.
b. recognize appropriate interpretations of reliability coefficients.
c. recognize the relationships among observed scores, true scores, and errors of measurement.
Assessment: Homework Applications #5–6; Formal Examination

General Objective #5: The graduate teacher candidate will be able to identify and use various approaches to establishing the validity of measuring devices.
Specific Objectives: The graduate teacher candidate will be able to
a. recognize the role of the concept of validity in evaluating tests.
b. recognize the relationships among reliability, validity, and item analysis.
c. select an appropriate procedure to obtain evidence of a specific type of validity.
Assessment: Formal Examination; Homework #6

General Objective #6: The graduate teacher candidate will demonstrate critical reflection on the uses and misuses of today's high-stakes tests and will be able to communicate this knowledge to others, including colleagues, parents, and graduate teacher candidates.
Assessment: Formal Examination; Presentation Assignment

General Objective #7: The graduate teacher candidate will plan and construct a valid performance assessment rubric for his or her capstone applied project.
Assessment: Assignment #1

General Objective #8: The graduate teacher candidate will recognize, detect, and control measurement bias in testing and become familiar with techniques to ensure multicultural validity.
Assessment: Formal Examination; Presentation Assignment

General Objective #9: The graduate teacher candidate will write descriptively, analytically, and reflectively.
Assessment: All written assignments

General Objective #10: The graduate teacher candidate will work collaboratively and provide feedback to peers.
Assessment: Professionalism Evaluation

General Objective #11: The graduate teacher candidate will follow institutional policies and professional guidelines of academic honesty and exhibit professional behavior in interactions with professors and colleagues.
Assessment: Peer and Professor Feedback

Alignment links listed for the remaining objectives (KSU M.Ed. CPI; NBPTS; PSC/NCATE):
Outcome 3, Outcome 1; Core 5, 4; Dispositions, Professional and pedagogical skills and knowledge
Outcome 1, 2; Core 3; Professional and pedagogical skills and knowledge
Outcome 2; Core 1, 4; Professional and pedagogical knowledge and skills, Dispositions
Outcome 3; Core propositions 4 and 5; Dispositions
Outcome 3; Core propositions 4 and 5; Professional and pedagogical knowledge and skills, Dispositions
Outcome 3; Core propositions 4 and 5; Dispositions

IX. Requirements/Assignments:

A. Assignment #1: Evaluate and Improve Your Classroom Assessment Instrument (used in the graduate candidate's teaching area).
An outline, instructions, an example, and a scoring rubric for the test construction project are available at the end of this syllabus and in WebCT. (30 points)

B. Assignment #2: As a member of a four-person team, evaluate and critique a CCDS benchmark assessment used in your content area. An outline, instructions, and examples for the project are available at the end of this syllabus and in WebCT. (30 points)

C. Assignment #3: Plan and Deliver a Group Presentation. Each teacher candidate will be assigned two partners. The team will develop a presentation comparing each candidate's school and the high-stakes, large-scale testing present in those schools. Specific requirements and a grading rubric are available at the end of this syllabus and in WebCT. Time will be made available during class for this activity; however, students may need to meet outside class to complete preparations for the presentation. (20 points)

D. Assignment #4: Complete Practice Exercises and Questions: “Truth in Grading” (handout, 5 points) and “Education Watch Georgia,” available at http://www2.edtrust.org/edtrust/summaries2006/states.html (questions on pages 2, 3, 6, and 7; 5 points). (10 points total)

E. Professionalism (Assignment #4 in WebCT): Behaviors that indicate professional skill may be demonstrated in a graduate teacher candidate's approach to participating in and completing the requirements for any course, including this one. Professional behavior will be monitored in this course. Should concerns arise regarding an individual teacher candidate, the instructor of this course will communicate these concerns to the candidate and to the program coordinator of the candidate's major program in order to draw attention to the deficiencies so that they may be remedied. The indicators of professionalism that will be monitored are addressed in the questions below. (10 points)

Does the teacher candidate:
Model high standards and expectations for him- or herself?
Display a commitment to the profession of helping students learn?
Enjoy learning and show enthusiasm for working with students to facilitate their learning?
Regularly reflect on and assess his or her performance and effectiveness for self-improvement?
Learn from experiences and show improvement over time?
Manage interpersonal relationships effectively?
Demonstrate courtesy, respect, and civility in interactions with others?
Work collaboratively with professional colleagues and faculty?
Accept responsibility for actions and non-actions, placing the locus of control on him- or herself rather than shifting blame or claiming inability to control outside factors?
Maintain appropriate attire and appearance?
Promote and model standards of academic honesty?

Habitual absences, tardiness, and leaving class early are issues of professionalism. In case of emergencies, please email or call to inform me of your conflict or emergency. Finally, please turn off all cell phones and pagers during class. Disturbances from these devices are disrespectful, disrupt the flow of ideas during discussions, and are nuisances that can easily be avoided. Seldom is there a reason to speak on the phone that could not wait until the end of class.

One-half of your professionalism points will be determined by your peers as they evaluate your contributions to group work. The professionalism rubric is available as Assignment #4 in WebCT.

All written assignments should be typed in 12-point font, double-spaced, on white 8½ × 11 paper. They should represent quality, college-level work, including correct spelling, grammar, and punctuation, and should follow APA (5th ed.) formatting. (See the WebCT Course Menu for APA and grammar style resources.)

X. Evaluation and Grading:
90 – 100 = A
80 – 89 = B
70 – 79 = C
60 – 69 = D
Below 60 = F

XI. Policies

Diversity: A variety of materials and instructional strategies will be employed to meet the needs of the different learning styles of diverse learners in class. Candidates will gain knowledge as well as an understanding of differentiated strategies and curricula for providing effective instruction and assessment within multicultural classrooms. One element of course work is raising candidate awareness of critical multicultural issues. A second element is having candidates explore how multiple attributes of multicultural populations influence decisions in employing specific methods and materials for every graduate teacher candidate. Among these attributes are age, disability, ethnicity, family structure, gender, geographic region, giftedness, language, race, religion, sexual orientation, and socioeconomic status. An emphasis on cognitive style differences provides a background for the consideration of cultural context.

Kennesaw State University provides program accessibility and accommodations for persons defined as disabled under Section 504 of the Rehabilitation Act of 1973 or the Americans with Disabilities Act of 1990. A number of services are available to support students with disabilities within their academic program. To make arrangements for special services, students must visit the Office of Disabled Student Support Services (ext. 6443) and develop an individual assistance plan. In some cases, certification of disability is required. Please be aware that there are other support and mentor groups on the Kennesaw State University campus that address each of the multicultural variables outlined above.

Academic Honesty: KSU expects graduate students to pursue their academic programs in an ethical, professional manner. Any work that students present in fulfillment of program or course requirements should represent their own efforts, achieved without giving or receiving any unauthorized assistance.
Any student who is found to have violated these expectations will be subject to disciplinary action.

Course Outline

Columns in the original table: Week/Date; Guiding Questions; Pre-Reading Chapter (Text: Gronlund or Popham); Pre-Reading (Articles and Handouts); Important To-Do/Turn-In Dates.

June 9
Why is assessment such a big deal today? How does your school use data to focus instruction?
Chapters: Chapter #1: Introduction; Chapter #3: The Purpose of Testing
Readings, activities, and due items: Trimble, S., Gay, A., & Matthews, J. (2005). Using test score data to focus instruction. Middle School Journal, 25-31. Using this handout as a reference and benchmark, compare and contrast the assessment activities in your school with those of Camden County teachers. Class Expectations and Assignments; Chalk and Wire.

June 11
What is the relationship between assessment, instruction, and learning? How do you measure learning?
Chapters: Chapter #5: Measuring Learning; Chapter #2: High Stakes Testing
Readings, activities, and due items: Locate, print, and read http://www.fairtest.org/NCLBAfter-Six-Years and http://www.edaccountability.org/AssessmentExecSumm061207.pdf. Writing Behavioral Objectives; Taxonomy Verb List, retrieved from http://www.iloveteaching.com/steacher/verbs/taxonomy.htm. High Stakes Testing……. GPS: http://www.georgiastandards.org/

June 16
What is my link between the new standards, benchmarks, my teaching, and my assessment? Evidence: GPSs, Benchmarks, CRCT, EOCT…and all that jazz. Is there any research on the effectiveness of benchmarking?
Chapters: Chapter #4: Norm- and Criterion-Referenced Tests and Content Validity
Readings, activities, and due items: Hand-out: Black, P., Harrison, C., Lee, C., Marshall, B., & Wiliam, D. (2004). Working inside the black box: Assessment for learning in the classroom. Phi Delta Kappan, 85(1), 9-21. Available at http://www.pdkintl.org/kappan/kbla9810.htm. Assignment 1: Part 1.

June 18
How hard can writing test items be?

June 23
What is an instructional rubric?

June 25
Should grades be “sacred” if they are not valid or reliable?

Chapters (June 18–25): Chapter #6: Writing Objective Test Items; Chapter #7: Writing Essay and Higher Order; Chapter #10: Administering, Analyzing and Improving the Test or Assessment; Chapter 11: Grading and Reporting
Readings, activities, and due items (June 18–25): Make sure you bring your unit test. Hand-out: Andrade, H. G. (2003). Teaching with rubrics: The good, the bad, and the ugly. College Teaching, 54(1), 27-30. Inferential Statistics. Assignment 1 DUE. The Case Study of Sarah Hanover.

June 30
Chapters: Chapter 12: Summarizing Data and Measures of Central Tendency; Chapter 13: Variability
Readings, activities, and due items: Video: Models of Distributions, Spread and the Bell Curve. Assignment 4a: Truth in Grading Assignment.

July 2
Chapters: Chapter 15: Validity; Chapter 16: Reliability; Chapter 17: Accuracy and Error; Chapter 18: Standardized Tests
Readings, activities, and due items: Movie or video on YouTube: http://www.youtube.com/watch?v=1brINlIxvKU. Video: Confidence Intervals. Consistency and Use. Assignment #2: Benchmark Rough Draft, Recommendations 1–2. Collaborative Group Time.

July 7
Truth in Testing?
Collaborative Group Time

July 9
Expanding assessments for classroom teachers.
Collaborative Group Time

July 16
Chapters: Chapter 19: Types of Standardized Tests; Chapter 20: Assessing Children
Readings, activities, and due items: Video: Misleading, Distorting, and Lying. Assignment 4b: The Education Trust (2006). Education Watch: Georgia. Retrieved 12/19/06 from http://www2.edtrust.org/edtrust/summaries2006/Georgia.pdf. Assignment #2: Benchmark Presentations. Assignment #3.

July 18
Assignment #5; Presentations: Assignment #3

7705-Test Construction Project

XIV. References/Bibliography

Airasian, P. W. (1997). Classroom assessment (3rd ed.). New York: McGraw-Hill.
Banks, J. A., & Banks, C. A. M. (Eds.). (1995). Handbook of research on multicultural education. New York: Macmillan.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart and Winston.
Dana, R. H. (1993). Multicultural assessment perspectives for professional psychology. Boston: Allyn and Bacon.
Gall, M. D., Gall, J. P., & Borg, W. R. (2003). Educational research. Boston: Allyn and Bacon.
Gronlund, N. E., & Linn, R. L. (1995). Measurement and evaluation in teaching (7th ed.). New York: Macmillan. Chapter 6, “Constructing objective test items.”
Haney, W. (1989). Testing reasoning and reasoning about testing. Review of Educational Research, 54(9), 557-654.
Haney, W. M., Madaus, G. F., & Lyons, R. (1993). The fractured marketplace for standardized testing. Boston: Kluwer.
Hibbard, K. M., et al. (1996). A teacher's guide to performance-based learning and assessment. Alexandria, VA: Association for Supervision and Curriculum Development.
Katz, M. (1961). Improving classroom tests by means of item analysis. Clearing House, 35, 265-269.
Kohn, A. (2000). The case against standardized testing: Raising the scores, ruining the schools. Portsmouth, NH: Heinemann.
Linn, R. L., & Gronlund, N. E. (1995). Measurement and evaluation in teaching (7th ed.). New York: Macmillan.
Messick, S. (1981). Evidence and ethics in the evaluation of tests. Educational Researcher, 10, 9-20.
Popham, W. J. (2002). Classroom assessment: What teachers need to know (3rd ed.). Boston: Allyn and Bacon.
Stiggins, R. J. (2001). Student-involved classroom assessment (3rd ed.). Upper Saddle River, NJ: Prentice-Hall.
Wiggins, G. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco, CA: Jossey-Bass.

OUTLINE OF REQUIREMENTS FOR TEST CONSTRUCTION PROJECT

Part I. Development of a Test Plan
1. Locate your lesson plans and unit test from a course you have taught and assessed in the past. The unit should ideally have lasted 2–3 weeks and involved a major test.
2. Provide a brief summary of the course, the level of the students involved, the place of the course in the curriculum, etc.
3. Provide the GPSs (or ??) this unit addressed. These are your general objectives.
4. Using your lesson plans, rewrite the activities the students did each day as behavioral objectives. You should have at least two behavioral objectives for each day.
The requirements for behavioral objectives will be discussed in class and in your text. At the end of each behavioral objective, note the number of minutes the students spent doing the activity. These are your specific objectives.
5. Complete a table of specifications (TOS) showing the general standards, the specific objectives, and the process dimension. You must also note how many classroom minutes were spent on each objective. Set up your table of specifications as in the example below.
6. Finally, total the minutes spent on the unit. Then calculate the percentage of this total spent on each objective.

EXAMPLE:
Columns: General Standards (GPS or QCC) | Specific Objectives in Behavioral Terms | Remember | Understand | Apply | Analyze | Evaluate | Create | Total Time
Cells include time in minutes, the percentage of time spent on each objective, and the corresponding test items (section and number from the test, once it is done).

1. Realize the importance of ecological balance.
   1a. After independently reading the text and discussing examples in class (5 minutes), all students will vocally describe three logical effects on an ecosystem if an organism is removed. Total time: 5 min. (1.4%)
   1b. After independently reading the text and discussing logical outcomes in class, students will be able to predict in writing the two effects of adding a foreign organism to an ecosystem. Total time: 10 min. (2.8%)

DUE MONDAY, JUNE 16: PEER FEEDBACK

PART II

B. Review and Construction of Test Items
1. Review your test items against the table of specifications you have prepared. Your questions must proportionally test what and how you taught. That is, if you spent 30% of your instructional time on a certain concept, approximately 30% of the test's points should be devoted to that concept. Your question types should also match how you taught the concept: if your instruction and activities were devoted to memorizing material, the test items should also test this process.
2. Review the quality of your items. Rewrite as needed and develop a test that you feel would be a worthwhile measuring instrument.
3. Any type of item may be used (in accordance with your TOS), but try to include as great a variety of item types as feasible. At a minimum, include the item types below. Each is checked (YES/NO) against the listed specifications, and all items must follow the guidelines specified in the text, notes, and handouts.
   A. At least 4 completion items, with directions and points per item. Specifications: items (4); directions (clear and complete) and points for each item.
   B. At least 3 multiple-choice items, with directions and points per item. Specifications: items (3); directions (clear and complete) and points for each item.
   C. At least 3 true-false items, with directions and points per item. Specifications: items (3); directions (clear and complete) and points for each item.
   D. One set of 4–6 matching items, with directions and points per item. Specifications: item set; directions (clear and complete) and points for the item set.
   E. 2 multiple-choice questions that involve critical thinking and stimulus material, with directions and points per item. Specifications: items (2); stimulus material; directions (clear and complete).
   F. One restricted-response essay item, with directions, points, and an analytic scoring rubric. Specifications: essay item; directions (clear and complete); analytic scoring rubric content; rubric design.
   G. One extended-response essay item, with directions, points, and a holistic scoring rubric. Specifications: essay item; directions (clear and complete); holistic scoring rubric content; scoring rubric design.
4. Correct answers are to be indicated for all items.

C. Test Analysis and Using the Test Results to Support Learning
Describe (do not carry out) the procedures or techniques you might use to analyze the results of this test, considering the following categories. (Remember, you do not have to actually administer the test. Be creative and idealistic, and assume that time and money are of no concern.)
a. Item analysis (What type?)
b. Descriptive statistics (How would you go about describing student performance on your test?)
c. Reliability and validity (What type? How determined?)
Finally, briefly describe how and when you will return this test in a manner that supports student learning.

RUBRIC FOR COURSE 7705 ASSIGNMENT #1 AND PROGRAM ASSESSMENT #3: PLAN, IMPLEMENT AND ASSESS AN INSTRUCTIONAL UNIT
Performance levels: Level 1 (Poor), Level 2 (Unsatisfactory), Level 3 (Satisfactory), Level 4 (Exceptional)

Criterion: Description of Students and Curriculum (10%)
Level 1: Little or no description of the targeted students or curriculum is offered. (0–1.0 points)
Level 2: Students are identified in the broadest terms (grade). The curriculum is described, but little connection is offered between the course curriculum and the specific unit being taught. (1.5–2.1 points)
Level 3: Students are described by grade and two subgroups. Both the course curriculum and the unit curriculum are described, and connections are made between them. (2.2–2.5 points)
Level 4: Students are described by grade and by subgroups, including gender, achievement level, ethnicity, and pertinent groups identified with special classroom needs. The general curriculum is described, and connections are made between the course curriculum and the unit curriculum. Special attention is paid to the curricula and to the exceptional classroom needs of the students. (2.7–3 points)

Criterion: Alignment of State Standards with Classroom Objectives (10%)
Level 1: Little or no alignment of state standards and classroom objectives is offered. (0–1.0 points)
Level 2: Either the state standards or the classroom objectives (in behavioral terms) are not adequately described, or both are included but appear incomplete. (1.5–2.1 points)
Level 3: Both the complete state standards and the classroom objectives (in behavioral terms) are adequately described. There is a clear alignment between the processes described in the standards and the processes of the objectives. (2.2–2.6 points)
Level 4: Both the complete state standards and the classroom objectives (in behavioral terms) are adequately described. There is a clear alignment between the processes described in the standards and the processes of the objectives. Additionally, there is some evidence that differentiation of objectives based on student needs occurred. (2.7–3 points)

Criterion: Completion of Table of Specifications (TOS) with Specific Objectives as Behavioral Objectives (20%)
Level 1: No table of specifications, or only a very basic one, was completed. (0–2.0 points)
Level 2: The table of specifications was completed but included only limited or incorrect indicators for the processes involved and the time devoted to each objective. (3.0–4.2 points)
Level 3: The table of specifications was completed and includes 80% of the indicators for the processes involved and the time devoted to each objective. (4.4–5.2 points)
Level 4: The table of specifications was completed and includes 100% of the indicators for the processes involved and the time devoted to each objective. If appropriate, there is evidence that some differentiation of instructional time based on student needs occurred. (5.4–6 points)

Criterion: Alignment of TOS and Test Content (Type of Item and Instructional/Test Time Match: Am I Testing What I Taught, How I Taught It?) (20%)
Level 1: No alignment, or only a very limited one, between test content and the TOS is attempted. (0–2.0 points)
Level 2: Either the alignment between the processes involved in answering items and the processes involved in instruction, or the alignment between instructional time and test time, is missing; or both are attempted but incomplete. (3.0–4.2 points)
Level 3: Both an alignment between the processes involved in answering items and the processes involved in classroom instruction and an alignment between classroom time and the number of test items are displayed for 80% of the rows. (4.4–5.2 points)
Level 4: Both an alignment between the processes involved in answering items and the processes involved in classroom instruction and an alignment between classroom time and the number of test items are displayed for all rows. There is differentiation of some test items if differentiated instruction took place. (5.4–6 points)

Criterion: Quality and Quantity of Items (30%)
Level 1: Few if any items are attempted. (0–3.0 points)
Level 2: Some test items are attempted. However, the question types do not match the minimal requirements of the assignment, or, if the minimal number of item types is written, more than 20% of the items violate good writing guidelines for the type of question. Not all rubrics or item answers are included. (4.5–6.3 points)
Level 3: The minimal number of required question types is attempted. Eighty percent of the questions adhere to the writing guidelines for the type of question presented. All necessary rubrics and answers are included. (6.6–7.8 points)
Level 4: The number of questions and types exceeds the minimal requirements, and 90% of the questions adhere to the writing guidelines for the type of question. Answer keys and rubrics are clear and concise. If appropriate, some questions clearly involve differentiation based on student choices or needs. (8.1–9 points)

Criterion: Adequate Methods for Assessing Reliability, Validity and Reporting Correct Answers in a Timely Manner (10%)
Level 1: Few or no methods for assessing and reporting results are presented. (0–1.0 points)
Level 2: An incomplete or incorrect description of the process for analyzing the reliability, validity, and item analysis of test items is presented. A limited description of the method for reporting scores is presented, and no provision is present that turns the test into an instructional tool.
Level 3: Two of the three descriptions for analyzing items, reliability, and validity are accurately presented. An accurate method for calculating test scores is available. The corrected test information is returned to the students soon after the test.
Level 4: All of the descriptions for analyzing items, reliability, and validity are accurately presented. An accurate method for calculating test scores is complete.
The method for returning corrected test information to the students immediately following the test is described. 2.2 2.4 2.6 points 2.7 2.8 3 points 1.5 1.7 2.1 points 24 7705-Test Construction Project 25 7705-Test Construction Project Assignment #2: Benchmark Assessment Project The Cobb County School District (CCDS) is in the process of developing and implementing benchmark 9-week assessments. Personnel from CCCD have been extremely open to scrutiny of these benchmarks and have requested our feedback. To allow this, collaborative groups (of three to four members) of graduate students will be assigned a content and grade level. Utilizing the recommendations from the National Center for Research on Evaluation, Standards, and Student Testing (CRESST; available at http://www.cse.ucla.edu/products/reports.asp ), each group will review their assigned benchmark assessments through the lens of the Center’s six conditions or recommendations. Groups will present their findings in the form of a paper and/or presentation to administrators from the CCSD. Overview: CRESST Report 723 suggests a valid benchmark system must meet the following conditions: 1. The purposes of the assessment are clearly defined. 2. The domain to be assessed is clearly specified. 3. Alignment: there is credible evidence on the match between assessments and domain specifications. 4. Item development and selection procedures as well as administration and scoring procedures are accurately documented. 5. (Omitted) 6. Validation: evidence is assembled to support the intended interpretations and uses of the assessments. (p. 2) CITATION: Recommendations for Building a Valid Benchmark Assessment System: Interim Report to the Jackson Public Schools CRESST Report 723 David Niemi, Julia Vallone, Jia Wang, and Noelle Griffin CRESST/University of California, Los Angeles,July, 2007 The following pages flush out the six recommendations above, utilizing citin text from Report 723. 
Each group will carefully evaluate its assigned benchmark test based on the criteria below. You will also have access to PICASSO, CCSD's online curriculum and instruction system. Please note the following request: due to the sensitive nature of some of the materials, please do not share your access or resources from the site outside your class and professional duties. Absolutely no copies of PICASSO materials or benchmark tests are to be used for any activities other than this assignment.

Structure your report/presentation around the following major queries:

Recommendation #1: Assessment Purposes
CRESST Report 723 states:
High quality assessment is not possible without clearly identifying the purposes of the assessments. Assessments have to be judged against their intended uses. There is no absolute criterion that can be used to judge assessments and assessment systems that do not have clearly spelled out purposes. It is not possible to say, for example, that a given test is good for any and all intents and purposes; it is only possible to say, based on evidence, what purposes the test is valid for. Furthermore, professional assessment standards require that assessments be validated for all their intended uses (e.g., AERA, APA, NCME, CRESST, National Research Council). A clear statement of assessment purposes also provides essential guidance for test and assessment item developers. Different purposes may require different content coverage, different types of items, etc. Thus it is critical to identify with as much precision as possible how assessment information is to be used, and to validate the assessments for those uses. (pp. 2-3)

Questions to Answer:
1. What is the assessment purpose for these benchmark assessments?
2. Where is this stated?
3. How might these purposes affect the items chosen for the benchmark?

Recommendation #2: Domain Specification
Models of learning, cognition, and expert performance in the domain should be used to develop assessment specifications.
CRESST Report 723 states:
To insure that assessments reflect the content domain to be assessed, assessment design and validation should be preceded by a cognitive analysis of the domain or by an analysis of the cognitive demands of the performances to be assessed. To analyze the "cognitive demands" of a performance means to specify the knowledge and mental skills required to complete that performance successfully. A recent report by CRESST (Herman & Baker, 2005) suggested that analysis of the domain should at least take into account, in addition to the specific content of individual state standards, both the intellectual demands of the domain and the "big ideas" or key principles underlying the content standards. An analysis of "key principles" obviously presupposes a theoretical understanding of the content area under consideration (i.e., language arts, math, social studies). Cognitive demands can be viewed through a number of models/categorization systems. In the above-referenced article, Baker and Herman propose a framework based on the work of Webb (1997), which identifies four general categories of cognitive demands: recall, conceptual understanding, problem solving/schematic thinking, and strategic thinking/transfer. For the tests to have any additional valid uses by teachers, schools, or administrators, these "big ideas" need to be considered in test design and validation.

Some issues to consider are:
Not all objectives are equally important, and this should be taken into account in benchmark test design. Conducting an analysis of the domain that takes into account both the intellectual demands of the domain and the "big ideas" or key principles underlying the content standards would allow one to make "educated" judgments about what content should be focused on.
Because not all objectives are created equal, it would make the most sense to focus on what makes the greatest difference in student performance and long-term learning. To achieve this goal, it may be necessary to cut objectives from the benchmark tests and focus more heavily on the concepts that will have the greatest impact once they are mastered. Each test should cover major state test content that is also covered by district curricula within the nine-week time frame. If the main purpose of the test is not to provide comprehensive diagnostic information, the test does not need to be overly long. It is estimated that 30 items should provide a reasonable sample of state test content taught within a nine-week period.

Activities to complete:
Go to http://www.georgiastandards.org/ and locate the appropriate standards. For example, here is a portion of the 7th grade social studies standards:

Grade Seven
AFRICA and ASIA: In seventh grade, students conclude the study of major world regions. The four strands are integrated, with history as the central strand. The history strand focuses on historical developments essential to understanding a specific region in the modern world. The geography strand relates the importance of geography to each region's development. The civics strand examines the political structures in each region. The economics strand continues to build basic economic concepts and introduces students to the economic development of each region.
AFRICA
Historical Understandings
SS7H1 The student will identify important African empires.
a. Describe the development of African empires including Ghana, Mali, Songhai, and Ethiopia.
b. Explain the importance of cities such as Timbuktu as a center of learning, Djenne as one of the oldest cities in Africa, and Zanzibar as a center of commerce.

Now go to http://picasso.cobbk12.org and locate the appropriate page describing your content and level. Locate the standards to be covered within your 9-week benchmark test.
For purposes of this example only, let's agree that they are the standards listed above. Then locate any suggested activities or exemplar units. Construct a table (see Table 1 below) that lists all the relevant standards, the content of each standard, the cognitive demands, the learner outcomes, and the time allocation.

Table 1: Relevant Standards, Content, Cognitive Demands, Learner Outcomes, and Time for (Grade, Content, Benchmark Period)
Columns:
   Standard
   Content
   Performance Process/Cognitive Demands (Level: Knowledge, Comprehension, Application, Analysis, Synthesis, Evaluation; CCSD: Bloom)
   Learner Objective (What is the student doing to let us know he/she is learning this? Identify from suggested activities.)
   Time in Minutes
Example row:
   Standard: SS7H1 The student will identify important African empires. a. Describe the development of African empires including Ghana, Mali, Songhai, and Ethiopia.
   Content: African Empires: Ghana, Mali, Songhai, Ethiopia
   Performance Process/Cognitive Demands: Describe development; Knowledge
   Learner Objective: Read pages 504-506 in the textbook. Discuss the importance of trade in the growth of an empire. (Don't know what the student is doing? Taking notes from discussion, listening?) Create a newspaper ad for the newest model of camel.
   Time in Minutes: 2 days (100 minutes)

Additional questions:
1. Are there any "Key Principles" that should be emphasized?
2. Did you locate evidence that the content and process of each standard are taught in each classroom? If not, what does that mean? How could that issue be addressed?

Recommendation #3: Alignment
CRESST Report 723 states that alignment generally describes the extent to which tests reflect state standards and assessments. You should analyze and identify the match of each item to the table created above along a number of dimensions that incorporate both curricular content and skill level:
• Content checking (i.e., is the item, at face value, an example of the "what", "how" and "why" of the specific standard(s) ostensibly being tested, as detailed in the blueprint?)
• Cognitive skills/demands (i.e., what cognitive abilities does the item draw on?)
• Content coverage (specifically, the extent of match between the item and the "key principles" or big ideas underlying the specific standard). This dimension should address whether the item under consideration is a good representation of the core concept underlying the standard, or only peripherally or superficially linked.
Also, the percentage of items on the benchmark test ought to reflect the percentage of classroom instruction. For this recommendation, develop Table 2 based on the following example.

Table 2: Item Alignment with Standard Content and Process
Columns:
   Item # and Question
   Does this Item Match a Content Standard?
   Does this Item Match the Process/Cognitive Demand of the Standard?
   If either or both columns are "no", circle the item number and use this column to explain the misalignment.

Additional questions to answer:
1. Have all the standards and key concepts been covered? What standards or key concepts should have been included that were not?

Recommendation #4.
CRESST Report 723 suggests that each benchmark test should have written documentation of:
1. Item and test development procedures
   – Test length: 30 items per test should be sufficient for purposes of predicting outcomes on state tests. These items should focus on the most important knowledge and skills taught before the test. If the district decides that the tests should provide diagnostic information, there would need to be 3-5 items per topic reported.
   – Item development: Items have been reviewed for wording, clarity, bias, and test directions for test takers.
2. Information about the test
   – Students, teachers, and schools should receive advance information about the content and format of the test. For this purpose, it is reasonable to release a representative sample of items for each test.
3. Scoring procedures
   – Documentation on scoring should describe: scoring and training methods for open-ended items, possible differential weighting of items, instructions and possibly training for interpreting and using scores, procedures for determining proficiency levels, criterion and/or norming procedures, and scaling procedures, if any. There should be a rubric and examples of performance at different score levels for constructed-response tasks.
   – In addition, the meaning of scores should be clearly stated, and justification for score interpretations provided.
4. Student characteristics
   – Who the test is intended for, and who it is not appropriate for, should be outlined. For example, are there assumptions about language ability, prior knowledge, format familiarity, special education students, etc.?

Activities:
1. Please review the item construction and scoring directions. Rewrite any items that need improved wording, clarity, layout, removal of bias, improved distracters, etc.
2. Are there any test items that query the appropriate standard content but involve the wrong cognitive demand? That is, does the standard ask the learner to describe two forms of government while the test question asks for an evaluation of these two governments? Rewrite these questions.
3. Please review and improve the test directions for test takers.
4. After reviewing Tables 1 and 2, are there any topics that should be queried? Make a list of any topics or key standards omitted.

Recommendation #5: Omitted

Recommendation #6: Validation (for discussion only; no questions to answer)
CRESST Report 723 states that validation is another major focus, which covers the following considerations for the benchmark assessment and summarizes the evidence and theory bearing on the intended use or interpretation.
Validation evidence should include:
• Expert review
• Empirical studies
   – Relationships among different measures
   – Criterion group comparisons
• Utility:
   – Does the assessment provide useful information?
   – Can results be used to improve learning and instruction?
• Item analyses
   – Difficulty
   – Discrimination
   – Cognitive demands
• Test properties
   – Difficulty
   – Inter-item correlations
   – Reliability
• Evidence on differential validity
• Investigation and elimination of irrelevant variance

Appendix C: Example Preliminary Analysis of the Jackson School District's First-Term Algebra 1 Test, 2005-2006
Recommendations regarding reliability and validity analyses will be addressed in greater depth in the second deliverable to JPS. However, as an example of what some of this work might entail, we present a preliminary test analysis. This example is based on findings from JPS's first-term Algebra 1 test in 2005-06. In the 2005-06 school year, 2,244 students in the Jackson School District took the 35-item Algebra 1 test. The students who took the test were enrolled in Grades 8-12, with the majority in Grades 8-10. There were 458 8th-graders, 780 9th-graders, 850 10th-graders, 99 11th-graders, and 34 12th-graders. The gender breakdown was relatively even, with 54% female and 46% male. Of the total, 98% of the students were African American, and about 90% received free or reduced-fee lunch.

Descriptive Analysis
To describe how these students performed on the test, we started with descriptive analyses. First, we calculated the percentage of students who answered each item correctly, to give an initial indication of item difficulty. Please see Charts A and B for the detailed results. Chart A has results on the first 18 items, and Chart B has the remaining 17 items. Note that 1.00 on the axis indicates that 100% of students answered an item correctly.
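The percent-correct calculation described above, together with the item-total correlations and reliability coefficient reported in the Correlational Analysis of this appendix, are standard classical test theory statistics. The Python sketch below shows one way to compute them from a 0/1 score matrix. It is an illustration under stated assumptions, not the code used in the CRESST report: the six-student, four-item data set and the helper names (`item_difficulty`, `flag_low_pass_items`, `pearson`, `item_total_correlations`, `kr20`, `drop_items`) are hypothetical, the 50% flag mirrors the appendix's caution threshold, and KR-20 is assumed as the reliability coefficient because the appendix does not name the one it used.

```python
"""Classical item statistics for a test scored right/wrong (a sketch, not CRESST's code).

`scores` is a hypothetical matrix: one row per student, one column per item,
1 = answered correctly, 0 = answered incorrectly.
"""
from math import sqrt


def item_difficulty(scores):
    """Proportion of students answering each item correctly (the item p-value)."""
    n_students = len(scores)
    return [sum(row[i] for row in scores) / n_students for i in range(len(scores[0]))]


def flag_low_pass_items(p_values, threshold=0.50):
    """1-based numbers of items whose passing rate is below the threshold."""
    return [i + 1 for i, p in enumerate(p_values) if p < threshold]


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(scores):
    """Correlation of each item with students' total test scores."""
    totals = [sum(row) for row in scores]
    return [pearson([row[i] for row in scores], totals) for i in range(len(scores[0]))]


def kr20(scores):
    """Kuder-Richardson Formula 20 internal-consistency reliability for 0/1 items."""
    k = len(scores[0])
    p = item_difficulty(scores)
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / len(totals)
    # Population variance of total scores, matching the p(1 - p) item variances.
    var_total = sum((t - mean_total) ** 2 for t in totals) / len(totals)
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def drop_items(scores, item_numbers):
    """Return a copy of the matrix without the given 1-based item numbers."""
    keep = [i for i in range(len(scores[0])) if i + 1 not in item_numbers]
    return [[row[i] for i in keep] for row in scores]


if __name__ == "__main__":
    # Hypothetical responses: 6 students x 4 items (illustration only).
    scores = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [1, 0, 0, 0],
    ]
    p = item_difficulty(scores)
    r = item_total_correlations(scores)
    print("Item difficulty:", [round(v, 2) for v in p])
    print("Items passed by fewer than 50%:", flag_low_pass_items(p))
    print("Item-total correlations:", [round(v, 2) for v in r])
    print("KR-20, all items:", round(kr20(scores), 2))
    # Drop the item with the weakest item-total correlation and recompute reliability.
    weakest = r.index(min(r)) + 1
    print(f"KR-20 without item {weakest}:", round(kr20(drop_items(scores, {weakest})), 2))
```

Because each item is part of the total score it is correlated with, these item-total correlations run somewhat high on short tests; a common variant correlates each item with the total of the remaining items instead. The last step mirrors, on a toy scale, the appendix's practice of removing low-correlation items and checking how much the reliability coefficient changes.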
As shown in Charts A and B, there is wide variation in the percentage of students answering each item correctly across this 35-item test. Only about 31% of the students passed item 18, while 86% passed item 22. Caution should be directed to items with a passing rate below 50%. These items should be examined in terms of content and phrasing to determine whether they address the specific knowledge/skills they were designed to address. This process helps to ensure the content validity of the items. Items that, after analysis, are determined to have content validity problems should be deleted from the test, or at least excluded from the calculation of students' final scores. If the low pass-rate items are found to be valid in content, both students and teachers should be considered in investigating reasons for the pass rate. For example, the content may not have been covered adequately in the classroom, or students may not have mastered the content because they lacked the requisite background knowledge or skills.
Besides analyzing the individual item passing rate for all the students who took the test, we also analyzed item passing rates by student background variables, including gender, ethnicity, free or reduced-fee lunch status, and grade level. There is no single pattern in passing rates by grade level. In other words, some items seem to be of the same difficulty for all grades, some are more difficult for 8th-graders than for students in higher grades, and some are more difficult for 12th-graders than for students in lower grades. Gender differences in item passing rates are relatively small; the maximum difference is 6% (for items 23, 25, 26, and 30). There were differences in passing rates by ethnicity, some as large as 23%.
For example, for item 19, African American students had a passing rate of 47%, while the other students had a passing rate of 70%. Differences in pass rate based on free/reduced lunch status are relatively minor compared with the ethnic differences we found.

Correlational Analysis
The reliability coefficient for the 35-item test is .80, which is in the range generally considered acceptable for instrument reliability. In examining individual items, we found that the Pearson correlation coefficients between each item and the mean score range from .19 to .49. After deleting the 10 items with the lowest correlations with the mean score, the reliability coefficient for the remaining 25 items is .78, only slightly lower than the original. The items dropped are items 31, 27, 5, 10, 3, 22, 1, 2, 8, and 11, whose correlations with the mean score range from .19 to .31. Deleting these 10 items appears to yield a more efficient test without substantially reducing its overall reliability. These analyses suggest that the test's internal reliability is reasonably high and that many items can be removed without seriously impairing that reliability.

Assignment 3: Final Project: Applied Assessment and Georgia Test Results
Directions and Absolute Analytical (Present/Absent) Rubric (20 points total)
C. (Assignment #3 in WebCT) Plan and Deliver a Group Presentation – Each teacher candidate will be assigned three partners. This team will develop a presentation comparing each candidate's school and the high-stakes, large-scale testing present in the schools.
Basic requirements of this assignment include summaries and comparisons of data located on school report cards, an in-depth examination and critique of one high-stakes test used in each school, and a comparison of Georgia's performance on the 4th and 8th grade national achievement tests for 2005 (or 2006 if available) with the same scores from three other states. The presentation must employ technology, and a copy of the presentation must be supplied to the professor. This assignment is designed to allow teacher candidates to critically examine high-stakes testing in their individual schools while comparing data on Georgia's national testing performance. Specific requirements and a grading rubric are available at the end of this syllabus and in WebCT. Time will be made available during class for this activity; however, students may need to meet outside class to complete preparations for the presentation. (20 points)

The purpose of this assignment is to integrate, in a PowerPoint presentation, the applied educational assessment materials presented by the authors of our text(s) with the information available to educators in the state of Georgia concerning statewide assessment of student learning, and then to contrast Georgia students' test data with test scores from students in other states.

The descriptors presented below will serve as the analytical checklist rubric, valued by the points assigned below.
+ = present, full credit
0 = absent, no credit
Possible Total = 20; Obtained = __________

___A. Introductions (2 points)
1. Identify yourselves and your schools, grades, and content areas.

___B. Demographics of Your Schools (Total = 6 points)
1. Compare and contrast your INDIVIDUAL schools.
This data can be located at http://reportcard2007.gaosa.org/ under School Reports.
Then go to Top Tab = Student and School Indicators. Compare and contrast one piece of data that you find relevant to assessment from each of the following:
____Community data
____Compensatory programs
____Selected programs
____Enrollment by demographics
Then go to Top Tab = Indicators. Include one piece of data that you find relevant to assessment from each of the following:
____Retained students
____Graduation rate
____7-12 dropouts
____Attendance
2. Did your school fulfill the accountability demands of AYP? Go to the Accountability Profile. Compare and contrast each school's AYP, which now requires schools to meet criteria in three areas: Test Participation (for both Mathematics and Reading/English Language Arts), Academic Performance (for both Mathematics and Reading/English Language Arts), and a Second Indicator. What is your school's second indicator?

____C. Assessment/Test Results – CRCT (Middle Schools), GHSGT (High Schools) in Georgia (Total = 6 points)
Locate information about your school's testing program. What state-level tests are used? Any national tests?
1. Write, using bullets, very short descriptors of each test that include test purpose, state mandates, content domains, and content descriptions. http://www.doe.k12.ga.us/curriculum/testing/scores.asp
2. Compare and contrast your various schools' scores.

____D. Assessment/Test Results – National 8th Grade NAEP Science, Math and Reading, 2005 (Total = 6 points)
Now go to http://nces.ed.gov/nationsreportcard/states/. Compare and contrast scores from the 2006 8th grade math, reading, and science tests for Georgia and one northern, one western, and one eastern state. Then make the same comparisons for these states by Black/White ethnicity and by gender.
Use the data analysis tools (http://nces.ed.gov/nationsreportcard/nde/criteria.asp) to determine whether these states produced 8th grade math test scores significantly higher than comparable scores from Georgia.

Professionalism Assignment #5
As indicated on your syllabus, professionalism will be evaluated in this course. I can assess some aspects of professionalism, but as class members, you are better able to assess professionalism from your interactions with those in your group. When completing this form, remember that your group members, as KSU graduates, will represent us all. So, here is your opportunity to evaluate yourself and your colleagues against the standards set by the College. If you and your class members have interacted professionally as described below, then the expectations have been met and full credit should be given. If, on the other hand, you or your class members have let colleagues down by not interacting as described below, then full credit for professionalism should not be given.

Please evaluate the professionalism and participation of your group members AND yourself as you reflect on the experience of working with your class members. What is most important is that, over time, the individual consistently met or exceeded expectations, or improved.
1. Write the names of each of your class members across the top row of cells.
2. Score each class member's participation and professionalism using the following scale, placing the appropriate number in the space provided.
2 = Meets or exceeds expectations. No improvement necessary.
1.5 = Just meets expectations. A little improvement would make this person a better group member.
1 = Below expectations. Needs improvement.
0 = Really below expectations. Really needs improvement.

GROUP MEMBERS
Consistently and actively contributes knowledge required for assignment completion.
Demonstrates punctuality and timely completion of responsibilities.
Values the knowledge, opinions, and skills of all group members and encourages their contributions.