Item and Test Development Process
Collaborative Conference for Student Achievement
March 30–April 1, 2015
NCDPI/Division of Accountability Services
Test Development

Test Development Presenters
Dan Auman – English Language Arts Test Measurement Specialist, NCDPI
Tom Englehart – Education Research & Evaluation Consultant, NCSU-TOPS
Iris Irving – Social Studies Test Measurement Specialist, NCDPI

Assessment Cycle
• NCDPI Curriculum & Instruction develops standards
• State Board of Education adopts standards
• Blueprint decided by Test Specification Panel
• Items are developed for adopted standards
• Field testing of items for adopted standards
• Data analyzed from field test items
• Operational testing of items

• Phase 1: 4 months
• Phase 2: 12 months
• Phase 3: 20 months
• Phase 4:
  – 4 months/EOC
  – 9 months/EOG
• Phase 5: 4 months
• Total: 44–49 months

Frequently Asked Question
Question: Did the test determine the content standards?
Answer: No. Content standards drive item and test development.

TOPS
• North Carolina State University/Technical Outreach for Public Schools
• NCDPI Partner
• Content Specialists
• Item Development using NC teachers
• Form Development (paper and online)
• Production and Editing
• Printing, Shipping, and Distribution
• NCTEST Online Platform
• NCFE Constructed Response Scoring

NC Teacher Involvement in Item Writing or Reviewing
Recruitment information
1) Complete two courses on Test Development (Content Standards Overview and Test Development Basics): https://center.ncsu.edu/nc/x_courseNav/index.php?id=21
2) Once the online training courses are completed, teachers interested in item writing or reviewing can go to http://goo.gl/forms/wXv4Imh0ko to submit an interest form.

Test Development Basics
• Items designed to be accessible to NC’s diverse student population
• Vocabulary
"TD101B: Test Development Basics." NC Education. North Carolina Department of Public Education.

Test Development Basics
• Alignment to a standard
• Clear and distinct answer
• Foils formatted in logical order
• Minimize wording
• Use third person
• Avoid contractions, stereotypes, idioms
• Use grade-level appropriate language
• Plausible distractors
• Review for bias
"TD101B: Test Development Basics." NC Education. North Carolina Department of Public Education.

Weak
Which describes a cloud that will most likely produce heavy rain, lightning, and thunder?
A. The cloud will be tall and dark.
B. The cloud will be low and gray.
C. The cloud will be high and broken.
D. The cloud will be high and wispy.

Improved
Which describes a cloud that will most likely produce heavy rain, lightning, and thunder?
A. tall and dark
B. low and gray
C. high and broken
D. high and wispy
"TD101B: Test Development Basics." NC Education. North Carolina Department of Public Education.

Weak
How much does a Channel Bass usually weigh?
A. up to 75 pounds
B. less than 100 pounds
C. between 30 and 40 pounds
D. more than 30 pounds

Improved
How much does a Channel Bass usually weigh?
A. Over 100 pounds
B. Between 60 and 80 pounds
C. between 30 and 40 pounds
D. Less than 20 pounds
"TD101B: Test Development Basics." NC Education. North Carolina Department of Public Education.

Weak
Which are symbols of North Carolina?
I. cardinal
II. honeybee
III. Boykin spaniel
IV. dogwood
A. I and III only
B. II and III only
C. III and IV only
D. I, II, and IV only

Improved
Which are symbols of North Carolina?
A. cardinal and fire ant
B. Boykin spaniel and shad boat
C. Dogwood and soda
D. Honeybee and Plott Hound
"TD101B: Test Development Basics." NC Education. North Carolina Department of Public Education.
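
The Channel Bass item above illustrates the "clear and distinct answer" guideline: in the weak version the weight ranges overlap, so several options can be true for the same fish. A minimal sketch of how that overlap could be checked mechanically, assuming each option is hand-encoded as a closed numeric interval in pounds. The `Option` type, the `overlapping_pairs` function, and the interval encodings are hypothetical illustrations, not part of the NCDPI or TOPS review process.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass(frozen=True)
class Option:
    label: str
    low: float   # lower bound in pounds (treated as inclusive, an assumption)
    high: float  # upper bound in pounds (treated as inclusive, an assumption)


def overlapping_pairs(options):
    """Return the label pairs whose weight ranges share at least one value."""
    return [
        (a.label, b.label)
        for a, b in combinations(options, 2)
        if a.low <= b.high and b.low <= a.high
    ]


# Weak version, hand-encoded from the slide.
weak = [
    Option("A", 0, 75),          # up to 75 pounds
    Option("B", 0, 100),         # less than 100 pounds
    Option("C", 30, 40),         # between 30 and 40 pounds
    Option("D", 30, math.inf),   # more than 30 pounds
]

# Improved version, hand-encoded from the slide.
improved = [
    Option("A", 100, math.inf),  # over 100 pounds
    Option("B", 60, 80),         # between 60 and 80 pounds
    Option("C", 30, 40),         # between 30 and 40 pounds
    Option("D", 0, 20),          # less than 20 pounds
]

print("weak overlaps:", overlapping_pairs(weak))
print("improved overlaps:", overlapping_pairs(improved))
```

Under this encoding every pair of weak options overlaps, while the improved options produce no overlapping pairs, which is what a single clear answer requires.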

Selection Review

Selection Review
• English Language Arts items are all tied to selections; therefore, before writing items, selections must be reviewed and approved
• Approved selections must go through the copyright process (with the exception of works for hire or public domain)

Selection Review
• Some things to keep in mind when reviewing selections:
  – Is the grade level appropriate?
  – Are words spelled correctly and appropriately? (with the understanding that texts must be allowed a level of authenticity)
  – Is the length appropriate for the assigned grade level?
  – Is an excessive number of footnotes needed? (If more than 3, the selection probably belongs at a higher grade level.)

Selection Review
• Avoid using selections that contain:
  – Any focus on negative behavior or activities
  – References to holidays such as Christmas, Halloween, birthdays, etc. that are celebrated by some groups, but not others
  – Controversial topics such as magic, ghosts, witches, and death
  – Any focus on tragedies that have occurred in recent years in NC (floods, hurricanes, tornadoes)
  – Any articles that mention topics such as tobacco or alcohol

Item Development Process

Item Characteristics
• Level of difficulty
  – Easy, medium, hard
• Level of cognitive complexity
  – Example: Depth of Knowledge (DOK) or Revised Bloom’s Taxonomy (RBT)
• Different types of items

Types of Items
• Multiple-Choice (MC)
• Technology Enhanced (TE)
  – Drag and Drop
  – Text Identify
  – String Replacement
• Constructed Response (CR)
  – Gridded Response/Numeric Entry
  – Short Answer

Item Development Process
17 Steps
• Teachers
• Content Lead
• Content Specialist
• DPI-Curriculum & Instruction
• Production
• Editing
• Subject-specific Test Measurement Specialist (TMS)
• EC/ESL/VI

Steps 1–3
• Step 1: Item created
  – Assigned standard, DOK/RBT rating, knowledge type and cognitive category
• Step 2: TOPS Content
• Step 3: Production
  – Graphics, copyright check

Step 3 – Sample of Production Edits
4th Grade Science Released Item Fall 2015 (Item 5)
Step 2: alluded to an image
Step 3: Production created image

Sample Item: ELA at Step 2
(Grade 8 EOG Released Form 2012-13, Item 4)
What does saffron mean in the selection?
A) green
B) yellow
C) dark
D) light

Step 4: Teacher Review
• Select Answer
• Match
  – Content/Course
  – Content/Standard
  – Content/DOK (NCSCS) or RBT (ES)
  – Content/Difficulty
• Stem Quality
• Plausible Distractors
• Grade Level Vocabulary
• Appropriate Graphics
• Bias, insensitivity, accessibility issues
• Overall item quality
• Additional comments

Step 4: Teacher Review

Steps 5–6: Reconciliation, Production
• TOPS Content Specialist addresses feedback from teacher reviews
  – Teacher review suggested revision to Foil D

Step 7: DPI-Curriculum & Instruction, Exceptional Children (EC), English as a Second Language (ESL), Visually Impaired (VI)
DPI-Curriculum:
• Keyed correctly?
• Match
  – Content
  – DOK for NCSCS, RBT for ES
• Select aligned standard
• Any bias, insensitivity, or accessibility issues?
• Overall item quality
EC/ESL/VI:
• Any accessibility issues?
• Overall item quality from ESL perspective
• Overall item quality from VI perspective

Steps 8–9: Reconcile, Production
• TOPS Content Specialists incorporate feedback from DPI-Curriculum & Instruction and EC/ESL/VI reviews as needed

Step 10: Test Measurement Specialist (TMS)
• NCDPI/Test Development Section
• TMS Review Screen

Steps 11–12: Reconcile, Production
• Incorporate Test Measurement Specialist (TMS) suggestions
• Sample item: Clarification in stem
Steps 13–14: Grammar, Security
• Copyright permissions, etc.

Step 15: Final Approval
• TOPS Content Lead
Step 16: Production Edits
Step 17: Item Approved
• Ready for placement on a form

Released Sample Item – Steps 16–17
Based on the sentences below, what does saffron mean?
“But high up, their tops were green and caught the saffron light of the west.
He remembered that when a boy, he had thought there was nothing more beautiful than the evening sunshine falling athwart the dark green fir boughs on the hills.”
A) green
B) yellow
C) dark
D) west

Form Building Process

Blueprint
• Priority of Topics and Sub-topics determined by the Test Specification Panel
  – NC Teachers
  – DPI-Curriculum & Instruction
  – DPI-Exceptional Children
  – DPI-Test Development
  – Outside Content (e.g., university professors)

Blueprint, cont.
• All forms of each test are built to the same test specification
• Topic Level:
  – Forms within a subject have the same distribution of items by topic
• Sub-topic Level:
  – Forms within a subject may or may not have the same distribution of items by sub-topic

Example: Biology EOC
– “Ecosystems” domain has 6 sub-standards
– Items from “Ecosystems” will fall within 18–22% of total score points
– Sub-standards tested may vary across forms

Form Building Process and Blueprint
• Test items on forms used from year to year are different.
  – Tests are equivalent at the total score level, not at the sub-topic level.
  – Thus, forms from year to year may have more or fewer items on a particular topic or sub-topic.

Form Review
27 Steps
• Outside Content Specialist
• Content Lead
• Content Manager
• Content Specialist
• Production, Editing
• TOPS/IT Staff
• Subject-specific NCDPI Test Measurement Specialist (TMS)
• Psychometrician

Building Base Forms (e.g., Forms A, B, C)
• Step 1: Operational items selected by psychometricians
• Step 2: Production edits (as needed)
• Step 3: TOPS Content review
• Step 4: DPI-TMS Review/Key Balance

Building Base Forms, cont.
• Step 5: TOPS Content Reconcile
• Step 6: Outside Content Key Check
• Step 7: TOPS Content Reconcile
• Step 8: Psychometric Review/Key Balance
• Step 9: Production
• Step 10: Grammar
• Step 11: Content Lead Review/Finalize Form

Frequently Asked Question
Question: Does anyone review the item statistics for each item each year?
Answer: Yes.
Item statistics are reviewed for every item on every form after semester and yearlong test cycles as soon as a representative data sample is received by Accountability Services.

Embedded Sub-Form Review (e.g., A1, A2, A3)
• NC field tests items by creating sub-forms
• Each sub-form has the same operational items but different field test items
• Field test items are developed, aligned to the content standards, and then piloted/field tested (item tryout)
  – 2 to 1 ratio for item needs
• Field test items are not included in students’ scores
• Items deemed to meet technical criteria based on the field test statistics are then placed on a test form the following year

Common Questions
Why are items field tested?
• Before an item is placed on a test form, item statistics are needed to control the overall difficulty and reliability of the form.
Is one test form harder than another form? (EOC/EOG)
• No, all of the forms (online and paper) of a given test for a grade/subject are equivalent with respect to content and difficulty (one form is not harder or easier than another)

EOG/EOC Resources
• Released Forms
  http://www.ncpublicschools.org/accountability/testing/releasedforms
• EOG Test Specifications
  http://www.ncpublicschools.org/accountability/testing/eog/
• EOC Test Specifications
  http://www.ncpublicschools.org/accountability/testing/eoc/
• Technical Reports
  http://www.ncpublicschools.org/accountability/testing/technicalnotes
• Guidelines, Practice and Examples Math Gridded Response Items
  http://www.ncpublicschools.org/accountability/testing/eoc/

NC Final Exams Resources
• NC Final Exams (test specs, released forms/items, reference sheets)
  http://www.ncpublicschools.org/accountability/common-exams/
• Fall 2014 Released Item Sets
  http://www.ncpublicschools.org/accountability/common-exams/releaseditems/
• Assessment Specifications
  http://www.ncpublicschools.org/accountability/commonexams/specifications/

Resources
• NC Testing Program Overview
  http://www.ncpublicschools.org/docs/accountability/1415testoverview.pdf
• NC Testing Calendar
  http://www.ncpublicschools.org/docs/accountability/testing/calendars/1415optestcal.pdf

Questions?
Thank you!
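
Supplementary sketch: the Biology EOC blueprint example earlier in this deck states that items from “Ecosystems” fall within 18–22% of total score points on every form. The following is a minimal illustration of how such a range could be checked for a candidate form. Only the 18–22% range comes from the slides; the function names, the "Other" domain, the 50-item form size, and the one-point-per-item scoring are all hypothetical assumptions.

```python
from collections import Counter


def domain_share(form_items):
    """Return each domain's percentage of the form's total score points.

    form_items is a list of (domain, score_points) pairs.
    """
    points = Counter()
    for domain, score_points in form_items:
        points[domain] += score_points
    total = sum(points.values())
    if total == 0:
        raise ValueError("form has no score points")
    return {domain: 100 * p / total for domain, p in points.items()}


def meets_blueprint(form_items, blueprint):
    """Check that every blueprint domain falls inside its (low, high) percent range."""
    shares = domain_share(form_items)
    return all(
        low <= shares.get(domain, 0.0) <= high
        for domain, (low, high) in blueprint.items()
    )


# Range taken from the Biology EOC example; other domains omitted.
blueprint = {"Ecosystems": (18.0, 22.0)}

# Hypothetical 50-item forms, one score point per item.
form_a = [("Ecosystems", 1)] * 10 + [("Other", 1)] * 40  # 20% Ecosystems
form_b = [("Ecosystems", 1)] * 15 + [("Other", 1)] * 35  # 30% Ecosystems

print("Form A within blueprint:", meets_blueprint(form_a, blueprint))
print("Form B within blueprint:", meets_blueprint(form_b, blueprint))
```

The check works at the topic level only, which matches the slides: forms share the same topic distribution, while the sub-standards sampled within a topic may vary from form to form.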