Understanding End-of-Grade (EOG) and End-of-Course (EOC) Assessments


Collaborative Conference for Student Achievement

March 4, 2014

NCDPI/Division of Accountability Services

Test Development Section


NCDPI Test Development Section

• Hope Lung – Section Chief

• Dan Auman – English Language Arts Test Measurement Specialist

• Michael Gallagher, Ph.D. – Mathematics Test Measurement Specialist

• Garron Gianopulos, Ph.D. – Psychometrician

• Jami-Jon Pearson, Ph.D. – Psychometrician

Presentation Outline

• Overview of End-of-Grade (EOG) & End-of-Course (EOC) Assessments

• Measurement Principles

• How EOG/EOC Tests Are Built

• Reporting

• Frequently Asked Questions (FAQ)


The Mission of the Accountability Services Division is to…

(a) promote the academic achievement of all North Carolina public school students and (b) assist stakeholders in understanding and gauging this achievement against state and national standards.

Implementation is three-fold…

• reliable and valid assessment instruments

• suitable assessment instruments for all students

• accurate and statistically appropriate reports


Frequently Asked Question

Question:

Did the test determine the content standards?

Answer:

NO. Content standards drive item and test development.


Comprehensive Balanced Assessment System

A balanced assessment system aligned to state standards includes three tiers:

• Classroom

• Interim/Benchmark

• Statewide Summative

Types of Assessment

| Type | Frequency | Evaluates | Purpose/Reporting |
|---|---|---|---|
| Classroom/Formative (local) | Daily | Minute-to-minute learning | Descriptive feedback to adjust instruction & support learning |
| Interim/Benchmark (local) | Multiple per year | Mastery of a set of standards | Inform stakeholders (e.g., teachers, students, parents) |
| Summative EOG/EOC (statewide) | End of year/course | Overall mastery | High-stakes accountability |

Why is statewide assessment important?

Statewide assessment helps the STATE answer questions such as:

• Is the environment conducive for learning?

• Are we moving all students forward?

• Are we preparing our STUDENTS to be successful?

• Are we providing a sound basic education to every student?

Frequently Asked Question

Question:

Are there close to 200 state-mandated tests in NC?

Answer:

NO. Reference the following for the testing calendar and assessment options: http://www.ncpublicschools.org/accountability/policies/

Guiding Philosophy

• Scores serve as one piece of evidence (snapshot) regarding students’ knowledge & skills.

• Instruction and placement decisions should be made based on an accumulation of evidence.

• Scores are most reliable at the scale score level.

Measurement Principles: Assessment Literacy

• Validity

• Reliability

EOG/EOC Measurement

Two important goals are to…

1) Achieve the most reliable and accurate snapshot of student achievement with minimal impact on instructional time within legislative guidelines.

• Fewer items per assessment

• Shorter test administration time


EOG/EOC Measurement

Two important goals are to…

2) Remove bias using valid and reliable psychometric methods throughout the test development process.


EOG/EOC Measurement

• Validity depends on evidence that the test is measuring what it is designed to measure

• Reliability depends on scores being consistent

• Must have reliability in order to have validity


Types of Validity

• Content Validity

– Items are carefully aligned to the content standards

– The NCDPI also contracts to have independent alignment studies of its assessments

• Concurrent Validity

– Correlation of student performance with other measures

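To make the concurrent-validity idea concrete, here is a minimal sketch of correlating EOG scale scores with an external measure for the same students. All scores below are hypothetical illustrations, not actual NC data.

```python
import numpy as np

# Hypothetical paired scores for seven students: EOG scale scores
# and scores on an external measure (e.g., another assessment).
eog_scores = [430, 442, 451, 438, 460, 448, 455]
external_scores = [18, 21, 24, 20, 27, 22, 25]

# Pearson correlation: a high positive value is evidence of concurrent validity.
r = np.corrcoef(eog_scores, external_scores)[0, 1]
print(f"Concurrent validity coefficient: r = {r:.2f}")
```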

Reliability

Reliability refers to the consistency of the scores on a test (if a test were repeated, a student would obtain close to the same score).

The state uses coefficient alpha to estimate reliability.

• The industry standard for state assessments used for accountability purposes is a coefficient alpha of .85 or higher; the EOGs & EOCs exceed that value.
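As an illustration of what coefficient alpha measures, here is a minimal sketch of the standard alpha formula applied to a toy examinees-by-items score matrix. The function name and toy data are illustrative assumptions, not the state's actual scoring code.

```python
import numpy as np

def coefficient_alpha(item_scores):
    """Cronbach's coefficient alpha for an examinees-by-items score matrix."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                           # number of items
    item_vars = item_scores.var(axis=0, ddof=1)        # variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 5 examinees x 4 dichotomously scored items (illustrative only).
scores = [[1, 1, 1, 0],
          [1, 0, 1, 1],
          [0, 0, 1, 0],
          [1, 1, 1, 1],
          [0, 0, 0, 0]]
print(round(coefficient_alpha(scores), 2))  # 0.79 for this toy matrix
```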

Table of Reliability

| Subject | Cronbach’s Alpha* |
|---|---|
| ELA | .88-.92 |
| Math | .90-.93 |
| Science | .90-.92 |

* A minimum of .85 is the industry standard.

When interpreting score averages…

• Classroom-level data are more reliable than student-level data

• School-level data are more reliable than classroom-level data

• LEA-level data are more reliable than school-level data

• State-level data are more reliable than LEA-level data

More observations = more reliable inferences

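A quick way to see why more observations support more reliable inferences: the standard error of a mean shrinks with the square root of the number of scores averaged. The standard deviation and group sizes below are hypothetical, chosen only to illustrate the pattern.

```python
import math

sd = 10.0  # hypothetical scale-score standard deviation
for n, level in [(25, "classroom"), (500, "school"), (10_000, "LEA"), (100_000, "state")]:
    se = sd / math.sqrt(n)  # standard error of the mean shrinks as n grows
    print(f"{level:>10}: n = {n:>7,}, standard error of mean = {se:.2f}")
```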

Conclusion

• Recall our goals
– Minimizing impact on instructional time
– Valid and reliable overall scale score

• Giving reliable scores at the sub-topic* level would require more items and testing time
– double or triple the number of items per test

*Topic and *Sub-Topic

Umbrella terms used for curriculum clarification

• 1st and 2nd levels of the curriculum structure
– CCSS ELA is Strand/Standard
• RI.4.3 – Strand RI (grade 4), Standard 3
– CCSS Math is Domain/Cluster/Standard
• 3.NF.A.2 – Grade 3, Domain NF, Cluster A, Standard 2
– Essential Standards Science is Standard/Clarifying Objective
• 5.E.1 – Standard 5 (Earth Systems), Clarifying Objective 1

How Are Tests Built?

• Common Questions

• State Standards

• Items/Forms

• Technical Quality

• Standard Setting

The answers to these questions dictate what types of data are available and how those data can be used.

EOG/EOC OVERVIEW: Common Questions

1. Who writes the items?
• Teachers who successfully complete an online training system for item writing are contracted to write items for specific content standards.

2. How are the test specifications determined?
• The NCDPI convenes teachers and content experts (Test Specification Panel) to provide input on what percent of the test should cover groups of the content standards (Blueprint).

EOG/EOC OVERVIEW: Common Questions

• Blueprint
– Priority of topics and sub-topics determined by the Test Specification Panel:
• NC teachers
• DPI Curriculum
• DPI Exceptional Children
• DPI Test Development
• Outside content experts (e.g., professors)

EOG/EOC OVERVIEW: Common Questions

3. Why are items field tested?
• Before an item is placed on a test form, psychometricians need item statistics to control the overall difficulty and reliability of the form.

4. Is one test form harder than another form?
• No. All of the forms (online and paper) of a given test for a grade/subject are equivalent.

State Standards: What we measure…

• Common Core State Standards (ELA & Math) & Essential Standards (Science)
– More rigorous standards aligned to College and Career Readiness
• Global competitiveness
– Same standards across states
• Allows comparison of achievement
• Continuity
– Adopted by the State Board of Education (SBE) in 2010

Items and Forms


Frequently Asked Question

Question:

Does anyone review the item statistics for each item each year?

Answer:

YES. Item statistics are reviewed for every item on every form after semester and yearlong test cycles as soon as a representative data sample is received by Accountability Services.


The Life of an EOG/EOC Item…

State Board of Education (SBE) Policy GCS-A-013 specifies the process used to develop tests for the North Carolina Testing Program:

• Content standards are adopted by the SBE

• Test specifications are determined with input from teachers and content experts

• Items are developed, aligned to the content standards, and then piloted/field tested (item tryout)
– Items are written at a 3-to-1 ratio relative to projected item needs

• Items deemed to meet technical criteria based on the field test statistics are then placed on a test form

Item Characteristics

• Level of difficulty

– Easy, medium, hard

• Level of cognitive complexity

– Example: Depth of Knowledge (DOK)

• Different types of items


Types of Items

• Multiple-Choice (MC)

• Technology-Enhanced (TE)
– Drag and Drop (DD)
– Text Identify (TI)
– String Replacement (SR)

• Constructed Response (CR)
– Gridded Response (GR)/Numeric Entry (NE)
– Short Answer (SA)

Item Review

17 steps, including review by:

• Teachers

• Content Lead

• Content Specialist

• DPI Curriculum

• EC/ESL/VI

• Production

• Editing

• Subject-specific Test Measurement Specialist (TMS)

The Life of an EOG/EOC Test Form…

• Test forms are reviewed for technical quality and content alignment
– Teachers, content experts, editing staff, and NCDPI curriculum staff
– Psychometricians affirm forms of a given test are equivalent with respect to content and difficulty (forms are not harder or easier than each other)

The Life of an EOG/EOC Test Form…

• Tests are administered to students

• In the first year of administration, the standard setting is conducted and achievement levels are set

• Psychometricians continue to monitor the forms to ensure equivalency


Form Review

27 steps, including review by:

• Content Lead

• Content Manager

• Content Specialist

• Outside Content Specialist

• Subject-specific Test Measurement Specialist (TMS)

• Production, Editing

• Psychometrician

• IT Staff

Operational Item Review → Operational Form Review

EOG/EOC Form Construction

• All forms of each test are built to the same test specification

• Topic level:
– Forms within a subject have the same distribution of items by topic

• Sub-topic level:
– Forms within a subject may or may not have the same distribution of items by sub-topic

Three-Year Timeline

• 2010-11: Item writing/reviewing/pilot test

• 2011-12: Field test

• 2012-13: Operational forms
– First year requires standard setting
– New forms are added each successive year (pre-equated to first-year forms)

If you would like to participate in item writing or review…

Website

• https://center.ncsu.edu/nc/login/index.php


Technically Sound


How do we know the EOG/EOCs are technically sound?

We adhere to the Standards for Educational and Psychological Testing:

• American Psychological Association (APA)

• American Educational Research Association (AERA)

• National Council on Measurement in Education (NCME)

How do we know the EOG/EOCs are technically sound?

The development process and the analysis plan are reviewed by Technical Advisors, national/state leaders in…

• Educational Assessment

• Standard Setting

• Psychometrics

• District-Level Accountability/Research

• Exceptional Children Research

• Others

How do we know the EOG/EOCs are technically sound?

Evidence from the tests is submitted to the U.S. Department of Education (USDOE) for Peer Review. Some examples are…

• Validity & reliability

• Fairness & accessibility for all students

• Technical quality of items & forms

• Alignment

• Content standards

Standard Setting


Frequently Asked Question

Question:

Are the cut scores (Levels 1-4) determined by NCDPI staff?

Answer:

NO. Cut scores are determined by NC educators at a standard setting meeting. In fact, the meeting itself is managed and facilitated by an outside vendor. The NC State Board of Education (SBE) must approve the recommended cut scores or request specific changes.

Standard Setting: General Assessments

• A nationally recognized company (the vendor) conducted the study

• The vendor managed the work while DPI observed

• Multi-day in-person meeting

• Used a proven test-based method
– Item Mapping (Bookmark)
– Three requirements: item map, ALDs, & panelists (educators) to make content judgments

Item Map: Ordered Item Booklet (OIB)

• Review the OIB
– Items (2012-13 administration) ordered in ascending difficulty as calculated from actual student performance
– Each item on a single page
– Each panelist identifies the last item that a “just barely” proficient student would answer correctly more often than not
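Here is a hedged sketch of the bookmark logic described above: items are ordered by empirical difficulty, each panelist marks the last item a "just barely" proficient student would more often than not answer correctly, and a summary of the panelists' marked item locations yields a recommended cut. The toy data and the choice of the median as the summary are illustrative assumptions, not the vendor's actual procedure.

```python
from statistics import median

def bookmark_cut(item_difficulties, panelist_bookmarks):
    """Toy sketch of the Bookmark (item-mapping) method.

    item_difficulties: item locations on the score scale, sorted ascending
        (the Ordered Item Booklet order).
    panelist_bookmarks: for each panelist, the 0-based index of the last item
        a "just barely" proficient student would answer correctly more often
        than not.
    Returns the median of the panelists' implied cut locations.
    """
    cuts = [item_difficulties[b] for b in panelist_bookmarks]
    return median(cuts)

# Hypothetical OIB of 10 items and bookmarks from 5 panelists.
oib = [-1.8, -1.2, -0.7, -0.3, 0.0, 0.4, 0.9, 1.3, 1.8, 2.4]
bookmarks = [4, 5, 5, 6, 4]
print(bookmark_cut(oib, bookmarks))  # 0.4 on this toy scale
```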

Item Map: Ordered Item Booklet (OIB)

[Figure: items from the OIB mapped onto the EOG/EOC score scale]

Achievement Level Descriptors (ALDs): Development & Purpose

• Reviewed current content standards

• Developed ALDs
– Knowledge/skills expected at each level
– Eventually used in score reporting
– Used to define entry points to each level

• Developed “Just Barely” descriptors
– These define the entry points of each achievement level
– These represent the minimum knowledge and skill needed to be proficient

NC READY Achievement Levels

Achievement Level 4 - Superior Command

Achievement Level 3 - Solid Command

Achievement Level 2 - Partial Command

Achievement Level 1 - Limited Command


Overview: Proficiency Cut Placement

[Figure: the four achievement levels (Limited, Partial, Solid, and Superior Command) mapped onto the EOG/EOC item scale]

What do the scores mean?

Overview: Proficiency Cut Placement

[Figure: the same mapping, with an “On Track” marker on the scale]

ALDs capture the meaning of the scale levels.

Standard Setting: Teacher Panels

• Judgment of experts in content and in the student population
– Over 200 NC educators participated (16-20 per panel)
– Post-secondary educators (experts in content) were represented
– Each region was represented
– Each grade level and content area was represented

• Grade-band groups: 3-5; 6-8; High School

Three Rounds of Judgment

• Round 1
– Set cuts, then broke into small groups to discuss the cut scores and why they were chosen

• Round 2
– Set cuts, then reviewed external validity measures
• EXPLORE/ACT linking study
• Percentage of 2013 test takers in each level (initial impact data)

• Round 3
– Set the final teacher grade-level panel cuts, received final impact data, and completed an evaluation survey

Vertical Articulation

• Last step

– Subset of teacher panelists

– Reviewed results across grades and/or subjects

– Small adjustments made to smooth the percentages across grades within each content area


Standard Setting: SBE Approval

• After all standard setting meetings, the next phase involved policymakers: the SBE and NCDPI (Curriculum, Exceptional Children, & Accountability)

• The “teacher panel” recommendations were presented to the SBE for approval
– Ultimately a policy decision by the SBE
– Presented September 2013
– Adopted October 2013

Reporting:

Individual Student Report

Class Rosters

Goal Summary Reports

Accountability Reports


Frequently Asked Question

Question:

Did the NCDPI delay 2012-13 scores in case the test was bad?

Answer:

NO. There must be a standard setting process to determine Levels 1-4 when new assessments are administered. A full year’s worth of data is necessary before teachers can be convened for the standard setting process.


Frequently Asked Question

Question:

Did the NCDPI require districts to use a 0-100 score as at least 20% of the student’s final grade during 2012-13?

Answer:

NO. The NCDPI waived NC State Board of Education policy GCS-C-003 during the 2012-13 school year.
(http://sbepolicy.dpi.state.nc.us/Policies/GCS-C-003.asp?Acr=GCS&Cat=C&Pol=003)

Different reports with which you may be familiar…

| Level | Report |
|---|---|
| Student | Individual Student Report |
| Classroom | Goal Summary, Class Roster |
| School | Goal Summary, Class Roster |
| LEA | Accountability Reports |
| State | Accountability Reports |

Example: Goal Report at the Teacher Level

[Figure: a sample goal report showing one profile based on the blueprint and one based on Common Core categories]

Common Problems with Reporting, Interpreting, and Using Subscores

• Subscores are usually highly correlated, so differences between subscores are often small.

• Subscores at the topic level are often not specific enough to inform instructional practice.

• Subscores at the sub-topic level are often very specific but usually have too few items to be reliable.

• Scores based on fewer than 10 items are typically too unreliable to interpret. Mean scores from spiraled forms improve reliability.
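The Spearman-Brown prophecy formula makes the "too few items" point concrete: it projects how reliability changes as test length changes. The numbers below are hypothetical, not actual EOG/EOC values.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical: a 50-item test with alpha = .90, versus an 8-item subscore
# drawn from it (length factor 8/50).
subscore_reliability = spearman_brown(0.90, 8 / 50)
print(f"Projected 8-item subscore reliability: {subscore_reliability:.2f}")  # ~0.59
```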

The Goal Summary Report

Provides valid data about curriculum implementation only when…

• all forms in the spiral are administered within the same classroom/school/LEA,

• there are at least five students per form (with multiple forms), and

• approximately equal numbers of students have taken each form.

The report cannot provide valid data when there is not enough participation or when only one form is administered.

Four Steps to Interpreting the Goal Summary Reports

1. Transform the percentage of items per form to the raw number of items.

2. Transform the weighted mean percent correct to the raw subscore correct.

3. Determine the room for improvement by subtracting the raw subscore correct from the number of items per reporting category.

4. Triangulate results with other measures.

Math I Example from Fall 2013


Step 1

Transform the percentage of items per form to the raw number of items (49 × .306 ≈ 15 items).

Step 2

Transform the weighted mean percent correct to the raw subscore correct (15 × .458 = 6.87 items).

Step 3

Determine the room for improvement by subtracting the raw subscore correct from the number of items in the reporting category (15 − 6.87 = 8.13).
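The three arithmetic steps above fit in a few lines of code; the input values are taken from the Fall 2013 Math I example.

```python
# Worked example from the Fall 2013 Math I Goal Summary Report.
total_items = 49           # items on the form
pct_of_form = 0.306        # reporting category's share of the blueprint
weighted_mean_pct = 0.458  # weighted mean percent correct for the category

raw_items = round(total_items * pct_of_form)    # Step 1: 49 x .306 ~ 15 items
raw_correct = raw_items * weighted_mean_pct     # Step 2: 15 x .458 = 6.87
room_for_improvement = raw_items - raw_correct  # Step 3: 15 - 6.87 = 8.13

print(f"Items: {raw_items}, raw correct: {raw_correct:.2f}, "
      f"room for improvement: {room_for_improvement:.2f}")
```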

Interpreting Goal Summary Reports

[Figure: room-for-improvement values by reporting category, e.g., 8.13, 12.67, 1.74, 1.96, 2.93]

Interpreting Goal Summary Reports

Actual correlations: Functions r = .73; Algebra r = .68

Be more confident in the longer, more reliable subscores.

Always Remember…

• Test items on forms used from year to year are different.
– Tests are equivalent at the total score level, not at the sub-topic level.
– Thus, forms from year to year may have more or less difficult items on a particular topic or sub-topic.

Tests are only one source of data

• Augment school-level Goal Summary Reports with multiple sources of evidence:
– Goal Summary Reports
– Classroom observations
– Formative assessments
– Benchmark assessments
– Pacing guides
– Lesson plans
– Peer observations

Comparing Scores: Examples of “Apples to Apples”

• Same year
– State to LEA average scale score
– State to school average scale score
– State to LEA or school percent correct for the same topic

• Across years
– State, LEA, or school average scale scores within a curriculum cycle across years
– Student percentiles across years

Resources

• Released Forms: http://www.ncpublicschools.org/accountability/testing/releasedforms

• Test Specifications: http://www.ncpublicschools.org/accountability/

• Technical Reports (coming in May): http://www.ncpublicschools.org/accountability/testing/technicalnotes

Resources

• Guidelines, Practice, and Examples for Math Gridded Response Items: http://www.ncpublicschools.org/accountability/

• NC Final Exams (test specs, released forms/items, reference sheets): http://www.ncpublicschools.org/accountability/common-exams/

• NC Testing Program Overview: http://www.ncpublicschools.org/docs/accountability/nctestoverview1314.pdf

Resources

• NC READY Accountability Report: http://www.ncaccountabilitymodel.org/

• Achievement Level Information (cut scores & descriptors): http://www.ncpublicschools.org/accountability/testing/shared/achievelevel/

Questions?

Thank you!
