Representative Samples and
PARCC to MCAS Concordance Studies
This report describes the methods and outcomes for a) selecting representative
samples of test-takers for MCAS and PARCC in 2015, and b) identifying estimated
MCAS results for PARCC test-takers.
February 2016
Massachusetts Department of Elementary and Secondary Education
75 Pleasant Street, Malden, MA 02148-4906
Phone 781-338-3000 TTY: N.E.T. Relay 800-439-2370
www.doe.mass.edu
This document was prepared by the
Massachusetts Department of Elementary and Secondary Education
Mitchell D. Chester, Ed.D.
Commissioner
The Massachusetts Department of Elementary and Secondary Education, an affirmative action employer,
is committed to ensuring that all of its programs and facilities are accessible to all members of the public.
We do not discriminate on the basis of age, color, disability, national origin, race, religion, sex, gender identity,
or sexual orientation.
Inquiries regarding the Department’s compliance with Title IX and other civil rights laws may be directed to the
Human Resources Director, 75 Pleasant St., Malden, MA 02148-4906. Phone: 781-338-6105.
© 2016 Massachusetts Department of Elementary and Secondary Education
Permission is hereby granted to copy any or all parts of this document for non-commercial educational purposes.
Please credit the “Massachusetts Department of Elementary and Secondary Education.”
This document printed on recycled paper
Table of Contents
Introduction .......................................................................................................................... 1
Background and Purpose ....................................................................................................... 1
PART 1: SELECTING REPRESENTATIVE SAMPLES
The Need for Representative Samples ................................................................................... 3
Method to Identify Representative Samples .......................................................................... 5
Results from the Representative Sample Study ...................................................................... 7
PART 2: CONCORDANCE TABLES AND GUIDANCE FOR USE OF DATA
Concordance Tables Comparing MCAS to PARCC Results ...................................................... 10
Introduction ................................................................................................................................ 10
Methods for Generating ............................................................................................................... 10
Composite Performance Index (CPI) Results for PARCC Schools and Districts ................................. 13
Guidance for Using Representative Samples and Concordance Tables .................................. 15
Concordance Tables ..................................................................................................................... 15
Conducting Analyses at the State Level with Representative Samples............................................ 15
Conducting Analyses that are Not State-Level ............................................................................... 16
References .......................................................................................................................... 18
Appendix A: Proof-of-Concept Study.................................................................................... 19
Counts ......................................................................................................................................... 19
Balance........................................................................................................................................ 19
Replication of 2013–14 Psychometric Results................................................................................ 21
Replication of 2013–14 Student Growth Percentiles ...................................................................... 23
Replication of 2013–14 Accountability Results .............................................................................. 24
Summary of Results from the Proof-of-Concept Study .................................................................. 25
Appendix B: Method Used to Select Representative Samples ............................................... 26
Appendix C: Logistic Regression Variables and Results ......................................................... 29
Introduction
During the 2014–15 school year, school districts in Massachusetts were offered a choice regarding their
grades 3–8 summative testing programs: whether to participate in MCAS or PARCC. In order to generate
stable trends for the 2014–15 school year, the State embarked on two analytical studies. The first
addressed non-equivalence in MCAS and PARCC samples of test-takers through the selection of
representative samples from each group. The second estimated MCAS scores for PARCC test-takers to
generate Composite Performance Index values (CPIs, which are measures of proficiency for schools and
districts).
Although each test was taken by roughly half of the grades 3–8 examinees, demographic differences
between the two groups of examinees remained. If left unaddressed, these demographic differences
would distort state trends and other analyses. To reduce unintended differences between the two
groups of examinees, the Department, with assistance from national testing experts (members of the
MCAS Technical Assistance Committee), developed a method to select representative samples from the
total samples of examinees taking MCAS and PARCC in 2015. This first analysis produced representative
samples of examinees taking MCAS and PARCC that were significantly more similar to each other than
the total samples were.
The second analysis used the representative samples produced in the first analysis to match MCAS
scores for examinees, by grade and subject/test, to PARCC scores, using an equipercentile linking
approach (which links scores across the distributions of the two tests). The resulting data were used to
generate CPIs for students, schools, and districts.
This report details the methods used to identify representative samples for MCAS and PARCC test-takers
and the methods used to estimate MCAS scores for PARCC examinees, and presents outcomes from
both analyses to show how well each study worked. Guidance for using the representative samples is
also provided.
Background and Purpose
Massachusetts has administered its Massachusetts Comprehensive Assessment System (MCAS) tests in
English language arts and mathematics every year since 1998. In 2010 it joined the PARCC consortium to
develop new tests aimed at measuring college and career readiness. In 2013–14 Massachusetts
participated in PARCC field testing, and in 2014–15 Massachusetts continued its trial of the PARCC test
for a second year while continuing to administer the MCAS.
For the spring 2015 test administration, Massachusetts public school districts serving grades 3 to 8 were
offered the option to administer either the MCAS or PARCC tests in English language arts and
mathematics.1 Because districts were not assigned randomly to take PARCC or MCAS, the groups of
students who took MCAS were likely to be systematically different (i.e., higher- or lower-performing or
having different demographic characteristics) than those who took PARCC. When samples systematically differ, it interferes with the ability to observe whether changes in state-level student achievement from one year to the next are due to actual changes in performance or to differences in the samples (or both), and simply combining results from the two assessments will not produce an accurate picture of statewide performance.

1 The state's three largest districts (Boston, Worcester, and Springfield) were offered the opportunity to choose PARCC or MCAS school by school rather than district-wide. All districts that selected PARCC had the option of administering the test online or on paper (i.e., choice of mode).
To address this issue, the State developed a methodology to identify samples of 2015 MCAS and PARCC
test-takers that were representative of all students in the state. These students’ performance would be
used to determine how MCAS and PARCC results compared and could be linked. The purposes for doing
so were
• to report state-level results for 2015, including results from both MCAS and PARCC test-takers;
• to maintain trends for MCAS results relative to prior years;
• to calculate student growth percentiles (SGPs) for MCAS and PARCC test-takers; and
• to calculate accountability levels for all districts and schools; PARCC accountability levels are calculated using concordance tables that identify associated MCAS score estimates for a range of PARCC scores.
Part 1 of this report explains in further detail the need for representative samples, and describes the
methodology the Department used to select them. Part 2 of the report explains the process for
generating the concordance tables linking PARCC results to MCAS, and provides guidance about how to
interpret and use assessment data from the 2015 school year.
PART 1: SELECTING REPRESENTATIVE SAMPLES
The Need for Representative Samples
As expected, the students taking MCAS and PARCC were not equivalent, differing on prior performance and student demographic variables. In terms of numbers, although about 50% of the districts participated in each test,2 the number of PARCC test-takers was slightly higher. Table 1 compares the numbers of districts and grades 3–8 students that participated in PARCC and MCAS. The full list of district choices for the 2015 assessments is available on the State PARCC website, in the Excel file ("list by district").
Table 1: District Assessment Choices for 2015
Assessment Choices for Spring 2015*

Assessment   # of Districts   % of Districts   # of Students   % of Students
MCAS         230              55%              197480          47%
PARCC        192              45%              225572          53%
Total        422              100%             423052          100%
*District counts do not include the three largest districts or any single-school district. Schools in the three largest districts (Boston, Springfield, and Worcester) were assigned either MCAS or PARCC. Of the single-school districts, 188 administered MCAS and 6 administered PARCC.
MCAS and PARCC 2015 test-takers scored similarly on MCAS in 2014, as shown in Table 2. In both
English language arts and mathematics, the percentages scoring at each proficiency level are similar
across the assessments, with the 2015 MCAS test-takers performing slightly higher at the Advanced
level.
2 This estimate does not include single-school districts – 188 out of 194 single-school districts administered MCAS in 2015.
Table 2: 2014 MCAS Results for 2015 MCAS and PARCC Test-Takers
Group Achievement Levels and SGP Differences, Grades 3–8

                                             2014 Average,    2014 MCAS Results   2014 MCAS Results
                                             MCAS & PARCC     of 2015             of 2015
                                             Test-Takers      MCAS-Takers         PARCC-Takers
ELA Achievement Level: Advanced              14.4%            15.1%               13.8%
ELA Achievement Level: Proficient            52.6%            52.8%               52.4%
ELA Achievement Level: Needs Improvement     25.2%            24.3%               26.0%
ELA Achievement Level: Warning               7.8%             7.9%                7.8%
ELA Student Growth Percentile                50.1             50.2                49.9
Total Number ELA                             410811           187465              223346
Math Achievement Level: Advanced             24.7%            25.5%               24.1%
Math Achievement Level: Proficient           33.3%            33.5%               33.3%
Math Achievement Level: Needs Improvement    27.5%            26.7%               28.1%
Math Achievement Level: Warning              14.5%            14.3%               14.6%
Math Student Growth Percentile               50.2             50.6                49.8
Total Number Math                            412005           187704              224301
Table 3 compares MCAS and PARCC test-takers by demographic characteristics. The demographic
differences between the two are somewhat larger than the achievement differences, driven in part by
the decision in some large school districts to administer PARCC. Overall, students with higher needs are
more heavily weighted in the PARCC sample.
Table 3: 2014 Demographics for 2015 MCAS and PARCC Test-Takers
Group Demographic Differences, Across Grades

                                         2014 Overall    2015           2015
                                         Population      MCAS-Takers    PARCC-Takers
Ever ELL                                 14.5%           15.1%          17.0%
High Needs*                              49.1%           47.5%          54.0%
Free/Reduced Lunch**                     39.1%           36.7%          44.8%
Race: AA/Black                           8.3%            5.1%           11.0%
Race: Asian                              6.2%            7.5%           6.0%
Race: Hispanic                           16.4%           17.3%          18.8%
Race: White                              66.0%           67.2%          60.7%
Race: More than One                      3.0%            2.9%           3.1%
Race: Other                              0.3%            0.3%           0.4%
Race: AA/Hispanic                        24.6%           22.5%          29.8%
No Special Needs Services                81.8%           81.1%          81.3%
Minimal Hours Special Needs Services     2.7%            2.7%           2.7%
Low Hours Special Needs Services         3.6%            3.4%           3.5%
Moderate Hours Special Needs Services    9.6%            9.3%           9.2%
High Hours Special Needs Services        2.2%            3.4%           3.3%
Total N                                  442982          202938         240044
*High Needs students belong to at least one of these groups: current/former English Language Learners (ELL), low income, or students with disabilities.
**2014 values, imputed.
Although the demographic differences between MCAS and PARCC test-takers are not great, they are
large enough to call into question whether the two groups can fairly be compared without making an
adjustment for selection bias.
Method to Identify Representative Samples
The process used to identify representative samples involved matching each of the 2015 testing
populations (MCAS test-takers and PARCC test-takers) to the characteristics of the overall 2014 MCAS
population using student-level data. (The Department chose 2014 as the target population because
2014 was the last year for which the state has statewide results on a single assessment: MCAS). By
removing from each 2015 sample those test-takers who were most dissimilar to the 2014 test-takers,
the Department was able to create two 2015 samples that are well-matched to the 2014 student
population. By definition, the two 2015 samples are also roughly equivalent. This matching process is
represented visually in the logic model in Figure 1.
Figure 1: Logic Model for the Sample-Matching Study
The methodology for selecting representative samples is a variation of propensity score matching, a
statistical technique commonly used to estimate the impact of a treatment when participants are not
randomly assigned to it (Angrist & Pischke, 2009; Austin, 2011; Murnane & Willett, 2011; Rosenbaum,
2010). The situation here is not precisely analogous, as the self-selection into the MCAS or PARCC test is
determined by districts, not by student characteristics. But the principle applies nonetheless: we can
identify a representative sample of students who are similar to one another in all measurable ways
except the assignment of taking MCAS or PARCC. We can then use these representative groups to
estimate state findings.
The propensity score matching conducted in this analysis used prior MCAS results and student demographic variables to match test-takers in each sample (MCAS and PARCC) in the current year to the population of test-takers in the prior year. (It should be noted that prior MCAS results were emphasized in the analysis, resulting in better balance on prior achievement than on demographic variables, although it will be shown that the method worked to create better balance on both sets of variables.) The method worked by removing test-takers who were least like the prior year's population of test-takers, creating two representative samples composed of test-takers more like the prior year's population of students.
Results using this methodology were evaluated in a “proof-of-concept study” that applied the method
to draw representative samples in 2014 that were equivalent to the population of examinees in 2013. If
the method worked well, then we would expect to get identical results for analyses conducted in 2014,
which we did. The four critical checks conducted and the results were
1) The prior achievement and key demographic variables looked similar across the samples and
were similar to the prior year’s data (2013).
2) The MCAS cut scores (i.e., the raw scores that correspond with the MCAS achievement levels of
“220, Needs Improvement,” “240, Proficient,” and “260, Advanced”) were replicated for the
representative sample of examinees assigned to MCAS in 2014.3
3 Each year, the current year's MCAS results are linked to the prior year's results using a method called "equating." The equating method identifies the raw scores for each MCAS achievement level (e.g., 220 is Needs Improvement) that yield consistent measurements from the year prior. In other words, the equating method establishes consistency in the MCAS measurement scales.
3) The student growth percentiles (SGPs) had a uniform (flat) distribution with a median at or near 50 (see footnote 4). The majority of SGPs generated using the representative samples were the same as or very close to the actual SGPs.
4) School- and district-level accountability results were nearly equivalent to what was reported in
2014 for both samples.
The proof-of-concept study provided evidence that the methodology worked well. Consequently, the
State should be able to use the representative samples as the data source for psychometric and
analytical work and still obtain the same results as it would have if it had used the full sample. A full
presentation of the evidence from the proof-of-concept study is presented in Appendix A.
The proof-of-concept study also allowed the State to establish the methodology for selecting the
samples prior to the generation of 2015 assessment data, to avoid any concern that the State might
select a sampling strategy that would advantage students who took one or the other assessment.
Using a slightly refined methodology, the same analysis used in the proof-of-concept study was
conducted for 2015 to select representative samples of MCAS and PARCC test-takers from the 2015
administration, measuring their representativeness by the characteristics of the state in 2014. Further
details on the matching methodology are provided in Appendix B.
Results from the Representative Sample Study
The number of overall test-takers and the number of students selected for each representative sample
are shown in Table 4.
Table 4: PARCC and MCAS Samples for 2015
PARCC and MCAS Samples, 2015

            Total      MCAS Rep.   MCAS %      Total      PARCC Rep.   PARCC %
            MCAS       Sample      Removed     PARCC      Sample       Removed
Grade 3     33251      25086       25%         39534      29704        25%
Grade 4     33205      25324       24%         39114      30026        23%
Grade 5     33962      26058       23%         39828      30416        24%
Grade 6     33978      25357       25%         40284      30198        25%
Grade 7     33579      26154       22%         40327      30624        24%
Grade 8     34963      26252       25%         40957      31209        24%
Total       202938     154231      24%         240044     182177       24%
Approximately 75 percent of test-takers were retained in each representative sample. Retaining a large
N was important to minimize error, particularly for down-the-line calculations such as student growth
percentiles that depend on a large amount of student data to be estimated accurately.
4 Student growth percentiles, by definition, have a flat, uniform distribution with a median of 50 and a roughly equal percentage of students in each percentile, from 1 to 99.
Looking first at how well the representative samples are matched to the population in 2014 and to each
other, Tables 5 and 6 demonstrate that the MCAS and PARCC samples are well-matched to the state on
students’ prior performance and demographic characteristics. As shown in Table 5, the MCAS sample is
nearly identical on prior performance to MCAS test-takers as a whole, but the PARCC representative
sample selects disproportionately from higher-performing PARCC test-takers to make the sample more
similar to the state.
Table 5: Comparison of Achievement Outcomes for MCAS and PARCC Test-Takers, by Grade and Sample, to 2014 MCAS Population
Comparison of Achievement Outcomes for 2015 Test-Takers, by Grade and Sample, to 2014 Population

                            2014 MCAS     All 2015       All 2015       2015 Rep.      2015 Rep.
                            Population    MCAS           PARCC          Sample         Sample
                            Average       Test-Takers    Test-Takers    MCAS           PARCC
Gr. 3*                      53%           54%            51%            53%            53%
Gr. 4–8**                   50%           51%            49%            51%            51%
Average Ach. Gr. 4–8***     58%           58%            55%            56%            56%
*2014 Achievement Outcome Grade 3: Estimated percent scoring Proficient+ on MCAS ELA & Math, by school and
demographic group
**2014 Achievement Outcome Grade 4–8: Percent scoring Proficient+ on MCAS ELA & Math
***Average percent of examinees scoring Proficient+ on 2014 MCAS ELA and Math, separately
As shown in Table 6, the MCAS and PARCC representative samples are fairly equivalent across most
demographic comparisons. The largest differences are identified in the Black/African American and High
Needs categories, again likely stemming from the choice of some large school districts to administer
PARCC. The representative samples do balance this difference somewhat, but the PARCC representative
sample still has slightly higher percentages of test-takers in these categories (along with fewer White
students) than the 2014 Population and the 2015 representative sample for MCAS. In addition, the
PARCC sample has slightly more examinees who were English language learners or who received free or reduced-price lunch in 2014.
Table 6: Comparison of Demographics for 2015 MCAS and PARCC Test-Takers to 2014 Population of Examinees
Comparison of 2015 Demographics to 2014 Examinee Population

Demographic                       2014          All 2015       All 2015        2015 Rep.      2015 Rep.
                                  Population    MCAS-Takers    PARCC-Takers    Sample MCAS    Sample PARCC
Ever ELL                          14.7%         15.1%          17.0%           14.2%          16.7%
High Needs*                       47.2%         47.5%          54.0%           46.0%          47.9%
Free Lunch (2014, imp.)**         38.0%         36.7%          44.8%           35.6%          39.7%
Race: Black/African American      8.5%          5.1%           11.0%           5.7%           10.8%
Race: Asian                       5.8%          7.0%           5.9%            7.0%           6.4%
Race: Hispanic                    15.3%         17.3%          18.8%           16.0%          16.0%
Race: White                       67.7%         67.2%          60.7%           67.9%          63.6%
Race: Other                       0.3%          0.3%           0.4%            0.3%           0.4%
Special Education                 16.9%         17.7%          17.6%           17.2%          15.8%
*Students in the High Needs category belong to any of these groups: special education, low-income, and ELL or
ever-ELL students
**Free lunch values were estimated for students with missing values
Student growth percentiles (SGPs) generated for 2015 MCAS and PARCC (provided in Table 7) show a
median at or near 50 in all grades for the representative samples, while there is a greater departure
from 50 for examinees not included in the representative samples. Across all test-takers in the state,
SGPs hover at or near a median of 50, as expected.
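As a rough illustration of how this check can be reproduced from a student-level file, the sketch below computes median SGPs by grade separately for students inside and outside the representative sample. It is a minimal sketch in Python; the file name and the column names (grade, sgp_ela, rep_sample_flag) are hypothetical placeholders, not the actual layout of the Department's research files.

    import pandas as pd

    # Hypothetical file and column names; substitute the research-file layout.
    df = pd.read_csv("parcc_2015_student_file.csv")

    # Label each student by representative-sample membership, then take the
    # median ELA SGP (and the count) for each group and grade.
    medians = (
        df.assign(group=df["rep_sample_flag"].map({1: "Selected", 0: "UnSelected"}))
          .groupby(["group", "grade"])["sgp_ela"]
          .agg(median="median", n="count")
    )
    print(medians)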
Table 7: Statewide ELA SGPs for 2015, by Sample
Comparison of Median Student Growth Percentiles, by Testing Program and Sample

PARCC Sample
                         ELA                    Math
Group         Grade      Median   Number        Median   Number
Selected      04         50       26321         50       26289
              05         50       27052         50       27196
              06         50       26625         50       26656
              07         50       26166         50       26156
              08         50       27127         50       27070
UnSelected    04         52       8126          49       8049
              05         48       7566          50       7621
              06         49       8205          50       8166
              07         49       8095          48       8128
              08         47       7835          50       7944
Total         04         50       34447         50       34338
              05         50       34618         50       34817
              06         50       34830         50       34822
              07         50       34261         50       34284
              08         50       34962         50       35014

MCAS Sample
                         ELA                    Math
Group         Grade      Median   Number        Median   Number
Selected      04         50       22176         49       22283
              05         50       23451         50       23621
              06         50       22640         50       22597
              07         50       23180         51       23222
              08         50       23267         50       23297
UnSelected    04         53       7441          50       7481
              05         46       6945          51       7055
              06         50       7742          52       7800
              07         50       6932          48       6914
              08         50       7780          50       7811
Total         04         50       29617         49       29764
              05         49       30396         50       30676
              06         50       30382         50       30397
              07         50       30112         50       30136
              08         50       31047         50       31108
PART 2: CONCORDANCE TABLES AND GUIDANCE FOR USE OF DATA
Concordance Tables Comparing MCAS to PARCC Results
Introduction
This section of Part 2 describes the methods and outcomes for the MCAS and PARCC concordance
studies. The selection of representative samples enabled greater accuracy when comparing MCAS to
PARCC outcomes, and also allowed the State to calculate achievement trends for the 2015 test
administrations. The concordance work allowed the State to directly compare PARCC to MCAS results
using an equipercentile approach and concordance tables.
Methods for Generating Concordance Tables Comparing MCAS to PARCC Results
The representative samples were used to generate concordance tables that estimate MCAS scores
based on PARCC test-takers’ results. The concordance tables serve two primary purposes:
1. to provide a better understanding of the relationships between the new PARCC test scores and
MCAS scores
2. to enable use of PARCC results in the State’s accountability formulas, which involve four-year
trends
The equipercentile method, which identifies comparable test scores across two different tests using
student achievement percentiles generated from each set of test results,5 was used to generate the
concordance tables. The equipercentile method is appropriate because a) the two tests measure similar
educational standards (the PARCC assessments measure the Common Core State Standards, and the
Massachusetts Curriculum Frameworks are based on the Common Core State Standards6), and b) the
representative samples drawn from the prior year’s population appear to satisfy the single-subject
requirement (see “single-subject requirement,” Kolen & Brennan, 2004, pp. 293–294).7 Additionally, the
equipercentile method for estimating MCAS scores from PARCC scores also works under the given
conditions of a) non-linear relationships between test scales, b) differences in test difficulty, and c) the
need to have accurate estimated MCAS scores across the PARCC scaled-score continuum (Kolen &
Brennan, 2004, p. 294).
With the equipercentile method, test-takers in the representative sample for each test are first ranked from lowest to highest, with scores matched to percentiles.8 The graphs in Figure 2 show the distribution of MCAS and PARCC scaled scores by percentile. For each test, as the percentile increases, the scaled score increases in a logistic manner (i.e., in a non-linear fashion that is bounded by 0 and 1, or in this case by 0 and 100).

5 Student achievement percentiles are synonymous with student score rankings. To generate the percentiles in this application, results on either test were ranked from low to high and assigned a percentile from 0 to 100 (this range was used to provide more differentiation at the ends of the score distribution on MCAS).

6 See: http://www.doe.mass.edu/candi/commoncore/, document #1.

7 The single-subject requirement stipulates that a single group of test-takers sit for both exams. Since almost all Massachusetts test-takers took MCAS in 2014, the State, using the representative samples approach, identified 2015 test-takers in both the MCAS and PARCC groups that were nearly equivalent to the population of examinees in 2014. As illustrated in Figure 1, once the representative samples are found to approximate the comparison population, they are also considered approximately equivalent to each other.

8 Percentiles were calculated on student ability measures (thetas), which underlie each of the scaled scores. Percentile buckets ranged from 0 to 100, with each bucket identifying one percentile. "0" and "100" were retained in the analysis to provide additional differentiation of student achievement at the ends (bottom and top) of the distribution.
Figure 2: Distribution of Grade 4 ELA and Math Scaled Scores by Percentile for MCAS and PARCC
This example illustrates two issues. First, the MCAS scale did not stretch out across all of the percentiles
computed, so gaps are noted in Graph 1. To address these gaps, MCAS scores were repeated across
percentiles so there was an estimated MCAS score for every percentile. Second, the scale did not stretch
completely from the beginning to the end of the distribution for some grades and subjects (in grade 4 ELA, for example, the MCAS score that mapped to the "0" percentile is 206 and not 200, as shown in Table 8).
The MCAS estimates for each percentile were then mapped onto the PARCC percentiles so that every student with a valid PARCC score also received an estimated MCAS score. Table 8 shows a portion of the concordance table for grade 4. Estimated MCAS scores by PARCC scaled scores are shown for percentiles that range from 0 to 10 and from 51 to 61. Looking at the first row of results, PARCC fourth-grade test-takers who had a valid scaled score from 650 to 668 on the ELA exam received an estimated MCAS score of 206. In Math, fourth-grade test-takers with a valid PARCC score that ranged from 650 to 676 received an estimated MCAS scaled score of 206. Toward the middle of the PARCC scale, fourth-graders who received a PARCC ELA scaled score that ranged from 756 to 757 received an estimated MCAS scaled score of 240. The full set of concordance tables, by grade, is published in the "Spring 2015 Grades 3–8 MCAS and PARCC Concordance Tables."
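To make the mapping concrete, the sketch below implements a bare-bones version of the equipercentile lookup described above, using plain NumPy rather than the equate package the Department used; it works on scaled scores directly rather than on the underlying thetas, so it is an illustrative approximation, not the Department's procedure.

    import numpy as np

    def concordance_points(parcc_scores, mcas_scores, n_points=101):
        """For each percentile 0-100, find the PARCC and MCAS scores at that
        rank in the two representative samples."""
        pct = np.linspace(0, 100, n_points)
        parcc_at_pct = np.percentile(parcc_scores, pct)
        mcas_at_pct = np.percentile(mcas_scores, pct)
        return pct, parcc_at_pct, mcas_at_pct

    def estimated_mcas_score(parcc_score, parcc_at_pct, mcas_at_pct):
        """Locate the student's percentile on the PARCC distribution and read
        off the MCAS score at that same percentile."""
        idx = np.searchsorted(parcc_at_pct, parcc_score, side="right") - 1
        idx = int(np.clip(idx, 0, len(mcas_at_pct) - 1))
        return mcas_at_pct[idx]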
Table 8: Segment of Concordance Table for PARCC and MCAS, Grade 4, ELA and Math
Concordance: MCAS and PARCC by Percentile, Grade 4

Percentile   PARCC ELA SS   MCAS ELA SS   PARCC Math SS   MCAS Math SS
0            650 to 668     206           650 to 676      206
1            668 to 681     208           676 to 685      210
2            681 to 689     210           685 to 690      214
3            689 to 694     214           690 to 694      214
4            694 to 698     214           694 to 697      214
5            698 to 701     214           697 to 699      216
6            701 to 704     216           699 to 701      216
7            704 to 707     216           701 to 704      216
8            707 to 709     216           704 to 705      218
9            709 to 711     218           705 to 707      218
10           711 to 713     218           707 to 709      218

Percentile   PARCC ELA SS   MCAS ELA SS   PARCC Math SS   MCAS Math SS
51           756 to 757     240           748 to 748      238
52           757 to 758     240           748 to 749      238
53           758 to 758     240           749 to 750      238
54           759 to 759     240           750 to 751      240
55           759 to 760     240           751 to 752      240
56           760 to 761     242           752 to 753      240
57           761 to 762     242           753 to 753      242
58           762 to 763     242           753 to 754      242
59           763 to 763     242           754 to 755      242
60           763 to 764     242           755 to 756      242
61           764 to 765     244           756 to 757      244
The graphs in Figure 3 display the relationship between PARCC and MCAS scaled scores for ELA and
Math at grade 4. The graphs show the gradual increase of MCAS scores as PARCC scores increase, as
well as the range of PARCC scores associated with each concordant MCAS score.
Figure 3: Relationship Between MCAS and PARCC Scaled Scores for Grade 4 ELA and Math
A similar equipercentile method, using the SGP and equate packages for the R statistical platform (Albano, 2014; Betebenner, 2015), was applied for eighth-graders taking the PARCC Algebra I test. This methodology better adjusted the MCAS Math score estimates for the higher-achieving eighth-graders taking Algebra I by accounting for students' prior achievement. The resulting concordance table for the Algebra I test features ranges of estimated scores for both MCAS and PARCC, as shown by the segment provided in Table 9.
Table 9: Segment of PARCC to MCAS Concordance Table for Algebra 1, Grade 8
Concordance: MCAS and PARCC by Percentile, Algebra I

Percentile   PARCC Math SS   MCAS Math SS
0            677 to 694      214 to 218
1            695 to 706      218 to 220
2            706 to 714      218 to 220
3            714 to 720      220 to 222
4            720 to 723      222 to 222
5            723 to 726      222 to 224
6            726 to 727      224 to 224
7            727 to 731      224 to 226
8            731 to 733      226 to 228
9            733 to 735      228 to 228
10           735 to 737      228 to 230

Percentile   PARCC Math SS   MCAS Math SS
51           772 to 773      252 to 254
52           773 to 774      254 to 254
53           774 to 775      254 to 254
54           775 to 775      254 to 254
55           775 to 776      254 to 254
56           776 to 777      254 to 256
57           777 to 777      256 to 256
58           777 to 778      256 to 256
59           778 to 779      256 to 256
60           779 to 780      256 to 256
61           780 to 780      256 to 256
Composite Performance Index (CPI) Results for PARCC Schools and Districts
Estimated MCAS scores from the concordance study were used to generate Composite Performance
Indices (CPIs) for school- and district-level accountability purposes. Although schools and districts taking
PARCC for the first time in 2015 were “held harmless” from negative accountability decisions, the CPIs
were reported and are part of the historical record.
CPIs were generated from the estimated MCAS scores using the standard formula, as shown in the first
column of Table 10. Table 10 provides the range of PARCC scores associated with each CPI level in ELA
and Math for grades 3 through 8.
Table 10: PARCC Values for CPIs
2015 PARCC Composite Performance Index (CPI) Concordance Table

PARCC scaled scores corresponding to each CPI level (CPI points per student, with the associated MCAS scaled-score band in parentheses):

                   100           75            50            25            0
                   (240-280)     (230-238)     (220-228)     (210-218)     (200-208)
Grade 3 ELA        745-850       720-745       691-720       668-691       650-668
Grade 3 Math       735-850       724-735       708-724       667-708       650-667
Grade 4 ELA        754-850       737-754       717-737       681-717       650-681
Grade 4 Math       750-850       729-750       709-729       676-709       650-676
Grade 5 ELA        743-850       725-743       711-725       677-711       650-677
Grade 5 Math       740-850       728-740       712-728       686-712       650-686
Grade 6 ELA        741-850       726-741       713-726       674-713       650-674
Grade 6 Math       741-850       725-741       701-725       662-700       650-662
Grade 7 ELA        746-850       737-746       726-737       692-723       650-692
Grade 7 Math       746-850       737-746       726-737       692-723       650-692
Grade 8 ELA        727-850       712-727       695-712       662-695       650-662
Grade 8 Math       743-850       729-743       712-729       667-712       650-667
Grade 8 Alg 01     749-850       736-748       705-735       672-704       659-672
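The CPI itself is an average of per-student points, where each student earns 100, 75, 50, 25, or 0 points according to the scaled-score band shown in the column headers of Table 10. A minimal sketch of that calculation, applied to estimated MCAS scores, follows; it assumes only the standard point bands above.

    def cpi_points(estimated_mcas_score):
        # Points per student, from the MCAS scaled-score bands in Table 10.
        if estimated_mcas_score >= 240:
            return 100
        if estimated_mcas_score >= 230:
            return 75
        if estimated_mcas_score >= 220:
            return 50
        if estimated_mcas_score >= 210:
            return 25
        return 0

    def composite_performance_index(estimated_scores):
        # School- or district-level CPI: the mean of the per-student points.
        return sum(cpi_points(s) for s in estimated_scores) / len(estimated_scores)

    # Example: four students -> (100 + 75 + 25 + 0) / 4 = 50.0
    print(composite_performance_index([244, 232, 218, 206]))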
The average CPIs were compared by grade level, test, and testing group. Comparisons by CPIs across
groups (representative sample vs. total sample), by grade, are provided in Tables 11 and 12. In all cases
except for one, the CPI differences across the MCAS and PARCC examinees are smaller for the
representative sample group than for the total sample. Differences for all representative sample groups
are “1” or less, indicating that the CPIs are fairly comparable across the tests.
Table 11: Comparison of CPIs by Test and Testing Group, ELA
Comparison of CPIs, by Test and Testing Group, ELA*

                                 PARCC                          MCAS                           PARCC - MCAS
Group            Grade   Median   Mean    Valid N     Median   Mean    Valid N     Difference   ES*
Representative   03      100      82.49   28075       100      83.48   24104       -0.99        -0.04
Sample           04      100      77.63   27859       100      78.60   23908       -0.97        -0.04
                 05      100      87.09   28504       100      87.35   24810       -0.26        -0.01
                 06      100      86.63   28006       100      86.70   23968       -0.07         0.00
                 07      100      86.28   28005       100      87.14   24672       -0.86        -0.04
                 08      100      92.17   28647       100      91.52   24481        0.65         0.03
Total            03      100      81.25   36901       100      83.30   32297       -2.06        -0.08
                 04      100      78.91   36431       100      81.03   32256       -2.12        -0.08
                 05      100      84.83   37105       100      85.95   32915       -1.12        -0.05
                 06      100      85.28   37204       100      86.27   32806       -0.99        -0.04
                 07      100      86.29   36909       100      88.17   32371       -1.88        -0.08
                 08      100      91.09   37368       100      90.97   33116        0.12         0.01
*ES = Effect Size
Table 12: Comparison of CPIs by Test and Testing Group, Math
Comparison of CPIs, by Test and Testing Group, Math**

                                  PARCC                          MCAS                           PARCC - MCAS
Group            Grade    Median   Mean    Valid N     Median   Mean    Valid N     Difference   ES**
Representative   03       100      85.28   28089       100      85.45   24104       -0.17        -0.01
Sample           04       75       76.96   27880       75       77.32   23908       -0.36        -0.01
                 05       100      83.18   28466       100      83.71   24810       -0.53        -0.02
                 06       100      81.19   27995       100      81.57   23968       -0.38        -0.01
                 07       100      72.30   27754       100      73.18   24672       -0.88        -0.03
                 08       100      78.05   24956       100      78.85   24481       -0.80        -0.03
                 Alg. 1*  100      92.43   3558
Total            03       100      84.22   36942       100      85.43   32297       -1.21        -0.05
                 04       75       78.32   36461       100      79.90   32256       -1.58        -0.06
                 05       100      79.53   37085       100      81.42   32915       -1.88        -0.07
                 06       100      79.10   37194       100      81.11   32806       -2.01        -0.07
                 07       100      72.48   36593       100      75.82   32371       -3.33        -0.11
                 08       100      75.52   32984       100      78.40   33116       -2.88        -0.10
                 Alg. 1*  100      91.54   4264
*Algebra 1 taken by eighth-graders.
**ES = Effect Size
The last column in both Table 11 and Table 12 shows the effect size of the differences between the two groups. Effect sizes indicate the "standardized mean difference" between two groups. Basic rules of thumb indicate that effect sizes of 0.2 or less indicate small differences, and effect sizes near 0 indicate almost no difference (Becker, 2000). It should be noted that the small differences observed tended to favor the MCAS examinee group (with the MCAS group showing slightly higher achievement than the PARCC group).
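For readers who want to reproduce this kind of check, the sketch below computes a standardized mean difference as the difference in group means divided by a pooled standard deviation. This is one common definition of the effect size; whether it matches the exact variant used to produce Tables 11 and 12 is an assumption.

    import numpy as np

    def standardized_mean_difference(group_a, group_b):
        """Difference in means divided by the pooled standard deviation."""
        a = np.asarray(group_a, dtype=float)
        b = np.asarray(group_b, dtype=float)
        pooled_var = (((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                      / (len(a) + len(b) - 2))
        return (a.mean() - b.mean()) / np.sqrt(pooled_var)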
Guidance for Using Representative Samples and Concordance Tables
Concordance Tables
Locations
Estimated MCAS results that correspond with PARCC scores are available in the concordance tables and are linked to PARCC results in several datasets, as shown in Table 13. Because CPIs for PARCC test-takers, schools, and districts are calculated using concordant MCAS scores, CPIs also provide information about MCAS concordance with PARCC.
Table 13: Datasets Containing PARCC to MCAS Concordance Results

Dataset Name: Research Files (student level)
  Description: De-identified student-level files
  Location: Request access: www.doe.mass.edu/infoservices/research

Dataset Name: School and District PARCC Results
  Description: Full set of PARCC data with MCAS concordance results and CPIs included
  Location: School/District Dropboxes

Dataset Name: MCAS Achievement Distribution and Growth Reports
  Description: A collection of reports that provide results by MCAS performance levels and by CPIs based on MCAS levels
  Location: Edwin Analytics, see: PE334, PE434, PE305, PE405, among others
Cautions
There are several things to keep in mind when using concordance scores. First, because test-takers took
only one of the tests (PARCC or MCAS), the concordant results approximate, but are not exactly the
same as, the results test-takers would have gotten if the alternative test had been administered. Users
are cautioned against making consequential decisions based on a single test score, a single two-year
comparison, or a single analysis, particularly when estimated scores are being used. Second, due to the requirements for conducting concordance studies (described above), the concordance results are specifically applicable to 2015. A refinement of this approach will be used to generate concordance tables for 2016.9 It is anticipated that while the 2015 concordance tables apply primarily to 2015, the 2016 tables will be applicable to both 2015 and 2016. Analyses that apply concordance tables to non-designated years should be treated with strong caution. Third, concordance results for PARCC were identified without consideration of the mode of administration (paper or online); therefore, no adjustments were made for any differences that may be attributable to mode.
Conducting Analyses at the State Level with Representative Samples
In 2015, datasets and data reports with state-level results provide information on the representative
samples either by reporting state-wide results only for representative samples (as is done in Edwin
Analytics), or by providing a “representative samples flag” (a column of “1s” that denote the cases
[students] that belong to the representative samples for each test). The representative samples are useful for comparing state-level results from 2015, either in their entirety or disaggregated by the demographic groups studied in this paper, and for comparing to state-level results in prior or subsequent years. Data users conducting their own analyses are encouraged to apply the representative samples flags, which will be available in all statewide data sets, when using 2015 state-level results in analyses.

9 Refined concordance tables for 2016 will be published on the Department's website by summer 2016.
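For example, a data user working with a student-level research file could restrict a state-level analysis to the representative sample by keeping only the flagged rows. The sketch below assumes hypothetical file and column names (rep_sample_flag, grade, ela_scaled_score); the actual variable names are defined in the research-file documentation.

    import pandas as pd

    df = pd.read_csv("mcas_2015_research_file.csv")  # hypothetical file name

    # Keep only students flagged as members of the 2015 representative sample.
    rep = df[df["rep_sample_flag"] == 1]

    # A state-level summary restricted to the representative sample.
    print(rep.groupby("grade")["ela_scaled_score"].describe())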
As shown in Figure 4, when representative samples are reported for state-level results, a superscript (¹)
in the report links to a footnoted description of how those representative samples are used.
Figure 4: State-Level Results Based on Representative Samples in 2015
Conducting Analyses that are Not State-Level
Representative samples are not applicable to smaller units of analysis (e.g., analyses at the school- or
district-level) because these samples were identified for state-level use only. In situations where
students within a school or a district took the same test two years in a row, year-to-year comparisons
can be made using typical procedures (e.g., comparing across student demographic groups using the
scaled scores).
Scaled-score comparisons should only be made using scores on the same scale. Therefore, if a school or
district changed from administering the MCAS tests in 2014 to administering the PARCC tests in 2015,
then scaled score comparisons should be made by applying estimated results from the concordance
tables for 2015 and/or the concordance tables for 2016.
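Applying the published tables amounts to a simple range lookup. The sketch below uses a few grade 4 ELA rows from Table 8 as an excerpt; a real analysis would load the full published table rather than this abbreviated list.

    # (lower PARCC SS, upper PARCC SS, estimated MCAS SS) -- excerpt of Table 8
    GRADE4_ELA_SEGMENT = [(650, 668, 206), (668, 681, 208), (681, 689, 210),
                          (756, 757, 240)]

    def estimated_mcas(parcc_ss, table=GRADE4_ELA_SEGMENT):
        """Look up the estimated MCAS scaled score for a PARCC scaled score."""
        for low, high, mcas_ss in table:
            if low <= parcc_ss <= high:
                return mcas_ss
        return None  # score falls outside the rows included in this excerpt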
SGPs and CPIs can also be used in comparing groups or in evaluating trends over time. However, once
again, caution is advised when rendering judgments based on small differences between transitional
2015 SGPs or CPIs and traditional SGPs or CPIs generated from MCAS data.
Data users may wish to take mode into consideration when conducting analyses with PARCC data, based
on potential mode differences (paper versus online) resulting from variations in prior experience with
online testing. For example, users may wish to take caution when comparing PARCC results across
schools with different administration modes, or when comparing year-to-year results that involve
different mode administrations. The Department has identified the 2015 PARCC test mode for each
school and district in the file titled "list by district." In addition, a variable denoting mode is provided in the 2015 research files, which can be requested as described in Table 13.
References
Albano, A. D. (2014). equate: An R package for observed-score linking and equating. Retrieved from https://cran.r-project.org/web/packages/equate/vignettes/equatevignette.pdf

Angrist, J. D., & Pischke, J.-S. (2009). Making regression make sense. In J. D. Angrist & J.-S. Pischke (Eds.), Mostly Harmless Econometrics: An Empiricist's Companion (pp. 80–94). Princeton, NJ: Princeton University Press.

Becker, L. A. (2000). Effect size (ES). Retrieved from http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Lehre/StatIIKrim/EffectSizeBecker.pdf

Betebenner, D. (2012). On the precision of MCAS SGPs. Presentation given to the MCAS Technical Assistance Committee, April 2013.

Betebenner, D. (2015, February 19). SGP: An R package for the calculation and visualization of student growth percentiles & percentile growth trajectories. Retrieved from https://cran.r-project.org/web/packages/SGP/SGP.pdf

Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis, 15, 199–236. Retrieved from http://gking.harvard.edu/files/matchp.pdf

Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2011, June 28). MatchIt: Nonparametric preprocessing for parametric causal inference. Retrieved from http://raptor1.bizlab.mtsu.edu/sdrive/TEFF/Rlib/library/MatchIt/doc/matchit.pdf

Honaker, J., King, G., & Blackwell, M. (2014, November 14). Amelia: Multiple imputation of incomplete multivariate data. Retrieved from https://cran.r-project.org/web/packages/Amelia/Amelia.pdf

Kolen, M. J., & Brennan, R. L. (2004). Test Equating, Scaling, and Linking: Methods and Practices (2nd ed.). New York, NY: Springer Science+Business Media.

Murnane, R. J., & Willett, J. B. (2012). Dealing with selection bias in nonexperimental data. In R. J. Murnane & J. B. Willett (Eds.), Methods Matter: Improving Causal Inference in Educational and Social Science Research (pp. 304–331). New York, NY: Oxford University Press.

Rosenbaum, P. R. (2010). Design of Observational Studies. New York, NY: Springer Science+Business Media.

Rosenthal, R., & Rubin, D. B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166–169.
Appendix A: Proof-of-Concept Study
Results from the proof-of-concept study were examined to evaluate and refine the models. The proof-of-concept study compared 2013–14 representative sample results with 2012–13 population-level results, allowing for a direct examination of how well the matching models worked with respect to four key factors:
1. Balance on prior achievement and key demographic variables between the 2013–14 representative samples and the 2012–13 population; a direct comparison between the MCAS and PARCC groups in 2013–14 was also conducted
2. Replication of 2013–14 MCAS psychometric results in the representative samples (the replication of cut scores was examined)
3. Replication of the 2013–14 student growth percentiles in the representative samples
4. Similarity of school- and district-level accountability results in 2013–14
Counts
In the 1:1 matching of the 2013–14 examinees to the 2012–13 examinees, a designated percentage
(~13–14%) of examinees is removed from the self-selected samples. The removed examinees are more
dissimilar to the prior year’s population of examinees than those selected for the representative
samples. The numbers of examinees included in and removed from the representative samples in the proof-of-concept study, by grade, are shown in Table A1.
Table A1: PARCC and MCAS Samples, Proof-of-Concept Study
PARCC and MCAS Samples, 2014 Proof-of-Concept Study

            Total      MCAS Rep.   MCAS %      Total      PARCC Rep.   PARCC %
Grade       MCAS       Sample      Removed     PARCC      Sample       Removed
3           29686      26119       12%         35579      31120        13%
4           30719      26183       15%         35461      31054        12%
5           30644      26318       14%         35584      31110        13%
6           30041      26708       11%         35931      31194        13%
7           31393      26636       15%         36456      31686        13%
8           31850      26988       15%         36317      31775        13%
Total       184333     158952      14%         215328     187939       13%
Balance
Balance refers to the comparability of groups according to the demographic variables studied. When
two groups are equivalent across a large range of relevant demographic variables, they are said to be
balanced or equivalent (Ho, Imai, King, and Stuart, 2007). The goal of drawing representative samples
was to generate two sets of samples that were each more comparable to the prior year in terms of
student demographics and prior achievement than were their respective overall groups of examinees.
For the two 2014 representative samples, results on balance showed that the matching procedures
resulted in better matching both to the previous year’s population and to each other, in terms of
achievement and demographic variables.
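A balance check of this kind can be summarized as a simple comparison of group means or proportions against the prior-year population. The sketch below is a minimal illustration in Python, assuming 0/1 indicator columns for each demographic variable; the DataFrame structure and column names are placeholders, not the Department's actual data layout.

    import pandas as pd

    def balance_table(sample, population, variables):
        """Mean (or proportion) of each balance variable in the sample versus
        the prior-year population it is intended to represent."""
        rows = {v: {"sample": sample[v].mean(),
                    "population": population[v].mean()} for v in variables}
        out = pd.DataFrame(rows).T
        out["difference"] = out["sample"] - out["population"]
        return out

    # Example call with hypothetical column names:
    # balance_table(rep_sample_2014, population_2013,
    #               ["ever_ell", "high_needs", "free_lunch", "special_education"])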
Table A2 shows the results of matching for the criterion (achievement) variables. Achievement results
did not differ much for examinees in the 2014 unmatched samples; however, the matching procedure
achieved better balance on these variables. The columns in the table show the following achievement
results:
 population average in 2013
 average of all 2014 MCAS examinees
 average of selected (matched) MCAS examinees
 average of all 2014 PARCC examinees
 average of selected PARCC examinees
The first two rows provide comparisons for the grade 3 and the grades 4–8 criterion variables. The third
row provides the average for two achievement variables for grades 4–8. With respect to MCAS
achievement variables, there is very little change achieved from the matching, as the overall MCAS
group results were already close to the mean. For PARCC test-takers, however, very low-scoring
examinees were removed from the PARCC sample, putting it more closely in line with both the MCAS
sample and the 2013 population results. The selected samples are nearly identical to the population and
to each other.
Table A2: Comparison of 2014 Samples to Population Achievement Results, by Group
Comparison to Achievement Outcomes for 2014 Test-Takers, by Grade and Sample to Population

                            2013 Pop.    All 2014 MCAS   2014 MCAS      All 2014 PARCC   2014 PARCC
                            Average      Test-Takers     Rep. Sample    Test-Takers      Rep. Sample
Gr. 3*                      60%          63%             62%            57%              60%
Gr. 4–8**                   51%          52%             50%            50%              51%
Average Ach. Gr. 4–8***     63%          61%             60%            59%              60%
*2013 Achievement Outcome, Grade 3: Estimated percent scoring Proficient+ on MCAS ELA & Math, by school and demographic group
**2013 Achievement Outcome, Grade 4–8: Percent scoring Proficient+ on MCAS ELA & Math
***Average percent of students scoring Proficient+ on 2013 MCAS ELA and Math, separately
Table A3 presents similar population and group outcomes for demographic comparisons. Adjustments
to group demographic proportions are larger for some categories than for others. The largest group
differences are noted on the Race: Black/African American, High Needs, and Free Lunch categories. After
matching, the MCAS and PARCC samples are brought more closely in line to the population proportions,
with the MCAS group having slightly fewer examinees in these categories and the PARCC group having
slightly more.
Table A3: Comparison of 2014 Demographic Characteristics (Proportions), by Group
Comparison of Group Demographics to the 2013 Population of Examinees

                                  2013          All 2014 MCAS   2014 MCAS      All 2014 PARCC   2014 PARCC
                                  Population    Test-Takers     Rep. Sample    Test-Takers      Rep. Sample
Race: Black/African American      8.2%          5.1%            5.7%           11.0%            10.4%
Race: Hispanic                    15.8%         15.6%           16.5%          16.9%            16.3%
Race: White                       67.0%         69.6%           67.6%          63.1%            64.0%
Race: Asian                       5.9%          6.5%            6.8%           5.6%             5.8%
Free Lunch (2013, imp.)*          37.5%         34.4%           36.0%          41.8%            39.6%
High Needs**                      47.5%         45.4%           47.4%          52.1%            49.9%
Special Education                 18.0%         18.1%           19.1%          18.0%            17.8%
Ever-ELL                          14.1%         13.5%           14.0%          15.3%            15.0%
*Free lunch values were estimated for students with missing values.
**Students in the High Needs category belong to any of these groups: special education, low income, and ELL or ever-ELL students.
Replication of 2013–14 Psychometric Results
Measured Progress, the testing contractor for MCAS, evaluated the impact of the representative
samples approach on the cut scores assigned to MCAS for the 2014 test administration. The evaluation
involved re-running the 2014 MCAS results using the representative samples from the proof-of-concept
study. The comparison of the number of raw score points assigned to each cut score is provided in the
last two columns of Table A4. In the table, the "2014 Actual" column provides the number of raw score points actually assigned to each cut score in 2014, and the "Rep. Sample" column indicates the number of raw score points that would be assigned to each cut score using the representative samples.
Comparisons for four test/grade combinations are presented. As shown, the number of raw score points assigned to each cut score using the representative samples matched the actual raw score cuts in 2014 for all four combinations except in one instance (Math Grade 4, W/F to NI), where a difference of one raw score point is noted. Measured Progress psychometricians indicated that this small difference is anticipated with re-analyses and that the results were nearly equivalent.
A second check of the representative samples was conducted by comparing graphs of student results for
the four test/grade combinations studied. The graphs in Figure A1 depict examinees’ expected raw
scores (Expected Total Score) by examinees’ ability measures (Theta) for the 2014 Actual population
results (red dashed line) and the results based on the representative samples (‘2015 Matched’ – blue
line – which denotes the 2014 test characteristic curves based on the representative samples). As is
shown in the graphs, the examinees’ expected results for the 2014 populations of students and the 2014
representative samples are nearly identical in all instances, indicating that the use of the representative
samples yields equivalent results to those generated with the population-level data.
Table A4: Replication of 2014 MCAS Results with Representative Samples
2014 Comparison of MCAS Actual Cut Scores and Cut Scores from Representative Samples

Test            Cut           2014 Actual   Rep. Sample
ELA Grade 3     W/F to NI     23            23
                NI to Prof    37            37
                Prof to Adv   44            44
ELA Grade 7     W/F to NI     30            30
                NI to Prof    47            47
                Prof to Adv   64            64
Math Grade 4    W/F to NI     23            22
                NI to Prof    39            39
                Prof to Adv   48            48
Math Grade 8    W/F to NI     24            24
                NI to Prof    37            37
                Prof to Adv   48            48
Figure A1: Comparison of 2014 Expected MCAS Results by Ability (Theta),
for Four Subject/Grade Combinations
[Figure A1 shows four panels (ELA03, MAT04, ELA07, MAT08), each plotting Expected Total Score against Theta, with the 2014 Actual curve and the 2015 Matched (representative sample) curve overlaid; the two curves are nearly indistinguishable in each panel.]
Replication of 2013–14 Student Growth Percentiles
The 2013–14 representative samples were used to recompute the SGPs for the students designated to
take MCAS, to both evaluate the impact of the samples on the generation of SGPs for 2014–15 and to
ensure that the SGPs could be replicated using the representative samples approach. The graphs in
Figure A2 display differences on MCAS SGPs generated with the representative samples in ELA and
Math. The majority of the recalculated SGPs were between -2 and 2 (94% for ELA and 92% for Math)—
far smaller than the expected standard error for SGPs, which is generally between 5 and 7 (Betebenner,
2013).
Figure A2: Replication of 2013–14 Student Growth Percentiles (SGPs)
Table A5 provides descriptive statistics for the recalculated SGPs, by sample. The anticipated mean and
median SGP across the state was 50. Here we can see that the SGPs for the unselected sample vary
more from the expected median of 50 than the recalculated SGPs for the selected sample. For the
selected sample, the median and mean SGPs for all grades are within one point of 50. The total SGPs are
also within one point of 50. These results confirm that the representative samples can be used to
calculate population-level SGPs for the 2015 test administrations.
Table A5: Descriptive Statistics for Recalculated SGPs, by Sample
Recalculated SGPs for MCAS Examinees, by Sample, Proof-of-Concept Study

                         Recalculated SGP ELA                      Recalculated SGP Math
Sample        Grade   Median   Mean   Min.   Max.   Number      Median   Mean   Min.   Max.   Number
UnSelected    04      55       53     1      99     3613        51       51     1      99     3614
              05      47       48     1      99     3340        50       50     1      99     3377
              06      51       51     1      99     2690        48       48     1      99     2696
              07      51       51     1      99     3754        52       51     1      99     3750
              08      50       50     1      99     3852        50       50     1      99     3872
Selected      04      49       50     1      99     20800       50       50     1      99     20843
              05      50       50     1      99     21073       50       50     1      99     21152
              06      50       50     1      99     21649       50       50     1      99     21669
              07      50       50     1      99     21278       50       50     1      99     21288
              08      50       50     1      99     21213       51       51     1      99     21322
Total         04      49       50     1      99     24413       50       50     1      99     24457
              05      50       50     1      99     24413       50       50     1      99     24529
              06      50       50     1      99     24339       50       50     1      99     24365
              07      50       50     1      99     25032       50       50     1      99     25038
              08      50       50     1      99     25065       51       50     1      99     25194
Replication of 2013–14 Accountability Results
The method for selecting representative samples was evaluated with a final check on how the method
affected accountability results. The 2014 CPI means for each representative sample (MCAS and PARCC)
were compared to those of the examinees excluded from the representative samples. In both sets of
comparisons (in ELA and in Math), mean CPIs for the representative samples matched exactly, while the
mean CPIs for the excluded student samples differed considerably, as shown in Table A6. These results
confirmed the use of the representative samples for calculating the Department’s accountability
measures.
Table A6: Comparison of 2014 CPI Calculations
Comparison of 2014 CPIs for Representative Samples and Excluded Samples, by Test Group

SAMPLE                                             ELA 2014 CPI   Math 2014 CPI
MCAS Representative Sample                         85.4           78.9
Students excluded from the MCAS rep. sample        89.3           83.6
PARCC Representative Sample                        85.4           78.9
Students excluded from the PARCC rep. sample       82.7           74.1
Summary of Results from the Proof-of-Concept Study
The proof-of-concept study examined whether the representative samples approach could generate results that were very close or identical to the actual (population-level) results for the 2014 test administration. In each of the four areas investigated, the approach yielded results for the representative samples that were equivalent or nearly equivalent to the actual results in 2014:
• The comparisons of prior achievement and demographic averages showed that the representative samples were identical to or more similar to the prior year's population (better balanced) than the total samples were, indicating that the matching approach ameliorated differences by testing group (MCAS vs. PARCC).
• The 2014 cut scores on four MCAS tests (i.e., four grade/subject combinations) were replicated using the representative samples approach, indicating that this approach can be used to maintain testing trends in 2015.
• The 2014 SGPs were nearly replicated using the representative samples approach, indicating that SGPs can be generated for 2015 using this method.
• The 2014 accountability results (average CPIs in ELA and Math) were replicated using the representative samples approach, indicating that this approach can be used to generate accountability statistics for 2015.
All results from the proof-of-concept study demonstrated that the representative sample
results more consistently matched results for the testing population than did the overall
sample group results; the representative samples yielded results nearly equivalent to the
actual results in 2014.
Appendix B: Method Used to Select Representative Samples
A variation of propensity score matching was used to identify representative samples. Propensity score matching uses a host of background variables to match individuals across two conditions (e.g., intervention, treatment). The formula shown in Figure B1 computes a "propensity score" that captures the differences among examinees for each condition (e.g., treatment) with respect to the group of variables used to compute it. Typically, propensity scores e(x) are used to match individuals who received an intervention (z) to those who have not, using a set of covariates (X). The propensity score is an estimated probability (p), based on the background variables and the condition (e.g., intervention).
The propensity scores are then used to match individuals across the two conditions (or more than two
conditions).
For our study, there is no intervention: the condition (z) being evaluated is assignment to a testing program, which does not depend on a student's background variables; instead, assignment is determined by the district or school a student attends. Consequently, estimating propensity scores based on testing assignment would yield a poorly fitting model and poor matching of students across conditions.
Figure B1: Typical Propensity Score Equation

e(x) = p(z = 1 | X)

where e(x) is the propensity score, z is the treatment assignment (e.g., test assignment), and X is the vector of covariates (e.g., student demographics).
The Department, in consultation with testing experts from the MCAS Technical Advisory Committee,
matched students taking MCAS to students taking PARCC using prior MCAS results instead of test
assignment (z = prior test results: Not Proficient vs. Proficient/Advanced). Using this approach,
propensity scores, e(x), were generated outside of the matching program, as shown in Figure B2.
Figure B2: Derivation of Propensity Score Equation

e(x) = p(z = 1 | X)

where e(x) is the propensity score, expressed as a probability that can be computed outside of the matching program; z is the prior MCAS result (0 = Not Proficient/Advanced, 1 = Proficient/Advanced); and X is the vector of covariates (e.g., student demographics).
The propensity scores are the probabilities of achieving Proficient or Advanced on the criterion variable, given the examinee's demographics. Propensity scores were generated using a binary logistic regression for each grade and subject.10
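To make this step concrete, the sketch below shows one way such propensity scores could be generated in R. It is a minimal illustration only: the data frames (dat2014, dat2015) and the reduced covariate list are assumptions, with names following the conventions in Appendix C, and it simply mirrors the two-step procedure described in footnote 10 (fit the model on prior-year data, then apply the coefficients to current-year examinees).

# Minimal sketch (assumed data frames dat2014 and dat2015; covariate list abbreviated).
# Step 1: fit the binary logistic regression on the prior year's (population-level) data.
fit_2014 <- glm(
  eperf2014_imp ~ race_B + race_H + race_A + race_W + race_M +
    ever_ell + highneeds + freelunch2014_imp + yrsinmass_imp,
  family = binomial(),
  data   = dat2014
)

# Step 2: apply the fitted coefficients to the current year's examinees to obtain
# propensity scores e(x) = p(z = 1 | X).
dat2015$pscore <- predict(fit_2014, newdata = dat2015, type = "response")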
Two primary matching models were used that differed largely by the achievement variable (criterion
variable) used for the matching. These two models are described below.
1. The main model, which identified representative samples in grades 4–8, used the prior MCAS results in ELA (0 = W/NI, 1 = Prof/Adv) as the criterion and included the prior MCAS results in Math (0 = W/NI, 1 = Prof/Adv) as one of the covariates.
2. Since grade 3 does not have prior achievement results, the criterion variable used to match
examinees was a dichotomized (0, 1) average score (based on the results for prior students in
that school, grade, and that student’s demographic group) that indicated whether or not
students scored Proficient or Advanced on both ELA and Math for the prior year (2014). For
example, a grade 3 student in 2015 was matched to another grade 3 student in 2014 in that
same school and grade according to the average MCAS results for that group (by race and
whether the examinee was enrolled in special education or not).
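As a concrete illustration of the grade 3 criterion described in item 2 above, the sketch below computes a group-average prior-year proficiency rate by school and demographic group and then dichotomizes it. The data frame and column names are assumptions made for the example, as is the 0.5 cut used to dichotomize the average; the report does not specify the exact cut point.

# Sketch only: dat_g3 is an assumed data frame of grade 3 examinees with a 2014
# indicator (prof_adv_2014) of scoring Proficient/Advanced on both ELA and Math for
# students in the same school and demographic group.
dat_g3$grp_mean <- ave(dat_g3$prof_adv_2014,
                       dat_g3$school, dat_g3$demo_group,
                       FUN = function(x) mean(x, na.rm = TRUE))

# Dichotomize the group average to serve as the 0/1 matching criterion (0.5 cut assumed).
dat_g3$criterion <- as.integer(dat_g3$grp_mean >= 0.5)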
Model variables and coefficients are provided in Appendix C. Model fits for the grade-level models are shown in Table B1; for all three fit statistics, higher values indicate better fit.
• The column labeled "% Corr" shows how accurately the model classified students according to the criterion variable used (MCAS results, proficient or not proficient, in 2014). This statistic ranges from 0 to 100%, and the fits shown indicate moderately strong model fit.
• The data in the columns labeled "C & S" (Cox and Snell) and "Nagel." (Nagelkerke) are two additional model fit statistics that describe the proportions of variance (score spread) explained by the models, with higher proportions indicating better fit. Nagelkerke ranges from 0 to 1 and Cox and Snell from 0 to a limit below 1. The model fits on these metrics indicate moderate fit for most of the models.11

10 First, a binary logistic regression was run using the prior year's data (the prior year provides population-level results). Next, the population-level coefficients for each variable used in the model were applied to the current year's data to yield population-level results for the current year; in this case, coefficients generated using data from 2014 were applied to the 2015 model.
Table B1: Model Fits for Matching Models
Model Fit: Logistic Regression

Grade | % Corr | C & S | Nagel.
Grade 3 | 83.8 | .453 | .604
Grade 4 | 78.3 | .352 | .471
Grade 5 | 80.1 | .345 | .475
Grade 6 | 82.8 | .370 | .522
Grade 7 | 82.8 | .348 | .507
Grade 8 | 85.5 | .308 | .488
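For readers who wish to reproduce the statistics reported in Table B1, the sketch below shows how the percentage correctly classified and the Cox and Snell and Nagelkerke values can be computed from a fitted binary logistic regression in R. The object fit_2014 is the assumed model from the earlier sketch, and the 0.5 classification cutoff is an assumption.

# Sketch only: fit statistics from a fitted glm object (assumed object fit_2014).
p_hat <- predict(fit_2014, type = "response")
y     <- fit_2014$y                      # observed 0/1 criterion used in the fit
n     <- length(y)

# Percentage correctly classified ("% Corr"), using an assumed 0.5 cutoff
pct_correct <- 100 * mean((p_hat >= 0.5) == y)

# Cox & Snell ("C & S") and Nagelkerke ("Nagel.") R-squared from the deviances
r2_cs    <- 1 - exp((fit_2014$deviance - fit_2014$null.deviance) / n)
r2_nagel <- r2_cs / (1 - exp(-fit_2014$null.deviance / n))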
Prior to generating the propensity scores, a multiple imputation procedure (from the R package Amelia) was used to substitute estimated values for all missing data used in the analysis. The matching was conducted with the R package MatchIt using the nearest neighbor matching algorithm, which conducts a 1:1 match. Therefore, each examinee included in each representative sample was matched to an examinee from the prior year in that grade using the propensity scores generated through the logistic regression.
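The sketch below outlines how these steps could be combined in R with the packages named above. It is a simplified illustration under assumptions: the data frame combined_dat, the id column, and the 0/1 group indicator are placeholders, a single imputation (m = 1) stands in for whatever imputation settings were actually used, and the externally computed propensity scores are supplied to matchit() through its distance argument.

library(Amelia)
library(MatchIt)

# 1) Impute missing values before modeling (assumed data frame and id column).
imp <- amelia(combined_dat, m = 1, idvars = "student_id")
dat <- imp$imputations[[1]]

# 2) Propensity scores computed outside the matching program (as in Figure B2),
#    here from the previously fitted logistic regression (see earlier sketch).
dat$pscore <- predict(fit_2014, newdata = dat, type = "response")

# 3) Nearest neighbor 1:1 matching on the externally supplied propensity scores;
#    `group` is an assumed 0/1 indicator for the two groups being matched.
m_out <- matchit(group ~ pscore, data = dat,
                 method = "nearest", distance = dat$pscore, ratio = 1)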
After the matching was conducted, all data were evaluated to determine how similar the matched
groups were on student demographic and achievement variables used in the models.
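One straightforward way to carry out this kind of check, using the objects from the previous sketch, is shown below: summary() on a matchit result reports covariate balance before and after matching, and match.data() returns the matched sample for follow-up comparisons. All object names are the assumed ones introduced earlier.

# Balance diagnostics for the matched samples (assumed object m_out from above).
summary(m_out, standardize = TRUE)           # covariate balance before/after matching

# Extract the matched examinees for follow-up comparisons (e.g., SGP or CPI means).
matched <- match.data(m_out)
tapply(matched$pscore, matched$group, mean)  # group means on the propensity score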
Matched samples for 2014–15 were evaluated for balance on the demographic and prior achievement
variables, as well as for comparability on SGP and accountability results. These results, presented in the
body of this report, showed that the method generated samples that were substantially more similar to the prior year's population than the unmatched samples were. Further, the matched samples for MCAS and PARCC test-takers were more similar to each other than were the unmatched samples.
11 The higher model fits for grade 3 are largely an artifact of the criterion variable, which incorporates student demographic information. Because the remaining models used actual examinee results, those model fits provide better information for matching than does the grade 3 model, despite appearances.
Appendix C: Logistic Regression Variables and Results
Table C1 provides information about the variables included in the two logistic regression models used to
generate the propensity scores.
Table C1: Variables Used in the Logistic Regression Models*
Variables Used Across the Models

Type of Variable | Variable Name | Description | Model
Criterion | emperf2013_PA_mean_imp_cat | Proficient or Advanced on MCAS ELA/Math in the prior year, in that grade and school, by Race and Free/Reduced Lunch Status, Imputed, Dichotomized (0, 1) | Grade 3
Criterion | eperf2014_imp | Proficient or Advanced on MCAS ELA in the prior year | Grades 4-8
Covariate | race_B | Race/Ethnicity = African American/Black | Grades 3-8
Covariate | race_H | Race/Ethnicity = Hispanic/Latino | Grades 3-8
Covariate | race_A | Race/Ethnicity = Asian | Grades 3-8
Covariate | race_W | Race/Ethnicity = Caucasian/White | Grades 3-8
Covariate | race_M | Race/Ethnicity = Mixed | Grades 3-8
Covariate | ever_ell | Ever an English Language Learner | Grades 3-8
Covariate | Highneeds | High Needs (Student with Disability + Free/Reduced Lunch Eligible) | Grades 3-8
Covariate | freelunch2014_imp | Free/Reduced Lunch Eligible (prior year), Imputed | Grades 3-8
Covariate | yrsinmass_imp | Number of years in Massachusetts schools, Imputed | Grades 3-8
Covariate | levelofneed0_B | Interaction: No Special Needs * Race_B | Grades 3-8
Covariate | levelofneed0_H | Interaction: No Special Needs * Race_H | Grades 3-8
Covariate | levelofneed0_W | Interaction: No Special Needs * Race_W | Grades 3-8
Covariate | levelofneed0_A | Interaction: No Special Needs * Race_A | Grades 3-8
Covariate | freelunch2014_BH | Interaction: freelunch2014_imp * Race_BH | Grades 3-8
Covariate | freelunch2014_H | Interaction: freelunch2014_imp * Race_H | Grades 3-8
Covariate | freelunch2014_A | Interaction: freelunch2014_imp * Race_A | Grades 3-8
Covariate | freelunch2014_W | Interaction: freelunch2014_imp * Race_W | Grades 3-8
Covariate | emperf2012_PA_mean_imp | Proficient or Advanced on MCAS ELA and Math, proportion for that school, grade, race, and free/reduced lunch category, two years prior, Imputed | Grades 4-8
Covariate | emperf2012_B | Interaction: emperf2012_PA_mean_imp * Race_B | Grades 4-8
Covariate | emperf2012_H | Interaction: emperf2012_PA_mean_imp * Race_H | Grades 4-8
Covariate | emperf2012_W | Interaction: emperf2012_PA_mean_imp * Race_W | Grades 4-8
Covariate | emperf2012_A | Interaction: emperf2012_PA_mean_imp * Race_A | Grades 4-8
Covariate | emperf2012_levelofneed0 | Interaction: emperf2012_PA_mean_imp * levelofneed0 | Grades 4-8
Covariate | emperf2013_PA_imp | Proficient or Advanced on MCAS ELA and Math, 2013, Imputed | Grades 4-8
Covariate | eperf2013_imp | Proficient or Advanced on MCAS ELA in the prior year | Grades 4-8
Covariate | mperf2013_imp | Proficient or Advanced on MCAS ELA, 2013, Imputed | Grades 4-8
Covariate | emperf2013_PA_mean_imp_cat | Proficient or Advanced on MCAS Math, 2013, Imputed | Grades 4-8
* Variables were used only in the Grade 3 model, only in the Grades 4–8 model, or in both models (Grades 3–8), as
indicated by the far right column.
Tables C2 and C3 provide information on the covariates used in the logistic regression models. Table C2 provides information for the models used in grades 3 through 5, and Table C3 provides information for grades 6 through 8. Within each set of grade-level results, the first column shows the population-level coefficient, the second column the standard error of the coefficient, and the third column the statistical significance of the coefficient. For propensity-matching purposes, variables should be retained in the model if the probability (significance) is less than 0.5; however, all variables were retained for consistency across models.
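To illustrate how the published coefficients translate into a propensity score, the sketch below applies the logistic model to a single hypothetical examinee using a few values from the Grade 5 column of Table C2. The student profile and the restriction to a handful of covariates are assumptions made purely for illustration; the resulting probability is not a Department statistic.

# Illustration only: a few Grade 5 coefficients from Table C2 applied to a hypothetical
# examinee. A real score would use every covariate in the model.
b0        <- -1.600   # Constant (Grade 5)
b_everell <- -0.415   # ever_ell
b_highnd  <- -0.329   # highneeds
b_mperf   <-  1.909   # mperf2014_imp (Proficient/Advanced on prior-year Math)

# Hypothetical student: never an ELL, not high needs, Proficient/Advanced in prior Math
eta    <- b0 + b_everell * 0 + b_highnd * 0 + b_mperf * 1
pscore <- plogis(eta)  # e(x) = p(z = 1 | X); about 0.58 for this partial profile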
Table C2: Information for Model Variables, Grades 3–5
Coefficients for Covariates Used in the Logistic Regression Models, Grades 3–5

Variable | Grade 3 Coef. | Grade 3 St. Error | Grade 3 Sign. | Grade 4 Coef. | Grade 4 St. Error | Grade 4 Sign. | Grade 5 Coef. | Grade 5 St. Error | Grade 5 Sign.
race_B | -3.135 | .234 | .000 | -0.225 | 0.231 | .329 | -0.073 | 0.219 | .740
race_H | -2.718 | .208 | .000 | -0.352 | 0.214 | .101 | -0.126 | 0.209 | .548
race_A | -0.682 | .217 | .002 | -0.028 | 0.252 | .912 | 0.025 | 0.240 | .916
race_W | -3.400 | .179 | .000 | -0.005 | 0.193 | .981 | -0.052 | 0.194 | .789
race_M | 0.297 | .163 | .068 | 0.175 | 0.178 | .327 | 0.216 | 0.177 | .222
ever_ell | -0.236 | .042 | .000 | -0.369 | 0.035 | .000 | -0.415 | 0.034 | .000
highneeds | -0.742 | .064 | .000 | -0.300 | 0.051 | .000 | -0.329 | 0.049 | .000
freelunch2014_imp | -0.922 | .102 | .000 | -0.211 | 0.113 | .062 | -0.234 | 0.121 | .053
yrsinmass_imp | -0.029 | .017 | .098 | 0.029 | 0.012 | .014 | 0.029 | 0.009 | .002
levelofneed0_B | 1.696 | .161 | .000 | 0.895 | 0.118 | .000 | 0.629 | 0.093 | .000
levelofneed0_H | 1.224 | .125 | .000 | 1.071 | 0.089 | .000 | 0.614 | 0.070 | .000
levelofneed0_W | 4.324 | .077 | .000 | 0.328 | 0.055 | .000 | 0.169 | 0.056 | .002
levelofneed0_A | 2.189 | .150 | .000 | 0.565 | 0.172 | .001 | 0.371 | 0.160 | .020
freelunch2014_B | 0.529 | .121 | .000 | 0.020 | 0.137 | .884 | -0.082 | 0.144 | .567
freelunch2014_H | -0.756 | .105 | .000 | -0.220 | 0.125 | .079 | -0.126 | 0.134 | .347
freelunch2014_A | -0.700 | .125 | .000 | -0.135 | 0.139 | .331 | -0.290 | 0.152 | .056
freelunch2014_W | 0.174 | .094 | .065 | -0.051 | 0.109 | .642 | -0.034 | 0.120 | .779
mperf2014_imp | -- | -- | -- | 1.851 | 0.020 | .000 | 1.909 | 0.021 | .000
emperf2013_PA_mean_imp | -- | -- | -- | 3.019 | 0.087 | .000 | 2.873 | 0.094 | .000
emperf2013_B | -- | -- | -- | -0.392 | 0.130 | .003 | -0.100 | 0.146 | .492
emperf2013_H | -- | -- | -- | -0.074 | 0.113 | .516 | -0.181 | 0.124 | .144
emperf2013_W | -- | -- | -- | -0.051 | 0.084 | .538 | -0.101 | 0.095 | .287
emperf2013_A | -- | -- | -- | -0.091 | 0.122 | .455 | -0.079 | 0.133 | .555
emperf2013_levelofneed0 | -- | -- | -- | -0.097 | 0.082 | .239 | -0.063 | 0.093 | .500
Constant | 1.137 | 0.178 | .000 | -2.219 | 0.199 | .000 | -1.600 | 0.201 | .000

Note: "--" indicates the variable was not included in the Grade 3 model.
Table C3: Information for Model Variables, Grades 6–8
Coefficients for Covariates Used in the Logistic Regression Models, Grades 6–8

Variable | Grade 6 Coef. | Grade 6 St. Error | Grade 6 Sign. | Grade 7 Coef. | Grade 7 St. Error | Grade 7 Sign. | Grade 8 Coef. | Grade 8 St. Error | Grade 8 Sign.
race_B | -0.282 | 0.246 | .251 | -0.079 | 0.232 | .733 | -0.089 | 0.244 | .715
race_H | -0.144 | 0.235 | .539 | -0.075 | 0.226 | .740 | 0.025 | 0.238 | .917
race_A | -0.183 | 0.277 | .509 | -0.111 | 0.263 | .672 | 0.014 | 0.278 | .960
race_W | 0.135 | 0.221 | .541 | -0.026 | 0.213 | .901 | 0.231 | 0.225 | .305
race_M | 0.234 | 0.205 | .254 | -0.182 | 0.196 | .353 | 0.244 | 0.200 | .224
ever_ell | -0.379 | 0.035 | .000 | -0.413 | 0.034 | .000 | -0.609 | 0.035 | .000
highneeds | -0.403 | 0.052 | .000 | -0.280 | 0.051 | .000 | -0.394 | 0.053 | .000
freelunch2014_imp | -0.405 | 0.136 | .003 | 0.021 | 0.133 | .874 | -0.190 | 0.157 | .226
yrsinmass_imp | 0.055 | 0.008 | .000 | 0.063 | 0.006 | .000 | 0.072 | 0.006 | .000
levelofneed0_B | 0.780 | 0.094 | .000 | 0.746 | 0.083 | .000 | 0.743 | 0.080 | .000
levelofneed0_H | 0.593 | 0.071 | .000 | 0.729 | 0.067 | .000 | 0.739 | 0.063 | .000
levelofneed0_W | 0.059 | 0.061 | .336 | 0.097 | 0.062 | .114 | 0.203 | 0.063 | .001
levelofneed0_A | 0.452 | 0.184 | .014 | 0.234 | 0.171 | .172 | 0.305 | 0.175 | .082
freelunch2014_B | 0.333 | 0.159 | .036 | -0.210 | 0.155 | .174 | 0.044 | 0.178 | .804
freelunch2014_H | 0.199 | 0.149 | .183 | -0.450 | 0.148 | .002 | -0.124 | 0.171 | .468
freelunch2014_A | -0.111 | 0.176 | .527 | -0.365 | 0.178 | .040 | -0.282 | 0.205 | .168
freelunch2014_W | 0.133 | 0.135 | .325 | -0.361 | 0.133 | .007 | -0.189 | 0.157 | .230
mperf2014_imp | 2.147 | 0.023 | .000 | 2.224 | 0.030 | .000 | 2.259 | 0.037 | .000
emperf2013_PA_mean_imp | 3.308 | 0.108 | .000 | 3.672 | 0.104 | .000 | 3.580 | 0.116 | .000
emperf2013_B | -0.273 | 0.152 | .072 | -0.373 | 0.157 | .017 | -0.151 | 0.158 | .339
emperf2013_H | -0.123 | 0.148 | .404 | -0.083 | 0.140 | .550 | -0.174 | 0.146 | .234
emperf2013_W | -0.153 | 0.116 | .187 | -0.230 | 0.109 | .034 | -0.003 | 0.122 | .980
emperf2013_A | 0.058 | 0.156 | .708 | -0.166 | 0.153 | .276 | -0.060 | 0.166 | .716
emperf2013_levelofneed0 | -0.062 | 0.114 | .586 | -0.014 | 0.105 | .891 | -0.266 | 0.119 | .026
Constant | -1.802 | 0.228 | .000 | -1.476 | 0.218 | .000 | -1.104 | 0.231 | .000