2010-11 HAP Proposal
Project Title: 2010-11 Humanities Assessment Project
Persons Submitting Proposal: Cathy Hardy (2321) & Anne Dvorak (2260)
I. Brief Description:
The Humanities Assessment Project has undergone a number of iterations since 2001.
Originally conceived as a 40-minute timed essay response, it has changed to an essay involving
the writing process and currently requires a ten-page documented essay on a particular
prompt. Our assumption is that a more involved, capstone-like measure would be the best
means of demonstrating significant competence for the outcome.
The structure for assessing this measure has changed relatively little since the inception
of the assessment tool. Faculty across the humanities disciplines will be normed to a four-point
scale and will evaluate the essays holistically using a double-blind technique. Prior iterations of the tool
have shown problematic inter-rater reliability. The goal for this iteration is to try to regularize
the prompt and perhaps tweak the scoring rubric to make scoring easier across the disciplines.
The data sample at present comes from Longview and perhaps Maple Woods; the size of the
sample and the type of courses under study would be similar to the previous study in order to
determine any significant changes over time. The prompt would ask the following:
A. Humanities Question:
Identify and articulate the aesthetic standards used to determine if a piece of music, art or
literature is a masterpiece or classic.
II. MCC GenEd Outcome Assessed:
Our measure is derived from the MCC outcome for the humanities, specifically
subcategories 2 and 5, described as follows:
Humanities Component
Courses in MCC’s general education component will provide opportunities for students to
develop their understanding of the ways in which humans have addressed their condition
through imaginative work in creative art and speculative thought; to deepen their
understanding of how that imaginative process is informed and limited by social, cultural,
linguistic, and historical circumstances; and to recognize that the virtual world of the
creative imagination is a form of knowledge different from, but as important as, empirical
knowledge. Students will be able to:
2: identify aesthetic standards used to make critical judgments.
5: articulate a response to participation in, or observance of, works in the arts and
humanities based upon aesthetic standards.
III. Detailed Description of Assessment:
A: Administration
1. What question/prompts/artifacts will be used to assess which outcome?
The question faculty will provide students as a writing prompt is to “Identify and articulate
the aesthetic standards used to determine if a piece of music, art or literature is a
masterpiece or classic.”
A final eight to ten page, multi-revisionary paper with appropriate MLA documentation
and citation will be submitted by students in Longview humanities courses at the end of
the 2011 spring semester.
2. How will the assessment be administered?
The written prompt will be provided to faculty before the beginning of the spring 2011
semester, to be embedded in the course content. The final draft will be used for the
assessment project.
3. To how many students/classes will the assessment be administered?
The anticipated sample will provide 10-15 classes with an average of 20 responses per class, yielding between 200 and 300 student submissions. Such a sample size duplicates the previous iteration of the tool.
4. In which classes will the assessment be administered?
Music Appreciation, Music of the World’s Cultures, Art History, and 200-level literature classes will
administer the assessment. Students in these classes are likely to have taken 30+ credit hours; hence,
a random sample of students meeting this data element will be extracted from the entire sample under
study for further analysis.
B. Evaluation of responses:
1. How will the student responses be scored or rated?
The artifacts will be rated using the 4-point Humanities rubric developed for the 2005 HALP. It is
possible, based on rubrics developed by other schools recently, that this rubric will be tweaked
slightly to ensure clear guidelines.
Each artifact will have a minimum of two readings. Non-content instructors will rate the submissions.
If there is more than a 1-point discrepancy, the artifact will receive a third reading by a non-content instructor.
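As a minimal sketch (assuming integer scores on the 4-point rubric; the function name is illustrative and not part of the project materials), the reading-assignment rule described above can be expressed as follows:

```python
# Hedged sketch of the reading rule: every artifact gets two non-content readings,
# and a third non-content reading is added when the first two holistic scores
# differ by more than one point on the 4-point rubric.
def needs_third_reading(score1: int, score2: int) -> bool:
    """Return True when the discrepancy between two holistic scores exceeds 1 point."""
    return abs(score1 - score2) > 1

# Placeholder examples, not project data:
print(needs_third_reading(3, 4))   # False: scores within one point
print(needs_third_reading(2, 4))   # True: third reading required
```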
Rubric for Humanities Assessment Project

Students will identify aesthetic criteria, theory or cultural context and will apply such to analyze or interpret a work in the Humanities.

4. Outstanding
Conceptual: Can define in detail appropriate aesthetic criteria, theory or cultural context; can use aesthetic criteria, theory or cultural context accurately; can synthesize ideas beyond restatements of the classroom context.
Analytical: Provides strong, effective and detailed evidence which is significant and appropriate to the chosen interpretation; interprets evidence in a sophisticated manner to support a thesis; can use self-evident arguments and can go beyond the self-evident to explain and support an argument with relevant evidence.
Terminology: Can use subject terminology consistently and accurately; can redefine terms in their own words; can recognize the historical or cultural origin of terms.

3. Acceptable
Conceptual: Can define aesthetic criteria, theory or cultural context in general terms; can recall and restate classroom arguments which use aesthetic criteria, theory or cultural context.
Analytical: Uses specific evidence but it may not be singularly appropriate or significant; interpretations are sporadic, use inappropriate evidence or simply restate classroom observations.
Terminology: Terminology is present but not used consistently and is applied only partially; understanding is implied but not clearly evident; restates verbatim classroom experience.

2. Borderline
Conceptual: Attempts to define aesthetic criteria, theory or cultural context in a vague way; vague understanding of aesthetic criteria, theory or cultural context causes a misreading of the work under consideration.
Analytical: Does not always provide evidence when needed; significant interpretations are missing; arguments do not stand without generous reading between the lines and assumptions of understanding by the reader.
Terminology: Attempts to use terminology but may misapply it or display only partial understanding; terminology is vague and inconsistent.

1. Unacceptable
Conceptual: Does not define, identify or apply aesthetic criteria, theory or cultural context; grossly misreads the work with arbitrary and unsupported responses.
Analytical: Summarizes or describes a work with no attempt to interpret or analyze; discussion is disconnected completely from referenced criteria, theory or cultural context.
Terminology: No relevant terminology used; no appreciation of the need for clear or specific terminology; terminology grossly misused.
2. Who will score or rate the responses?
Non-content instructors will rate the responses. For example, music instructors will rate
art and literature responses. Art instructors will rate music and literature responses.
Content instructors will not rate responses in their own discipline. The scoring will be double-blind: faculty will not know the other scorer or the other score, nor which instructor taught the class.
3. Will evaluators need to be trained or normed? Yes.
First, all instructors will require training in the appropriate question/ prompt development.
Issues with inter-rater reliability surfaced in the 2005 HALP.
A detailed description of the inter-rater reliability issue can be found in the 2005 HALP report
located under reports on the MCC Assessment Office web page. Below is a brief
explanation of the issue.
In Appendix 3 of the 2005 HALP report, the data compiled by the research office indicated that only one set of raters (Table 17) reached Fair to Good Agreement on the kappa scores (inter-rater reliability). When there was enough data to calculate inter-rater reliability, most raters were in Poor Agreement.
This was not a surprise to the HALP committee. During the faculty debriefing sessions after
each rating period, concerns were expressed about the inconsistency among the various
prompts prepared by the faculty for the HALP. Some prompts were structured according
to the HALP rubric guidelines. Other prompts did not request information in that depth.
Raters were conflicted as to how to score submissions that were not structured according to the HALP rubric. The kappa scores reflect that conflict.
Proposed Intervention
Training for faculty willing to participate in the HALP will be necessary in order to develop questions that incorporate the three aspects of the rubric (Conceptual Framework, Analytical Skills, and Use of Terminology).
Prompts and assessment questions need to be developed with the understanding that raters cannot know the specific classroom context in which a question is used. The structure of HALP prompts should give raters the necessary framework without relying on that specific classroom context.
A faculty training retreat is the suggested plan of action. Close work with the WAC coordinator on question and assignment development will benefit the HALP contributors.
Secondly, once the appropriate prompts/questions are embedded in the course content, the final drafts will be submitted for the assessment project at the end of the semester. The classroom instructor will assign the student’s final grade, and all papers are to be submitted to the HALP coordinator.
The classroom instructor will determine which paper would be a good example of a 4-Outstanding response, a 3-Acceptable response, a 2-Borderline response, and a 1-Unacceptable response. The classroom instructor will indicate to the HALP coordinator that these submissions are to be used for the norming session. At no time is the instructor to give any indication of his/her ratings. The coordinator will assemble the
norming packets and the classroom instructor will explain to the raters the prompt and the
desired outcome. The other instructors will rate and discuss each submission until a
consistent understanding is achieved.
C. Reporting of results
How will the results of the assessment be reported to faculty?
The assessment results will be reported in the form of a written report located on the
Assessment Office’s web page. Information will also be shared in breakout sessions
during the district in-service and faculty convocation. HAP committee members will
present the results to any constituent of the MCC district upon invitation. HAP will also
share the results with any of the Humanities Divisions at a division meeting.
How will the discussion of the results, interpretation and intervention be facilitated?
Careful notes of comments, interpretations and debriefing conversations will be incorporated in the
feedback year 2011/12. If interventions are deemed necessary, the committee will collect suggestions
from faculty at debriefing and create an intervention plan.
IV. Resources Required by Assessment
A. Personnel
All instructors submitting artifacts or responses will be urged to become raters for the assessment
project. It is the goal to have all instructors involved. Additional faculty in the humanities will be
recruited as needed.
B. Compensation
All raters will be paid $25.00 an hour for training, norming and assessing.
The proposed 2011 paper length of eight to ten pages will require 30 minutes per paper to assess. (This figure
represents the total time for 2 readings. Each reading will take approximately 15 minutes.)
Based on the previous HALP study, we anticipate that 30% of the papers will require a third reading
and will require an average time of 15 minutes to assess.
Minimum: 200 papers; 10 raters

Cost based on 200 papers and 10 raters:
  Training (4 hours): 40 hours for 10 raters x $25.00/hour = $1,000.00
  Norming (2.5 hours): 25 hours for 10 raters x $25.00/hour = $625.00
  Assessment on 200 papers (5 hours): 50 hours for 10 raters x $25.00/hour = $1,250.00
  Assessment on 60 papers requiring a 3rd reading (1.5 hrs): 15 hours for 10 raters x $25.00/hour = $375.00
  Total cost on 200 papers: $3,250.00
Maximum: 300 papers; 15 raters

Cost based on 300 papers and 15 raters:
  Training (4 hours): 60 hours for 15 raters x $25.00/hour = $1,500.00
  Norming (2.5 hours): 37.5 hours for 15 raters x $25.00/hour = $937.50
  Assessment on 300 papers (5 hours): 75 hours for 15 raters x $25.00/hour = $1,875.00
  Assessment on 90 papers requiring a 3rd reading (1.5 hrs): 22.5 hours for 15 raters x $25.00/hour = $562.50
  Total cost on 300 papers: $4,875.00
Coordinator Compensation - Spring 2011: 3 work units x $800.00 = $2,400.00
Coordinator Duty Description

1. Print and assemble training packets for each rater.
   A. Develop training materials with the WAC Coordinator.
   B. Copy and assemble all materials.
   C. Facilitate the training session.

2. Assemble the norming packet.
   A. Gather instructors’ prompts and responses representing a 4 paper, a 3 paper, a 2 paper and a 1 paper.
   B. Print sample rating sheets on 2-part carbonless paper.
   C. Copy all papers and prompts and assemble norming packets.
   D. Facilitate the norming session, note comments, and enter scores on the artifacts used for the norming session.

3. Organize and facilitate the assessment session.
   A. Collect all artifacts.
   B. Peruse each paper, removing or covering up the student name, and assure papers have only the Student ID number. Label each paper with the packet letter and order number.
   C. Put 8 to 10 papers in each packet. Packets are identified by letters.
   D. Use Metrosoft to print each class roster. Place packet letter(s) on each roster.
   E. Download student ID numbers and place them on each score sheet.
   F. Assign and print color labels that anonymously identify each rater and still allow for Kappa score analysis.
   G. Develop the raters’ scoring sheets in Excel and print on 2-part carbonless paper.
   H. Build a spreadsheet for each packet.
   I. Enter the raters’ scores into the spreadsheet.
   J. Prepare the hard copy score sheets for shipment to the district assessment office.
   K. Email, burn onto CD, and mail a hard copy of the spreadsheet to the Office of Assessment.
   L. Work with and coordinate with the Office of Assessment on interpretation of data.
   M. Write the Spring 2011 HALP report.
   N. Prepare and serve refreshments and lunch.
C. Itemized Cost of Materials
1. 1" x 2-5/8" labels for rater color-code labels, packet labels, and paper labels: $8.55
2. 1 ream of 2-part carbonless paper for score sheets: $7.40
3. 10x12 clasp envelopes for packets: $14.56
4. Copy expense for prompts and norming examples: $150.00
5. Assessment day refreshments and lunch for 15 raters and coordinator: $200.00
6. Banker box for artifact and hard-copy data storage: $6.99
Supplies Cost: $387.50
Maximum Project Cost: $7,662.50
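The maximum-cost figure follows directly from the hourly rate and the time estimates above. As an illustration only, the short Python sketch below recomputes it from those stated assumptions (15 raters, $25.00 per rater-hour, plus coordinator compensation and supplies); the variable names are illustrative and the figures are copied from the tables above, not from any project spreadsheet.

```python
# Hedged sketch: recompute the 300-paper / 15-rater budget from the stated assumptions.
RATE = 25.00          # dollars per rater-hour (Section IV.B)
RATERS = 15

activities = {
    "Training (4 hours)": 4.0 * RATERS,                  # 60 rater-hours
    "Norming (2.5 hours)": 2.5 * RATERS,                 # 37.5 rater-hours
    "Assessment on 300 papers (5 hours)": 5.0 * RATERS,  # 75 rater-hours
    "3rd readings on 90 papers (1.5 hrs)": 1.5 * RATERS, # 22.5 rater-hours
}

subtotals = {name: hours * RATE for name, hours in activities.items()}
rater_total = sum(subtotals.values())    # $4,875.00

coordinator = 3 * 800.00                 # 3 work units at $800.00 each
supplies = 387.50                        # itemized materials above

print(f"Rater cost:       ${rater_total:,.2f}")
print(f"Coordinator:      ${coordinator:,.2f}")
print(f"Supplies:         ${supplies:,.2f}")
print(f"Maximum project:  ${rater_total + coordinator + supplies:,.2f}")  # $7,662.50
```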
V. Supporting Information
1. Literature
The humanities is one of the last general education outcomes to be assessed, for two reasons. First, humanities faculty have often been utilized for conducting other assessment projects, especially in communication and in critical thinking, so the human resources have not been available. Secondly, the literature has been silent regarding
other humanities measures in academia, so we have had no models for guidance. There
may be other studies done across the country, but the only evidence of such lies in a few
webpage rubrics. Notable studies follow.
Parkland College collected essays from their Literature classes and scored them
on a four-point scale on such criteria as “a) ability to analyze and interpret, b) ability to
make clear connections between ideas, c) ability to support a stance with textual
evidence, d) ability to recognize and acknowledge variant readings and/or ambiguities in
meaning, e) ability to write clearly, with appropriate terminology.” A second study, in
College Teaching (2000), describes an assessment project for music listening classes
that used a three point rubric to discern if students “1. listen actively to the music, 2.
describe the music, in both plain English and musical language, and discuss it in terms of
style; 3. make connections between the music and its social and historical context, 4.
write coherently about music.” A third project, a portfolio including an “aesthetic
analysis,” asks students to choose an analysis they have written for one of the visual or
performing arts. Students are instructed to “demonstrate [their] ability to analyze the
work’s form, structure, and contexts; ultimately, it should interpret the work in some way.” Students are also asked to “describe the analytical thinking involved in the entry. . . . [to provide a] judgment about the quality and the ‘representativeness’ of [their] use of analysis and/or evaluation.”
Our study is more ambitious than the studies listed here: most of these
assessments only address one discipline; some blend writing skills with appreciation
skills in their assessment; and they use a variety of measures, from tests to essays to
portfolios. The ambitiousness of our study may explain the difficulty we are having in
achieving strong inter-rater reliability; however, we are confident that such reliability can
be achieved.
The HALP committee feels there is much merit to the MCC-Longview Humanities Rubric. Even though there was Poor Agreement in the 2005 Kappa scores, there was one exception in the HALP 2005 assessment. The prompt used by the music department for the 2005 assessment was consistent between all instructors, and it dealt with all three aspects of the rubric: Terminology, Analytical, and Conceptual. The inter-rater reliability was 78.9%, which placed the Kappa scores for music in the Excellent Agreement category. (See the Longview section of the 2008 Music Program Evaluation for more information.) With that evidence, the HALP committee feels that the rubric has great merit and that, with the appropriate training and norming, we can expect a vast improvement in the Humanities Kappa scores along with a reduction in 3rd readings.
VI. Bibliography
Even 5 years after the 2005 HALP report, there is still not a lot of literature on
assessment projects in the country. The following website addresses have some
information on projects at a few institutions.
“Assessment of Student Learning Outcomes in General Education - Humanities 2004-05.” Assessment at Buffalo State: State University of New York. Office of Academic Information and Assessment. 1999. Web. 24 March 2010. <http://www.buffalostate.edu/offices/assessment/assessment/humant.htm>
Dallinger, Judith M., and Karen B. Mann. “Assessing Student Knowledge of and Attitudes
Towards the Humanities.” College Teaching 48.3 (Summer 2000):95-101. JSTOR. Web.
24 March 2010.
Mann, Karen B. “You Can Herd CATs: Assessing Learning in the Humanities.” College
Teaching 48.3 (Summer 2000): 82-89. JSTOR. Web. 24 March 2010.
Murphy, John. “Case Study: Lessons Learned from Humanities Assessment in Music.” College Teaching 48.3 (Summer 2000): 102-103. JSTOR. Web. 24 March 2010.
University of South Florida. “Foundations of Knowledge and Learning Core Curriculum:
Humanities Assessment Rubric.” Web. 24 March 2010.
<http://usfweb2.usf.edu/assessment/Resources/Humanities%20core%20area%20integrated%20assessment%20rubric%20.pdf>
“Case Study: Assessing Students’ Learning in Introduction to Art.” College Teaching 48.3
(Summer 2000): 90-94. JSTOR. Web. 24 March 2010.
Though there is still not a lot of literature on other assessment projects, the rubric developed for the HAP project by Anne Dvorak is in keeping with some of these other institutions’ holistic humanities rubrics.
Humanities Assessment Project 2011 Results
Overview
The Humanities Assessment Project has undergone a number of iterations since 2001. Originally
conceived as a 40-minute timed essay response, it has changed to an essay involving the writing process
and it requires a final eight to ten page documented essay on a particular prompt. The assumption is that a
more involved, capstone-like measure would be the best means of demonstrating significant competence
for the outcome.
The structure for assessing this measure has changed relatively little since the inception of the
assessment tool. Faculty across the humanities disciplines will be normed to a four-point scale and will evaluate the essays holistically using a double-blind technique. Prior iterations of the tool have shown problematic inter-rater reliability. The goal for this iteration is to try to regularize the prompt and perhaps tweak the scoring rubric to make scoring easier across the disciplines. The data sample for this administration comes from Longview and spans multiple disciplines (e.g., Music, Literature, and Art). A total of 100 students participated in this assessment during the 2011 spring semester.
The prompt asked the following Humanities Question:
Identify and articulate the aesthetic standards used to determine if a piece of music, art or literature is a
masterpiece or classic.
This measure was derived from the MCC outcome for the humanities, specifically subcategories 2 and 5,
described as follows:
Humanities Component
Courses in MCC’s general education component will provide opportunities for students to develop
their understanding of the ways in which humans have addressed their condition through
imaginative work in creative art and speculative thought; to deepen their understanding of how
that imaginative process is informed and limited by social, cultural, linguistic, and historical
circumstances; and to recognize that the virtual world of the creative imagination is a form of
knowledge different from, but as important as, empirical knowledge. Students will be able to:
2: identify aesthetic standards used to make critical judgments.
5: articulate a response to participation in, or observance of, works in the arts and humanities
based upon aesthetic standards.
There were two readings for each packet of papers. Papers were scored by non-content
instructors using the following 4-point rubric:
4. Outstanding
Conceptual: Can define in detail appropriate aesthetic criteria, theory or cultural context; can use aesthetic criteria, theory or cultural context accurately; can synthesize ideas beyond restatements of the classroom context.
Analytical: Provides strong, effective and detailed evidence which is significant and appropriate to the chosen interpretation; interprets evidence in a sophisticated manner to support a thesis; can use self-evident arguments and can go beyond the self-evident to explain and support an argument with relevant evidence.
Terminology: Can use subject terminology consistently and accurately; can redefine terms in their own words; can recognize the historical or cultural origin of terms.

3. Acceptable
Conceptual: Can define aesthetic criteria, theory or cultural context in general terms; can recall and restate classroom arguments which use aesthetic criteria, theory or cultural context.
Analytical: Uses specific evidence but it may not be singularly appropriate or significant; interpretations are sporadic, use inappropriate evidence or simply restate classroom observations.
Terminology: Terminology is present but not used consistently and is applied only partially; understanding is implied but not clearly evident; restates verbatim classroom experience.

2. Borderline
Conceptual: Attempts to define aesthetic criteria, theory or cultural context in a vague way; vague understanding of aesthetic criteria, theory or cultural context causes a misreading of the work under consideration.
Analytical: Does not always provide evidence when needed; significant interpretations are missing; arguments do not stand without generous reading between the lines and assumptions of understanding by the reader.
Terminology: Attempts to use terminology but may misapply it or display only partial understanding; terminology is vague and inconsistent.

1. Unacceptable
Conceptual: Does not define, identify or apply aesthetic criteria, theory or cultural context; grossly misreads the work with arbitrary and unsupported responses.
Analytical: Summarizes or describes a work with no attempt to interpret or analyze; discussion is disconnected completely from referenced criteria, theory or cultural context.
Terminology: No relevant terminology used; no appreciation of the need for clear or specific terminology; terminology grossly misused.
Raters for the material were non-content faculty members divided into groups that were designated by colors (e.g., Red, Green). The scoring was double-blind: faculty did not know the other scorer or the other score, nor which instructor taught the class.
During the 2005 HALP assessment, there were some issues regarding inter-rater
reliability. A detailed description of the inter-rater reliability issue can be found in the 2005
HALP report located under reports on the MCC Assessment Office web page. For this
administration, all instructors required training in the appropriate question/prompt
development.
The inter-rater reliability for the scoring groups was conducted using kappa. Kappa is a
measure of inter-rater agreement that tests if the counts in cells of a scoring activity differ
from those expected by chance alone. The important thing to remember about kappa is
that the higher the kappa score the more closely equivalent the two ratings are for the
same piece of writing. If writings require third readings then the kappa value will
decrease accordingly. As a measure, kappa values follow an interpretative pattern of:
.75 or higher indicates Excellent Agreement; .40 to .75 is Fair to Good Agreement; and
.40 or less indicates Poor Agreement.
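For readers who wish to reproduce the agreement statistic, the sketch below shows one conventional way to compute Cohen's kappa for a pair of raters; the scores are placeholders rather than the project data, and the report does not specify which software the research office actually used.

```python
# Hedged sketch: Cohen's kappa for two raters on the 4-point rubric.
# Placeholder scores for illustration; not the actual HAP data.
from sklearn.metrics import cohen_kappa_score

reader1 = [3, 2, 4, 3, 1, 3, 2, 4, 3, 3]
reader2 = [3, 2, 3, 3, 2, 3, 2, 4, 4, 3]

kappa = cohen_kappa_score(reader1, reader2)

# Interpretation thresholds used in this report:
#   kappa >= .75        Excellent Agreement
#   .40 <= kappa < .75  Fair to Good Agreement
#   kappa < .40         Poor Agreement
if kappa >= 0.75:
    label = "Excellent Agreement"
elif kappa >= 0.40:
    label = "Fair to Good Agreement"
else:
    label = "Poor Agreement"

print(f"kappa = {kappa:.3f} ({label})")
```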
Distribution of Scores
Figure 1 depicts the distribution of scores for the first group of raters. As can be
observed, the distribution follows a normal curve. The mean score for the first group is
2.83 with a standard deviation of 0.8. This can be compared to the data from 2005 that
showed a mean of 2.27, a standard deviation of 0.8 and total number of subjects of 195.
Figure 1. Reader 1 Score Distribution (Mean = 2.83, SD = 0.82, N = 100); scores of 1 through 4 occurred 5, 28, 46, and 21 times, respectively.
Figure 2 shows the distribution for the second set of scores. It also follows a normal
distribution. The mean for these scores is 3.04 with a standard deviation of 0.8, characteristics very similar to those of the first group of scores shown above. The 2005 data showed a
mean of 2.35, a standard deviation of 0.9 and an N of 195.
Figure 2. Reader 2 Score Distribution (Mean = 3.04, SD = 0.81, N = 100); scores of 1 through 4 occurred 4, 19, 46, and 31 times, respectively.
Inter-rater Reliability
In order to examine the inter-rater reliability, we compared each combination of raters
and conducted a cross tabulation of their ratings. These cross tabulations allow the
readers to visualize the instances in which two raters agree for each score given. The
coefficient kappa, which is normally used to establish the level of inter-rater reliability,
was used only on some of the cases. In order to use kappa, there needs to be a parallel set
of scores for each rater (i.e., if one rater gave a score of 4 and the other rater gave 1, 2 or
3, there is no parallel comparison). See discussion regarding kappa above. For the
purposes of this assessment, the names of the raters have been replaced by colors so that
the raters remain anonymous.
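The cross tabulations in the tables that follow can be reproduced with a standard contingency-table routine; a minimal pandas sketch, again on placeholder scores rather than the project data, is shown below.

```python
# Hedged sketch: cross-tabulation of two raters' scores, as in Tables 1-15.
# Placeholder data only.
import pandas as pd

first = pd.Series([1, 2, 3, 3, 4, 2, 3, 4, 3, 2], name="First Rater Score")
second = pd.Series([1, 2, 3, 4, 4, 2, 3, 3, 3, 1], name="Second Rater Score")

# Contingency table with row/column totals, mirroring the report's layout.
table = pd.crosstab(first, second, margins=True, margins_name="Total")
print(table)

# Exact agreements are the diagonal cells (identical scores from both raters).
agreements = (first == second).sum()
print("Exact agreements:", agreements)
```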
Table 1 shows the kappa value of all participants using two-rater scoring. This table depicts the level of agreement between the scores given by all combinations of raters. As can be observed in this table, both scorers gave a score of 1 on 3 occasions; there were 13 instances of both giving a 2, 29 cases of agreement on a score of 3, and 14 instances of both scorers agreeing on a score of 4, for 59 exact agreements across the 100 scorings. For the 100 student papers scored by the 11 sets of scorers, the kappa value for the scoring is .386 (p<0.001), 95% CI (0.504, 0.848). The pairing of each packet was analyzed, and if there was a confidence interval of at least 95%, the kappa score was provided. The interpretative pattern places this value in the .40 or less range, which indicates Poor Agreement.
Table 1
All Participants - HAP 2011 Kappa

                         Second Rater Score
First Rater Score      1     2     3     4   Total
         1             3     2     0     0     5
         2             1    13    11     3    28
         3             0     3    29    14    46
         4             0     1     6    14    21
     Total             4    19    46    31   100

NOTE: Kappa = .386
Table 2 depicts the first set of raters – Pink and Red, Packet A. Table 2 shows the level
of agreement between the scores given by Pink and Red. The darkened areas on the table
show where no ratings were given for specific score combinations. The sample size and
clustering of scores resulted in no kappa score calculated.
Table 2
Pink-Red Raters, Packet A: cross-tabulation of First Rater Score by Second Rater Score (8 papers).
NOTE: Kappa cannot be calculated.
Table 3 depicts the level of agreement between the scores given by Blue and Green,
Packet B. The darkened areas on the table show where no ratings were given for specific
score combinations. Kappa was not calculated due to the small number of scores. When
raters evaluate a small number of papers, the sparse data rule (having cells with no
values) becomes more of an issue.
Table 3
Yellow-Violet, Packet B: cross-tabulation of First Rater Score by Second Rater Score (8 papers).
NOTE: Kappa cannot be calculated.
Table 4 depicts the level of agreement between scores given by Violet and Yellow,
Packet C. As can be observed, the raters agree 5 times; that is, provide identical ratings
for the same paper. Kappa was not calculated.
Table 4
Violet-Yellow Raters, Packet C: cross-tabulation of First Rater Score by Second Rater Score (9 papers).
NOTE: Kappa cannot be calculated.
Table 5 depicts the level of agreement between scores given by Red and Violet, Packet
D. As can be observed, the raters agree 6 times; that is, provide identical ratings for the
same paper. Consequently, the kappa value was calculated for this pair with a rating of
.491 (p<0.037). This would indicate Fair to Good Agreement.
Table 5
Red-Violet Raters, Packet D: cross-tabulation of First Rater Score by Second Rater Score (9 papers).
NOTE: Kappa = .491.
Table 6 depicts the level of agreement between scores given by Yellow and Pink, Packet
E. This group had a very low number of participants (N=5) so Kappa would be
inconclusive and was not calculated.
Table 6
Yellow-Pink Raters, Packet E: cross-tabulation of First Rater Score by Second Rater Score (5 papers).
NOTE: Kappa cannot be calculated due to sparse cells.
Table 7 depicts the level of agreement between scores given by Yellow and Violet,
Packet F. Only 4 ratings of the 10 subjects matched so Kappa cannot be calculated.
Table 7
Yellow-Violet Raters, Packet F: cross-tabulation of First Rater Score by Second Rater Score (10 papers).
NOTE: Kappa cannot be calculated due to sparse cells.
Table 8 depicts the level of agreement between scores given by Pink and Yellow, Packet
G. Again, due to sparse cells, Kappa cannot be calculated.
Table 8
Pink-Yellow Raters, Packet G: cross-tabulation of First Rater Score by Second Rater Score (10 papers).
NOTE: Kappa cannot be calculated due to sparse cells.
Table 9 depicts the level of agreement between scores given by Dark Blue and Green,
Packet H. Even though this pair of raters agreed 5 times out of the 10 scores given, kappa
could not be calculated due to sparse cells.
Table 9
Dark Blue-Green Raters, Packet H: cross-tabulation of First Rater Score by Second Rater Score (10 papers).
NOTE: Kappa cannot be calculated.
Table 10 depicts the level of agreement between scores given by Dark Blue and Red,
Packet I. As can be observed, this pair of readers scored a small number of papers.
However, because of the clustering of their respective score ratings, a kappa value of .582
(p< 0.002), or Fair to Good agreement was calculated for this set of ratings.
Table 10
Dark Blue-Red Raters, Packet I: cross-tabulation of First Rater Score by Second Rater Score (11 papers).
NOTE: Kappa = .582.
Table 11 depicts the level of agreement between scores given by Green and Violet,
Packet J. Because of the clustering of their respective score ratings, a kappa
value of .457 (p<0.025), or Fair to Good agreement was calculated for this set of ratings.
Table 11
Green-Violet Raters, Packet J: cross-tabulation of First Rater Score by Second Rater Score (11 papers).
NOTE: Kappa = .457.
Table 12 depicts the level of agreement between scores given by Green and Yellow,
Packet K. Because of the asymmetrical score distribution, a kappa value cannot be
calculated.
Table 12
Green-Yellow Raters, Packet K: cross-tabulation of First Rater Score by Second Rater Score (9 papers).
NOTE: Kappa cannot be calculated.
Table 13 depicts the level of agreement between the Dark Blue, Red, and Green raters, Packet I. When there was a discrepancy of at least 2 points between the raters, a third rater examined the paper. This table has the third rater’s score substituted for whichever original score is most unlike the other two. This raised the Kappa score from .582 to .847 (p<0.001).
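A minimal sketch of this substitution rule is shown below; it assumes integer rubric scores and judges the outlier by its distance from the third reader's score, which is one reading of the description above rather than a documented procedure.

```python
# Hedged sketch of the third-reading adjustment: when the first two scores differ by
# 2 or more points, replace whichever of them is farther from the third reader's score.
# Function name and tie-breaking behavior are illustrative assumptions.
def adjust_pair(score1, score2, score3=None):
    """Return the (possibly adjusted) pair of scores used in the kappa table."""
    if score3 is None or abs(score1 - score2) < 2:
        return score1, score2              # no third reading applied
    if abs(score1 - score3) > abs(score2 - score3):
        return score3, score2              # score1 was the outlier
    return score1, score3                  # score2 was the outlier

# Placeholder example, not project data: readers give 2 and 4; the third reader gives 4.
print(adjust_pair(2, 4, 4))                # -> (4, 4)
```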
Table 13
Dark Blue-Red-Green Raters, Packet I: cross-tabulation of First Rater Score by Second Rater Score after third-reading adjustments (11 papers).
NOTE: Kappa = .847.
Table 14 depicts the level of agreement between scores given by Green, Violet, and Yellow, Packet J. Yellow was the third rater, and adding this adjustment changed the kappa value from .457 (p<0.025) to .588 (p<0.005). This indicates Fair to Good Agreement.
Table 14
Green-Violet-Yellow Raters, Packet J: cross-tabulation of First Rater Score by Second Rater Score after third-reading adjustments (11 papers).
NOTE: Kappa = .588.
Table 15 shows the kappa value of all participants using three-rater scoring. Packets I and J had discrepancies in the scoring where the two results were 2 or more points apart. Four student papers were scored by a third rater. With these adjustments, the kappa value went from .386 (p<0.001) to .427 (p<0.001), indicating Fair to Good Agreement.
Table 15
All Participants - HAP 2011 Kappa

                         Second Rater Score
First Rater Score      1     2     3     4   Total
         1             3     2     0     0     5
         2             1    13    12     0    28
         3             0     3    29    14    46
         4             0     0     6    17    21
     Total             4    19    46    31   100

NOTE: Kappa = .427
Summary of Rating Analysis

- This cycle of ratings for HAP was based on 100 subjects and had a mean value of 2.935 and a standard deviation of .8. This can be compared to the 2005 HALP ratings, which were based on 195 scorings with a mean HALP value of 2.32 and a standard deviation of 0.9.

- It appears that students earned higher ratings in 2011 than in 2005, which may indicate better understanding of the material.

- Eleven different pairs of raters scored the material. Raters were assigned a color in order to maintain anonymity.

- Inter-rater reliability was obtained by using the coefficient kappa. Kappa is a symmetrical analysis in which there needs to be a parallel set of scores for each rater (i.e., if one rater gave a score of 4 and the other rater gave a 1, 2, or 3, there is no parallel comparison).

- Wide dispersion of score ratings and a low number of subjects prevented the kappa statistic from being calculated for some packets.

- The kappa coefficient for the data set is .386 with two raters and .427 with three raters.
Submitted by,
Kristy Bishop, Ph.D.
Director of Institutional Research and Assessment
Metropolitan Community College
September 6, 2011
HUMANITIES ASSESSMENT PROJECT
SUPPLEMENTAL ANALYSIS
Spring Semester 2011
This document is a supplemental analysis of the HAP data collected for Spring 2011.
Analysis was conducted on 100 scores. A previous document contained the kappa values
focusing on inter-rater reliability. This document examines how groups of students
scored on the instrument given specific demographic and academic characteristics.
The analysis is conducted using three scores: the Reader 1 HAP score, the Reader 2 HAP score, and an Average Score computed as the mean of the Reader 1 and Reader 2 scores. Each score is presented using the mean for that score and its associated standard deviation.
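The group breakdowns in Tables 1 through 11 are means and standard deviations of the three score variables within each group; the pandas sketch below illustrates that computation on placeholder data (the column names and values are assumptions, not the actual student records).

```python
# Hedged sketch: mean and standard deviation of HAP scores by a grouping variable.
# Placeholder data only; column names are illustrative.
import pandas as pd

df = pd.DataFrame({
    "reader1": [3, 2, 4, 3, 2, 3],
    "reader2": [3, 3, 4, 2, 2, 4],
    "group":   ["White", "Minority", "White", "White", "Minority", "White"],
})
df["average"] = (df["reader1"] + df["reader2"]) / 2   # Average HAP Score

# Mean and standard deviation of each score within each group.
summary = df.groupby("group")[["reader1", "reader2", "average"]].agg(["mean", "std"])
print(summary.round(2))
```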
Table 1 shows the scores for each HAP Score Group. Notice the similarity of Mean
scores and their standard deviations. Reader 1 and Reader 2 scores are very similar, with identical standard deviations. Reader 3 scored only 4 papers, so its mean and standard deviation are not listed below.
Table 1
HAP Scores

HAP Score Group          MEAN    Standard Deviation
Reader 1 HAP Score       2.83    .81
Reader 2 HAP Score       3.04    .81
Average HAP Score*       2.94    .82

NOTE: Analysis based on 100 scores.
* Represents a simple average of the Reader 1 and Reader 2 scores.
Table 2 shows the HAP scores by Racial/Ethnic Affiliation. As can be seen, White students scored .47 points higher than their Minority counterparts using the Average HAP score. This difference was .28 in 2005. Regardless of the score used (Reader 1, Reader 2, or Average), White students scored higher than their Minority counterparts. Readers are cautioned to avoid oversimplification of the difference. The wide variation in group sizes (N) makes measures of association of little value.
Table 2
HAP Scores by Race/Ethnic Affiliation

Demographic Group                 MEAN    Standard Deviation
White Students (N=75)
  Reader 1 HAP Score              2.91    .75
  Reader 2 HAP Score              3.18    .72
  Average HAP Score               3.05    .75
Minority Students (N=25)
  Reader 1 HAP Score              2.58    .95
  Reader 2 HAP Score              2.58    .91
  Average HAP Score               2.58    .93
Table 3 shows the variation in HAP scores based on student age. The age distinctions used are those for “Traditional Students” (students aged 24 years or less) and “Non-Traditional Students” (students aged 25 years or greater). As can be seen, younger students scored higher than their older counterparts. This is the opposite of the results in 2005. As with the analysis for Racial/Ethnic Affiliation, the sample size differences are too great to perform any measure of association that would be meaningful.
Table 3
HAP Scores by Age Group

Demographic Group                        MEAN    Standard Deviation
Students Aged 24 Years or Less (N=83)
  Reader 1 HAP Score                     2.86    .81
  Reader 2 HAP Score                     3.06    .83
  Average HAP Score                      2.96    .77
Students Aged 25 Years or More (N=17)
  Reader 1 HAP Score                     2.71    .82
  Reader 2 HAP Score                     2.94    .73
  Average HAP Score                      2.82    .78

NOTE: Based on an analysis of the Traditional/Non-Traditional Paradigm.
Table 4 examines students whose COMPASS Write placement score would have placed them into Developmental English. In 2005, placement in Developmental English was based on a COMPASS Write score of 64 or less. In 2011, placement in Developmental English was based on a COMPASS Write score of 69 or less. The difference between the Developmental English and non-Developmental English groups is not very large, which could be due to the small sample size.
Table 4
HAP Scores by Write Placement

Demographic Group                             MEAN    Standard Deviation
Placed into Developmental English (N=25)
  Reader 1 HAP Score                          2.72    .87
  Reader 2 HAP Score                          2.96    .82
  Average HAP Score                           2.84    .86
Not Placed into Developmental English (N=75)
  Reader 1 HAP Score                          2.87    .79
  Reader 2 HAP Score                          3.07    .81
  Average HAP Score                           2.97    .80

NOTE: Placement in Developmental English is based on a COMPASS Write Placement score of 69 or less; the placement score for ENGL 101 is 70.
Table 5 compares students who had completed 40 or more credit hours at the time of the HAP assessment with those who had completed fewer than 40 credit hours. Students who had earned 40 credit hours or more prior to taking the HAP assessment scored slightly higher than their “less than 40” counterparts.
Table 5
HAP Scores by </> 40 Credit Hours

Demographic Group                          MEAN    Standard Deviation
Earned Less than 40 Credit Hours (N=48)
  Reader 1 HAP Score                       2.77    .85
  Reader 2 HAP Score                       2.96    .82
  Average HAP Score                        2.86    .84
Earned More than 40 Credit Hours (N=52)
  Reader 1 HAP Score                       2.88    .78
  Reader 2 HAP Score                       3.12    .80
  Average HAP Score                        3.00    .80
Table 6 shows the HAP scores by student cumulative GPA. As can be seen, and
probably with no surprise, students with a cumulative GPA of 2.00 or higher tend to
perform much better on the HAP assessment. In fact, the difference is quite large, .71
scale points.
Table 6
HAP Scores by CumGPA

Demographic Group                   MEAN    Standard Deviation
Cum GPA Less than 2.00 (N=10)
  Reader 1 HAP Score                2.00    .77
  Reader 2 HAP Score                2.60    1.11
  Average HAP Score                 2.30    1.00
Cum GPA Greater Than 2.00 (N=90)
  Reader 1 HAP Score                2.92    .76
  Reader 2 HAP Score                3.09    .75
  Average HAP Score                 3.01    .76

NOTE: Cum GPA is used as an index of total college experience. A cum GPA of “Less than 2.00” means 1.99 or less.
Table 7 shows those students who have enrolled in ENGL 101 prior to taking the HAP
assessment. As can be seen, there is a small difference in the two sets of scores.
According to these data, having completed ENGL 101 does provide students a small
score advantage when taking the HAP assessment.
Table 7
HAP Scores by Enrolled ENGL 101

Group                                  MEAN    Standard Deviation
Haven’t enrolled in ENGL 101 (N=27)
  Reader 1 HAP Score                   2.67    .90
  Reader 2 HAP Score                   3.04    .88
  Average HAP Score                    2.85    .90
Enrolled in ENGL 101 (N=73)
  Reader 1 HAP Score                   2.89    .77
  Reader 2 HAP Score                   3.04    .78
  Average HAP Score                    2.97    .78
Table 8 shows the relative score difference between students who have and have not completed ENGL 102 prior to taking the HAP assessment. As can be seen from the data, students with ENGL 102 experience earn slightly higher HAP assessment scores than do students who have not completed ENGL 102.
Table 8
HAP Scores by Enrolled ENGL 102

Group                                  MEAN    Standard Deviation
Haven’t enrolled in ENGL 102 (N=36)
  Reader 1 HAP Score                   2.61    .92
  Reader 2 HAP Score                   2.83    .83
  Average HAP Score                    2.72    .88
Enrolled in ENGL 102 (N=64)
  Reader 1 HAP Score                   2.95    .72
  Reader 2 HAP Score                   3.16    .77
  Average HAP Score                    3.05    .75
Table 9 shows the HAP scores for those students who have completed either ENGL 101
or 102 and have taken the HAP assessment. Students with “COMP” experience do earn
higher scores than students who have no “COMP” experience.
Table 9
HAP Scores by Enrolled in COMP

Group                        MEAN    Standard Deviation
No COMP (N=15)
  Reader 1 HAP Score         2.47    1.02
  Reader 2 HAP Score         2.67    .94
  Average HAP Score          2.57    .99
Enrolled in COMP* (N=85)
  Reader 1 HAP Score         2.89    .75
  Reader 2 HAP Score         3.11    .77
  Average HAP Score          3.00    .77

NOTE: * Refers to any tested student that has completed either ENGL 101 or 102 or both.
Table 10 shows the variation in HAP scores based on student gender. As can be seen,
female students scored higher than male students. As with the analysis for Racial/Ethnic
Affiliation, the sample size differences are too great to perform any measure of
association that would be meaningful.
Table 10
HAP Scores by Gender

Demographic Group            MEAN    Standard Deviation
Male Students (N=44)
  Reader 1 HAP Score         2.68    .85
  Reader 2 HAP Score         3.01    .79
  Average HAP Score          2.83    .81
Female Students (N=56)
  Reader 1 HAP Score         2.95    .77
  Reader 2 HAP Score         3.09    .85
  Average HAP Score          3.02    .81
Table 11 shows the variation in HAP scores based on student enrollment. Full-Time enrollment is based on the student’s enrollment in 12 or more hours on the census date; Part-Time enrollment is 11 or fewer hours on the census date. As can be seen, full-time students scored higher than part-time students. The differences in sample size are too great to perform any measure of association that would be meaningful.
Table 11
HAP Scores by Full-Time/Part-Time Enrollment

Group                           MEAN    Standard Deviation
Part-Time Enrollment (N=12)
  Reader 1 HAP Score            2.41    .94
  Reader 2 HAP Score            2.64    .98
  Average HAP Score             2.52    .97
Full-Time Enrollment (N=78)
  Reader 1 HAP Score            2.95    .73
  Reader 2 HAP Score            3.15    .72
  Average HAP Score             3.05    .73
Any questions regarding this report should be directed to:
Kristy Bishop, Ph.D.
Director of Institutional Research and Assessment
September 26, 2011