2010-11 HAP Proposal

Project Title: 2010-11 Humanities Assessment Project
Persons Submitting Proposal: Cathy Hardy (2321) & Anne Dvorak (2260)

I. Brief Description
The Humanities Assessment Project has undergone a number of iterations since 2001. Originally conceived as a 40-minute timed essay response, it has changed to an essay involving the writing process and currently will require a ten-page documented essay on a particular prompt. Our assumption is that a more involved, capstone-like measure would be the best means of demonstrating significant competence for the outcome. The structure for assessing this measure has changed relatively little since the inception of the assessment tool. Faculty across the humanities disciplines will be normed to a four-point scale and will evaluate the essays holistically using a double-blind technique. Prior iterations of the tool have shown problematic inter-rater reliability. The goal for this iteration is to regularize the prompt and perhaps tweak the scoring rubric to make scoring easier across the disciplines. The data sample at present comes from Longview and perhaps Maple Woods; the size of the sample and the type of courses under study would be similar to the previous study in order to determine any significant changes over time. The prompt would ask the following:

A. Humanities Question: Identify and articulate the aesthetic standards used to determine if a piece of music, art or literature is a masterpiece or classic.

II. MCC GenEd Outcome Assessed
Our measure is derived from the MCC outcome for the humanities, specifically subcategories 2 and 5, described as follows:

Humanities Component
Courses in MCC's general education component will provide opportunities for students to develop their understanding of the ways in which humans have addressed their condition through imaginative work in creative art and speculative thought; to deepen their understanding of how that imaginative process is informed and limited by social, cultural, linguistic, and historical circumstances; and to recognize that the virtual world of the creative imagination is a form of knowledge different from, but as important as, empirical knowledge.

Students will be able to:
2: identify aesthetic standards used to make critical judgments.
5: articulate a response to participation in, or observance of, works in the arts and humanities based upon aesthetic standards.

III. Detailed Description of Assessment

A. Administration

1. What questions/prompts/artifacts will be used to assess which outcome?
The question faculty will provide students as a writing prompt is to "Identify and articulate the aesthetic standards used to determine if a piece of music, art or literature is a masterpiece or classic." A final eight- to ten-page, multiple-revision paper with appropriate MLA documentation and citation will be submitted by students in Longview humanities courses at the end of the 2011 spring semester.

2. How will the assessment be administered?
The written prompt will be provided to faculty before the beginning of the spring 2011 semester, to be embedded in the course content. The final draft will be used for the assessment project.

3. To how many students/classes will the assessment be administered?
The anticipated sample will provide 10 to 15 classes with an average of 20 responses per class, yielding between 200 and 300 student submissions. Such a sample size duplicates the previous iteration of the tool.

4. In which classes will the assessment be administered?
Music Appreciation, Music of the World's Cultures, Art History, and 200-level literature classes will administer the assessment. Students in these classes are likely to have taken 30+ credit hours; hence, a random sample of students meeting this data element will be extracted from the entire sample under study for further analysis.

B. Evaluation of Responses

1. How will the student responses be scored or rated?
The artifacts will be rated using the 4-point Humanities rubric developed for the 2005 HALP. It is possible, based on rubrics developed recently by other schools, that this rubric will be tweaked slightly to ensure clear guidelines. Each artifact will have a minimum of two readings. Non-content instructors will rate the submissions. If there is more than a 1-point discrepancy, the artifact will receive a third, non-content instructor reading (see the sketch following the rubric below).

Rubric for Humanities Assessment Project
Students will identify aesthetic criteria, theory or cultural context and will apply such to analyze or interpret a work in the Humanities.

4. Outstanding
  Conceptual: Can define in detail appropriate aesthetic criteria, theory or cultural context; can use aesthetic criteria, theory or cultural context accurately; can synthesize ideas beyond restatements of the classroom context.
  Analytical: Provides strong, effective and detailed evidence which is significant and appropriate to the chosen interpretation; interprets evidence in a sophisticated manner to support a thesis; can use self-evident arguments and can go beyond the self-evident to explain and support arguments with relevant evidence.
  Terminology: Can use subject terminology consistently and accurately; can redefine terms in their own words; can recognize the historical or cultural origin of terms.

3. Acceptable
  Conceptual: Can define aesthetic criteria, theory or cultural context in general terms; can recall and restate classroom arguments which use aesthetic criteria, theory or cultural context.
  Analytical: Uses specific evidence, but it may not be singularly appropriate or significant; interpretations are sporadic, use inappropriate evidence, or simply restate classroom observations.
  Terminology: Terminology is present but not used consistently and is applied only partially; understanding is implied but not clearly evident; restates verbatim classroom experience.

2. Borderline
  Conceptual: Attempts to define aesthetic criteria, theory or cultural context in a vague way; vague understanding of aesthetic criteria, theory or cultural context causes a misreading of the work under consideration.
  Analytical: Does not always provide evidence when needed; significant interpretations are missing; arguments do not stand without generous reading between the lines and assumptions of understanding by the reader.
  Terminology: Attempts to use terminology but may misapply it or display only partial understanding; terminology is vague and inconsistent.

1. Unacceptable
  Conceptual: Does not define, identify or apply aesthetic criteria, theory or cultural context; grossly misreads the work with arbitrary and unsupported responses.
  Analytical: Summarizes or describes a work with no attempt to interpret or analyze; discussion is disconnected completely from referenced criteria, theory or cultural context.
  Terminology: No relevant terminology used; no appreciation of the need for clear or specific terminology; terminology grossly misused.
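The two-reading rule described in B.1 above can be summarized in a short sketch. This is an illustration only, assuming Python and hypothetical function names; the proposal itself specifies only that two blind readings are taken and that a discrepancy of more than 1 point triggers a third, non-content reading. How the readings are combined into a single score (averaging, or substituting the third reading for the most discrepant score, as done in the 2011 results) is our assumption, not a requirement of the proposal.

    # Illustrative sketch of the HAP two-reading workflow (hypothetical helper names).
    from typing import Optional

    def needs_third_reading(score1: int, score2: int) -> bool:
        """Return True when two blind readings on the 1-4 rubric differ by more than 1 point."""
        return abs(score1 - score2) > 1

    def resolve_score(score1: int, score2: int, score3: Optional[int] = None) -> float:
        """Combine readings into one score (assumed procedure, not specified in the proposal).

        If a third reading exists, it replaces whichever of the first two scores is
        farther from it (the substitution used in the 2011 kappa tables); otherwise
        the two readings are averaged, as in the 2011 supplemental analysis.
        """
        if score3 is not None:
            keep = score1 if abs(score1 - score3) <= abs(score2 - score3) else score2
            return (keep + score3) / 2
        return (score1 + score2) / 2

    print(needs_third_reading(3, 4))  # False: a 1-point difference needs no third reading
    print(needs_third_reading(2, 4))  # True: a 2-point discrepancy triggers a third reading
    print(resolve_score(2, 4, 4))     # 4.0 once the third reading replaces the outlying 2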
2. Who will score or rate the responses?
Non-content instructors will rate the responses. For example, music instructors will rate art and literature responses; art instructors will rate music and literature responses. Content instructors will not rate responses in their own discipline. The scoring will be double-blind: faculty will not know the other scorer or the other score, and they will not know which instructor taught the class.

3. Will evaluators need to be trained or normed?
Yes. First, all instructors will require training in appropriate question/prompt development. Issues surfaced in the 2005 HALP with inter-rater reliability. A detailed description of the inter-rater reliability issue can be found in the 2005 HALP report, located under Reports on the MCC Assessment Office web page. Below is a brief explanation of the issue.

In Appendix 3 of the 2005 HALP report, the data drawn from the research office indicated that only one set of raters (Table 17) reached Fair to Good Agreement on the kappa scores (inter-rater reliability). When there was enough data to calculate inter-rater reliability, most raters were in Poor Agreement. This was not a surprise to the HALP committee. During the faculty debriefing sessions after each rating period, concerns were expressed about the inconsistency among the various prompts prepared by the faculty for the HALP. Some prompts were structured according to the HALP rubric guidelines; other prompts did not request information in that depth. Raters were conflicted as to how to score submissions that were not structured according to the HALP rubric. The kappa scores reflect that conflict.

Proposed Intervention
Training for faculty willing to participate in the HALP will be necessary in order to develop questions that incorporate the three aspects of the rubric (Conceptual Framework, Analytical Skills, and Use of Terminology). Prompts and assessment questions need to be developed with the understanding that raters cannot know the specific classroom context in which the question is used. The structure of HALP prompts should provide the framework for raters devoid of specific classroom context. A faculty training retreat is the suggested plan of action. Close work with the WAC coordinator on question and assignment development will benefit the HALP contributors.

Secondly, once the appropriate prompts/questions are embedded in the course content, the final draft will be submitted for the assessment project at the end of the semester. The classroom instructor will give the student the final grade, and all papers are to be submitted to the HALP coordinator. The classroom instructor will determine which papers would be good examples of a 4 (Outstanding) response, a 3 (Acceptable) response, a 2 (Borderline) response, and a 1 (Unacceptable) response. The classroom instructor will indicate to the HALP coordinator that these submissions are to be used for the norming session. At no time is the instructor to give any indication of his or her ratings. The coordinator will assemble the norming packets, and the classroom instructor will explain to the raters the prompt and the desired outcome. The other instructors will rate and discuss each submission until a consistent understanding is achieved.

C. Reporting of Results

How will the results of the assessment be reported to faculty?
The assessment results will be reported in the form of a written report located on the Assessment Office's web page. Information will also be shared in breakout sessions during the district in-service and faculty convocation.
HAP committee members will present the results to any constituent of the MCC district upon invitation. HAP will also share the results with any of the Humanities Divisions at a division meeting.

How will the discussion of the results, interpretation and intervention be facilitated?
Careful notes of comments, interpretations and debriefing conversations will be incorporated into the feedback year 2011/12. If interventions are deemed necessary, the committee will collect suggestions from faculty at debriefing and create an intervention plan.

IV. Resources Required by Assessment

A. Personnel
All instructors submitting artifacts or responses will be urged to become raters for the assessment project. The goal is to have all instructors involved. Additional faculty in the humanities will be recruited as needed.

B. Compensation
All raters will be paid $25.00 an hour for training, norming and assessing. The eight- to ten-page paper proposed for 2011 will require 30 minutes to assess. (This figure represents the total time for two readings; each reading will take approximately 15 minutes.) Based on the previous HALP study, we anticipate that 30% of the papers will require a third reading, at an average of 15 minutes per paper.

Minimum: 200 papers, 10 raters
  Training (4 hours per rater): 40 hours x $25.00 = $1,000.00
  Norming (2.5 hours per rater): 25 hours x $25.00 = $625.00
  Assessment on 200 papers (5 hours per rater): 50 hours x $25.00 = $1,250.00
  Assessment on 60 papers requiring a 3rd reading (1.5 hours per rater): 15 hours x $25.00 = $375.00
  Total cost on 200 papers: $3,250.00

Maximum: 300 papers, 15 raters
  Training (4 hours per rater): 60 hours x $25.00 = $1,500.00
  Norming (2.5 hours per rater): 37.5 hours x $25.00 = $937.50
  Assessment on 300 papers (5 hours per rater): 75 hours x $25.00 = $1,875.00
  Assessment on 90 papers requiring a 3rd reading (1.5 hours per rater): 22.5 hours x $25.00 = $562.50
  Total cost on 300 papers: $4,875.00

Coordinator Compensation, Spring 2011: 3 work units x $800.00 = $2,400.00
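The personnel figures above are straightforward hours-times-rate products. The sketch below simply reproduces the subtotals and the $7,662.50 maximum project cost as a check; the script and its variable names are ours, not part of the proposal.

    # Reproduces the Section IV budget arithmetic (hypothetical variable names).
    RATE = 25.00  # dollars per rater-hour

    def personnel_cost(raters: int) -> float:
        """Training (4 h) + norming (2.5 h) + assessment (5 h) + third readings (1.5 h), per rater."""
        hours_per_rater = 4 + 2.5 + 5 + 1.5
        return raters * hours_per_rater * RATE

    minimum = personnel_cost(10)  # 200 papers, 10 raters -> $3,250.00
    maximum = personnel_cost(15)  # 300 papers, 15 raters -> $4,875.00
    coordinator = 3 * 800.00      # 3 work units at $800.00 -> $2,400.00
    supplies = 8.55 + 7.40 + 14.56 + 150.00 + 200.00 + 6.99  # itemized in Section C below -> $387.50

    print(f"Minimum personnel cost: ${minimum:,.2f}")
    print(f"Maximum personnel cost: ${maximum:,.2f}")
    print(f"Maximum project cost:   ${maximum + coordinator + supplies:,.2f}")  # $7,662.50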
Coordinator Duty Description
1. Print and assemble training packets for each rater.
   A. Develop training materials with the WAC Coordinator.
   B. Copy and assemble all materials.
   C. Facilitate the training session.
2. Assemble the norming packets.
   A. Gather instructors' prompts and responses representing a 4 paper, a 3 paper, a 2 paper, and a 1 paper.
   B. Print sample rating sheets on 2-part carbonless paper.
   C. Copy all papers and prompts and assemble the norming packets.
   D. Facilitate the norming session, note comments, and enter scores on the artifacts used for the norming session.
3. Organize and facilitate the assessment session.
   A. Collect all artifacts.
   B. Review each paper, removing or covering up the student name, and assure papers carry only the student ID number. Label each paper with the packet letter and order number.
   C. Put 8 to 10 papers in each packet. Packets are identified by letters.
   D. Use Metrosoft to print each class roster. Place the packet letter(s) on each roster.
   E. Download student ID numbers and place them on each score sheet.
   F. Assign and print color labels that anonymously identify each rater and still allow kappa score analysis.
   G. Develop the raters' scoring sheets in Excel and print them on 2-part carbonless paper.
   H. Build a spreadsheet for each packet.
   I. Enter the raters' scores into the spreadsheet.
   J. Prepare the hard copy score sheets for shipment to the district assessment office.
   K. Email, burn onto CD, and mail a hard copy of the spreadsheet to the Office of Assessment.
   L. Work and coordinate with the Office of Assessment on interpretation of the data.
   M. Write the Spring 2011 HALP report.
   N. Prepare and serve refreshments and lunch.

C. Itemized Cost of Materials
1. 1 x 2 5/8 labels for rater color-code labels, packet labels, and paper labels: $8.55
2. 1 ream of 2-part carbonless paper for score sheets: $7.40
3. 10x12 clasp envelopes for packets: $14.56
4. Copy expense for prompts and norming examples: $150.00
5. Assessment day refreshments and lunch for 15 raters and the coordinator: $200.00
6. Banker box for artifact and hard copy data storage: $6.99
Supplies Cost: $387.50

Maximum Project Cost: $7,662.50

V. Supporting Information

1. Literature
The humanities is one of the last general education outcomes to be assessed, for two reasons. First, humanities faculty have often been utilized for conducting other assessment projects, especially in communication and in critical thinking, so the human resources have not been available. Secondly, the literature has been silent regarding other humanities measures in academia, so we have had no models for guidance. There may be other studies done across the country, but the only evidence of such lies in a few webpage rubrics. Notable studies follow.

Parkland College collected essays from their literature classes and scored them on a four-point scale on such criteria as "a) ability to analyze and interpret, b) ability to make clear connections between ideas, c) ability to support a stance with textual evidence, d) ability to recognize and acknowledge variant readings and/or ambiguities in meaning, e) ability to write clearly, with appropriate terminology."

A second study, in College Teaching (2000), describes an assessment project for music listening classes that used a three-point rubric to discern whether students "1. listen actively to the music, 2. describe the music, in both plain English and musical language, and discuss it in terms of style; 3. make connections between the music and its social and historical context, 4. write coherently about music."

A third project, a portfolio including an "aesthetic analysis," asks students to choose an analysis they have written for one of the visual or performing arts. Students are instructed to "demonstrate [their] ability to analyze the work's form, structure, and contexts; ultimately, it should interpret the work in some way." Students are also asked to "describe the analytical thinking involved in the entry. . . . [to provide a] judgment about the quality and the 'representativeness' of [their] use of analysis and/or evaluation."

Our study is more ambitious than the studies listed here: most of these assessments address only one discipline; some blend writing skills with appreciation skills in their assessment; and they use a variety of measures, from tests to essays to portfolios. The ambitiousness of our study may explain the difficulty we are having in achieving strong inter-rater reliability; however, we are confident that such reliability can be achieved. The HALP committee feels there is much merit to the MCC-Longview Humanities Rubric. Even though the 2005 kappa scores generally showed Poor Agreement, there was one exception in the HALP 2005 assessment. The prompt used by the music department for the 2005 assessment was consistent across all instructors.
It dealt with all three aspects of the rubric: Terminology, Analytical, and Conceptual. The inter-rater reliability was 78.9%, which placed the kappa scores for music in the Excellent Agreement category. (See the Longview section of the 2008 Music Program Evaluation for more information.) With that evidence, the HALP committee feels that the rubric has great merit, and with the appropriate training and norming we expect to see a vast improvement in the Humanities kappa scores along with a reduction in third readings.

VI. Bibliography
Even 5 years after the 2005 HALP report, there is still not a lot of literature on humanities assessment projects in the country. The following websites have some information on projects at a few institutions.

"Assessment of Student Learning Outcomes in General Education: Humanities 2004-05." Assessment at Buffalo State: State University of New York. Office of Academic Information and Assessment. 1999. Web. 24 March 2010. <http://www.buffalostate.edu/offices/assessment/assessment/humant.htm>

Dallinger, Judith M., and Karen B. Mann. "Assessing Student Knowledge of and Attitudes Towards the Humanities." College Teaching 48.3 (Summer 2000): 95-101. JSTOR. Web. 24 March 2010.

Mann, Karen B. "You Can Herd CATs: Assessing Learning in the Humanities." College Teaching 48.3 (Summer 2000): 82-89. JSTOR. Web. 24 March 2010.

Murphy, John. "Case Study: Lessons Learned from Humanities Assessment in Music." College Teaching 48.3 (Summer 2000): 102-103. JSTOR. Web. 24 March 2010.

University of South Florida. "Foundations of Knowledge and Learning Core Curriculum: Humanities Assessment Rubric." Web. 24 March 2010. <http://usfweb2.usf.edu/assessment/Resources/Humanities%20core%20area%20integrated%20assessment%20rubric%20.pdf>

"Case Study: Assessing Students' Learning in Introduction to Art." College Teaching 48.3 (Summer 2000): 90-94. JSTOR. Web. 24 March 2010.

Though there is still not a lot of literature on other assessment projects, the rubric developed for the HAP project by Anne Dvorak is in keeping with some of these other institutions' holistic humanities rubrics.

Humanities Assessment Project 2011 Results

Overview
The Humanities Assessment Project has undergone a number of iterations since 2001. Originally conceived as a 40-minute timed essay response, it has changed to an essay involving the writing process and now requires a final eight- to ten-page documented essay on a particular prompt. The assumption is that a more involved, capstone-like measure would be the best means of demonstrating significant competence for the outcome. The structure for assessing this measure has changed relatively little since the inception of the assessment tool. Faculty across the humanities disciplines are normed to a four-point scale and evaluate the essays holistically using a double-blind technique. Prior iterations of the tool have shown problematic inter-rater reliability. The goal for this iteration was to regularize the prompt and perhaps tweak the scoring rubric to make scoring easier across the disciplines. The data sample comes from Longview and from multiple disciplines (i.e., Music, Literature, Art, etc.). A total of 100 students participated in this assessment during the spring 2011 semester.
The prompt asked the following Humanities Question: Identify and articulate the aesthetic standards used to determine if a piece of music, art or literature is a masterpiece or classic.

This measure was derived from the MCC outcome for the humanities, specifically subcategories 2 and 5, described as follows:

Humanities Component
Courses in MCC's general education component will provide opportunities for students to develop their understanding of the ways in which humans have addressed their condition through imaginative work in creative art and speculative thought; to deepen their understanding of how that imaginative process is informed and limited by social, cultural, linguistic, and historical circumstances; and to recognize that the virtual world of the creative imagination is a form of knowledge different from, but as important as, empirical knowledge.

Students will be able to:
2: identify aesthetic standards used to make critical judgments.
5: articulate a response to participation in, or observance of, works in the arts and humanities based upon aesthetic standards.

There were two readings for each packet of papers. Papers were scored by non-content instructors using the following 4-point rubric:

4. Outstanding
  Conceptual: Can define in detail appropriate aesthetic criteria, theory or cultural context; can use aesthetic criteria, theory or cultural context accurately; can synthesize ideas beyond restatements of the classroom context.
  Analytical: Provides strong, effective and detailed evidence which is significant and appropriate to the chosen interpretation; interprets evidence in a sophisticated manner to support a thesis; can use self-evident arguments and can go beyond the self-evident to explain and support arguments with relevant evidence.
  Terminology: Can use subject terminology consistently and accurately; can redefine terms in their own words; can recognize the historical or cultural origin of terms.

3. Acceptable
  Conceptual: Can define aesthetic criteria, theory or cultural context in general terms; can recall and restate classroom arguments which use aesthetic criteria, theory or cultural context.
  Analytical: Uses specific evidence, but it may not be singularly appropriate or significant; interpretations are sporadic, use inappropriate evidence, or simply restate classroom observations.
  Terminology: Terminology is present but not used consistently and is applied only partially; understanding is implied but not clearly evident; restates verbatim classroom experience.

2. Borderline
  Conceptual: Attempts to define aesthetic criteria, theory or cultural context in a vague way; vague understanding of aesthetic criteria, theory or cultural context causes a misreading of the work under consideration.
  Analytical: Does not always provide evidence when needed; significant interpretations are missing; arguments do not stand without generous reading between the lines and assumptions of understanding by the reader.
  Terminology: Attempts to use terminology but may misapply it or display only partial understanding; terminology is vague and inconsistent.

1. Unacceptable
  Conceptual: Does not define, identify or apply aesthetic criteria, theory or cultural context; grossly misreads the work with arbitrary and unsupported responses.
  Analytical: Summarizes or describes a work with no attempt to interpret or analyze; discussion is disconnected completely from referenced criteria, theory or cultural context.
  Terminology: No relevant terminology used; no appreciation of the need for clear or specific terminology; terminology grossly misused.
Raters for the material were non-content faculty members divided into groups designated by colors (e.g., Red, Green, etc.). The scoring was double-blind: faculty did not know the other scorer or the other score, and they did not know which instructor taught the class. During the 2005 HALP assessment, there were some issues regarding inter-rater reliability. A detailed description of the inter-rater reliability issue can be found in the 2005 HALP report located under Reports on the MCC Assessment Office web page. For this administration, all instructors required training in appropriate question/prompt development.

The inter-rater reliability for the scoring groups was evaluated using kappa. Kappa is a measure of inter-rater agreement that tests whether the counts in the cells of a scoring activity differ from those expected by chance alone. The important thing to remember about kappa is that the higher the kappa score, the more closely equivalent the two ratings are for the same piece of writing. If writings require third readings, the kappa value will decrease accordingly. As a measure, kappa values follow an interpretative pattern: .75 or higher indicates Excellent Agreement; .40 to .75 indicates Fair to Good Agreement; and .40 or less indicates Poor Agreement.

Distribution of Scores
Figure 1 depicts the distribution of scores for the first group of raters. As can be observed, the distribution follows a normal curve. The mean score for the first group is 2.83 with a standard deviation of 0.8. This can be compared to the data from 2005, which showed a mean of 2.27, a standard deviation of 0.8 and a total of 195 subjects.

Figure 1. Reader 1 Score Distribution (histogram: Mean = 2.83, SD = 0.82, N = 100; frequencies of 5, 28, 46, and 21 for scores 1 through 4).

Figure 2 shows the distribution for the second set of scores. It also follows a normal distribution. The mean for these scores is 3.04 with a standard deviation of 0.8, very similar to the characteristics of the first group of scores shown above. The 2005 data showed a mean of 2.35, a standard deviation of 0.9 and an N of 195.

Figure 2. Reader 2 Score Distribution (histogram: Mean = 3.04, SD = 0.81, N = 100; frequencies of 4, 19, 46, and 31 for scores 1 through 4).

Inter-rater Reliability
In order to examine the inter-rater reliability, we compared each combination of raters and conducted a cross tabulation of their ratings. These cross tabulations allow readers to visualize the instances in which two raters agree for each score given. The coefficient kappa, which is normally used to establish the level of inter-rater reliability, was used in only some of the cases. In order to use kappa, there needs to be a parallel set of scores for each rater (i.e., if one rater gave a score of 4 and the other rater gave a 1, 2 or 3, there is no parallel comparison). See the discussion regarding kappa above and the worked sketch below. For the purposes of this assessment, the names of the raters have been replaced by colors so that the raters remain anonymous.
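As a concrete illustration of the kappa measure described above, the sketch below computes Cohen's kappa from a square cross tabulation of paired scores. Applied to the all-participant counts reported in Table 1, which follows, it reproduces the .386 value; the function and variable names are ours, and this is a simplified check rather than the analysis actually run by the research office.

    # Cohen's kappa from a square cross tabulation of paired scores
    # (rows = first rater's score, columns = second rater's score).

    def cohen_kappa(table: list) -> float:
        n = sum(sum(row) for row in table)              # total paired scores
        row_totals = [sum(row) for row in table]        # first-rater marginals
        col_totals = [sum(col) for col in zip(*table)]  # second-rater marginals
        observed = sum(table[i][i] for i in range(len(table))) / n
        expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
        return (observed - expected) / (1 - expected)

    # Counts from Table 1 below (all participants, two-rater scoring).
    table_1 = [
        [3,  2,  0,  0],   # first rater gave a 1
        [1, 13, 11,  3],   # first rater gave a 2
        [0,  3, 29, 14],   # first rater gave a 3
        [0,  1,  6, 14],   # first rater gave a 4
    ]

    print(round(cohen_kappa(table_1), 3))  # 0.386, Poor Agreement on the scale above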
Table 1 shows the kappa value of all participants using two-rater scoring. This table depicts the level of agreement between the scores given by all combinations of raters. As can be observed in this table, on 3 occasions both scorers gave a score of 1; there were 13 instances of both giving a 2, 29 cases of agreeing on a score of 3, and 14 instances of both scorers agreeing on a score of 4, for 59 instances of agreement across the 100 paired scores. For the 100 student papers scored by the 11 sets of scorers, the kappa value for the scoring is .386 (p<0.001), 95% CI (0.504, 0.848). The pairing of each packet was analyzed, and where there was a confidence interval of at least 95%, the kappa score was provided. The interpretative pattern would rate this in the ".40 or less" category, which indicates Poor Agreement.

Table 1. All Participants, HAP 2011 Kappa (two-rater scoring)
First Rater Score (rows) by Second Rater Score (columns)
              1     2     3     4   Total
   1          3     2     0     0      5
   2          1    13    11     3     28
   3          0     3    29    14     46
   4          0     1     6    14     21
   Total      4    19    46    31    100
NOTE: Kappa = .386

Table 2 depicts the first set of raters, Pink and Red, Packet A, and shows the level of agreement between the scores they gave. The sample size and clustering of scores resulted in no kappa score being calculated.

Table 2. Pink-Red Raters, Packet A: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa cannot be calculated.

Table 3 depicts the level of agreement between the scores given by Blue and Green, Packet B. Kappa was not calculated due to the small number of scores. When raters evaluate a small number of papers, the sparse data rule (having cells with no values) becomes more of an issue.

Table 3. Yellow-Violet Raters, Packet B: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa cannot be calculated.

Table 4 depicts the level of agreement between scores given by Violet and Yellow, Packet C. As can be observed, the raters agreed 5 times; that is, they provided identical ratings for the same paper. Kappa was not calculated.

Table 4. Violet-Yellow Raters, Packet C: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa cannot be calculated.

Table 5 depicts the level of agreement between scores given by Red and Violet, Packet D. As can be observed, the raters agreed 6 times; that is, they provided identical ratings for the same paper. Consequently, the kappa value calculated for this pair is .491 (p<0.037), which indicates Fair to Good Agreement.

Table 5. Red-Violet Raters, Packet D: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa = .491

Table 6 depicts the level of agreement between scores given by Yellow and Pink, Packet E. This group had a very low number of participants (N=5), so kappa would be inconclusive and was not calculated.

Table 6. Yellow-Pink Raters, Packet E: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa cannot be calculated due to sparse cells.

Table 7 depicts the level of agreement between scores given by Yellow and Violet, Packet F. Only 4 of the 10 ratings matched, so kappa cannot be calculated.
Table 7. Yellow-Violet Raters, Packet F: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa cannot be calculated due to sparse cells.

Table 8 depicts the level of agreement between scores given by Pink and Yellow, Packet G. Again, due to sparse cells, kappa cannot be calculated.

Table 8. Pink-Yellow Raters, Packet G: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa cannot be calculated due to sparse cells.

Table 9 depicts the level of agreement between scores given by Dark Blue and Green, Packet H. Even though this pair of raters agreed on 5 of the 10 scores given, kappa could not be calculated due to sparse cells.

Table 9. Dark Blue-Green Raters, Packet H: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa cannot be calculated.

Table 10 depicts the level of agreement between scores given by Dark Blue and Red, Packet I. As can be observed, this pair of readers scored a small number of papers. However, because of the clustering of their respective score ratings, a kappa value of .582 (p<0.002), or Fair to Good Agreement, was calculated for this set of ratings.

Table 10. Dark Blue-Red Raters, Packet I: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa = .582

Table 11 depicts the level of agreement between scores given by Green and Violet, Packet J. Because of the clustering of their respective score ratings, a kappa value of .457 (p<0.025), or Fair to Good Agreement, was calculated for this set of ratings.

Table 11. Green-Violet Raters, Packet J: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa = .457

Table 12 depicts the level of agreement between scores given by Green and Yellow, Packet K. Because of the asymmetrical score distribution, a kappa value cannot be calculated.

Table 12. Green-Yellow Raters, Packet K: cross tabulation of First Rater Score by Second Rater Score.
NOTE: Kappa cannot be calculated.

Table 13 depicts the level of agreement among the Dark Blue, Red and Green raters for Packet I. When there was a discrepancy of at least 2 points between the raters, a third rater examined the paper. This table substitutes the third rater's score for the score that is most unlike the other two. This raised the kappa score from .582 to .847 (p<0.001).

Table 13. Dark Blue-Red-Green Raters, Packet I: cross tabulation with third-rater substitutions.
NOTE: Kappa = .847

Table 14 depicts the level of agreement between scores given by Green, Violet and Yellow, Packet J. Yellow was the third rater, and adding this adjustment changed the kappa value from .457 (p<0.025) to .588 (p<0.005). This indicates Fair to Good Agreement.

Table 14. Green-Violet-Yellow Raters, Packet J: cross tabulation with third-rater substitutions.
NOTE: Kappa = .588

Table 15 shows the kappa value of all participants using three-rater scoring. Packets I and J had scoring discrepancies in which the two scores were at least 2 points apart, and four student papers were scored by a third rater. With these adjustments, the kappa value went from .386 (p<0.001) to .427 (p<0.001), or Fair to Good Agreement.
Table 15. All Participants, HAP 2011 Kappa (three-rater scoring)
First Rater Score (rows) by Second Rater Score (columns)
              1     2     3     4   Total
   1          3     2     0     0      5
   2          1    13    12     0     28
   3          0     3    29    14     46
   4          0     0     6    17     21
   Total      4    19    46    31    100
NOTE: Kappa = .427

Summary of Rating Analysis
This cycle of ratings for HAP was based on 100 subjects and had a mean value of 2.935 and a standard deviation of .8. This can be compared to the 2005 HALP ratings, which were based on 195 scorings with a mean HALP value of 2.32 and a standard deviation of 0.9. It appears that students earned higher ratings in 2011 than in 2005, which may indicate better understanding of the material. Eleven different pairs of raters scored the material. Raters were assigned a color in order to maintain anonymity. Inter-rater reliability was obtained by using the coefficient kappa. Kappa is a symmetrical analysis in which there needs to be a parallel set of scores for each rater (i.e., if one rater gave a score of 4 and the other rater gave a 1, 2 or 3, there is no parallel comparison). Wide dispersion of score ratings and a low number of subjects prevent the kappa statistic from being calculated. The kappa coefficient for the data set with two raters is .386 and with three raters is .427.

Submitted by
Kristy Bishop, Ph.D.
Director of Institutional Research and Assessment
Metropolitan Community College
September 6, 2011

HUMANITIES ASSESSMENT PROJECT SUPPLEMENTAL ANALYSIS
Spring Semester 2011

This document is a supplemental analysis of the HAP data collected for Spring 2011. Analysis was conducted on 100 scores. A previous document contained the kappa values focusing on inter-rater reliability. This document examines how groups of students scored on the instrument given specific demographic and academic characteristics. The analysis is conducted using three scores: the Reader 1 HAP score, the Reader 2 HAP score, and an Average Score, the mean of the Reader 1 and Reader 2 scores. Each score is presented using the mean for that score and its associated standard deviation.

Table 1 shows the scores for each HAP Score Group. Notice the similarity of the mean scores and their equivalent standard deviations. The Reader 1 and Reader 2 scores are very similar, with identical standard deviations. Reader 3 provided only 4 scores, so that mean and standard deviation are not listed below.

Table 1. HAP Scores
HAP Score Group          MEAN   Standard Deviation
Reader 1 HAP Score       2.83   .81
Reader 2 HAP Score       3.04   .81
Average HAP Score*       2.94   .82
NOTE: Analysis based on 100 scores.
* Represents a simple average of the Reader 1 and Reader 2 scores.

Table 2 shows the HAP scores by Racial/Ethnic Affiliation. As can be seen, White students scored .47 points higher than their Minority counterparts using the Average HAP score. This difference was .28 in 2005. Regardless of the score used (Reader 1, Reader 2 or Average), White students scored higher than their Minority counterparts. Readers are cautioned to avoid oversimplification of the difference. The wide variation in the number of students (N) makes measures of association of little value.
Table 2. HAP Scores by Race/Ethnic Affiliation
Demographic Group                          MEAN   Standard Deviation
White Students (N=75)
  Reader 1 HAP Score                       2.91   .75
  Reader 2 HAP Score                       3.18   .72
  Average HAP Score                        3.05   .75
Minority Students (N=25)
  Reader 1 HAP Score                       2.58   .95
  Reader 2 HAP Score                       2.58   .91
  Average HAP Score                        2.58   .93

Table 3 shows the variation in HAP scores based on student age. The age distinctions used are "Traditional Students," aged 24 years or less, and "Non-Traditional Students," aged 25 years or greater. As can be seen, younger students scored higher than their older counterparts. This is the opposite of the results in 2005. As with the analysis for Racial/Ethnic Affiliation, the sample size differences are too great to perform any meaningful measure of association.

Table 3. HAP Scores by Age Group
Demographic Group                          MEAN   Standard Deviation
Students Aged 24 Years or Less (N=83)
  Reader 1 HAP Score                       2.86   .81
  Reader 2 HAP Score                       3.06   .83
  Average HAP Score                        2.96   .77
Students Aged 25 Years or More (N=17)
  Reader 1 HAP Score                       2.71   .82
  Reader 2 HAP Score                       2.94   .73
  Average HAP Score                        2.82   .78
NOTE: Based on an analysis of the Traditional/Non-Traditional paradigm.

Table 4 examines students whose COMPASS Write placement score would have placed them into Developmental English. In 2005, placement in Developmental English was based on a COMPASS Write score of 64 or less. In 2011, placement in Developmental English was based on a COMPASS Write score of 69 or less. The distinction between Developmental English and non-Developmental English students is not very large. This could be due to the small sample size.

Table 4. HAP Scores by Write Placement
Demographic Group                             MEAN   Standard Deviation
Placed into Developmental English (N=25)
  Reader 1 HAP Score                          2.72   .87
  Reader 2 HAP Score                          2.96   .82
  Average HAP Score                           2.84   .86
Not Placed into Developmental English (N=75)
  Reader 1 HAP Score                          2.87   .79
  Reader 2 HAP Score                          3.07   .81
  Average HAP Score                           2.97   .80
NOTE: Placement in Developmental English is based on a COMPASS Write placement score of 69 or less; the placement score for ENGL 101 is 70.

Table 5 shows an analysis comparing students who had completed 40 credit hours at the time of the HAP assessment with those who had completed fewer than 40 credit hours. Students who earned 40 or more credit hours prior to taking the HAP assessment scored slightly higher than their "less than 40" counterparts.

Table 5. HAP Scores by Credit Hours Earned
Demographic Group                             MEAN   Standard Deviation
Earned Less than 40 Credit Hours (N=48)
  Reader 1 HAP Score                          2.77   .85
  Reader 2 HAP Score                          2.96   .82
  Average HAP Score                           2.86   .84
Earned 40 or More Credit Hours (N=52)
  Reader 1 HAP Score                          2.88   .78
  Reader 2 HAP Score                          3.12   .80
  Average HAP Score                           3.00   .80

Table 6 shows the HAP scores by student cumulative GPA. As can be seen, and probably with no surprise, students with a cumulative GPA of 2.00 or higher tend to perform much better on the HAP assessment. In fact, the difference is quite large: .71 scale points.
Table 6. HAP Scores by Cumulative GPA
Demographic Group                     MEAN   Standard Deviation
Cum GPA Less than 2.00 (N=10)
  Reader 1 HAP Score                  2.00   .77
  Reader 2 HAP Score                  2.60   1.11
  Average HAP Score                   2.30   1.00
Cum GPA 2.00 or Greater (N=90)
  Reader 1 HAP Score                  2.92   .76
  Reader 2 HAP Score                  3.09   .75
  Average HAP Score                   3.01   .76
NOTE: Cum GPA is used as an index of total college experience. A cum GPA of "less than 2.00" means 1.99 or less.

Table 7 shows students who had enrolled in ENGL 101 prior to taking the HAP assessment. As can be seen, there is a small difference in the two sets of scores. According to these data, having completed ENGL 101 does provide students a small score advantage when taking the HAP assessment.

Table 7. HAP Scores by ENGL 101 Enrollment
Group                                 MEAN   Standard Deviation
Have not enrolled in ENGL 101 (N=27)
  Reader 1 HAP Score                  2.67   .90
  Reader 2 HAP Score                  3.04   .88
  Average HAP Score                   2.85   .90
Enrolled in ENGL 101 (N=73)
  Reader 1 HAP Score                  2.89   .77
  Reader 2 HAP Score                  3.04   .78
  Average HAP Score                   2.97   .78

Table 8 shows the relative score difference between students who have and have not completed ENGL 102 prior to taking the HAP assessment. As can be seen from the data, students with ENGL 102 experience earn slightly higher HAP assessment scores than students who have not completed ENGL 102.

Table 8. HAP Scores by ENGL 102 Enrollment
Group                                 MEAN   Standard Deviation
Have not enrolled in ENGL 102 (N=36)
  Reader 1 HAP Score                  2.61   .92
  Reader 2 HAP Score                  2.83   .83
  Average HAP Score                   2.72   .88
Enrolled in ENGL 102 (N=64)
  Reader 1 HAP Score                  2.95   .72
  Reader 2 HAP Score                  3.16   .77
  Average HAP Score                   3.05   .75

Table 9 shows the HAP scores for students who completed either ENGL 101 or ENGL 102 and took the HAP assessment. Students with "COMP" experience do earn higher scores than students with no "COMP" experience.

Table 9. HAP Scores by COMP Enrollment
Group                                 MEAN   Standard Deviation
No COMP (N=15)
  Reader 1 HAP Score                  2.47   1.02
  Reader 2 HAP Score                  2.67   .94
  Average HAP Score                   2.57   .99
Enrolled in COMP* (N=85)
  Reader 1 HAP Score                  2.89   .75
  Reader 2 HAP Score                  3.11   .77
  Average HAP Score                   3.00   .77
NOTE: * Refers to any tested student who has completed ENGL 101, ENGL 102, or both.

Table 10 shows the variation in HAP scores based on student gender. As can be seen, female students scored higher than male students. As with the analysis for Racial/Ethnic Affiliation, the sample size differences are too great to perform any meaningful measure of association.

Table 10. HAP Scores by Gender
Demographic Group                     MEAN   Standard Deviation
Male Students (N=44)
  Reader 1 HAP Score                  2.68   .85
  Reader 2 HAP Score                  3.01   .79
  Average HAP Score                   2.83   .81
Female Students (N=56)
  Reader 1 HAP Score                  2.95   .77
  Reader 2 HAP Score                  3.09   .85
  Average HAP Score                   3.02   .81

Table 11 shows the variation in HAP scores based on student enrollment status. Full-time enrollment is based on enrollment in 12 or more hours on the census date; part-time enrollment is 11 or fewer hours on the census date. As can be seen, full-time students scored higher than part-time students. The differences in sample size are too great to perform any meaningful measure of association.
Table 11. HAP Scores by Full-Time/Part-Time Enrollment
Group                                 MEAN   Standard Deviation
Part-Time Enrollment (N=12)
  Reader 1 HAP Score                  2.41   .94
  Reader 2 HAP Score                  2.64   .98
  Average HAP Score                   2.52   .97
Full-Time Enrollment (N=78)
  Reader 1 HAP Score                  2.95   .73
  Reader 2 HAP Score                  3.15   .72
  Average HAP Score                   3.05   .73

Any questions regarding this report should be directed to:
Kristy Bishop, Ph.D.
Director of Institutional Research and Assessment
September 26, 2011