GENERAL EDUCATION ASSESSMENT
Spring 2010 Report
Prepared by
Dr. Linda Siefert
General Education Assessment Director
August 2010
Acknowledgement
I would like to acknowledge the help and support of the following people in the assessment
efforts for Spring 2010:
Dr. Ken Spackman, Director of University Planning
Dr. Carrie Clements, Director of the Center for Teaching Excellence
Dr. Abdou Ndoye, Assessment Director, Watson School of Education
Erin Danielle Cooke, Graduate Assistant, Department of Psychology
Robert Wilcox and Judy Kinney, Office of Institutional Research and Assessment
TABLE OF CONTENTS
EXECUTIVE SUMMARY
BACKGROUND AND SCOPE
METHODOLOGY
  ASSESSMENT TOOLS
  SAMPLE SELECTION
  SCORING
RESULTS
  A NOTE ON QUANTITATIVE AND QUALITATIVE DATA
  THOUGHTFUL EXPRESSION (WRITTEN COMMUNICATION)
  INQUIRY
  CRITICAL THINKING
  SOCIAL SCIENCE FOUNDATIONAL KNOWLEDGE
  COMPARISON OF SCORES FROM TWO RUBRICS
  RELIABILITY OF SCORES
  SCORER FEEDBACK
  INSTRUCTOR FEEDBACK
DISCUSSION, LIMITATIONS AND RECOMMENDATIONS
  THOUGHTFUL EXPRESSION (WRITTEN COMMUNICATION)
  INQUIRY
  CRITICAL THINKING
  SOCIAL SCIENCE FOUNDATIONAL KNOWLEDGE
  RELATIVE STRENGTHS AND WEAKNESSES ACROSS RUBRICS
  METHODOLOGY AND PROCESS
  LIMITATIONS
  RECOMMENDATIONS
REFERENCES
APPENDIX A RUBRICS USED
APPENDIX B DIMENSION MEANS AND STANDARD DEVIATIONS
APPENDIX C RESULTS BY COURSE
APPENDIX D DETAILED SCORER FEEDBACK
APPENDIX E CORRELATIONS BETWEEN RUBRIC DIMENSIONS
LIST OF TABLES
Table 1 Written Communication Score Results
Table 2 Distribution of Scores for Written Communication
Table 3 Distribution of Scores by Gender
Table 4 Inquiry Rubric Score Results
Table 5 Distribution of Scores for Inquiry, Applicable Scores Only
Table 6 Critical Thinking Score Results
Table 7 Distribution of Scores for Critical Thinking, Applicable Scores Only
Table 8 Foundational Knowledge Score Results
Table 9 List of Statistically Significant Correlations across Rubrics
Table 10 Interrater Reliability
Table 11 Scorer Feedback on Process
Table 12 Written Communication Percent of Sample Scored at Least 2 and at Least 3
Table 13 Inquiry Percent of Sample Scored at Least 2 and at Least 3
Table 14 Critical Thinking Percent of Sample Scored at Least 2 and at Least 3
Table 15 Foundational Knowledge Percent of Sample Scored at Least 2 and at Least 3
Table B1 Means and Standard Deviations for Each Rubric Dimension
Table C1 Written Communication Results by Course
Table C2 Inquiry Rubric Score Results by Course
Table C3 Critical Thinking Score Results by Course
Table E1 Correlation between Dimensions
Table E2 Correlation between Dimensions
LIST OF FIGURES
Figure 1 Distribution of Scores for Written Communication
Figure 2 Distribution of Scores for Inquiry, Applicable Scores Only
Figure 3 Distribution of Scores for Critical Thinking, Applicable Scores Only
EXECUTIVE SUMMARY
This report provides the results of the General Education Assessment efforts for Spring 2010.
The processes used were recommended by the General Education Assessment Committee in its
March 2009 Report. Three UNCW Learning Goals were assessed using AAC&U VALUE
Rubrics: Thoughtful Expression, Inquiry, and Critical Thinking. In addition, a locally created
rubric for Foundational Knowledge was piloted. The sample consisted of 293 student work
products from the following Basic Studies courses: ENG 201, FST 210, MUS 115, PSY 105, and
SOC 105.
RESULTS FOR THOUGHTFUL EXPRESSION (WRITTEN COMMUNICATION)
The median score for all five dimensions was 2 on the 4-level scale (with level 4 the expectation
for UNCW graduates). Work products were strongest on the dimension WC1 Context and
Purpose for Writing. Work products were weakest on the dimensions WC3 Genre and
Disciplinary Conventions and WC4 Sources and Evidence. There were significant differences
between the results for females and males. Scores were higher for term papers than for in-class
test questions, except on the dimension WC5 Control of Syntax and Mechanics.
RESULTS FOR INQUIRY
Three of the dimensions were considered not applicable by scorers for some assignments.
The median score for five of the six dimensions was 2 on the 4-level scale (with level 4 the
expectation for UNCW graduates). The median was 3 for the dimension IN2 Existing
Knowledge, Research, and/or Views. Work products were strongest on the dimension IN2
Existing Knowledge, Research, and/or Views, and weakest on the dimension IN6 Limitations
and Implications.
RESULTS FOR CRITICAL THINKING
All dimensions of the rubric were considered not applicable for at least one assignment. The
median score for three of the five dimensions was 2 on the 4-level scale (with level 4 the
expectation for UNCW graduates). The median was 1 for the other two dimensions. Work
products were strongest on the dimensions CT1 Explanation of Issues and CT2 Evidence. Work
products were weakest on CT3 Influence of Context and Assumptions and CT5 Conclusions and
Related Outcomes. Scores were higher on term papers than on in-class test questions on all
dimensions except CT1 Explanation of Issues.
OTHER FINDINGS
The results for the pilot of the Foundational Knowledge rubric were inconclusive. It was
determined that the student work products were collected too early in the semester to provide an
accurate measure of student knowledge of discipline terminology and concepts.
Interrater reliability was measured using a number of statistical methods. While only 3 of the 16
dimensions across all three rubrics met the benchmark chosen, the findings were promising for
the first use of the rubrics. Additional exposure to and use of the rubrics, along with enhanced
training, should improve interrater reliability in the future.
PROCESS FEEDBACK
Instructor and scorer feedback was gathered for all steps in the process. Both instructors and
scorers had a high level of satisfaction with the process. Two scorers suggested that more
training would be helpful. Scorers also provided valuable feedback on aspects of the rubrics, and
this feedback will be used to make modifications to the rubrics.
RECOMMENDATIONS
Based on the analysis of the findings from the student work products sampled and of the
participant feedback, the following recommendations were made by the Learning Assessment
Council.
• Levels of expected performance at the basic studies, or lower division, level should be
  developed for each rubric.
• Additional exposure to the content of and rationale for the UNCW Learning Goals should
  be provided to increase faculty ownership and awareness of these Goals. The LAC will
  ask the Center for Teaching Excellence to provide a workshop series on these Goals. The
  LAC will ask the University Curriculum Committee to consider actions in this area.
• To increase student exposure to the writing process, the Writing Intensive component of
  University Studies should be implemented by Fall 2012.
• Modifications and improvements to the general education assessment process should be
  made as needed, including the following: modify rubrics based on feedback, develop
  benchmark work products, and enhance instructor and scorer workshops.
• The long-term implementation schedule should provide flexibility for targeting additional
  sampling for specific learning goals that are characterized by ambiguous or unclear
  assessment results. For 2010 – 2011, Critical Thinking will be sampled for this purpose.
BACKGROUND AND SCOPE
Before discussing General Education Assessment, it is important to understand what we mean by
General Education. General Education is most often thought of as the curriculum requirements
outside of the majors that expose students to foundational knowledge across the disciplines and
to a variety of ways of thinking about and exploring the world. General Education can also be
thought of in terms of broad learning outcomes, the knowledge and set of abilities that are
needed by citizens and workers throughout their lives. Examples of these broad learning
outcomes are the ability to think critically and the ability to communicate thoughtfully and
clearly. In terms of these broad learning outcomes, General Education is any curriculum or
formal experience that provides students opportunities to practice and eventually master these
abilities. Through this lens, General Education becomes broader than “the Gen Ed curriculum” to
also include any work within the college experience that helps cultivate the set of General
Education learning outcomes. Taking this broader perspective, UNCW has adopted nine
Learning Goals. These nine UNCW Learning Goals are Foundational Knowledge, Inquiry,
Information Literacy, Critical Thinking, Thoughtful Expression, Second Language, Diversity,
Global Citizenship, and Teamwork. For each learning goal, detailed learning outcomes are
described for both basic studies and the characteristics of UNCW graduates.
In August 2008 Provost Brian Chapman created and charged a General Education Assessment
Committee with designing assessment mechanisms for the current Basic Studies structure (as it
appeared in the 2008-09 Undergraduate Catalogue) using the Faculty Senate-approved learning
outcomes for general education. After collaborating with the University Studies Advisory
Committee, which was drafting Revising General Education at UNCW, and designing and
administering an information gathering survey of faculty teaching Basic Studies courses, the
committee presented a Report of the General Education Assessment Committee to the Provost in
March 2009. Key findings and recommendations included in that report were:
• An alignment between the UNCW Learning Goals and basic studies component common
  student learning outcomes;
• An estimation of the fit of the University Studies component common student learning
  outcomes to the Basic Studies courses, based on a faculty survey;
• A recommendation to use student work products from assignments embedded in normal
  basic studies coursework to assess student learning;
• A recommendation to use the AAC&U VALUE Rubrics for the Learning Goals
  Information Literacy, Critical Thinking, Thoughtful Expression (written), and Inquiry;
• A recommendation to implement a three-year recurring cycle for assessing the nine
  UNCW Learning Goals, with a recommended schedule from Fall 2009 through Fall
  2010.
In late Spring 2009, three members of the General Education Assessment Committee performed
a Pilot Assessment of Basic Studies: Critical Thinking, Inquiry and Analysis, and Written
Communication. The main purpose of the pilot was to test the process recommended by the
committee. Their report outlined additional recommendations about the process.
During Fall 2009, it was determined that the College of Arts and Sciences Director of
Assessment would be responsible for implementing general education assessment at the basic
studies level in Spring 2010. Based on the recommended implementation schedule, the following
UNCW Learning Goals were assessed: Thoughtful Expression, Inquiry, and Critical Thinking.
This report outlines the methodology of and findings from that study. In the scheme of all
general education assessment at UNCW, this report provides useful information on the abilities
of UNCW students during their basic studies work as seen through course-embedded
assignments.
METHODOLOGY
The purpose of the general education assessment activities in Spring 2010 was to examine the
following questions:
• What are the overall abilities of students taking basic studies courses with regard to the
  UNCW Learning Goals of Thoughtful Expression, Inquiry, and Critical Thinking?
• What are the relative strengths and weaknesses within the subskills of those goals?
• Are there any differences in performance based on demographic and preparedness
  variables such as gender, race or ethnicity, transfer students vs. freshman admits, honors
  vs. non-honors students, total hours completed, or entrance test scores?
• What are the strengths and weaknesses of the assessment process itself?
A final purpose was to pilot test a Social Science Foundational Knowledge rubric.
UNCW has adopted an approach to assessing its Learning Goals at the basic studies level that
uses assignments that are a regular part of the course content. A strength of this approach is that
the student work products are an authentic part of the curriculum, and hence there is a natural
alignment often missing in standardized assessments. Students are motivated to perform at their
best because the assignments are part of the course content and course grade. The assessment
activities require little additional effort on the part of course faculty because the assignments
used are a regular part of the coursework. An additional strength of this method is faculty
collaboration and full participation in both the selection of the assignments and the scoring of the
student work products.
The student work products collected are scored independently on a common rubric by trained
scorers. The results of this scoring provide quantitative estimates of students’ performance and
qualitative descriptions of what each performance level looks like, which provides valuable
information for the process of improvement. The normal disadvantage to this type of approach
when compared to standardized tests is that results cannot be compared to other institutions. This
disadvantage is mitigated in part by the use of the AAC&U VALUE rubrics for many of the
Learning Goals. This concern is also addressed by the regular administration of standardized
assessments, in particular, the CLA and the ETS Proficiency Profile, giving the university the
opportunity to make such comparisons.
ASSESSMENT TOOLS
For three of the four UNCW Learning Goals assessed, Association of American Colleges and
Universities (AAC&U) Valid Assessment of Learning in Undergraduate Education (VALUE)
rubrics were used:
• for Thoughtful Expression, the VALUE Written Communication rubric was used;
• for Inquiry, the VALUE Inquiry and Analysis rubric was used; and
• for Critical Thinking, the VALUE Critical Thinking rubric was used.
The VALUE rubrics, part of the AAC&U Liberal Education and America’s Promise (LEAP)
initiative, were developed by over 100 faculty and other university professionals. Each rubric
contains the common dimensions and most broadly shared characteristics of quality for each
dimension. A locally created rubric was piloted for assessing Foundational Knowledge in the
Social Sciences. Appendix A contains the versions of each of the rubrics that were used in the
study.
SAMPLE SELECTION
The sampling method used lays the foundation for the generalizability of the results. As
mentioned in the introduction, no one part of the basic studies curriculum, nor for that matter any
one part of the university experience, is solely responsible for helping students to write well,
think critically, or conduct responsible inquiry and analysis. These skills are practiced in many
courses. The Fall 2008 survey helped determine which basic studies courses are most appropriate
for assessing each of these goals. For this first round of assessment, five basic studies courses
which are taken by a large number of students were selected, in order to represent as much as
possible the work of “typical” UNCW students. Within each course, sections were divided into
those taught by tenure-line and non-tenure-line faculty, those taught in the classroom and online,
and honors and non-honors. Within each subgroup, sections were selected randomly in quantities
that represent as close as possible the overall breakdown of sections by these criteria. Thirteen
sections were selected in all. Within each section, all student work products were collected, and
random samples of the work products were selected.
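To make the section-selection step concrete, the short Python sketch below illustrates
proportional random selection within strata of the kind described above. It is illustrative only:
the strata, section labels, seed, and target count are hypothetical, not the actual Spring 2010
section lists.

    import random

    random.seed(2010)  # fixed seed so the illustration is reproducible

    # Hypothetical course sections grouped into (faculty type, delivery) strata.
    strata = {
        ("tenure-line", "classroom"): ["sec01", "sec02", "sec03", "sec04"],
        ("non-tenure-line", "classroom"): ["sec05", "sec06", "sec07"],
        ("non-tenure-line", "online"): ["sec08", "sec09"],
    }

    total_sections = sum(len(s) for s in strata.values())
    target = 4  # hypothetical number of sections to draw for one course

    selected = []
    for group, sections in strata.items():
        # Draw from each stratum roughly in proportion to its share of sections.
        k = max(1, round(target * len(sections) / total_sections))
        selected.extend(random.sample(sections, min(k, len(sections))))

    print(selected)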
Prior to the start of the semester, the CAS Director of Assessment met with course instructors to
familiarize them with the VALUE rubrics. Instructors were asked to review their course content
and assignments, and to select one assignment that they felt fit the dimensions of at least one of
the rubrics.
Each student filled out a Student Work Product Cover Sheet, which acknowledged the use of
their work for the purpose of general education assessment. These cover sheets were removed
before scoring. The name and student ID information on the cover sheets was matched with
student demographic information in university records for the purpose of analysis based on
demographic and preparedness variables.
SCORING
REVIEWER RECRUITMENT AND SELECTION
Reviewers were recruited from UNCW faculty across the college and all schools. A recruitment
email was sent to all department chairs on February 11, 2010, with a request that it be forwarded
to all department faculty. The desire was to include reviewers from a broad spectrum of
departments. The intent was to give faculty who do not teach in departments that offer basic
studies courses the opportunity to see the work being done by students in the general education
courses, and faculty who teach upper-level courses, such as capstone courses, within departments
that do offer general education courses the opportunity to see the learning students experience as
they begin their programs, as well as faculty who do teach basic studies courses. It was also
important to have at least one faculty member from each of the departments from which student
work products were being reviewed.
SCORING PROCESS
Metarubrics, such as the VALUE rubrics, are constructed so that they can be used to score a
variety of student artifacts across disciplines, across universities, and across preparation levels.
But their strength is also a weakness: the generality of the rubric makes it more difficult to use
than a rubric that is created for one specific assignment. To address this issue, a process must be
created that not only introduces the rubric to the scorers, but also makes its use more
manageable.
Volunteer scorers initially attended a two-hour workshop on one of the three rubrics (Written
Communication, Inquiry and Analysis, or Critical Thinking). During the workshop, scorers
reviewed the rubric in detail and were introduced to the following assumptions adopted for
applying the rubrics to basic studies work products.
Initial assumptions
1. Each rubric can be used across all school years (freshman to senior), with Level 4
Capstone representing the characteristics we want the work of UNCW graduates to
demonstrate.
2. When scoring, we are comparing a particular work product to the characteristics we want
the work of UNCW graduates to demonstrate.
3. A main purpose of the scoring is to determine the relative strengths and weaknesses of
our students. Therefore it is important to look for evidence for each dimension of the
rubric separately, and not score the work products holistically (i.e. tend towards one score
for all dimensions).
4. The instructor’s directions about the assignment should guide the scorer’s interpretation
of the rubric dimensions.
5. Other assumptions will need to be made when each rubric is used to score individual
assignments. For example, a dimension may not fit a particular assignment.
After reviewing the rubric and initial assumptions, the volunteers read and scored 3 – 4 student
work products. Scoring was followed by a detailed discussion, so that scorers could better see the
nuances of the rubric and learn what fellow scorers saw in the work products. From these
discussions, assumptions began to be developed for applying the rubric to each specific
assignment.
The work on common assignment-specific assumptions or guidelines was continued on the day
of scoring. Scorers were assigned to groups of 2, 3, or 4. Scoring of each assignment began with
the group scoring one student work product together and discussing their individual scores.
Discussion clarified any implicit assumptions each scorer had used in scoring the first work
product. From that discussion, each group created any assignment-specific assumptions that they
would use for scoring the rest of the set of assignments. After completing a packet of work
products, each scorer completed a rubric feedback form and turned in the assignment-specific
assumptions used by the group. The feedback form asked for information on how well each
rubric dimension fit the assignment and student work. It also asked for feedback on the quality
criteria for each dimension. Scorers were also asked to complete an end-of-day survey to provide
feedback on the entire process.
In order to measure the consistency of the application of the rubric, additional common work
products were included in each packet for statistically measuring interrater reliability.
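As an illustration of how such consistency can be quantified, the Python sketch below computes
Cohen's kappa, one common interrater-reliability statistic, for two hypothetical raters' scores on
a shared packet. The scores are invented for illustration; the statistics actually applied in this
study are described in the Reliability of Scores section.

    # Requires scikit-learn. Both score lists are hypothetical.
    from sklearn.metrics import cohen_kappa_score

    rater_a = [2, 3, 1, 2, 4, 2, 3, 1, 2, 3]
    rater_b = [2, 3, 2, 2, 3, 2, 3, 1, 3, 3]

    # Unweighted kappa treats every disagreement the same; quadratic weights
    # give partial credit for near-misses on the ordinal 0-4 scale.
    print(cohen_kappa_score(rater_a, rater_b))
    print(cohen_kappa_score(rater_a, rater_b, weights="quadratic"))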
RESULTS
A NOTE ON QUANTITATIVE AND QUALITATIVE DATA
Quantitative data are numerical scores, such as the number of questions answered correctly on a test.
Qualitative data are verbal summaries, such as the oral or written feedback professors provide on
an assignment. Rubrics combine aspects of both qualitative data and quantitative data. They
contain detailed descriptions of the quality criteria for each level on the scoring continuum.
When a student work product is scored, it is compared to these criteria and categorized into the
level that best matches the features of the work. The levels or categories are often designated
with labels such as Novice, Developing, Apprentice, and Expert. Sometimes the levels are
simply numbers. With or without the use of numbers, the levels usually represent an ordering of
categories, but the categories are not equally spaced along a number line. Although a level 2 is
considered higher, or larger, than a level 1, it is not proper to assume that a student who scores at
a level 2 or Developing is twice as knowledgeable as a student who scored at a level 1 or Novice;
nor can we assume that the difference between these two categories is exactly the same as the
difference between levels 2 and 3. For this reason, these ordinal data do not yield valid mean
scores. Averages are not, in any case, what we are most interested in. When we analyze the
results of an assessment effort, we want to determine if we are satisfied with the demonstrated
knowledge and skills of our students. Rather than describing the average student, we want to
discover what percent of our students are below, meet, or exceed our expectations. In this report,
score results are given in percentage of students scored at each level. The nature of the data also
requires the use of non-parametric tests of significance. Means and standard deviations are often
provided by researchers for non-interval level data. This information is given in appendix Table
B1, as it may be helpful to some readers as a starting point to suggest further investigation using
statistical methods appropriate to ordinal data.
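To make these choices concrete, the Python sketch below computes the kinds of summaries used
in this report (percent of work products at each level, rank-based percentiles, and the mode)
without ever averaging the ordinal levels. The scores are hypothetical, not study data.

    from collections import Counter

    scores = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 2]  # hypothetical 0-4 rubric scores
    n = len(scores)
    counts = Counter(scores)

    # Percent of work products at each level (what the report's tables show).
    percent_at_level = {lvl: round(100 * counts[lvl] / n, 1) for lvl in range(5)}

    # Rank-based percentile: the first level at which the cumulative percent
    # reaches the target, so results stay on the ordinal 0-4 scale.
    def level_at_percentile(pct):
        cumulative = 0.0
        for lvl in range(5):
            cumulative += 100 * counts[lvl] / n
            if cumulative >= pct:
                return lvl

    q25, median, q75 = (level_at_percentile(p) for p in (25, 50, 75))
    mode = max(counts, key=counts.get)  # most frequent level

    print(percent_at_level, q25, median, q75, mode)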
As previously mentioned, one of the assumptions of our use of the VALUE Rubrics is that the
Level 4 Capstone describes the qualities of understanding that we want our graduating seniors to
demonstrate. As an institution, we have not yet defined our minimum expectations for our first-
and second-year students, the predominant group taking basic studies courses. After this initial
project, we might be in the position to set reasonable expectations.
DESCRIPTION OF SAMPLE
DESCRIPTION OF COURSES
A total of 302 student work products were sampled from the 13 assignments collected. However,
the cover sheet information could not be matched with Banner data for nine of them. After
removal of these work products from the sample, a total of 293 student work products were
scored from the following 5 courses:
• ENG 201 College Writing and Reading II (4 sections, one taught by tenure-line faculty,
one by a full-time lecturer, 2 by part-time instructors)
• FST 210 Moviemakers and Scholars Series (1 section taught by a full-time lecturer)
• MUS 115 Survey of Music Literature (3 sections, two taught by tenure-line faculty, one
by a part-time instructor; one honors section)
• PSY 105 General Psychology (3 sections, 2 taught by tenure-line faculty, 1 by a part-time
instructor)
• SOC 105 Introduction to Sociology (3 sections, taught by two tenure-line faculty; two
online sections)
The breakdown of sections taught by tenure-line faculty, lecturers, and part-time faculty is
representative of those breakdowns for the course as a whole.
SAMPLE BY RUBRIC
The total number of work products in the final sample scored using each rubric was:
• Written Communication: 116 work products scored by four scorers
• Inquiry: 98 work products scored by four scorers
• Critical Thinking: 183 work products scored by seven scorers
• Foundational Knowledge: 45 work products scored by one scorer
The total number of scores produced was larger than the total number of work products because
27 work products were scored using both the Written Communication and Inquiry rubrics, 37
were scored using both the Written Communication and Critical Thinking rubrics, 40 were
scored using both the Inquiry and Critical Thinking rubrics, and 45 were scored using both the
Critical Thinking and Foundational Knowledge rubrics. No work products were scored using
more than two rubrics.
DESCRIPTION OF STUDENTS
The 293 work products were produced by 288 unique students (five students provided work
products for two different courses). A few Banner records did not contain all demographic
variables of interest; therefore, the sample size for any particular variable may be smaller. The
demographic breakdown of the participating students, compared in parentheses to the overall
undergraduate enrollment for AY 2009-2010, was: 55.0% (59.4%) female; 13.1% (28.0%)
transfer students; 7.7% (6.5%) honors students; 3.4% (4.9%) African American; 0.3% (0.6%)
American Indian; 1.0% (1.8%) Asian; 3.0% (3.9%) Hispanic; 2.3% (1.3%) of Multiple race or
ethnicity; 0.3% (0.4%) Non-resident Alien; 83.6% (83.0%) white; and 3.0% (4.1%) listed
unknown or other ethnicity. There were no students of Hawaiian or Pacific Island ethnicity in the
sample, although 0.1% of all students describe themselves as of this ethnicity (UNCW OIRA,
2009). The only group not represented proportionally was transfer students. It is to be expected
that transfer students would not be represented proportionally in basic studies courses.
For those students with SAT score information (223), the mean Total SAT score was (compared
in parentheses to the overall undergraduate enrollment for AY 2009-2010) 1145.6 (1166), the
mean SAT Math was 583.7 (589), and the mean SAT Verbal was 561.9 (577). For those who
took the ACT college placement test (77), the mean composite score was 23.5, which, like the
SAT scores, is just slightly below the 50th percentile for Fall 2009 freshmen (UNCW OIRA,
2010).
The mean total number of credit hours students had completed prior to Spring 2010, 44.9, was
skewed due to a number of outliers with well over 120 total hours (maximum was 191). The
median number of hours was 37. This included both UNCW hours and transfer hours. The
median UNCW hours was 16 (mean 33.8), and the median transfer hours was 3 (mean 11.1).
Broken down into groups, 43.0% had completed between 0 and 29 hours, 32.9% had completed
between 30 and 59 hours, 8.7% had completed between 60 and 89 hours, and 15.4% had
completed 90 or more hours.
THOUGHTFUL EXPRESSION (WRITTEN COMMUNICATION)
At the basic studies level, the UNCW Thoughtful Expression Learning Goal is for students to
demonstrate an ability to express meaningful ideas in writing. For purposes of this Learning
Goal, “Thoughtful Expression is the ability to communicate meaningful ideas in an organized,
reasoned and convincing manner. Thoughtful expression involves a purpose responsive to an
identified audience, effective organization, insightful reasoning and supporting detail, style
appropriate to the relevant discipline, purposeful use of sources and evidence, and error-free
syntax and mechanics” (UNCW Learning Goals, 2009). The VALUE Written Communication
rubric contains five dimensions that are aligned with the UNCW description of Thoughtful
Expression.
SUMMARY OF SCORES BY DIMENSION
Four faculty members scored 116 work products from four courses, ENG 201, FST 210,
MUS 115, and PSY 105. Nineteen work products (16.4%) were scored by multiple scorers. Table
1 provides summary information for all work products.
Table 1 Written Communication Score Results
(0 = below Benchmark; 1 = Benchmark; 2 and 3 = Milestones; 4 = Capstone)

Dimension                                   0           1           2           3           4           NA
WC1 Context of and Purpose for Writing      1 (0.9%)    17 (14.7%)  44 (37.9%)  36 (31.0%)  18 (15.5%)  0 (0.0%)
WC2 Content Development                     4 (3.4%)    23 (19.8%)  43 (37.1%)  36 (31.0%)  10 (8.6%)   0 (0.0%)
WC3 Genre and Disciplinary Conventions      5 (4.3%)    25 (21.6%)  48 (41.4%)  34 (29.3%)  4 (3.4%)    0 (0.0%)
WC4 Sources and Evidence                    23 (19.8%)  18 (15.5%)  24 (20.7%)  45 (38.8%)  6 (5.2%)    0 (0.0%)
WC5 Control of Syntax and Mechanics         1 (0.9%)    26 (22.4%)  40 (34.5%)  42 (36.2%)  7 (6.0%)    0 (0.0%)
All assignments were scored on each dimension (no dimension was considered not applicable for
any assignment). Figure 1 and Table 2 provide additional illustration of the score distributions
for each dimension.
WRITTEN COMMUNICATION RESULTS BY DIMENSION
Figure 1 Distribution of Scores for Written Communication
Table 2 Distribution of Scores for Written Communication

                WC1      WC2      WC3      WC4      WC5
0               0.9%     3.4%     4.3%     19.8%    0.9%
1               14.7%    19.8%    21.6%    15.5%    22.4%
2               37.9%    37.1%    41.4%    20.7%    34.5%
3               31.0%    31.0%    29.3%    38.8%    36.2%
4               15.5%    8.6%     3.4%     5.2%     6.0%
25th %tile      2        2        1        1        2
50th %tile      2        2        2        2        2
75th %tile      3        3        3        3        3
Mode            2        2        2        3        3
RESULTS BY DIMENSION
WC1 Context of and Purpose for Writing
This dimension was the highest scoring Written Communication dimension. Less than one
percent of the work products demonstrated no attention to context, audience, purpose, and to the
assigned task (scores of 0). One in seven work products demonstrated minimal attention to
context, audience, purpose, and to the assigned task (scores of 1). Over one third of work
products demonstrated awareness of the context, audience, purpose, and assigned task (scores of
2). Three in ten work products demonstrated adequate consideration of context, audience, and
purpose, and a clear focus on the assigned task (scores of 3). One in seven work products
demonstrated a thorough understanding of context, audience, and purpose that was responsive to
the assigned task and focused all elements of the work (scores of 4).
WC2 Content Development
Less than one in twenty work products demonstrated no content development (scores of 0). One
in five work products used appropriate and relevant content to develop simple ideas in some
parts of the work (scores of 1). Over one third of the work products used appropriate and relevant
content to develop and explore ideas through the work (scores of 2). Two in five work products
used appropriate, relevant, and compelling content to explore ideas within the context of the
discipline (scores of 3 and 4).
WC3 Genre and Disciplinary Conventions
With the exception of WC4 Sources and Evidence, scores on this dimension were the lowest.
This dimension was the most problematic for scorers, requiring a number of assumptions. All
teams had questions as to where to score use of citations, and all decided to score this under
disciplinary conventions. One in twenty work products demonstrated no attempt to use a
consistent system for organization and presentation (scores of 0). One in five work products
demonstrated an attempt to use a consistent system for basic organization and presentation
(scores of 1). Two in five work products followed expectations appropriate to the specific
writing task for basic organization, content, and presentation (scores of 2). One third of work
products demonstrated consistent use of important conventions particular to the writing task,
including stylistic choices (scores of 3 and 4).
WC4 Sources and Evidence
The scores on this dimension were the lowest of all dimensions, showing very mixed results,
with a large portion of students scoring 0. However, 18 of the 23 scores of zero came from one
assignment, an in-class compare and contrast essay that did not specifically ask for examples.
Including those work products, one in five demonstrated no attempt to use sources to support
ideas (scores of 0). One in seven work products demonstrated an attempt to use sources to
support ideas (scores of 1). One in five work products demonstrated an attempt to use credible
and/or relevant sources to support ideas that were appropriate to the task (scores of 2). More than
two in five work products demonstrated consistent use of credible, relevant sources to support
ideas (scores of 3 and 4).
WC5 Control of Syntax and Mechanics
Less than one percent of the work products did not meet the level 1 benchmark (scores of 0).
Almost one fourth of work products used language that sometimes impeded meaning because of
errors in usage (scores of 1). Over one third of work products used language that generally
conveyed meaning with clarity, although writing included some errors (scores of 2). Over one
third of work products used straightforward language that generally conveyed meaning, with few
errors (scores of 3). One in twenty work products used graceful language that skillfully
communicated meaning with clarity and fluency, with virtually no errors (scores of 4).
CORRELATION BETWEEN DIMENSIONS
All dimension scores were correlated with each other at the .01 level, except for the correlation
between WC4 Sources and Evidence and WC5 Control of Syntax and Mechanics, which was
significant at the .05 level. The magnitudes of the correlations range from .193 to .671, with the
highest correlation between WC2 Content Development and WC4 Sources and Evidence. This
finding seems to be appropriate as content development requires the use of appropriate and
relevant content or sources and evidence. See appendix Table E1 for a complete presentation of
correlation coefficients. These large and statistically significant correlations between the scores
on each dimension of the rubric might suggest some “cross scoring,” or lack of independent
scoring on the part of the scorers. They may, however, also simply represent the interdependence
of the components of writing.
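For ordinal scores such as these, a rank-based statistic like Spearman's rho is one appropriate
way to estimate dimension-to-dimension correlations; the report does not restate the exact
procedure behind its coefficients, so the Python sketch below is illustrative only, with
hypothetical score vectors.

    from scipy.stats import spearmanr

    wc2_scores = [1, 2, 2, 3, 3, 4, 2, 1, 3, 2, 0, 3]  # hypothetical WC2 scores
    wc4_scores = [0, 2, 1, 3, 3, 4, 2, 1, 2, 2, 0, 3]  # hypothetical WC4 scores

    rho, p_value = spearmanr(wc2_scores, wc4_scores)
    print(f"rho = {rho:.3f}, p = {p_value:.4f}")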
DEMOGRAPHIC AND PREPAREDNESS FINDINGS
The most notable finding related to demographic variables is that there was a difference between
the score distributions for males and females, and two of these differences were statistically
significant. Table 3 below illustrates these distributions. There were statistically significant
differences between the distributions of scores for males and females for the dimensions WC2
Content Development and WC5 Control of Syntax and Mechanics, with females scoring higher.
Although not statistically significant, the distributions for the rest of the dimensions all
demonstrated females scoring higher than males. (Although only percentages are given, the
sample contained an equal number of work products written by females and males, 58 each).
Many research studies have demonstrated that females score much higher than males on verbal
fluency and basic writing skills (Hockenbury and Hockenbury, 2006, p.426), so these results
should come as no surprise. Keep in mind that scoring was done anonymously.
Table 3 Distribution of Scores by Gender

Dimension   Gender    0        1        2        3        4        Chi-square test for independence
WC1         Female    0.0%     6.9%     37.9%    36.2%    19.0%    7.321
            Male      1.7%     22.4%    37.9%    25.9%    12.1%
WC2         Female    1.7%     5.2%     50.0%    34.5%    8.6%     19.242**
            Male      5.2%     34.5%    24.1%    27.6%    8.6%
WC3         Female    3.4%     13.8%    43.1%    36.2%    3.4%     5.406
            Male      5.2%     29.3%    39.7%    22.4%    3.4%
WC4         Female    17.2%    10.3%    19.0%    48.3%    5.2%     5.247
            Male      22.4%    20.7%    22.4%    29.3%    5.2%
WC5         Female    0.0%     13.8%    37.9%    37.9%    10.3%    8.913*
            Male      1.7%     31.0%    31.0%    34.5%    1.7%

*Statistically significant at the .05 level
**Statistically significant at the .01 level
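The chi-square statistics in the right-hand column test whether the score distribution is
independent of gender for each dimension. A minimal Python sketch of that computation
follows; because the table reports only percentages, the counts below are hypothetical (each
gender group contained 58 work products).

    from scipy.stats import chi2_contingency

    # Rows: female, male; columns: score levels 0-4 (hypothetical counts).
    observed = [
        [1, 3, 29, 20, 5],   # female
        [3, 20, 14, 16, 5],  # male
    ]

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi-square = {chi2:.3f}, df = {dof}, p = {p_value:.4f}")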
There was a significant positive correlation between the number of credit hours completed and
the scores on WC2 Content Development (.312**), WC3 Genre and Disciplinary Conventions
(.247**), and WC4 Sources and Evidence (.255**). The lack of significant correlation between
number of hours completed and WC1 Context and Purpose for Writing and WC5 Control of
Syntax and Mechanics may indicate that these dimensions of writing are not being addressed for
improvement as much as the other three. Despite the fact that scores increased as credit hours
increased, a substantial number of students who had already completed 90 or more credit hours
still did not score at a level of 3 or 4 (45.5% for WC1, 45.5% for WC2, 54.6%
for WC3, 36.4% for WC4, and 57.8% for WC5). With regard to college entrance test scores, the
only significant correlation was between SAT Math scores and WC5 Control of Syntax and
Mechanics (.231*).
There were no statistically significant differences in the score distributions between transfer
students and students who entered as freshmen, or between honors and non-honors students. Due
to the small (though representative) number of students from each of the race/ethnicity categories
other than white, no analysis on this variable was done.
COMPARISON BETWEEN COURSES AND ASSIGNMENTS
Scores by subject are provided in appendix Table C1. Analysis of scores separated into the four
subjects did not result in significant differences in the distribution of scores, except for WC4.
There was a significant difference in the distribution of scores on the Film Studies assignment
compared to all other assignments, with the Film Studies scores much lower. This is most likely
due to the wording of the question.
There were differences in the distributions of scores for most of the dimensions based on the type
of assignment; for three dimensions the differences were statistically significant (WC1, WC2,
and WC4). Scores were higher on out-of-class term papers for all dimensions except WC5
Control of Syntax and Mechanics. While the difference for WC5 was not significant, the results
were contrary to expectations.
INQUIRY
At the basic studies level, the UNCW Inquiry Learning Goal is for students to practice rigorous,
open-minded and imaginative inquiry. For purposes of this Learning Goal, “Inquiry is the
systematic and analytic investigation of an issue or problem with the goal of discovery. Inquiry
involves the clear statement of the problem, issue or question to be investigated; examination of
relevant existing knowledge; design of an investigation process; analysis of the complexities of
the problem, clear rationale supporting conclusions; and identification of limitations of the
analysis” (UNCW Learning Goals, 2009). The VALUE Inquiry and Analysis rubric contains six
dimensions that are aligned with the UNCW description of Inquiry.
SUMMARY OF SCORES BY DIMENSION
Four faculty members scored 98 work products from two courses, ENG 201 and PSY 105.
Fifteen work products (15.3%) were scored by multiple scorers. Table 4 provides summary
information for all work products.
Table 4 Inquiry Rubric Score Results
(0 = below Benchmark; 1 = Benchmark; 2 and 3 = Milestones; 4 = Capstone)

Dimension                                        0           1           2           3           4          NA
IN1 Topic Selection                              1 (1.0%)    3 (3.1%)    6 (6.1%)    1 (1.0%)    3 (3.1%)   84 (85.7%)
IN2 Existing Knowledge, Research, and/or Views   1 (1.0%)    4 (4.1%)    22 (22.4%)  26 (26.5%)  5 (5.1%)   40 (40.8%)
IN3 Design Process                               6 (6.1%)    10 (10.2%)  34 (34.7%)  40 (40.8%)  8 (8.2%)   0 (0.0%)
IN4 Analysis                                     9 (9.2%)    8 (8.2%)    39 (39.8%)  35 (35.7%)  7 (7.1%)   0 (0.0%)
IN5 Conclusions                                  10 (10.2%)  19 (19.4%)  30 (30.6%)  30 (30.6%)  9 (9.2%)   0 (0.0%)
IN6 Limitations and Implications                 6 (6.1%)    26 (26.5%)  32 (32.7%)  14 (14.3%)  7 (7.1%)   13 (13.3%)
IN1 Topic Selection was judged by scorers as not applicable for four of the five assignments.
IN2 Existing Knowledge and IN6 Limitations and Implications were each considered not
applicable for one assignment. Figure 2 and Table 5 provide the score distributions for each
dimension for work products that were scored on that dimension (i.e., work products in the NA
column above are not included).
INQUIRY RESULTS BY DIMENSION FOR APPLICABLE SCORES ONLY
Figure 2 Distribution of Scores for Inquiry, Applicable Scores Only
Table 5 Distribution of Scores for Inquiry, Applicable Scores Only

                IN1      IN2      IN3      IN4      IN5      IN6
0               7.1%     1.7%     6.1%     9.2%     10.2%    7.1%
1               21.4%    6.9%     10.2%    8.2%     19.4%    30.6%
2               42.9%    37.9%    34.7%    39.8%    30.6%    37.6%
3               7.1%     44.8%    40.8%    35.7%    30.6%    16.5%
4               21.4%    8.6%     8.2%     7.1%     9.2%     8.2%
25th %tile      1        2        2        2        1        1
50th %tile      2        3        2        2        2        2
75th %tile      3        3        3        3        3        3
Mode            2        3        3        2        2,3      2
RESULTS BY DIMENSION
IN1 Topic Selection
This dimension was not scored for four of the five assignments. Only one English assignment
provided students with the opportunity to select their topic. For the rest of the assignments,
students were instructed to analyze a given article, looking at particular aspects of the article. For
the assignment for which this dimension was applicable, one work product did not meet the
benchmark score of 1. Three work products identified a topic that was considered too general to
be manageable (score of 1). Six work products identified a manageable topic that left out
relevant aspects (score of 2). Four work products identified a focused and manageable topic that
addressed relevant aspects of the topic (scores of 3 and 4). The small number of student work
products makes it difficult to make performance comparisons with the other dimensions.
IN2 Existing Knowledge, Research, and/or Views
This dimension was the highest scoring Inquiry dimension. This dimension was scored for all 4
English assignments, but was not scored for the Psychology assignment. (Although the
assignment asked students to summarize the background information within the article, the
scorers determined this did not fit with the intent of the rubric dimension, which required
presenting information from relevant sources with various points of view.) Almost one in ten
work products either lacked enough content to judged (scores of 0) or presented information
from irrelevant sources with limited points of view (scores of 1). Over one third of work
products presented information from relevant sources representing limited points of view (scores
of 2). Over half the work products presented in-depth information from relevant sources
representing various points of view (scores of 3 and 4).
IN3 Design Process
This dimension was scored for all five assignments. The methodology that scorers looked for
was the methodology that was provided by the instructor in the assignment directions. About
one in twenty work products demonstrated no understanding of the methodology (scores of 0).
One in ten work products demonstrated some misunderstanding of the methodology (scores of
1). One in three students utilized a process, but parts of the process were missing or incorrectly
developed (scores of 2). Almost half the students demonstrated the ability to utilize a
methodology that was appropriately developed, even though for most work products the more
subtle elements were missing (scores of 3 and 4).
IN4 Analysis
This dimension was scored for all five assignments. One in ten work products contained no
elements of analysis (scores of 0). One in twelve work products did not organize evidence in a
way that supported analysis (scores of 1). Two in five work products contained organized evidence,
although the organization was not considered effective in revealing patterns, differences, or
similarities (scores of 2). Two in five work products contained evidence organized effectively to
reveal patterns, differences, or similarities (scores of 3 and 4).
IN5 Conclusions
This dimension was scored for all five assignments. One in ten work products stated no
conclusions (scores of 0). One in five work products stated a general conclusion that was
ambiguous, illogical, or unsupportable from the inquiry findings (scores of 1). Three in ten work
products stated general conclusions that went beyond the scope of the inquiry (scores of 2). Four
in ten work products stated conclusions focused solely on and arising specifically from the
inquiry finding (scores of 3 and 4).
IN6 Limitations and Implications
The scores on this dimension were the lowest of all Inquiry dimensions. This dimension was
scored for four of the five assignments. One in 14 work products presented no limitations and
implications (scores of 0). Three in ten work products presented limitations and implications that
were irrelevant and unsupported (scores of 1). Over one third of work products presented
relevant and supported limitations and implications (scores of 2). One fourth of work products
discussed relevant and supported limitations and implications (scores of 3 and 4).
CORRELATION BETWEEN DIMENSIONS
All dimensions of Inquiry were significantly correlated with each other except IN1, Topic
Selection. For this dimension, all correlation coefficients were positive, but the only statistically
significant correlation was with IN5, Conclusions. The lack of significance is probably due to the
fact that only 14 work products were scored on this dimension. The strongest
correlations were between IN3 and IN4, IN4 and IN5, and IN1 and IN5. See appendix Table E1
for a complete presentation of correlation coefficients. It should be no surprise that the
components of inquiry are highly correlated. These large and statistically significant correlations
between the scores on each dimension of the rubric might also suggest some “cross scoring,” or
lack of independent scoring on the part of the scorers.
DEMOGRAPHIC AND PREPAREDNESS FINDINGS
There was no significant difference in the distribution of scores between males and females on
each of the dimensions. For IN1 Topic Selection the distributions were virtually the same. For
IN5 Conclusions the distributions were somewhat different, with larger percentages of male
scores in the lower levels and larger percentages of female scores in the higher levels. But even
here, the significance level on all measures of comparison was only .085.
There was no significant difference in the distribution of scores between transfer students and
students who entered as freshman. Likewise, there was no significant difference in the
distribution of scores between honors students and non-honors students. However the sample
size for each of these was extremely low (no more than 10 transfer students and 4 honors
students were scored on each dimension). Due to the small (though representative) number of
students from each of the race/ethnicity categories other than white (no more than 10 for each
category), no analysis on this variable was done.
There were statistically significant, though not large, positive correlations between the number of
credit hours completed and IN3 (.275**), IN4 (.279**), IN5 (.322**), and IN6 (.278*). This
finding is what we would hope for. The correlation between IN2 Existing Knowledge, which
marked the highest scores, and number of credit hours completed was small (.053) and not
significantly different from zero. There were no significant correlations between dimension
scores and ACT, SAT Verbal, and SAT Math scores.
COMPARISON BETWEEN COURSES AND ASSIGNMENTS
Psychology work products were not scored on dimensions IN1 and IN2. Scores for the other four
dimensions were significantly higher on the English composition papers than on the Psychology
papers (see appendix Table C2). The percent of 2’s and 3’s were about the same, but there were
no scores of 4 on the Psychology papers, and more 0’s and 1’s.
All assignments were out-of-class term papers, hence there is no comparison between
assignment types.
CRITICAL THINKING
At the basic studies level, the UNCW Critical Thinking Learning Goal is for students to use
multiple methods and perspectives to critically examine complex problems. For purposes of this
Learning Goal, “Critical Thinking is ‘skilled, active interpretation and evaluation of
observations, communications, information and argumentation’ (Fisher and Scriven, 1997).
Critical thinking involves a clear explanation of relevant issues, skillful investigation of
evidence, purposeful judgments about the influence of context or assumptions, reasoned creation
of one’s own perspective, and synthesis of evidence and implications from which conclusions are
drawn” (UNCW Learning Goals, 2009). The VALUE Critical Thinking rubric contains five
dimensions that are aligned with the UNCW description of Critical Thinking.
SUMMARY OF SCORES BY DIMENSION
A total of 183 student work products from two Music 115, three Psychology 105, and three
Sociology 105 sections were scored by seven scorers. Twenty-two work products (12.0%) were
scored by multiple scorers. Table 6 provides summary information for all work products.
Table 6 Critical Thinking Score Results
(0 = below Benchmark; 1 = Benchmark; 2 and 3 = Milestones; 4 = Capstone)

Dimension                                   0           1           2           3           4          NA
CT1 Explanation of Issues                   12 (6.6%)   40 (21.9%)  42 (23.0%)  40 (21.9%)  11 (6.0%)  38 (20.8%)
CT2 Evidence                                13 (7.1%)   44 (24.0%)  60 (32.8%)  39 (21.3%)  7 (3.8%)   20 (10.9%)
CT3 Influence of Context and Assumptions    31 (16.9%)  40 (21.9%)  35 (19.1%)  13 (7.1%)   0 (0.0%)   64 (35.0%)
CT4 Student’s Position                      20 (10.9%)  49 (26.8%)  47 (25.7%)  27 (14.8%)  1 (0.5%)   39 (21.3%)
CT5 Conclusions and Related Outcomes        39 (21.3%)  41 (22.4%)  33 (18.0%)  9 (4.9%)    5 (2.7%)   56 (30.6%)
Each of the five dimensions was judged by the scorers as not applicable for at least one of the
assignments. CT3 Influence of Context and Assumptions and CT5 Conclusions and Related
Outcomes were not applicable for three assignments; CT1 Explanation of Issues and CT4
Student’s Position were not applicable for two assignments. Figure 3 and Table 7 provide the
score distributions for each dimension for work products that were scored on that dimension (i.e.,
work products in the NA column above are not included).
CRITICAL THINKING RESULTS BY DIMENSION FOR APPLICABLE SCORES ONLY
Figure 3 Distribution of Scores for Critical Thinking, Applicable Scores Only

Table 7 Distribution of Scores for Critical Thinking, Applicable Scores Only

                CT1      CT2      CT3      CT4      CT5
0               8.3%     8.0%     26.1%    13.9%    30.7%
1               27.6%    27.0%    33.6%    34.0%    32.3%
2               29.0%    36.8%    29.4%    32.6%    26.0%
3               27.6%    23.9%    10.9%    18.8%    7.1%
4               7.6%     4.3%     0.0%     0.9%     3.9%
25th %tile      1        1        0        1        0
50th %tile      2        2        1        2        1
75th %tile      3        3        2        2        2
Mode            2        2        1        1        1
RESULTS BY DIMENSION
CT1 Explanation of Issues
This dimension was scored for six of the eight assignments. Scores on this dimension were the
highest of all dimensions of Critical Thinking (along with CT2 Evidence). Less than one in ten
work products provided no explanation of the issue (scores of 0). Over one fourth of work
products stated the issue or problem with no clarification (scores of 1). Over one third of work
products stated the issue or problem, but left some points ambiguous (scores of 2). One in three
work products stated, described, and clarified the issue or problem (scores of 3 and 4).
CT2 Evidence
This dimension was scored for seven of the eight assignments. Scores on this dimension were the
highest of all dimensions of Critical Thinking (along with CT1 Explanation of Issues). Over one
third of the work products provided no evidence (scores of 0) or provided evidence that was
taken from sources without evaluation of relevance or factualness (scores of 1). Over one third of
students provided evidence with some interpretation, although the viewpoints of authors were
taken as fact, with little questioning (scores of 2). About three in ten work products provided
evidence that was interpreted, analyzed or synthesized, and evaluated as to factualness (scores of
3 and 4).
CT3 Influence of Context and Assumptions
This dimension was scored for five of the eight assignments. Scores on this dimension were
among the lowest (along with CT5 Conclusions and Related Outcomes). One fourth of the work
products demonstrated no awareness of assumptions (scores of 0). Over one third of work
products showed an emerging awareness of assumptions and some identification of context
(scores of 1). Three in ten work products questioned some assumptions (but overlooked others)
and identified some relevant context (scores of 2). One in ten work products identified the
student’s own and others’ assumptions as well a several relevant contexts (scores of 3). There
were no scores of 4.
CT4 Student’s Position
This dimension was scored for six of the eight assignments. One in seven work products
contained no statement of student position (scores of 0). One third of the work products provided
a simplistic or obvious position (scores of 1). One third of the work products provided a specific
position that acknowledged different sides of an issue (scores of 2). One in ten work products not
only acknowledged different sides of an issue, but incorporated those positions and took into
account the complexities of the issue (scores of 3 and 4).
CT5 Conclusions and Related Outcomes
This dimension was scored for five of the eight assignments. Scores were the lowest on this
dimension. One in three work products provided no conclusions (scores of 0). Almost one third
of work products provided oversimplified outcomes and conclusions that were inconsistently tied
to some of the information discussed (scores of 1). Approximately one fourth of work products
provided conclusions that were logically tied to information (because information was chosen to
fit the conclusion) and identified some related outcomes clearly (scores of 2). Approximately one
in ten work products provided conclusions logically tied to a range of information, including
opposing viewpoints and identified related outcomes clearly (scores of 3 and 4).
CORRELATION BETWEEN DIMENSIONS
All dimensions of Critical Thinking were significantly correlated with each other. The
correlation coefficients range in magnitude from .247 to .692. See appendix Table E1 for a
complete presentation of correlation coefficients. It should come as no surprise that the
components of critical thinking are highly correlated. The fairly large values of the correlation
coefficients, and the fact that they are all statistically significant, might also point to some “cross
scoring” or lack of independence in the scoring of each dimension.
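For readers who wish to reproduce this kind of analysis, the sketch below shows one way to compute pairwise Spearman correlations between rubric dimensions in Python. It is a minimal illustration with hypothetical scores, not the report's actual computation; the variable names are our own.

```python
# Minimal sketch: pairwise Spearman correlations between rubric dimensions.
# Hypothetical data: one row per work product, one column per CT dimension;
# NaN marks dimensions judged Not Applicable for that work product.
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "CT1": [2, 3, 1, 2, 4, np.nan],
    "CT2": [2, 2, 1, 3, 3, 2],
    "CT3": [1, 2, 0, 1, np.nan, 1],
    "CT4": [2, 3, 1, 2, 2, 1],
    "CT5": [1, 2, 0, 1, 3, np.nan],
})

# Spearman's rho suits ordinal rubric levels; pandas excludes NaN values
# pairwise, mirroring the applicable-scores-only analysis in the report.
rho = scores.corr(method="spearman")
print(rho.round(3))
```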
DEMOGRAPHIC AND PREPAREDNESS FINDINGS
The following statistically significant correlations were found: the number of hours completed
was negatively correlated with CT5 (-.320**), SAT Verbal was positively correlated with CT3
(.271**), CT4 (.261**), and CT5 (.306**), and the ACT composite score was positively
correlated with CT4 (.442**).
There were no significant differences between the distributions of scores between genders, or
between transfer students and students starting as freshmen, or between honor and non-honor
students. There were only 5 honors students in this part of the sample, and their scores were
distributed across the scale. Due to the small (though representative) number of students from
each of the race/ethnicity categories other than white (no more than 10 for each category), no
analysis on this variable was done.
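The report does not name the statistical test used for these group comparisons. As a hedged illustration only, the sketch below uses the Mann-Whitney U test, one common nonparametric choice for comparing two groups on ordinal scores; the data and group labels are hypothetical.

```python
# Sketch: comparing rubric score distributions between two groups.
# The Mann-Whitney U test is one common choice for ordinal data;
# this is an illustration, not necessarily the report's own method.
from scipy.stats import mannwhitneyu

group_a = [2, 3, 1, 2, 4, 2, 3]  # hypothetical scores, e.g., transfer students
group_b = [1, 2, 2, 3, 2, 1, 2]  # hypothetical scores, e.g., freshman admits

stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.3f}")  # p >= .05 suggests no significant difference
```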
COMPARISON BETWEEN COURSES AND ASSIGNMENTS
Scores by subject are provided in appendix Table C3. There were statistically significant
differences between courses on all dimensions except CT4. In each case, the score distributions
were higher for MUS115 than for SOC105 and PSY105 (PSY105 was not scored on CT3 and
CT5).
There were differences in the distributions of scores for most of the dimensions based on the type
of assignment; for two dimensions the differences were statistically significant (CT3 and CT5).
Scores were higher on out-of-class term papers for all dimensions except CT1 Explanation of
issues.
SOCIAL SCIENCE FOUNDATIONAL KNOWLEDGE
At the basic studies level, the UNCW Foundational Knowledge Learning Goal is for students to
acquire foundational knowledge, theories and perspectives in a variety of disciplines. For
purposes of this Learning Goal, “Foundational knowledge comprises the facts, theories,
principles, methods, skills, terminology and modes of reasoning that are essential to more
advanced or independent learning in an academic discipline” (UNCW Learning Goals, 2009). A
locally created rubric for assessing Foundational Knowledge in the Social Sciences, based on the
Social Science Component student learning outcomes, was piloted in Spring 2010.
SUMMARY OF SCORES BY DIMENSION
A total of 45 student work products from two Sociology 105 online sections were scored by one
scorer. Table 8 provides summary information for the work products.
Table 8 Foundational Knowledge Score Results
                                               0         1           2           3          4         NA
Use of Discipline Terminology                  0 (0.0%)  27 (60.0%)  10 (22.2%)  7 (15.6%)  1 (2.2%)  0 (0.0%)
Explanation and Understanding of Concepts
and Principles                                 0 (0.0%)  27 (60.0%)  11 (24.4%)  5 (11.1%)  2 (4.4%)  0 (0.0%)
RESULTS BY DIMENSION
FK1 Use of Disciplinary Terminology
In three in five work products, the meaning of the discourse was unclear, and/or attempts to use
terminology were inaccurate or inappropriate to the context (scores of 1). Over one in five work
products conveyed meaning, although not always using appropriate terminology (scores of 2).
Not quite one in five work products conveyed meaning by using all relevant terminology
appropriately (scores of 3 and 4).
FK2 Explanation and Understanding of Concepts and Principles
Three in five work products demonstrated an attempt to describe or explain concepts or
principles, but those descriptions or explanations were too vague or simplistic (scores of 1). One
fourth of the work products explained concepts and principles at a basic level, but left out
important information or connections (scores of 2). One in seven work products accurately
explained the concepts and principles within the context of the situation (scores of 3 and 4).
CORRELATION BETWEEN DIMENSIONS
There was a very high, statistically significant correlation between the two dimensions of the
rubric (.933**). This might indicate some “cross scoring,” or lack of independence in scoring the
two dimensions. Another possibility is that the two dimensions are assessing almost the same
thing.
DEMOGRAPHIC AND PREPAREDNESS FINDINGS
There were no significant differences between the distribution of scores for males and females on
either dimension. In addition, there were no correlations that were significantly different from
zero between either dimension and credit hours completed, or ACT or SAT Math scores.
However, there was a statistically significant correlation between FK1 Use of Disciplinary
Terminology and students’ SAT Verbal scores (.354*).
COMPARISON OF SCORES FROM TWO RUBRICS
Assignments often require the demonstration of skills related to multiple learning goals, such as
written communication and critical thinking, or written communication and inquiry and analysis.
Seven instructors suggested two rubrics as fitting their assignments, and six assignments were
scored with two rubrics: 27 work products from two sections were scored using both the Written
Communication and Inquiry rubrics, 37 work products from two sections were scored using both
the Written Communication and Critical Thinking rubrics, 40 work products from one section
were scored using both the Inquiry and Critical Thinking rubrics, and 45 work products from two
sections (one assignment) were scored using both the Critical Thinking and Foundational
Knowledge rubrics. Each work product was scored on the two rubrics by a separate set of scorers
trained on the rubric.
Any comparison of scores across rubrics must be done with caution, and should only serve as a
starting place for further investigation. This is because the criteria for each level of the rubric
cannot be assumed to be scaled the same. For example, Level 2 cannot be considered to be in the
identical place on a scale of abilities for each of the rubrics. With this in mind, the review of the
distribution of scores for work products scored with two rubrics provides the following
observations:
•   For those work products scored with both the Written Communication and Inquiry
    rubrics, there was no dimension that stood out as relatively stronger than the others, while
    those of WC5 Control of Syntax and Mechanics, WC3 Genre and Disciplinary
    Conventions, and IN1 Topic Selection were relatively low.
•   For those work products scored with both the Written Communication and Critical
    Thinking rubrics, the distributions of scores on WC1 Context and Purpose for Writing and
    CT1 Explanation of issues were relatively strong, while those of CT3 Influence of
    context and assumptions and CT5 Conclusions and related outcomes were relatively low.
•   For those work products scored with both the Inquiry and Critical Thinking rubrics, there
    was no dimension that stood out as relatively stronger or weaker than the others.
•   For those work products scored with both the Critical Thinking and Foundational
    Knowledge rubrics, there was no dimension that stood out as relatively stronger than the
    others, while that of CT5 Conclusions and related outcomes was relatively low.
For the most part, these results are consistent with what you would see by comparing the results
of all work products scored with each rubric. An exception to this is point three, where scores for
this set of work products were about the same for Inquiry and Critical Thinking. A review of the
scores of all work products scored on Critical Thinking and those scored on Inquiry would
suggest that student skills on inquiry are strong relative to critical thinking. The fact that there
was only one assignment in this sample makes any conclusions based solely on it unjustified.
Correlation was tested between the scores on all the dimensions from each rubric. Three fourths
of the 95 correlations were not significantly different from zero at the .05 level. See appendix
Tables E1 and E2 for complete correlation tables. The statistically significant correlations are
presented in Table 9.
Table 9 List of Statistically Significant Correlations across Rubrics

Dimensions    Spearman's Rho    Sample Size
CT1 – IN3     .529**            40
CT1 – IN4     .506**            40
CT1 – IN5     .363*             40
CT1 – IN6     .434**            40
CT2 – IN6     .399*             40
CT5 – IN6     .420**            40
WC1 – CT1     .576*             19
WC1 – IN1     -.569**           14
WC1 – IN2     -.425*            27
WC5 – IN4     .533**            27
WC5 – IN6     .712**            14
WC5 – IN5     .427*             27
WC5 – CT5     -.624**           18
FK1 – CT1     .471**            45
FK1 – CT2     .372*             45
FK1 – CT3     .398**            45
FK1 – CT4     .484**            45
FK1 – CT5     .336*             45
FK2 – CT1     .455**            45
FK2 – CT2     .340*             45
FK2 – CT3     .347*             45
FK2 – CT4     .474**            45
FK2 – CT5     .323*             45
*Statistically significant at the .05 level
**Statistically significant at the .01 level
There were six significant correlations between the dimensions of Critical Thinking and the
dimensions of Inquiry and Analysis, all positive. CT1 Explanation of Issues was correlated with
four of the six Inquiry dimensions, IN3 Design Process, IN4 Analysis, IN5 Conclusions, and IN6
Limitations and Implications. (Note that there are no correlation coefficients between CT1 and
the other two dimensions of Inquiry because those two dimensions were considered by the
scorers to be not applicable to the assignment.) CT2 Evidence was positively correlated with IN6
Limitations and Implications, as was CT5 Conclusions and related outcomes (implications and
consequences). Given how closely CT5 and IN6 are related, it is actually surprising that this
correlation was not higher. More broadly, the skills of Critical Thinking and Inquiry and
Analysis overlap to a large extent, so the presence of these positive correlations is not surprising.
The correlations between Written Communication and the other rubrics are very inconsistent and
difficult to interpret. It is not surprising that WC1 Context of and Purpose for Writing is
positively correlated with CT1 Explanation of issues, which requires a description of the issue or
problem to be considered. It is very surprising, however, that WC1 Context and Purpose for
Writing is negatively correlated with IN1 Topic selection. Further review of the 14 work
products in this subsample shows extremely high scores on WC1 compared to the rest of the
sample. This sample of 14 work products from one assignment is too small to draw any
conclusions from. It is also interesting that WC1 is negatively correlated with IN2 Existing
Knowledge, Research, and/or Views, as IN2 and WC2 Content Development are overlapping
concepts, and WC1 and WC2 are highly correlated. In fact it is interesting that WC2 and IN2 are
not significantly correlated (the correlation coefficient is negative, but not significantly different
from zero). As above, the number of work products is too small to draw any conclusions from.
WC5 Control of Syntax and Mechanics is correlated with four dimensions from the other two
rubrics, one correlation being negative. Any significant positive correlation could be an
indication that scorers, while not looking specifically for control of syntax and mechanics, are
biased by the overall quality of use of language. The fact that one of the correlations is negative
(with CT5 Conclusions and related outcomes) and the fact that the other three Inquiry
dimensions are not significantly correlated with WC5 suggests that this is probably not the case.
All of the correlation coefficients between Critical Thinking and Foundational Knowledge were
significantly different from zero.
RELIABILITY OF SCORES
A NOTE ABOUT VALIDITY
Reliability is a necessary, but not sufficient, condition for validity. According to standards
published jointly by the American Education Research Association (AERA), the American
Psychological Association (APA), and the National Council on Measurement in Education
(NCME), validity is the degree to which evidence and theory support the interpretation of test
scores for the proposed use (AERA, 1999). The VALUE rubrics were recommended as valid
means to assess specific UNCW Learning Goals because (1) they align directly to the definitions
of Thoughtful Expression, Inquiry, and Critical Thinking adopted by UNCW, and (2) according
to the AAC&U developers,
“The rubric development teams relied on existing campus rubrics when available,
other organizational statements on outcomes, experts in the respective fields and
faculty feedback from campuses throughout the process. Each VALUE rubric
contains the most common and broadly shared criteria or core characteristics
considered critical for judging the quality of student work in that outcome area”
(AAC&U, 2010).
The Social Sciences Foundational Knowledge rubric also aligns with the definitions within the
UNCW Learning Goals, as well as the social science component student learning outcomes. The
rubric does, however, require additional vetting by faculty before it can meet validity standards.
MEASURING RELIABILITY
Even when a means of assessment, such as a rubric, is considered valid, we must also confirm
that it produces reliable scores; in this case, the concern is interrater reliability. The Spring 2010
scoring event was the first use of the four rubrics for the 15 scorers. Details about how scorers
were normed were given in the Methodology chapter of this report. Briefly, scorer norming
consisted of two stages. First, each scorer attended a two-hour workshop at which the rubric was
reviewed and three to four student work products were scored and discussed. Second, on the day
of scoring, scorers worked in groups of 2, 3, or 4. They began the scoring process for each
assignment packet by scoring and discussing one common work product from their packets, and
created additional scoring guidelines specific to that assignment, if necessary. There were a
number of additional common student work products in each packet so that interrater reliability
could be assessed. Only the independently scored work products were used to measure
interrater reliability (as scorers came to consensus on all discussed papers).
Interrater reliability is a measure of the degree of agreement between scorers, and provides
information about the trustworthiness of the data. It helps answer the question—Would a
different set of scorers at a different time arrive at the same conclusions? In practice, interrater
reliability is enhanced over time through scorer discussion, as well as through improvements to
the scoring instructions (rubric).
There is much debate about the best means of measuring interrater reliability, and many different
measures are in use. Some differences among the measures are due to the type of data (nominal,
ordinal, or interval); others have to do with what is actually being measured.
Correlation coefficients describe consistency between scorers. For example, if Scorer 1 always
scored work products one level higher than Scorer 2, there would be perfect correlation between
them: you could always predict one scorer’s score by knowing the other’s. A correlation
coefficient does not, however, yield any information about agreement. A value of 0 for a
correlation coefficient indicates no association between the scores, and a value of 1 indicates
complete association.
Spearman’s rho rank order correlation coefficient is an appropriate correlation coefficient for
ordinal data.
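The distinction between consistency and agreement can be made concrete with a small sketch (hypothetical scores): a scorer who is always exactly one level higher than a partner yields perfect rank correlation yet zero exact agreement.

```python
# Sketch of consistency without agreement: Scorer 2 always scores one
# rubric level higher than Scorer 1 (hypothetical data).
from scipy.stats import spearmanr

scorer1 = [0, 1, 1, 2, 3, 2]
scorer2 = [1, 2, 2, 3, 4, 3]  # identical ranking, shifted up one level

rho, _ = spearmanr(scorer1, scorer2)
exact = sum(a == b for a, b in zip(scorer1, scorer2)) / len(scorer1)

print(f"Spearman's rho: {rho:.2f}")      # 1.00 -- perfectly consistent
print(f"Exact agreement: {exact:.0%}")   # 0% -- never the same score
```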
Percent agreement measures exactly that—the percentage of scores that are exactly the same. It
does not, however, account for chance agreement. Percent adjacent measures the number of
times the scores were exactly the same plus the number of times the scores were only one level
different. Percent adjacent lets the researcher know how often there is major disagreement between
the scorers on the quality of the artifact. If percent adjacent is far from 100%, the rubric should
be reevaluated and/or additional norming may be required.
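Both measures are straightforward to compute; the following sketch shows one way (the function name and data are our own, hypothetical).

```python
# Sketch: percent agreement and percent agreement-plus-adjacent
# for two scorers rating the same work products.
def agreement_stats(scores_a, scores_b):
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    # "Adjacent" allows scores at most one rubric level apart.
    within_one = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, within_one

exact, within_one = agreement_stats([2, 3, 1, 2, 0], [2, 2, 1, 3, 2])
print(f"Percent agreement: {exact:.1%}")             # 60.0%
print(f"Agreement plus adjacent: {within_one:.1%}")  # 80.0%
```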
Krippendorff’s alpha is a measure of agreement that accounts for chance agreement. It can be
used with ordinal data, small samples, and with scoring practices where there are multiple
scorers. A value of 0 for alpha indicates only chance agreement, and a value of 1 indicates
reliable agreement not based on chance.
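As a point of reference, the third-party Python package `krippendorff` implements this statistic for ordinal data; the sketch below is a minimal illustration with hypothetical scores, not the computation used for this report.

```python
# Sketch: Krippendorff's alpha for ordinal rubric scores, using the
# third-party `krippendorff` package (pip install krippendorff).
import numpy as np
import krippendorff

# One row per scorer, one column per work product; np.nan marks work
# products that a given scorer did not score (hypothetical data).
reliability_data = np.array([
    [2, 3, 1, 2, 0, 4, np.nan, 2],
    [2, 2, 1, 3, 0, 4, 3, np.nan],
])

alpha = krippendorff.alpha(
    reliability_data=reliability_data,
    level_of_measurement="ordinal",  # rubric levels are ordered categories
)
print(f"Krippendorff's alpha: {alpha:.3f}")
```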
Each of these measures of reliability is provided in Table 10. For Written Communication, 19
work products were double scored. Six of those work products were discussed, leaving a sample
of 13 (11.2%) for testing interrater reliability. For Inquiry, 15 work products were double scored.
Four of those work products were discussed, leaving a sample of 11 (11.2%) for testing interrater
reliability. For Critical Thinking, 22 work products were double, triple, or quadruple scored.
Seven of those work products were discussed, leaving a sample of 15 (8.2%) for testing interrater
reliability. (Since in many cases each Critical Thinking artifact was scored by three or four
scorers, the number of cases to check was larger.) Table 10 provides the results of the various
interrater reliability measures for each dimension.
Table 10 Interrater Reliability

                                             Percent     Percent Agreement       Krippendorff's   Spearman's
                                             Agreement   Plus Percent Adjacent   Alpha            Rho
Written Communication
WC1 Context of and Purpose for Writing       46.2%       84.6%                   0.242            0.381
WC2 Content Development                      38.5%       84.6%                   0.397            0.427
WC3 Genre and Disciplinary Conventions       53.8%       100.0%                  0.550            0.588*
WC4 Sources and Evidence                     46.2%       84.6%                   0.377            0.377
WC5 Control of Syntax and Mechanics          41.2%       100.0%                  0.259            0.284
Inquiry and Analysis
IN1 Topic Selection (1)                      --          --                      --               --
IN2 Existing Knowledge, Research, Views      62.5%       87.5%                   0.396            0.497
IN3 Design Process                           63.6%       100.0%                  0.503            0.561
IN4 Analysis                                 81.8%       90.9%                   0.642            0.621*
IN5 Conclusions                              63.6%       90.9%                   0.588            0.600
IN6 Limitations and Implications             55.6%       77.8%                   0.213            0.205
Critical Thinking
CT1 Explanation of Issues                    55.6%       100.0%                  0.841            0.874**
CT2 Evidence                                 40.0%       100.0%                  0.728            0.737**
CT3 Influence of Context and Assumptions     31.3%       81.3%                   0.590            0.594*
CT4 Student’s Position                       41.2%       88.2%                   0.549            0.542*
CT5 Conclusions and Related Outcomes         75.0%       93.8%                   0.895            0.909**

(1) Sample was too small to analyze after the removal of assignments for which this dimension was considered Not
Applicable.
*Statistically significant at the .05 level
**Statistically significant at the .01 level
Determining acceptable values for interrater reliability measures is not easy. Acceptable levels
will depend on the purposes that the results will be used for. These levels must also be chosen in
relationship to the type of scoring tool or rubric, and the measure of reliability being used. In this
case, the tool is a “metarubric,” a rubric that is designed to be applied across a broad range of
artifacts and contexts. This type of instrument requires more scorer interpretation than rubrics
designed for specific assignments. For consistency measures, such as correlation coefficients, in
a seminal work, Nunnally states that .7 may suffice for some purposes whereas for other
purposes “it is frightening to think that any measurement error is permitted” (Nunnally, 1978,
pp. 245-246). The standard set for Krippendorff’s alpha by Krippendorff himself is .8 to ensure
that the data are at least similarly interpretable by researchers. However, “where only tentative
conclusions are acceptable, alpha greater than or equal to .667 may suffice” (Krippendorff, 2004,
p. 241). In the present context, we should aim for values of at least .67, with the recognition that
this could be difficult given the broad range of artifacts scored with the metarubrics.
Comparing the results of the reliability indices for this study to the benchmark of .67, three
Krippendorff’s alpha and Spearman’s rho coefficients are above .67—CT1, CT2, and CT5. The
percent agreement was high for most of the Inquiry dimensions, higher than the alpha
coefficient, which is generally the case when there are two scorers. For Critical Thinking, most
work products in the sample were scored by three to four scorers, which decreased the percent
agreement (across all scorers), but increased alpha and rho. High levels of percent agreement are
seen for IN4 and CT5. Five dimensions show percent adjacent at 100%, and only 17 dimension
scores out of 202 total dimension scores (8.4%) were more than one level different.
An interesting finding is that scorers who worked in groups of 3 and 4 (those scoring the Critical
Thinking rubric) had higher interrater reliability as measured by alpha and rho than those
working in pairs. Whether this was due to the rubric itself or to working in a larger group cannot
be determined without further research.
Overall, these various measures of reliability are promising for the first use of the rubrics at
UNCW. They provide some evidence that the scorer norming activities had an effect. They do,
however, indicate that more work needs to be done.
SCORER FEEDBACK
All scorers filled out two types of feedback forms. At the end of the day, each scorer filled out a
process feedback survey. This survey asked for their opinions about how well each step of the
process had gone, and for any recommendations for improvement. During the day, after
completing each packet of student work products, each scorer filled out a rubric feedback form.
This form asked for information on how well each rubric dimension fit the assignment and
student work. It also asked for feedback on the quality criteria for each dimension.
SCORER FEEDBACK ON PROCESS
Table 11 provides the results on the selected responses items on the survey.
Table 11 Scorer Feedback on Process

(SA = Strongly Agree, A = Agree, N = Neutral, D = Disagree, SD = Strongly Disagree, M/NA = Missing or NA; n = 15)

1. The invitation to volunteer accurately described the experience.
   SA 14 (93.3%) | A 1 (6.7%) | N 0 (0.0%) | D 0 (0.0%) | SD 0 (0.0%) | M/NA 0 (0.0%)
2. The timing of the invitation gave adequate opportunity to arrange for attending workshops and scoring.
   SA 14 (93.3%) | A 1 (6.7%) | N 0 (0.0%) | D 0 (0.0%) | SD 0 (0.0%) | M/NA 0 (0.0%)
3. The 2-hour norming session adequately prepared me for what was expected of me during the all-day scoring session.
   SA 11 (73.3%) | A 2 (13.3%) | N 1 (6.7%) | D 1 (6.7%) | SD 0 (0.0%) | M/NA 0 (0.0%)
4. The all-day scoring session was well-organized.
   SA 13 (86.7%) | A 1 (6.7%) | N 1 (6.7%) | D 0 (0.0%) | SD 0 (0.0%) | M/NA 0 (0.0%)
5. The structure of the all-day scoring made it reasonable to work for the full time.
   SA 12 (80.0%) | A 2 (13.3%) | N 1 (6.7%) | D 0 (0.0%) | SD 0 (0.0%) | M/NA 0 (0.0%)
6. When I had questions, one of the leaders was available to answer it.
   SA 15 (100.0%) | A 0 (0.0%) | N 0 (0.0%) | D 0 (0.0%) | SD 0 (0.0%) | M/NA 0 (0.0%)
7. When I had questions, the question was answered.
   SA 13 (86.7%) | A 2 (13.3%) | N 0 (0.0%) | D 0 (0.0%) | SD 0 (0.0%) | M/NA 0 (0.0%)
8. I was comfortable scoring student work products from outside my discipline on the broad Learning Goals.
   SA 5 (33.3%) | A 4 (26.7%) | N 3 (20.0%) | D 1 (6.7%) | SD 0 (0.0%) | M/NA 2 (13.3%)
9. The process is an appropriate way to assess students on the UNCW Learning Goals.
   SA 5 (33.3%) | A 7 (46.7%) | N 2 (13.3%) | D 0 (0.0%) | SD 0 (0.0%) | M/NA 1 (6.7%)
10. This process is valuable in improving student learning.
    SA 6 (40.0%) | A 7 (46.7%) | N 1 (6.7%) | D 0 (0.0%) | SD 0 (0.0%) | M/NA 1 (6.7%)
11. I would participate in this process again.
    SA 7 (46.7%) | A 7 (46.7%) | N 1 (6.7%) | D 0 (0.0%) | SD 0 (0.0%) | M/NA 0 (0.0%)
12. I would recommend participating in this process to my colleagues.
    SA 7 (46.7%) | A 7 (46.7%) | N 1 (6.7%) | D 0 (0.0%) | SD 0 (0.0%) | M/NA 0 (0.0%)
There were also three open-ended questions on the survey. The results of these are included in
the summary below. The complete results of these questions are provided in Appendix D.
There was a high level of satisfaction with regard to most aspects of the process. The initial
contact and explanations about the responsibilities of the volunteers were clear to all. Most
scorers responded that the norming session adequately prepared them for the all-day scoring
session. Positive comments were that it was a great way to learn, that it provided context, and
that it was helpful, and even fun. Although scorers felt prepared to score, they did provide a
number of recommendations for improving preparation. Two scorers suggested that an additional
session, maybe optional, would have been beneficial.
With regard to the all-day scoring session, scorers were generally satisfied. Eight scorers
commented that working with a partner was beneficial. One stated that the initial discussion
facilitated uniformity and efficiency. Another said that it allowed them to check themselves and
made the exercise enjoyable. Two scorers commented on the value of participating in the scoring
process. One stated that it was a valuable way for faculty to share and learn and discuss, and the
other stated that s/he learned a lot that will be able to be applied to courses taught. One scorer
who scored the same assignment in the beginning of the day and again at the end of the day
stated that it was very instructive to revisit the same assignment (most scorers scored three
packets from different assignments). With regard to the length of the session, one respondent
suggested breaking the scoring down into two sessions. Finally, there was a suggestion to
create a set of guidelines for the scoring process.
There were a number of comments related to the rubrics in the comments on the end-of-day
survey. Two scorers suggested defining exemplar works to use as guides in the scoring process.
One scorer was not sure that the Inquiry rubric could be applied universally. Another suggested
refining some parts of the rubrics. Most comments concerned the match between the assignments
and rubrics. Four scorers suggested that rubrics and assignments should be synchronized better
or more effectively, one suggesting that assignments be written with the rubric in mind. Another
scorer stated that fuller instructions from the professor to the students would improve the
responses.
SCORER FEEDBACK ON RUBRIC AND REQUIRED ASSUMPTIONS
Written Communication Rubric
For the most part, the Written Communication Rubric fit the six different assignments well,
requiring only a few assumptions. The dimension WC3 Genre and Disciplinary Conventions was
the most problematic, requiring a number of assumptions. All teams had questions as to where to
score use of citations, and all decided to score this under disciplinary conventions. For scorers
who were scoring outside their discipline, lack of knowledge about disciplinary conventions was
also an issue. This issue was somewhat abated because there were faculty from four of the five
disciplines at the event who could answer questions. The dimension WC4 Sources and Evidence
was problematic for one assignment (See Written Communications Results section). In this case,
it was determined by the scorers that the prompt for the assignment could have been improved.
Inquiry and Analysis
Inquiry is approached differently across the disciplines, making it difficult to use a metarubric.
However, the way the dimensions are divided places most of these differences into one
dimension—IN3 Design Process, and most instructors provided information on the approach to
the inquiry in the assignment directions. This dimension required the scorers to make specific
guidelines for four of the five assignments. The guideline common to all scoring was that the
appropriate methodological approach was that described (or inferred) in the assignment
instructions.
One dimension in the rubric stands out as clearly not applicable to most assignments at the basic
studies level—IN1 Topic selection. Only one assignment gave students a clear choice of topics
(a second one gave choice of focus within the topic). Therefore, this dimension was scored as not
applicable for four of the five assignments.
The dimension IN2 Existing Knowledge, Research, and/or Views fit all assignments except one.
Although the assignment asked students to summarize the background information within the
article, the scorers determined this did not fit with the intent of the rubric dimension, which
required presenting information from relevant sources with various points of view.
The dimensions IN4 Analysis and IN5 Conclusions fit all assignments well. There were a
number of comments on the quality criteria for the Conclusions dimension. Scorers felt that it
does not follow that the level 2 quality criteria are “better” than the level 1 criteria, or vice versa,
for that matter. Because of this flaw, the level 3 criteria were too broad, requiring a score of 3
even if the conclusion was weak. There was one similar comment about the Analysis dimension
quality criteria, indicating a large jump between the level 2 and level 3 criteria.
The dimension IN6 Limitations and Implications was assumed not to fit one assignment.
However, review after the scoring session showed that the assignment did contain a question
regarding implications. This illustrates the need to check scorer assumptions before they are
used.
Critical Thinking
This rubric was the most difficult for scorers to apply. Some of the difficulty seems to have come
from the rubric itself, and some from the course instructors’ understanding of the dimensions of
the rubric. Most assignments did not fit all dimensions. Each of the five dimensions was judged
by the scorers as not applicable for at least one of the assignments. CT3 Influence of Context and
Assumptions and CT5 Conclusions and Related Outcomes were not applicable for three
assignments; CT1 Explanation of Issues and CT4 Student’s Position were not applicable for two
assignments.
Most of the assumptions were clarifications of the quality criteria in the scorers’ own words. There
were a number of assumptions for the dimension CT2 Evidence. Since this dimension has two
parts, scorers needed to create rules about what happens when a work product met one part but
not the other at any given level.
INSTRUCTOR FEEDBACK
A brief survey was sent to the 13 instructors who provided the student work products. Four
responded. The survey was sent at the end of April right after the all-day scoring session.
Instructors were asked to comment on the process of selecting an assignment, the collection
process, and to provide any additional comments. All four respondents said that the assignment
selection process was not difficult, although one said that he would appreciate a bit more
training. Regarding the collection process, all said that it was clear and worked well; three also
mentioned that the communication from the beginning of the process until the time of collection
was very important. Under other comments, two respondents mentioned that the only suggestion
for improvement in the process would be more updates or meetings.
No clear signs of difficulties with the process were seen either in the survey or in any of the
interactions with the instructors throughout the semester. A portion of the communications,
mainly concerning collection of papers, was between a graduate assistant and the instructors, and
this worked well.
DISCUSSION, LIMITATIONS AND RECOMMENDATIONS
Returning to the purpose of the general education assessment activities in Spring 2010, what
evidence was obtained for the following questions:
•   What are the overall abilities of students taking basic studies courses with regard to the
    UNCW Learning Goals of Thoughtful Expression, Inquiry, and Critical Thinking?
•   What are the relative strengths and weaknesses within the subskills of those goals?
•   Are there any differences in performance based on demographic and preparedness
    variables such as gender, race or ethnicity, transfer students vs. freshman admits, total
    hours completed, or entrance test scores?
•   What are the strengths and weaknesses of the assessment process itself?
THOUGHTFUL EXPRESSION (WRITTEN COMMUNICATION)
The median score on all five dimensions of Written Communication was 2, which means that at
least half of the student work products sampled demonstrated performance at the first Milestone
level (level 2). In fact, for all dimensions, it was substantially over half. Table 12 shows the
percent of work products scored at a level 2 or higher and the percent of work products scored at
a level 3 or higher for each dimension.
Table 12 Written Communication Percent of Sample Scored at Least 2 and at Least 3
Dimension                                   % of Work Products    % of Work Products
                                            Scored 2 or Higher    Scored 3 or Higher
WC1 Context of and Purpose for Writing      84.4%                 46.5%
WC2 Content Development                     76.8%                 39.7%
WC3 Genre and Disciplinary Conventions      74.1%                 32.7%
WC4 Sources and Evidence                    64.7%                 44.0%
WC5 Control of Syntax and Mechanics         76.7%                 42.2%
The university has not as yet determined an expected level of attainment on this rubric for basic
studies students, so no statement can be made as to whether expectations have or have not been
met. These scores may demonstrate adequate skills for the majority of students completing basic
studies courses. These scores also demonstrate, however, the need for additional practice and
feedback for the remainder of their four-year experience. There is some evidence from this study
that some students may not attain skills at level 4 before leaving the university. Scores
increased with hours of coursework completed on only three of the five dimensions, and even
these correlations were low. Additionally, results showed that for each dimension there was
still a large portion of students who had completed 90 or more credit hours who scored below 3
(from 36.4% to 54.6% per dimension). These findings point to the need for additional instruction
and practice in the writing process, such as from the Writing Intensive curriculum that will be
introduced in the new University Studies curriculum.
Student performance was strongest on WC1 Context and Purpose for Writing. Areas of relative
weakness were WC3 Genre and Disciplinary Conventions and WC4 Sources and Evidence.
However, a substantial portion of the work products that were scored below the level 2 on WC4
were from one assignment. Had the test question been specific in requiring evidence, it is
possible these scores would have been higher. This fact, along with findings from the other
rubrics, indicates the need for additional training of instructors on the dimensions of each
Learning Goal.
Other than the issue just mentioned, there were no significant differences in the score
distributions between the four content areas. There were significant differences in scores by type
of assignment, with higher scores on the term papers than on the in-class test questions for 4 of
the 5 dimensions. Scores on WC5 Control of Syntax and Mechanics were higher on the in-class
test questions (though not significantly), which could be due to scorers being more lenient in this
area, taking into account the time constraints. This is an area for additional investigation and to
be discussed with scorers in the future.
A significant finding, consistent with other research findings, was that females scored higher
than males on all dimensions, and significantly higher on WC2 Content Development and WC5
Control of syntax and mechanics. This suggests the need for additional writing support for males.
Differences between student-reported race/ethnicity classifications could not be researched due
to the fact that the samples were too small for all categories except white. Future research will
need to determine ways that possible differences between race/ethnicity classifications can be
checked for.
The only significant correlation involving SAT or ACT scores was a positive correlation between
SAT Math scores and WC5 Control of Syntax and Mechanics. There were no differences between
transfer students and freshman admits, or between honors and non-honors students. The lack of strong
association between preparedness and writing ability as scored by this rubric is difficult to
interpret.
The rubric was deemed by the scorers to fit the student work products well. There was some
scorer confusion about dimension WC3 Genre and Disciplinary Conventions, which will need to
be addressed in scorer training and/or through some wording changes in the rubric. Scorers
suggested some minor changes to quality criteria. There were high correlations between the
scores of each dimension, which may indicate that more emphasis needs to be placed on scoring
each dimension separately, in accordance with the rubric. Interrater reliability measures were all
lower than we would like them to be, and ways to improve them in the future need to be
implemented.
INQUIRY
The median value of five of the six dimensions was 2, while the median of IN2 Existing
Knowledge, Research, and/or Views was 3. As you can see in Table 13 below, more than 60% of
work products were scored at least a level 2 on each dimension.
Table 13 Inquiry Percent of Sample Scored at Least 2 and at Least 3
Dimension                                          % of Work Products    % of Work Products
                                                   Scored 2 or Higher    Scored 3 or Higher
IN1 Topic Selection                                71.5%                 28.6%
IN2 Existing Knowledge, Research, and/or Views     91.4%                 53.5%
IN3 Design Process                                 83.7%                 49.0%
IN4 Analysis                                       82.6%                 42.8%
IN5 Conclusions                                    70.4%                 39.8%
IN6 Limitations and Implications                   62.3%                 24.7%
The university has not as yet determined an expected level of attainment on this rubric for basic
studies students, so no statement can be made as to whether expectations have or have not been
met. These scores show that a large majority of students possess inquiry skills at least at level 2,
and a large portion at level 3 or higher. Growth, of course, is needed in the skills represented by
each dimension for most students in order to attain level 4 skills. There were statistically
significant, though not large, increases in scores as credit hours completed increased for four of
the dimensions, demonstrating the positive impact of coursework on inquiry skills. The
correlation between hours completed and IN2 Existing Knowledge was small (.053) and not
significantly different from zero. Along with the fact that the scores on this dimension were very
high, this may suggest that most students enter the university with adequate skills for their basic
studies work. Additional investigation is needed to shed more light on this issue. Students with
over 90 hours of coursework completed (there were only five in the sample) scored at levels 3
and 4 for all dimensions, except for one score of 2 for IN5, and two scores of 1 and one score of
2 for IN6.
Student performance was strongest on IN2 Existing Knowledge, Research, and/or Views, which
represents the skill in presenting information from relevant, diverse sources. Scores on IN6
Limitations and Implications were the weakest regardless of credit hours completed. This
indicates that students need additional instruction and practice in presenting and supporting the
limitations and implications of inquiry findings.
Only two content areas were scored with this rubric—ENG201 and PSY105. Scores were
significantly higher on the English composition papers than the Psychology papers on the four
dimensions that all assignments were scored on. These data may suggest that the use of a process
approach to writing affects not only the dimensions of written communication, but also the
dimensions of inquiry. Although there is not often time for instructor feedback loops in many
basic studies classes, peer review mechanisms could provide benefits in such courses.
The only statistically significant correlations between Inquiry scores and demographic and
preparedness variables were those between the number of credit hours completed and four of the
dimensions of the rubric, which was discussed above. However, differences between
student-reported race/ethnicity classifications could not be researched because the samples
were too small for all categories except white. Future research will need to determine ways that
possible differences between race/ethnicity classifications can be checked for.
The correlations between the scores on each dimension were fairly large and statistically
significant except for those involving correlations with IN1, which had a small sample. This
suggests the possibility that scorers were not looking at each dimension independently, which
would introduce bias into the results. Future training and scoring sessions will focus on this
issue. Interrater reliability measures were all lower than we would like them to be, and ways to
improve them in the future need to be implemented.
The VALUE rubric fit the sample assignments well, except for IN1 Topic Selection. Only one
assignment was scored on that dimension. All other assignments did not require students to
select a topic. While this might suggest a concern to some readers, there are also arguments to
the contrary. Curriculum for teaching inquiry skills is often constructed to give students
increasing ownership of the various steps in the inquiry process (for example, see Teaching
Issues and Experiments in Ecology, 2010). In the level often called guided inquiry, the research
question is given to the student (as well as other portions of the process). While students should
graduate with the skills to complete open-ended inquiry, the guided inquiry represented by this
sample of student work products is an appropriate step in the learning process and therefore an
appropriate sample for assessing Inquiry at the basic studies level.
There were a number of scorer comments on the quality criteria for dimensions IN4 Analysis and
IN5 Conclusions. The rubric needs to be evaluated in these areas for ways to address these
concerns.
Finally, while scores on this Learning Goal are relatively high when comparing all three rubrics,
any comparison of scores across rubrics must be done with caution.
CRITICAL THINKING
The median score on CT1, CT2, and CT4 was 2, which means that at least half of student work
products sampled demonstrate performance at the first Milestone level, level 2. The median
score for CT3 and CT5 was 1, indicating that less than half of student work products
demonstrated performance at the first Milestone level (level 2). Table 14 shows the percent of
work products scored at a level 2 or higher and the percent of work products scored at a level 3
or higher for each dimension.
Table 14 Critical Thinking Percent of Sample Scored at Least 2 and at Least 3
Dimension                                   % of Work Products    % of Work Products
                                            Scored 2 or Higher    Scored 3 or Higher
CT1 Explanation of Issues                   64.1%                 35.1%
CT2 Evidence                                65.0%                 28.2%
CT3 Influence of Context and Assumptions    40.3%                 10.9%
CT4 Student’s Position                      51.1%                 18.5%
CT5 Conclusions and Related Outcomes        37.0%                 11.0%
The university has not as yet determined an expected level of attainment on this rubric for basic
studies students, so no statement can be made as to whether expectations have or have not been
met. These scores may demonstrate adequate skills for the majority of students completing basic
studies courses in only three of the five dimensions of critical thinking. These results are
consistent with the evidence attained so far from the Collegiate Learning Assessment (CLA).
These scores also demonstrate the need for additional practice and feedback for the remainder of
their four-year experience. Evidence from this study suggests, however, that student abilities
may not be improving by graduation. All correlations between dimension scores and credit hours
completed were negative, although only one was statistically significant (CT5, -.320).
Additionally, results showed that for each dimension there was still an extremely large portion
of students who had completed 90 or more credit hours who scored below 3 (from 72.7% to
100%).
It should be no surprise that students performed higher in explaining the issue (CT1) and in
presenting evidence from sources (CT2) than they did on the other dimensions. What is hidden
from this summary is the number of assignments in this sample that were not scored on all
dimensions. This fact indicates that students need more opportunities to practice all
dimensions of critical thinking, especially those skills associated with understanding the
influence of context and assumptions (CT3), formulating a position (CT4), drawing conclusions,
and understanding implications and consequences (CT5). This does not necessarily mean that all
dimensions must be practiced on the same assignment.
There were significant positive correlations with the preparedness variables SAT (Verbal) and
ACT scores. There were no differences between genders, between transfer students and students
who started as freshmen, or between honors and non-honors students (there were only 5 honors students in
this part of the sample). Scores could not be analyzed with respect to race/ethnicity due to the
small sample size of all categories except white.
There was not a significant difference between scores by assignment type for the three highest
scoring dimensions. However, there were significant differences between scores by assignment
type for the two lowest-scoring dimensions, CT3 and CT5, where the scores on the one term
paper were higher than on the test questions. The lack of association for three dimensions may be
more interesting than the association.
The correlations between the scores on each dimension were fairly large and statistically
significant. This suggests the possibility that scorers were not looking at each dimension
independently. Future training and scoring sessions will focus on this issue. Interrater reliability
met the benchmark for three of the five dimensions, and the reliability of the other two
dimensions was still high compared to the other two rubrics. This was the only rubric for which
scorers were placed into groups of 3 and 4 for part of the scoring session. Whether the higher
reliability results were due to the rubric itself or to working in larger groups cannot be
determined without further investigation. However, given the similarity in structure of the
rubrics, the hypothesis that working in larger groups improves calibration seems worthy of future
investigation.
This rubric was the most difficult for scorers to apply. Some of the difficulty seems to have come
from the rubric itself, and some with the course instructors’ understanding of the dimensions of
the rubric. The dimensions and rubric quality criteria need to be reviewed. In addition, more time
needs to be spent in the information sessions for instructors on each of the dimensions of critical
thinking. It is also important to keep in mind that assignments at the basic studies level may not
always address all dimensions of critical thinking.
SOCIAL SCIENCE FOUNDATIONAL KNOWLEDGE
The median score on both FK1 and FK2 was 1, which means that less than half the work
products received scores of 2 or higher. Table 15 below shows the percent of work products
scored at a level 2 or higher and the percent of work products scored at a level 3 or higher for
each dimension.
Table 15 Foundational Knowledge Percent of Sample Scored at Least 2 and at Least 3
Dimension                                   % of Work Products    % of Work Products
                                            Scored 2 or Higher    Scored 3 or Higher
FK1 Use of Discipline Terminology           40.0%                 17.8%
FK2 Explanation and Understanding of
Concepts and Principles                     40.0%                 15.6%
Information from this rubric is for the evaluation of the rubric and process only, as this is the first
use of the rubric, evidence comes from only one of the social sciences, and only one scorer
assessed the work products. Both sections were online, and the test question came halfway into
the semester. The correlation between the scores on the two dimensions was significant and high
(.933). This rubric needs further review by social science faculty and further testing before
results can be evaluated with confidence.
One conclusion does start to come to light from the data. The test was given approximately
halfway through the semester. The results indicate that this might be too early to assess foundational
knowledge. Assessment of terminology and concepts would be more appropriate at the end of an
introductory course, or during a second course, when more than one course in a discipline is
required.
RELATIVE STRENGTHS AND WEAKNESSES ACROSS RUBRICS
Comparison of scores across rubrics should be done cautiously and should only serve as a
starting place for further investigation. This is because the criteria for each level of the rubric
cannot be assumed to be scaled the same. For example, Level 2 cannot be considered to be in the
identical place on a scale of abilities for each of the rubrics. With this in mind, determination of
university-wide expectations for performance in basic studies courses should be done on a
rubric-by-rubric basis.
With this caveat in mind, it is helpful for the prioritization of effort to look at potential areas of
strength and weakness. Looking at the results from all scoring, the following areas stand out.
Areas of Relative Strength:
•   IN2 Existing Knowledge, Research, and/or Views – presenting information from relevant
    sources representing various points of view
•   WC1 Context and Purpose for Writing – demonstrating consideration of context,
    audience, purpose, and a clear focus on the assigned task
•   IN3 Design Process – developing critical elements of the methodology or theoretical
    framework

Areas of Relative Weakness:
•   CT5 Conclusions and related outcomes – logically tying conclusions to a range of
    information, including opposing viewpoints
•   CT3 Influence of context and assumptions – identifying own and others’ assumptions and
    relevant contexts when presenting a position
•   CT4 Student’s position – stating a specific position that takes into account the complexity
    of an issue, including acknowledging others’ points of view
When you look at the results from only those work products that were scored on more than one
rubric, for the most part, these results are consistent with what is listed above. The notable
exception is that for this smaller sample, the Inquiry dimensions were low along with those of
Critical Thinking. However, there was only one assignment scored on both rubrics, making this
evidence less compelling.
In addition, an important finding is that, on all three VALUE rubrics, scores tended to drop as
you go down the dimensions of the rubric. That is, student abilities in understanding purpose,
explaining issues, and presenting information—those used to begin communication and
investigation—are stronger than their skills in identifying assumptions, stating conclusions or
positions, and discussing limitations, implications, and consequences—those used to critically
evaluate information. The findings in this section all point towards the need to provide students
more opportunities to practice higher-order thinking skills, starting with general education
courses.
METHODOLOGY AND PROCESS
This assessment methodology ran fairly smoothly during its first full-scale implementation.
Feedback was good from both instructor and scorer participants. Based on the feedback from
scorers and instructors, and the results presented in this report, there are some areas for further
work.
PROCESS OF SELECTING ASSIGNMENTS
Most assignments selected for scoring with the Written Communication and Inquiry rubrics
matched their respective rubrics well. However, this was not the case for the Critical Thinking
rubric. There was no Critical Thinking dimension that scorers deemed applicable to all
assignments, and two of the dimensions were not deemed applicable for three of the seven
assignments. This indicates clearly the need for additional discussion of the dimensions of
critical thinking in the initial workshop for instructors, in addition to follow up during the
selection process. This is not meant to suggest, however, that all assignments selected for general
education assessment purposes must align with all dimensions of the rubric. It would also be
helpful for instructional purposes for there to be more dissemination of information about the
UNCW Learning Goals, such as through Center for Teaching Excellence workshops, and
inclusion of these goals as appropriate in course syllabi.
PROCESS OF NORMING SCORERS
Interrater reliability was lower than the benchmark for 13 of the 16 dimensions. Reliability can
be improved in two ways. The first is by reexamining the rubrics. Scorers provided feedback on
a number of dimensions for which the quality criteria can be improved. The other means of
improving interrater reliability is through training. Scorers themselves noted that additional
training would be beneficial. In addition, the hypothesis that working in groups larger than two
improves calibration seems worthy of future investigation.
OTHER ISSUES
Although the sample of students who contributed work products was representative of the
UNCW undergraduate student body, the sample of students from all race/ethnicity classifications
other than white was not large enough to test for differences between groups. Further studies will
need to include ways to analyze potential differences so that they can be addressed if necessary.
As previously mentioned, one of the assumptions of our use of the VALUE Rubrics is that the
Level 4 Capstone describes the qualities of understanding that we want our graduating seniors to
demonstrate. As an institution, we have not yet defined our minimum expectations for our first-
and second-year students, the predominant group taking basic studies courses. It is important to
set these expectations to get the full benefit of the results of this and future assessment efforts.
LIMITATIONS
This was the first large-scale use of this methodology for assessing the General Education
learning goals. Therefore student ability is not the only thing reflected in the results. The
newness of the rubric to both the instructors selecting assignments and the faculty doing the
scoring has implications about the reliability of the results.
The sample of student work products for Spring 2010 was created to be a random sample from
representative courses in the Basic Studies curriculum. Still, it represents just five subject areas
at one point in time. Another limitation of the results is that interrater reliability measures were
lower than optimal for 13 of the 16 dimensions. Generalization to all students working on basic
studies requirements should be done only cautiously at this time. Additional sampling from other
basic studies courses as we continue to assess these Learning Goals should be combined with
these findings to provide a clearer picture of UNCW student abilities. They should also be
combined with findings from other instruments, such as the Collegiate Learning Assessment.
RECOMMENDATIONS
Although there are limitations to this study, some recommendations that will only have positive
effects on student learning can be made in light of these findings. Based on the analysis of the
findings presented in the previous sections of this chapter, the Learning Assessment Council
recommends the following actions to improve both student learning and the General Education
Assessment process.
•   Levels of expected performance at the basic studies, or lower division, level should be
    developed for each rubric.
•   Additional exposure to the content of and rationale for the UNCW Learning Goals should
    be provided to increase faculty ownership and awareness of these Goals. The LAC will
    ask the Center for Teaching Excellence to provide a workshop series on these Goals. The
    LAC will ask the University Curriculum Committee to consider actions in this area.
•   To increase student exposure to the writing process, the Writing Intensive component of
    University Studies should be implemented by Fall 2012.
•   Modifications and improvements to the general education assessment process should be
    made as needed, including the following: modify rubrics based on feedback, develop
    benchmark work products, and enhance instructor and scorer workshops.
•   The long-term implementation schedule should provide flexibility for targeting additional
    sampling for specific learning goals that are characterized by ambiguous or unclear
    assessment results. For 2010-2011, Critical Thinking will be sampled for this purpose.
REFERENCES
Association of American Colleges and Universities. (2010). VALUE Project. Accessed July 15,
2010. http://www.aacu.org/value/index.cfm
American Educational Research Association. (1999). Standards for Educational and
Psychological Testing. Washington, D.C.: AERA.
Fisher A. and Scriven.M. (1997). Critical Thinking: Its Definition and Assessment. CA:
Edgeress/UK: Center for Research in Critical Thinking.
Hockenbury, D.H. and Hockenbury, S.E. (2006). Psychology (Fourth Edition). New York:
Worth Publishers.
Krippendorff, K. (2004). Content analysis: An introduction to its methodology. (2nd edition).
Thousand Oaks, CA: Sage Publications.
Lance, C.E., Butts, M.M.& Michels, L.C. (2006). The sources of four commonly reported cutoff
criteria: What did they really say? Organizational Research Methods 9 (2) 202-220.
Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.
Teaching Issues and Experiments in Ecology: Teaching Resources: Inquiry Framework.
Accessed June 12, 2010. http://tiee.ecoed.net/teach/framework.jpg
University of North Carolina Wilmington. (2009). Revising General Education at UNCW.
adopted by Faculty Senate March 17, 2009.
http://www.uncw.edu/universitystudies/documents/Univ.StudiesCurriculumReport.pdf
University of North Carolina Wilmington. (2009). UNCW Learning Goals. adopted by Faculty
Senate March 17, 2009. http://www.uncw.edu/assessment/uncwLearningGoals.html
University of North Carolina Wilmington. (2009). Report of the General Education Assessment
Committee, March 2009.
http://www.uncw.edu/assessment/Documents/General%20Education/GenEdAssessmentCom
mitteeReportMarch2009.pdf
49
University of North Carolina. (2009) Office of Institutional Research and Assessment. Factsheet
Fall 2009.
University of North Carolina Wilmington. (2010) Office of Institutional Research and
Assessment. Common Data Set 2009-2010. Accessed June 2, 2010.
http://www.uncw.edu/oira/documents/CDS2009_2010_finalrevisedfacultycounts_3.pdf
50
APPENDIX A RUBRICS USED
AAC&U Written Communication Rubric
AAC&U Inquiry and Analysis Rubric
AAC&U Critical Thinking Rubric
Locally created Social Sciences Foundational Knowledge Rubric
Social and Behavioral Sciences

Student Learning Outcome SBS 1: Describe and explain the major terms, concepts, and principles
in at least one of the Social and Behavioral Sciences.

Rubric (Draft March 15, 2010)
Evaluators are encouraged to assign a zero to any work sample or collection of work that does not
meet benchmark (cell one) level performance.

Use of Discipline Terminology
Level 4: Demonstrates fluency in the terminology relevant to the topic by displaying skillful
and precise word choices that underscore meaning.
Level 3: Conveys meaning to the reader by using all relevant terminology accurately and
appropriately.
Level 2: Conveys meaning, although often does not utilize terms relevant to the topic, OR when
terminology is utilized, it is sometimes used inaccurately.
Level 1: Meaning of the discourse is unclear, and/or attempts to use terminology of the
discipline are inaccurate or inappropriate to the context.

Explanation and Understanding of Concepts and Principles
Level 4: Demonstrates a thorough understanding of the relevant concepts and principles of the
discipline by correctly using them in support of an argument.
Level 3: Accurately explains the concepts and principles within the context of the situation,
making relevant connections.
Level 2: Explains concepts and principles at a basic level, but leaves out important
information and/or connections.
Level 1: Attempts to describe or explain relevant concepts and principles are too vague and/or
simplistic.
APPENDIX B DIMENSION MEANS AND STANDARD DEVIATIONS
Note of caution: Data from these rubrics cannot be assumed to be interval-level data. Although a
level 2 is considered higher, or larger, than a level 1, it is not proper to assume that a student
who scores at level 2 is twice as knowledgeable as a student who scores at level 1; nor can we
assume that, whatever the difference is between these two categories, it is exactly the same as
the difference between levels 2 and 3. In addition, the scale of quality criteria may differ
between the three rubrics. See page X for a more complete discussion of these possible
differences. This table should therefore be interpreted with extreme caution, and no hypotheses
should be formed solely from this information.
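
Because the scores are ordinal, medians and frequency counts are safer summaries than means. The
sketch below, using hypothetical scores rather than report data, illustrates the distinction.

    # Illustrative sketch: summarizing ordinal rubric scores (hypothetical data).
    from collections import Counter
    from statistics import mean, median

    scores = [0, 1, 2, 2, 2, 3, 3, 3, 4, 2]  # hypothetical rubric levels 0-4

    print("median:", median(scores))          # uses only the ordering of levels
    print("mean:  ", round(mean(scores), 2))  # assumes equal spacing between levels
    print("counts:", sorted(Counter(scores).items()))  # the full distribution
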
Table B1 Means and Standard Deviations for Each Rubric Dimension

Dimension                                            Mean   Std. Dev.   N
WC1 Context of and Purpose for Writing               2.46   0.955       116
WC2 Content Development                              2.22   0.976       116
WC3 Genre and Disciplinary Conventions               2.06   0.907       116
WC4 Sources and Evidence                             1.94   1.246       116
WC5 Control of Syntax and Mechanics                  2.24   0.900       116
IN1 Topic Selection                                  2.14   1.231       14
IN2 Existing Knowledge, Research, and/or Views       2.52   0.822       58
IN3 Design Process                                   2.35   0.985       98
IN4 Analysis                                         2.23   1.023       98
IN5 Conclusions                                      2.09   1.131       98
IN6 Limitations and Implications                     1.88   1.040       85
CT1 Explanation of Issues                            1.99   1.092       145
CT2 Evidence                                         1.90   0.998       163
CT3 Influence of Context and Assumptions             1.25   0.967       119
CT4 Student’s Position                               1.58   0.972       144
CT5 Conclusions and Related Outcomes                 1.21   1.081       127
FK1 Use of Discipline Terminology                    1.60   0.837       45
FK2 Explanation and Understanding of Concepts
    and Principles                                   1.60   0.863       45
APPENDIX C RESULTS BY COURSE
The following three tables contain the distribution of score results by subject.
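
As an illustration of how count-and-percentage distributions of this kind can be tabulated from
raw scores, the sketch below uses the pandas library with hypothetical course labels and score
values; it is not the procedure used to produce the tables that follow.

    # Illustrative sketch: tabulating rubric score distributions by course.
    # Course labels and scores are hypothetical, not data from this report.
    import pandas as pd

    df = pd.DataFrame({
        "course": ["ENG 201", "ENG 201", "PSY 105", "PSY 105", "PSY 105"],
        "WC1": [2, 3, 2, 4, 3],  # rubric level (0-4) on one dimension
    })

    counts = pd.crosstab(df["course"], df["WC1"])  # work products per level
    row_pct = pd.crosstab(df["course"], df["WC1"], normalize="index") * 100
    print(counts)
    print(row_pct.round(1))  # percentage within each course, as in the tables that follow
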
Table C1 Written Communication Results by Course
Cells show the number of work products at each score level, with the percentage of the section's
work products in parentheses.

All Subjects (6 sections, 116 work products)
                                          0           1           2           3           4           NA
WC1 Context of and Purpose for Writing    1 (0.9%)    17 (14.7%)  44 (37.9%)  36 (31.0%)  18 (15.5%)  0 (0.0%)
WC2 Content Development                   4 (3.4%)    23 (19.8%)  43 (37.1%)  36 (31.0%)  10 (8.6%)   0 (0.0%)
WC3 Genre and Disciplinary Conventions    5 (4.3%)    25 (21.6%)  48 (41.4%)  34 (29.3%)  4 (3.4%)    0 (0.0%)
WC4 Sources and Evidence                  23 (19.8%)  18 (15.5%)  24 (20.7%)  45 (38.8%)  6 (5.2%)    0 (0.0%)
WC5 Control of Syntax and Mechanics       1 (0.9%)    26 (22.4%)  40 (34.5%)  42 (36.2%)  7 (6.0%)    0 (0.0%)

English 201 (2 sections, 27 work products)
WC1 Context of and Purpose for Writing    0 (0.0%)    0 (0.0%)    15 (55.6%)  4 (14.8%)   8 (29.6%)   0 (0.0%)
WC2 Content Development                   0 (0.0%)    1 (3.7%)    7 (25.9%)   15 (55.6%)  4 (14.8%)   0 (0.0%)
WC3 Genre and Disciplinary Conventions    3 (11.1%)   3 (11.1%)   9 (33.3%)   10 (37.0%)  2 (7.4%)    0 (0.0%)
WC4 Sources and Evidence                  5 (18.5%)   0 (0.0%)    5 (18.5%)   14 (51.9%)  3 (11.1%)   0 (0.0%)
WC5 Control of Syntax and Mechanics       0 (0.0%)    7 (25.9%)   15 (55.6%)  5 (18.5%)   0 (0.0%)    0 (0.0%)

FST 210 (1 section, 38 work products)
WC1 Context of and Purpose for Writing    0 (0.0%)    10 (26.3%)  15 (39.5%)  10 (26.3%)  2 (5.3%)    0 (0.0%)
WC2 Content Development                   4 (10.5%)   16 (42.1%)  12 (31.6%)  6 (15.8%)   0 (0.0%)    0 (0.0%)
WC3 Genre and Disciplinary Conventions    1 (2.6%)    15 (39.5%)  13 (34.2%)  8 (21.1%)   1 (2.6%)    0 (0.0%)
WC4 Sources and Evidence                  18 (47.4%)  11 (28.9%)  5 (13.2%)   4 (10.5%)   0 (0.0%)    0 (0.0%)
WC5 Control of Syntax and Mechanics       1 (2.6%)    13 (34.2%)  6 (15.8%)   14 (36.8%)  4 (10.5%)   0 (0.0%)

MUS 115 (2 sections, 33 work products)
WC1 Context of and Purpose for Writing    0 (0.0%)    6 (18.2%)   10 (30.3%)  14 (42.4%)  3 (9.1%)    0 (0.0%)
WC2 Content Development                   0 (0.0%)    5 (15.2%)   13 (39.4%)  10 (30.3%)  5 (15.2%)   0 (0.0%)
WC3 Genre and Disciplinary Conventions    0 (0.0%)    4 (12.1%)   16 (48.5%)  12 (36.4%)  1 (3.0%)    0 (0.0%)
WC4 Sources and Evidence                  0 (0.0%)    5 (15.2%)   11 (33.3%)  17 (51.5%)  0 (0.0%)    0 (0.0%)
WC5 Control of Syntax and Mechanics       0 (0.0%)    2 (6.1%)    13 (39.4%)  16 (48.5%)  2 (6.1%)    0 (0.0%)

PSY 105 (1 section, 18 work products)
WC1 Context of and Purpose for Writing    0 (0.0%)    1 (5.6%)    4 (22.2%)   8 (44.4%)   5 (27.8%)   0 (0.0%)
WC2 Content Development                   0 (0.0%)    1 (5.6%)    11 (61.1%)  5 (27.8%)   1 (5.6%)    0 (0.0%)
WC3 Genre and Disciplinary Conventions    1 (5.6%)    3 (16.7%)   10 (55.6%)  4 (22.2%)   0 (0.0%)    0 (0.0%)
WC4 Sources and Evidence                  0 (0.0%)    2 (11.1%)   3 (16.7%)   10 (55.6%)  3 (16.7%)   0 (0.0%)
WC5 Control of Syntax and Mechanics       0 (0.0%)    4 (22.2%)   6 (33.3%)   7 (38.9%)   1 (5.6%)    0 (0.0%)
Table C2 Inquiry Rubric Score Results by Course
Cells show the number of work products at each score level, with the percentage of the section's
work products in parentheses.

All Subjects (5 sections, 98 work products)
                                              0           1           2           3           4           NA
Topic Selection                               1 (1.0%)    3 (3.1%)    6 (6.1%)    1 (1.0%)    3 (3.1%)    84 (85.7%)
Existing Knowledge, Research, and/or Views    1 (1.0%)    4 (4.1%)    22 (22.4%)  26 (26.5%)  5 (5.1%)    40 (40.8%)
Design Process                                6 (6.1%)    10 (10.2%)  34 (34.7%)  40 (40.8%)  8 (8.2%)    0 (0.0%)
Analysis                                      9 (9.2%)    8 (8.2%)    39 (39.8%)  35 (35.7%)  7 (7.1%)    0 (0.0%)
Conclusions                                   10 (10.2%)  19 (19.4%)  30 (30.6%)  30 (30.6%)  9 (9.2%)    0 (0.0%)
Limitations and Implications                  6 (6.1%)    26 (26.5%)  32 (32.7%)  14 (14.3%)  7 (7.1%)    13 (13.3%)

English 201 (4 sections, 58 work products)
Topic Selection                               1 (1.7%)    3 (5.2%)    6 (10.3%)   1 (1.7%)    3 (5.2%)    44 (75.9%)
Existing Knowledge, Research, and/or Views    1 (1.7%)    4 (6.9%)    22 (37.9%)  26 (44.8%)  5 (8.6%)    0 (0.0%)
Design Process                                1 (1.7%)    4 (6.9%)    21 (36.2%)  24 (41.4%)  8 (13.8%)   0 (0.0%)
Analysis                                      1 (1.7%)    3 (5.2%)    27 (46.6%)  20 (34.5%)  7 (12.1%)   0 (0.0%)
Conclusions                                   0 (0.0%)    10 (17.2%)  20 (34.5%)  19 (32.8%)  9 (15.5%)   0 (0.0%)
Limitations and Implications                  0 (0.0%)    9 (15.5%)   20 (34.5%)  9 (15.5%)   7 (12.1%)   13 (22.4%)

PSY 105 (1 section, 40 work products)
Topic Selection                               0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)    40 (100.0%)
Existing Knowledge, Research, and/or Views    0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)    40 (100.0%)
Design Process                                5 (12.5%)   6 (15.0%)   13 (32.5%)  16 (40.0%)  0 (0.0%)    0 (0.0%)
Analysis                                      8 (20.0%)   5 (12.5%)   12 (30.0%)  15 (37.5%)  0 (0.0%)    0 (0.0%)
Conclusions                                   10 (25.0%)  9 (22.5%)   10 (25.0%)  11 (27.5%)  0 (0.0%)    0 (0.0%)
Limitations and Implications                  6 (15.0%)   17 (42.5%)  12 (30.0%)  5 (12.5%)   0 (0.0%)    0 (0.0%)
Table C3 Critical Thinking Score Results by Course
Cells show the number of work products at each score level, with the percentage of the section's
work products in parentheses.

All Subjects (8 sections, 183 work products)
                                            0           1           2           3           4           NA
CT1 Explanation of Issues                   12 (6.6%)   40 (21.9%)  42 (23.0%)  40 (21.9%)  11 (6.0%)   38 (20.8%)
CT2 Evidence                                13 (7.1%)   44 (24.0%)  60 (32.8%)  39 (21.3%)  7 (3.8%)    20 (10.9%)
CT3 Influence of Context and Assumptions    31 (16.9%)  40 (21.9%)  35 (19.1%)  13 (7.1%)   0 (0.0%)    64 (35.0%)
CT4 Student’s Position                      20 (10.9%)  49 (26.8%)  47 (25.7%)  27 (14.8%)  1 (0.5%)    39 (21.3%)
CT5 Conclusions and Related Outcomes        39 (21.3%)  41 (22.4%)  33 (18.0%)  9 (4.9%)    5 (2.7%)    56 (30.6%)

MUS 115 (2 sections, 39 work products)
CT1 Explanation of Issues                   1 (2.6%)    0 (0.0%)    5 (12.8%)   9 (23.1%)   4 (10.3%)   20 (51.3%)
CT2 Evidence                                4 (10.3%)   6 (15.4%)   13 (33.3%)  12 (30.8%)  4 (10.3%)   0 (0.0%)
CT3 Influence of Context and Assumptions    0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)    39 (100.0%)
CT4 Student’s Position                      0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)    39 (100.0%)
CT5 Conclusions and Related Outcomes        0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)    0 (0.0%)    39 (100.0%)

PSY 105 (3 sections, 78 work products)
CT1 Explanation of Issues                   7 (9.0%)    14 (17.9%)  19 (24.4%)  19 (24.4%)  1 (1.3%)    18 (23.1%)
CT2 Evidence                                6 (7.7%)    15 (19.2%)  33 (42.3%)  21 (26.9%)  3 (3.8%)    0 (0.0%)
CT3 Influence of Context and Assumptions    10 (12.8%)  18 (23.1%)  19 (24.4%)  11 (14.1%)  0 (0.0%)    20 (25.6%)
CT4 Student’s Position                      8 (10.3%)   25 (32.1%)  28 (35.9%)  16 (20.5%)  1 (1.3%)    0 (0.0%)
CT5 Conclusions and Related Outcomes        7 (9.0%)    29 (37.2%)  29 (37.2%)  8 (10.3%)   5 (6.4%)    0 (0.0%)

SOC 105 (3 sections, 66 work products)
CT1 Explanation of Issues                   4 (6.1%)    26 (39.4%)  18 (27.3%)  12 (18.2%)  6 (9.1%)    0 (0.0%)
CT2 Evidence                                3 (4.5%)    23 (34.8%)  14 (21.2%)  6 (9.1%)    0 (0.0%)    20 (30.3%)
CT3 Influence of Context and Assumptions    21 (31.8%)  22 (33.3%)  16 (24.2%)  2 (3.0%)    0 (0.0%)    5 (7.6%)
CT4 Student’s Position                      12 (18.2%)  24 (36.4%)  19 (28.8%)  11 (16.7%)  0 (0.0%)    0 (0.0%)
CT5 Conclusions and Related Outcomes        32 (48.5%)  12 (18.2%)  4 (6.1%)    1 (1.5%)    0 (0.0%)    17 (25.8%)
APPENDIX D DETAILED SCORER FEEDBACK
Scorer Qualitative Comments.
What parts of the process worked the best?
• The 2-hr session was a great way to learn & score & gave context & materials that prepared me for longer session. The paired/small group norming during long.
• everything was very well-organized.
• The small 2-hr norming sessioin was very helpful. Working with a partner is a great idea.
• working with someone else.
• The overall process was very well organized. The expectations were also very clear. There was adequate time between norming session & the all-day scoring to review materials.
• The training was excellent. Everything was well organized. Questions were answered thoroughly.
• The workshop
• Both the norming session and the initial group review of a common student response greatly facilitated uniformity and efficiency of the process. Working in groups allowed us to check ourselves and made the exercise enjoyable.
• Collaboration, discussion & exchanging ideas.
• The time we were allowed. I would hate to try this in a short time period.
• It was good to work with a partner to come to decisions about rubric use. Having a second round of a set of assignments later in the day was instructive.
• training & creating assumptions before using the rubric for scoring
• The training session was helpful, even fun. I also found it helpful to compare my assessment (scores) with another volunteer, for the purpose of calibration.
• very organized. Norming session helpful.
• splitting in groups and discussing the same cases with the partner(s).
In what ways could the scoring process be improved?
• perhaps an additonal short session prior to all day session, not required but available
• we are still learning what assignments match what rubrics. Mismatches in this area cause confusion. Also it is quite possible that some rubric categories will be interpreted one way for one assignment, and a different way for another assignment.
• I wasn't always clear what to list under "assumptions."
• AC in the room.
• I'm not sure I like the full day approach. Perhaps it could be broken down into 2 sessioins, maybe even weekday evenings?
• If possible, rubrics & assignments should be synchronized better/more effectively. (Of course, this comment speaks more to scoring itself.) The process, while long, was fairly seamless & straight forward.
• Refining some parts of the rubric (see specific suggestions on packet evaluations).
• Exemplar work (examples) to use as a guide.
• Probably to have the rubrics and the assignment instructions correlated.
• Not all dimensions of the Inquiry rubric fit each assignment, and we questioned the applicability at all of this rubric to the English assignment. Some assignments were difficult to evaluate without having prior knowledge of course content. *Need more "linear" continuum for bins 1-4, as this is how the data will be interpreted/presented.
• Can instructors write assignment with rubric in mind? I'm not sure how universal the inquiry rubric truly is.
• Clearer, fuller instructions from professor to student would help me assess whether the student met the assignment. Sparse instructions invite vauge interpretations of whether the student successfully met the requirements of the assignment. The would be especially helpful for me as I socred student work outside my discipline.
• An additional norming session with a different work product could be useful.
• clearer/definitions/standards. Going through some classic examples for each level would be very helpful.
Any other comments or suggestions.
• Well organized. A valuable site for faculty to share & learn & discuss from mulitple depts and disciplines!
• Thank you. A worthwhile experience. I feel I learned a lot that I will be able to apply in my courses.
• Thank you!
• Request facilities to leave the AC during the work session!
• Lunch was good. Everyone seemed delighted to participate. A well-manged process.
• A set of guidelines for assignment to be used in this process could allow for a more uniform application of the rubric and make the scoring more accurate and representative.
• Sharing an overview of the process (different people using different rubrics with the same student work)
• I will be interested in the inter-rater reliability of the rubric.
• The end of the semester (especially April) is sooo busy that it might be better to hold this earlier in the semester--either making use of work products from the fall semester, or earlier in the spring semester. #12 neutral just because of the time involved. #9 Neutral - yes & no. will depend on what data show and how we modify the curriculum (if at all).
• Thanks for the process. I learned something about my own assignments as a result of this project.
• Great! #8 neutral - except music!
APPENDIX E CORRELATIONS BETWEEN RUBRIC DIMENSIONS
Table E1 Correlation between Dimensions—Spearman’s rho Rank Order Correlation Coefficients
Each cell shows Spearman’s rho for work products scored on both dimensions, with n in
parentheses. The matrix is presented in three panels of columns; entries of 0 are reproduced
from the original table.

Columns WC1–WC5
        WC1            WC2            WC3            WC4            WC5
WC1     1.00 (116)     .658** (116)   .537** (116)   .535** (116)   .407** (116)
WC2     .658** (116)   1.00 (116)     .606** (116)   .671** (116)   .259** (116)
WC3     .537** (116)   .606** (116)   1.00 (116)     .443** (116)   .414** (116)
WC4     .535** (116)   .671** (116)   .443** (116)   1.00 (116)     .193* (116)
WC5     .407** (116)   .259** (116)   .414** (116)   .193* (116)    1.00 (116)
IN1     -.569** (14)   -.250 (14)     -.627* (14)    .115 (14)      .028 (14)
IN2     -.425* (27)    -.111 (27)     -.111 (27)     -.428 (27)     -.015 (27)
IN3     -.326 (27)     -.033 (27)     -.150 (27)     .047 (27)      .235 (27)
IN4     -.090 (27)     .193 (27)      -.026 (27)     .330 (27)      .533** (27)
IN5     -.250 (27)     .105 (27)      -.026 (27)     .197 (27)      .427* (27)
IN6     -.294 (14)     .032 (14)      .071 (14)      .510 (14)      .712** (14)
CT1     .576* (19)     .006 (19)      .172 (19)      .137 (19)      .097 (19)
CT2     .173 (37)      .163 (37)      .222 (37)      .037 (37)      -.053 (37)
CT3     -.070 (18)     .215 (18)      .036 (18)      -.024 (18)     -.421 (18)
CT4     -.230 (18)     -.077 (18)     -.056 (18)     -.293 (18)     -.407 (18)
CT5     -.371 (18)     -.071 (18)     -.193 (18)     -.253 (18)     -.624** (18)

Columns IN1–IN6
        IN1            IN2            IN3            IN4            IN5            IN6
WC1     -.569** (14)   -.425* (27)    -.326 (27)     -.090 (27)     -.250 (27)     -.294 (14)
WC2     -.250 (14)     -.111 (27)     -.033 (27)     .193 (27)      .105 (27)      .032 (14)
WC3     -.627* (14)    -.111 (27)     -.150 (27)     -.026 (27)     -.026 (27)     .071 (14)
WC4     .115 (14)      -.428 (27)     .047 (27)      .330 (27)      .197 (27)      .510 (14)
WC5     .028 (14)      -.015 (27)     .235 (27)      .533** (27)    .427* (27)     .712** (14)
IN1     1.00 (14)      .366 (14)      .446 (14)      .499 (14)      .717** (14)    .451 (14)
IN2     .366 (14)      1.00 (58)      .542** (58)    .503** (58)    .509** (58)    .479** (45)
IN3     .446 (14)      .542** (58)    1.00 (98)      .775** (98)    .639** (98)    .493** (85)
IN4     .499 (14)      .503** (58)    .775** (98)    1.00 (98)      .753** (98)    .521** (85)
IN5     .717** (14)    .509** (58)    .639** (98)    .753** (98)    1.00 (98)      .628** (85)
IN6     .451 (14)      .479** (45)    .493** (85)    .521** (85)    .628** (85)    1.00 (85)
CT1     0              0              .529** (40)    .506** (40)    .363* (40)     .434** (40)
CT2     0              0              .157 (40)      .190 (40)      .061 (40)      .399* (40)
CT3     0              0              .033 (40)      .041 (40)      -.086 (40)     .263 (40)
CT4     0              0              -.073 (40)     -.093 (40)     -.098 (40)     .135 (40)
CT5     0              0              .182 (40)      .254 (40)      .165 (40)      .420** (40)

Columns CT1–CT5
        CT1            CT2            CT3            CT4            CT5
WC1     .576* (19)     .173 (37)      -.070 (18)     -.230 (18)     -.371 (18)
WC2     .006 (19)      .163 (37)      .215 (18)      -.077 (18)     -.071 (18)
WC3     .172 (19)      .222 (37)      .036 (18)      -.056 (18)     -.193 (18)
WC4     .137 (19)      .037 (37)      -.024 (18)     -.293 (18)     -.253 (18)
WC5     .097 (19)      -.053 (37)     -.421 (18)     -.407 (18)     -.624** (18)
IN1     0              0              0              0              0
IN2     0              0              0              0              0
IN3     .529** (40)    .157 (40)      .033 (40)      -.073 (40)     .182 (40)
IN4     .506** (40)    .190 (40)      .041 (40)      -.093 (40)     .254 (40)
IN5     .363* (40)     .061 (40)      -.086 (40)     -.098 (40)     .165 (40)
IN6     .434** (40)    .399* (40)     .263 (40)      .135 (40)      .420** (40)
CT1     1.00 (145)     .651** (125)   .247* (101)    .512** (126)   .561** (109)
CT2     .651** (125)   1.00 (163)     .692** (104)   .668** (124)   .618** (123)
CT3     .247* (101)    .692** (104)   1.00 (119)     .593** (119)   .533** (107)
CT4     .512** (126)   .668** (124)   .593** (119)   1.00 (144)     .634** (127)
CT5     .561** (109)   .618** (123)   .533** (107)   .634** (127)   1.00 (127)

*Statistically significant at the .05 level
**Statistically significant at the .01 level

Table E2 Correlation between Dimensions—Spearman’s rho Rank Order Correlation Coefficients

        FK1            FK2            CT1            CT2            CT3            CT4            CT5
FK1     1.00 (45)      .933** (45)    .471** (45)    .372* (45)     .398** (45)    .484** (45)    .336* (45)
FK2     .933** (45)    1.00 (45)      .455* (45)     .340* (45)     .347* (45)     .474** (45)    .323* (45)

*Statistically significant at the .05 level
**Statistically significant at the .01 level
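
The coefficients in Tables E1 and E2 are Spearman rank-order correlations between dimension
scores on the same work products. For readers unfamiliar with the statistic, the sketch below
computes one such coefficient with SciPy; the FK1 and FK2 scores shown are hypothetical, not
data from this report.

    # Illustrative sketch: Spearman's rho between two rubric dimensions.
    # Scores are hypothetical, not data from this report.
    from scipy.stats import spearmanr

    fk1 = [1, 2, 2, 3, 1, 2, 4, 3]  # Use of Discipline Terminology levels
    fk2 = [1, 2, 3, 3, 1, 2, 4, 2]  # Explanation and Understanding levels

    rho, p_value = spearmanr(fk1, fk2)  # rho in [-1, 1]; p tests rho = 0
    print(f"rho = {rho:.3f}, p = {p_value:.3f}")
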