Analysis of the 2006 CAAP Assessment of General Education: Critical Thinking

Submitted by Lou Milanesi

Executive Summary

UW-Stout administered the ACT CAAP Critical Thinking test as a standardized, nationally validated measure of general education preparedness during the spring semester of 2006. A total of 488 test scores were collected (97.6% of the target sample of 500) from upper-division students (97.6% juniors and seniors; 46.8% female, 53.2% male) whose instructors volunteered their classes for participation in the assessment. Raw data from ACT were analyzed using the national ranking of Stout participants (National Percent at or Below) to better understand the performance of our students relative to national norms.

Strengths
• Overall, UW-Stout student ranking scores were statistically equal to the national norm
• Mean ranking scores for the UW-Stout students who reported exerting at least "moderate effort" (74.2% of the sample) were significantly higher than the national norm (54.1 vs. 50.0)

Opportunities for Improvement
• Raise the target benchmark to above the national norm
• Increase the effort levels of UW-Stout students taking the test

Action Plan for Improvement
• Work to improve the context of the test administration so that students perceive a clear motivation to do well on the test
• Investigate raising the "stakes" of the testing from "low" to "moderate"

Detailed Results and Analyses

Synopsis of Results

As Table 1 illustrates, 49.4% of Stout students scored at or above the national median score, and Figure 1 shows that the mean of Stout rankings was 49.7 (SD = 27.0); together these demonstrate that the Stout sample is statistically equivalent to the national sample (for the normally distributed national reference group, mean = median = 50.0). We find no indication that the sampling method used contributed to overestimating Stout student abilities; on the contrary, given the evidence described below regarding the impact of individual motivation on performance, we suggest that these data represent a conservative estimate of Stout student abilities relative to the overall national data.

We analyzed differential performance across subgroups defined by the demographic variables captured by the CAAP instrument. These analyses found no differences in performance on the central critical thinking measure when we segmented by gender, junior/senior class standing, or whether students enrolled at Stout as freshmen. There was a significant difference in mean performance between those who spoke English as their first language (mean = 50.5, SD = 20.7) and those who did not (mean = 16.6, SD = 17.7); however, there were only 11 individuals in the latter group. We were unable to perform statistical analyses based on ethnicity due to the homogeneity of the Stout population and the overlap between ethnicity and English as a second language among the few non-Caucasian participants.

We next performed multivariate analyses and found that both self-reported motivation and self-reported GPA were significantly and independently related to performance on the CAAP, with greater effort and higher GPA associated with higher levels of performance on the critical thinking test.
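These analyses can be reproduced with standard statistical software. The following is a minimal sketch in Python, assuming a hypothetical flat-file export (caap_2006.csv) with illustrative column names natl_pct_at_or_below, effort_level, and gpa_category; the actual analyses were presumably run in a package such as SPSS, whose output format appears in the appendix.

    # Minimal sketch of the two core analyses, using hypothetical column names:
    # (1) test whether Stout's mean national percentile rank differs from 50.0;
    # (2) regress rank on self-reported effort and GPA entered simultaneously.
    import pandas as pd
    from scipy import stats
    import statsmodels.formula.api as smf

    df = pd.read_csv("caap_2006.csv")  # hypothetical export of the raw ACT data

    # (1) One-sample t-test of the percentile ranks against the national value of 50.0
    t, p = stats.ttest_1samp(df["natl_pct_at_or_below"], popmean=50.0)
    print(f"mean = {df['natl_pct_at_or_below'].mean():.1f}, t = {t:.2f}, p = {p:.3f}")

    # (2) Effort and GPA as simultaneous predictors; with both in the model,
    # each coefficient reflects that predictor's independent contribution
    model = smf.ols("natl_pct_at_or_below ~ effort_level + gpa_category", data=df).fit()
    print(model.summary())

Entering effort and GPA simultaneously, as in step (2), is what justifies describing the two effects above as independent.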
Regarding effort, Figure 2 indicates how much effort Stout participants reported investing in taking the test, and Figure 3 shows the relationship between effort and performance. On average, those who invested at least moderate effort performed at or above the mean of the national sample, whereas those who invested little or no effort (or declined to reveal their effort) fell consistently short of the national average. Similarly, regarding GPA, Figure 4 indicates participants' reported GPA levels, and Figure 5 displays the direct linear relationship between GPA and performance on the CAAP test. Finally, Figure 6 presents the combined effects of effort and GPA on performance relative to the national mean (50.0).

Conclusions

Strengths: Simply stated, we found consistent support for asserting that, on the whole, UW-Stout graduates are on par with their national peers in critical thinking ability. Performance on the CAAP Critical Thinking test triangulated well with other self-reported measures, including GPA and the individual effort invested in taking the test. Moreover, these data replicate earlier results obtained using the same instrument in 2004, which, on average, placed UW-Stout graduates' critical thinking performance at or slightly above the national average.

Opportunities for Improvement: While this analysis found that, as a group, UW-Stout graduates match the average critical thinking abilities of their peers across the nation, it also suggests that there is ample room for improvement. Future benchmarking could raise the target for the average performance of Stout graduates to an above-average level. This goal would be supported by simultaneously working to reduce the variability observed in the performance of individual students, so that a greater percentage of individuals score at or above the national average.

As described above, we found clear evidence linking individual motivation to performance, and, curiously, this effect appears to be mostly attributable to the males in the test sample. Drilling down by gender and controlling for the influence of GPA, we found that for females self-reported motivation had a near-zero influence in predicting performance on the test (0.2% of variance explained), whereas under the same controls, self-reported motivation explained 13.5% of the variance in test performance for males (a computational sketch of this drill-down follows this section). This gender effect is particularly troubling in that the males in the sample also reported significantly lower GPAs than their female counterparts. Therefore, since both genders are exposed to the same instruction, we conclude that meaningful opportunities for improvement need to extend beyond curriculum revision (e.g., to recruiting, engagement, and advising), while by no means excluding it. Performance on the ACT CAAP Critical Thinking test should be viewed holistically alongside other general education data (such as the 2005 ACT writing assessment) to inform an overarching strategic action plan for general education improvement; a plan that, where appropriate, includes curriculum revision and extracurricular processes, preceded by systematic improvements to assessment practices. This analysis indicates that, while UW-Stout students' current performance on the CAAP is equal to the national norm, their true ability is likely above the national norm but masked by inconsistencies in the effort students invest in taking the test.
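The gender drill-down reported above is, in effect, a hierarchical regression: fit performance on GPA alone, then add self-reported effort, and attribute the increment in R-squared to effort. A minimal sketch, again assuming the hypothetical caap_2006.csv export and illustrative column names:

    # Minimal sketch of the gender drill-down, using hypothetical column names:
    # fit performance on GPA alone, then add effort, and attribute the gain in
    # R-squared to effort, separately for each gender.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("caap_2006.csv")  # hypothetical export of the raw ACT data

    for gender, grp in df.groupby("gender"):
        base = smf.ols("natl_pct_at_or_below ~ gpa_category", data=grp).fit()
        full = smf.ols("natl_pct_at_or_below ~ gpa_category + effort_level", data=grp).fit()
        gain = full.rsquared - base.rsquared  # variance uniquely attributable to effort
        print(f"{gender}: effort adds {100 * gain:.1f}% explained variance")

Under this logic, the report's figures correspond to a gain of roughly 0.2% for females and 13.5% for males.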
"Low Stakes" versus "High Stakes" Testing:

Currently, UW-Stout employs a very low stakes approach to general education testing, one where individual instructors "volunteer" the students in the course sections they teach, and testing is conducted solely within those classes. Students taking the CAAP fully understand that they will experience no consequences for poor performance, and only a few appreciate the potential advantages of doing well on the test. In contrast, a high stakes approach to general education testing mandates performance benchmarks on an identified test and imposes significant consequences on those who fall short of meeting them (denial of admission, failure to advance to the next level of progress, and/or failure to graduate). Our inquiry to ACT revealed that a few institutions have already moved to high stakes models. For example, all six state institutions in South Dakota test juniors and provide intervention where necessary. San Jose State University and Fresno State University also use the ACT CAAP, and both require a passing grade on the ACT CAAP Writing test before students are allowed into an upper-level writing course that is required for graduation. Our contact at ACT was quick to point out that ACT does not provide a set cut score; however, in his experience most institutions set their cut scores at one standard deviation below the national mean.

Mandatory Assessment Days as a "Medium Stakes" Testing Alternative:

Several institutions have mandated "Assessment Days" on which a variety of institutional data are collected. Participation is mandatory, and students are individually assigned to a specific assessment activity for the day. Through this assignment/sampling approach, institutional assessment is conducted simultaneously at a variety of levels (program, college/school, university), and participation is monitored and enforced. The "stakes" for the student are contractually defined by the individual institution as a condition of acceptance/enrollment at the school.

Table 1. Examples of Assessment Day Models
Winona State: http://www.winona.edu/AIR/info/info.htm
University of Southern Indiana: http://www.usi.edu/depart/instires/AssessmentDay.asp
James Madison University: http://www.jmu.edu/assessment/JMUAssess/Aday_Overview.htm
Eastern New Mexico University: http://www.enmu.edu/academics/excellence/assessment/students/day/index.shtml
Geneva College: http://www.geneva.edu/object/assess_day01.html

In my opinion, this approach could work well at UW-Stout if the following were accomplished:
1. Establish a highly integrated and aligned central model of assessment, one that
   a. Identifies all mandated and ongoing needs for information/data for both external and internal stakeholders
   b. Simplifies and better coordinates existing assessment practices
   c. Defines a scope of assessment activities broad enough to engage a significant number of students in the Assessment Day activities
2. Use the model developed above to determine the optimal timing of data collection.
3. Support the process adequately through shared governance.

Suggested Immediate Improvements to the Current Approach to Testing:

While the approaches to testing described above will need to be studied and discussed before they can be deployed at UW-Stout, some immediate improvements can be made to existing assessment practices to better engage students in the testing activity. For example, the instructors who volunteer their students need to be better engaged in the testing activity.
Historically, some instructors have been quite enthusiastic about the assessment activities and openly endorsed them to their students prior to testing. However, other instructors described the assessment activity as something that "had to be done" to satisfy others. These differences in instructor attitudes can be expected to affect student effort, which, as shown above, is directly linked to performance on the test. Instructors could also help build individual incentive to do well on the CAAP assessment by calling attention to the recognition certificates ACT provides for above-average performance and explaining how these can enhance student employment portfolios. Currently, we call attention to the chance to win a prize that is randomly drawn from all participants. This approach has two basic shortcomings: first, as in any raffle, most individuals do not expect to win; second, and more importantly, being eligible to win is not contingent upon test performance. Since the testing protocol described by ACT requires reading a standard script that does little to inspire students, I suggest we provide instructors (or spokespersons) with a presentation that emphasizes the personal benefits of doing well on the test.

Appendix

Table 1. Distribution of National Percent at or Below Rankings for Stout Participants in the 2006 ACT CAAP Critical Thinking Assessment

National Percent   Frequency   Percent   Valid Percent   Cumulative Percent
at or Below
 0                     1          0.2         0.2              0.2
 1                     2          0.4         0.4              0.6
 2                     4          0.8         0.8              1.4
 3                     8          1.6         1.6              3.1
 6                     6          1.2         1.2              4.3
 8                    11          2.3         2.3              6.6
11                    12          2.5         2.5              9.0
15                    23          4.7         4.7             13.7
19                    17          3.5         3.5             17.2
24                    18          3.7         3.7             20.9
29                    36          7.4         7.4             28.3
34                    56         11.5        11.5             39.8
40                    24          4.9         4.9             44.7
45                    29          5.9         5.9             50.6
50                    33          6.8         6.8             57.4
61                    27          5.5         5.5             62.9
66                    38          7.8         7.8             70.7
72                    55         11.3        11.3             82.0
79                    18          3.7         3.7             85.7
85                    15          3.1         3.1             88.7
90                    20          4.1         4.1             92.8
94                    14          2.9         2.9             95.7
98                    13          2.7         2.7             98.4
99                     8          1.6         1.6            100.0
Total                488        100.0       100.0

Figure 1. Distribution of National Placement (Percent at or Below Individual's Score) for All UW-Stout Students Tested. [Histogram omitted: Mean = 49.69, Std. Dev. = 26.992, N = 488]

Figure 1A. Distribution of National Placement (Percent at or Below Individual's Score) for Those Reporting at Least "Moderate" Effort. [Histogram omitted: Mean = 54.07, Std. Dev. = 25.597, N = 362]
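As a consistency check, the summary statistics in Figure 1 can be recomputed directly from the frequency distribution in Table 1; a minimal sketch:

    # Consistency check: recompute the Figure 1 statistics from Table 1.
    import math

    # (national percent at or below, frequency) pairs transcribed from Table 1
    dist = [(0, 1), (1, 2), (2, 4), (3, 8), (6, 6), (8, 11), (11, 12), (15, 23),
            (19, 17), (24, 18), (29, 36), (34, 56), (40, 24), (45, 29), (50, 33),
            (61, 27), (66, 38), (72, 55), (79, 18), (85, 15), (90, 20), (94, 14),
            (98, 13), (99, 8)]

    n = sum(f for _, f in dist)                       # 488
    mean = sum(s * f for s, f in dist) / n            # 49.69
    ss = sum(f * (s - mean) ** 2 for s, f in dist)    # sum of squared deviations
    sd = math.sqrt(ss / (n - 1))                      # 26.99 (sample standard deviation)
    print(n, round(mean, 2), round(sd, 2))

The recomputed values (N = 488, mean = 49.69, SD = 26.99) agree with those reported in Figure 1.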
Figure 2. Self-reported Levels of Motivation among Stout CAAP Participants (percent of sample)

Tried my best            28.7
Gave moderate effort     45.5
Gave little effort       16.2
Gave no effort            8.0
No report of effort       1.6

Figure 3. Mean CAAP Performance (National Percent at or Below) across Self-reported Levels of Motivation

Did not report effort    32.3
Gave no effort            9.0
Gave little effort       42.3
Gave moderate effort     51.2
Tried my best            58.6

Table 2. Statistically Significant CAAP Performance Differences across Self-reported Levels of Motivation

Effort Level       Statistically Lower than   Statistically Equal to    Statistically Higher than
NO effort          ALL OTHER GROUPS           (none)                    (none)
LITTLE effort      MODERATE & BEST            Did not report            NO
MODERATE effort    (none)                     Did not report & BEST     NO & LITTLE
BEST effort        (none)                     MODERATE                  NO, LITTLE & Did not report
Did not report     BEST                       LITTLE & MODERATE         NO

Figure 4. Self-reported GPA among Stout CAAP Participants

Cumulative GPA Category       %
2.00 to 2.50                 8.8
2.51 to 3.00                20.6
3.01 to 3.50                29.0
3.51 or above               41.5

Figure 5. Mean CAAP Performance (National Percent at or Below) across Self-reported GPA Levels

2.00 to 2.50                33.1
2.51 to 3.00                41.0
3.01 to 3.50                52.5
3.51 or above               62.7

Table 3. Statistically Significant CAAP Performance Differences across Self-reported GPA Levels

GPA Level          Statistically Lower than        Statistically Equal to    Statistically Higher than
2.00 to 2.50       3.01 through 3.51 or above      2.51 to 3.00              (none)
2.51 to 3.00       3.01 through 3.51 or above      2.00 to 2.50              (none)
3.01 to 3.50       3.51 or above                   (none)                    2.00 through 3.00
3.51 or above      (none)                          (none)                    2.00 through 3.50

Figure 6. Mean CAAP Performance across Self-reported Levels of Effort, Clustered by Self-reported Levels of GPA. [Clustered bar chart omitted.]
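The pairwise contrasts summarized in Tables 2 and 3 can be generated with a standard post-hoc procedure; the report does not state which procedure was used, so the sketch below, again assuming the hypothetical caap_2006.csv export and illustrative column names, uses Tukey's HSD as one common choice.

    # Minimal sketch of the pairwise group comparisons behind Tables 2 and 3,
    # using Tukey's HSD to control the error rate across the multiple
    # pairwise contrasts within each grouping variable.
    import pandas as pd
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    df = pd.read_csv("caap_2006.csv")  # hypothetical export of the raw ACT data

    for grouping in ("effort_level", "gpa_category"):
        result = pairwise_tukeyhsd(df["natl_pct_at_or_below"], df[grouping], alpha=0.05)
        print(result.summary())  # one row per pair: mean difference, p-value, reject?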