RESEARCH | Chemistry Education Research and Practice

Motivations for and barriers to the implementation of diagnostic assessment practices – a case study

Monica Turner, Katie VanderHeide and Herb Fynewever*

Downloaded on 03 May 2011 Published on 21 April 2011 on | doi:10.1039/C1RP90019F
Received 30th October 2010, Accepted 25th February 2011
DOI: 10.1039/C1RP90019F

Given the importance of diagnostic assessment as a well-substantiated pedagogical strategy for use at various educational levels from kindergarten to the undergraduate level, we must consider its lack of implementation in the classroom. The implementation gap is especially severe at the tertiary level, with chemistry and other STEM (science, technology, engineering, and math) instruction being no exception. Yet, some tertiary instructors perform diagnostic assessment. How does this happen? What motivates, enables, and sustains these instructors in their implementation of diagnostic assessment? In this study we collect data on the practices of two tertiary chemistry instructors through classroom observation, artifact collection, and interviews. Analysis shows that these instructors employ several techniques that are consistent with the paradigm of research-based diagnostic assessment. Perceived motives behind the use of these techniques, as well as perceived barriers to further implementation of diagnostic techniques, are reported and discussed.

Keywords: diagnostic assessment, undergraduate, case study

Introduction

What is diagnostic assessment?

Because interpretations can vary, we will briefly define what we mean by diagnostic assessment. Black and Wiliam (1998a) used the term formative assessment, which they defined as “all those activities undertaken by instructors and their students [that] provide information to be used as feedback to modify the teaching and learning activities in which they are engaged” (p. 7).
Harlen (1997) referred to diagnostic assessment as the “gathering and use of information about students’ ongoing learning by both instructors and students to modify teaching and learning activities.” From these definitions, and other diagnostic assessment literature, key elements of diagnostic assessment may be identified (Stiggins, 1992; Black and Wiliam, 1998a; Steadman, 1998; Shepard, 2000; Wiliam et al., 2004; Marshall, 2005):
• Target: agreement by both instructors and students on learning goals and criteria for achievement;
• Measurement: collection of data revealing level of student understanding and progress towards learning goals;
• Feedback: provision of effective feedback to students (and to instructors);
• Adjustment: adjustment of teaching and learning strategies to respond to identified learning needs and strengths.

Diagnostic assessment serves a formative purpose, and in this way contrasts with summative assessment. Diagnostic assessment takes place while student learning is still happening; for example, students make their thinking visible to the instructor and the instructor gives feedback to the students while the learning is still taking place. This timing allows diagnostic assessments to be used to steer students’ and teachers’ actions to enhance student learning and teaching effectiveness. Summative assessment takes place after the learning is finished, for example when an exam at the end of a course measures what learning has already happened. With summative assessments it is too late to steer student or instructor action to accomplish any formative purpose. Rather, summative assessments are primarily used to find out what learning has happened and to assign grades or meet certain accountability demands of an external body.

Calvin College, Grand Rapids, MI, USA
Chem. Educ. Res. Pract., 2011, 12, 142–157

From the literature it is clear that diagnostic assessment is effective.
The field is mature enough that there have been a number of review articles and meta-analyses of empirical studies. Several of these works quantified the impact of formative assessment techniques by calculating effect sizes, which ranged from 0.25 to 0.95 (Kluger and DeNisi, 1996; Black and Wiliam, 1998a; Nyquist, 2003; Hattie and Timperley, 2007; Shute, 2008). In particular, Black and Wiliam’s review (1998a) spans over 250 individual studies to give a comprehensive description of the evidence of the effectiveness of diagnostic assessment. Their review found that a common thread in many successful educational innovations was the aim to strengthen the frequent feedback given to students. The article provides concrete examples of controlled experiments that show how timely feedback to students leads to substantial learning gains. Several studies reviewed provide evidence that formative assessment is directly linked to learning gains and that the gains are, in fact, “significant and often substantial” (p. 3) with typical effect sizes of 0.4 to 0.7.

This journal is © The Royal Society of Chemistry 2011

A more recent book by Hattie takes on the mammoth task of synthesizing the educational research behind over 800 meta-analyses of some 50,000 studies representing millions of students involved in 138 distinct interventions (Hattie, 2009). Consistent with the reports above, Hattie’s work determines that interventions that hinge on feedback to students have an effect size of 0.73 and those that focus on formative evaluation to teachers have an effect size of 0.90. These effect sizes are greater than those of concept mapping (0.57), inquiry-based teaching (0.31), class size (0.21), problem-based learning (0.15), or teacher content knowledge (0.09). Hattie contended that most of the effective teaching and learning methods are “based on heavy dollops of feedback” (p. 173).
Further, some of the largest effect sizes were found when teachers were required to use evidence of student learning to decide their next steps (p. 181).

In this paper we concern ourselves primarily with chemical education and with the tertiary (i.e., college and university) level of education. Given these two specialization constraints, there is much less literature concerning the effectiveness of diagnostic assessment. Nevertheless, there is still strong evidence that diagnostic assessment is being done and having a positive impact on student learning in college level chemistry. A commentary by Brooks et al. (2005) pointed out that many of the successful teaching techniques in tertiary chemistry rely upon frequent performance-related feedback to students (a component of diagnostic assessment). Notably, Wilson and Scalise (2006) have successfully adapted for college level chemistry a diagnostic assessment system for which empirical evidence shows significant learning gains over control treatments (Wilson and Sloane, 2000). Although there is certainly a need for further study, the evidence to date suggests that diagnostic assessment is effective in college-level chemistry instruction.

There is a problem.

While we can be confident that diagnostic assessment is effective, research also suggests that there is an implementation gap. As is often the case in education, just because a technique is proven to be effective, we cannot assume that practitioners will adopt it. Myles Boylan, a program officer for the National Science Foundation (of the United States), has said, “In almost every discipline, I could point to a variety of really effective … instructional practices, and say that if we could magically click our fingers and get everybody using them, there would be a huge improvement in undergraduate education that would happen instantaneously. But we’re nowhere near that” (quoted in Brainard, 2007).
For diagnostic assessment in particular, inventories of teachers’ beliefs show that most instructors believe assessments can push students to learn, yet inventories of teachers’ practices show that most teachers limit their assessment techniques largely to marking papers for the purpose of arriving at grades (Hattie, 2009). According to the National Research Council (of the United States) “In many classrooms opportunities for feedback appear to occur relatively infrequently. Most teacher feedback – grades on tests, papers, worksheets, homework ... represents summative assessments that are intended to measure the results of learning. After receiving grades, students typically move on to a new topic and work for another set of grades. (But) feedback is most valuable when students have the opportunity to use it to revise their thinking as they are working.” (Bransford et al., 2000, pp. 140–141).

While the literature provides evidence for the existence of an implementation gap, there is only limited information on exactly how large this gap is. If a recent survey of college-level physics instructors is any indication, many college instructors are not even aware of the concept of diagnostic assessment, and truly few are implementing it in a way that is true to best practices (Henderson and Dancy, 2007). This survey included a sample of 722 physics faculty to measure their awareness of and use of reformed teaching practices (in general – not necessarily diagnostic assessment techniques). This study showed that physics faculty have a relatively high awareness of research-based instructional techniques. In fact, 63.5% of the faculty surveyed stated they were familiar with Peer Instruction (Mazur, 1997), a diagnostic assessment technique implemented through conceptual questions (often with ‘clickers’) and pair-wise peer discussions.
Awareness does not necessarily indicate use, however, and only 29.2% of those surveyed currently use Peer Instruction. Furthermore, only 16.9% of those who do use Peer Instruction reported that they use the technique “as described by the developer” while 35.9% made “minor modifications” and 41.0% made “significant modifications.” Additionally, 32.3% discontinued using Peer Instruction after only one semester.

Finally, we are not aware of any survey or other empirical study that quantifies the rate of implementation of diagnostic assessment in chemistry at the college level. It is telling, however, that at the 2008 Biennial Conference on Chemical Education (BCCE, 2008) many sessions, posters, and papers advocated teaching informed by assessment, but very few framed these methods within an overarching paradigm of diagnostic assessment. Examination of the presentations made at two sessions, both called “Assessing to Generate Learning Centered Undergraduate Chemistry,” reveals that the focus of nearly every talk was either summative or program assessment (Ziebarth et al., 2009). This suggests that in the chemistry community at the tertiary level, the word ‘assessment’ generally has two meanings. The first is summative assessment, as already discussed. The second is program assessment, such as a department might do at the urging of the administration in order to improve overall programs.

In considering the implementation gap at the tertiary level, one must remember that most tertiary instructors have no pedagogical training, and most tertiary instructors have a high degree of autonomy with little if any requirement for ongoing professional development. Both of these factors could contribute to a lack of diagnostic assessment in the tertiary context. Yet, diagnostic assessment does happen with some tertiary instructors. How does this happen? What motivates, enables, and sustains these instructors in their practice of diagnostic assessment?
And to the extent that diagnostic assessment is limited at the tertiary level, what are the barriers perceived by instructors preventing further implementation? These questions form the basis of our research.

The purpose, then, of this research project was to explore the views of tertiary chemical educators regarding diagnostic assessment in order to gain a better understanding of why diagnostic assessment is often poorly implemented at the tertiary level despite the strong evidence for its effectiveness. Thus, the investigation was guided by the following question:

Research question: What do professors with a reputation for good teaching do that is consistent with diagnostic assessment, why do they do it, and what barriers to diagnostic assessment do they face?

Theoretical framework

To determine our theoretical framework, we will begin with the problem: Why is there an implementation gap for tertiary instructors’ use of diagnostic assessment? As noted previously, this question intentionally focuses on instructors’ motives and perceptions, because implementation at the tertiary level is ultimately sustained only with substantial ‘buy-in’ from the faculty (Henderson, 2010). It follows that we choose a theoretical framework that uses a second-order perspective, i.e., we will describe the phenomenon of diagnostic assessment as it is perceived by the research subjects, even if the researchers may not agree (or even feel comfortable) with these descriptions. For this work, we will employ the phenomenographic framework (Marton, 1981; Marton and Booth, 1997). Phenomenography is a relational (or non-dualist), qualitative, second-order perspective that aims to describe the key aspects of variation of the experience of a phenomenon (Trigwell and Prosser, 2003).
In other words, through phenomenography, researchers are able to apprehend, through their subjects’ diverse perspectives, the complex, relational, lived experience of the phenomenon for those involved. Each characteristic of phenomenography is appropriate for our research question:

Relational (non-dualist)
Our research assumes that the instructors are not separate entities from the phenomenon of diagnostic assessment. Rather, meaning is constituted in the relationship between individual instructors and the phenomenon, which in this instance includes instructor-student dynamics as well.

Qualitative
Our research is methodologically qualitative. As described in a later section, we rely primarily on the qualitative analysis of interview data.

Second-order
In a first-order approach, the researchers describe the phenomenon as they perceive it. We primarily use a second-order approach because instructor implementation of diagnostic assessment relies on the instructor’s perceptions and motivations. However, we recognize, as is true of all qualitative research, that these perceptions will nevertheless be reported through our interpretation of them, and we will take precautions to bracket our interpretations as much as possible, as discussed below.

Phenomenography is a research tradition used especially in the context of educational research and health care research; it is often used to examine variation of experience of a certain phenomenon by students and teachers (for a review article, see Orgil (2007)). As a research tradition, phenomenography has been successfully used by others studying university faculty conceptions of teaching (Kember, 1997; Trigwell and Prosser, 2003). The categories of description that result from phenomenography can be used as the basis for quantitative instruments, such as the Approaches to Teaching Inventory (Trigwell and Prosser, 1993).
We anticipate that our research will lead to future research that aims to develop an instrument for approaches to assessment. Such an instrument could be used with research into the correlational relationships between phenomena (e.g., how student approaches correlate with teacher approaches) and studies on the impact of professional development (i.e., instruments can be used pre- and post-professional development to measure change).

Sample

The two chemistry professors observed in this study are part of a larger, ongoing study ranging across STEM disciplines. Because we wanted to see evidence of diagnostic assessment, it was important to have professors who were likely to use it often. For this reason, we chose subjects with a reputation for quality teaching. Further, of necessity we chose subjects who were teaching during times that the observer (HF) happened to be free. These selection criteria led us to recruit our two subjects, one who taught general chemistry courses, and one who co-taught an introductory chemistry/materials science course for engineers. Names of professors are pseudonyms to preserve their anonymity. Both subjects taught at the same small, four-year liberal arts college located in the mid-western United States. The college has an average of 22 students in each class.

Professor Peterson has over 35 years of teaching experience. He entered academia after getting his PhD and is currently in his third teaching position. He is a past recipient of the annual college-wide faculty teaching award, the highest teaching honor given by the college. He usually teaches general (i.e., first-year) chemistry and second-year organic chemistry. The class observed for this study was one of two sections of a second semester general chemistry course he taught that semester. There were 55 students in this particular class.
Professor Evans teaches both chemistry and engineering, with nearly 10 years of experience teaching introductory and upper level chemistry courses, as well as introductory engineering courses. This is his second teaching position after having received his PhD. During this investigation, he co-taught an introductory level chemistry/materials science course for engineers with an engineering professor. There were 37 students in this particular class.

Table 1 Interview protocol examples: four elements of diagnostic assessment

Element | Interview prompt examples
Target: The instructor communicates learning goals and criteria for achievement | How do students become aware of student learning goals for the class? – How do you communicate your expectations? Are there other ways that students learn about them?
Measurement: The instructor collects data revealing level of student progress | How do you know if your students are ‘tracking’ with you and the rest of the class? What evidence do you have that the students are learning? What information do you have on where they are not meeting learning goals? How do you know if this information is reliable?
Feedback: The instructor provides feedback and receives feedback | How do students learn about their progress in the class? What happens inside the classroom to make them aware? What happens outside of class time to make them aware? If they know they are lagging, how can they figure out the ‘next steps’? What role do you think that feedback plays in your teaching and learning? – How do you receive feedback from students? How do you use feedback from students? How do students receive feedback?
Adjustment: The instructor makes changes in teaching choices. The student makes changes in learning strategies. | How do you make decisions about teaching strategies? How do you decide what works and what doesn’t work? Think of the most recent homework assignment that you gave. Would you give it again next year? Why or why not? Think of the most recent class period. Would you make any changes before next year? What would you change and why? Why do you think students respond to feedback the way they do? What do you do to help students make changes in how they go about learning? What do they do? How would you explain the relative success (or lack of success) of your approaches?

Data collection

Field observations

Over the course of one semester, one of the authors (HF) observed one class taught by each participating instructor once every two weeks. During each observation, he used a semi-standardized field protocol to take detailed notes of classroom events and dynamics, the disposition and activities of students, and instructor methods, especially as related to diagnostic assessment. The protocol identified salient process dynamics, techniques, and barriers. These observations were linked conceptually to the interview protocols to permit maximum triangulation of data and methods. The observations were non-obtrusive to the extent that the researcher kept quiet throughout. The observer took contemporaneous field notes as well as post-observation notes to highlight what to ask subjects about during the interviews.

We note that each course had as a co-requisite a once-weekly laboratory section. Although we expect that the lab component of the course likely did contribute to the diagnostic assessment experienced by the students, we chose to focus exclusively on the lecture portion of the courses in this study.

Semi-standardized interviews

The researcher conducted semi-structured interviews of approximately 30–60 minutes with each professor as soon after the observation as possible.
The goal of the interviews was to discuss the teaching practices observed and to elicit the professors’ ideas about diagnostic assessment based on their actions in the classroom and data collected thus far (Table 1). The interviews played a key role in this investigation, both in providing an accurate picture of the professors’ opinions that cannot be gathered from observation alone and in assisting the authors’ interpretation of the observations made. During the interviews, the interviewer did not tell the professors what type of teaching practices in particular he was looking for, nor did he volunteer suggestions for changing teaching in the future.

For this style of research, the interview unfolds as a dynamic iterative process with the interviewer presenting questions in a non-linear fashion in response to the interview subject’s responses and reasoning. This “conversation with a purpose” (Orgil, 2007) continues until the interviewer feels confident that he or she has not only gained the information desired but has also interpreted the subject’s responses and reasoning correctly. The depth of the interview derives in large part from the probes used to capture the details and nuances of the subject’s knowledge and perceptions. As part of the reflective process, the interviewer continues to probe for ideas until the interviewee has nothing more to add. Sample interview questions and accompanying probes are provided as illustrations in Table 1. We note that the interview questions are constructed to make tight connections with the research question and the elements of diagnostic assessment (i.e., target, measurement, feedback, and adjustment).

Collection of artifacts

Researchers collected and analyzed artifacts related to classroom instruction, with special emphasis on materials used during periods of observation, including handouts, syllabi, student work, graded student work, etc.
A sampling was taken of all documents passed from the instructor to the students and vice versa. Special attention was given to any examples of feedback, such as through goal setting or grading comments. We noted opportunities missed for feedback as well. Review of these artifacts informed the interviews as previously noted.

Data analysis

The analysis of the data needed to accurately interpret the meanings behind each professor’s view of diagnostic assessment, as seen in their interviews. Thus, three researchers analyzed the interview transcripts in such a way as to maintain reliability and validity. We used the computer program called HyperRESEARCH to code each interview. In order to be faithful to the ideas presented by the subjects, we chose to develop codes based on patterns in the data rather than use a list of pre-set codes. There was an exception to this, however, in that the researchers had already studied the literature and agreed on a definition of diagnostic assessment and the four primary elements of diagnostic assessment (given in the introduction). These elements served as a framework while coding, and to this extent, our codes were not purely emergent from the data.

Fig. 1 The ten parts of diagnostic assessment. Each object is scaled relative to other objects of the same shape with respect to the frequency with which it was mentioned by the professors.

To begin the analysis of the transcribed interviews, we individually read through four randomly chosen interviews to understand the larger meaning behind each interview (Creswell, 2003, p. 183). This allowed us to gain a general sense of each professor and his possible conceptions of diagnostic assessment. We individually coded the interviews, and then collaboratively coded them, discussing each code until consensus was reached.
During this process, we also became normalized in our coding procedure. After coding this sample of interviews, we developed an initial set of codes and code descriptions. This iterative process involved reading through the interviews several times and involved the consolidation of each researcher’s codes to develop specific codes to match patterns we began to notice. We then went back to the data and made sure that the codes still fit where they were initially applied. At this point we coded the rest of the interviews, with two authors coding each interview individually, and then collaboratively discussing each code as before, until consensus was reached.

As an additional check on validity, participants were provided with a near-final draft of this manuscript, to give them an opportunity to comment on their perception of the accuracy of the findings.

Limitations

As is to be expected with a case study methodology, our main limitations stem from our very small sample size. We present this work with the caveat that attempting to generalize to other instructors of chemistry is not appropriate. We do hope, however, that this work provides a depth of understanding of the two cases and is illustrative of what diagnostic assessment can look like.

Results and discussion

All of the codes generated for these cases can be assigned to one of the ten axial categories represented in the composite concept map (Fig. 1). The four main elements of the diagnostic assessment cycle are seen on the squares placed along the main cycle. The cycle created by these four elements is hindered by the three types of barriers listed on the octagons in the center of the circle. Other aspects of teaching that drive or detract from the diagnostic assessment cycle are listed on arrows placed around the main cycle.
We will define each of these categories below, illustrating with examples and discussing the role each concept plays in our overall model of diagnostic assessment and the relative prominence of each category in our case data. To be clear: the four main elements of the diagnostic assessment cycle were taken from our analysis of the literature as a priori axial categories. The remaining axial categories and all of the codes that fit within all categories were emergent from the data and serve as our primary research results.

As we will see in the data, it makes sense to represent the central circle as a cycle because each element therein supports and leads naturally into the other elements on the circle. Briefly, an instructor using the cycle will make it clear to the students what the learning goals are (target). This sets the stage for what should be measured in terms of how the students are progressing towards the goal (measurement). Once the measurement is complete, the results of that measurement serve as feedback both to the students and the instructor (feedback). Acting upon that feedback will involve making adjustments to both what the teacher does and what the students do (adjustment). The adjustments should then be made in a way that reinforces the original target, which brings the course back to the target. This cycle can continue iteratively until the desired student mastery is reached on a given topic, and can be used repeatedly as new content is introduced.

Four key elements of diagnostic assessment cycle

Table 2 shows the ten most commonly mentioned codes, ranked from most frequent at top, relating to the diagnostic assessment cycle and where they fit into each of these categories. Fig. 2 shows the relative frequency for how often the codes within each category were used.
In the following section each of these four elements of diagnostic assessment will be explained in further detail.

Table 2 Ten most commonly used diagnostic assessment techniques

Rank | Technique | Key Element
1 | Asks questions | measurement
2 | Adjusts teaching based on measurements | adjusts teaching
3 | Provides feedback students can act on to move forward | feedback
4 | Diagnostic assessment geared toward the majority of the class | measurement, adjusts teaching
5 | Provides opportunity for peer-assessment | feedback
6 | Provides written feedback beyond a grade | feedback
7 | Engages in dialogue with students | measurement, feedback
8 | Communicates process expectations to students | target
9 | Provides verbal feedback to the whole class | feedback
10 | Differentiates based on student needs | adjusts teaching, measurement

Target

The target element of the diagnostic assessment cycle consists of the agreement by both professors and students on learning goals and criteria for achievement. Consistent with the literature (Ludwig et al., 2011), our subjects made assessment and content expectations clear in the syllabus, which they handed out at the beginning of the semester, but they infrequently revisited target expectations throughout the semester. When they did revisit target expectations, it was often to communicate process expectations with regard to how students do the homework and assessment expectations explaining what tests will be like. An example of how the professors used syllabi to make assessment expectations known to students is seen in a quote from Professor Evans:

Professor Evans: “In the syllabus, we promise the students that when we make a test, 1/3 of it will be problems from the homework, 1/3 of it will be problems from previous tests which they have all access to, and 1/3 will be problems they haven’t seen before, new problems.”

The professors also used verbal feedback on grading choices to make process expectations clear, for example, by describing how students should show their work when writing solutions for a quiz or exam. Professor Peterson had a situation where several students fortuitously got nearly the right answer for a quiz question even though they followed the wrong technique. As a result, he verbally explained to the class why he took points off from several papers that had the correct answer if they did not show their work.

Professor Peterson: “I’m trying to make the point that I really don’t care too much, in general, about answers. I care everything about how we get to the answer. I’ll make that point a number of times. I’ve already made it before in class. Because that’s something a lot of them don’t see at all. They just think an answer, that’s all we’re really after. And that’s, of course, not true.”

While both of the professors did discuss making their target expectations clear to students, target expectations was the element of the diagnostic assessment cycle that was least frequently mentioned by the interviewed professors (9% of coded statements regarding the four elements of diagnostic assessment).

Fig. 2 Frequency with which each category of diagnostic assessment was coded in the interview data. These are axial categories, each with many codes.

Measurement

The measurement element of the diagnostic assessment cycle occurs when the professor provides an opportunity for students to make their thinking visible. This allows the professor to measure the students’ level of understanding with respect to the agreed upon learning targets. Measurement can occur in a variety of ways, both formal and informal (Brooks et al., 2005). Formal methods of measurement used by our subjects included requesting student evaluation on teaching, homework, quizzes, tests, and classroom response systems (or ‘clickers’).
Professor Evans was very deliberate in his use of clickers as a measurement tool to gauge student understanding:

Professor Evans: “I have to work to get the students to give me stuff so that I know where they're at. But it is important, and more and more I've been trying to use clickers, which I love, as an instantaneous way to assess where a class is at: building much more time for questions and going through problems together, being much more interactive as we go through problems, finding out where they're getting stuck, what they're interested in, what they disagree with.”

The professors also often used more informal means of measuring student understanding, such as listening to student comments in class and office hours, eavesdropping on students in class discussions of the content, and noting student non-verbal communication. Professor Peterson often used student non-verbal communication as his means of gauging student understanding:

Professor Peterson: “You can just tell when you're just blowing everybody out, even your best students, it shows up. So obviously, feedback may not even have to have words with it. Just their facial expressions can tell me that it's not going where I want it to go.”

Of all the elements of diagnostic assessment described by the professors interviewed, measurement was the most frequent focus (39% of coded statements regarding the four elements of diagnostic assessment).

Feedback

The feedback element of the diagnostic assessment cycle occurs when the results of a measurement are used to inform an individual of their level of performance. This can occur in two distinct ways.
The first way feedback occurs is when the professor provides effective feedback to students informing them of their level of achievement in relation to the agreed upon target, and provides them with enough information so that they know what the next step is in the learning process (Hattie and Timperley, 2007). In our data, feedback to students sometimes occurred verbally and sometimes in written form. Verbal feedback was sometimes given to individuals and sometimes to the class as a whole. Written feedback usually consisted of comments written on submitted written work such as homework or tests.

The second way feedback can occur is when students give the professor feedback on his or her teaching (Fuchs and Fuchs, 1986). In our data, this occurred sometimes verbally through comments in class or in office hours or through a written evaluation distributed to students informally by the professor or formally by the college. We note, however, that college administered evaluations were optional, except at the end of a semester, which is too late to be formative, except for subsequent semesters.

Professor Peterson would regularly use summative assessments, such as quizzes and midterm exams, in a formative way. For example, when handing back written assessments, he would reserve a significant portion of class time to discuss the results in the hope of helping the students learn from their mistakes.

Professor Peterson: “I know when you go through tests and they didn't so well on this problem or that problem, I hope when we take the time to go through that test and say, okay, most of the class missed this question, now why did we miss this question, that they're tuned in and they wouldn't repeat that mistake if they had it again down the pike.”

Professor Evans also made a concerted effort to give feedback to his students, but his focus was more often on giving feedback to students as they worked through problems (e.g. in class, laboratory, and office hours) rather than feedback after an assignment had been turned in.

Professor Evans: “Usually I try to be very conscious about the feedback to the students, the timeliness of it especially. Because I've come to realize that for the bulk of students, who can't fully see a problem from its beginning to its end all on their own, and be confident that they're done, they really need, ideally, moment to moment feedback: ‘am I on the right track here?’ ‘what's going on?’, ‘do I have the right answers?’; because a lot of students will spin their wheels. They'll work hard even, but they won't be moving in the right direction at all. And they won't figure that out for a long time.”

Feedback was the second largest focus of the professors we interviewed with regard to diagnostic assessment (33% of coded statements regarding the four elements of diagnostic assessment), being mentioned just slightly less often than measurement.

Adjusts teaching

Adjusting teaching based on data is the fourth element of the diagnostic assessment cycle. In general, adjusting teaching occurs when a professor has taken a measurement of student understanding and then adjusts his or her teaching strategies in order to accommodate the identified learning needs and strengths of the students in the class (Steadman, 1998). Some examples from our data of this occurring in the classroom were when a professor altered the teaching and learning activities or the topics addressed in a class period, or spent more or less time than originally planned on an activity or topic based on the students’ needs. Adjusting teaching took place immediately or with a delay. Immediate adjustment of teaching took place when a professor took a measurement and reacted to it in the same class period.
Professor Peterson gives an example of this in the following quote:

Professor Peterson: “I will definitely - I often, well, not often, but if I’m doing a difficult concept I can just tell right from their faces they’re not getting it. I’d probably tell them. I’d just stop and say, ‘Okay, guys, I know you’re not getting it. I can see it on most of your faces, so let’s get this thing resolved. Let’s start asking questions. What’s not sinking in?’ [I] just deal with it.”

Another way in which adjusting teaching occurred was when a professor took a measurement and used it to change future classes. This sometimes meant that they revisited a subject, changed how they taught it in the next period, or made a note to adjust it in future semesters. Professor Evans explained how feedback from students helps him adjust many aspects of his teaching over both short and long time scales:

Professor Evans: “Feedback from students in previous years has helped me figure out how things can go, and I have tweaked and optimized from workshops to lectures to labs to other things. And I know there is less feedback in terms of seeking clarification about confusions, because I have ironed most of that out. When there was confusion about a question on a lab or whatever it was, I have gone back and changed the wording so that it was clearer. I have even done it between one section and the next. So a lot of that has been alleviated.”

‘Adjusting teaching based on data’ was mentioned in 19% of coded statements regarding the four elements of diagnostic assessment, about half as often as ‘measurement’ and ‘feedback’ but twice as often as ‘target’.

Barriers

The barriers to the diagnostic assessment cycle that the professors mentioned can be divided into three areas: student, instructor, and situational (Fig. 3).
All the barriers can prevent the full implementation of diagnostic assessment because the professor, the students, or classroom circumstances cause some hindrance to one or more of the four main elements mentioned earlier. Table 3 lists the barriers most frequently mentioned in the interviews. Because none of the top ten barriers were instructor barriers, we also include the highest-ranking instructor barrier for comparison.

Student barriers

The student barriers refer to those barriers caused by students’ attitudes or their behaviors. Consistent with the literature (Hesse, 1989), student barriers were the most common type, constituting 52% of all coded statements regarding barriers to diagnostic assessment. Due to these barriers, the professors in our study sometimes felt that they could not measure the students’ level of understanding or provide effective feedback for the students, or that students may not use the feedback. At times, the professors were still able to implement diagnostic assessment to a certain extent, but believed that diagnostic assessment could not occur as well as it would if the student barriers were not there. Often a student barrier was cited as the reason behind a decision regarding how to implement a diagnostic assessment. Other times, a professor stated that a student barrier stopped him from implementing a particular diagnostic assessment strategy at all.

The following quote is an example from Professor Evans illustrating a common barrier to him, that of students not paying attention in class. In this quote Professor Evans acknowledges that when he answers one student’s question, the two-way communication might not reach other students who are not paying attention. This barrier is revealed if a student who was not listening asks the same question again.
Professor Evans: “When I hear a question that comes back again, what I’m listening for is whether this student is basically asking the exact same question again or are they asking it at least in a way that demonstrates they’re aware that we’ve already been talking about this. If I hear them ask it in a nuanced way or building way or a way that connects with in any way what we said before, even if it is, ‘I know you’ve gone over this before, but is this the same as [before]?’ or whatever, then I’m much more likely to spend more time on it because then I feel like they’re asking for further clarification and it’s probably representative of more in the class. But if it’s a totally redundant question, I tend to think they just didn’t hear it the first time. So I’m not going to fully elaborate again because I’m guessing that’s not representative of the rest of the class.”

Fig. 3 Frequency with which each category of barrier to diagnostic assessment was coded in the interview data. These are axial categories, each with many codes.

Table 3 Top ten barriers to diagnostic assessment

Rank | Barrier | Type
1 | Professors do not have enough time for diagnostic assessment | Situational
2 | Students don’t pay attention in class | Student
3 | Students don’t put in required effort to do their work | Student
4 | Students are apathetic about the class | Student
5 | Students ignore feedback from the professor | Student
6 | Students don’t communicate their thinking | Student
7 | Students don’t know how to learn | Student
8 | Large quantity of content limits implementation of diagnostic assessment | Situational
9 | Students don’t take initiative to get or give feedback | Student
10 | Students that are doing very poorly lack hope | Student
23 | Instructor’s feedback is too vague to provide a path forward | Instructor

Instructor barriers

The instructor barriers are those barriers caused directly by the professor.
The literature suggests that these barriers may come from a simple lack of training for faculty in how to use diagnostic assessment (Stiggins, 2002; McNamee and Chen, 2005). For our subjects, who were selected based on their reputation for exemplary teaching, these barriers were far less numerous (18% of coded statements regarding barriers to diagnostic assessment) than student and situational barriers. However, the fact that this type of barrier to diagnostic assessment exists even for these ‘successful instructors’ should be noted.

In the interviews, the professors mentioned only two types of instructor barriers. The first was that, in the moment, the instructor was sometimes not sure enough of the content to provide students with immediate feedback. The second was that the instructor’s feedback was vague, so the students could not necessarily move forward as much as they would if his feedback were more specific. This came up when the interviewer (HF) discussed papers that Professor Evans had marked to hand back to his students:

HF: “When you do something like this, you have a couple of ‘minus ones’ [i.e. points taken off]. Do they know what these ‘minus ones’ are for?”

Professor Evans: “I don’t know. Basically when I have to make a correction to how they use a formula and I put a one here, then I put a minus one because it wasn’t [crystal structure] 110, it was [crystal structure] 111. So I made the correction. That’s a minus one. Here it’s not supposed to be A. It’s supposed to be D. There’s a minus one.”

HF: “Okay.”

Professor Evans: “Do they know? I don’t know.
But I try to put the minus one right next to whatever correction I had to make to make their calculation work.”

Situational barriers

The final type of barrier is the situational barrier (30% of all coded statements regarding barriers to diagnostic assessment). This area encompasses all those barriers mentioned that are not directly caused by the students or professors. Consistent with the literature (Black and Wiliam, 1998b; OECD, 2005), these barriers came about due to the inevitable circumstances of teaching, such as a large number of students, a large quantity of content, and a wide distribution of students’ abilities. Of the situational codes in our study, the most frequently mentioned was ‘time constraints’, meaning that the professor did not feel he had enough time to do some part of the diagnostic assessment cycle well or even at all. Professor Peterson provides the following explanation of why time constraints make implementing diagnostic assessment so difficult.

Professor Peterson: “There's always a tension between what you want to cover in class. Are we willing to not cover a lot of material in a chapter so that we can go back and forth in a feedback thing?… So if you've got to cover it all, then, obviously, you just don't have a lot of time for some of these other things like discussion.”

Driving forces

Informed or uninformed techniques

Besides the four elements of diagnostic assessment and the barriers to its implementation, there are also certain driving forces that either propel the cycle forward or detract from it (Fig. 4). As with any technique, the diagnostic assessment techniques that a professor uses can either be well informed or uninformed, depending on the professor’s grounding in pedagogical content knowledge (Mishra and Koehler, 2006).
In our data we see that uninformed techniques that detract from the cycle (36% of coded statements regarding driving forces for diagnostic assessment) were drawn from the professor’s intuition, or occurred when the professor did not have a reason for his actions. Simply relying on one’s intuition was one of the most common forces detracting from the cycle. This can look like the professor replacing measurement and feedback from students with intuition on what teaching adjustments to make. The following quote illustrates how a professor might use his intuition rather than make a measurement or seek feedback.

Professor Peterson: “I'm not one that thinks about what I do too much which is probably not good. I don't know. I just kind of do what feels good. Yeah. I don't sit back and analyze things. I just don't. I just never have.”

Fig. 4 Frequency with which each category of driving force for diagnostic assessment was coded in the interview data. These are axial categories, each with many codes.

We saw that informed techniques (37% of coded statements regarding driving forces for diagnostic assessment) that drove professors forward in the diagnostic assessment cycle were those that drew from some body of knowledge outside of the professors’ own intuition. When using informed techniques, each professor in our study based his decision to do diagnostic assessment on sources such as his experiences with previous classes, a previous measurement he took at another time, or information from external resources, such as colleagues or literature. These informed techniques drove the cycle forward, because the professor was using a concrete measurement of student learning. Thus, he could address the students’ needs more accurately than if he were relying solely on his intuition.
In the following example, Professor Evans adjusted his teaching based on his past experience with students, using that experience as a form of measurement on which to base his decision to adjust his teaching.

Professor Evans: “There’s a history here as we’ve had issues in previous years. We did make previous tests available, because some students had them from brothers and suitemates and what-not. Other students complained because there would be too much repetition. So we committed to making them available, but then we also committed to making partially new material for every test so that it becomes a bit of a standard.”

Positive attitude towards diagnostic assessment

The final driving force found for the diagnostic assessment cycle was a positive attitude towards the implementation of diagnostic assessment (27% of coded statements regarding driving forces for diagnostic assessment). Beliefs that a diagnostic assessment strategy is important, or that it should be implemented, will further steer professors towards the cycle because of the increased professor ‘buy-in’ of the idea of effective diagnostic assessment (Shepard, 2000). In our data, we saw that this driving force most commonly encompassed the belief that diagnostic assessment was working, which served as a motivation for the professor to continue formative strategies he had already been implementing, as seen in the quote below.

HF: “So when do you think feedback is useful? You mentioned when they're having a hard time for example.”

Professor Peterson: “Well heck, it's always useful.”

Fig. 5 Diagnostic Assessment Cycle, Motivations, and Barriers for Professor Peterson

Case studies

Both professors discussed the diagnostic assessment cycle to varying degrees, with the aforementioned emphasis on ‘measurement’ and ‘feedback’.
However, the specific techniques that professors used in order to implement the four key elements provide a window into better understanding each professor’s approach to diagnostic assessment. The cycle of the four key elements of diagnostic assessment from Fig. 1 has been personalized in the case studies below for both of the professors, using the codes most frequently mentioned by each professor and the codes most specific to the professor. The context of these codes within the interviews reveals the professors’ various motivations for using diagnostic assessment techniques and reasons for not using them. It should be noted, though, that the motivations do not always fit with the idea of diagnostic assessment, because at times a professor may implement a technique, but not with the desire to assess his students. For example, a professor may construct a quiz that closely mimics the homework problems not because he wants the students to learn from feedback on the homework, but rather because he wants to be blameless if some students do not do well on the quiz (i.e. the students knew what was coming and cannot accuse the professor of surprising them). The motivations are noted by the arrow pointing to the key element that the professor implemented. The arrow pointing away indicates the barriers for that professor.

Case Study: Professor Peterson (Fig. 5)

Diagnostic assessment techniques used by Professor Peterson

Flexible lesson plans

One formative aspect of Professor Peterson’s teaching is his willingness to adjust his teaching based on students’ needs. The main way he does this is through being flexible with his daily lesson plans. He goes into every class period with a general idea of what he wants to cover, but he does not strictly follow a lesson plan.
Because of this flexibility, he is able to adjust the time he spends on certain topics and the way he covers the content based on his sense of the students’ needs. Professor Peterson is versatile in the way he adjusts his teaching, and tries to use a variety of techniques to help students understand a particularly difficult topic.

Questioning

A second way that Professor Peterson fosters a formative environment is through asking questions and getting students to ask questions. Professor Peterson asks his students a lot of questions, and often gives them significant wait-time (Rowe, 1972) to think about the questions, because he wants to measure student understanding. He also encourages his students to ask him questions, so that he knows when they are getting stuck and what part of the subject matter is confusing them. He then uses the information he gains from asking questions and from student questions to help him know how best to adjust his teaching.

Atmosphere of engagement and peer-assessment

Professor Peterson explicitly focuses on engagement of students during class to foster measurement and feedback. To set the stage for engagement, Professor Peterson intentionally creates a comfortable, non-threatening atmosphere in his classes that encourages student questions. He also tries to answer student questions in a formative way, with explanations or more questions to give students an opportunity to think about the issue instead of just giving them short answers that don’t make them think. In addition to questioning, Professor Peterson also gets students involved through peer-assessment, both by having the students in the class answer their peers’ questions rather than answering the questions himself, and by having students work through problems in small groups.
Clear assessment expectations

Professor Peterson makes his assessment expectations abundantly clear. One of the ways he does this is through making a big point of directly basing his quizzes and exams on homework problems. These problems are web-based and graded automatically. Additionally, Professor Peterson makes complete solutions available to the students after the assignments are due. The assignments can then be formative in that students’ thinking is measured and they receive feedback through the grading and posted answer keys. Professor Peterson does expect the students to learn from this homework and be able to solve similar problems on quizzes and exams. On one occasion, when returning a quiz, he explicitly showed the students what homework problems the quiz problems came from: “I told them the quiz would come from the problem sets, and here, ‘I’m showing you exactly where it came from.’” This cycle of repetitive assessment and feedback is a recurring theme in Professor Peterson’s teaching.

What motivates Professor Peterson to use diagnostic assessment?

Professor as provider

Throughout his discussions about diagnostic assessment techniques, Professor Peterson demonstrated an overall characterization of the professor as the provider (Harden and Crosby, 2000). Many of his motivations for doing diagnostic assessment revolve around the idea that his role is to provide what the students need to learn. This can mean providing opportunities for involvement so he can gauge student thinking, creating a welcoming atmosphere so students have opportunities to make their thinking clear, or making expectations clear so that students know a path forward to achieving learning goals. For example, Professor Peterson was often observed asking questions in class, but often he would pause until it was clear that several of the students were ready to answer rather than just taking the first hand. In response to why he does this, he said:

“…I want a dialogue, right.
Maybe some of them thought a little bit there, ‘I wonder what are we looking at here,’ … It gets at least some dialogue going, maybe not between me and everybody, but at least some of the kids are thinking. The fact that four or five of them are answering questions means that just I made them do some thinking.”

By asking questions with a prolonged wait-time, Professor Peterson provides space for the students to think and not just listen to a lecture (Rowe, 1972). Additionally, he often answers students’ questions in a formative way by not giving a quick answer but instead asking the student more questions or turning the question over to the class. Again, he provides for the students by giving them many opportunities to build their understanding during class time.

Blame avoidance

Professor Peterson’s motivations to be a provider stem in part from a defensive compulsion to have the students, rather than him, be to blame when they fail (Hesse, 1989). If he fulfills his role of providing what the students need, then he cannot be blamed for any student failures. This is clearly seen in his discussion of how students should know how to prepare for summative assessments such as quizzes:

“If they blew it, that’s just fine. I don’t - I’m not here to win [the] popular [vote] - if they didn’t do well, ‘You are to blame, don’t blame me. I didn’t pull anything unusual that you didn’t expect here. It was laid out exactly as I told you it was going to be laid out. And if you screwed up I hope the blame comes back to you. Don’t make it my fault, because I made it pretty clear, and here it is.’ That’s okay. If they’re upset that’s just fine.”

Professor Peterson is a provider of tools and opportunity to succeed, but the students must have initiative and responsibility to take advantage of what is provided.
Barriers to diagnostic assessment perceived by Professor Peterson

Students ignore feedback

One of the most significant barriers Professor Peterson sees to the successful use of diagnostic assessment strategies is when students do not take advantage of the feedback that is provided to them. This is a corollary to his motivation to provide them with feedback (e.g. graded homework): if they do not pay sufficient attention to the feedback given, they will not be able to learn from their mistakes.

Professor Peterson: “After all these years of teaching it still amazes me how some kids do as poorly as they do. Especially, given that they knew it was coming. Came out of problem sets. They had complete solutions to those in advance. We have had kids [scoring] 10, 15 points out of 33. I just have to conclude they’re not working, that’s all.”

HF: “So, that’s what you would attribute it to?”

Professor Peterson: “I don’t know what else I can conclude. I mean, they … yeah. I think so, I mean, I’m not a hundred percent [sure], but certainly a high majority of them, I think, are doing poorly because they’re just not putting forth what I consider the amount of effort that you need to.”

This is a barrier that Professor Peterson sees as beyond his control in that it originates from the students’ lack of a responsible attitude.

Fig. 6 Diagnostic Assessment cycle, barriers, and motivations for Professor Evans.

Coverage pressure

A significant barrier perceived by Professor Peterson is the pressure to cover a large amount of material in a limited amount of time (OECD, 2005). He indicates that the coverage pressure prevents him from doing more of the diagnostic assessment techniques that he values, and prevents him from pursuing other diagnostic assessment techniques that he is aware of.
For example, to achieve sufficient efficiency, he sometimes relies on intuition rather than using a diagnostic assessment technique to make a measurement of how well students are doing in achieving the learning outcomes.

HF: “Did you have any feeling yesterday for how many were following that bit [a topic being discussed in class]?”

Professor Peterson: “You see, I guess this is where - this is probably a place where I could benefit by having some clicker [i.e. classroom response system] stuff. … Maybe that’s the direction I should go with some of this stuff. Then I could say, ‘Okay, if I’m going to spend ten more minutes on clickers during a class, okay, then what do I chop out?’ …. Yeah, so, did they all get it? I don’t know. Maybe… So, how many were with me? I think most of them. But, you can kind of tell.”

HF: “What are the hints?”

Professor Peterson: “Oh, you can see it on their faces.”

In this dialogue we can see that Professor Peterson sees some diagnostic assessment techniques as potentially useful, but not worth the extra class time that he thinks they would use. He is also confident enough in his own intuition that the benefit added from taking a measurement may not be worth the extra time he assumes it would take to make the measurement.

Case study: Professor Evans (Fig. 6)

Diagnostic assessment techniques used by Professor Evans

Professor Evans uses a rich and varied arsenal of diagnostic assessment techniques. In this paper we will describe those he discussed most often: clickers, peer-assessment, holistic office hour sessions, and repetitive assessment/feedback cycles.

Clickers

In using clickers, Professor Evans accomplishes all four aspects of the diagnostic assessment cycle simultaneously (MacArthur and Jones, 2008).
During class time, Professor Evans projects a clicker question, most often a content question, but sometimes a question about such things as student expectations for the course or their study habits. The questions themselves provide the students with an idea of the target for the course in that it’s clear what the students are expected to be able to accomplish. After the students answer the question with their clickers, the results are posted as a histogram. This information is valuable to Professor Evans as a measurement of whether the students are achieving the learning outcome. This serves as feedback for him to determine the extent to which he should adjust his teaching to forge ahead or have the class dwell on the topic at hand. The clicker questions are also formative for the students in that they receive timely feedback on whether or not their answer fits with the answers of their peers. As the question is discussed and, in the case of objective questions, the correct answer is revealed, students can use that information as feedback to adjust their subsequent learning.

Peer assessment

Several times during the course of the semester, Professor Evans spent the full class period having the students complete what he calls ‘workshops’ in which they work in groups of three to answer questions and solve problems regarding an application of chemistry to materials science (e.g. one workshop observed dealt with solid state phase diagrams and the manufacture of Samurai swords). As the students work in their groups, Professor Evans circulates about the class, listening, answering questions, and making occasional statements to the group as a whole. These workshops serve several formative purposes.
As the students share their ideas with peers, they get feedback from each other, which then helps them to adjust subsequent learning (Mazur, 1997). As Professor Evans eavesdrops on students and answers questions, he gathers a measurement of where the students are having trouble and this serves as feedback for what clarifying announcements to make as the groups continue in the workshop.

Holistic office hours

In the interviews, Professor Evans placed a frequent emphasis on the formative role that one-on-one interactions during office hours can play. Office hours are unique in that they are an opportunity for an instructor to discuss holistic issues with the student, such as study habits, time management, prioritization, and adjusting to the freedom that comes with being a first year college student (more than 90% of the students in the engineering class that Professor Evans was teaching were freshmen). Professor Evans recalled more than one occasion where he and the student were able to discuss unsatisfactory progress in the class in terms of content expectations, process expectations, and many aspects of the student’s situation including method of studying, time spent studying, good habits (e.g. sleep and exercise), and possible distractions from study, such as paid work or unproductive activities (e.g. video games). This sort of holistic measurement and feedback is different for each student based on their personal situation; it is only practical in such a private setting.

Repetitive assessment/feedback cycle

Similarly to Professor Peterson, Professor Evans structured the graded aspects of the class around a repetitive assessment/feedback cycle (Pellegrino et al., 2001). He presented the immediate aim of the homework as preparation for taking homework-based quizzes and examinations. This was communicated to the students to give them a path forward for study, so that they knew what to expect.
Keys for the homework were posted so that the students could compare their completed homework against the correct solutions and get feedback in this way. This method of posting a written key is used not only for homework, but also for all written work, including exams.

What motivates Professor Evans to use diagnostic assessment?

Professor as guide and advisor

Professor Evans’ motivation to use diagnostic assessment techniques can be understood within a characterization of the professor as a guide and advisor to his students (Harden and Crosby, 2000). He wants to do all he can to advise his students and guide them as they take his class. The clearest evidence for this comes from Professor Evans’ discussion of test results in his meetings with students during office hours.

“I ask them a lot of questions like, ‘How many were dumb mistakes? How did you feel when you left the test?’ And even on a question-by-question basis, especially on the ones they got wrong. Focusing like, ‘What were you thinking on this question? Why do you think you got it wrong? Did you think you had it right when you left the test? Did you know you hadn’t studied the right stuff?’ Their answers provide a lot of insight into how well prepared they were, how much test anxiety they experienced, and just where their problem-solving skills are at. So usually based on that, if I assess it is dumb mistakes then we usually talk more about getting enough sleep, getting some exercise, being mentally prepared. What can the student do to eliminate dumb mistakes and practice being more careful doing the homework? If it was lack of preparation, then I might say ‘You could really just study the notes more because these are the types of problems you got wrong.’ If it was an inability to solve problems - if their problem-solving skills are really lacking then I really dive into how are they doing the homework.
What adjustments they could make and how they’re approaching the homework itself, because that’s where they will improve their problem-solving skills. At that point I’ll ask them, ‘Do you have a picture in mind when you start these types of problems or do you have a mental picture that’s guiding you?’ And then we differentiate between problems they’ve seen before and what they haven’t seen before. I just keep asking them a lot of questions to try to really get at what derailed the student on a particular problem and on a particular type of problem. One often sees a pattern in a test. Like all the quantitative problem-solving ones, those are the ones that they got most of their points off. Or all the short answer conceptual ones. That’s where they got more of their points off or that type of thing. So it’s a dialogue that kind of goes back and forth. I usually in the end giving them advice about how to adjust their discipline in approaching their homework, how they can try to improve some areas that they’re - in terms of problem-solving, how they can best work on areas of weakness, how they can better prepare. Things like that.”

Clear expectations

Another way in which Professor Evans is motivated to use diagnostic assessment techniques is through his desire for each student to be able to achieve the learning goals with the help of clear process and content expectations (Ludwig et al., 2011). He uses homework and quizzes together, where the homework is graded mainly on completion but the ability to do the homework correctly is measured through quizzes based on that homework:

“They have to hand in the homework, 10 points of it is just for doing it, and not based on correctness and just a few points for correctness. We're trying to steer away from over-rewarding them for correct answers at this point.
And steer them much more towards doing the homework in the right way. Then an additional 10 is the quiz the next day which will just take some of those problems and the reason for that, is I've found that I want to steer students toward how to do homework which is you should do homework in a way so that you can reproduce the results by yourself in a quiz situation the next day. If you can do that, then you've understood the homework problem. It's not just about getting the right answer once. We've had a huge problem in the past where groups of five to ten students will get together on Wednesday night, the one smart student has worked through most of it and everyone is just copying like mad, and learning very little from the homework, jumping through the hoop but not getting anything out of it. So we've really tried to defang the homework in terms of points for correctness, so they're not as motivated to do that.”

This quote illustrates again how Professor Evans guides his students, this time through designing class policies that hold them individually accountable and guide them towards good study habits. He has adjusted his policies in response to what the class needed to have a better chance of being guided to success.

Barriers to diagnostic assessment perceived by Professor Evans

Similarly to Professor Peterson, Professor Evans finds that the most significant barriers to his use of diagnostic assessment are that students do not take advantage of the guidance he offers and that he does not have enough time to give more guidance than he does. Additionally, he must sometimes cut diagnostic assessment events short when he is running out of classroom time.

Students do not take advantage of feedback opportunities

After the students in his class received a midterm exam back, Professor Evans invited those who struggled to come by office hours to receive help, but only about five out of one hundred students came.
When asked if he had hoped for more he said:

“There are more students than that that need help. Do I hope for more? That's a hard one for me to answer. If I can help them, I would hope that they would come to my office. … But there's probably some that just don't accept the invitation because they’re still intimidated, don’t want to come talk to a professor about it because they’re afraid that the professor will make them feel bad or dumb because they didn’t do well on the test. So I guess I would hope for more because there are more that could use the help.”

Here Professor Evans identifies the reluctance of the students to seek out his guidance as a barrier to his giving them the diagnostic feedback that they could benefit from.

Time constraints

On the other hand, even though Professor Evans clearly is willing to give detailed guidance to any student who comes to his office, he also knows that lack of time would be a barrier to making this work (Black and Wiliam, 1998b; OECD, 2005) even if he removed the barrier of student reluctance by requiring students to make one visit to his office.

“I could imagine building something into an assignment where they were required to do it once, somehow, the first time was sort of required, like you must come ask some question and come in a group of three or something. … I don't know, I haven't tried that stuff. … I'm not sure it's a good idea, it would be a big time commitment on my part, I don't know how the students would perceive it. … With 111 students it's a big task. I wouldn't want them all to have to come one at a time, because that would take up the whole week.”

We see that part of his time constraint here originates with the large number of students that he was teaching during the semester he was a participant in the study. Although each section of students was a reasonable size (fewer than 40), he was teaching three sections for a total of 111 students.
Time constraints were also sometimes perceived by Professor Evans to be a barrier to diagnostic assessment in the classroom (Black and Wiliam, 1998b; OECD, 2005). For example, even though Professor Evans often used clickers to good effect, he only sometimes followed a clicker question by telling the students to turn to their neighbor and discuss what answers they had and why. When asked “How do you decide when to do it and when not to do it?” Professor Evans responded, “Yesterday there was a bit of a time limitation, which I knew I had almost exactly 50 minutes of stuff [content to present].” On another occasion Professor Evans might even decrease the amount of time students have to ‘click in’ their answers:

HF: You had a clicker question on [the solubility of] potassium iodide, lead nitrate. You put 10 seconds on. You had 24 students click in. That wasn’t the whole class. Right?
Professor Evans: No. Probably not.
HF: Why so short and why so few students?
Professor Evans: I was probably feeling a little pressed for time.

Conclusions and implications for further research

In order to conclude this paper, we invite the reader to consider again the problem that motivates this research and our research question. The problem: there is an implementation gap in that many college-level chemistry instructors do not fully take advantage of the diagnostic assessment paradigm in their teaching. The research question: for those professors who do use diagnostic assessment, what do they do, what motivates them to do it, and what barriers do they face? Our hope is that if we can answer our research question we can shed light on how to address the problem of the implementation gap.
For example, if we can illustrate how successful chemistry instructors do use diagnostic assessment, this could serve as a foothold for others to recognize what current practices they could invigorate or new practices they could initiate to bring their teaching more in line with the diagnostic assessment paradigm. Below we highlight some of the results of these two case studies, to emphasize that these footholds are apparent. It is clear that in these two cases, the professors repeatedly integrated diagnostic assessment into their courses and integrated the various elements of diagnostic assessment together.

Successful diagnostic assessment has several elements that play into and support each other. Both professors in this study engaged in all four elements of the diagnostic assessment cycle in this way. Reflecting on the case studies, we can find examples of how one element of the cycle leads into the next. For example, when the students are aware of the target through the syllabus or verbal comments made in class or even through a promise that they will be held accountable for the homework on a test, then the students know what to aim for and this sets up the opportunity for measurement. In the measurement, whether through using clickers, collecting written work, or using good wait time along with questions in class, the professors in this study found out where their students were, and that equipped them to give feedback. For the feedback, whether through written comments on written work or through observing and commenting on peer interactions in class, the professors gave feedback to students and received feedback from students so that both parties could make adjustments.
When making adjustments by keeping lesson plans flexible, or giving more time to a topic in response to clicker question results, the instructors reinforce for the students what the target expectations are by dwelling on important content until the learning outcomes are achieved. For the professors in this study, the cycle continued until an acceptable level of mastery was met in the class and the cycle was revisited as new material was presented. It is also clear that there are many techniques that may be seen as otherwise common teaching techniques that have been used by these two professors in a way that is consistent with the diagnostic assessment principle.

Successful diagnostic assessment can happen with small changes to common teaching techniques. For example, from Table 1, the technique most discussed in the interviews was ‘asking questions’. Research has shown that many teachers ask questions during class, but that many also do so without sufficient wait-time (Rowe, 1972). By making a small change and consciously waiting after asking a question, an instructor can turn questions into valid measurement events where everyone in the class has enough time to engage with the question and commit to an answer. The instructor can collect the data from the measurement in a low-tech way, such as the verbal discussions that Professor Peterson had, or in a high-tech way, such as using clickers like Professor Evans did. We assert that the techniques used by these professors (Table 1) are not terribly distant from the teaching of many other professors who may not even be aware of the diagnostic assessment paradigm. These should be accessible to any instructor who is committed to introducing more diagnostic assessment into their classroom.
Many of the barriers perceived by the professors studied here are probably unavoidable; at the same time it is encouraging to clearly see that diagnostic assessment can happen despite perceived barriers. The barrier cited most often, lack of time to perform diagnostic assessment, is of course the barrier cited for many things left undone in life. Even so, as these two professors demonstrated, an instructor who is motivated to make diagnostic assessment a priority can do it. The long list of examples of diagnostic assessment performed by these two instructors shows the extent to which the techniques can be incorporated despite time constraints. Further, with a very small amount of additional time (e.g. increasing wait-time from 10 to 20 seconds a few times per class period), diagnostic assessment can be increased with very little sacrifice of content coverage.

The many ‘student barriers’ listed suggest that diagnostic assessment is not as effective as it could be for those students who don’t pay attention, don’t put in the required work, are apathetic, ignore feedback, etc. This of course does not imply that quality pedagogical choices are wasted on the remaining students. Indeed, the body of research on the effectiveness of diagnostic assessment suggests that it can play an important role in removing many of these student barriers by, for example, engaging students to make them more likely to pay attention in class and to overcome their apathy towards the class.

It is interesting to note that some barriers perceived as significant by one of the professors were not mentioned by the other. This suggests that the barriers we perceive may be overcome by the insights of our peers. This is a dialogue that has not happened between Professors Peterson and Evans yet. Still, as research such as this is expanded, we anticipate that common barriers can be better understood, as will the ways to overcome them.
It is informative, but not surprising, to note that different professors can implement best practices in different ways. This suggests that qualitative research, such as this, can uncover and illustrate good practices that can inform struggling and successful instructors alike. There is more than one excellent way to teach even within a somewhat constrained paradigm of teaching. And while high quality teachers will likely come to these paradigms even without ever having any formal training in teaching, observing and reflecting on good practice can help us to further conceptualize what works and why.

Acknowledgments

We would like to thank Professors Peterson and Evans for generously giving of their time to make this research possible.

References

Angelo T. A., (1990), Classroom assessment: improving learning quality where it matters most, New Direct. Teach. Learn., 42, 71–82.
Atkin J. M., Black P. and Coffey J., (Eds.), (2001), Classroom assessment and the National Science Education Standards, Washington DC: National Academy Press.
BCCE, (2008), Biennial Conference on Chemical Education, Bloomington, IN.
Black P. and Wiliam D., (1998a), Assessment and classroom learning, Assess. Educ., 5, 7–74.
Black P. J. and Wiliam D., (1998b), Inside the Black Box: raising standards through classroom assessment, Phi Delta Kappa, 80, 139–148.
Brainard J., (2007), The tough road to better science teaching, Chron. High. Educ., 53, 1–3.
Bransford J. D., Brown A. L. and Cocking R. R., (2000), How people learn: brain, mind, experience, and school, Washington, DC: National Academy Press, pp. 131–154.
Brooks D. W., Schraw G. and Crippen K. J., (2005), Performance-related feedback: the hallmark of efficient instruction, J. Chem. Educ., 82, 641–644.
Chappuis S., (2005), Is formative assessment losing its meaning?, Educ. Week, 24, 38.
Creswell J. W., (2003), Research design: qualitative, quantitative, and mixed method approaches, 2nd ed., Thousand Oaks, CA: Sage.
Fuchs L. S. and Fuchs D., (1986), Effects of systematic formative evaluation: a meta-analysis, Except. Child., 53, 199–208.
Harden R. M. and Crosby J. R., (2000), The good teacher is more than a lecturer – the twelve roles of the teacher, Med. Teach., 22, 334–347.
Harlen W., (2003), Enhancing inquiry through formative assessment, San Francisco: Exploratorium Institute for Inquiry. Retrieved August 2007.
Harlen W. and James M., (1997), Assessment and learning: differences and relationships between formative and summative assessment, Assess. Educ., 4, 365–379.
Hattie J., (2009), Visible learning: a synthesis of over 800 meta-analyses relating to achievement, London: Taylor and Francis.
Hattie J. and Timperley H., (2007), The power of feedback, Rev. Educ. Res., 77, 81–112.
Henderson C. and Dancy M. H., (2007), Barriers to the use of research-based instructional strategies: the influence of both individual and situational characteristics, Phys. Rev. Spec. Top.: Phys. Educ. Res., 3, 1–14.
Henderson C., Finkelstein N. and Beach A., (2010), Beyond dissemination in college science teaching: an introduction to four core change strategies, J. Coll. Sci. Teach., 39, 18–25.
Hesse J., (1989), From naive to knowledgeable, Sci. Teach., 56, 55–58.
Kember D., (1997), A reconceptualisation of the research into university academics' conceptions of teaching, Learn. Instr., 7, 255–275.
Kluger A. N. and DeNisi A., (1996), The effects of feedback interventions on performance: a historical review, a meta-analysis, and a preliminary feedback intervention theory, Psych. Bull., 119, 254–284.
Leahy S., Lyon C., Thompson M. and Wiliam D., (2005), Classroom assessment minute by minute, day by day, Educ. Lead., 63, 18–24.
Ludwig M., Bentz A. and Fynewever H., (2011), Your syllabus should set the stage for assessment for learning, J. Coll. Sci. Teach., 40, 20–23.
MacArthur J. R. and Jones L. L., (2008), A review of literature reports of clickers applicable to college chemistry classrooms, Chem. Educ. Res. Pract., 9, 187–195.
Marshall J. M., (2005), Formative assessment: mapping the road to success, Princeton Rev. (accessed May, 2010).
Marton F., (1981), Phenomenography – describing conceptions of the world around us, Instruct. Sci., 10, 177–200.
Marton F. and Booth S., (1997), Learning and awareness, Mahwah, New Jersey: Lawrence Erlbaum.
Mazur E., (1997), Peer instruction, Upper Saddle River, NJ: Prentice Hall.
McNamee G. D. and Chen J. Q., (2005), Dissolving the line between assessment and teaching, Educ. Lead., 63, 72–76.
Mishra P. and Koehler M. J., (2006), Technological pedagogical content knowledge: a framework for teacher knowledge, Teach. Coll. Rec., 108, 1017–1054.
Nyquist J. B., (2003), The benefits of reconstructing feedback as a larger system of formative assessment: a meta-analysis, Unpublished master’s thesis, Nashville, TN: Vanderbilt University, as cited in Wiliam D., (2010), An integrative summary of the research literature and implications for a new theory of formative assessment, in Handbook of formative assessment, Andrade H. L. and Cizek G. J. (eds.), New York: Routledge, pp. 18–40.
OECD: Organisation for economic co-operation and development, (2005), Formative assessment: improving learning in secondary schools, Paris.
Orgill M., (2007), Phenomenography, in Theoretical frameworks for research in chemistry/science education, Bodner G. and Orgill M. (eds.), Upper Saddle River, NJ: Pearson Prentice Hall, pp. 132–151.
Pellegrino J., Chudowsky N. and Glaser R., (2001), Knowing what students know: the science and design of educational assessment, Washington, DC: National Academy Press.
Rowe M. B., (1972), Wait-time and rewards as instructional variables, their influence in language, logic, and fate control, National Association for Research in Science Teaching, Chicago, IL, ED 061 103.
Shepard L. A., (2000), The role of assessment in a learning culture, Educ. Res., 29, 4–14.
Shute V. J., (2008), Focus on formative feedback, Rev. Educ. Res., 78, 153–189.
Steadman M., (1998), Using classroom assessment to change both teaching and learning, New Direct. Teach. Learn., 75, 23–35.
Stiggins R. J., (1992), High quality classroom assessment: what does it really mean?, Educ. Meas.: Iss. Pract., 11, 35–39.
Stiggins R. J., (2002), Assessment crisis: the absence of assessment for learning, Phi Delta Kappa Intl, 83, 758–765.
Trigwell K. and Prosser M., (2003), Qualitative differences in university teaching, in Access and Exclusion, M. Tight (ed.), Oxford: JAI Elsevier.
Wiliam D., Lee C., Harrison C. and Black P., (2004), Teachers developing assessment for learning: impact on student achievement, Assess. Educ.: Principles, Policy Pract., 11, 49–65.
Wilson M. and Scalise K., (2006), Assessment to improve learning in higher education: the BEAR assessment system, High. Educ., 52, 635–663.
Wilson M. and Sloane K., (2000), From principles to practice: an embedded assessment system, Appl. Meas. Educ., 13, 181–208.
Ziebarth S. W., Fynewever H., Akom G., Cummings K. E., Bentz A., Engelman J. A., Ludwig M., Noakes L. A. and Rusiecki E., (2009), Current developments in assessment for learning in universities and high schools in Michigan: problems and perspectives in mathematics and science education, J. MultiDisc. Eval., 6, 1–22.