JIM RIDGWAY, MALCOLM SWAN AND HUGH BURKHARDT

ASSESSING MATHEMATICAL THINKING VIA FLAG

1. ABSTRACT

Teachers of undergraduate mathematics face a range of problems, including an increasing number of less well qualified students and increasing academic diversity in the student population. Students face courses which are radically different from the mathematics courses they encountered at school; they often face assessment systems which are ill-aligned with course goals, and which use assessment methods likely to encourage a surface rather than a deep approach to learning. This paper describes materials developed by the MARS group for the US National Institute for Science Education for use on its FLAG Web site. Collections of assessment materials can be downloaded free of charge which assess a range of valuable process skills in mathematics – proof, reasoning from evidence, estimation, creating measures, and fault finding and remediation. They are designed to offer a wide repertoire of assessment styles, as part of a process of encouraging a broadening of teaching and learning styles.

2. CHALLENGES FACING TEACHING

A number of challenges face teachers of undergraduate mathematics. Several authors have documented a downward drift in entry requirements for mathematics-based subjects such as engineering and physics (e.g. Hunt and Lawson, 1996; London Mathematical Society, 1995; Sutherland and Dewhurst, 1999; Sutherland and Pozzi, 1995). Serious conceptual problems have been documented in students who seem appropriately qualified (e.g. Armstrong and Croft, 1999). A further difficulty for teachers in higher education is the increased demand for mathematics as a component of other courses; more and more disciplines, notably the social sciences, use mathematical tools as part of their repertoire. Both of these factors make teaching far more difficult, because of a rise in the heterogeneity of the population to be taught, and the corresponding challenges to teaching posed by a general lack of confidence in mathematics.

There are important differences between the nature of school mathematics and that of undergraduate mathematics. Mathematics as a discipline in its own right requires an intellectual approach which is quite distinct from school mathematics; this gives rise to problems for students. A particular characteristic of university mathematics is its emphasis on proof and rigour – notions which may not figure prominently in school studies (e.g. Tall, 1992).

Brown, Bull and Pendlebury (1997) surveyed courses and assessment systems in higher education, and demonstrated a mismatch between stated course aims and the systems of assessment in place. Assessment systems too often reflected traditional modes of assessment used within the organization or the discipline, and a response to the challenges of assessing large numbers of students. In particular, dependence on multiple choice and short answer questions encourages students to adopt a 'surface' approach to study, in contrast to a 'deep' approach where understanding is paramount; see Biggs (1999) and Marton and Säljö (1976).
Each of these aspects – reduced levels of student competence in mathematics on entry to university, students with increasingly diverse academic backgrounds, an intellectual transformation between school and university, and a poor match between curriculum intentions and assessment schemes – shows the need for serious attention to be paid to a growing problem in undergraduate mathematics. This paper describes some recent attempts to improve the quality of assessment systems, and thereby to improve the quality of undergraduate mathematics. Some grounds for optimism are contained in evidence that educational attainment can be raised by better assessment systems (Black and Wiliam, 1998; Torrance and Pryor, 1998). Such assessment systems are characterized by: a shared understanding of assessment criteria; high expectations of performance; rich feedback; and effective use of self-assessment, peer assessment and classroom questioning.

The work reported here sets out to provide tools for academic staff to develop appropriate assessment schemes, and arises from a collaboration between the Mathematics Assessment Resource Service (MARS) and the College Level 1 group at the National Institute for Science Education (NISE), University of Wisconsin-Madison. MARS is an international collaboration, based at Michigan State University, with groups at Berkeley, Nottingham, Durham and other centres. MARS supports programmes of improvement in mathematical education by offering assessment systems designed to exemplify and reward the full spectrum of mathematical performance that such programmes seek to develop.

NISE has developed FLAG – a Field-Tested Learning Assessment Guide, delivered via the Web. The overall ambition for FLAG is to enhance the first science, mathematics, engineering and technology learning experiences for all college students. Its purpose is to offer a readily accessible, free, up-to-date resource of classroom-tested assessment tools for instructors who have an interest in sharing and implementing new approaches to evaluating student learning. Each of the techniques and tools in the guide has been developed, tested and refined in classrooms and teaching laboratories at colleges and universities. MARS has created the mathematics component of FLAG.

The FLAG Web site has a number of components, which include: a basic primer on assessment; a tool based on Bloom's taxonomy (e.g. Bloom, Hastings and Madaus, 1971) to help instructors match their course goals with appropriate Classroom Assessment Techniques (CATs); the CATs themselves, which are self-contained modules that introduce techniques for assessing conceptual, attitudinal and performance-based course goals, all of which can be downloaded in a form ready for use; links to related Web sites; and an annotated bibliography.

Traditional testing methods have often provided limited measures of student learning and, equally importantly, have proved to be of limited value for guiding student learning. These methods are often inconsistent with the increasing emphasis being placed on the ability of students to think analytically, to understand and communicate, and to connect different aspects of knowledge in mathematics. Because assessment exemplifies what is to be learned, any change in educational ambition requires an associated change in assessment.
3. THE CLUSTERS OF TASKS

The assessment tasks developed for FLAG are primarily concerned with sampling critical mathematical thinking, rather than with testing performance of technical skills or the acquisition of mathematical knowledge. They focus on the flexible and accurate use of relatively simple mathematics – an important capability which many undergraduates find difficult. They are best worked on by small groups in a collaborative, discursive atmosphere. Five types of activity have been developed: 'Finding and fixing faults', 'Creating measures', 'Making plausible estimates', 'Convincing and proving' and 'Reasoning from evidence'. These are described and illustrated more fully below. They constitute valid, coherent activities which are commonly performed by mathematicians, scientists, politicians and critical observers of the media; they are less commonly found in mathematics classrooms.

3.1 Finding and fixing faults

Identifying and fixing faults is an important survival skill for everyday life, and an important metacognitive skill applied by every mathematician to their own work. Students are shown a number of mistakes which they are asked to diagnose and rectify. These tasks require students to analyze mathematical statements, deduce from the context the part which is most likely to contain the error (there may be more than one possibility), explain the cause of the error, and put it right. Such tasks can be quite demanding for students; it is often more difficult to explain the cause of another's seductive error than to avoid making it oneself. An elementary (genuine) example is shown below.

Shirts
The other day I was in a department store, when I saw a shirt that I liked in a sale. It was marked "20% off". I decided to buy two. The person behind the till did some mental calculations. He said, "20% off each shirt – that will be 40% off the total price."
(A genuine incident in a local store.)
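As an illustrative aside (not part of the published task), a one-line calculation shows why the till operator's reasoning fails. If the shirts cost $p_1$ and $p_2$, the customer pays
\[ 0.8\,p_1 + 0.8\,p_2 = 0.8\,(p_1 + p_2), \]
so the total discount is still 20%, not 40%. The error lies in adding percentages that apply to different quantities as though they applied to a common base.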
3.2 Creating measures

We constantly construct measures for physical and social phenomena and use them to make decisions about our everyday lives. These can vary from measures of simple quantities such as 'speed' or 'steepness' to complex and subjective social ones such as 'quality of life'. All measures are constructions, and are thus open to criticism and improvement. When is a measure appropriate for its purpose? While it is possible to define a range of alternative measures for any concept, they will differ in utility; some will clearly prove more useful than others. A 'Creating measures' task consists of a series of questions which prompt students to evaluate an existing plausible but partial or inadequate measure of an intuitive concept, and then to invent and evaluate their own improved measure of the concept. Examples we have used include defining measures for: the 'steepness' of a staircase; the 'compactness' of an island; the 'crowdedness' of a gathering of people; and the 'sharpness' of a bend in a road.

The first part of each task invites the student to rank-order examples, using their own intuitive understanding of the concept. For example, in one task, we offer students two 'islands', A and B (see Figure 1), and ask them to say which they think is the more 'compact', and why.

Figure 1. Two island outlines, A and B: which island is most compact, and why?

The second part of the task provides students with an inadequate measure of the concept, and asks them to order a set of 'islands' using this measure and to explain why it might be inadequate. For the 'compactness' example, we ask students to use the measure "Compactness = Area ÷ Perimeter" on island shapes A to F (see Figure 2).

Figure 2. Six island outlines, A to F: compactness of island shapes.

At first sight this may seem a reasonable measure, but islands with similar shapes and different sizes show that it is not independent of scale, and so a dimensionless measure needs to be sought. The third part of the task invites students to devise and use their own measure on the given examples, and the final part asks them to scale their measure so that it ranges from 0 to 1.
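The scale dependence of Area ÷ Perimeter is easy to check numerically. The sketch below compares it with one standard dimensionless candidate, the isoperimetric quotient $4\pi A / P^2$; this particular choice is our illustration, since the task deliberately leaves the improved measure to the student.

```python
import math

def naive_compactness(area, perimeter):
    # The flawed measure from the task: it has units of length,
    # so it grows when a shape is simply scaled up.
    return area / perimeter

def isoperimetric_quotient(area, perimeter):
    # A dimensionless alternative: 4*pi*A / P^2 equals 1 for a circle
    # and is less than 1 for every other shape, so it already ranges
    # between 0 and 1, as the final part of the task requires.
    return 4 * math.pi * area / perimeter ** 2

# Circles of radius 1 and 10: identical shape, different size.
for r in (1, 10):
    a, p = math.pi * r ** 2, 2 * math.pi * r
    print(f"circle r={r:2}: naive={naive_compactness(a, p):6.3f}, "
          f"IQ={isoperimetric_quotient(a, p):5.3f}")

# A unit square for comparison.
a, p = 1.0, 4.0
print(f"unit square: naive={naive_compactness(a, p):6.3f}, "
      f"IQ={isoperimetric_quotient(a, p):5.3f}")
```

The naive measure rates the larger circle ten times as 'compact' as the smaller one, while the isoperimetric quotient assigns both circles the value 1 and the square about 0.785.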
3.3 Making plausible estimates ('Fermi problems')

Enrico Fermi posed problems which at first sight seem impossible but which, on reflection, can be solved by making assumptions based on common knowledge and by following simple-but-long chains of reasoning. Fermi used such problems to make the point that problem solving is often limited not by incomplete information, but by an inability to use information that is already available within the immediate community. These tasks assess how well students can: decide what assumptions they need to make; make reasonable estimates of everyday quantities; develop chains of reasoning that enable them to estimate the desired quantity; and ascertain the reasonableness of the assumptions upon which the estimate is based. For example:

The population of the USA is approximately 270 million. How many dentists are there in the USA?
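One plausible chain of reasoning, offered here as an illustration of the kind of response sought (the round figures are assumptions, not data), runs as follows. Suppose each person visits the dentist about twice a year, and each visit occupies about half an hour of a dentist's time. That is roughly
\[ 270{,}000{,}000 \times 2 \times 0.5 = 270 \text{ million dentist-hours per year.} \]
A full-time dentist might work about 1,500 hours per year, so the country needs of the order of $270 \times 10^6 / 1{,}500 \approx 180{,}000$ dentists. Only the order of magnitude matters: any defensible set of assumptions should land within a factor of two or three of this figure.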
3.4 Convincing and proving

These tasks are of two types. The first type presents a series of statements that students are asked to evaluate. These typically concern mathematical results or hypotheses, such as 'the square of a number is greater than the number'. Students are invited to classify each one as 'always true', 'sometimes true' or 'never true', and to offer reasons for their decision. The best responses contain convincing explanations and proofs; weaker responses typically contain just a few examples and counterexamples. These tasks vary in difficulty, according to the statements being considered and the difficulty of providing a convincing or rigorous explanation. They also provide useful diagnostic devices, as the statements are sometimes typical student 'misconceptions' which arise from over-generalizing from limited domains. Sample statements are:

When you add two numbers, you get the same result as when you multiply them.
$(a + b)^2 = a^2 + b^2$.
The centre of a circle that circumscribes a triangle is inside the triangle.
Quadrilaterals tessellate.
A shape with a finite area has a finite perimeter.

The second collection involves students in evaluating 'proofs' that have been constructed by others. Some of these are correct and some are flawed. (For example, in one question, three 'proofs' of the Pythagorean theorem are given.) The flawed 'proofs' may be inductive rather than deductive arguments that only work with special cases, arguments that assume the result to be proved, or arguments which contain invalid assumptions. There are also some partially correct proofs that contain large unjustified 'jumps' in reasoning. In these tasks, students adopt an 'assessor' role, and attempt to identify the most convincing proof and provide critiques of the remaining attempts. For example, one task provides three attempts to prove that the arithmetic mean of any two positive numbers is greater than or equal to their geometric mean, $\frac{a+b}{2} \ge \sqrt{ab}$ (see Figure 3).

Attempt 1: Assuming that $\frac{a+b}{2} \ge \sqrt{ab}$, then $a + b \ge 2\sqrt{ab}$, so $(a+b)^2 \ge 4ab$, so $a^2 + 2ab + b^2 \ge 4ab$, so $a^2 - 2ab + b^2 \ge 0$, i.e. $(a-b)^2 \ge 0$, which is true for positive numbers, so the assumption was true.

Attempt 2: For all positive numbers, $(\sqrt{a} - \sqrt{b})^2 \ge 0$, so $a - 2\sqrt{ab} + b \ge 0$, so $a + b \ge 2\sqrt{ab}$, i.e. $\frac{a+b}{2} \ge \sqrt{ab}$. So the result is true.

Attempt 3: The area of the large square is $(a+b)^2$. The unshaded area is $4ab$. Clearly $(a+b)^2 \ge 4ab$, so $a + b \ge 2\sqrt{ab}$, or $\sqrt{ab} \le \frac{a+b}{2}$. So the result is true. [The accompanying diagram shows a square of side $a + b$.]

Figure 3. Three attempts to prove the relation between the arithmetic mean and the geometric mean.

3.5 Reasoning from evidence

These tasks ask students to organize and represent a collection of unsorted data, and to draw sensible conclusions. For example, students are given a collection of data concerning male and female opinions of two deodorants. The experiment has been designed to test two variables: the deodorant name and packaging ('Bouquet' and 'Hunter') and the fragrance (A and B). Both forms of packaging are tested with both fragrances; thus 'Bouquet A' and 'Hunter A' contain exactly the same fragrance, A. The results are given on a five-point scale (from 'Love it' to 'Hate it'). The data are presented to students as a collection of 40 unsorted responses which they have to analyze. Students are asked to present their findings in the form of a short report saying which fragrance and which name are likely to be the most successful if the deodorant is to be sold to females and to males.

4. CONCLUDING REMARKS

Undergraduate teaching is problematic; shifts in student characteristics require shifts in teaching styles. These CATs are designed to offer a variety of assessment styles, as part of a process of encouraging a broadening of teaching and learning styles. Each cluster of tasks requires students to demonstrate some core mathematical skills: proving; reasoning from evidence; estimating; creating measures; and fault finding and remediation. Mathematical challenges are presented at a variety of difficulty levels, and so these CATs can be used with groups of students with greatly differing experiences of mathematics.

REFERENCES

Armstrong, P. and Croft, A. (1999). Identifying the learning needs in mathematics of entrants to undergraduate engineering programmes. European Journal of Engineering Education, 14(1), 59-71.
Biggs, J. (1999). Teaching for Quality Learning at University. Buckingham: SRHE and Open University Press.
Black, P. and Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.
Bloom, B. S., Hastings, J. T. and Madaus, G. F. (1971). Handbook on Formative and Summative Evaluation of Student Learning. New York: McGraw-Hill.
Brown, G., Bull, J. and Pendlebury, M. (1997). Assessing Student Learning in Higher Education. London: Routledge.
Field-Tested Learning Assessment Guide (FLAG). URL: http://www.wcer.wisc.edu/nise/cl1/.
Hunt, D. and Lawson, D. (1996). Trends in mathematical competency of A level students on entry to university. Teaching Mathematics and its Applications, 15(4), 167-173.
London Mathematical Society (1995). Tackling the Mathematics Problem. London: London Mathematical Society.
Marton, F. and Säljö, R. (1976). On qualitative differences in learning: I. Outcome and process. British Journal of Educational Psychology, 46, 4-11.
Mathematics Assessment Resource Service (MARS). URL: http://www.educ.msu.edu/mars/.
Sutherland, R. and Dewhurst, H. (1999). Mathematics Education Framework for Progression from 16-19 to HE. Bristol: University of Bristol, Graduate School of Education.
Sutherland, R. and Pozzi, S. (1995). The Changing Mathematical Background of Undergraduate Engineers. London: The Engineering Council.
Tall, D. (1992). The transition to advanced mathematical thinking: functions, limits, infinity and proof. In D. A. Grouws (Ed.), Handbook of Research on Mathematics Teaching and Learning, pp. 495-511. New York: Macmillan.
Torrance, H. and Pryor, J. (1998). Investigating Formative Assessment. Buckingham: Open University Press.

MARS is funded by NSF grant no. ESI 9726403; NISE is funded by NSF grant no. RED 9452971.

Jim Ridgway
University of Durham, England
jim.ridgway@durham.ac.uk

Malcolm Swan
University of Nottingham, England
malcolm.swan@nottingham.ac.uk

Hugh Burkhardt
University of Nottingham, England
hugh.burkhardt@nottingham.ac.uk