Report from the 2013 IAEA Conference
Tel Aviv, October 2013

IAEA – International Association for Educational Assessment

The IAEA is committed to improving the quality of education on a global basis through sharing:
• Professional expertise
• Research
• Training
They also produce a research journal.

The theme of the 2013 conference was Educational Assessment 2.0: Technology in Educational Assessment.

2013 Conference Themes
• The use of technology in large scale assessment
• Assessment design
• How technology can be used to enhance the validity of assessment
• Trends in the use of technology in assessment
• Test development and measurement
• Automated Essay Scoring
• Digital gaming

This presentation focuses on two keynote speakers (Richard Luecht and Randy Bennett), covering four of those key themes:
• Technology in large scale assessment – computer based examinations (CBE)
• Assessment design
• Automated Essay Scoring
• Digital gaming

Assessment Design – Richard Luecht
Richard Luecht is a Professor of Educational Research Methodology at the University of North Carolina at Greensboro. He focused on evidence-based assessment design for formative assessments, technology-enhanced test design, computer-based test design, and cognitive science in assessment. He used the Common Core Standards in the US to illustrate his points.

Some key points:
• The student should be at the centre of assessment – not technology
• Instructionally sensitive assessment – assessment that responds to the student's learning
• Assessment should be on demand, with immediate feedback
• Assessment should be engaging and intrinsically motivating

"Just because we can use technology does not mean we have to, or that it makes the assessment better" – Richard Luecht

Computer Based Examinations
• Cito in the Netherlands have been using CBE for over 10 years
• In 2014 it is expected that 30% of their examinations will be e-assessments
• CBE was developed to encourage more schools to use technology
• Cito have introduced a rigorous 6 year programme that every subject is required to go through in order to become an e-assessment

The Cito 6 year programme uses a testing and piloting approach (a schematic sketch of the staged decision process follows at the end of this section):
• A proof of concept is developed
• The examination is tested in some schools
• Dual assessment is offered
• Finally they move to computer based assessments only – no paper exams
At each stage there is a go/no go decision. Each stage also focuses on whether the examination provides sufficient added value for the student.

Cito have four conditions for an examination to be digitalised:
1. Digitalising the examination adds sufficient value for the student
2. There is an adequate level of support
3. There is sufficient funding and technical feasibility
4. There is sufficient security built in

Cito have found that, over time, digital examinations enable alignment to the student's real world through contextualisation of the examination and the use of a medium students prefer. Cito are also looking at testing skills such as listening and viewing skills in languages.
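To make the staged rollout concrete, here is a minimal sketch of the go/no-go stage gate as described above. The stage names and the four conditions come from the report; the function and data structures are illustrative assumptions, not Cito's actual tooling.

```python
# A schematic sketch (an illustration, not Cito's actual tooling) of the
# staged go/no-go rollout described above.

STAGES = [
    "proof of concept",
    "pilot in selected schools",
    "dual assessment (paper and computer)",
    "computer based assessment only",
]

CONDITIONS = [
    "adds sufficient value for the student",
    "adequate level of support",
    "sufficient funding and technical feasibility",
    "sufficient security built in",
]

def go_no_go(stage, condition_results):
    """A stage may proceed only if every one of the four conditions holds."""
    unmet = [c for c in CONDITIONS if not condition_results.get(c, False)]
    if unmet:
        print(f"NO GO at '{stage}' - unmet conditions: {unmet}")
        return False
    print(f"GO - '{stage}' cleared")
    return True

# Walk an examination through the programme, stopping at the first NO GO.
results = {c: True for c in CONDITIONS}  # illustrative assessment outcomes
for stage in STAGES:
    if not go_no_go(stage, results):
        break
```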
Security
A note about security:
• A fallback position is needed in case of network failure
• Items can be encrypted
• The examination can block access to other items
• User authorisation is needed
• A timelock can be introduced on content
• The system must be scalable

CBE – Denmark
Denmark has online assessment and allows candidates to access the internet (but not for all examinations). Questions require the candidate to demonstrate:
• in-depth analysis skills,
• presenting a supported perspective,
• critical thinking skills, and
• use of resources.
For example, the Economics examination enables downloading of examination resources 6 days prior to the examination.

CBE – Israel
Israel's examinations are matriculation (end of school) examinations. Computer Based Examinations were introduced in 1999.
• In 2013 online matriculation examinations were offered in English, History, Bible, Chemistry, Biology, Geography and Bio-technology
• Israel moved to web based examinations in 2011
• The Bagrut examinations reflect contemporary teaching and learning methods
• The examinations make extensive use of simulation based assessment

Simulation Based Assessment (SBA)
• Context rich
• Allows the candidate to control the stimulus
• Enables analysis of dynamic phenomena
• Enables data development and data processing

Example – a Physics question:
• The candidate manipulates a skateboarder to start their ride from various points on the half pipe. The graph records the skateboarder's kinetic energy and potential energy. Subsequent questions require the candidate to analyse the results and compare results when other variables are introduced. (A worked sketch of the underlying energy model follows at the end of this section.)

Example – a Geography question:
• Candidates were asked to provide a rationale for specific natural disasters that occurred in Mexico.
• They can use the two animations on the left and open up a series of maps.
• Both animations can be altered as the student introduces different conditions, or over time.
• Q3 (not shown on the slide) asks the students to develop a means of preventing some of the disasters; the question links to a series of resources which in turn contain further links.
• The experience uses the internet through a secure extranet.
• There is no right or wrong way to answer the question, and the candidate has to use the resources to develop their thesis.

Data about candidate response time and keyboard strokes are also gathered, and used by examiners to improve the simulations.
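To illustrate what the skateboarder graph is plotting, here is a minimal sketch of the underlying energy model, assuming a frictionless half pipe. The mass and heights are illustrative assumptions, not values from the actual examination.

```python
# A minimal sketch of the energy model behind the skateboarder simulation.
# Assumes a frictionless half pipe; mass and heights are illustrative.

G = 9.81  # gravitational acceleration (m/s^2)

def energies(mass_kg, start_height_m, current_height_m):
    """Return (kinetic, potential) energy in joules at the current height.

    Total mechanical energy is fixed by the starting point; as the
    skateboarder drops, potential energy converts into kinetic energy.
    """
    total = mass_kg * G * start_height_m        # E_total = m*g*h_start
    potential = mass_kg * G * current_height_m  # PE = m*g*h
    kinetic = total - potential                 # KE = E_total - PE
    return kinetic, potential

# Starting 3 m up the half pipe, sample the curves the graph would record.
for h in (3.0, 2.0, 1.0, 0.0):
    ke, pe = energies(mass_kg=60.0, start_height_m=3.0, current_height_m=h)
    print(f"height {h:.1f} m: KE = {ke:6.1f} J, PE = {pe:6.1f} J")
```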
Technology can support the identification of cognitive processes, for example:
• Tracking the steps a student used to get an answer
• Measuring the response time for certain behaviours
• Observing the strategies a student uses when they respond to a task, which can also indicate their level of understanding
All of these are measurable through SBA, enabling the examiner to track and assess the thinking process rather than just the outcomes of those processes.

"Embedding a task in realistic scenarios may help students make the connections between targeted skills and conditions of use in real-world problems – this is assessment as learning" – Moshe Decalo (2013), Israel Centre for Educational Technology

Lessons Learned from introducing CBE
• Requires change at all levels
• Adapting existing systems originally designed for paper based systems can be problematic
• Running dual systems can be cumbersome and produces some duplication
• Some do not regard Computer Based Examinations as 'serious' examinations
• Cito found the support needs for students, markers etc. decrease rapidly over time
• Israel and the Netherlands developed their own software, and found it challenging to continuously adapt it to meet changing technologies
• Israel used a geography examination to test whether paper based (PBE) or computer based (CBE) examinations provided students with a distinct advantage, and found no statistically significant difference

Automated Essay Scoring (AES)
Automated Essay Scoring is about the computer marking an essay or extended prose. The computer 'learns' what to look for in a particular essay.
• There are a number of AES programmes available
• Earlier versions counted words
• Programmes now use sophisticated sets of algorithms to determine trends and usage of pre-determined features or traits (a small illustrative sketch follows at the end of this section), such as:
o Grammatical conventions – measures error rate, usage, spelling and capitalisation errors
o Usage – compares the vocabulary usage with that of high or low quality essays written on the same topic
o Fluency and organisation – measures essay organisation, discourse elements and the relationships between these discourse elements, style and sentence variety
o Content – measures vocabulary level and essay length

• Research indicates a high correlation between a human marker and the computer; HOWEVER, most of the research has been undertaken by the AES vendors.
• A computer score is, at best, a prediction. Questions are raised about the "hidden judgement": whether the marker 'likes' the essay or not, and whether objectivity is the best way to assess an essay.
• To date, there have been no significant studies across different population groups.

There are three different models for introducing AES:
1. A human marks, then the computer undertakes quality assurance
2. Both a human and the computer mark, and then the scores are compared
3. A computer marks, then a human undertakes quality assurance

• In America, Utah and Louisiana use AES, and Florida is looking to introduce it this year for a new state-wide writing test.
o Florida will use model 2 (above), with a second human marker used only where there is a significant difference between the marks given by the first marker and the computer.
• Most jurisdictions that use AES are not relying solely on the computer to mark student material.
• The longer term vision is that AES will replace the need for any writing test, as the computer will constantly assess a student's writing and can be modified to provide formative feedback.
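As a concrete illustration of the trait-based approach and the model 2 adjudication described above, here is a minimal sketch. The features, weights and threshold are illustrative assumptions; commercial AES programmes use far more sophisticated algorithms.

```python
# A minimal sketch of trait-based essay scoring and 'model 2' adjudication.
# Features, weights and threshold are illustrative assumptions.

import re

def features(essay):
    """Crude stand-ins for the traits listed above."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s.strip() for s in re.split(r"[.!?]+", essay) if s.strip()]
    vocab = {w.lower() for w in words}
    return [
        len(words),                           # content: essay length
        len(vocab) / max(len(words), 1),      # content: vocabulary level
        len(words) / max(len(sentences), 1),  # fluency: sentence length/variety
        sum(1 for s in sentences              # conventions: capitalisation errors
            if not s[0].isupper()),
    ]

def machine_score(essay, weights, bias):
    """Weighted sum of trait features. The 'learning' step is fitting the
    weights to human-marked training essays (omitted here)."""
    return bias + sum(w * f for w, f in zip(weights, features(essay)))

def needs_second_marker(human, machine, threshold=1.0):
    """Model 2: escalate to a second human marker only when the first human
    marker and the computer differ significantly."""
    return abs(human - machine) > threshold

essay = "The flood defences failed. Rebuilding them is the obvious first step."
score = machine_score(essay, weights=[0.05, 2.0, 0.1, -0.5], bias=1.0)
print(score, needs_second_marker(human=4.0, machine=score))
```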
Digital Gaming
• Digital games provide students with a natural learning experience within an informal context.
• Research indicates that students are able to learn important cognitive and social skills through gaming.
• Games engage students for long periods of time.

What the research has found:
• Failing up – games provide a safe environment for students to fail. Students tend to continue trying until they succeed. Failure is used as a learning process.
• Games provide the notion of an epic win – they create a sense of urgency. This is stressful but provides a deep focus on a challenging problem.
• Students are willing to work hard if given the right challenge.
• Games provide students with an opportunity to think "outside the box" – they creatively solve problems.
• Games empower students – they inherently trust the gaming environment.
• Gamers make sense of their experience together – collaborative problem solving.

Digital Gaming – Examples
Sim City – A player builds a city, takes on the role of mayor and has to balance economics with the happiness of citizens. They need to ensure the city has enough power, water, roads and services (e.g. police, health), and attract businesses and tourism. They manage taxes, trade and outgoings, as well as the cost of new developments.

Civilisation – A player starts with a basic village and builds a civilisation of many cities. They manage resources, fight off barbarians, maintain happiness and organise trade, diplomacy and alliances with other civilisations, keeping their own civilisation safe as well as conquering others.

Minecraft – Allows a player to build constructions out of textured cubes in a 3D procedurally generated world. Other activities include exploration, gathering resources, crafting and combat. Players must find their own building supplies and food, and find resources to craft tools, while avoiding creatures such as zombies or giant spiders. Minecraft is now part of the Swedish curriculum and is being used as part of teaching programmes in the UK and Israel.

Where to from here – Randy Bennett
Randy Bennett is the Norman O. Frederiksen Chair in Assessment Innovation in the Research and Development Division at the Educational Testing Service. Since the early 1980s his research has focused on integrating advances in cognitive science, technology and measurement to create new approaches to assessment.

Key points:
• Education must remain relevant.
• It is changing to include the development of new skills and to allow individuals to personalise their experience.
• Education is happening any time and anywhere.

Randy Bennett outlined 13 considerations:
1. Educational assessment must provide meaningful information to the following groups:
o Education policy makers, for effectiveness of the process and preparedness of the population – because they are accountable for the process.
o Teachers and students, for feedback and to plan further instruction.
o Parents, so they understand their children's progress.
2. Must satisfy multiple purposes:
o An assessment built for one purpose won't necessarily be suited to other purposes.
o Multiple purposes might best be served by different related assessments designed to work in synergistic ways.
3. Need to use modern conceptions of competency as a design basis:
o 21st century skills.
o Using technology for domain-based problem solving.
4. Align test and task designs, scoring and interpretation with modern conceptions:
o Simulations – better replicate the real world contexts under which the integrated competencies need to be demonstrated.
o Discrete tasks.
o Automated Essay Scoring in conjunction with human markers.
5. Adopt modern methods for designing and interpreting complex assessments:
o Creating opportunities to observe performance.
o Connecting the observations to meaningful characterisations.
6. Tests of the future will take better account of context:
o Current tests ignore the social learning and teaching environment in an attempt to produce inferences generalisable across contexts.
o How a student performs, and the score achieved, is a fact. Why the student performed that way is an interpretation requiring knowledge of context.
o If the assessment is embedded into learning, then taking account of the context becomes unavoidable.
7. Design for fairness and accessibility:
o Equal opportunity for individuals is a social value.
o That social value has been reflected in standardised tests to varying degrees.
o As long as assessments are used, fairness will be an issue.
8. Design for positive impact:
o Assessment should be designed to be a valuable learning experience – assessment as learning – both preparing for it and experiencing it.
o Assessment should model good teaching and learning.
9. Educational assessment must be designed for engagement:
o Assessment results are more likely to be meaningful if students give maximum effort.
o Engagement can be enhanced by posing problems, providing motivational feedback, using hardware that students prefer, and using multimedia and gaming elements.
o Embedding assessment into 'games'.
10. Respect privacy:
o Assessment information can be gathered ubiquitously, continuously, and by stealth.
o Individuals must know when they are being assessed and for what purpose.
o Care is needed – it could negatively affect teaching and learning if every action has a consequence.
11. Incorporate information from multiple sources:
o A single test cannot measure a competency domain with sufficient breadth or depth.
o All assessments have limits.
o Results from multiple assessments, using multiple methods and integrated together, are more likely to provide a meaningful characterisation of individuals and institutions.
12. Gather and share validity evidence:
o Legitimacy is granted to a consequential assessment programme by the user community and the scientific community connected to it.
13. Use technology to achieve substantive goals:
o Technology is used to enhance assessment, and through that, teaching and learning.
o To do what can't be done as well (or at all) using traditional testing methods, such as:
  o measuring existing competencies more effectively and efficiently
  o measuring new competencies, including ones that require the use of technology
  o having a positive impact

In conclusion
The conference provided much to consider for the introduction of computer based assessments:
• The importance of not taking a big bang approach – not all levels or all subjects at one time
• The need to ensure there is a good infrastructure for examination development, delivery and marking
• e-Assessment needs to reflect what is happening in the classroom