AMEP Assessment Task Bank
Professional Development Kit
General Issues and Principles of Assessment
Notes to accompany the PowerPoint for the AMEP Assessment Task Bank Professional
Development Kit. Developed by Marian Hargreaves for NEAS, 2013.
Slide 1: Front page
Slide 2: Workshop aims
To review:
• the key issues in assessment
• the principles of assessment, and
• how they apply in the context of the AMEP
Part 1
Slide 3: Key issues
• The stakeholders: who has an interest in this activity of assessment?
• The different purposes of assessment: why do we assess anyway?
• The stages of assessment: when do we assess?
• How do we assess?
• The focus of assessment: what do we assess?
Slide 4: The stakeholders (and the stakes): Who?
For language assessment, stakeholders usually include the participants at every level involved in the process:
• The students – of course. They want to know how they are doing and to be able to prove their ability. For students, the stakes are very high and may determine the course and direction of their lives.
• The teachers – they also want to know how the students are doing, how they can best improve their teaching practice, and how to assist the students in their achievement. For teachers the stakes are not so high; the assessment practice may have more impact on their professional development and teaching style.
• The management – which needs evidence that the program is working and that there is value for money. The stakes here may be quite high, as funds (both from sponsors and from fees) may depend upon good results.
• The administrators – who deal with the daily keeping of records and process the evidence provided by assessment. The stakes are not high for administrators; it is just a part of the job.
• DIAC – the sponsor paying for the program, in this case the Adult Migrant English Program (AMEP), and for the development of the Assessment Task Bank (ATB). The personal nature of the stakes does not apply here, but decisions made regarding the future of the program may be affected by the results of assessment.
• The rest of the world – which takes the results of assessment and makes key decisions about employment, funding, development and so on. The stakes are even less personal here, but hindsight can trace the effect of assessment.
Slide 5: Different purposes of assessment (Why?)
• Prior to learning – diagnostic or placement: to see what class a student should be put into; to find out a student’s strengths and weaknesses.
• As learning – formative: to see how students are progressing.
• Of learning – summative: to see what students have actually learned and achieved.
The aim of assessment for the Certificates in Spoken and Written English (CSWE) is summative, but assessment of learners within the Adult Migrant English Program also includes diagnostic and formative assessment. It is an area for discussion and development in support of classroom practice.
• For learning – as a continuous and integral part of the inclusive, student-centred learning process.
Slide 6: Stages of assessment (When?)
The Teaching Learning cycle
Talk to the diagram as much as you feel necessary/useful.
Give as handout.
Slide 7: The focus of assessment (What?)
Always a key question: what exactly is it that we are trying to assess? This is a question that should be given serious consideration and regular review – it can be too easy to make assumptions.
For example, a simple reading text may contain cultural norms which, while very familiar to teachers, may be startlingly new to migrants.
A simple listening question may require an answer that actually involves a lot of writing.
What have you been teaching, or what do you want to teach? A description for a Cert I Learning Outcome is very different if you are describing a place rather than a person.
What does your centre want?
This is intrinsic to the validity of assessment, so there is more on this later.
Slide 8: How do we assess?
Styles:
• Self assessment
• Formal tests/exams
• Informal, in-class and/or observation
• Group assessment
• Peer assessment
• Continuous
Different styles suit different types of assessment.
Slide 9: Examples of different Cert modules and suitable types of assessment
But, for the achievement of CSWE learning outcomes, modules, and the completed
Certificate, group and peer assessment are not appropriate.
What do we assess: see the curriculum, and the specifications of DIAC.
Part 2: Underpinning principles
Slide 10: Best practice in assessment
Best practice in assessment should:
• Identify clear learning outcomes: assessment should be explicit, students should know what is expected of them, and both student and teacher should play a pivotal role in focusing learning and teaching on intended learning outcomes.
• Promote active student engagement in learning.
• Recognise and value student diversity.
• Provide an opportunity for success for all students.
• Embody high-quality, timely feedback.
• Produce grades and reports of student learning achievements that are valid, reliable and accurate.
• Meet expectations and standards of national and international stakeholders, where appropriate.
• Require the involvement of leaders and managers to achieve quality enhancement and continuous improvement.
Slide 11: The Cornerstone Principles
Validity
Reliability
Practicality
Slide 12: Validity
Does the assessment task measure what you want it to measure?
Slide 13
There are several types of validity, but construct validity is often used as the overarching
tenet.
The construct of a test is the theory that the test is based on. For language tests therefore,
this is the theory of language ability. Construct validation is about investigating whether the
performance of a test is consistent with the predictions made from these theories.
So, for us using the CSWE framework, the relevant theory is that of social,
communicative language.
When we talk about validity and in particular about construct validity, we are usually referring
to the macro skills and whether the test is actually assessing those skills. For example, if we
are talking about the construct validity of a reading task in a proficiency test, we might start
by trying to define the reading skills and strategies that learners use to read a text. The
task will then be designed with those skills and strategies in mind and will attempt to target
them specifically in order to measure a learner’s ability in reading. If the task is designed for
a specific curriculum with itemised criteria, then the task must specifically target and measure
those criteria.
Hence the CSWE Learning Outcomes.
Cognitive-related validity is concerned with the extent to which the cognitive processes
employed by candidates are the same as those that will be needed in real-world contexts
beyond the test. These contexts are generally known as Target Language Use (TLU). This
often crops up in the pursuit of Authenticity, both in performance and in the texts used in
tests.
Context-related validity is concerned with the conditions under which the test is performed and includes aspects such as the tasks themselves, the rubrics and the topic, as well as the administration conditions. The question of Fairness in a test comes under context-related validity.
Face validity is where a “test looks as if it measures what it is supposed to measure” (Hughes, 2003: 33). This is very popular with stakeholders, who may not know a great deal about assessment or validity itself.
Slide 14: Reliability
Is the assessment consistent across tasks and raters and assessors?
Are the conditions of administration consistent across assessment occasions?
Reliable tests produce the same or similar result on repeated use. Test results should be
stable, consistent and free from errors of measurement.
There are two types of reliability in methodology: internal and external. I am mostly going to consider internal reliability, beginning with the reliability of the task.
This is where criteria become important. Criterion-referenced tests are ones in which candidates are assessed against specific target behaviours. Test scores are then an indication of what a candidate can and cannot do.
Reliability is also affected by the setting and maintaining of standards. Wherever multiple
versions of a test are produced it is important to be able to show that these different versions
are comparable and that the standard (in terms of proficiency level) has been maintained.
This means ensuring that all the tasks used to assess a particular LO are equivalent.
While this is hard to achieve, we need to come as close as possible to equivalence. For
example,
Slide 15: Example: Cert II C2 Participate in a spoken transaction for information/goods
& services.
In a low-level speaking task, if the prompt includes a list of information which structures the task, then the prompts for all tasks assessing that LO should include the same amount and type of information.
For example, see how, without the prompts, the second version of this speaking task would be much more difficult for a learner.
A key role for the ATB is to maintain consistency between tasks, and hence reliability.
The conditions under which the task is administered also need to be equivalent. For
example, these include the amount of time allowed for pre-reading, the use of adequate
equipment for playing listening tasks, the availability of dictionaries etc.
However, any test score will also reflect a proportion of other factors, known as error:
• the day of the test session (the weather, administration, etc. might be different),
• the individual candidate, who may vary through tiredness, loss of motivation, stress, etc.,
• the markers or the test version, which may perform differently,
• other factors beyond our control (like a traffic accident outside).
Reliability is about measuring real ability, so we need to reduce these other factors as much as possible.
Slide 16: Scoring
At this point I would like to clarify the distinction between rating and marking.
Marking applies to the receptive skills, reading and listening.
ATB assessment tasks have a marking guide or answer sheet. Theoretically this can be used by almost anybody and, if correctly designed with all possible acceptable answers, should ensure marking reliability. However, there is enormous diversity among AMEP teachers, both in experience and in expectations. ATB answer sheets therefore now have, wherever possible, explicit details related to the criteria: exactly what answers are acceptable, and which questions, or how many questions, learners have to get correct in order to achieve the Learning Outcome.
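The marking logic that such an answer sheet describes can be pictured in a small sketch. Everything below – the questions, the acceptable answers and the "3 of 4 correct" threshold – is invented for illustration only and is not taken from any ATB task:

    # Illustrative sketch only; the questions, acceptable answers and the
    # threshold below are invented, not real ATB material.

    # Each question lists every acceptable answer from the marking guide.
    acceptable = {
        1: {"monday", "on monday"},
        2: {"9.30", "9:30", "half past nine"},
        3: {"library", "the library"},
        4: {"bring id", "bring your id"},
    }
    required_correct = 3  # e.g. the answer sheet states "at least 3 of 4 correct"

    def mark(responses):
        """Count acceptable answers and decide whether the LO is achieved."""
        correct = sum(
            1 for question, answer in responses.items()
            if answer.strip().lower() in acceptable.get(question, set())
        )
        return correct, correct >= required_correct

    learner = {1: "Monday", 2: "9:30", 3: "at the town hall", 4: "bring ID"}
    score, achieved = mark(learner)
    print(score, "correct -", "LO achieved" if achieved else "LO not achieved")
    # -> 3 correct - LO achieved

The point of the sketch is simply that once the acceptable answers and the pass threshold are written down explicitly, any marker should reach the same result.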
Rating applies to the productive skills, writing and speaking, and here it is much more difficult to ensure consistency. Scoring a performance is much more subjective. The criteria in the assessment grids help a great deal, and recording is a must, but rating remains problematic.
The other type of reliability that we therefore need to take into account is that of the raters
themselves, ie rater reliability.
There are two types: inter-rater reliability and intra-rater reliability.
Inter-rater reliability is whether two raters rating the same performance are rating with the same degree of severity. It can be very difficult to know whether there is consistency across ratings.
Intra-rater reliability refers to whether the same rater rates all the learner performances with
the same degree of severity.
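One rough, informal way to picture inter-rater consistency is simply to compare two raters' decisions on the same set of performances. The sketch below does this with invented achieved/not-achieved decisions; it is an illustration only, not an AMEP or ATB procedure:

    # Illustration only: invented ratings, not real AMEP data.
    # 1 = Learning Outcome achieved, 0 = not achieved, for ten recorded performances.
    rater_a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
    rater_b = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]

    def percent_agreement(a, b):
        """Proportion of performances on which the two raters made the same decision."""
        matches = sum(1 for x, y in zip(a, b) if x == y)
        return matches / len(a)

    print(f"Agreement: {percent_agreement(rater_a, rater_b):.0%}")  # Agreement: 80%

A low figure suggests that the criteria need to be clarified or that raters need to compare and discuss their ratings.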
Slide 17: Practicality
How practical is it to develop, administer and mark the tasks?
This is a major issue and an integral part of deciding the usefulness of a test (Bachman & Palmer, 1996). Factors that need to be considered include the resources necessary to produce and administer the test (which includes marking, recording and giving feedback).
Slide 18: Impact and Fairness: important aspects in current theories of validity.
Impact is the consequence of taking a test and the effect of passing or failing, both on the candidates themselves and on the educational system and society more widely. The effects and consequences of a test include the intended (and hopefully positive) outcomes of assessment, as well as the unanticipated and sometimes negative side-effects which tests might have. For example, the washback effect from the introduction of a new test may affect (positively or negatively) the way in which teachers teach.
Many language tests, especially summative tests, are very high stakes tests: the
consequences of passing or failing will affect the candidate’s entire life.
An example of a high-stakes test is a driving test: passing a driving test changes your whole life. If you fail, you can of course take it again, but you will not be able to do things that you may have been counting on, e.g. drive your children to school, drive yourself to work (and thus avoid a long and tedious journey by public transport), or go on holiday.
In the AMEP context, time is very limited. Clients want to make the most of their 510 hours of
English tuition, pass their tests and get their certificates. If they fail an assessment, especially
towards the end of term, it may not be possible for them to do another for some time,
especially if it does not fit in with the rest of the class. If a TAFE course for example requires
a particular LO/assessment, then the client may not be able to enrol in the TAFE course for
another 6 months or even a year.
It is therefore very important that the test be a fair one:
• A fair test should not discriminate against sub-groups of candidates or give an advantage to other groups.
• A test should be kept confidential and secure so that candidates cannot see the questions in advance.
• Results should be clear and easy to understand.
• A test should be fair to those who rely on the results in addition to candidates, such as employers, who need to know that the test is consistent and accurately reflects the ability being tested.
• A fair test should have no nasty surprises for the candidate – the form and structure of the test should be familiar and the content appropriate to what has been taught.
Rubrics should always be both clear and appropriate.
Rubrics can be understood as the instructions for a task, the scoring standard for that task, or both. Here, we are largely talking about the instructions to teachers and to students.
Each assessment task should have clear and consistent instructions to teachers for the
administration of the task, and the scoring/marking of the task, including the answer key if
relevant.
The instructions for the learners should also be clear, appropriate and consistent. The
language of the instructions should be unambiguous and at a level of language below the
level being tested. They should not be longer than necessary, and the learner should have
the opportunity to clarify any possible confusion or misunderstanding before the assessment
begins.
For example, the instructions for a writing assessment should not constitute a reading
assessment in themselves. The questions in a listening assessment should be understood
before the assessment begins and the recording is played. If the assessment is to be done in
sections, the procedure should be clearly explained. If dictionaries are allowed, the learners
should know this, and also know what type of dictionary is allowed – can they use electronic
dictionaries, for example?
Slide 19: Task-based language performance assessment (TBLPA).
This approach to assessment looks at the way in which language is used to show competency. It is dependent upon consistency of performance, with more than one assessment event being used to determine ability.
There are two approaches in this theory. The first uses construct validity to show that the
learner has a language ability, and authenticity/generalisability to extrapolate the ability in the
area of target language use (TLU). The second approach uses content validity to show that
the learner can do specific real-life tasks, for example function in the role of an air traffic
controller. The significance of TBLPA is that both approaches are relevant for learners in the
AMEP; not only do they have to perform specific tasks, but they also have to demonstrate
that their language skills apply in a more general sense to Target Language Use.
Slide 20: Authenticity
Situational and interactional
It is generally accepted that assessment tasks and items (ie the questions) should represent
language activities from real life. This relates to the Target Language Use mentioned earlier
(situational authenticity).
Interactional authenticity is the naturalness of the interaction between the test taker and the task, and of the mental processes which accompany it. That is, is the task relevant for the test taker, or just a meaningless exercise?
For example, a task based on listening for specific information can be made more
situationally authentic if an everyday context, such as a radio weather forecast, is created.
It can be made more interactionally authentic if the test taker is given a reason for listening – e.g. they are planning a picnic that week and must select a suitable day.
It may be necessary to adapt material and activities to learners’ current level of language
proficiency. This will reduce the authenticity of the materials, but the situations that the
learners engage in and their interaction with the texts and with each other can still be
authentic.
We will consider authenticity a bit more when we look at tasks themselves.
Slide 21: Questions
Questions are an essential part of assessment, but the style of questioning varies according
to the macroskill being assessed. Question items, their advantages, disadvantages and
appropriate use are therefore addressed in a separate module.
In brief, question items for reading and listening include:
• Multiple choice questions (MCQs)
• Grid completion
• Summary cloze (gap fill)
• Short answer questions
• Sentence completion
• Matching
• Ordering
• Information transfer
Question items for speaking generally use the how, what, why, when, where, etc. question forms.
Slide 22: Feedback
Arguably the whole point of assessment is in the feedback.
All assessment, including summative assessment for achievement, should provide an
opportunity for feedback as an important part of the learning process. Feedback should be
given following every occasion of assessment.
Feedback can be given either in writing or verbally, but should be confidential at all times,
except where a whole class activity is undertaken, in which case anonymous feedback
should be given to the whole class. However, as it is very important to keep all documents in
a summative assessment task set secure for future assessment purposes, answer keys
should not be given out, nor learners allowed to keep their corrected response sheets.
Slide 23: And finally…
You need to be humble to write assessment tasks! Even the best assessment task writer will write bad items on occasion.
Here are some points which are useful to remember when writing items.
• do the task yourself. It isn’t enough just to look it over.
• ask your peers to do the task and give their feedback - nobody writes good items alone.
• don’t be defensive about your tasks – we all write bad tasks. Accept suggestions for
improvement.
• get feedback from respondents on what they think the task and items are testing.
• check respondent feedback with the LO specifications.
• check overall compliance with specifications.
• make sure that the test method is familiar to learners.
• pre-test the task on learners.
• check whether the language of the items is easier than the language of the text.
• check whether all possible and plausible answers have been included in the answer key.
• make sure that the item is contextualised.
• check how true to life the item is and whether it looks like a real world task.
Evaluation
Absolutely essential: do the task yourself.
When the task has been written, checked, piloted and modified, and is finally ready for the Assessment Task Bank, get the proofreader to do the task themselves. They usually turn up something!
Use a checklist to ensure all aspects of the task are covered. Headings in the checklist are:
• the conditions under which the task is administered
• the characteristics of the task
• the features of the task. For example, for a listening task, these would be:
- The text
- The items
- The text/item relationship
- The answer key.
Going through the evaluation checklist and the specifications for the learning outcome in
the curriculum should identify problems in the task that need to be addressed or indicate that
the task is unsalvageable and should be abandoned.
Ask other teachers to look at your new task.
Pilot the task with students
Piloting every task with learners is a vital step in the task development process. Piloting
enables you to check whether the task works and elicits performances that are
compatible with the performance criteria. It establishes whether the task requires further
modification or if it should be rejected.
Piloting should be done with at least five learners under uniform conditions. Where possible
obtain a range of learner profiles. Vary the language background of learners, their age, their
level of formal education and so on.
Give learners the opportunity to comment on the task. This feedback should be considered in addition to the learners’ performance on the task.
Modification
After the task has been done and evaluated, it should be modified. Problems which have
been identified during task evaluation should be rectified during this stage according to the
feedback received. The revised task is then ready to be done again, either by peers or by
learners.
Slide 24: References
Alderson, J. C. (2000). Assessing Reading. Cambridge: Cambridge University Press, p. 69.
Bachman, L. F. & Palmer, A. S. (1996). Language Testing in Practice. Oxford: Oxford University Press.
Brindley, G. P. (1995). Language Assessment in Action. Sydney: NCELTR.
Hughes, A. (1989). Testing for Language Teachers. Cambridge: Cambridge University Press, pp. 59-140.
Khalifa, H. & Weir, C. (2009). General marking: performance management. In Examining Reading: Research and Practice in Assessing Second Language Reading (Studies in Language Testing, vol. 29, pp. 276-280). Cambridge: Cambridge University Press.
Manual for Language Test Development and Examining (2011). Language Policy Division, Council of Europe.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13-103). New York: Macmillan.
Mislevy, R. J. et al. (2001). Design and Analysis in Task-Based Language Assessment. Language Assessment.
Shohamy, E. (1985). A Practical Handbook in Language Testing for the Second Language Teacher. Israel: Internal Press.
Weir, C. J. (1997). The testing of reading in a second language. In C. Clapham & D. Corson (Eds), Encyclopedia of Language and Education (pp. 39-50). Dordrecht: Kluwer Academic Publishers.