An Analysis of a Set of Three Tests Measuring Reading Comprehension
Through the Awareness and Understanding of Some English Vocabulary.
Gloria Bello
EDU 570 Assignment 1
Professor Ken Beatty
July 2nd, 2015
An Analysis of a Set of Three Tests Measuring Reading Comprehension
Through the Awareness and Understanding of Some English Vocabulary
1. Introduction
The purpose of this paper is to analyze the effectiveness of three subtests of a progress test, defined as a test given "as part of an ongoing assessment procedure during the course of instruction" (Bailey, 1998, p. 39), in order to assess the students' vocabulary awareness and comprehension as well as their reading comprehension. This analysis will allow me to reflect upon the students' skills, processes, and knowledge that I want to assess, and to identify the most pertinent design (C-tests, cloze passages, fill-in-the-blanks, multiple choice) and scoring method for measuring the language constructs. Knowing how to design appropriate tests, identifying the learning and contextual needs of the students, supports the improvement of teaching and learning practices. It also encourages classroom research on performance assessment and on the validity and reliability of the procedures usually applied to score students, to calculate item facility and item discriminability, and to differentiate the impact that direct and indirect tests of language skills can make. Finally, the analysis of the test construction rests on the principles that guide a communicative language test (Swain, 1984). Knowing these principles may teach me to build tests from existing theory and practice instead of starting from nowhere; to choose content that promotes communicative activities and, through them, motivation, substantiality, integrity, and interactivity; and to do everything possible to encourage the test takers' best response, such as clear instructions on what to do and how, realistic time expectations for completing the test, and any reference material needed during the test. This work may also help me identify the effect a test has on a school's teaching practices. Teachers can move from knowing how to create better English language tests to assessing their own classroom practice as well as that of other teachers. Thus, this work can help other teachers to improve their test building.
2. Subjects
The subjects for this test are sixth grade students whose language abilities fall below the A1 level of the Common European Framework of Reference (CEFR); only a low percentage of these learners can understand, spell, and read short sentences. The group is made up of boys and girls from around 10 to 14 years old. They study in the afternoon shift and most of them, according to a survey run last year, belong to social strata 1 and 2. They are dedicated to their school studies; just a small percentage work during the other shift of the day. Most of them live with their own families, but 16% of the learners live in ICBF institutions, state facilities where students are protected when their parents do not take care of them. They all live in Bogotá, but not all are from this city. Some students come from other parts of Colombia, such as Medellín, Valledupar, or Magdalena, having moved to Bogotá because of family issues or Colombian political problems such as displacement by the guerrilla. Regarding their learning, the sixth graders are drawn to artistic and sports interests, but reading and writing are not their main focus, nor is studying English. Out of around 25 students per classroom, just one or two really like English and act accordingly.
3. Constructs
The constructs measured in these tests are vocabulary awareness, reading comprehension, and vocabulary comprehension.
3.1 The First Construct: Vocabulary Awareness. This construct concerns the use of vocabulary acquisition strategies, especially the one Nation (2001) calls the "planning strategy." According to Flores (2008), "the learners choose what to focus on and when to focus on it." This fits the test in the way students apply a strategy (picture exploitation) in order to become aware of the vocabulary they need in order to write the sentences.
3.2 The Second Construct: Reading Comprehension. It can have a varied set of definitions, but for the purposes of this test I take reading comprehension to be "the process of constructing meaning by coordinating a number of complex processes that include word reading, word and world knowledge, and fluency" (Vaughn and Boardman, 2007, p. 2). Additionally, Snow (2002) and Grellet (1981) describe reading comprehension as a process of extracting and constructing meaning through a continuous interaction between the written language and the reader (Ajideh, 2003). Nunan (1999) highlights that this interaction involves linguistic (bottom-up) and background (top-down) knowledge, and Mikulecky (2008) explains that background knowledge is activated when readers compare information in the text to their own prior experience and knowledge, "including knowledge of the world, cultural knowledge and knowledge of the generic structure" (Gibbons, 2002, p. 93).
This theory is observable in the reading comprehension subtest, given that students need to construct meaning by interacting with the text: reading the words of a mutilated paragraph, they must identify the words the text has and lacks in order to construct the message. This process of word reading and word recognition for constructing meaning is activated through the students' bottom-up and top-down reading processes; that is, students' knowledge is activated through their content schema and paradigmatic competence (Berns and Savignon, 1984), their prior experiences, and the formal knowledge they have gained from classroom instruction and activities regarding colors, body parts, and clothes.
3.3 The Third Construct: Vocabulary Comprehension. Vocabulary in language learning is more than the use of single words; it includes the understanding and use of lexical chunks or units. For instance, "good afternoon" and "nice to meet you" are expressions of more than one word that constitute vocabulary with a formulaic usage. In other words, "...vocabulary can be defined as the words of a language, including single items and phrases or chunks of several words which convey a particular meaning, the way individual words do. Vocabulary addresses single lexical items—words with specific meaning(s)—but it also includes lexical phrases or chunks" (Lessard-Clouston, 2013).
The Lexical Approach states that language learning should not be seen as moving from grammar to vocabulary; it suggests that learning a language goes beyond this: "Language consists not of a traditional grammar and vocabulary but often of multi-word prefabricated chunks" (Lewis, as quoted in Thornbury, 1998, p. 4).
This fits the test in the way that vocabulary is seen as part of the formulaic language used for communication. Students use the vocabulary to express the message of a situational content; in this way, the lexical approach is observable.
4. Purpose of the tests
The entire test seeks to find out how much students have achieved in the English subject during the second academic term (colors, clothing, and body parts) after continuous classroom instruction. At the beginning of the academic year, a Cambridge EFL test (Cambridge English: Key for Schools, 2006) was administered to examine the students' current English knowledge. Observing the reading and writing skills, it was evident that students could not recognise, spell, or write words, phrases, and short sentences from a text; only a very basic level of vocabulary comprehension and sentence writing was seen in the test results. Therefore, it was considered necessary to encourage students to develop vocabulary awareness, reading comprehension, and vocabulary comprehension, these being the three constructs of the test development project. The more specific purposes for assessing the three constructs are:
4.1. To determine students' awareness of the vocabulary learned in class, such as colors, body parts, and some garments, by writing sentences describing Cesar's appearance.
4.2. To evaluate the grammatical structure students use when writing these sentences: person + verb + complement.
4.3. To check students' understanding of the vocabulary by using it in a specific context: My Best Friend Carla.
5. The Three Tests
The following is the first draft of the three tests. The aims of the tests are to measure the sixth graders' awareness and comprehension of basic vocabulary such as colors, body parts, and clothes, and to identify its use in specific written contexts. The complete test is available at the following link, although it is also shown and explained within this section of the paper.
https://docs.google.com/forms/d/1FUvmF8fsthLQUqxXEb572j46BG69Do04AWMB5VyXJo/viewform?usp=send_form
5.1. Subtest 1. Writing About Cesar's Appearance: a subjectively-scored test with 10 items and no choices for the student (Appendix A).
The goal of this part of the test is to identify the students' level of awareness of the vocabulary studied in class related to physical appearance, in the third person and the simple present tense, using an image of a person as the input material from which to derive meaning.
5.2. Subtest 2. Reading About My Best Friend Carla: an objectively-scored fill-in-the-blanks test with 10 items (Appendix B).
The goal of this part of the test is to encourage students to read and comprehend a mutilated passage, adding the vocabulary of colors and body parts to give meaning to the entire paragraph.
5.3. Subtest 3. Identifying Tatti's Appearance: an objectively-scored multiple-choice test with 10 items and four choices for each item (Appendix C).
The goal of this part of the test is to identify the students' level of comprehension of the vocabulary studied in class by fitting the words into specific contextual situations. Students need to choose a word in order to give meaning to each of the 10 sentences.
6. Scoring procedures for Subtest 1
The following table provides markers with instructions on how to score each item of Subtest 1, which assesses students' vocabulary awareness through sentence writing. For this purpose, the analytic scoring approach is used: it evaluates "students' performances on a variety of categories" (Bailey, 1998, p. 190), namely content, organization, vocabulary, language use, and mechanics. "The weights given to these components follow: Content: 13 to 30 points possible; organization: 7 to 20 points possible; vocabulary: 7 to 20 points possible; language use: 5 to 25 points possible; and mechanics: 2 to 5 points possible" (Bailey, 1998, p. 190). This assessment system is used because the students write isolated sentences instead of a composition; it is therefore essential to identify the writing components of the students' sentences, especially those with the higher weights, such as content and language use.
The descriptors for each category follow:
- CONTENT: Students are able to describe Cesar's appearance.
- ORGANIZATION: Students write organized sentences following the grammatical structure: person + verb + complement.
- VOCABULARY: Students use the vocabulary learned in class, such as body parts, colors, and clothes.
- LANGUAGE USE: Students write coherent sentences in English to describe a person.
- MECHANICS: Students use capital and lowercase letters and a period correctly.
These descriptors, as well as the categories and score ranges, are stated more clearly in the Excel charts created to evaluate each student's sentences. An example of the writing assessment of one student is provided (Appendix D). A worked example of the scoring arithmetic appears below.
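As an illustration only, the following Python sketch sums one set of category scores and validates each against Bailey's (1998) weight ranges; the category keys and the example scores are invented for the illustration and are not taken from the actual ratings.

```python
# A minimal sketch of the analytic scoring arithmetic. Only the weight ranges
# come from Bailey (1998, p. 190); the sample scores below are hypothetical.
WEIGHT_RANGES = {                 # (minimum, maximum) points per category
    "content":      (13, 30),
    "organization": (7, 20),
    "vocabulary":   (7, 20),
    "language_use": (5, 25),
    "mechanics":    (2, 5),
}

def total_score(category_scores):
    """Check each category score against its range and sum to a total out of 100."""
    total = 0
    for category, (low, high) in WEIGHT_RANGES.items():
        score = category_scores[category]
        if not low <= score <= high:
            raise ValueError(f"{category} score {score} is outside {low}-{high}")
        total += score
    return total

# Hypothetical rating of one student's ten sentences:
print(total_score({"content": 15, "organization": 8, "vocabulary": 9,
                   "language_use": 7, "mechanics": 3}))  # prints 42
```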
7. Pre-piloting the three subtests
The three tests were pre-piloted with two teachers.
Teacher 1: Yeny Franco.
She studied Modern Languages (Spanish and English) and has experience in pedagogy and second language teaching. She worked as an English and Spanish teacher for 10 years in private schools, and she currently teaches these two subjects at a public school in the south of Bogotá. She has teaching experience with young children and, nowadays, with teenagers. Teacher Franco is currently pursuing a Master's degree in Teaching English for Self-Directed Learning. Part of her insight about teaching is that effective teaching involves the interaction of three major components of instruction (learning objectives, assessments, and instructional activities) and the adoption of appropriate teaching roles to support the students' learning goals.
Teacher 2: Yesenia Moreno.
She graduated from the Universidad de la Salle, where she received a Bachelor's degree in Modern Languages. She has taught English and Spanish in private schools to students from kindergarten through primary and high school. She has 10 years of experience as a Spanish and English teacher and is eager to improve her academic and professional practices as well as her English proficiency. For her, it is important to teach not only English grammar but also values such as respect, honesty, cooperation, and integrity. She considers discipline and self-reflection vital strengths for reaching personal goals.
The following are the comments the teachers made. Most of them apply equally to the three subtests and are therefore grouped as comments on Subtests 1, 2, and 3; a few comments were given only for the subjective subtest.
7.1 Subtests 1, 2 and 3
- Given that the students are demonstrating their English reading performance, the instructions should be in English, phrased as commands.
- English instructions can be complemented with examples, for better understanding and for positive washback.
- There should be a subtitle that indicates the skill to be evaluated, for example: vocabulary section.
7.2 Subtest 1
- There should be 10 structured lines on which students can write their sentences. It is necessary to explain that the students should write a minimum of 10 complete sentences in English, since students tend to write in Spanish even when doing an English task.
- It is important to analyse whether the presentation of Subtest 1 is appropriate for the students' English level: true-beginner sixth graders.
8. Administering the test
The test was administered to 12 students at a public school in Bogotá, Colombia. It was taken on May 28th, 2015 by the students described in Section 2, "Subjects," of this paper. The exam was administered virtually, using a Google Forms document, by the school's technology teacher, since I am on postnatal medical leave; with my colleague's help I was able to administer the exam and receive the students' responses easily and quickly. Some of the answers the students sent are shown in Appendix F.
Regarding issues while taking the exam, the teacher did not report any inconvenience, either with the exam itself or with the conditions in which the test was taken.
9. Results from the three subtests
The three subtests were scored by the researcher. The subjectively-scored subtest was also scored by a colleague, an experienced teacher who has worked with different kinds of populations: kids, teenagers, and adults. She has taught Spanish and English in private and state schools, and she currently teaches in a public school where a bilingual program under the National Bilingualism Policies has been structured; there, she and some foreign teachers have organized programs to increase the students' English proficiency. The results of the scoring are as follows:
9.1 Subtest 1
According to the Analytic Scoring Criteria used to evaluate the students' 10-sentence writing, in which different categories were assessed (content, organization, vocabulary, language use, and mechanics), and to the descriptive statistics computed from the subtest's results (Appendix G), the average score is 50.7, which shows that most of the students did not reach a high score out of 100. Since the median is 42.5 and the most frequent score is 33, I would say that the students did not do well when writing. It is important to clarify that this subtest is scored out of 100 points, distributed in this way: "Content: 13 to 30 points possible; organization: 7 to 20 points possible; vocabulary: 7 to 20 points possible; language use: 5 to 25 points possible; and mechanics: 2 to 5 points possible" (Bailey, 1998, p. 190).
Comparing these results with those obtained by the second rater (Appendix H), it is evident that the results were much the same: the second rater's range of scores was 57, one point more than mine, and the most frequent score and the median are practically identical. So, to help the students improve their writing, it would be important to clarify the subtest instructions further and to emphasize the importance of the writing categories of the analytic scoring approach.
Moreover, I can say that these results are reliable, since the value obtained from the Cronbach's alpha analysis is 0.77, which indicates a good degree of consistency, or reliability, with which the two raters evaluated the same data. In other words, there is consistent inter-rater reliability in the two raters' scores.
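As a minimal sketch of how Cronbach's alpha can be computed for two raters (each rater treated as one "item"), the Python code below uses placeholder score lists; the Rater 1 and Rater 2 data from Appendix H would be substituted to reproduce the analysis.

```python
# A minimal sketch of Cronbach's alpha for two raters. The score lists are
# placeholders; substitute the Rater 1 and Rater 2 scores from Appendix H.
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(ratings):
    """ratings: one list of scores per rater, aligned by student."""
    k = len(ratings)                                # number of raters
    totals = [sum(pair) for pair in zip(*ratings)]  # per-student totals
    item_variances = sum(variance(r) for r in ratings)
    return (k / (k - 1)) * (1 - item_variances / variance(totals))

rater_1 = [50, 40, 60, 70, 35, 45]   # hypothetical scores
rater_2 = [45, 42, 55, 72, 33, 48]
print(round(cronbach_alpha([rater_1, rater_2]), 2))  # 0.98 for these placeholders
```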
Finally, the frequency polygon (Table 1) clearly shows a high number of students at the lowest grades; that is, the higher the score values, the smaller the number of students. It is important to find out what the most problematic issues in the students' sentence writing are, taking into account the five writing components of the analytic scoring system.
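Tables 1 through 4 are frequency polygons. As a minimal sketch of how such a figure is produced, the following Python code bins the Subtest 1 scores from Appendix G and plots students per band; the 20-point bin width is an assumption, since the original bin widths for the subtest polygons are not reproduced here.

```python
# A minimal sketch of a frequency polygon: bin the scores, count students per
# bin, and connect the bin midpoints. Scores are the Subtest 1 results from
# Appendix G; the 20-point bins are an assumption.
import matplotlib.pyplot as plt

scores = [43, 42, 51, 85, 35, 37, 73, 48, 39, 33, 33, 89]
bins = [(0, 20), (21, 40), (41, 60), (61, 80), (81, 100)]

midpoints = [(low + high) / 2 for low, high in bins]
counts = [sum(low <= s <= high for s in scores) for low, high in bins]

plt.plot(midpoints, counts, marker="o")
plt.xlabel("Score band midpoint")
plt.ylabel("Number of students")
plt.title("Frequency Polygon, Subtest 1")
plt.show()
```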
9.2 Subtest 2
Taking into account the frequency polygon (Table 2) of the answers obtained in this objectively-scored test with 10 items, in which the students needed to choose a word from 4 options in order to fill in the blanks of a mutilated paragraph, I could identify, as in Subtest 1, that most of the students got wrong answers when reading and filling in the paragraph. Even lower grades are observable in this part of the test than in the first one: the highest score here was 60, while in Subtest 1 it was 89 (Appendix G).
Accordingly, the descriptive statistics for this subtest (Appendix J), with an average score of 18.3, a most frequent score of 20, and a median of 20, indicate that the students' knowledge of describing physical appearance, for which they needed vocabulary such as colors, body parts, and clothes, is too low for them to be able to fill in the paragraph. This may suggest that students do not understand the vocabulary that has been studied in class. The grades might improve if the importance of observing the input material (the image of the little girl) very carefully were clarified, or if the input material used for this subtest were changed.
Besides this, the Item Facility chart for this subtest (Appendix K) shows that most of the items do not fall within the ideal range: items 11 and 18, at 0.08, are extremely difficult; items 12, 13, 15, and 19 look doubtful at 0.17; and item 20, which nobody answered correctly, has a facility of 0. Just two of the 10 items, numbers 14 (0.33) and 16 (0.42), are in the acceptable middle-difficulty range. As a conclusion, I could say that it would be better to rewrite most of the items, especially the one that got 0, since the ideal values run from 0.15 to 0.85 (Oller, 1979, as cited in Bailey, 1998). The sketch below shows how these figures are obtained.
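As a minimal sketch of the item facility arithmetic, the following Python code reproduces the Appendix K figures for Subtest 2 from the reported correct-answer counts, flagging items outside Oller's ideal range.

```python
# A minimal sketch of item facility (IF): the proportion of test takers who
# answered an item correctly. Counts are those reported for Subtest 2 in
# Appendix K; the 0.15-0.85 band is Oller's (1979) ideal range.
correct_counts = {11: 1, 12: 2, 13: 2, 14: 4, 15: 2,
                  16: 5, 17: 3, 18: 1, 19: 2, 20: 0}
N_STUDENTS = 12

for item, correct in sorted(correct_counts.items()):
    item_facility = correct / N_STUDENTS
    flag = "ok" if 0.15 <= item_facility <= 0.85 else "rewrite?"
    print(f"item {item}: IF = {item_facility:.2f} ({flag})")
```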
In order to provide a more detailed analysis of the subtest items, I describe the item discrimination (I.D.) between high and low scorers (Bailey, 1998; Appendix L). Looking at the data, it seems that all the items need to be rewritten for a criterion-referenced test, since some items' values are 0.0 and most of the rest are negative (-0.33 or -0.67); that is, they do not discriminate between high and low scorers at all. Oller (1979), as cited in Bailey (1998), clarified that "I.D. values range from +1 to -1, with positive 1 showing a perfect discrimination between high scorers and low scorers, and -1 showing perfectly wrong discrimination. An I.D. of 0 shows no discrimination or no variance whatsoever" (p. 135). So, to improve the I.D. of this subtest, the options to be chosen could mix colors, parts of the body, and clothes instead of just one category; a single category may confuse the students because it turns out to be more difficult. Observing the distractor analysis chart (Appendix M), I can see that all the options were chosen except option C for item 17 and distractor A for item 20. In the philosophy of the multiple-choice format, this shows that these distractors should perhaps be changed, especially A for item 20, given that, as this is the correct answer, nobody chose it. The I.D. computation is sketched below.
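As a minimal sketch of the item discrimination arithmetic, the following Python code reproduces the Appendix L values for Subtest 2 from the reported correct-answer counts of the top three and bottom three scorers.

```python
# A minimal sketch of item discrimination (I.D.):
#   I.D. = (correct in high group - correct in low group) / group size.
# Counts are those reported for Subtest 2 in Appendix L (top 3 vs. bottom 3).
high_correct = {11: 0, 12: 0, 13: 1, 14: 0, 15: 0,
                16: 0, 17: 0, 18: 0, 19: 0, 20: 0}
low_correct  = {11: 0, 12: 1, 13: 0, 14: 2, 15: 1,
                16: 2, 17: 0, 18: 1, 19: 1, 20: 0}
GROUP_SIZE = 3

for item in sorted(high_correct):
    item_discrimination = (high_correct[item] - low_correct[item]) / GROUP_SIZE
    print(f"item {item}: I.D. = {item_discrimination:+.3f}")
```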
Looking at the Response Frequency Distribution (Appendix N) gives me a better idea of how the distractors are functioning. The results here support those in the I.D. chart, in which no item discriminates properly. In these data I can see that, except for item 18, the two groups (high and low scorers) did not choose identical answers. However, it is important to emphasize that, despite these results, the low scorers often chose the correct answers, while among the high scorers only one person did so. In sum, the high scorers and the low scorers differed substantially in the distractors they chose, and certain revisions are therefore needed, starting with the question of why just one high scorer chose a correct answer while at least one low scorer chose most of the correct answers.
9.3 Subtest 3
Analyzing the frequency polygon (Table 3) of this objectively-scored subtest with 10 items, in which the students needed to complete each sentence by observing the input material and choosing, from 4 options, a word from the vocabulary studied to give sense and meaning to the sentence (they basically needed to use the vocabulary studied in contextual sentences), I could identify, as in Subtests 1 and 2, that mostly low scores were obtained. It is observable that, even though the highest score of the two objective subtests was obtained in this part of the test, there were still many wrong answers.
Besides this, the statistics for this subtest (Appendix J) show a significant difference between the mean of Subtest 2 and the mean of Subtest 3, suggesting that the students' performance is higher in this part of the test. However, by analyzing the median and mode, I found that the most frequent score was the same low value (20), although the median increased a little. All this indicates that the students' vocabulary comprehension (colors, body parts, and clothes) continues to be too low to reach the goal. In order to help students improve these results, I consider it important to teach them how to do picture exploitation, the strategy required by the input material of the three subtests; they perhaps did not observe the images carefully.
Additionally, the Item Facility data for this subtest (Appendix K) show that most of the items yield enough variance, except numbers 22 and 30, which are of very low difficulty; the rest are in the middle or acceptable difficulty range. So, apart from the items mentioned, most of this subtest does not have to be rewritten or revised, since the students obtained better scores.
Continuing the analysis of this subtest's items, I found that the item discrimination (I.D.; Appendix L) between high and low scorers (Bailey, 1998) is not high. Just a few of the items are fairly solid; in general, items 21, 22, 24, 25, 27, and 30 show no positive discrimination and therefore need to be revised. However, observing the distractor analysis of the subtest (Appendix M), I can see that all the options were chosen except option B for item 30; with that single exception, all the distractors attracted at least one test taker.
The Response Frequency Distribution (Appendix N) specifies how the distractors are functioning within the high- and low-scoring groups. In these data I could see that most distractors were chosen by low or high scorers and, unlike in Subtest 2, more scorers from both groups chose identical answers. However, item 30 showed only borderline variance. In conclusion, the high scorers and the low scorers did not differ substantially, but it is important to make some revisions regarding the distractors that nobody in either group chose: A and D for item 22, A for item 24, C for item 25, B for item 27, and A and B for item 30.
9.4 The Entire Test
Looking at the frequency polygon of the entire test (Table 4), I could identify that most of the scores are located within 0 to 100 out of 300 points, and very few students reached 50% of the correct answers, that is, 150 points.
The descriptive statistics (Table 5) corroborate what the frequency polygon shows: the mean, the mode, and the median all suggest that most of the students obtained about a third of the total score of the test. So, there is a consistently large amount of incorrect answers in the three subtests. This makes me think that not only the entire test should be reviewed but also the conditions in which it was presented: it was run by a different teacher, there was no one knowledgeable in the subject who could answer the students' doubts about the test, and there was no extra explanation of the test instructions, especially of what picture exploitation may imply.
And although most of the scores are situated in one part of the polygon, the standard deviation, or average amount of difference (Bailey, 1998), of the entire test suggests that the scores are not compact; they are spread out among the lowest scores.
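As a minimal sketch of how the Table 5 statistics for the entire test are obtained, the following Python code sums the per-student subtest scores from Appendices G and J; the population form of the variance is assumed, so results may differ slightly where sample formulas were used.

```python
# A minimal sketch of the Table 5 descriptive statistics for the entire test,
# computed from the per-student subtest scores in Appendices G and J.
# Population variance/SD are assumed; sample formulas give slightly different values.
from statistics import mean, median, mode, pvariance, pstdev

subtest_1 = [43, 42, 51, 85, 35, 37, 73, 48, 39, 33, 33, 89]
subtest_2 = [10, 0, 0, 60, 10, 20, 0, 30, 20, 30, 20, 20]
subtest_3 = [20, 50, 20, 80, 20, 40, 50, 40, 30, 20, 30, 40]

totals = [a + b + c for a, b, c in zip(subtest_1, subtest_2, subtest_3)]
print("mean:", round(mean(totals), 1))           # 105.7
print("mode:", mode(totals))                     # 83
print("median:", median(totals))                 # 90.5
print("range:", f"{min(totals)} - {max(totals)}")
print("variance:", round(pvariance(totals), 1))  # 1838.4
print("standard deviation:", round(pstdev(totals), 1))
```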
10. Conclusion
In conclusion, I have to say that this original test development assignment was tremendously stimulating, leading me to think about how complex and serious the process of creating and evaluating language tests is.
First of all, I would like to reflect on the importance of the test instructions. They should be as clear and direct as possible. Since the students' English level is very low, it may be helpful to include clear examples of the expected answers in the instructions.
Another important issue observed in the test development is the stimulus material that was used and the corresponding tasks that students needed to do. The results of the test suggest that there were internal and external factors, in all the subtests, that contributed to the poor results of most of the students (Table 4). One way to improve this is to model, during classroom instruction, how to analyze the pictures of the subtests, so that the students know exactly what they need to observe and write. Besides this, I could not administer the test in the classroom myself, which might have contributed to the results, since the students' doubts could not be clarified by the technology teacher who ran the test. Sometimes poor results come not from a lack of knowledge but from a lack of understanding of the test instructions.
The results from piloting the tests also showed that there should be a different alternative for evaluating the 10-sentence writing, since the Analytic Scoring Approach was neither practical nor conducive to positive washback. The task, as well as the scoring system, could instead be used for an independent writing activity, since the results are very specific and students should receive correspondingly specific feedback. So, a more direct writing task and scoring system should be used in a progress test.
Additionally, it is important to create the items carefully so that the answer options do not affect the face and content validity of the test (Brown, 2005). They could be rewritten using different kinds of categories within the four answer options, with an acceptable degree of difficulty based on the students' current English level.
In terms of Wesche's (1983) four components of a test, these tests had non-linguistic stimulus materials, namely images. Students needed to analyze the images in order to do the tasks, whether writing sentences, completing a mutilated paragraph, or completing sentences.
This alerts me that extraneous factors may have affected the students' results, perhaps because I was absent and did not explain the test myself; it was administered by a colleague owing to my postnatal medical leave. The answer options could also draw on different categories chosen according to the students' English level, since the test proved too difficult for their current level even though the vocabulary had been taught in the classroom.
References
Ajideh, P. (2003). Schema theory-based pre-reading tasks: A neglected essential in the ESL
reading class. The Reading Matrix, 3(1). Retrieved from
http://www.readingmatrix.com/articles/ajideh/article.pdf
Alderson, J. C., & Urquhart, A. H. (1984). Reading in a foreign language. London, UK:
Longman.
Bailey, K. M. (1998). Learning about language assessment: Dilemmas, decisions, and directions. Pacific Grove, CA: Heinle & Heinle.
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language assessment (Chapter 10: Language test validity, p. 108). New York, NY: McGraw-Hill.
Cambridge English: Key for Schools. (2006). Retrieved on February 27th from http://www.cambridgeenglish.org/images/165870-yle-starters-sample-papers-vol-1.pdf
Flores, M. (2008). Exploring vocabulary acquisition strategies for EFL advanced learners. Retrieved on June 2nd, 2015, from http://digitalcollections.sit.edu/cgi/viewcontent.cgi?article=1192&context=ipp_collection
Gibbons, P. (2002). Scaffolding language, scaffolding learning: Teaching second language learners. Retrieved on May 15th from http://www.heinemann.com/shared/onlineresources/e00366/chapter5.pdf
Grellet, F. (1981). Developing reading skills: A practical guide to reading comprehension
exercises. Cambridge, UK: Cambridge University Press.
Lessard-Clouston, M. (2013). Teaching vocabulary. Retrieved on April 5th, 2015, from http://www.academia.edu/2768908/Teaching_Vocabulary
Mikulecky, B. (2008). Teaching reading in a second language. Retrieved on April 5th, 2015, from http://www.longmanhomeusa.com/content/FINAL-LO%20RES-Mikulecky-Reading%20Monograph%20.pdf
Nation, I. (2001). Learning vocabulary in another language. Cambridge, UK: Cambridge University Press.
Nunan, D. (1999). Second language teaching & learning. Boston, MA: Heinle & Heinle.
Snow, C. (2002). Reading for understanding: Toward a research and development program in reading comprehension. Santa Monica, CA: RAND Corporation.
Swain, M. (1984). Large-scale communicative language testing: A case study. In S. J. Savignon & M. S. Berns (Eds.), Initiatives in communicative language teaching (pp. 185-201). Reading, MA: Addison-Wesley.
Thornbury, S. (1998). The Lexical approach: A journey without a map? Modern English
Teacher. Retrieved on June 6th, 2015.
Appendices
Appendix A
Subtest 1
Appendix B
Subtest 2
Appendix C
Subtest 3
Appendix D
Analytic scoring chart used to evaluate the sentence writing in Subtest 1
Appendix E
A sample of one student's analytic scoring assessment of his 10-sentence writing in Subtest 1
Appendix F
Some of the students' subtest answers sent through Google Forms
Appendix G
Results of Subtest 1 using the Analytic Scoring Approach and the corresponding descriptive statistics

Subtest 1 Results (Analytic Scoring Approach)
student 1             43
student 2             42
student 3             51
student 4             85
student 5             35
student 6             37
student 7             73
student 8             48
student 9             39
student 10            33
student 11            33
student 12            89
MEAN                  50.67
MEDIAN                42.5
MODE                  33
RANGE                 89 - 33 = 56
VARIANCE              373.39
STANDARD DEVIATION    20.18
Appendix H
Results of Subtest 1 scored by Rater 1 and Rater 2, with the corresponding descriptive statistics

                      Rater 1        Rater 2
student 1             49             43
student 2             33             42
student 3             60             51
student 4             67             85
student 5             33             35
student 6             33             37
student 7             90             73
student 8             42             48
student 9             47             39
student 10            33             33
student 11            33             33
student 12            42             89
MEAN                  46.83          50.67
MEDIAN                42             42.5
MODE                  33             33
RANGE                 90 - 33 = 57   89 - 33 = 56
VARIANCE              287.64         373.39
STANDARD DEVIATION    17.71          20.18
Appendix I
Cronbach's alpha for Rater 1's and Rater 2's results on Subtest 1, identifying the consistency, or reliability, with which the two raters evaluated the same data
Appendix J
Individual scores and descriptive statistics of Subtests 2 and 3

                      Subtest 2    Subtest 3
student 1             10           20
student 2             0            50
student 3             0            20
student 4             60           80
student 5             10           20
student 6             20           40
student 7             0            50
student 8             30           40
student 9             20           30
student 10            30           20
student 11            20           30
student 12            20           40
MEAN                  18.33        36.67
MEDIAN                20           35
MODE                  20           20
RANGE                 0 - 60       20 - 80
VARIANCE              263.89       288.89
STANDARD DEVIATION    16.97        17.75
Appendix K
The Item Facility Chart of Subtests 2 and 3

Items of Subtest 2    Students who answered correctly    I.F.
11                    1                                  0.08
12                    2                                  0.17
13                    2                                  0.17
14                    4                                  0.33
15                    2                                  0.17
16                    5                                  0.42
17                    3                                  0.25
18                    1                                  0.08
19                    2                                  0.17
20                    0                                  0.00

Items of Subtest 3    Students who answered correctly    I.F.
21                    3                                  0.25
22                    6                                  0.50
23                    5                                  0.42
24                    4                                  0.33
25                    5                                  0.42
26                    4                                  0.33
27                    3                                  0.25
28                    3                                  0.25
29                    5                                  0.42
30                    6                                  0.50
Appendix L
The Item Discrimination Chart of Subtests 2 and 3

Items of Subtest 2    High scorers (top 3) correct    Low scorers (bottom 3) correct    I.D.
11                    0                               0                                 0.000
12                    0                               1                                 -0.333
13                    1                               0                                 +0.333
14                    0                               2                                 -0.667
15                    0                               1                                 -0.333
16                    0                               2                                 -0.667
17                    0                               0                                 0.000
18                    0                               1                                 -0.333
19                    0                               1                                 -0.333
20                    0                               0                                 0.000

Items of Subtest 3    High scorers (top 3) correct    Low scorers (bottom 3) correct    I.D.
21                    0                               1                                 -0.333
22                    0                               2                                 -0.667
23                    1                               0                                 +0.333
24                    0                               1                                 -0.333
25                    1                               1                                 0.000
26                    2                               0                                 +0.667
27                    1                               1                                 0.000
28                    1                               0                                 +0.333
29                    2                               1                                 +0.333
30                    1                               1                                 0.000
Appendix M
The Distractor Analysis Chart of Subtests 2 and 3
(+ marks the correct answer)

Answers for Subtest 2
Item    A     B     C     D
11      3     6     2     1+
12      4     2+    4     2
13      2+    6     2     2
14      1     4     3     4+
15      2+    3     4     3
16      5+    2     4     1
17      3+    6     0     3
18      2     8     1     1+
19      2+    5     3     2
20      0+    8     2     2

Answers for Subtest 3
Item    A     B     C     D
21      2     4     3     3+
22      1     6+    4     1
23      1     2     5+    4
24      1     3     4     4+
25      2     5+    2     3
26      3     3     4+    2
27      3+    2     4     3
28      3+    2     6     1
29      3     1     3     5+
30      1     0     5     6+
Appendix N
The Response Frequency Distribution Chart of Subtests 2 and 3
(* marks the correct answer)

Response Frequency Distribution of Subtest 2
Item    Group           A     B     C     D
11      High Scorers    1     2     0     0*
        Low Scorers     1     1     1     0
12      High Scorers    0     0*    2     1
        Low Scorers     1     1     0     1
13      High Scorers    1*    2     0     0
        Low Scorers     0     1     1     1
14      High Scorers    1     2     0     0*
        Low Scorers     0     0     1     2
15      High Scorers    0*    1     1     1
        Low Scorers     1     1     1     0
16      High Scorers    0*    0     3     0
        Low Scorers     2     0     0     1
17      High Scorers    0*    2     0     1
        Low Scorers     0     2     0     1
18      High Scorers    1     2     0     0*
        Low Scorers     1     2     0     0
19      High Scorers    0*    2     1     0
        Low Scorers     1     2     0     0
20      High Scorers    0*    2     1     0
        Low Scorers     0     2     1     0

Response Frequency Distribution of Subtest 3
Item    Group           A     B     C     D
21      High Scorers    0     2     1     0*
        Low Scorers     1     1     0     1
22      High Scorers    0     0*    3     0
        Low Scorers     0     2     1     0
23      High Scorers    1     0     1*    1
        Low Scorers     0     1     0     2
24      High Scorers    0     1     2     0*
        Low Scorers     0     1     1     1
25      High Scorers    1     1*    0     1
        Low Scorers     0     1     0     2
26      High Scorers    0     1     2*    0
        Low Scorers     0     2     0     1
27      High Scorers    1*    0     1     1
        Low Scorers     1     0     1     1
28      High Scorers    1*    1     1     0
        Low Scorers     0     0     3     0
29      High Scorers    0     0     1     2*
        Low Scorers     1     1     0     1
30      High Scorers    0     0     2     1*
        Low Scorers     0     0     2     1
Tables
Table 1
Frequency Polygon for Subtest I, Vocabulary Awareness: Describing Cesar's Appearance
Table 2
Frequency Polygon for Subtest II, Reading Comprehension: Reading About My Best Friend Carla
Table 3
Frequency Polygon for Subtest III, Vocabulary Comprehension: Identifying Tatti's Appearance
Table 4
Frequency Polygon for the Entire Test: What Does He/She Look Like?
[Figure: frequency polygon of the entire test, plotting the number of students (0-9) against 50-point score bands from 0-50 through 251-300.]
Table 5
Summary of Descriptive Statistics for the Entire Test and Subtests I through III

Statistic                Entire test   I. Vocab. A.   II. Reading   III. Vocab. C.
Number of students       12            12             12            12
Total possible points    300           100            100           100
Mean (x-bar)             105.7         50.7           18.3          36.7
Mode                     83            33             20            20
Median                   90.5          42.5           20            35
Range                    73 - 225      33 - 89        0 - 60        20 - 80
Variance (s²)            1838.4        373.4          263.9         288.9
Standard deviation (s)   44.8          20.1           17            17.8
Table 6
Construct Definition Chart
Subtest 1: Writing About Cesar's Appearance
Construct: Vocabulary Awareness
Definition: Vocabulary awareness compels the use of vocabulary acquisition strategies, especially the one Nation (2001) calls the "planning strategy": "the learners choose what to focus on and when to focus on it" (Flores, 2008).
Possible test method (item type): subjectively-scored subtest, 10-sentence writing.

Subtest 2: Reading About My Best Friend Carla
Construct: Reading Comprehension
Definitions: In order to understand this concept better, I take into account different definitions of reading comprehension:
- "The process of constructing meaning by coordinating a number of complex processes that include word reading, word and world knowledge, and fluency" (Vaughn and Boardman, 2007, p. 2).
- "The process of simultaneously extracting and constructing meaning through interaction and involvement with written language" (Snow, 2002, p. 11).
- "Extracting the required information from it [a text] as efficiently as possible" (Grellet, 1981, p. 3).
- Nunan (1999) defines reading as an interactive process that involves both linguistic (bottom-up) and background (top-down) knowledge.
- "Reading is a conscious and unconscious thinking process. The reader applies many strategies to reconstruct the meaning that the author is assumed to have intended. The reader does this by comparing information in the text to his or her background knowledge and prior experience" (Mikulecky, 2008).
- "As a text participant, the reader connects the text with his or her own background knowledge – including knowledge of the world, cultural knowledge, and knowledge of the generic structure" (Gibbons, 2002, p. 93).
Possible test method (item type): filling the blanks, discrete-point approach.

Subtest 3: Identifying Tatti's Appearance
Construct: Vocabulary Comprehension
Definition: Vocabulary in language learning is more than the use of single words; it includes the understanding and use of lexical chunks or units. For instance, "good afternoon" and "nice to meet you" are expressions of more than one word that constitute vocabulary with a formulaic usage. In other words, "...vocabulary can be defined as the words of a language, including single items and phrases or chunks of several words which convey a particular meaning, the way individual words do. Vocabulary addresses single lexical items—words with specific meaning(s)—but it also includes lexical phrases or chunks" (Lessard-Clouston, 2013). The Lexical Approach states that language learning should not be seen as moving from grammar to vocabulary; it suggests that learning a language goes beyond this: "Language consists not of a traditional grammar and vocabulary but often of multi-word prefabricated chunks" (Lewis, as quoted in Thornbury, 1998, p. 4).
Possible test method (item type): multiple choice, discrete-point approach.
Table 7
Swain's (1984) "Principles of Communicative Test Development"

Section 1: Writing About Cesar's Appearance (Vocabulary Awareness)
- Start from Somewhere: The test is built on the role of vocabulary acquisition strategies in building vocabulary awareness.
- Concentrate on Content: The content of the subtest is based on the interaction between the stimulus material, an image, and the students' knowledge, enabling them to write a description in 10 sentences.
- Bias for Best: Colorful images to understand the vocabulary and the situational contexts better; an example of a possible answer; adequate time to complete the task and revise the work.
- Work for Washback: The test made an impact on the importance of teaching picture exploitation.

Section 2: Reading About My Best Friend Carla (Reading Comprehension)
- Start from Somewhere: The test is built on an understanding of the different approaches to reading comprehension.
- Concentrate on Content: The subtest is based on the interaction between the reader and the text, choosing the best option to complete a mutilated paragraph.
- Bias for Best: Colorful images to understand the vocabulary and the situational contexts better; an example of a possible answer.
- Work for Washback: The test made an impact on the students' conception of English language testing using technology.

Section 3: Identifying Tatti's Appearance (Vocabulary Comprehension)
- Start from Somewhere: The test is built on the Lexical Approach and what vocabulary acquisition implies in EFL learning.
- Concentrate on Content: The subtest is based on the interaction between the stimulus material, an image, and 10 incomplete sentences.
- Bias for Best: Colorful images to understand the vocabulary and the situational contexts better; an example of a possible answer.
- Work for Washback: The test made an impact on the importance of teaching strategies to improve vocabulary, chunks, and reading comprehension.
Name: ______________________________
Date: __________________________
EDU 570: CLASSROOM-BASED EVALUATION
SELF-ASSESSMENT CHECKLIST FOR
THE ORIGINAL LANGUAGE TEST DEVELOPMENT PROJECT
This worksheet should serve as a checklist for you as you complete your original test
development project. It should be copied and pasted into the end of your report, since the
MOODLE will only allow you to upload one file in submitting an assignment. Please
complete it as a self-assessment, using the following symbols:
+  = good to excellent work; no questions or doubts in these areas
✓  = fair to good work; some doubts and/or some confusion here
-  = poor to fair work; many doubts and/or much confusion
NA = not applicable in this case
These entries will not be “used against you.” The information will be used solely to help us
improve the preparation for this assignment and to guide you in completing it. Please turn in
this checklist at the very end of your completed project.
1. In developing my original language test, I have defined the construct(s) to be
measured.
_____ The construct(s) to be measured is/are clearly defined and I used the professors’
feedback.
_____ The test has been written with some real audience and/or purpose in mind.
_____ I have explained the purpose(s) of my test in the report.
_____ I have included the construct chart as an appendix at the end of my paper.
2. I have drafted a language test with three subtests.
_____At least one subtest is subjectively scored.
_____I have included a multiple-choice subtest consisting of ten items.
_____I have included a key for the objectively scored portion(s) of the test as an appendix.
3. I have designed (or adapted) scoring procedures for the subjectively scored portion.
_____ I have written (or adapted) clear descriptors for the scoring levels.
_____ I have trained my additional rater using the descriptors.
_____ Since I used an existing rating scale, I have explained why I chose it.
_____ Since I adapted an existing rating scale, I have explained the adaptations I’ve made.
_____ I have cited the sources that influenced my scoring procedures.
4. I have developed a key for the objectively scored portion(s).
_____I have taken the test myself.
Original Test Development Assignment
_____I have made certain that there is one and only one correct answer to the items in the objectively scored section(s).
5. I have pre-piloted my test
_____ ...with at least two native speakers or proficient speakers of the target language.
_____ I have checked their responses against my predicted answer key and obtained their
feedback
about the clarity of the instructions.
_____ I have revised the test as needed and duplicated it.
_____If I designed the test for a class which I myself am not teaching, I have gotten the
teacher’s feedback on the draft.
6. I have administered my revised test in order to pilot test this version.
_____I piloted my test with at least twelve learners of the target language.
_____On the basis of the students’ performance and direct feedback, I have determined the
clarity of the instructions, stimulus material(s) and tasks.
7. I scored the test and analyzed the results.
_____I have drawn frequency polygons representing the students’ scores for each subtest
and for the total test. They are included in an appendix at the end of the paper.
_____I have calculated the descriptive statistics (mean, mode, median, range, variance, and standard deviation) for each subtest and for the total test, along with the total points possible for the total test and each subtest, and included this information in a table as an appendix.
_____I have evaluated the students’ performance on the subjectively scored part of the test
using at least two raters (myself and someone else).
_____I have computed ID and IF for each item in the objectively scored subtest(s).
_____ I have computed the average ID and average IF for the objectively scored subtest(s).
_____The ID and IF information is reported in a table in an appendix at the end of the paper.
_____I have correctly interpreted and discussed the results of each of these analyses.
_____I have computed inter-rater reliability for the subjectively scored portion(s) of the test.
_____I have explained what the inter-rater reliability index tells me about my rating system.
_____I have reported the correlations among the subtests in a correlation matrix in an
appendix.
_____I have correctly interpreted and explained the results of the statistics from the pilot
testing.
8. I have written a concise, coherent and well documented report of my project.
_____The body of my report is no more than five (5) pages long, typed, double-spaced, in
twelve-point font (but not counting the title page, appendices or reference list).
_____Based on my analyses, I have discussed the test's strengths and weaknesses.
_____I have analyzed my test in terms of the four traditional criteria (reliability, validity,
practicality, and washback).
_____I have included appropriate suggestions as to how my test could be improved in the
future.
_____My report clearly locates my work relative to the literature covered in the course and
other appropriate research which I have found.
_____The appendix has my chart about Wesche’s (1983) four components of a language test.
_____I have discussed my test in terms of Swain’s (1984) four principles of communicative
language testing (in a chart in an appendix).
_____I have read and properly cited Bailey, Brown, Wesche, Swain and other authors.
_____The construct definition chart, the rating scale, and the test itself are included in
appendices.
_____I have provided a complete and accurate reference list (using APA format).
_____I have personally checked the reference list for completeness and accuracy.
_____I understand that the grade is final and that I may not resubmit this paper to improve the
grade.
_____I have learned something by developing this original test and am proud of my work. =<)