Experiencing Vocabulary Learning Using Small Language Corpora

Višnja Kabalin Borenić, Department of Business Foreign Languages
Faculty of Economics and Business, University of Zagreb
Sanja Marinov, Department of Foreign Languages and PE
Faculty of Economics, University of Split
Martina Mencer Salluzzo, Department of Languages and Culture
Vern – University of Applied Sciences, Zagreb
This article examines university students' responses to a set of
exercises based on authentic corpus material. It aims to add to
the database of potential exercises derived directly from corpus
material. The research involved 51 students of business and
tourism who were asked to complete a variety of exercises
derived from corpus material and record their impressions in a
journal. Since they combine quantitative and qualitative data
(students’ success rates and comments), our results provide
reliable guidelines for the design of corpus-based exercises.
Research results revealed that some learners appreciate the
benefits of corpus consultation while others find it too
time-consuming or demanding. On the whole, the respondents
recognised the benefits of autonomous learning, intensive
reading and context reconstruction. We found the method
beneficial and practicable for intermediate and advanced level
students provided that it be introduced gradually.
Key words: small language corpus, corpus-based exercises,
vocabulary, university students, journal
The usefulness of corpus data for language teaching has long been
recognised (Krishnamurthy, 2001) and corpus-informed language teaching
materials are now taken for granted. A modern course book, for
example, is entirely corpus-informed (McCarthy, 2004) and all
major publishers now provide corpus-based dictionaries
(O'Keeffe et al., 2007: 17). Experimenting with the direct
application of corpus material and corpus methods in language
classrooms is a relatively recent phenomenon. Corpora
and concordancing were introduced in the language-learning
environment in 1969 (McEnery and Wilson, 1997: 12) but it
was Tim Johns’ (1986) work and his idea of data-driven
learning in the 1980s that spawned interest and further
empirical research (Tribble and Jones, 1990; Stevens, 1995;
Cobb, 1997). The tool, however, is not yet widely used in
language classrooms, and more empirical research is needed to
help disseminate the idea and encourage the use of corpus-driven
activities. More importantly, the research should
indicate new ways, and new language items that can be
presented in this way, to facilitate the application of corpus-driven
activities in the classroom. This is exactly the aim of this
paper: to set examples of possible tasks that can be designed
using a small corpus and to analyse how students react to them,
both in terms of their ability to solve the set problems and
opinions/attitudes towards the given type of exercise. In doing
so we hope to bring corpora directly into the classroom to help
teach, explain, or practice particular language items.
Vocabulary teaching and corpora
In order to be able to speak a language well it is essential to
have a wide range of vocabulary. This fact is now taken for
granted by both teachers and learners, but it has not always
been that way. Not so long ago it was grammar that was given
priority and words were seen as mere gap fillers of
predetermined syntactic language structures. It was the careful
study of language corpora that brought evidence of a vague and
almost non-existent borderline between grammar and lexis
(Sinclair, 1991). Carefully sorted corpus concordance lines
highlighted patterns that depend on particular lexical items
rather than syntactic structures and thus revealed that each
lexical item has a little “grammar” of its own. Today, we
development in our minds. Furthermore, regular language
classes cannot cover the huge number of vocabulary items that
students need to learn. Students need to be enabled to do a lot
of autonomous learning, and modern language instructors need
to teach them both how to learn vocabulary and what to
learn. Presenting corpus data in a variety of tasks can raise
students’ awareness of what there is to learn and how to do it.
Our sample consists of 51 undergraduate university students of
economics and tourism who have been learning English
for between 8 and 12 years. To make the sample more
representative of the student population in non-philological
The test
The students were given a test that consisted of three different
exercises based on and derived from a small corpus. The corpus
compiled and used in this study consists of 450 000 tokens and
is therefore classified as a small corpus. It represents a single
register and a single genre because it includes only texts from
the area of tourism, more specifically tour guides to
the Mediterranean countries. It was originally compiled as a
source of corpus-derived exercises in a project carried out with
students of tourism (Marinov, 2011) but is now used for
teaching purposes to address particular language issues when
Each exercise aims at a particular language problem that we
believed students at this level of language learning should be
able to cope with and understand but not without some
difficulty. We concentrated on teaching vocabulary as it is
believed that lexical information is much easier for learners to
notice and study (Gaskell and Cobb, 2004). Wanting to include
the elements of "student research" and knowing that students
find coping with the whole range of lines so discouraging that
they prefer and need guidance (Marinov, 2011), we provided
the material in the form of a shortened concordance. Our
research included three different exercises as described below.
Task 1 – the verb “run”
This task consists of an 83-line-long concordance of the verb
“run”. A concordance is a screen display or printout of a chosen
word or phrase in its different contexts, with that word or
phrase arranged down the centre of the display along with the
text that comes before and after it (McCarthy, 2004). The task
concentrates on meanings. “Run” is a word all our students are
familiar with but only with a narrow range of its meanings. The
aim of this exercise is, therefore, to extend this range and
possibly raise the students’ awareness that many other highly
frequent words have additional meanings to be learnt.
In the “Mediterranean Europe” corpus there are as many as 562
tokens of “run”, so the concordance had to be shortened. In
order not to lose the authenticity of the corpus data by editing it
(Flowerdew, 1996), we used a WordSmith Tools feature to
shorten the concordance automatically, which resulted in 83
lines. The concordance was then right-sorted, i.e. arranged
alphabetically by the words to the right of the node word.
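For readers who wish to prepare a comparable handout without a dedicated concordancer, the steps described above (extracting the lines, shortening the concordance, right-sorting it) can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of how WordSmith Tools works: the function names, the list of word forms, the sample text and, in particular, the evenly spaced thinning rule are all hypothetical choices made for illustration.

```python
import re

# Hypothetical list of word forms treated as the node word "run".
NODE_FORMS = {"run", "runs", "ran", "running"}


def concordance(text, node_forms, span=5):
    """Return (left context, node, right context) for each occurrence."""
    tokens = re.findall(r"[\w'-]+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() in node_forms:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, token, right))
    return lines


def thin(lines, target):
    """Shorten a concordance by keeping evenly spaced lines.

    Assumption: this simple rule only illustrates automatic shortening
    without editing the lines; it is not WordSmith's algorithm.
    """
    if target >= len(lines):
        return list(lines)
    step = len(lines) / target
    return [lines[int(k * step)] for k in range(target)]


def right_sort(lines):
    """Sort alphabetically by the words to the right of the node word."""
    return sorted(lines, key=lambda line: line[2].lower())


def show(lines, width=30):
    """Format lines with the node word aligned in a central column."""
    return [f"{left[-width:]:>{width}}  {node}  {right[:width]}"
            for left, node, right in lines]


# Invented sample text standing in for the corpus.
sample = ("The family run a small hotel. Buses run every hour to the port. "
          "Ferries run daily in summer. The museum is run by volunteers.")
lines = concordance(sample, NODE_FORMS)
for row in show(right_sort(lines)):
    print(row)
```

Applied to a full corpus, `thin(lines, 83)` would reduce a long concordance to 83 unedited lines before sorting, which preserves the authenticity of each line while keeping the handout manageable.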
Observing the students' responses, we wanted to find out the following:
how many and which of the meanings present the students managed to identify
which of the meanings were more easily identifiable
Task 2
The students were required to complete 14 gapped sentences
with one of the two phrasal verbs: make for or make up for, and
organize the information in their personal vocabulary files by
defining the meaning and providing an example sentence for
each phrasal verb. Finally, the students were asked to comment
on the task so that we could determine the following:
The overall accuracy (score) expressed in percentages.
Which of the phrasal verbs was understood better and
used more correctly.
Whether there is a connection between the score
achieved and the students' perception of task difficulty.
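The first two measures listed above can be computed as in the following sketch. It is offered only as an illustration: the answer key (seven gaps per verb) and the two students' answers are invented, since the actual items and responses are not reproduced here, and the function names are hypothetical.

```python
def accuracy(key, answers):
    """Share of gaps completed with the verb given in the key."""
    correct = sum(1 for expected, given in zip(key, answers) if expected == given)
    return correct / len(key)


def accuracy_by_verb(key, all_answers):
    """Accuracy for each phrasal verb, pooled over all students."""
    totals, hits = {}, {}
    for answers in all_answers:
        for expected, given in zip(key, answers):
            totals[expected] = totals.get(expected, 0) + 1
            hits[expected] = hits.get(expected, 0) + (expected == given)
    return {verb: hits[verb] / totals[verb] for verb in totals}


# Hypothetical key for the 14 gapped sentences (the real split is not given).
KEY = ["make for"] * 7 + ["make up for"] * 7

# Two invented students: A confuses the first two gaps, B makes no mistakes.
student_a = ["make up for"] * 2 + ["make for"] * 5 + ["make up for"] * 7
student_b = list(KEY)

for name, answers in (("A", student_a), ("B", student_b)):
    print(f"Student {name}: {accuracy(KEY, answers):.0%} correct")

by_verb = accuracy_by_verb(KEY, [student_a, student_b])
for verb, value in by_verb.items():
    print(f"{verb}: {value:.0%} correct across students")
```

Pooling the per-verb figures over all students, rather than averaging individual percentages, weights every gap equally, which is convenient when the two verbs do not occur equally often in the exercise.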
Task 3
In Task 3 students were asked to study the 16 examples and
deduce whether there was any difference in usage between made from
and made of. They were then, as in Task 1, asked to make a
vocabulary file entry for each of the collocations. In the journal
entry, they had to note their impressions and any difficulties
in solving the task.
Along with the corpus data the students were asked to keep a
journal in which they noted their opinions, feelings, and
difficulties encountered while doing each of the assigned tasks.
The journal consisted of generic questions and questions related to
specific tasks. Students were also asked to explain the path they
were taking while trying to solve the tasks.
Journal questions 1 and 2
The introductory generic questions were as follows: “Have you
already encountered this method of discovering meanings and
studying new vocabulary and its usage?” and
“Do you use search engines? Why? And when?”
As regards familiarity with similar tasks, the majority of
students (N = 39) responded negatively. When asked whether,
why, and when they used internet search engines, most students
(44) answered affirmatively but indicated different levels of
frequency. Judging by the answers obtained, we can conclude that
the majority interpreted search engines as the Internet, Google
Translate or computers in general. Several students mentioned
that they used search engines to verify expressions they could not
find in dictionaries. Three students mentioned wanting to see the
context in which a particular phrase or expression is used. Finally,
only one student mentioned looking for a similar authentic
document in the target language. It is clear that students should
be given direct, explicit instruction about the differences
between online dictionaries and search engines, and should be
taught how to use internet search engines to advance their
language learning.
Task analysis
Task 1 – meanings of “run”
The 83-line-long concordance included ten different meanings/usages
of “run”. Their frequency was established and is presented in the
second column of Table 1.
manage, operate, organize
transport, drive, ride
go berserk
10 to include everything within a group or type
Table 1: Meanings/usages of the verb “run” in the 83-line-long concordance
from the corpus “Mediterranean Europe”
The sample as a whole managed to identify all ten meanings of
the verb, but with varying success. The most easily identifiable
senses were 1 and 2, which were also the two most frequent
senses in the concordance. However, the rate of noticing is
not directly tied to the frequency of occurrence, as can be
seen from the example of the next most frequently noticed
sense (expire), which appears only once in the concordance. In
other words, there is no clear and measurable connection
between frequency of occurrence and the rate of recognition of
a particular sense. The rate of recognition can be influenced by
a number of factors, such as existing passive or active
knowledge of the word or sense, the immediate context, language
proficiency, and the seriousness with which a student has tackled the
task (motivation, interest, patience), which could itself be the
topic of a separate study.
Students also “invented” some meanings of their own:
they treated different uses of the same sense as separate
senses. Most frequently they interpreted the passive usage of
“run”, as in “well-run” or “run by”, as separate senses (24%).
Students' comments allowed us to establish how much they
liked the exercise, what major difficulties they encountered
and which strategies they used to find the solutions. A content analysis of
students' responses is presented in Table 2.
General impressions
1 easy/initial problems quickly resolved
2 interesting/useful
3 interesting but ... confusing/difficult/long
Major difficulties encountered
4 difficult to distinguish between the meanings
5 lacking or difficult context
6 understand the meaning but cannot explain
7 time consuming
Strategies used
8 used Internet/dictionaries to find out
9 re-reading
10 careful analysis and concentration (which is good and helps acquisition)
Table 2: Students' opinions/comments on Task 1: different senses of the verb “run”
Quoted below are two students' exact words which we selected
as extreme examples of the two ends of a spectrum of opinions.
Task 2 – phrasal verbs “make for” and “make up for”
The combined score for all students revealed a satisfactory
overall accuracy, with 80% of all sentences completed correctly.
More than 50% of students made fewer than 2 mistakes. At the
other end of the spectrum, 9.8% of students gave 7 or
fewer correct answers out of 14.
Altogether, the students had more difficulty understanding
make for than make up for.
The connection between the score and perception of task
difficulty could only be examined for the 19 students who
commented on the difficulty. The score and the perceived task
difficulty corresponded in only 9 cases. By contrast, 4 students
with high scores found the task difficult and expressed
uncertainty about their answers, and 6 students with very low
scores maintained the task was easy. To conclude, accurate
and inaccurate perceptions of task difficulty relative to one's
achievement appeared to be about equally widespread (9 and 10
students respectively), which we found surprising, as one would
expect a higher level of self-awareness among university students.
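The proportions behind this comparison can be checked with a short calculation. The sketch below uses only the counts reported in this section; the variable and function names are chosen here for illustration.

```python
# Counts reported for the students who commented on task difficulty.
matched = 9              # score and perceived difficulty corresponded
high_but_difficult = 4   # high score, task perceived as difficult
low_but_easy = 6         # very low score, task perceived as easy
sample_size = 51         # all students who took part


def share(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


commented = matched + high_but_difficult + low_but_easy
mismatched = high_but_difficult + low_but_easy

print(f"Commented on difficulty: {commented} of {sample_size} "
      f"({share(commented, sample_size)}%)")
print(f"Perception matched score: {share(matched, commented)}%")
print(f"Perception did not match: {share(mismatched, commented)}%")
```

Since only 19 of the 51 students commented on difficulty, these shares describe a subgroup and should be read with that limitation in mind.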
Task 3 – difference between made of and made from
Students’ answers were evaluated on a scale from 0 to 3: 0 - no
answer or completely wrong, 1 - fair, 2 - good, and 3 - very good.
An overview of their answers and journal comments is presented in Table 3.
Strategies applied | Perceived level of difficulty
No effort made, no difference noticed | An easy task
Strategies applied led to mistaken conclusions | An easy task
Difference briefly explained (from dictionary) | An easy task
Correct definitions provided with good examples of usage and explanation of one’s analytical approach / the logic applied | A difficult task
Table 3: Students' answers to Task 3: deducing the difference in meaning
between made of and made from
The quality of students’ answers rose with
the perceived level of task difficulty: the students who invested
little or no effort found the task easy, whereas the students who
chose to ponder the sentences and work out the meaning for
themselves found the task difficult but interesting and rewarding.
Clearly, this kind of task is best suited to highly motivated,
curious, and committed students.
Journal questions 4 and 5
Having completed three different corpus-based exercises the
students were required to outline what they saw as the
particular benefits of this approach to language learning and
suggest potential users. The analysis of students’ answers
regarding the advantages of corpus-based exercises revealed
the following:
1. Praise for the inductive approach and learner autonomy.
A significant number of students (19) appreciated the inductive
approach and learner empowerment. They found that corpus-based
exercises developed skills important for increasing the
quality of learning and understanding through parallel
observation of different examples of usage, practical
application of knowledge, autonomy in establishing rules,
creation of meaning from context and discovering relations
between language phenomena.
2. Strong emphasis on the importance of context.
Many students (18) emphasised the importance of context,
especially as it prevents literal translation, underpins deduction
of meaning and enhances long-term memory.
3. Generally positive remarks about the method.
A significant number of students thought that the corpus-based
approach enhanced vocabulary learning (12) and was more
interesting than traditional ways of learning (9). A smaller
group appreciated the intense practice, focus on details (4), the
abundance of examples (3) and positive effects of the
interaction between existing and newly acquired knowledge
Finally, as regards the potential beneficiaries of this approach,
students' responses fall into several distinct groups.
1. Emphasis on learners’ desire to learn.
Answers in this group (21) revolve around the idea of the desire
to learn as a prerequisite for employing this method. Seventeen
students would recommend this approach to persons willing to
invest effort and actively engage in building their vocabulary,
either for study or for work, and the remaining four would
recommend it to those whose English is weak but who wished
to learn more.
2. Emphasis on level of English
Twenty-two comments (44%) mention or focus on prospective learners'
level of English. Most students who fall into this group believe
this approach suits advanced learners (15), a category that
can include high school students as well. Three respondents
would recommend it to individuals who are too self-confident
about their level of English. Three students believe even
beginners in primary education would benefit from this method
and one thinks that both advanced learners and beginners
would find it beneficial. Finally, three students think the
approach would be useful for people with specific vocabulary
3. Emphasis on learners' professional or academic needs
This group recommends the method to individuals who are
interested in the English language itself, who focus on details,
i.e. to language students (3) or to Croatian politicians because
they “constantly embarrass us with their horrible English” (2).
4. Negative attitude
There are three students who would not recommend this
approach to anyone.
Using authentic corpus material is a rather innovative and
insufficiently explored teaching/learning method. In this
research three corpus-based tasks were given to university
students of business and tourism and their responses were
analyzed with respect to both accuracy/new learning and
their comments about this teaching/learning approach.
All three exercises presented a challenge to most students and
made them develop their own strategies for finding solutions and
answers. In line with previous empirical studies, our research
has shown that certain learners appreciate the benefits of corpus
consultation while others find it too time-consuming or
demanding (Chambers & O’Sullivan, 2004; Chambers, 2005).
Each of the three exercises caused different problems but they
also presented some common ground such as the lack of
context, difficult context, or the need to invest more time and
concentration on intensive reading. For some students the
strategy of intensive and concentrated reading made up for the
missing context, which they thus managed to reconstruct. Intensive
reading is a skill that is rarely practiced in regular language
courses. Therefore, the best practice would be to introduce
corpus work and intensive reading gradually, as a long-term process
and an integral part of the overall language-learning process
(Kennedy and Miceli, 2001). The problem of non-existent
context can also be tackled by encouraging students to use
various reference materials and examine longer stretches of
source texts.
A clear focus on the independent acquisition of new knowledge,
combining the more familiar with the less familiar, has
also been recognised. The process of language recycling, which helps
shift passive knowledge into active usage, is thus also initiated. The
depth and long-term retention of knowledge are recognised as the
main advantage of this approach. Apart from recognising the
value of learning vocabulary through several contextual
encounters (Cobb, 1997), certain students also found the
tasks challenging and therefore more motivating than
traditional types of exercises. The complexity of the task
increased motivation for higher (upper-intermediate or
advanced) level language learners, while decreasing it for those
at lower levels.
Finally, the appreciation of skills developed in the course of doing
the exercises (parallel observation, practical application of
knowledge, establishing rules, drawing conclusions, deduction
of meaning from context, finding relations between language
phenomena) clearly emphasises the importance of procedural
knowledge that is enhanced by this approach.
Based on the obtained data and students’ journal responses, we
can conclude that the method is beneficial and practicable for
intermediate and advanced level students, but the corpus data
should be edited to suit the particular students’ needs,
abilities and language proficiency. Some challenge should be
provided, to provoke interest and a feeling of success, but the
task should not be too long or difficult, so as not to discourage the
students. The list of obtained student opinions about corpus-driven
learning can be used as a solid starting point for future
1 Chambers, A. (2005). Integrating corpus consultation
procedures in language studies. Language Learning &
Technology 9 (2): 111-125.
2 Chambers, A., & O'Sullivan, Í. (2004). Corpus
consultation and advanced learners' writing skills in
French. ReCALL, 16(1): 158-172.
3 Cobb, T. (1997). Is there any measurable learning from
hands-on concordancing? System, 25, 301-315.
4 Flowerdew, J. (1996). “Concordancing in Language
Learning” in Pennington, M. (ed.), The Power of CALL:
97-113. Houston, TX: Athelstan.
5 Gaskell, D. and Cobb, T. (2004). “Can learners use
concordance feedback for writing errors?”. System 32:
301-319.
6 Johns, T. (1986). “Micro-concord: A language learner's
research tool”. System, 14 (2): 151-162.
7 Kennedy, C. & Miceli, T. (2001). An evaluation of
investigation. Language Learning & Technology 5(3):
8 Krishnamurthy, R. (2001). “Learning and Teaching
through Context - A Data-driven Approach”. TESOL
Spain Newsletter, Volume 24.
9 Marinov, S. (2011). The role of small specialised corpus
in teaching ESP, unpublished master’s thesis,
University of Zadar.
10 McCarthy, M. (2004). Touchstone - From corpus to
coursebook. Cambridge: CUP.
11 McEnery, T. and Wilson, A. (1997). Teaching and
language corpora, ReCALL 9 (1): 5-14
12 O'Keeffe, A. et al. (2007). From Corpus to Classroom.
Cambridge: Cambridge University Press
13 Scott, M. (2004). WordSmith Tools, version 4, Oxford:
Oxford University Press. ISBN: 0-19-459400-9.
14 Sinclair,
collocation. Oxford: Oxford University Press.
15 Stevens, V. (1995). “Concordancing with language
learners: Why? When? What?”. CAELL Journal, 6 (2):
16 Tribble, C. and Jones, G. (1990). Concordances in the
classroom. London: Longman
17 Willis, D. (1993). Syllabus, corpus and data driven
learning. IATEFL Conference Report: Plenaries