What is a word? - Nationaal Congres Engels

CEF and the British National Corpus
Bertil Geurts
25 March, 2011
Vocabulary and receptive skills
Which words contribute most to listening and reading competence for Dutch learners of English as a second language?
Status:
Kick-off of research aiming to label English words ‘A1’, ‘A2’, ‘B1’ or ‘B2+’ for testing and possibly teaching purposes
Incentive
Revision of VAS 2 test of English
Reading and vocabulary on three levels: BB, KBGT and HV
Which words can year-2 students in BB, KBGT and HV be tested on?
> Aim for ± CEF A2 (BB: A1/A2 … HV: A2/B1—)
> Waystage (van Ek, revised 1991)
Relevance of knowing words for L2 readers
Correlation
between reading (and listening) and word knowledge, especially at lower levels
Other correlations
familiarity with subject matter (CEF)
(reading and listening) strategies, probably more so at higher levels
syntax / grammar (disputed)
What is a word?
“The majority of the examples in the dictionary are
taken word for word from one of the texts in the
corpus.”
21 words (tokens)
14 different words (types): 5x the, 2x of, 2x in, 2x word
14 lemmas: the, majority, of, example, in, dictionary, be, take, word, for, from, one, text, corpus
14 word families
12 function words
9 content words
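The token and type counts above can be checked mechanically. The following is a minimal sketch, assuming that a word is any run of letters and that case is ignored; the variable names are illustrative only. Lemmas and word families are left out because they need a lemmatiser or a word-family list.

```python
import re
from collections import Counter

sentence = ("The majority of the examples in the dictionary are taken "
            "word for word from one of the texts in the corpus.")

# Tokens: every running word, lower-cased so "The" and "the" are one type.
tokens = re.findall(r"[a-z]+", sentence.lower())

# Types: the distinct word forms, each with its number of occurrences.
types = Counter(tokens)

print(len(tokens))  # 21 tokens
print(len(types))   # 14 types
print([(w, n) for w, n in types.most_common() if n > 1])
# [('the', 5), ('of', 2), ('in', 2), ('word', 2)]
```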
Word family
CONCLUDE
conclude (concluded, concludes, concluding): III
conclusion: III
conclusive: V
conclusively: VI
inconclusive: V
a foregone conclusion ?
jump to conclusions ?
in conclusion ?
conclude from ?
His essay had a very weak …, which left a poor final impression on the reader.
Can we demonstrate … that the factory caused the pollution?
There was … evidence the two students had committed plagiarism, so they went free.
The author … the article by suggesting topics for further research.
From: www.pbs.plymouth.ac.uk/academicwordlistatuop/
I. 1 – 680 (680 words)
II. 680 – 1720 (1040 words)
III. 1720 – 3300 (1580 words)
IV. 3300 – 6500 (3200 words)
V. 6500 – 14,600 (8100 words)
VI. 14,600 – … (all 20 million words of the corpus)
I and II: = 75% of all English usage
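A minimal sketch of how a word could be assigned to one of these bands, assuming the numbers are frequency ranks and that a rank equal to a limit (for example 680) falls in the lower band. The slide does not say which side the limits belong to, and the names `UPPER_LIMITS` and `band_for_rank` are hypothetical.

```python
from bisect import bisect_left

# Upper rank limits of bands I to V as listed above; band VI is open-ended.
UPPER_LIMITS = [680, 1720, 3300, 6500, 14600]
BANDS = ["I", "II", "III", "IV", "V", "VI"]

def band_for_rank(rank: int) -> str:
    """Return the frequency band for a 1-based frequency rank.

    Assumption: a rank equal to a limit belongs to the lower band.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or higher")
    return BANDS[bisect_left(UPPER_LIMITS, rank)]

# The band sizes follow from consecutive limits.
sizes = [hi - lo for lo, hi in zip([0] + UPPER_LIMITS, UPPER_LIMITS)]

print(band_for_rank(500), band_for_rank(2000), band_for_rank(20000))
print(sizes)  # [680, 1040, 1580, 3200, 8100]
```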
Input for labelling words
Breakthrough, Waystage, Threshold and Key English Test, Preliminary English Test
CEF(R)
BNC / Bank of English frequency lists
Teacher (expert) intuition
B2+
B1 1500
A2 900
A1 600
Breakthrough, Waystage, Threshold
Council of Europe Word Indexes for A1, A2, B1
(van Ek, 1978; revised 1991)
Lots of words to do with post, army, church, but no e-mail, internet, cell phone
Too bad: poste-restante, thingummyjig
Recent vocabulary needed, frequency relevant
Cambridge KET (A2) and PET (B1) updated regularly
CEF on vocabulary
Booklet p. 1
VOCABULARY RANGE
B2: Has a good range of vocabulary for matters connected to his/her field and most general topics. Can vary formulation to avoid frequent repetition, but lexical gaps can still cause hesitation and circumlocution.
B1: Has a sufficient vocabulary to express him/herself with some circumlocutions on most topics pertinent to his/her everyday life such as family, hobbies and interests, work, travel, and current events.
A2: Has sufficient vocabulary to conduct routine, everyday transactions involving familiar situations and topics. Has a sufficient vocabulary for the expression of basic communicative needs. Has a sufficient vocabulary for coping with simple survival needs.
A1: Has a basic vocabulary repertoire of isolated words and phrases related to particular concrete situations.
CEF: vocabulary clues
Booklet p. 1
Table 2. Common Reference Levels: self-assessment grid
Listening
A1: I can recognise familiar words and very basic phrases concerning myself, my family and immediate concrete surroundings when people speak slowly and clearly.
A2: I can understand phrases and the highest frequency vocabulary related to areas of most immediate personal relevance (e.g. very basic personal and family information, shopping, local area, employment). I can catch the main point in short, clear, simple messages and announcements.
B1: I can understand the main points of clear standard speech on familiar matters regularly encountered in work, school, leisure, etc. I can understand the main point of many radio or TV programmes on current affairs or topics of personal or professional interest when the delivery is relatively slow and clear.
B2: I can understand extended speech and lectures and follow even complex lines of argument provided the topic is reasonably familiar. I can understand most TV news and current affairs programmes. I can understand the majority of films in standard dialect.
Reading
A1: I can understand familiar names, words and very simple sentences, for example on notices and posters or in catalogues.
A2: I can read very short, simple texts. I can find specific, predictable information in simple everyday material such as advertisements, prospectuses, menus and timetables and I can understand short simple personal letters.
B1: I can understand texts that consist mainly of high frequency everyday or job-related language. I can understand the description of events, feelings and wishes in personal letters.
B2: I can read articles and reports concerned with contemporary problems in which the writers adopt particular attitudes or viewpoints. I can understand contemporary literary prose.
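CEF ± English in Dutch secondary schools
[Chart: approximate CEF levels (A1, A2, B1, B2) for listening and reading, per school type and year. The position of each label on the scale is not recoverable here.]
Listening: BB1, KB1, GT1, H1, V1, GT2, KB3, V3, GT4, H4, V5, V6
Reading: BB1, KB1, GT1, H1, V1, KB2, GT2, H2, KB3, GT3, BB4, V3, H4, H5, V6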
Booklet p. 2
Frequency (1)
Booklet pp. 2, 3
http://ucrel.lancs.ac.uk/bncfreq/
Companion Website for:
Word Frequencies in Written and Spoken English: based on the British National Corpus.
(2001) pp. 320, Longman, London.
Word          PoS               Freq   Ra   Disp
a             Det   %          21626  100   0.99
A / a         Lett  :            268  100   0.93
a bit         Adv   :            119   99   0.87
a great deal  Adv   :             14   96   0.95
a little      Adv   :            104  100   0.92
a lot         Adv   :             40   99   0.93
abandon       Verb  %             44   99   0.96
@             @     abandon       12   98   0.94
@             @     abandoned     26   97   0.96
@             @     abandoning     5   90   0.93
@             @     abandons       1   47   0.87
abbey         NoC   %             20   95   0.90
@             @     abbey         19   95   0.90
@             @     abbeys         1   34   0.75
Aberdeen      NoP   %             14   88   0.80
http://www.wordfrequency.info/
Word frequency lists and dictionary
from the Corpus of Contemporary American English
Frequency (2)
Word    PoS    Freq
the     Det    61847
of      Prep   29391
and     Conj   26817
a       Det    21626
in      Prep   18214
to      Inf    16284
it      Pron   10875
is      Verb   9982
to      Prep   9343
was     Verb   9236
I       Pron   8875
for     Prep   8412
that    Conj   7308
you     Pron   6954
he      Pron   6810
be*     Verb   6644
with    Prep   6575
on      Prep   6475
by      Prep   5096
at      Prep   4790
have*   Verb   4735
are     Verb   4707
not     Neg    4626
this    DetP   4623
's      Gen    4599
but     Conj   4577
had     Verb   4452
they    Pron   4332
his     Det    4285
from    Prep   4134
she     Pron   3801
that    DetP   3792
which   DetP   3719
or      Conj   3707
we      Pron   3578
's      Verb   3490
an      Det    3430
~n't    Neg    3328
were    Verb   3227
as      Conj   3006
do      Verb   2802
been    Verb   2686
their   Det    2608
has     Verb   2593
would   VMod   2551
there   Ex     2532
what    DetP   2493
will    VMod   2470
all     DetP   2436
if      Conj   2369
can     VMod   2354
her*    Det    2183
said    Verb   2087
who     Pron   2055
one     Num    1962
so      Adv    1893
up      Adv    1795
as      Prep   1774
them    Pron   1733
some    DetP   1712
when    Conj   1712
could   VMod   1683
him     Pron   1649
into    Prep   1634
its     Det    1632
then    Adv    1595
two     Num    1561
out     Adv    1542
time    NoC    1542
my      Det    1525
about   Prep   1524
did     Verb   1434
your    Det    1383
now     Adv    1382
me      Pron   1364
no      Det    1343
other   Adj    1336
only    Adv    1298
just    Adv    1277
more    Adv    1275
these   DetP   1254
also    Adv    1248
people  NoC    1241
know    Verb   1233
any     DetP   1220
first   Ord    1193
see     Verb   1186
very    Adv    1165
new     Adj    1145
may     VMod   1135
well    Adv    1119
should  VMod   1112
her*    Pron   1085
like    Prep   1064
than    Conj   1033
how     Adv    1016
get     Verb   995
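A rough illustration of why the most frequent words matter so much: the sketch below adds up the frequencies of the first ten entries in the list above. It assumes the Freq figures are occurrences per million running words, which the slide itself does not state; `coverage` is a hypothetical helper name.

```python
# Frequencies of the ten most frequent entries, copied from the list above.
# Assumption: the figures are occurrences per million running words.
TOP_TEN = {
    "the": 61847, "of": 29391, "and": 26817, "a": 21626, "in": 18214,
    "to (Inf)": 16284, "it": 10875, "is": 9982, "to (Prep)": 9343, "was": 9236,
}

def coverage(freqs_per_million):
    """Share of running text covered by the given entries (0.0 to 1.0)."""
    return sum(freqs_per_million) / 1_000_000

share = coverage(TOP_TEN.values())
print(f"{share:.1%}")  # 21.4%
```

Under that assumption, the ten most frequent entries alone account for about a fifth of all running words. Extending the sum over a full ranked list would show how many words are needed to reach a given share of text.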
Tasks and tries
See Conference Booklet CEF-BNC:
A: page 2
B: pages 2/3
C: page 4
D: page 4  5 – 7
E: page 5
F: pages 5  5 – 8
D, E, F
Pick of the plumbers is from GLTL 2010 re-examination: B1 most likely
Robber tries to hold up closed bank is in pretest revised VAS 2: A2
We’re all speaking Geek is from VWO 2010 re-examination: B2/B2+
I speak three languages fluently, am a black belt in karate, play league badminton, sing in a
choir and play the organ at a church. I raise money for two charities and help with autistic
children. I hope to retire from my plumbing work next year, aged 50. And no, Mr O’Neill, I
didn’t have any GCSE’s either.
But in no area of the culture is the collision more intense than over the English language,
for the web has changed English more radically than any invention since paper, and much
faster. According to Paul Payack, who runs the Global Language Monitor, there are
currently 998,974 words in the English language, with thousands more emerging every
month. By his calculation, English will adopt its one millionth word in late November. To
put that statistic another way, for every French word, there are now ten in English.