Word type - CLILLAC-ARP

Discovering corpora:

British National Corpus (BNC)
- Simple Search http://www.natcorp.ox.ac.uk/ (on the home page, click on "more" to find out the number of words and the date of the corpus)
- VIEW interface http://view.byu.edu/
Business Letter Corpus http://www.someya-net.com/concordancer/
British National Corpus (BNC), spoken sub-corpus
- VIEW interface http://view.byu.edu/ (under "sections", choose "spoken")
COMPARA http://193.136.2.104/COMPARA/psimples.php?language=en
Click on the menu on the left to find out more about the texts included in the corpus.
EuroParl http://opus.lingfil.uu.se/cwb/Europarl7/frames-cqp.html
Type the word in quotation marks and click on "run query". On the home page, click on "home" to find the number of words and the date of the corpus.
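| Word type | Word to search | BNC | BNC spoken | EuroParl EN | COMPARA EN ** | Business Letter Corpus |
|---|---|---|---|---|---|---|
| obsolete | counterpane | | | | | |
| new | MP3 | | | | | |
| common | with | | | | | |
| rare | epicure | | | | | |
| typically spoken | yeah | | | | | |
| literary | amiable | | | | | |
| technical | pelagic | | | | | |
| regional | lass | | | | | |
| sentimental | darling | | | | | |
| religious | rosary | | | | | |
| political | coalition | | | | | |
| foreign | rapporteur | | | | | |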
a. Which is the only corpus that has "counterpane"? Does this surprise you? Think of some more old-fashioned words and check which corpora they appear in.
b. Which is the only corpus with "MP3"? Do the dates of the texts included in the corpus suggest why this might be so?
c. Why do you think "with" appears in all five (sub-)corpora? Why do you think it is more frequent in some than in others?
d. Which is the only corpus that has "epicure"? Think of some more rare words and check which corpora they appear in. How big do you think a corpus has to be for you to find rare words in it?
e. Which two corpora do not have "yeah" in them? Why do you think it does not appear?
f. In which two corpora is "amiable" more frequent? Can you think of an explanation for this?
g. Which two corpora have the word "pelagic"? Why is a technical word like this unlikely to be found in the other three (sub-)corpora?
h. Which two corpora have the word "lass"? In which two corpora are regionally marked words like this least likely to be found?
i. Which corpus does not have "darling", and which corpus has only one occurrence of this word? Why do you think this is so?
j. Which two corpora have the word "rosary"? In which of them is the word comparatively more frequent? Why could this be?
k. "Coalition" appears in all five corpora. In which is it comparatively more frequent? Why?
l. The foreign word "rapporteur" does not appear in three of the English-language corpora, but in one it is exceptionally frequent. Why?
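Questions j and k ask where a word is "comparatively" more frequent. Because the five (sub-)corpora differ greatly in size, raw hit counts cannot be compared directly; a common solution is to normalise each count to occurrences per million words. A minimal Python sketch of that arithmetic follows. The corpus names and figures in it are hypothetical placeholders, not real search results: replace them with the counts and corpus sizes you find online.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw hit count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    return raw_count * 1_000_000 / corpus_size


# HYPOTHETICAL figures for illustration only: (hits, corpus size in words).
# Replace them with the numbers you actually obtain from each interface.
results = {
    "Corpus A": (120, 100_000_000),
    "Corpus B": (45, 3_000_000),
}

for name, (hits, size) in results.items():
    print(f"{name}: {hits} hits, {per_million(hits, size):.2f} per million words")
```

With these made-up numbers, Corpus B has far fewer raw hits but a much higher relative frequency (15.00 versus 1.20 per million words), which is the kind of contrast the questions ask you to look for.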
Read the description of the Business Letter Corpus. Given this information, decide which of the words and expressions below are likely to be very frequent in the corpus, and which are unlikely to be found in it. When you finish, use the corpus to test your predictions.
Cheerio
Thank you for
I am pleased to
very funny
I love you
We regret
looking forward to
Who’s there?
soup
Yours sincerely
a. Which of the above words and expressions is the most frequent in the corpus?
b. Which four search terms cannot be found in the corpus?
c. Were all your predictions right? If not, which results surprised you? Why?
Look for “work” and then for “works”, “working”, “worked” in the BNC online.
What can you say about the query results?
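Searching for one word form is not the same as counting every form of a word. The Python sketch below imitates exact-form searching on a short invented text: each form of "work" is counted separately, and a total for all the forms is only obtained by adding them up. It assumes, as the exercise implies, that each online query matches one exact word form; the function name and the sample text are hypothetical and not part of any corpus interface.

```python
import re
from collections import Counter


def form_counts(text: str, forms: list[str]) -> dict[str, int]:
    """Count each word form separately (whole words, case-insensitive)."""
    # Split the text into lower-case alphabetic tokens, so "Work," counts as "work".
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # Each form is looked up on its own: "work" never matches "works" or "working".
    return {form: counts[form] for form in forms}


# Invented sample text, used only to illustrate the counting.
sample = ("I work here. He works there. We worked all day "
          "and are still working. Work, work, work.")
by_form = form_counts(sample, ["work", "works", "working", "worked"])
print(by_form)
print("all forms together:", sum(by_form.values()))
```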
Look up the following strings of words in the BNC and write down their frequencies.
What can you conclude from your results?
It
It was
It was okay
It was okay as
It was okay as far
It was okay as far as I
It was okay as far as I could
It was okay as far as I could see
was okay as far as I could see
okay as far as I could see
as far as I could see
far as I could see
as I could see
I could see
could see
see
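The list above follows a simple pattern: the sentence is built up one word at a time from the left, then cut down one word at a time from the left until only the last word remains. The Python sketch below generates such query strings for any sentence; the function name is hypothetical. Because it produces every intermediate step, its output also contains "It was okay as far as", which the printed list does not include.

```python
def growing_and_shrinking(sentence: str) -> list[str]:
    """Build left-anchored prefixes, then right-anchored suffixes, of a sentence."""
    words = sentence.split()
    # Prefixes grow one word at a time: "It", "It was", ..., the whole sentence.
    prefixes = [" ".join(words[:i]) for i in range(1, len(words) + 1)]
    # Suffixes drop one word from the left each time, ending with the last word alone.
    suffixes = [" ".join(words[i:]) for i in range(1, len(words))]
    return prefixes + suffixes


for query in growing_and_shrinking("It was okay as far as I could see"):
    print(query)
```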
Below are sequential three-word clusters taken from the sentence "As a rule of thumb you need a litre of paint to every 12 square metres of wall." Which clusters are likely to turn up in the BNC? Which are unlikely to be found? Can you guess which will be the most frequent? Test your predictions and then discuss your results.
as a rule
a rule of
rule of thumb
of thumb you
thumb you need
you need a
need a litre
a litre of
litre of paint
of paint to
paint to every
to every 12
every 12 square
12 square metres
square metres of
metres of wall
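The sixteen clusters listed above are what you get by sliding a three-word window along the sentence one word at a time (a sequence of n words is often called an n-gram). The Python sketch below reproduces the list; the function name and the simple lower-casing and full-stop handling are assumptions made for this example, not features of the BNC interface.

```python
def clusters(sentence: str, n: int = 3) -> list[str]:
    """Return the sequential n-word clusters (n-grams) of a sentence, lower-cased."""
    # Lower-case and drop the final full stop so "wall." becomes "wall".
    words = sentence.lower().rstrip(".").split()
    # Slide a window of n words along the sentence, one word at a time.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = ("As a rule of thumb you need a litre of paint "
            "to every 12 square metres of wall.")
for cluster in clusters(sentence):
    print(cluster)
```

Changing `n` (for example `clusters(sentence, n=4)`) gives longer clusters if you want to extend the experiment.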
Look up the following pairs in the BNC (an asterisk marks the misspelt form): *payed/paid, *pronounciation/pronunciation, *accomodation/accommodation.
Using the BYU online interface of the BNC, look up a word such as "honestly" in the spoken sub-corpus and then in the fiction sub-corpus. What statistical conclusions can you draw?
Look up a word such as "congratulations" followed by a preposition in the Collins Wordbanks Online corpus.
What prepositions can be used following congratulations? Can they be used interchangeably?