corpus - languagehelper

April 15th, 2013
Ms. Amany AlKhayat
TLC session: Corpus Linguistics (A Practical Session)
Corpora tasks
1- Can you guess the most common words in English? Write only 3 of them. What's your
evidence? Compare your answers with your partner.
Check your answers against this evidence from the Corpus of Contemporary American
English (COCA), a corpus of 450 million words.
Tip! Why is frequency important?
Information! Tokens, Types
Types are word-forms and tokens are occurrences of word-forms. So, for example, in the
sentence 'The cat sat on the mat', there are two tokens of the type 'the' and one token each of
the types 'cat', 'sat', 'on', and 'mat'.
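The type/token distinction above can be checked with a few lines of Python. This is only a minimal sketch: the function name count_types_and_tokens is made up for illustration, and it assumes that 'The' and 'the' count as the same type (so the text is lowercased) and that a word is any run of letters or apostrophes.

```python
import re
from collections import Counter

def count_types_and_tokens(text):
    """Return a Counter mapping each type (word-form) to its number of tokens."""
    # Assumption: case is ignored, so 'The' and 'the' are one type.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

counts = count_types_and_tokens("The cat sat on the mat")
n_tokens = sum(counts.values())  # total occurrences of all word-forms
n_types = len(counts)            # number of distinct word-forms

print("tokens:", n_tokens)
print("types:", n_types)
print("tokens of 'the':", counts["the"])
```

For the example sentence this gives 6 tokens and 5 types, with 2 tokens of the type 'the'. Sorting the same counts by frequency is how a list of the most common words in a corpus is built.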
2- Take a few minutes to think of which words collocate with these 3 verbs: take, break and
catch. 
3- How do you deal with grammatical problems in class? (E.g., punctuation, collocation,
colligation, pronouns, reference resolution, coherence markers, etc.)
4- Now look at the word cloud below and try to decide which words collocate with take, catch
and break.
It’s time to practice with corpora
Now let's open
Check your answers for questions 2 and 4 using COCA.
Grammar Intuition vs. Corpus data:
Main website:
Quiz Builder
Just a description of corpora used in Lextutor:
Please feel free to search Lextutor or COCA for any words or phrases that you need
information about.