Corpus Linguistics: session 2

Corpus Linguistics (2):
The Tools of the Trade
http://tinyurl.com/669o4zt
martin.wynne@it.ox.ac.uk
ylva.berglund@it.ox.ac.uk
Today’s session
• An introduction to some features of
tools
• Demo of different (kinds of) tools
• Hands-on practice with one tool
AIM: Help you know what to look for in a
tool for your work (and what options
there are)
There are different
TYPES OF TOOLS
Different kinds of tools
• Online / offline
• For one particular corpus / for any corpus or
text
• Use straight away / need to prepare corpus
• 'Free' / licence conditions and costs
Tools may
• have different functions:
concordance, wordlist, statistics,
collocation, keywords…
• handle annotation:
interpret tags, ignore tags, treat tags as
text
• take different text formats:
.txt, .xml, .html
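The three ways of handling annotation listed above can be illustrated with a minimal Python sketch. The `<w pos="...">` markup and the tag names are hypothetical, chosen only for illustration; real corpora use their own annotation schemes, and real tools handle them in their own ways.

```python
import re

# A tiny annotated sample. The <w pos="..."> format is an assumption
# made for this example, not the format of any particular corpus.
tagged = '<w pos="VERB">borrow</w> <w pos="NOUN">money</w>'

# 1. Treat tags as text: split on whitespace, so markup ends up inside the "words".
as_text = tagged.split()

# 2. Ignore tags: strip anything in angle brackets, keep only the words.
ignored = re.sub(r"<[^>]+>", "", tagged).split()

# 3. Interpret tags: pair each word with its part-of-speech label.
interpreted = re.findall(r'<w pos="([^"]+)">([^<]+)</w>', tagged)

print(as_text)
print(ignored)
print(interpreted)
```

The first option shows why it matters to know what a tool does with tags: markup that is counted as text will distort wordlists and searches.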
Different tools have different functions.
TYPICAL FUNCTIONS
Concordance
• Search word + context
• Can be displayed as KWIC
• Can usually be sorted
• Used to see patterns of use
KWIC Concordance
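To make the idea of a sortable KWIC display concrete, here is a minimal Python sketch. It is not how any particular concordancer works; the function names `kwic` and `format_kwic`, the sample sentences, and the character-based context width are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Find each occurrence of keyword and return (left, match, right) tuples."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

def format_kwic(hits, width=30):
    """Align the keyword in a central column, as in a KWIC display."""
    return [f"{left:>{width}} {kw} {right:<{width}}" for left, kw, right in hits]

# Invented sample text, for illustration only.
sample = ("May I borrow your pen? You can borrow books from the library. "
          "They had to borrow money from the bank.")

hits = kwic(sample, "borrow", width=20)

# Sort by the context to the right of the search word.
hits_sorted = sorted(hits, key=lambda h: h[2].lower())

for line in format_kwic(hits_sorted, width=20):
    print(line)
```

Sorting on the right-hand context groups similar continuations together, which is what makes patterns of use visible in a concordance.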
Wordlist
List all words in the corpus
• alphabetically
• by frequency
Used as starting point for further functions
• keywords
• lexical density/readability calculations
Sampler AntConc wordlist
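A wordlist of this kind can be sketched in a few lines of Python. The tokenizer below (lowercase, letters and apostrophes only) is a simplifying assumption; real tools let you configure what counts as a word, and the sample text is invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Very simple tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def wordlist(text):
    """Count how often each word form occurs."""
    return Counter(tokenize(text))

# Invented sample text, for illustration only.
sample = "The cat sat on the mat. The dog sat too."

freq = wordlist(sample)

# The two orderings mentioned above.
by_frequency = freq.most_common()
alphabetical = sorted(freq.items())

print(by_frequency)
print(alphabetical)
```

Decisions made in the tokenizer (case, hyphens, apostrophes) change the counts, so wordlists from different tools are not always directly comparable.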
Collocations
Co-occurrence patterns
borrow money
borrow books
borrow a car
May I borrow
(more in Session 3)
Collocates:
adjectives immediately preceding BUSINESS
Corpus of Contemporary
American English
http://www.americancorpus.org/
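As a rough illustration of counting co-occurrences, the Python sketch below collects the words found within a small window around a node word. The function name `collocates`, the window parameters, and the sample sentences are assumptions for this example. It counts raw frequencies only and does not compute any association statistic.

```python
import re
from collections import Counter

def collocates(text, node, left=0, right=1):
    """Count words within `left` words before and `right` words after the node."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts

# Invented sample text, for illustration only.
sample = ("You can borrow money from the bank. Students borrow books. "
          "May I borrow money from you? I want to borrow a car.")

# Words immediately following 'borrow'.
right_neighbours = collocates(sample, "borrow", left=0, right=1)
# Words immediately preceding 'borrow'.
left_neighbours = collocates(sample, "borrow", left=1, right=0)

print(right_neighbours.most_common())
print(left_neighbours.most_common())
```

Note that this simple window ignores sentence boundaries, so a wider window would pick up words from neighbouring sentences.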
Visualization
Graphs
Word clouds
Distribution displays
Etc.
Example: BNCweb
borrow
Example: Voyant Tools
http://voyant-tools.org
‘borrow’
Compare your intuition to what you find in the corpus
What is borrowed and by whom?
What words do you expect to find together with borrow?
Can these words be grouped in some way, for example
based on their word class, function, or meaning?
Where would you expect these words (e.g. before or
after borrow? Immediately adjacent or not?)
Who do you think uses the word borrow? In what
context or type of language would you find borrow?
Are there any words that are NOT used with borrow?
AntConc
Download AntConc for free from:
http://www.antlab.sci.waseda.ac.jp/antconc_index.html
(or just search for AntConc)
Use your own texts and corpora. Find some examples
at:
http://www.ota.ox.ac.uk/
Tip of the week
Register to use
the BYU corpora
for free.
http://corpus.byu.edu
Next week (Session 3)
Collocation
Corpus linguists claim to have identified an important
principle that is responsible for the creation of much of the
meaning of texts: collocation (co-occurrence). What is
it, and are the claims true?
Optional reading:
* Xiao, Richard, and Tony McEnery (2006). "Collocation,
Semantic Prosody, and Near Synonymy: A Cross-Linguistic Perspective." Applied Linguistics 27(1): 103-129.
http://applij.oxfordjournals.org/cgi/content/full/27/1/103