Saturday, October 12
Justine Neiderhiser, University of Michigan
Getting Started with AntConc
Concordance programs (or “concordancers”) help you investigate patterns of language use across a large number of texts. AntConc, like other concordancers, allows you to search for all instances of a particular item (e.g., a single word or a phrase such as “in order to”). You can also use AntConc to find collocations (words that frequently co-occur); word frequencies (a list of the most frequently occurring words in the corpus); and frequently appearing word clusters (e.g., three-, four-, or five-word clusters such as “the fact that,” “on the other hand,” and “due to the fact that”).
AntConc was developed by the corpus linguist Laurence Anthony and can be downloaded for free from his website (http://www.antlab.sci.waseda.ac.jp/software.html). There are also helpful tutorial videos available halfway down this page: http://www.antlab.sci.waseda.ac.jp/antconc_index.html
Stage 1: Installing AntConc and uploading your corpus
1. To download AntConc, go to: http://www.antlab.sci.waseda.ac.jp/software.html
2. Download the version appropriate to your platform. If you’re using Windows, click on the Windows download link. In the pop-up box, choose to save the file, then save it to your desktop. If you’re using a Mac, scroll down and click on the AntConc link for Mac.
3. Once you’ve downloaded the program, open it by clicking on the icon. (Note: You may be prompted to run the program, so click Run.)
4. You are now ready to upload your corpus into the program.
Note: AntConc will not allow you to upload Word or PDF files, so before uploading you first need to convert all your documents to TXT (or plain text) files and give them consistent file names. To convert MS Word documents to TXT files, open the doc (or docx) file and click Save As, then Other Formats. Then choose Plain Text under “Save as type.” For larger projects, you may want to download a program for about $20 that will allow you to convert all of your files at one time. To get the program, search for “MS Word Export To Multiple Text Files Software 7.0,” download the free trial version, then purchase the full version if you want to be able to actually use the TXT files it generates.
To upload your corpus, click on File, which is located at the top left of the screen. Then click Open Dir… In the window that appears, find the folder that contains your corpus, highlight it, and confirm your selection. Your corpus should now be uploaded into AntConc.
1 Materials developed by Zak Lancaster, Assistant Professor of English at Wake Forest University.
Congratulations! You are ready to begin concordancing.
Note: Running vertically at the left of the interface are all the files in your corpus. If you uploaded ten papers, for example, you will see ten files. The number 10 will also appear in the Total No. box at the bottom left. Running horizontally at the top of the page are seven tabs: Concordance, Concordance Plot, File View, Clusters, Collocates, Word List, and Keyword List. These are your different search tools. Below, the tools that you will use most frequently are reviewed.
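For readers who also want to inspect a corpus folder outside AntConc, the loading step can be sketched in a few lines of Python. This is only an illustration, not how AntConc itself works: the function name load_corpus and the UTF-8 encoding are assumptions.

```python
from pathlib import Path

def load_corpus(folder):
    """Return {file name: text} for every .txt file in `folder` (hypothetical helper)."""
    corpus = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # Assumes UTF-8 text; undecodable bytes are replaced rather than raising an error
        corpus[path.name] = path.read_text(encoding="utf-8", errors="replace")
    return corpus

# Usage (folder name is a placeholder):
#   texts = load_corpus("my_corpus")
#   print(len(texts))   # comparable to the Total No. box in AntConc
```

The number of entries in the returned dictionary should correspond to the number of files listed at the left of the AntConc interface, provided the folder contains only the .txt files you uploaded.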
Stage 2: Conducting searches
You can use the Concordance tool to search for a specific language item. For example, you may be curious about how the word “however” is used in your corpus. To find out, you could type “however” in the “Search Term” box toward the bottom of the page and find out how many concordance hits were retrieved.
Now you can begin to look for patterns. You might want to consider whether the word appears at the beginning of the sentence, the end, or in the middle. For each hit, you will be presented with minimal context: just the clause that contains the item. If you click directly on the highlighted item, you will be taken to the whole document. Often, you will need to consider this larger context carefully when analyzing the function of a particular language feature.
To accelerate your analyses, you can use the Kwic Sort tab below the “Search Term” box. This tool will (a) sort the words to the right or the left of your search item in alphabetical order and (b) highlight the words to the right or to the left of the item. To use this tool, first adjust the levels that are located at the very bottom of the screen. If you’re interested in words that appear to the left of “however,” then adjust the levels to 1L, 2L, and 3L. Then press Sort. This will highlight (in different colors) the first, second, and third words to the left of “however.”
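The idea behind a concordance search with left-context sorting can be sketched in Python as follows. This is a rough approximation under stated assumptions: the tokenizer (letters and apostrophes only), the function names kwic and sort_left, and the sample sentence are all illustrative and do not reproduce AntConc's own behavior.

```python
import re

def kwic(text, term, width=3):
    """Return (left words, hit, right words) for each occurrence of `term`."""
    words = re.findall(r"[A-Za-z']+", text)  # simplistic tokenizer (assumption)
    hits = []
    for i, w in enumerate(words):
        if w.lower() == term.lower():
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            hits.append((left, w, right))
    return hits

def sort_left(hits):
    """Sort hits by the 1st, then 2nd, then 3rd word to the left (like 1L, 2L, 3L)."""
    return sorted(hits, key=lambda hit: [w.lower() for w in reversed(hit[0])])

sample = ("I agree. However, I think the data are thin. "
          "In my view, however, the claim holds.")
for left, hit, right in sort_left(kwic(sample, "however")):
    print(" ".join(left).rjust(20), "|", hit, "|", " ".join(right))
```

Sorting on the reversed left context means the word immediately before the hit is compared first, which mirrors setting the first sort level to 1L.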
The Concordance Plot tool will show you (a) the particular file or files where your search item is found and (b) the exact location of the item in each file. Notice that each little vertical line represents one hit. (If you click on that hit, you will be taken to its location in the document.) This tool is useful for linking particular language items with their typical locations within a piece of writing. For example, you may discover that sentence-opening “however” occurs more frequently at the beginning of research articles than in the middle or end. Or you may discover that self mentions (e.g., “I” or “my argument”) occur commonly toward the end of an introductory section.
You have several options for using the Clusters tool. As one option, you can research how your particular item in question (for example, “however”) appears in two-, three-, or four-word clusters (e.g., “However, I think” or “In my view, however”). You can adjust the cluster size toward the bottom right: set the minimum size to 3 if you’re only interested in clusters of 3 or more words; set the maximum size to 5 if you want no more than 5-word clusters. As another option, you can research the most frequently appearing clusters of words in your corpus, without selecting a particular item to research. To do this, you need to tick the N-Grams box below the window. Then adjust the N-Gram size (the number of words in each cluster) to your preference. A good place to begin is with a minimum of 3 words and a maximum of 5 words. (6-word clusters are quite rare.) Finally, you should also adjust the Minimum N-Gram Frequency. Since your corpus is small, you can set this to 5 or 6. This means a cluster will be retrieved only if it appears at least 5 or 6 times in your corpus.
The Collocates tool performs a similar operation to the Clusters tool. The difference is that, while the Clusters tool retrieves strings of words that occur together in a series, the Collocates tool retrieves words that are most frequently associated with your search item. For example, the phrases “I believe that” and “the fact that” may be retrieved as frequently appearing 3-word clusters, but we would not say that the words in these phrases are collocates. An example of collocates is “high” and “probability.” These two words frequently co-occur (e.g., There is a high probability that it will remain sunny tomorrow), but “high” and “chance” do not. We would not say there is a “high chance” that something will happen; instead we might say there is a “good chance” that something will happen. Thus “good” and “chance” are collocates.
The Word List tool creates a list of the most frequently occurring words in your corpus. (Most likely, the definite article “the” is at the top of the list in your corpus.) This tool can be very useful for many research purposes. When analyzing your own writing, for example, it can help you to identify words that you may be overusing. If, for example, intensifiers are high on your list, you may be overusing these words in your academic writing.
Now is a good time to start experimenting with the interface. Start by asking yourself what aspects of your corpus (e.g., what grammatical patterns) you are interested in learning more about. Then try to use all five of the tools just reviewed to conduct some mini-investigations. As you are conducting searches, it is natural for your observations to lead you to new questions and thus to new observations. Follow this course of exploration for a while. Frequently, our most interesting observations, questions, and insights about language come about while we are investigating some other area of language! The exercises on the following pages will help you work through searches that might be relevant to your own research questions.
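As a rough illustration of what a word frequency list computes, the following Python sketch counts word tokens and word types across a few texts. The tokenizer (lowercased runs of letters and apostrophes) and the name word_list are assumptions; AntConc's own token definition may differ, so counts will not necessarily match.

```python
import re
from collections import Counter

def word_list(texts):
    """Count how often each word form occurs across a list of texts."""
    counts = Counter()
    for text in texts:
        # Lowercase so that "The" and "the" are counted together (assumption)
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

sample_texts = ["The cat saw the dog.", "The dog really, really ran."]
counts = word_list(sample_texts)
print("Word tokens:", sum(counts.values()))
print("Word types:", len(counts))
for word, freq in counts.most_common(3):
    print(freq, word)
```

Scanning the top of such a list for intensifiers or other habitual words is one way to follow up on the overuse question raised above.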
Use the following method to cite/reference AntConc according to the APA style guide:
Anthony, L. (YEAR OF RELEASE). AntConc (Version VERSION NUMBER) [Computer Software]. Tokyo, Japan: Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/
For example:
Anthony, L. (2011). AntConc (Version 3.2.2) [Computer Software]. Tokyo, Japan: Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/
Practicing Corpus Analysis with AntConc 2
The following steps demonstrate the capabilities of AntConc which can be used with any corpus of text files. You might have an existing corpus of text files which you’d like to use here, or you may need to develop your own. These steps will show you how to compare two separate corpora, so you will need to have at least two separate folders containing one or more text files. For this exercise, you might create a folder with one or more student papers, saved as .txt files, and one or more of your own papers, also saved as .txt files. Any files you are interested in searching will work for the purpose of this exercise.
AntConc has a very intuitive interface. It’s best simply to explore it. It can be helpful to work through these steps with someone as you learn about the capabilities of this software.
2 Materials developed by Nick Ellis, Professor of Psychology and Linguistics at the University of Michigan, and adapted by Justine Neiderhiser for CCCC 2013.
1. Load your first corpus of text files into AntConc.
FILE > OPEN DIR…
2. Make a frequency list of word types in this corpus.
WORD LIST, SORT BY FREQ, START
Questions to consider:
How many words are in the corpus (number of word tokens)? What words are most frequent? What might the top 20 words tell you about this corpus?
Clone the results (save window) and keep the window on the screen to the side.
Empty the working set.
FILE > CLOSE ALL FILES
Load your second corpus.
FILE > OPEN DIR…
5. Make a frequency list of word types.
WORD LIST, SORT BY FREQ, START
Questions to consider:
How many words (number of word tokens) are in the corpus? What words are most frequent? What might the top 20 words tell you about this corpus?
Clone the results (save window) and keep the window on the screen to the side.
From comparing the two frequency lists, make some observations of the differences between your two corpora. These observations will likely connect to differences between the corpora you selected. If you compared two different genres, for instance, you might make observations about generic differences between your texts. If you chose texts from writers with different levels of experience, you might make observations about differences between markers of experienced and novice writers.
Sort your second corpus alphabetically.
SORT BY WORD, SORT
Pick a word that you’d like to explore. Scroll up and down. How often does that word occur? What other variants of the word do you see in the corpus? Try to select a word that has multiple forms, for instance, something like “cause,” “caused,” “causing.”
Click on the word you have selected.
The Concordance window will open and show you a KWIC (Key Word in Context) view of all occurrences.
Sort these occurrences by level: one word to the right, two words to the right, three words to the right.
KWIC SORT 1R 2R 3R, SORT
Go back to Word List. Click on a variant of that word, if applicable. The Concordance window will open and show you a KWIC (Key Word in Context) view of all occurrences.
Sort these occurrences by level: one word to the right, two words to the right, three words to the right.
KWIC SORT 1R 2R 3R, SORT
Go back to Word List. Try repeating this search with other variants you notice.
Questions to consider:
What are you noticing about the use of this word in your corpus? By considering the contexts within which this word appears, you are actually examining its semantic prosody.
Searching the full corpus
FILE > CLOSE ALL FILES
FILE > OPEN DIR… CORPUS 1
FILE > OPEN DIR… CORPUS 2
In the following searches, you will be considering both of your corpora together. To begin, generate a word list of the most frequently occurring words in your corpora.
WORD LIST, SORT BY FREQ, START
9. In the Concordance window, search for all forms of your word by using a wildcard (*). For instance, if you were searching for “cause,” “caused,” and “causing,” you could find words that begin caus…
SEARCH TERM (Consider wildcards: caus*), START, SORT
Consider further the semantic prosody of the word you are searching by comparing its context with the context around other words that might convey the same (or similar) meaning. For example, if you were considering “cause,” you might compare its use to the use of “grow,” “lead to,” etc. in your corpus.
Questions to consider:
What seems unique about the way your word is being used in this corpus? What meanings does it seem to be used to communicate? How does this compare to other similar words in the corpus?
Examining collocations can help you summarize the context around your search term statistically.
COLLOCATES, SEARCH TERM (EXAMPLE: CAUS*), FROM 0 TO 4R, MIN COLLOCATE FREQ 4, SORT BY STAT, START
So what are the significant collocates of your term?
SORT BY FREQ (R), SORT
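To give a sense of what a collocation statistic measures, here is a minimal Python sketch that counts words in a right-hand window (0 to 4R, as in the step above) and scores them with pointwise mutual information. The choice of plain pointwise MI without span correction, the tokenizer, and the function name collocates are assumptions made for illustration; the statistic AntConc reports may be calculated differently.

```python
import math
import re
from collections import Counter

def collocates(text, node_pattern, right_span=4, min_freq=1):
    """Rank words found up to `right_span` words to the right of the node."""
    words = re.findall(r"[a-z']+", text.lower())
    n = len(words)
    freq = Counter(words)
    node_re = re.compile(node_pattern)
    node_count = 0
    co = Counter()
    for i, w in enumerate(words):
        if node_re.fullmatch(w):            # e.g. r"caus\w*" stands in for CAUS*
            node_count += 1
            co.update(words[i + 1:i + 1 + right_span])
    rows = []
    for w, observed in co.items():
        if observed >= min_freq:
            # Pointwise MI: log2 of observed co-occurrence over chance expectation
            mi = math.log2(observed * n / (node_count * freq[w]))
            rows.append((w, observed, round(mi, 2)))
    return sorted(rows, key=lambda r: r[2], reverse=True)

sample = ("Smoking causes cancer and stress causes illness "
          "while rain caused floods.")
for row in collocates(sample, r"caus\w*", right_span=1):
    print(row)
```

Sorting by the score corresponds loosely to SORT BY STAT, while sorting the same rows by the observed count corresponds to SORT BY FREQ. With a corpus this small the scores are not meaningful; the example only shows the mechanics.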
To see the text you are analyzing in context, use File View.
FILE > CLOSE ALL FILES
FILE > OPEN DIR… CORPUS 1
WORD LIST, FILE VIEW
Click on any of the files in the left window. You will see the original text and will be able to scroll around it to capture as much context as you like.
SEARCH TERM
This feature allows you to highlight all the instances of the word in the full context. Type in any word you’d like to explore in this box, and you will see how it is distributed across the text.
11. The Concordance Plot allows you to see the distribution of particular terms across multiple texts.
CONCORDANCE PLOT, SEARCH TERM
12. The Clusters/N-Grams tool will allow you to identify patterns in the phraseology of your corpus. These patterns can be genre-specific, so comparing them across corpora of different genres can be a generative space for research.
FILE > CLOSE ALL FILES
Now to identify the 3-, 4-, and 5-word formulas of language in your corpus.
FILE > OPEN DIR… CORPUS 1
CLUSTERS/N-GRAMS, SEARCH TERM, CHECK N-GRAMS, N-GRAM SIZE MIN 3 MAX 3, MIN FREQ 10, SORT BY FREQ
Clone Results (save window) and put on the side.
Repeat the steps above, except increase the size of your phrase to four tokens.
N-GRAM SIZE MIN 4 MAX 4
Clone Results (save window) and put on the side.
Repeat the steps above, except increase the size of your phrase to five tokens.
N-GRAM SIZE MIN 5 MAX 5
Clone Results (save window) and put on the side.
Questions to consider:
What do you observe about the phraseology of the language in your corpus? What might this tell you about genre?
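The N-gram counts produced in step 12 can be approximated with a short Python sketch. The tokenizer and the function name ngrams are assumptions; note also that this simple version lets n-grams run across sentence boundaries, which may or may not match how AntConc treats punctuation.

```python
import re
from collections import Counter

def ngrams(text, size, min_freq=1):
    """Return (n-gram, count) pairs of exactly `size` words, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = Counter(
        " ".join(words[i:i + size]) for i in range(len(words) - size + 1)
    )
    return [(g, c) for g, c in grams.most_common() if c >= min_freq]

sample = "On the other hand we left. On the other hand we stayed."
for gram, count in ngrams(sample, size=3, min_freq=2):
    print(count, gram)
```

Calling the function with size set to 3, then 4, then 5 parallels the three passes in the steps above, and min_freq plays the role of the minimum frequency setting.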
Now do the 3 steps under 12 above, but for your second corpus.
FILE > CLOSE ALL FILES
FILE > OPEN DIR… CORPUS 2
Questions to consider:
What do you observe about the phraseology of the language in this corpus? What do you observe about differences between the phraseology of the language in both of your corpora?
13. The Keyword List allows you to identify what words are unique to a particular corpus. This is best done by comparing your corpus to a much larger and more diverse corpus of language. For now, you can practice working with these tools by simply comparing your corpora to one another.
Set up the target corpus.
FILE > OPEN DIR… CORPUS 1
WORD LIST, SORT BY FREQ, START
Set up the comparison or background corpus.
Settings>Tool Preferences>KeyWord List
ADD DIRECTORY… CORPUS 2
You can repeat this process for any additional corpora you would like to use as a background for comparison.
LOAD, APPLY
KEYWORD LIST, START
These are sorted by Keyness (log-likelihood).
Questions to consider:
What is the nature of the vocabulary in your corpus? What words show a higher frequency in this corpus?
SORT BY FREQ, SORT
What is Keyness telling you? What is Frequency telling you?
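To make the difference between Keyness and Frequency more concrete, the sketch below computes a log-likelihood keyness score for each word in a target text against a reference text. It uses the standard two-corpus log-likelihood formula; the tokenizer, the function names, and the decision to list only words that are relatively more frequent in the target are assumptions, so the numbers need not match AntConc's output exactly.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

def log_likelihood(a, b, c, d):
    """a, b: word frequency in target / reference; c, d: corpus sizes in tokens."""
    e1 = c * (a + b) / (c + d)   # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)   # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_text, reference_text):
    t, r = Counter(tokens(target_text)), Counter(tokens(reference_text))
    c, d = sum(t.values()), sum(r.values())
    if c == 0 or d == 0:
        return []                # both corpora must contain at least one token
    rows = []
    for w, a in t.items():
        b = r[w]
        if a / c > b / d:        # keep words relatively more frequent in the target
            rows.append((w, a, round(log_likelihood(a, b, c, d), 2)))
    return sorted(rows, key=lambda row: row[2], reverse=True)

print(keywords("corpus corpus corpus data", "data data data text"))
```

A word can be very frequent yet have low keyness if it is equally frequent in the reference corpus, which is why the two sort orders in the step above can look so different.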
14. You can also use AntConc to deal with sets of words, for example semantic sets. To do this, you might develop a word list that is particularly interesting through qualitative analysis, then supplement that analysis with quantitative data. Start by choosing a word list that focuses upon relevant words. For instance, if you wanted to search for prescriptive language, you might create a word list of the following verbs: need, require, requires, required, must, demand, ought, should, obliged, etc. Save these words in a text file.
To count instances of these words in a set of files, first set up the target genre.
FILE > OPEN DIR… CORPUS 1
WORD LIST, Advanced, Use search terms from list below, Load File (FILE NAME), Apply, SORT BY FREQ, START
Then, to see the uses of these words:
CONCORDANCE, Advanced, Use search terms from list below, Load File (FILE NAME), Apply, START
Kwic sort: Level 1 0, Level 2 1R, SORT
COLLOCATES, Advanced, Use search terms from list below, Load File (FILE NAME), Apply, START
CLUSTERS/N-GRAMS, Advanced, Use search terms from list below, Load File (FILE NAME), Apply, N-GRAM SIZE MIN 3 MAX 5, MIN FREQ 1, SORT BY FREQ
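Counting a saved set of words can also be sketched in Python. The example assumes the word list is stored with one term per line; the function names and the sample sentence are illustrative only.

```python
import re
from collections import Counter

def read_word_set(lines):
    """Build a set of search terms from lines of text, one term per line."""
    return {line.strip().lower() for line in lines if line.strip()}

def count_word_set(text, word_set):
    """Count only those tokens in `text` that belong to `word_set`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w in word_set)

# The verbs come from the example above; in practice they could be read from
# a file, e.g. read_word_set(open("prescriptive.txt"))  (file name is hypothetical)
word_set = read_word_set(["need", "require", "requires", "required", "must",
                          "demand", "ought", "should", "obliged"])
sample = "You must cite sources. Writers should revise, and they must proofread."
print(count_word_set(sample, word_set).most_common())
```

Because the set lists each inflected form separately, forms that are missing from the file (for example “needs”) are not counted, which is worth checking when you build your own list.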
15. AntConc will also allow you to search for word sequences, such as phrases within phrases. For example, if you wanted to explore how “the point of” is used in your corpus, as in “from the point of view…”, you can do this using the N-grams tool.
CONCORDANCE, SEARCH TERM (Example: the point of), START
Kwic sort: Level 1 0, Level 2 1R, Level 3 1L, SORT
This will allow you to see the words collocated with the phrase you have searched.
In addition, you can search for slot and frame patterns to see what words appear within particular structures you specify.
CONCORDANCE, SEARCH TERM (Examples: is *ed, the * man, etc.), START
Kwic sort: Level 1 0, Level 2 1R, Level 3 1L, SORT
This will allow you to see the words that appear in the wildcard (*) slots of the structures you specify.
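Slot-and-frame searches like those above can be imitated with regular expressions. The sketch below converts a frame such as “is *ed” into a pattern in which * stands for any run of letters within a single word; that reading of the wildcard, the function name frame_to_regex, and the sample sentences are assumptions for illustration.

```python
import re

def frame_to_regex(frame):
    """Turn a frame such as 'is *ed' into a case-insensitive regular expression."""
    parts = []
    for word in frame.split():
        # Escape literal characters, then let * match any run of letters
        parts.append(re.escape(word).replace(r"\*", "[A-Za-z']*"))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)

text = "The sample is heated and was cooled; it is used daily."
print(frame_to_regex("is *ed").findall(text))

text2 = "The old man and the young man met the man."
print(frame_to_regex("the * man").findall(text2))
```

The same helper works for frames with a slot in the middle, so passive-style frames with a free word between two fixed words can be tried in the same way.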
16. AntConc can also be used for searches dealing with multiple frames. For instance, if you are interested in the use of the passive in academic texts, you could search some of the frames which use passive voice, like:
CONCORDANCE, Advanced, Use search terms from list below
is * by
was * by
are * by
Apply, START
Kwic sort: Level 1 0, Level 2 1R, SORT
This search will generate instances of phrases framed by the terms you specify around the wildcard (*).
17. AntConc allows you to save your outputs in order to enter them into statistics programs for calculation of contingency, collostructional analysis, etc. Most of these outputs will be generated as .txt files.
COLLOCATE, Search term, Window span 0 to 1R, Sort by stat
File > Save output to text file
These tab-separated files can be imported into Excel for sorting, merging, statistical analysis, etc. They can then serve as input to programs like Gries’ collocate