Handout: Getting Started with AntConc


WIDE-EMU 2013 Justine Neiderhiser (janeider@umich.edu)

Saturday, October 12

Exploring “Freedom” in First-Year Writing: A Corpus Approach for Comparing Instructor and Student-Generated Feedback

Justine Neiderhiser, University of Michigan

Getting Started with AntConc 1

Concordance programs (or “concordancers”) help you investigate patterns of language use across a large number of texts. AntConc, like other concordancers, allows you to search for all instances of a particular item (e.g., the word this or the phrase in order to). You can also use AntConc to find collocations (words that frequently co-occur, such as crystal and clear); word frequencies (a list of the most frequently occurring words in the corpus); and frequently appearing word clusters (e.g., three-, four-, or five-word clusters such as the fact that, on the other hand, or due to the fact that).

AntConc was developed by the corpus linguist Laurence Anthony and can be downloaded for free from his website: http://www.antlab.sci.waseda.ac.jp/software.html

There are also helpful tutorial videos available halfway down this page: http://www.antlab.sci.waseda.ac.jp/antconc_index.html

Stage 1: Installing AntConc and uploading your corpus

1. To download AntConc, go to: http://www.antlab.sci.waseda.ac.jp/software.html

2. Download the version appropriate to your platform. If you’re using Windows, click on AntConc 3.2.4w. In the pop-up box, click on the Save option. Then choose to Save to your desktop. If you’re using a Mac, scroll down and click on AntConc 3.2.4m.

3. Once you’ve downloaded the program, open it by clicking on the icon. (Note: You may be prompted to run the program; if so, click Run.)

4. You are now ready to upload your corpus into the program.

Note: AntConc will not allow you to upload Word or PDF files, so before uploading you first need to convert all your documents to TXT (or plain text) files and give them consistent file names. To convert MS Word documents to TXT files, open the doc (or docx) file and click Save As, then Other Formats. Then choose Plain Text under “Save as type.” For larger projects, you may want to download a program for about $20 that will allow you to convert all of your files at one time. To get the program, search for “MS Word Export To Multiple Text Files Software 7.0,” download the free trial version, then purchase the full version if you want to be able to actually use the TXT files it generates.

5. To upload your corpus, click on File, which is located at the top left of the screen. Then click Open Dir… In the window that appears, find the folder that contains your corpus, highlight it, and click OKAY.

Your corpus should now be uploaded into AntConc.

1 Materials developed by Zak Lancaster, Assistant Professor of English at Wake Forest University.


Congratulations! You are ready to begin concordancing.

Note: Running vertically at the left of the interface are all the files in your corpus. If you uploaded ten papers, for example, you will see ten files. The number 10 will also appear in the Total No. box at the bottom left. Running horizontally at the top of the page are seven tabs: Concordance, Concordance Plot, File View, Clusters, Collocates, Word List, and Keyword List. These are your different search tools. The tools that you will use most frequently are reviewed below.

Stage 2: Conducting searches

1. You can use the Concordance tool to search for a specific language item. For example, you may be curious about how the word however is used in your corpus. To find out, you could type however in the “Search Term” box toward the bottom of the page and see how many concordance hits are retrieved.

a. Now you can begin to look for patterns. You might want to consider whether the word appears at the beginning of the sentence, the end, or in the middle. For each hit, you will be presented with minimal context: just the clause that contains the item. If you click directly on the highlighted item, you will be taken to the whole document. Often, you will need to consider this larger context carefully when analyzing the function of a particular language feature.

b. To accelerate your analyses, you can use the sort tab below the “Search Term” box. This tool will (a) sort the words to the right or the left of your search item in alphabetical order and (b) highlight the words to the right or to the left of the item. To use this tool, first adjust the levels that are located at the very bottom of the screen. If you’re interested in words that appear to the left of however, adjust the levels to 1L, 2L, and 3L. Then press Sort. This will highlight (in different colors) the first, second, and third words to the left of however.
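If it helps to see what the Concordance tool and its sort levels are doing, the idea can be sketched in a few lines of Python. This is only an illustration, not AntConc's own code: the function names (tokenize, kwic, sort_left), the three-word span, and the sample sentences are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, term, span=3):
    """Return a (left context, hit, right context) tuple for each occurrence of term."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            hits.append((left, tok, right))
    return hits

def sort_left(hits):
    """Sort hits by 1L, then 2L, then 3L (the nearest word to the left comes first)."""
    return sorted(hits, key=lambda h: h[0][::-1])

# Hypothetical sample text standing in for a corpus file.
sample = ("However, I think the claim is weak. The author, however, disagrees. "
          "It is true, however, that more data would help.")

for left, hit, right in sort_left(kwic(tokenize(sample), "however")):
    print(f"{' '.join(left):>25} [{hit}] {' '.join(right)}")
```

Sorting on the reversed left context is what makes the word immediately to the left (1L) the first sort key, with 2L and 3L breaking ties.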

2. The Concordance Plot tool will show you (a) the particular file or files where your search item is found and (b) the exact location of the item in each file. Notice that each little vertical line represents one hit. (If you click on that hit, you will be taken to its location in the document.) This tool is useful for linking particular language items with their typical locations within a piece of writing. For example, you may discover that sentence-opening However occurs more frequently at the beginning of research articles than in the middle or end. Or you may discover that self-mentions (e.g., “I” or “my argument”) occur commonly toward the end of an introductory section.
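The logic behind a concordance plot can be approximated as follows. This is a rough sketch under assumptions: the file names and texts are invented, the 40-character width is arbitrary, and AntConc draws its plot graphically rather than with text characters.

```python
import re

def hit_positions(text, term):
    """Return each hit's position as a fraction of the text (0.0 = start)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return []
    return [i / len(tokens) for i, tok in enumerate(tokens) if tok == term]

def plot_line(positions, width=40):
    """Draw a one-line text 'barcode': '|' marks a hit, '-' marks no hit."""
    cells = ["-"] * width
    for p in positions:
        cells[min(int(p * width), width - 1)] = "|"
    return "".join(cells)

# Hypothetical two-file corpus.
papers = {
    "paper_01.txt": "However, the results vary. The sample was small. More work is needed.",
    "paper_02.txt": "The results vary. The sample, however, was small. However, more work is needed.",
}

for name, text in papers.items():
    print(f"{name}  {plot_line(hit_positions(text, 'however'))}")
```

Expressing each position as a fraction of the file length is what lets you compare where an item falls in files of different lengths.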

3. You have several options for using the Clusters tool. As one option, you can research how your particular item in question (for example, however) appears in two-, three-, or four-word clusters (e.g., However, I think or In my view, however). You can adjust the cluster size toward the bottom right: adjust the minimum size to 3 if you’re only interested in clusters of 3 or more words; adjust the maximum size to 5 if you want no more than 5-word clusters. As another option, you can research the most frequently appearing clusters of words in your corpus, without selecting a particular item to research. To do this, you need to tick the N-Grams box below the window. Then adjust the N-Gram size (or number of words in each cluster) to your preference. A good place to begin is with a minimum of 3 words and a maximum of 5 words. (6-word clusters are quite rare.) Finally, you should also adjust the Minimum N-Gram frequency. Since your corpus is small, you can set this to 5 or 6. This means a cluster will be retrieved only if it appears at least 5 or 6 times in your corpus.
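The N-Grams option amounts to counting every run of n consecutive words and keeping the runs that meet the minimum frequency. A minimal sketch follows, assuming a simple tokenizer and ignoring sentence boundaries (so a cluster may span two sentences); the function names and the sample text are illustrative only.

```python
from collections import Counter
import re

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequent_clusters(text, min_size=3, max_size=5, min_freq=2):
    """Count all clusters from min_size to max_size words; keep those seen min_freq times or more."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for n in range(min_size, max_size + 1):
        counts.update(ngrams(tokens, n))
    return [(cluster, freq) for cluster, freq in counts.most_common() if freq >= min_freq]

# Hypothetical sample text; a real corpus would be much larger.
sample = "On the other hand, we agree. On the other hand, we differ."
for cluster, freq in frequent_clusters(sample):
    print(freq, cluster)
```

With a real corpus you would raise min_freq to 5 or 6, as suggested above, so that one-off word sequences drop out of the list.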

4. The Collocates tool performs an operation similar to that of the Clusters tool. The difference is that, while the Clusters tool retrieves strings of words that occur together in a series, the Collocates tool retrieves words that are most frequently associated with your search item. For example, the phrases I believe that and the fact that may be retrieved as frequently appearing 3-word clusters, but we would not say that the words the and fact are collocates. An example of a pair of collocates is high and probability. These two words frequently co-occur (e.g., There is a high probability that it will remain sunny tomorrow), but high and chance are not collocates. We would not say there is a “high chance” that something will happen; instead we might say there is a “good chance” that something will happen. Thus good and chance are collocates.
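One simple way to picture collocate retrieval is to count the words that fall inside a window around each hit. The sketch below ranks by raw frequency only; it does not reproduce whatever association statistic AntConc computes, and the function name, window sizes, and sample text are assumptions.

```python
from collections import Counter
import re

def collocates(text, term, left=1, right=1, min_freq=1):
    """Count words appearing within `left` words before and `right` words after each hit."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return [(word, freq) for word, freq in counts.most_common() if freq >= min_freq]

# Hypothetical sample text.
sample = ("There is a good chance of rain. There is a good chance of snow. "
          "There is a high probability of wind. There is a slim chance of hail.")

# Words one position to the left of "chance".
print(collocates(sample, "chance", left=1, right=0))
```

Raw counts favor very common words such as the or of, which is one reason concordancers also offer a statistical ranking.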

5. The Word List tool creates a list of the most frequently occurring words in your corpus. (Most likely, the definite article the is at the top of the list in your corpus.) This tool can be very useful for many research purposes. When analyzing your own writing, for example, it can help you to identify words that you may be overusing. If, for example, the intensifiers really and very are high on your list, you may be overusing these words in your academic writing.
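A word list is essentially a frequency count over all the word tokens in the corpus. The following sketch shows the idea, assuming a simple lowercase tokenizer (AntConc's own token definition may differ) and an invented sample text.

```python
from collections import Counter
import re

def word_list(text):
    """Return a Counter mapping each word type to its number of occurrences."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Hypothetical sample text.
sample = ("The argument is really very strong. The evidence is really clear, "
          "and the conclusion is very, very convincing.")

freq = word_list(sample)
print("word tokens:", sum(freq.values()), "word types:", len(freq))
for rank, (word, count) in enumerate(freq.most_common(5), start=1):
    print(rank, count, word)
```

The distinction between tokens (every running word) and types (each distinct word) is the same one AntConc reports at the top of its Word List window.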

Now is a good time to start experimenting with the interface. Start by asking yourself what aspects of your corpus, e.g., what grammatical patterns, you are interested in learning more about. Then try to use all five of the tools just reviewed to conduct some mini-investigations.

As you are conducting searches, it is natural for your observations to lead you to new questions and thus to new observations. Follow this course of exploration for a while. Frequently, our most interesting observations, questions, and insights about language come about while we are investigating some other area of language! The exercises on the following pages will help you work through searches that might be relevant to your own research questions.

Citing/Referencing AntConc

Use the following method to cite/reference AntConc according to the APA style guide:

Anthony, L. (YEAR OF RELEASE). AntConc (Version VERSION NUMBER) [Computer Software]. Tokyo, Japan: Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/

Example: Anthony, L. (2011). AntConc (Version 3.2.2) [Computer Software]. Tokyo, Japan: Waseda University. Available from http://www.antlab.sci.waseda.ac.jp/


Practicing Corpus Analysis with AntConc 2

The following steps demonstrate the capabilities of AntConc which can be used with any corpus of text files. You might have an existing corpus of text files which you’d like to use here, or you may need to develop your own. These steps will show you how to compare two separate corpora, so you will need to have at least two separate folders containing one or more text files. For this exercise, you might create a folder with one or more student papers, saved as .txt files, and one or more of your own papers, also saved as .txt files. Any files you are interested in searching will work for the purpose of this exercise.

Exploring AntConc

AntConc has a very intuitive interface. It’s best simply to explore it. It can be helpful to work through these steps with someone as you learn about the capabilities of this software.

STEPS

1. Load your first corpus of text files into AntConc.

FILE > OPEN DIR

2. Make a frequency list of word types in this corpus.

WORD LIST

SORT BY FREQ

START

Questions to consider:

How many words are in the corpus (number of word tokens)?

What words are most frequent? What might the top 20 words tell you about this corpus?

Clone the results (save window) and keep the window on the screen to the side.

3. Empty the working set.

FILE > CLOSE ALL FILES

4. Load your second corpus.

FILE > OPEN DIR

5. Make a frequency list of word types.

WORD LIST

SORT BY FREQ

START

2 Materials developed by Nick Ellis, Professor of Psychology and Linguistics at the University of Michigan, and adapted by Justine Neiderhiser (janeider@umich.edu) for CCCC 2013.


Questions to consider:

How many words (number of word tokens) are in the corpus?

What words are most frequent? What might the top 20 words tell you about this corpus?

Clone the results (save window) and keep the window on the screen to the side.

6. From comparing the two frequency lists, make some observations of the differences between your two corpora. These observations will likely connect to differences between the corpora you selected. If you compared two different genres, for instance, you might make observations about generic differences between your texts. If you chose texts from writers with different levels of experience, you might make observations about differences between markers of experienced and novice writers.
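When the two corpora differ in size, raw counts can mislead, so it may help to compare frequencies per 1,000 words instead. The sketch below is one possible way to do that outside AntConc; the function names, the per-1,000 normalization, and the two tiny sample texts are assumptions for illustration.

```python
from collections import Counter
import re

def relative_freqs(text, per=1000):
    """Return each word's frequency per `per` words of running text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count * per / total for word, count in Counter(tokens).items()}

def compare(corpus_a, corpus_b, top=10):
    """List the words whose relative frequencies differ most between two corpora."""
    a, b = relative_freqs(corpus_a), relative_freqs(corpus_b)
    rows = [(word, a.get(word, 0.0), b.get(word, 0.0)) for word in set(a) | set(b)]
    # Largest absolute difference first; ties broken alphabetically.
    return sorted(rows, key=lambda r: (-abs(r[1] - r[2]), r[0]))[:top]

# Hypothetical stand-ins for a student corpus and an instructor corpus.
students = "I think I feel that I believe the essay works."
instructors = "The essay needs evidence. The claim needs support."

for word, freq_a, freq_b in compare(students, instructors, top=5):
    print(f"{word:<10} {freq_a:7.1f} {freq_b:7.1f}")
```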

7. Sort your second corpus alphabetically.

SORT BY WORD

SORT

Pick a word that you’d like to explore. Scroll up and down. How often does that word occur?

What other variants of the word do you see in the corpus? Try to select a word that has multiple forms, for instance, something like cause, caused, causing.

Click on the word you have selected.

The Concordance window will open and show you a KWIC (Key Word in Context) view of all occurrences.

Sort these occurrences by level: one word to the right, two words to the right, three words to the right.

KWIC SORT 1R 2R 3R SORT

Go back to Word List. Click on a variant of that word, if applicable.

The Concordance window will open and show you a KWIC (Key Word in Context) view of all occurrences.

Sort these occurrences by level: one word to the right, two words to the right, three words to the right.

KWIC SORT 1R 2R 3R SORT

Go back to Word List. Try repeating this search with other variants you notice.

Questions to consider:

What are you noticing about the use of this word in your corpus?

By considering the contexts within which this word appears, you are actually examining semantic prosody.


8. Searching the full corpus

FILE > CLOSE ALL FILES

FILE > OPEN DIR …CORPUS 1

FILE > OPEN DIR …CORPUS 2


In the following searches, you will be considering both of your corpora together. To begin, generate a word list of the most frequently occurring words in your corpora.

WORD LIST

SORT BY FREQ

START

In the Concordance window, search for all forms of your word by using a wildcard (*). For instance, if you were searching cause, caused, causing you could find words that begin caus…

SEARCH TERM (Consider wildcards: caus*)

START

SORT

Consider further the semantic prosody of the word you are searching by comparing its context with the context around other words that might convey the same (or similar) meaning. For example, if you were considering cause, you might compare its use to the use of bring about, grow, lead to, produce, create, etc. in your corpus.

Questions to consider:

What seems unique about the way your word is being used in this corpus?

What meanings does it seem to be used to communicate?

How does this compare to other similar words in the corpus?
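The wildcard search in this step (caus*) can be imitated with Python's standard fnmatch module, where * also stands for any run of characters. This is a sketch only: it assumes * means "zero or more characters", which may not match AntConc's wildcard settings exactly, and the sample text is invented.

```python
from collections import Counter
from fnmatch import fnmatchcase
import re

def wildcard_hits(text, pattern):
    """Count every word form in the text that matches a wildcard pattern such as caus*."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tok for tok in tokens if fnmatchcase(tok, pattern.lower()))

# Hypothetical sample text.
sample = ("Smoking causes harm. The delay caused confusion, causing more delays. "
          "The cause is unclear because the records are missing.")

for form, count in sorted(wildcard_hits(sample, "caus*").items()):
    print(count, form)
```

Because the pattern is anchored at the start of the word, a form such as because is not counted as a variant of cause.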

9. Examining collocations can help you summarize the context around your search term statistically.

COLLOCATES

SEARCH TERM (Example: caus*)

FROM 0 TO 4R

MIN COLLOCATE FREQ 4

SORT BY STAT

START

So what are the significant collocates of your term?

SORT BY FREQ (R)

SORT


10. To see the text you are analyzing in context, use File View.

FILE > CLOSE ALL FILES

FILE > OPEN DIR …CORPUS 1

WORD LIST

FILE VIEW

Click on any of the files in the left window. You will see the original text and will be able to scroll around it to capture as much context as you like.

SEARCH TERM

This feature allows you to highlight all the instances of the word in the full context. Type in any word you’d like to explore in this box, and you will see how it is distributed across the text.

11. The Concordance Plot allows you to see the distribution of particular terms across multiple texts.

CONCORDANCE PLOT

SEARCH TERM

12. The Clusters/N-Grams tool will allow you to identify patterns in the phraseology of your corpus. These patterns can be genre-specific, so comparing them across corpora of different genres can be a generative space for research.

FILE > CLOSE ALL FILES

Now identify the 3-, 4-, and 5-word formulas of language in your corpus.

FILE > OPEN DIR …CORPUS 1

CLUSTERS/NGRAMS

SEARCH TERM CHECK N-GRAMS

N-GRAM SIZE MIN 3 MAX 3

MIN FREQ 10

SORT BY FREQ

Clone Results (save window) and put on the side.

Repeat the steps above, except increase the size of your phrase to four tokens.

N-GRAM SIZE MIN 4 MAX 4

Clone Results (save window) and put on the side.

Repeat the steps above, except increase the size of your phrase to five tokens.

N-GRAM SIZE MIN 5 MAX 5

Clone Results (save window) and put on the side.

Questions to consider:

What do you observe about the phraseology of the language in your corpus?

What might this tell you about genre?


Now do the 3 steps under 12 above, but for your second corpus.

FILE > CLOSE ALL FILES

FILE > OPEN DIR …CORPUS 2

Questions to consider:

What do you observe about the phraseology of the language in this corpus?

What do you observe about differences between the phraseology of the language in both of your corpora?

13. The Keyword List allows you to identify which words are unusually frequent in a particular corpus. This is best done by comparing your corpus to a much larger and more diverse corpus of language. For now, you can practice working with these tools by simply comparing your corpora to one another.

Set up the target corpus.

FILE > OPEN DIR …CORPUS 1

WORD LIST

SORT BY FREQ

START

Set up the comparison or background corpus.

a. Settings > Tool Preferences > KeyWord List

ADD DIRECTORY …CORPUS 2

b. You can repeat this process for any additional corpora you would like to use as a background for comparison.

c. LOAD

d. APPLY

KEYWORD LIST

START

These are sorted by Keyness (log-likelihood).

Questions to consider:

What is the nature of the vocabulary in your corpus? What words show a higher frequency in this corpus?

SORT BY FREQ

SORT

What is Keyness telling you? What is Frequency telling you?
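To get a feel for what a log-likelihood keyness score measures, here is a sketch of one widely used two-corpus formulation: compare each word's observed frequency in the target and background corpora with the frequency you would expect if the word were spread evenly across both. This is an assumption about the general technique, not a description of AntConc's exact calculation, and the function names and counts are invented.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-corpus log-likelihood score for one word (0 means no difference in rate)."""
    total = size_target + size_ref
    combined = freq_target + freq_ref
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score

def keywords(target_counts, ref_counts):
    """Rank words that are relatively more frequent in the target corpus by their score."""
    size_t, size_r = sum(target_counts.values()), sum(ref_counts.values())
    scored = []
    for word, freq in target_counts.items():
        ref_freq = ref_counts.get(word, 0)
        # Cross-multiplied comparison of relative frequencies (avoids division).
        if freq * size_r > ref_freq * size_t:
            scored.append((word, log_likelihood(freq, size_t, ref_freq, size_r)))
    return sorted(scored, key=lambda pair: -pair[1])

# Hypothetical word counts for a target corpus and a background corpus.
target = {"freedom": 20, "the": 50}
background = {"freedom": 2, "the": 50, "data": 18}
for word, score in keywords(target, background):
    print(f"{word:<10} {score:.2f}")
```

The contrast with a plain frequency sort is the point of the question above: the appears often in both corpora, so it is frequent but not key, while a word concentrated in the target corpus scores high on keyness even with a modest count.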


14. You can also use AntConc to work with sets of words, for example semantic sets. To do this, you might develop a word list that is particularly interesting through qualitative analysis, then supplement that analysis with quantitative data. Start by choosing a word list that focuses on relevant words. For instance, if you wanted to search for prescriptive language, you might create a word list of the following verbs: need, require, requires, required, must, demand, ought, should, obliged, etc.

Save these words in a text file.

To count instances of these words in a set of files:

First, set up the target genre.

FILE > OPEN DIR …CORPUS 1

WORD LIST

Advanced

Use search terms from list below

Load File (FILE NAME)

Apply

SORT BY FREQ

START

Then, to see the uses of these words:

CONCORDANCE

Advanced

Use search terms from list below

Load File (FILE NAME)

Apply

START

Kwic sort

Level 1 0

Level 2 1R

SORT

COLLOCATES

Advanced

Use search terms from list below

Load File (FILE NAME)

Apply

START


CLUSTERS/NGRAMS

Advanced

Use search terms from list below

Load File (FILE NAME)

Apply

N-GRAM SIZE MIN 3 MAX 5

MIN FREQ 1

SORT BY FREQ
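Counting a saved set of words, as in step 14, comes down to loading one term per line and tallying only the tokens that belong to the set. The sketch below assumes that one-term-per-line layout and uses an in-memory list in place of a real file; the function names and the sample text are illustrative.

```python
from collections import Counter
import re

def load_terms(lines):
    """Build a set of search terms from lines of text, one term per line; blank lines are skipped."""
    return {line.strip().lower() for line in lines if line.strip()}

def count_terms(text, terms):
    """Count only those tokens that belong to the set of search terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tok for tok in tokens if tok in terms)

# Stand-in for the contents of a saved word-list file (hypothetical).
term_lines = ["need\n", "require\n", "must\n", "\n", "should\n", "ought\n"]
terms = load_terms(term_lines)

# Hypothetical sample of instructor feedback.
feedback = "You must revise the thesis. You should cite sources, and you MUST proofread."
for word, count in count_terms(feedback, terms).most_common():
    print(count, word)
```

To read the terms from an actual file, you could pass an open file object to load_terms, since iterating over a file yields its lines.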


15. AntConc will also allow you to search for word sequences, such as phrases within phrases. For example, if you wanted to explore how the point of is used in your corpus, as in from the point of view…, you can do this using the Concordance tool.

CONCORDANCE

SEARCH TERM (Example: the point of)

START

Kwic sort

Level 1 0

Level 2 1R

Level 3 1L

SORT

This will allow you to see the words collocated with the phrase you have searched.

In addition, you can search for slot and frame patterns to see what words appear within particular structures you specify.

CONCORDANCE

SEARCH TERM (Examples: is *ed, the * man, etc.)

START

Kwic sort

Level 1 0

Level 2 1R

Level 3 1L

SORT

This will allow you to see the words that appear in the wildcard (*) slots of the structures you specify.
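A slot-and-frame search such as the * man or is *ed can be approximated with a regular expression in which the wildcard becomes a run of letters. The sketch below tallies whatever fills the first wildcard slot. It rests on assumptions: here * must match at least one character, words must be separated by whitespace only, and the function name and sample text are invented.

```python
from collections import Counter
import re

def slot_fillers(text, frame):
    """Count the words that fill the first wildcard piece of a frame such as 'the * man'."""
    if "*" not in frame:
        raise ValueError("frame must contain at least one * wildcard")
    parts = []
    for piece in frame.lower().split():
        # Escape literal characters, then let * stand for one or more letters.
        rx = re.escape(piece).replace(r"\*", r"[a-z']+")
        # Capture any piece that contains a wildcard so it can be tallied.
        parts.append(f"({rx})" if "*" in piece else rx)
    pattern = re.compile(r"\b" + r"\s+".join(parts) + r"\b")
    return Counter(m.group(1) for m in pattern.finditer(text.lower()))

# Hypothetical sample text.
sample = ("The old man saw the young man. The tall woman left. "
          "The piece is called art and is painted red.")

print(slot_fillers(sample, "the * man"))
print(slot_fillers(sample, "is *ed"))
```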


16. AntConc can also be used for searches dealing with multiple frames. For instance, if you are interested in the use of the passive in academic texts, you could search some of the frames which use passive voice, like: was|are|is verb by, was|are|is verb in, was|are|is verb on.

CONCORDANCE

Advanced

Use search terms from list below

is * by

was * by

are * by

Apply

START

Kwic sort

Level 1 0

Level 2 1R

SORT

This search will generate instances of phrases framed by the terms you specify around the wildcard (*).
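Searching several frames at once can be pictured as joining the individual frame patterns into one alternation. The sketch below uses the three frames from this step; the helper names and the sample text are assumptions, and * is taken to mean exactly one word.

```python
import re

# The three frames listed in the step above.
FRAMES = ["is * by", "was * by", "are * by"]

def frame_to_regex(frame):
    """Turn a frame such as 'was * by' into a regex; * matches exactly one word."""
    words = [r"[a-z']+" if w == "*" else re.escape(w) for w in frame.lower().split()]
    return r"\b" + r"\s+".join(words) + r"\b"

def find_frames(text, frames=FRAMES):
    """Return every stretch of text that matches any of the frames, in order of appearance."""
    combined = re.compile("|".join(f"(?:{frame_to_regex(f)})" for f in frames))
    return [m.group(0) for m in combined.finditer(text.lower())]

# Hypothetical sample text.
sample = ("The essay was written by a student. Drafts are reviewed by peers. "
          "The delay is by design.")

for hit in find_frames(sample):
    print(hit)
```

A frame like this also retrieves strings that are not passives (for example, was standing by), so the hits still need to be read in context.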

17. AntConc allows you to save your outputs in order to enter them into statistics programs for calculation of contingency, collostructional analysis, etc. Most of these outputs will be generated as .txt files.

COLLOCATES

Search term (Example: give )

Window span 0 to 1R

Sort by stat

File>Save output to text file

These tab-separated files can be imported into Excel for sorting, merging, statistical analysis, etc. They can then serve as input to programs like Gries’ collocate
