Corpora? Give It a Try!
Zhang Ze, Allan
Vocabulary, as a cornerstone of a language, plays a significant role in our daily
reading, listening, writing, and speaking. Most of us accumulate words through
extensive reading and seem to believe that language is for practicing grammar
rules, which is a biased view. A more important function of language is
communicating and exchanging ideas. Thus, when we start to build up a store of
vocabulary, it is important to practice using it in spoken English. Unfortunately, very
few vocabulary books or reading materials provide a good reference for spoken English.
Corpora, however, can make up for this shortage.
This journal article will mainly focus on using corpus databases readily available
online to help improve one’s language proficiency. Moreover, I will give advice
on using corpus resources and share my own experience of learning vocabulary with
them.
A corpus is a large database containing a vast, structured set of texts. Studying
a corpus is a way of deriving, from real examples, the abstract rules by which a
natural language is governed or by which it relates to another language. Thus it
provides a special but effective way of learning spoken English.
Corpus resources are plentiful, and the majority of them are free to use. You can
easily get access to them online. Here are two of the largest and most commonly
used resources:
Corpus name: British National Corpus (BNC)
Description: Finished in 1994, the BNC is one of the largest corpora in the world.
It includes more than 100 million words of text samples of written and spoken
English from a wide range of sources.

Corpus name: Linguistic Data Consortium (LDC)
Description: LDC is an open consortium of universities, companies and [...]
speech and text databases, [...] research and development [...]
Though founded by different organizations, these two well-known resources offer a
comparable amount of text and examples. However, I recommend LDC for the
following reasons.
(1). LDC provides more ways to look up a word. For example, when you want to look
up a phrase with missing words, you can type the part of speech instead. If you type
in convert/V IN (IN stands for preposition; /V is a delimiter with no specific meaning),
the following search results can be found:
…a local school that had been converted into a disaster…
…farmland will be voluntarily converted into wetlands…
…a military vehicle converted for civilian use…
…He wanted the family to convert from their Episcopal to Catholicism …
…products that can be easily converted for other markets...
…The base was converted into a manufacturing zone and regional transport…
(2). LDC is much more practical, especially when you want to compare two similar
words or phrases. You can search for two or more words or phrases at one time, and
the corpus will show all the texts that include your key words. If you type in look/V
up|down to|upon, the following search results can be found:
...Africans tend to look down on African American...
...I looked up to see a family of five struggling to get to their room...
...It was as if Rush had looked down on his creation -- and then rested ...
...I don't like looking down on people by virtue of the color of their skin...
...other countries look up to the United States as the world’s pre-eminent economy...
...a lot of children look up to famous people before they look up to their parents...
...the hawk could look down on the world from his old vantage...
...I looked up to the American swimmers as role models...
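For readers who like to experiment, the idea behind such a combined search can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the way the LDC search engine works: the function names and the tiny SAMPLE list (a few of the lines quoted above) are hypothetical, the verb forms are listed by hand, and "on" is added to the prepositions only because the quoted results contain "look down on".

```python
import re

# Hypothetical mini-corpus: a few of the example lines quoted above.
SAMPLE = [
    "Africans tend to look down on African American",
    "I looked up to see a family of five struggling to get to their room",
    "a lot of children look up to famous people",
    "the hawk could look down on the world from his old vantage",
    "I looked up to the American swimmers as role models",
]


def alternation_pattern(verb_forms, particles, prepositions):
    """Build a regex: any verb form, then any particle, then any preposition."""
    def group(words):
        return "(?:" + "|".join(re.escape(w) for w in words) + ")"

    return re.compile(
        r"\b" + group(verb_forms)
        + r"\s+" + group(particles)
        + r"\s+" + group(prepositions) + r"\b",
        re.IGNORECASE,
    )


def search(lines, pattern):
    """Return (matched phrase, full line) for every line containing a match."""
    hits = []
    for line in lines:
        match = pattern.search(line)
        if match:
            hits.append((match.group(0).lower(), line))
    return hits


# Assumption: the inflected forms of "look" are simply enumerated by hand.
LOOK = alternation_pattern(
    ["look", "looks", "looked", "looking"],
    ["up", "down"],
    ["to", "on", "upon"],
)

for phrase, line in search(SAMPLE, LOOK):
    print(f"{phrase:<14} | {line}")
```

Note that a plain pattern like this cannot tell "looked up to see" (where "to" belongs to the infinitive) from "looked up to the American swimmers" (where "look up to" means admire). Reading the returned lines yourself is still necessary.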
Self-Access Language Learning Using Corpora
(1). A Simpler Way to Learn Polysemy (words which have multiple meanings)
I noticed that most of my classmates built their vocabulary mainly by using
vocabulary books, which are tedious and whose entries are hard to remember. Corpora,
however, provide us with a better way of learning words. They include thousands of
examples taken from recent novels, papers, and magazines. Compared to the abstract
definitions in vocabulary books, these vivid examples offer us a more effective way
of studying polysemy.
(2). A More Practical Way to Learn Idioms
Studying idioms is always one of the most difficult parts of learning English. Although
some reading books provide examples of idioms, these are far from enough to
make us proficient in using a specific idiom. The abundant examples contained in a
corpus can help us overcome this difficulty. If you search for have an eye for in LDC,
the following sentences are shown:
...she also had an eye for pretty women whom she like to dominate...
...has an eye for color and a sense of daring about story structure...
...he had an eye for artistic talent...
...These people have an eye for detail...
...has an eye for recognizing gaps...
...She has an eye for choosing objects...
...Gardeners who have an eye for esthetics...
From the above examples, we can easily understand the meaning of ‘have an eye
for’, which is to have a taste or an inclination for someone or something.
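Result lists like the one above are easier to read when the idiom is lined up in a column, a layout usually called key word in context (KWIC). The following is a minimal sketch of that layout, not the output format of any particular corpus website. The names kwic and format_rows are my own, the sample lines are taken from the results above, and the last sample line is an invented distractor that should not match.

```python
import re

# Sample lines from the results above, plus one invented distractor.
SAMPLE = [
    "he had an eye for artistic talent",
    "These people have an eye for detail",
    "She has an eye for choosing objects",
    "He kept an eye on the clock",  # invented; not a corpus line
]

# Assumption: the forms of "have" are enumerated by hand.
IDIOM = re.compile(r"\b(?:have|has|had|having)\s+an\s+eye\s+for\b", re.IGNORECASE)


def kwic(lines, pattern, width=20):
    """Return (left context, match, right context) for every match."""
    rows = []
    for line in lines:
        for match in pattern.finditer(line):
            left = line[:match.start()][-width:]
            right = line[match.end():][:width]
            rows.append((left.strip(), match.group(0), right.strip()))
    return rows


def format_rows(rows, width=20):
    """Right-align the left context so the key phrase forms one column."""
    return "\n".join(
        f"{left:>{width}}  [{key}]  {right}" for left, key, right in rows
    )


print(format_rows(kwic(SAMPLE, IDIOM)))
```

With the idiom in one column, the words that follow it (talent, detail, choosing objects) can be scanned at a glance, which is what makes the meaning easy to guess.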
(3). A Clearer Way to Learn Synonyms (words or phrases with the same meaning)
We have all been embarrassed at some time by confusing synonyms, because
synonyms are hard nuts to crack and the examples and explanations given in reference
books are either dogmatic or far from clear. Corpora, however, give us a more
practical way to study synonyms: they can show us the most commonly used
sentence patterns in both spoken and written English. Suppose you want to
compare persist with insist:
...the condition persists for three months...
...that condition persists for 200 years or more...
...job applicants persists to this day...
...have persisted over the years...
...the rash can persist for weeks...
...the U.S. aid and persist in our policy against drug trafficking...
...Bosnian Serbs persist in refusing to accept the peace settlement...
...The Serbs persist in rejecting it...
...reporters persisted in asking about it...
...system persist in applying a standard that is unjust and unfair...
...insisting the new Bosphorus rules have been under study for year...
...insisting they would not change the bill’s basic dynamic...
...Arafat insisted that he should be the one to go to Bashir...
...officials insisted that the standards were not changed...
...He insisted it was the military’s abusive behavior...
...the North Koreans insisted on leaving the matter for the two leaders to settle...
...he insisted on moving away from a noisy family...
...insist on finding ways to yawn defuse the situation...
...they insist on being sent to a refugee camp...
Through comparison, we can easily conclude that “insist” is often used to express an
opinion or idea and is often followed by “on”, while “persist” is mostly followed by
“in” or “for” and stresses duration.
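The same comparison can be made by counting which word comes directly after each verb. Below is a rough sketch, assuming a hand-copied sample of the lines above and a hypothetical helper called next_word_counts. The suffix list (s, ed, ing) is a simplification that happens to cover these two regular verbs.

```python
import re
from collections import Counter

# Hand-copied sample of the result lines quoted above.
SAMPLE = [
    "the condition persists for three months",
    "the rash can persist for weeks",
    "Bosnian Serbs persist in refusing to accept the peace settlement",
    "The Serbs persist in rejecting it",
    "reporters persisted in asking about it",
    "Arafat insisted that he should be the one to go to Bashir",
    "officials insisted that the standards were not changed",
    "the North Koreans insisted on leaving the matter for the two leaders to settle",
    "he insisted on moving away from a noisy family",
    "they insist on being sent to a refugee camp",
]


def next_word_counts(lines, stem):
    """Count the word that directly follows any simple form of the verb."""
    pattern = re.compile(
        r"\b" + re.escape(stem) + r"(?:s|ed|ing)?\s+(\w+)", re.IGNORECASE
    )
    counts = Counter()
    for line in lines:
        for match in pattern.finditer(line):
            counts[match.group(1).lower()] += 1
    return counts


for verb in ("persist", "insist"):
    print(verb, next_word_counts(SAMPLE, verb).most_common())
```

On this small sample, "persist" is followed by "in" three times and "for" twice, while "insist" is followed by "on" three times and "that" twice. A sample of ten lines proves nothing by itself, but the same count over a full corpus search would give a much firmer picture.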
In conclusion, corpora, a relatively new tool in linguistics, offer a useful
and effective way to learn a new language through a practical approach, especially for
us students, who have relatively few opportunities to talk with native speakers.
Specifically, they are particularly helpful in learning
polysemy, idioms, and synonyms. In the end, nobody is born a natural linguist.
Even the best teacher can hardly guarantee your improvement in language skills; as
the old saying goes, “Practice makes perfect”. Persistence is the only key to success.
1. Wikipedia. British National Corpus [online]. [undated]. Accessed: 27 February 2011.
2. Wikipedia. Linguistic Data Consortium [online]. [undated]. Accessed: 27 February 2011.