Corpora? Give Them a Try!

Zhang Ze, Allan

Vocabulary, as a cornerstone of a language, plays a significant role in our daily reading, listening, writing, and speaking. Most of us accumulate words through extensive reading and seem to believe that language is mainly for practicing grammar rules, which is a biased view. A more important function of language is communicating and exchanging ideas. Thus, when we start to build up a store of vocabulary, it is important to practice using it in spoken English. Unfortunately, very few vocabulary books or reading materials provide a good reference for oral English. Corpora, however, can make up for this shortage.

This journal article focuses mainly on using corpus databases that are readily available online to help improve one’s language proficiency. I will also give advice on using corpus resources and describe my own experience of learning vocabulary with corpora.

Resources

A corpus is a large, structured collection of texts. Studying such a collection lets us derive, from real examples, the abstract rules by which a natural language is governed or by which it relates to another language. It thus provides a special but effective way of learning oral English. Corpus resources are plentiful, and the majority of them are free to use and easily accessible online. Here are two of the largest and most commonly used:

Corpus name: British National Corpus (BNC)
URL: http://www.natcorp.ox.ac.uk/
Description: BNC, finished in 1994, is one of the best-known large corpora of English. It includes more than 100 million words of text samples of written and spoken English from a wide range of sources [1].

Corpus name: Linguistic Data Consortium (LDC)
URL: http://www.ldc.upenn.edu/
Description: LDC is an open consortium of universities, companies and government research laboratories.
It creates, collects and distributes speech and text databases, vocabulary, and other resources for linguistics research and development purposes [2].

Though founded by different organizations, both of these well-known resources offer a large body of texts and examples. However, I recommend LDC for the following reasons.

(1). LDC provides more ways to look up a word. For example, when you want to look up a phrase with missing words, you can type the part of speech instead. If you type in convert/V IN (IN stands for a preposition; /V appears to mark convert as a verb, since the results include inflected forms such as converted), the following search results can be found:

…a local school that had been converted into a disaster…
…farmland will be voluntarily converted into wetlands…
…a military vehicle converted for civilian use…
…He wanted the family to convert from their Episcopal to Catholicism …
…products that can be easily converted for other markets...
…The base was converted into a manufacturing zone and regional transport…

(2). LDC is much more practical, especially when you want to compare two similar words or phrases. You can search for two or more words or phrases at one time, and the corpus will show all the texts that include your key words. If you type in look/V up|down to|upon, the following search results can be found:

...Africans tend to look down on African American...
...I looked up to see a family of five struggling to get to their room...
...It was as if Rush had looked down on his creation -- and then rested ...
...I don't like looking down on people by virtue of the color of their skin...
...other countries look up to the United States as the world’s pre-eminent economy...
...a lot of children look up to famous people before they look up to their parents...
...the hawk could look down on the world from his old vantage...
...I looked up to the American swimmers as role models...

Self-Access Language Learning Using Corpus

(1). A Simpler Way to Learn Polysemy
(words that have multiple meanings)

I noticed that most of my classmates build their vocabulary mainly by using vocabulary books, which are tedious and whose contents are hard to remember. Corpora, however, provide us with a better way of learning words. A corpus includes thousands of examples drawn from recent novels, papers, and magazines. Compared with the abstract definitions in vocabulary books, these vivid examples offer us a more effective way of studying polysemy.

(2). A More Practical Way to Learn Idioms

Studying idioms is always one of the most difficult parts of learning English. Although some reading books provide examples of idioms, these are far from enough to make us proficient in using a specific idiom. The ample examples contained in a corpus can help us overcome this difficulty. If you search for have an eye for in LDC, the following sentences are shown:

...she also had an eye for pretty women whom she like to dominate...
...has an eye for color and a sense of daring about story structure...
...he had an eye for artistic talent...
...These people have an eye for detail...
...has an eye for recognizing gaps...
...She has an eye for choosing objects...
...Gardeners who have an eye for esthetics...

From the above examples, we can easily understand the meaning of ‘have an eye for’, which is to have a taste or an inclination for someone or something.

(3). A Clearer Way to Learn Synonyms (words or phrases with the same meaning)

We have all experienced the embarrassment of confusing synonyms, because synonyms are hard nuts to crack and the examples and explanations given in reference books are either dogmatic or far from clear. Corpora, however, give us a more practical way to study synonyms: they can show us the most commonly used model sentences in both oral and written English. For example, when you want to compare persist with insist:

...the condition persists for three months...
...that condition persists for 200 years or more...
...job applicants persists to this day...
...have persisted over the years...
...the rash can persist for weeks...
...the U.S. aid and persist in our policy against drug trafficking...
...Bosnian Serbs persist in refusing to accept the peace settlement...
...The Serbs persist in rejecting it...
...reporters persisted in asking about it...
...system persist in applying a standard that is unjust and unfair...

...insisting the new Bosphorus rules have been under study for year...
...insisting they would not change the bill’s basic dynamic...
...Arafat insisted that he should be the one to go to Bashir...
...officials insisted that the standards were not changed...
...He insisted it was the military’s abusive behavior...
...the North Koreans insisted on leaving the matter for the two leaders to settle...
...he insisted on moving away from a noisy family...
...insist on finding ways to yawn defuse the situation...
...they insist on being sent to a refugee camp...

Through comparison, we can easily conclude that “insist” is often used with an opinion or idea and is followed by “on” or “that”, while “persist” is followed by “for” or “in” and mostly stresses duration.

Conclusion

In conclusion, corpora, a relatively new tool in linguistics, are a useful and effective means of learning a new language through practical use, especially for us students, who have relatively few opportunities to talk with native speakers. Specifically, they are particularly helpful in learning polysemy, idioms, and synonyms.

In the end, nobody is born a natural linguist. Even the best teacher can hardly guarantee that your language skills will improve; as the old saying goes, “Practice makes perfect”. Persistence is the only key to success.

References

1. Wikipedia. British National Corpus [online]. [undated] Available: http://en.wikipedia.org/wiki/British_National_Corpus Accessed: 2011 February 27.
2. Wikipedia. Linguistic Data Consortium [online]. [undated] Available: http://en.wikipedia.org/wiki/Linguistic_Data_Consortium Accessed: 2011 February 27.