What sort of corpus is the BNC?

Monolingual: It deals with modern British English, not other languages used in Britain. However, non-British English and foreign-language words do occur in the corpus.

Synchronic: It covers British English of the late twentieth century, rather than the historical development which produced it.

General: It includes many different styles and varieties, and is not limited to any particular subject field, genre or register. In particular, it contains examples of both spoken and written language.

Sample: For written sources, samples of 45,000 words are taken from various parts of single-author texts. Shorter texts, up to a maximum of 45,000 words, and multi-author texts such as magazines and newspapers are included in full. Sampling allows for a wider coverage of texts within the 100-million-word limit, and avoids overrepresenting idiosyncratic texts.

British National Corpus

What is the British National Corpus?

The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken British English from the later part of the 20th century. The BNC consists of a larger written part (90 %, e.g. newspapers, academic books, letters, essays) and a smaller spoken part (the remaining 10 %, e.g. informal conversations, radio shows). The spoken part is also available in audio format.

The corpus texts carry a large amount of metadata, so users can restrict a search by criteria such as time of publication, the region where a spoken text was recorded, type of medium, text domain, or David Lee's classification, a detailed genre specification. The full list of genres in this classification is available here.

The official website: http://www.natcorp.ox.ac.uk

Content in detail

See the charts and more information about texts in the British National Corpus.
Distribution of parts of speech
Further information about texts in the corpus

Tools to work with the British National Corpus

A complete set of tools is available to work with the British National Corpus to generate:
word sketch: English collocations categorized by grammatical relations
thesaurus: synonyms and similar words for every word
keywords: terminology extraction of one-word and multi-word units
word lists: lists of English nouns, verbs, adjectives etc. organized by frequency
n-grams: frequency lists of multi-word units
concordance: examples in context
trends: diachronic analysis that automatically identifies neologisms and changes in use

Part-of-speech tagsets in the BNC

Sketch Engine offers the BNC tagged with two different POS tagsets:
the Penn TreeBank tagset
the tagset used in the CLAWS POS tagger version 5, with these specific attributes:
ambtag: the ambivalent part-of-speech tag (all tags before disambiguation)
pos: one-letter abbreviation of the part of speech (the second part of lempos)

An attribute can refer to:
a positional attribute: information added to each token in a corpus, e.g. its lemma or part of speech
a structure attribute: information added to a structure in a corpus, often called metadata
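
To make the relation between lempos and pos concrete, here is a minimal Python sketch. It assumes that a lempos value is written as the lemma, a hyphen, and the one-letter pos (for example "house-n"); that exact format is an assumption made for illustration, not something this page states, and the function name split_lempos is hypothetical.

```python
from typing import Tuple


def split_lempos(lempos: str) -> Tuple[str, str]:
    """Split a lempos string into (lemma, pos).

    Assumed format (not confirmed by this page): lemma + "-" + one-letter pos.
    The split happens at the LAST hyphen so that hyphenated lemmas survive.
    """
    lemma, sep, pos = lempos.rpartition("-")
    if not sep or not lemma or len(pos) != 1:
        raise ValueError(f"not a lemma-pos pair: {lempos!r}")
    return lemma, pos


if __name__ == "__main__":
    for value in ["house-n", "well-being-n", "go-v"]:
        lemma, pos = split_lempos(value)
        print(f"{value}: lemma={lemma}, pos={pos}")
```

Splitting at the last hyphen rather than the first keeps a hyphenated lemma such as "well-being" in one piece; a value with no hyphen, or with more than one character after the last hyphen, is rejected instead of being guessed at.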