Picturing the key words of a very large corpus and their lexical upshots
or
Getting at the Guardian’s view of the world
Mike Scott (Liverpool, UK)
This presentation introduces recent work derived from analysis of over 800,000 Guardian
newspaper texts, almost the whole of the Guardian’s output from 1984 to the present. An
extensive key words database has been computed; inter-relationships between the key words
(KWs), based on a modification of the Mutual Information algorithm, are presented. These
will be illustrated with live examples using the software. Some implications for our
understanding of a content-based linguistics are presented. It will be argued that the
relationship of co-keyness is akin to such classic lexical relations as synonymy, antonymy,
etc., but also that the resulting clumps of linked associates provide a useful indicator of
stereotype. Further applications for language teaching, and for text retrieval, are noted.
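
As a rough illustration of how inter-relationships between key words might be scored, the sketch below computes standard pointwise Mutual Information between pairs of KWs that are key in the same texts. It is not the modified algorithm referred to above, whose details are not given here; the function name co_keyness_pmi and the toy key-word sets are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def co_keyness_pmi(texts_kws):
    """Return {(kw_a, kw_b): PMI} for key-word pairs sharing at least one text.

    texts_kws is a list with one collection of key words per text.
    This is plain pointwise Mutual Information (an assumption), not the
    modified version mentioned in the abstract.
    """
    n_texts = len(texts_kws)
    kw_counts = Counter()    # number of texts in which each KW is key
    pair_counts = Counter()  # number of texts in which both KWs are key
    for kws in texts_kws:
        unique = sorted(set(kws))  # sorted so each pair has one canonical order
        kw_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    scores = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n_texts
        p_a = kw_counts[a] / n_texts
        p_b = kw_counts[b] / n_texts
        scores[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return scores


# Toy data: hypothetical key-word sets for four texts.
toy = [
    {"election", "labour", "tory"},
    {"election", "labour"},
    {"cricket", "wicket"},
    {"cricket", "wicket", "tory"},
]
scores = co_keyness_pmi(toy)
```

In this toy data, pairs that are always key together score above zero, pairs that co-occur only as often as chance would predict score zero, and pairs that never share a text receive no score at all. Ranking each KW's partners by score gives a simple picture of its "clump" of linked associates.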