Abstract: Postgraduate Conference in Corpus Linguistics, Aston University

Multi-modal spoken corpus analysis and its relevance for key issues in language description: the case of multi-word expressions

Methodologies in corpus linguistics have revolutionised the way in which we study and describe language, allowing us to make objective observations and analyses using a range of written and spoken data from naturally occurring contexts. Yet most current corpora are concerned only with textual representations and do not take account of other aspects that generate meaning in conjunction with text, such as gestures, prosody and kinesics, all of which add meaning to utterances and to discourse as a whole. Recent research in spoken corpus analysis has started to explore how drawing on multi-modal corpus resources might affect our descriptions of spoken language (see for example Knight et al. 2006). In this paper I contrast a purely text-based analysis of spoken corpora with an analysis that uses the additional parameter of pauses, measured and integrated into a multi-modal corpus resource. The unit of analysis is the multi-word expression (MWE), in particular the very frequent string 'I think'. The description and extraction of MWEs has been a key topic in a variety of areas within applied linguistics and natural language processing for some time. However, there seem to be a number of problems associated with a purely textual and frequency-based approach. One of the main problems with computational extraction methods is that we cannot be sure whether corpus-derived MWEs are psycholinguistically valid. In this study I argue that an analysis of the placement of pauses, as represented within a multi-modal corpus resource, can contribute to our understanding of MWEs, their boundaries and their psycholinguistic reality.

Knight, D. et al. (2006). Beyond the Text: Construction and Analysis of Multi-Modal Linguistic Corpora. 2nd annual international e-Social Science Conference.