From small words to big ideas: semantic sequences in humanities writing

Nicholas Groom
Centre for English Language Studies
University of Birmingham

Agenda
• My research interests and concerns
• Methodological arguments
• Examples of data revealed by methodology

Discourse analysis
• interaction analysis (pragmatics, CA, IS)
• values analysis (CDA, CT)
• ≈ Context of Situation vs. Context of Culture
• both approaches involve text analysis
• text analysis = individual texts or corpora
• Today’s talk: ‘corpus-driven’ values analysis

Phraseology and epistemology in humanities writing
• Phraseology: social and probabilistic
  • “the preferred way of saying things in a particular discourse” (Gledhill 2000)
  • “the tendency of words to occur in preferred sequences” (Hunston 2002)
• Epistemology: sociological rather than philosophical
  • how knowledge is conceptualized, produced and reproduced within particular communities
• Humanities writing: journal articles in the fields of history and literary criticism
  • HistArt (3.2 million words)
  • LitArt (4.0 million words)

Epistemological variation across academic disciplines
• Kuhn
• Toulmin
• Whitley
• Biglan
• Kolb
• Becher (1987, 1989, 1994; Becher & Trowler 2001; Neumann et al 2002)

Epistemological variation across academic disciplines
          pure       applied
hard      Physics    Engineering
soft      History    Education

Characteristics of knowledge domains
• ‘Soft-pure’ disciplines (e.g. History, LitCrit): reiterative; holistic; concerned with particulars, qualities, complication; goal = understanding/interpretation.

Epistemology and phraseology
• “If [my] general thesis … is tenable, one would expect differences in fields of knowledge to be reflected in differences in linguistic form: and by the same token, differences in linguistic form to signify differences in fields of knowledge” (Becher 1987: 261).

Problem
• Difficult (impossible?) to draw up an a priori list of language features that express such concepts as reiterativeness, holism, particularism etc.
• Even if we could, would it be a good idea to do this?
• Need an inductive (corpus-driven) rather than a deductive (corpus-based) methodology

(±) Inductive approaches to identifying phraseology
• lexical bundles/clusters/chains/n-grams (Biber, Scott, Stubbs, Fletcher)
• collocational frameworks (Renouf & Sinclair, Butler, Luzon Marco)
• chains-and-frames analysis (Mason)
• collostructions (Stefanowitsch & Gries)
• concgrams (Cheng, Greaves & Warren)
• node, span and collocates (Sinclair)
• keywords (WordSmith, AntConc)

Why keywords?
• Insulation from researcher bias
• Keywords algorithm does not rely on any theory of language
• Algorithm very good at selecting important and interesting features (often features that a human researcher would never have thought of looking at)

Keyword selection
• Procedure yields too many items for analysis, so need to select
• Usual strategy: discard closed-class ‘grammatical’ words and proper nouns as a first step, and then topslice or select from remaining list of open-class items
• Alternative strategy (Gledhill 2000): discard all open-class keywords and focus exclusively on closed-class items

Why closed-class keywords?
• The cockroach argument

Why closed-class keywords?
• The coverage argument
• argue → argue that ...
• that →
  • argue that, claim that, state that, believe that, maintain that ...
  • fact that, idea that, belief that, notion that ...
  • clear that, possible that, ...
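
The keyword selection strategy above (compute keywords, then keep only the closed-class items) could be sketched roughly as follows. This is only an illustration under stated assumptions: the talk does not say which keyness statistic was used, so log-likelihood is assumed here as a common choice in keyword tools; the function names, the tiny `CLOSED_CLASS` list and the 3.84 cut-off (the 5% critical value of chi-square with one degree of freedom) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny closed-class list (assumption);
# a real study would need a much fuller inventory.
CLOSED_CLASS = {"the", "of", "and", "as", "in", "that",
                "against", "both", "within", "between"}


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one word.

    a, b: frequency of the word in the study / reference corpus
    c, d: total size in tokens of the study / reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def closed_class_keywords(study_tokens, ref_tokens, threshold=3.84):
    """Return (word, score) pairs for positive closed-class keywords."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    c, d = len(study_tokens), len(ref_tokens)
    scored = []
    for word, a in study.items():
        b = ref[word]
        if a / c <= b / d:
            continue  # not relatively more frequent in the study corpus
        score = log_likelihood(a, b, c, d)
        if score >= threshold and word in CLOSED_CLASS:
            scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data only: invented token lists, not HistArt or the BNC.
toy_study = ["against"] * 20 + ["war"] * 30 + ["the"] * 50
toy_ref = ["against"] * 2 + ["war"] * 2 + ["the"] * 96
print(closed_class_keywords(toy_study, toy_ref))
```

In the toy data the open-class word `war` is also strongly key, but it is discarded because only closed-class items are kept, which is the point of the alternative strategy.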

Why closed-class keywords?
• Another coverage argument:
• “By far the majority of text is made of the occurrence of common words in common patterns.” (Sinclair 1991: 108).
• So not a good idea to exclude the commonest words from the analysis

Why closed-class keywords?
• Yet another coverage argument:
• Distribution of closed-class keywords throughout a keyword list

Why closed-class keywords?
• The non-compositional argument:
• “Most everyday words do not have an independent meaning, or meanings, but are components of a rich repertoire of multi-word patterns that make up text” (Sinclair 1991: 108).
• What does ‘possible’ mean?
  • It’s possible that she didn’t get the message.
  • It’s possible to leave a message.

Why closed-class keywords?
• The non-compositional argument:
• What does ‘possible’ mean?
  • It’s possible + that = ‘maybe’
  • It’s possible + to-inf = ‘do-able’

Why closed-class keywords?
• The semantic sequences argument
• “recurring sequences of words and phrases that may be very diverse in form and which are therefore more usefully characterised as sequences of meaning elements rather than as formal sequences” (Hunston 2008: 271)

Semantic sequences
• “In winter Hammerfest is a thirty-hour ride by bus from Oslo” (Bryson, in Hoey 2004)
• a half-hour drive; a four-hour flight; a two-week trip; a three-day journey; a two-hour hop; an eight-year slog
• NUMBER + TIME + JOURNEY

Semantic sequences
• “Hammerfest is a thirty-hour ride by bus from Oslo”
• “Ntobeye is a two-hour ride by four wheel drive vehicle from the vast refugee camp at Ngara”

Semantic sequences
• “Hammerfest is a NUMBER TIME JOURNEY by bus from Oslo”
• “Ntobeye is a NUMBER TIME JOURNEY by four wheel drive vehicle from the vast refugee camp at Ngara”

Semantic sequences
• “PLACE is a NUMBER TIME JOURNEY by bus from Oslo”
• “PLACE is a NUMBER TIME JOURNEY by four wheel drive vehicle from the vast refugee camp at Ngara”

Semantic sequences
• “PLACE is a NUMBER TIME JOURNEY by MODE OF TRANSPORT from Oslo”
• “PLACE is a NUMBER TIME JOURNEY by MODE OF TRANSPORT from the vast refugee camp at Ngara”

Semantic sequences
• “PLACE is a NUMBER TIME JOURNEY by MODE OF TRANSPORT from PLACE”

Semantic sequences
• “DESTINATION is a NUMBER TIME JOURNEY by MODE OF TRANSPORT from POINT OF DEPARTURE”

Methodology
• Lists of KWs for each corpus generated through an external comparison with BNC (written)
• Comparing HistArt and LitArt against each other would only reveal differences between them
• Comparing HistArt and LitArt against a reference corpus of academic writing would only reveal features unique to each corpus
• Exhaustive qualitative concordance analysis of multiple 100-line samples for each KW

Results of keywords analysis
• 19 words salient in both history and literary criticism: among, and, as, between, beyond, both, in, its, itself, neither, nor, of, such, the, themselves, these, throughout, whose, within
• 13 discipline-specific words:
  • LitCrit: himself, his, is, might, one’s, though, upon, which
  • History: against, did, during, their, were

How many samples do you need?
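
The concordance step listed under Methodology (multiple 100-line samples for each keyword) might be sketched along these lines. The talk does not describe how the samples were actually drawn, so everything here is an assumption made for illustration: whitespace tokenisation, the context window, the fixed random seed, and the helper names `kwic` and `draw_samples`.

```python
import random


def kwic(tokens, keyword, window=5):
    """Return one (left context, node, right context) tuple per hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def draw_samples(lines, sample_size=100, n_samples=3, seed=0):
    """Shuffle the concordance lines and cut them into
    non-overlapping samples of at most sample_size lines each."""
    rng = random.Random(seed)  # fixed seed so samples are repeatable
    shuffled = list(lines)
    rng.shuffle(shuffled)
    samples = [shuffled[k:k + sample_size]
               for k in range(0, len(shuffled), sample_size)]
    return samples[:n_samples]


# Toy input: one sentence quoted later in the talk, not a real corpus.
text = "they have been viewed too readily as indicating a fixed hostility"
for left, node, right in kwic(text.split(), "as", window=3):
    print(f"{left:>25} | {node} | {right}")
```

Because the samples do not overlap, an analyst can read one 100-line sample, note the recurring sequences, and then check whether a further sample adds anything new.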

Reiterativeness as text?
DISCIPLINARY ENTITY | CONCEPTUALISING PROCESS | as | CONCEPTUALISATION
the sudden awakening of the party | might be interpreted | as | a desire to give the regime a less dictatorial aspect
they | have been viewed too readily | as | indicating a fixed hostility

Reiterativeness as text?
CONCEPTUALISER | CONCEPTUALISING PROCESS | DISCIPLINARY ENTITY | as | CONCEPTUALISATION
we | can see | the moment of revelation | as | a moment of alienation and misery
Mark Storey | describes | the scene | as | mundane

Reiterativeness as text?
CONCEPTUALISING PROCESS | of | DISCIPLINARY ENTITY | as | CONCEPTUALISATION
Derrida's conceptualisation | of | writing | as | a spatio-temporal structure
patristic ideas | of | pilgrimage | as | moral reformation

Reiterativeness as text?
ANALYST | ANALYTICAL FRAMING PROCESS | DISCIPLINARY ENTITY | within | CONTEXTUAL FRAME
I | want to contextualize | Jonson's troublesome poem | within | the physical and cultural environment of early modern London.
Other authors | have situated | it | within | the galaxy of federalist movements in Europe

Holism as text?
SUBORDINATE PHENOMENON | DESCRIPTION OF RELATIONSHIP | SUPERORDINATE PHENOMENON | itself
Esprit's own evolution between 1956 and 1968 | highlights | the deStalinisation crisis in France | itself
the opening sequence of Longo's Johnny Mnemonic | figures | the scopophilic fetishism of cinema | itself

Particularism as text? The case of ‘against’ in history
• Predicted:
  • n against n: Venice took no part in the war against the Normans
  • v against n: extreme competition shaped policies that discriminated against blacks.
  • v n against n: alleged witches and their families also had various strategies that they could employ to defend themselves against rumours and formal accusations of witchcraft.

Particularism as text? The case of ‘against’ in history
• Not predicted: background/backdrop
• Narrative:
  • deliberation took place against a changing backdrop of military events
  • It was against this background that abortion was discussed during the 1930s
  • What of the normative institutional culture of charity to the dead, the background against which Stoeckhlin's idiosyncratic views were drawn?

Particularism as text? The case of ‘against’ in history
• Not predicted: background/backdrop
• Argumentative:
  • Boniface's emphasis on kingship is better understood if viewed against the backdrop of the rhetoric of just authority and good rule that surrounded the conflict.
  • This description should also be seen against the backdrop of a new guiding principle for Nordic co-operation, termed ‘Nordic usefulness’ (nordisk nytte).
  • Belgium's ‘Europeanism’ is similarly incomprehensible unless seen against the background of its internal dissensions.

The case of ‘both’ in LitArt
• both: Ellmann's Joyce transcends both politics and contemporary history
• 16% of all instances of ‘both’ in LitArt express ‘paradoxical’ meanings:
  • In his mind the bridge was both fact and ideal
  • Elizabeth Tudor was both the paragon and the antithesis of the model female.
  • .... the newly defined social sphere, a space that is both private and public.
  • Milford Haven is both unlocateable and a site of dislocation
  • Middleton manipulates the sexual economics that both maintain and undermine the socio-economic status quo.
  • Wales figures for early modern England as that which is both familiar and strange

Beyond Becher?
• Analysis reveals phraseologies expressing epistemological values of humanities: reiterativeness, holism and particularism
• Many common to both history and lit crit (but different proportions and preferences)
• Also identified: discipline-specific values that do not easily fit Becher model
  • History = dynamic; groups
  • LitCrit = static; individual entities

‘within’ in literary criticism

‘within’ in history

Summary
• CCKW (closed-class keyword) analysis identifies semantic sequences that provide insights into epistemologies of academic discourses
• Very labour-intensive!
• But hard to see how such sequences could be identified as efficiently and thoroughly using other currently available methods.
• CCKW analysis not restricted to academic discourses: could be applied to any specialized discourse for which a representative corpus might be compiled.

Moral: Don’t ignore the little words!