From small words to big ideas: semantic sequences in humanities writing
Nicholas Groom
Centre for English Language Studies
University of Birmingham
Agenda
• My research interests and concerns
• Methodological arguments
• Examples of data revealed by methodology
Discourse analysis
• interaction analysis (pragmatics, CA, IS)
• values analysis (CDA, CT)
• ≈ Context of Situation vs. Context of Culture
• both approaches involve text analysis
• text analysis = individual texts or corpora
• Today’s talk: ‘corpus-driven’ values analysis
Phraseology and epistemology in humanities writing
• Phraseology: social and probabilistic
• “the preferred way of saying things in a particular discourse” (Gledhill 2000)
• “the tendency of words to occur in preferred sequences” (Hunston 2002)
• Epistemology: sociological rather than philosophical, i.e. how knowledge is conceptualized, produced and reproduced within particular communities
• Data: journal articles in the fields of history and literary criticism
• HistArt (3.2 million words)
• LitArt (4.0 million words)
Epistemological variation
across academic disciplines
• Kuhn
• Toulmin
• Whitley
• Biglan
• Kolb
• Becher (1987, 1989, 1994; Becher & Trowler 2001; Neumann et al. 2002)
• Two dimensions: hard vs. soft, pure vs. applied
• hard-pure: Physics
• hard-applied: Engineering
• soft-pure: History
• soft-applied: Education
Characteristics of knowledge domains
• ‘Soft-pure’ disciplines (e.g. History, LitCrit): reiterative; holistic; concerned with particulars, qualities and complication; goal = understanding/interpretation.
Epistemology and phraseology
• “If [my] general thesis … is tenable, one would expect differences in fields of knowledge to be reflected in differences in linguistic form: and by the same token, differences in linguistic form to signify differences in fields of knowledge” (Becher 1987: 261).
Problem
• Difficult (impossible?) to draw up an a priori list of language features that express such concepts as reiterativeness, holism, particularism etc.
• Even if we could, would it be a good idea to do this?
• Need an inductive (corpus-driven) rather than a deductive (corpus-based) methodology
(±) Inductive approaches to identifying phraseology
• lexical bundles/clusters/chains/n-grams (Biber, Scott, Stubbs, Fletcher)
• collocational frameworks (Renouf & Sinclair, Butler, Luzon Marco)
• chains-and-frames analysis (Mason)
• collostructions (Stefanowitsch & Gries)
• concgrams (Cheng, Greaves & Warren)
• node, span and collocates (Sinclair)
• keywords (WordSmith, AntConc)
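Several of these approaches start from recurring strings of words. A minimal Python sketch of two of them follows, assuming the corpus is already a list of lowercase tokens: counting recurring n-grams (the raw material of lexical bundles and clusters) and collecting the words that fill a collocational framework such as ‘a ... of’. The function names and the raw-frequency threshold are illustrative assumptions; published bundle studies normalise frequencies and also require a bundle to occur across a minimum number of texts.

```python
from collections import Counter


def ngram_counts(tokens, n=4, min_freq=2):
    """Count contiguous n-grams and keep those at or above a raw frequency threshold."""
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(grams)
    return {gram: c for gram, c in counts.items() if c >= min_freq}


def framework_fillers(tokens, left, right):
    """Count the single words occurring in the frame `left _ right` (e.g. 'a _ of')."""
    fillers = Counter()
    for i in range(len(tokens) - 2):
        if tokens[i] == left and tokens[i + 2] == right:
            fillers[tokens[i + 1]] += 1
    return fillers
```

For example, `framework_fillers("a lot of a number of a lot of".split(), "a", "of")` counts ‘lot’ twice and ‘number’ once.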
Why keywords?
• Insulation from researcher bias
• The keywords algorithm does not rely on any theory of language
• The algorithm is very good at selecting important and interesting features (often features that a human researcher would never have thought of looking at)
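The keywords procedure compares how often each word occurs in a study corpus with how often it occurs in a reference corpus. The slides do not name the statistic used, so the sketch below assumes log-likelihood, one of the keyness measures offered by tools such as WordSmith and AntConc. It keeps only positive keywords and uses 6.63 (roughly p < 0.01 with one degree of freedom) as an illustrative cut-off. The score is computed from raw frequencies alone, which is the sense in which the procedure needs no theory of language.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for a single word.

    a, b: frequency of the word in the study and reference corpus
    c, d: total number of tokens in the study and reference corpus
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_study)
    if b > 0:
        ll += b * math.log(b / expected_ref)
    return 2 * ll


def keywords(study_tokens, reference_tokens, min_ll=6.63):
    """Rank words that are relatively more frequent in the study corpus.

    Only raw frequencies are used: no part-of-speech or other linguistic categories.
    """
    fs, fr = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in fs.items():
        b = fr.get(word, 0)
        if a / c > b / d:  # positive keywords only
            ll = log_likelihood(a, b, c, d)
            if ll >= min_ll:
                scored.append((word, ll))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```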
Keyword selection
• The procedure yields too many items to analyse, so a selection has to be made
• Usual strategy: discard closed-class ‘grammatical’ words and proper nouns as a first step, then topslice or select from the remaining list of open-class items
• Alternative strategy (Gledhill 2000): discard all open-class keywords and focus exclusively on closed-class items
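Both selection strategies amount to filtering a ranked keyword list. A sketch, assuming the list is already sorted by keyness and that the analyst supplies the closed-class inventory and any proper nouns; the small word set below is hypothetical and far from complete.

```python
# Hypothetical, deliberately small closed-class inventory for illustration;
# a real study would use a full list of function words.
CLOSED_CLASS = {
    "the", "a", "an", "of", "in", "as", "and", "nor", "neither", "both",
    "between", "among", "within", "beyond", "throughout", "against", "during",
    "upon", "though", "which", "whose", "these", "such", "its", "itself",
    "themselves", "his", "himself", "their", "is", "were", "did", "might",
}


def select_keywords(ranked, keep="closed", proper_nouns=frozenset(), top=None):
    """Filter a keyword list already sorted by descending keyness.

    keep="open":   usual strategy, dropping closed-class words and proper nouns.
    keep="closed": alternative strategy (after Gledhill 2000), keeping only closed-class words.
    top:           optional topslice (first N items after filtering).
    """
    if keep == "closed":
        kept = [w for w in ranked if w.lower() in CLOSED_CLASS]
    elif keep == "open":
        kept = [w for w in ranked
                if w.lower() not in CLOSED_CLASS and w not in proper_nouns]
    else:
        raise ValueError("keep must be 'open' or 'closed'")
    return kept if top is None else kept[:top]
```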
Why closed-class keywords?
• The cockroach argument
Why closed-class keywords?
• The coverage argument
• argue → argue that ...
• that → argue that, claim that, state that, believe that, maintain that ...
  fact that, idea that, belief that, notion that ...
  clear that, possible that, ...
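The coverage argument can be checked directly: a single closed-class keyword such as ‘that’ pulls in verb, noun and adjective patterns that no one open-class keyword would. A sketch that counts the immediate left neighbours of ‘that’ and groups them with a toy, hypothetical word-class lexicon (a real study would rely on concordance reading or a tagger rather than this list).

```python
from collections import Counter


def left_collocates(tokens, node="that"):
    """Count the word immediately to the left of each occurrence of `node`."""
    return Counter(tokens[i - 1] for i in range(1, len(tokens)) if tokens[i] == node)


# Hypothetical word-class lexicon, for illustration only.
WORD_CLASS = {
    "argue": "verb", "claim": "verb", "state": "verb", "believe": "verb", "maintain": "verb",
    "fact": "noun", "idea": "noun", "belief": "noun", "notion": "noun",
    "clear": "adjective", "possible": "adjective",
}


def group_by_class(collocates, lexicon=WORD_CLASS):
    """Group left collocates into verb/noun/adjective families ('unknown' if unlisted)."""
    groups = {}
    for word, freq in collocates.items():
        groups.setdefault(lexicon.get(word, "unknown"), Counter())[word] += freq
    return groups
```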
Why closed-class keywords?
• Another coverage argument:
• “By far the majority of text is made of the occurrence of common words in common patterns.” (Sinclair 1991: 108).
• So it is not a good idea to exclude the commonest words from the analysis
Why closed-class keywords?
• Yet another coverage argument:
• Distribution of closed-class keywords throughout a keyword list
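One way to make this point measurable is to count how many closed-class items fall in each band of the ranked keyword list: an even spread suggests that dropping them would remove keywords at every level of keyness, not just at the top. A sketch, assuming a keyword list sorted by keyness and an analyst-supplied set of closed-class words; the ten equal bands are an arbitrary choice.

```python
def closed_class_distribution(ranked, closed_class, bins=10):
    """Count closed-class items in each equal-width band of a ranked keyword list.

    Band 0 holds the most key items; band bins-1 the least key.
    """
    counts = [0] * bins
    n = len(ranked)
    for rank, word in enumerate(ranked):
        if word in closed_class:
            counts[rank * bins // n] += 1
    return counts
```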
Why closed-class keywords?
• The non-compositional argument:
• “Most everyday words do not have an independent meaning, or meanings, but are components of a rich repertoire of multi-word patterns that make up text” (Sinclair 1991: 108).
• What does possible mean?
• It’s possible that she didn’t get the message.
• It’s possible to leave a message.
• It’s possible + that = ‘maybe’
• It’s possible + to-infinitive = ‘do-able’
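The two meanings of possible are signalled by what follows it, which is why the pattern rather than the word is the unit of meaning. A toy sketch of that distinction, assuming tokenized text; the rule looks only at the next word and would misread cases where to is a preposition, so it is a filter for concordance lines rather than a sense tagger.

```python
def possible_readings(tokens):
    """Label each 'possible' by its right-hand pattern (hypothetical two-way rule).

    possible + that -> 'maybe'; possible + to (-infinitive) -> 'do-able';
    anything else (e.g. 'the best possible result') -> 'other'.
    """
    labels = []
    for i, word in enumerate(tokens):
        if word != "possible":
            continue
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if nxt == "that":
            labels.append("maybe")
        elif nxt == "to":
            labels.append("do-able")
        else:
            labels.append("other")
    return labels
```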
Why closed-class keywords?
• The semantic sequences argument
• Semantic sequences are “recurring sequences of words and phrases that may be very diverse in form and which are therefore more usefully characterised as sequences of meaning elements rather than as formal sequences” (Hunston 2008: 271)
Semantic sequences
• “In winter Hammerfest is a thirty-hour ride by bus from Oslo” (Bryson, in Hoey 2004)
a half-hour drive
a four-hour flight
a two-week trip
a three-day journey
a two-hour hop
an eight-year slog
• NUMBER + TIME + JOURNEY
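The NUMBER + TIME + JOURNEY sequence can be retrieved with a regular expression once each slot is given a list of fillers. A sketch in Python, assuming hand-built and deliberately incomplete word lists (hypothetical, based on the examples above); compound numbers such as ‘twenty-four-hour’ would need a richer NUMBER pattern.

```python
import re

# Hand-built, deliberately incomplete word lists (hypothetical, for illustration).
NUMBER = r"(?:half|one|two|three|four|five|six|seven|eight|nine|ten|twenty|thirty|\d+)"
TIME = r"(?:minute|hour|day|week|month|year)"
JOURNEY = r"(?:ride|drive|flight|trip|journey|hop|slog|walk)"

# a/an + NUMBER-TIME + JOURNEY, e.g. "a thirty-hour ride", "an eight-year slog"
NUMBER_TIME_JOURNEY = re.compile(
    rf"\ban? ({NUMBER})-({TIME}) ({JOURNEY})\b", re.IGNORECASE
)


def find_journeys(text):
    """Return (number, time, journey) triples for every match in `text`."""
    return NUMBER_TIME_JOURNEY.findall(text)
```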
Semantic sequences
• “Hammerfest is a thirty-hour ride by bus from Oslo”
• “Ntobeye is a two-hour ride by four wheel drive vehicle from the vast refugee camp at Ngara”
Semantic sequences
• “Hammerfest is a NUMBER TIME JOURNEY by bus from Oslo”
• “Ntobeye is a NUMBER TIME JOURNEY by four wheel drive vehicle from the vast refugee camp at Ngara”
Semantic sequences
• “PLACE is a NUMBER TIME JOURNEY by bus from Oslo”
• “PLACE is a NUMBER TIME JOURNEY by four wheel drive vehicle from the vast refugee camp at Ngara”
Semantic sequences
• “PLACE is a NUMBER TIME JOURNEY by MODE OF TRANSPORT from Oslo”
• “PLACE is a NUMBER TIME JOURNEY by MODE OF TRANSPORT from the vast refugee camp at Ngara”
Semantic sequences
• “PLACE is a NUMBER TIME JOURNEY by MODE OF TRANSPORT from PLACE”
• “PLACE is a NUMBER TIME JOURNEY by MODE OF TRANSPORT from PLACE”
Semantic sequences
• “DESTINATION is a NUMBER TIME JOURNEY by MODE OF TRANSPORT from POINT OF DEPARTURE”
• “DESTINATION is a NUMBER TIME JOURNEY by MODE OF TRANSPORT from POINT OF DEPARTURE”
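The abstraction steps above can be mimicked mechanically once the analyst has decided which surface strings belong to which meaning category. A toy sketch, assuming hand-built lists of places and means of transport (hypothetical, covering only the two example sentences) and a simple positional rule for assigning the DESTINATION and POINT OF DEPARTURE roles; in the actual analysis these categories come from reading concordance lines, not from a lexicon.

```python
import re

# Hypothetical lexical resources covering only the two example sentences.
PLACES = ["the vast refugee camp at Ngara", "Hammerfest", "Ntobeye", "Oslo"]  # longest first
TRANSPORT = ["four wheel drive vehicle", "bus", "train"]
NUM_TIME_JOURNEY = re.compile(
    r"\b(?:half|one|two|three|four|eight|thirty|\d+)-(?:hour|day|week|year) "
    r"(?:ride|drive|flight|trip|journey|hop|slog)\b"
)


def to_semantic_sequence(sentence):
    """Rewrite a travel sentence as a sequence of meaning elements."""
    s = NUM_TIME_JOURNEY.sub("NUMBER TIME JOURNEY", sentence)
    for mode in TRANSPORT:
        s = s.replace(f"by {mode}", "by MODE OF TRANSPORT")
    for place in PLACES:
        s = s.replace(place, "PLACE")
    # Positional role rule (assumption): subject place = destination, 'from' place = departure.
    s = re.sub(r"^PLACE is\b", "DESTINATION is", s)
    s = re.sub(r"\bfrom PLACE$", "from POINT OF DEPARTURE", s)
    return s
```

Both example sentences reduce to the same sequence, which is the point of the slides: formally different strings, one sequence of meaning elements.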
Methodology
• Keyword (KW) lists for each corpus generated through an external comparison with the written component of the British National Corpus (BNC)
• Comparing HistArt and LitArt against each other would only reveal differences between them
• Comparing HistArt and LitArt against a reference corpus of academic writing would only reveal features unique to each corpus
• Exhaustive qualitative concordance analysis of multiple 100-line samples for each KW
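The concordance step can be supported by a simple script that pulls every concordance line for a keyword and splits the lines into random samples for reading. A sketch, assuming tokenized text and non-overlapping 100-line samples (the slides say ‘multiple 100-line samples’ but not how they were drawn); the reading and coding of each sample remains manual.

```python
import random


def kwic(tokens, node, width=5):
    """Key-word-in-context lines: (left context, node, right context)."""
    lines = []
    for i, word in enumerate(tokens):
        if word == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines


def concordance_samples(lines, size=100, seed=0):
    """Shuffle once, then cut into consecutive non-overlapping samples of `size` lines.

    The final sample may be shorter. A fixed seed makes the sampling reproducible.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + size] for i in range(0, len(shuffled), size)]
```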
Results of keywords analysis
• 19 words salient in both history and literary criticism: among, and, as, between, beyond, both, in, its, itself, neither, nor, of, such, the, themselves, these, throughout, whose, within
• 13 discipline-specific words:
• LitCrit: himself, his, is, might, one’s, though, upon, which
• History: against, did, during, their, were
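The split between shared and discipline-specific keywords is a set comparison between the two keyword lists. A sketch using the closed-class keywords reported above for each corpus, written out as they would emerge from the two separate comparisons with the BNC; the variable and function names are illustrative.

```python
# Closed-class keywords per corpus, as reported on the results slide.
HIST_KW = {"among", "and", "as", "between", "beyond", "both", "in", "its", "itself",
           "neither", "nor", "of", "such", "the", "themselves", "these", "throughout",
           "whose", "within", "against", "did", "during", "their", "were"}
LIT_KW = {"among", "and", "as", "between", "beyond", "both", "in", "its", "itself",
          "neither", "nor", "of", "such", "the", "themselves", "these", "throughout",
          "whose", "within", "himself", "his", "is", "might", "one's", "though",
          "upon", "which"}


def compare_keyword_lists(a, b):
    """Split two keyword sets into (common, only in a, only in b)."""
    return a & b, a - b, b - a
```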
How many samples do you need?
Reiterativeness as text?
• the sudden awakening of the party might be interpreted as a desire to give the regime a less dictatorial aspect
• they have been viewed too readily as indicating a fixed hostility
Reiterativeness as text?
DISCIPLINARY ENTITY | CONCEPTUALISING PROCESS | as | CONCEPTUALISATION
the sudden awakening of the party | might be interpreted | as | a desire to give the regime a less dictatorial aspect
they | have been viewed too readily | as | indicating a fixed hostility
Reiterativeness as text?
CONCEPTUALISER | CONCEPTUALISING PROCESS | DISCIPLINARY ENTITY | as | CONCEPTUALISATION
we | can see | the moment of revelation | as | a moment of alienation and misery
Mark Storey | describes | the scene | as | mundane
Reiterativeness as text?
CONCEPTUALISING PROCESS | of | DISCIPLINARY ENTITY | as | CONCEPTUALISATION
Derrida's conceptualisation | of | writing | as | a spatio-temporal structure
patristic ideas | of | pilgrimage | as | moral reformation
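Instances of this sequence can be pre-selected from a concordance by looking for ‘as’ with a conceptualising word shortly to its left. A sketch, assuming tokenized lowercase text and a hypothetical trigger list built from the examples in these tables; it only gathers candidates, and assigning the DISCIPLINARY ENTITY and CONCEPTUALISATION elements is still a matter of reading each line.

```python
# Hypothetical trigger list of conceptualising words, for illustration only.
CONCEPTUALISING = {"interpreted", "viewed", "see", "seen", "describes", "described",
                   "regarded", "conceptualisation", "ideas"}


def as_sequences(tokens, window=6):
    """Find 'as' with a conceptualising word up to `window` tokens to its left.

    Returns (trigger word, right-hand context) pairs for manual checking.
    """
    hits = []
    for i, word in enumerate(tokens):
        if word != "as":
            continue
        left = tokens[max(0, i - window):i]
        trigger = next((t for t in reversed(left) if t in CONCEPTUALISING), None)
        if trigger is not None:
            hits.append((trigger, " ".join(tokens[i + 1:i + 1 + window])))
    return hits
```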
Reiterativeness as text?
ANALYST | ANALYTICAL FRAMING PROCESS | DISCIPLINARY ENTITY | within | CONTEXTUAL FRAME
I | want to contextualize | Jonson's troublesome poem | within | the physical and cultural environment of early modern London
Other authors | have situated | it | within | the galaxy of federalist movements in Europe
Holism as text?
SUBORDINATE PHENOMENON | DESCRIPTION OF RELATIONSHIP | SUPERORDINATE PHENOMENON | itself
Esprit's own evolution between 1956 and 1968 | highlights | the deStalinisation crisis in France | itself
the opening sequence of Longo's Johnny Mnemonic | figures | the scopophilic fetishism of cinema | itself
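The holism sequence ends in itself, which makes a simple right-to-left split possible. A sketch, assuming the two table rows are read as complete sentences and that the analyst supplies the relation verbs (the list here is hypothetical); a word such as figures can also be a noun, which is one reason every hit still needs checking.

```python
# Hypothetical relation verbs, for illustration only.
RELATION_VERBS = {"highlights", "figures", "reflects", "mirrors"}


def holism_frame(sentence, relation_verbs=RELATION_VERBS):
    """Split 'SUBORDINATE <verb> SUPERORDINATE itself' into its three parts, or return None."""
    tokens = sentence.rstrip(".").split()
    if "itself" not in tokens:
        return None
    end = tokens.index("itself")
    # Scan leftwards from 'itself' to the nearest relation verb (subject must be non-empty).
    for k in range(end - 1, 0, -1):
        if tokens[k] in relation_verbs:
            return " ".join(tokens[:k]), tokens[k], " ".join(tokens[k + 1:end])
    return None
```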
Particularism as text? The case of against in history
• Predicted:
• n against n: Venice took no part in the war against the Normans
• v against n: extreme competition shaped policies that discriminated against blacks.
• v n against n: alleged witches and their families also had various strategies that they could employ to defend themselves against rumours and formal accusations of witchcraft.
Particularism as text? The case of against in history
• Not predicted: background/backdrop
• Narrative:
• deliberation took place against a changing backdrop of military events
• It was against this background that abortion was discussed during the 1930s
• What of the normative institutional culture of charity to the dead, the background against which Stoeckhlin's idiosyncratic views were drawn?
• Argumentative:
• Boniface's emphasis on kingship is better understood if viewed against the backdrop of the rhetoric of just authority and good rule that surrounded the conflict.
• This description should also be seen against the backdrop of a new guiding principle for Nordic co-operation, termed ‘Nordic usefulness’ (nordisk nytte).
• Belgium's ‘Europeanism’ is similarly incomprehensible unless seen against the background of its internal dissensions.
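The background/backdrop uses of against can be separated from the predicted patterns by checking for those two nouns near the preposition. A toy sketch, assuming tokenized text; the further narrative/argumentative split uses a hypothetical cue (an interpretive verb such as seen or viewed immediately before against) that simply reflects the examples on these slides and has not been tested on the corpus.

```python
BACKDROP_NOUNS = {"background", "backdrop"}
# Hypothetical cue: an interpretive verb just before 'against' suggests argumentative use.
INTERPRETIVE_VERBS = {"seen", "viewed", "understood", "read"}


def classify_against(tokens, i, window=3):
    """Classify the 'against' at tokens[i] (toy heuristic fitted to the slide examples)."""
    context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    if not any(t in BACKDROP_NOUNS for t in context):
        return "other"
    if i > 0 and tokens[i - 1] in INTERPRETIVE_VERBS:
        return "backdrop: argumentative"
    return "backdrop: narrative"
```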
The case of both in LitArt
• both: Ellmann's Joyce transcends both politics and contemporary history
• 16% of all instances of both in LitArt express ‘paradoxical’ meanings:
In his mind the bridge was both fact and ideal
Elizabeth Tudor was both the paragon and the antithesis of the model female.
... the newly defined social sphere, a space that is both private and public.
Milford Haven is both unlocateable and a site of dislocation
Middleton manipulates the sexual economics that both maintain and undermine the socio-economic status quo.
Wales figures for early modern England as that which is both familiar and strange
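The 16% figure rests on reading and coding concordance lines. A sketch of a helper that could pre-extract the two coordinated elements of ‘both X and Y’ for that coding, assuming tokenized text; the gap and context lengths are arbitrary choices, and deciding whether a pair is ‘paradoxical’ is left to the analyst.

```python
def both_and_pairs(tokens, max_gap=4, right=3):
    """Extract (X, Y) from 'both X and Y'.

    X may be up to max_gap - 1 tokens long; Y is the next `right` tokens after 'and'.
    """
    pairs = []
    for i, word in enumerate(tokens):
        if word != "both":
            continue
        for j in range(i + 1, min(i + 1 + max_gap, len(tokens))):
            if tokens[j] == "and":
                x = " ".join(tokens[i + 1:j])
                y = " ".join(tokens[j + 1:j + 1 + right])
                if x:
                    pairs.append((x, y))
                break
    return pairs
```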
Beyond Becher?
• Analysis reveals phraseologies expressing epistemological values of the humanities: reiterativeness, holism and particularism
• Many are common to both history and lit crit (but with different proportions and preferences)
• Also identified: discipline-specific values that do not easily fit the Becher model
• History = dynamic; groups
• LitCrit = static; individual entities
within in literary criticism
within in history
Summary
• Closed-class keyword (CCKW) analysis identifies semantic sequences that provide insights into the epistemologies of academic discourses
• Very labour-intensive!
• But it is hard to see how such sequences could be identified as efficiently and thoroughly using other currently available methods.
• CCKW analysis is not restricted to academic discourses: it could be applied to any specialized discourse for which a representative corpus might be compiled.
Moral:
Don’t ignore the little words!