Constructions in Wonderland

Constructions in Wonderland:
Exploring the functionality of constructions through N-grams
Functionality of Language | Sprogets Funktionalitet
Aalborg Unviersity
10 December, 2014
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Introduction
• Is there a way to (semi-)automatically identify
constructions in texts, discourses, and corpora?
• Could N-gram analysis and N-gram based network
analysis be ways to do that?
• How can (semi-)automatic identification of constructions
help us learn about the functional contributions of
constructions in discourse?
• To this end, we will analyze two literary classics and four
political speeches.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Introduction
Outline
• Theoretical framework
•
•
•
•
Method
•
•
•
•
•
Positioning of paper
(Very) basic principles of construction grammar
Functionality of constructions
Data collection and methods
Wordclouds
N-grams
Network analysis
Analyses
•
•
•
Word clouds
N-grams
Networks
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Theoretical grounding
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Theoretical and methodological
positioning of this paper
• Theoretical positioning
• Construction grammar (e.g. Goldberg 1995; Croft 2001)
• Cognitive poetics/stylistics (e.g. Stockwell 2002)
• Cognitive discourse analysis (e.g. Hart 2013)
• Methodological positioning
• Corpus stylistics (e.g. Semino & Short 2004)
• Corpus-aided discourse studies (e.g. Baker 2012)
• Text-mining (e.g. Miner et al. 2012)
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Constructions
In construction grammar, a construction is a functional unit that pairs form and
semantic and/or discourse-pragmatic function (Goldberg 1995, 2006; Croft
2001, 2005; Hilpert 2014).
Examples from English - [form]/[function] (cf. Langacker 1987):
•
[S V IO DO]/[TRANSFER OF POSSESSION] (Goldberg 1995)
•
[X BE so Y that Z]/[SCALAR CAUSATION] (Bergen & Binsted 2004)
•
[you don't want me to V]/[THREATENING SPEECH ACT] (Martínez 2013)
•
[to begin with]/[INTRODUCTION OF LIST OF ITEMS] (Lipka & Schmid 1994)
•
[PROacc CLinf (or NP)]/[DISBELIEF TOWARDS PROPOSITION] (Lambrecht 1990)
•
'What, me worry?!', 'Him a doctor?!' etc.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Constructions
•
Constructions may be schematic, substantive (fixed), or something inbetween (Fillmore et al. 1988).
•
Constructions may be atomic/simple, complex, or something inbetween and form a lexicon-syntax continuum (e.g. Goldberg 1995,
Croft 2001).
•
Language competence is an inventory of constructions (aka. the
construct-i-con) of varying degrees of abstraction which are instantiated
in language use (e.g. Goldberg 1995).
•
In most contemporary incarnations of construction grammar, the
construct-i-con is usage-based (e.g. Croft 2001).
•
Constructions are subject to general human cognitive processes and
principles, such that language is not a separate, autonomous cognitive
faculty; thus, construction grammar is part of the overall endeavor of
cognitive linguistics.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
The discursive and stylistic
functionality of constructions
•
If constructions are functional units (pairings of form and meaning/function),
then they logically must contribute to discourse as part of a speaker's
linguistic repertoire. Here are two examples:
•
Writers of fiction may use constructions in...
•
•
descriptions of actions and happenings;
•
characterizations (Culpeper 2009) and mind styles (Fowler 1977) by having characters
use certain constructions in their dialog and narrative, or by using certain constructions
in the descriptions of characters or of their actions;
•
in setting up the text-world and specifying temporal relations in the narrative;
•
foregrounding, deviation, parallelism etc. (e.g. Short & Leech 2007)
In political speeches, speakers may use constructions in...
•
framing of issues and other ideologically based representations;
•
organization of topics/issues;
•
rhetorical strategies (including parallelisms etc.).
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Method
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Data collection and methods
• Text data
• Alice's Adventures in Wonderland by Lewis Carroll (1865),
obtained via Gutenberg Project = AW
• Adventures of Huckleberry Finn by Mark Twain (1884), obtained
via Gutenberg Project = HF
• Inaugural speeches by US Presidents, obtained via Bartleby
• Text mining and corpus-linguistic methods
• Wordclouds (R package ‘wordcloud’)
• N-grams (R package ‘tau’, AntConc)
• Network analysis (R package ‘igraph’)
• Concordances (AntConc)
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Wordclouds
• Wordclouds are graphical representations of the lexical
texture of a text, based on frequencies.
• They are visual versions of frequency lists.
• ... except they do not provide any information on
frequencies.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams
•
An N-gram is a string of words that co-occur frequently in a data set (such
as a corpus or a text).
•
N-grams are specified in accordance with the number of words in the string
in question (N = number)
• Monogram (1-gram) = one word, bigram (2-gram) = two words, trigram
(3-gram) = three words, four-gram (4-gram) = four words, five-gram (5gram) = five words etc.
• Examples = 'damn you' (2-gram), 'what the hell' (3-gram), 'pick up the
phone' (4-gram)
(e.g. Stubbs 2009)
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams
•
N-gram retrieval is a text mining technique used in
the identification of N-grams in a data set, text, or
corpus.
• How it works (in a nutshell): ask a computer to
find strings of N words, and it returns a list of
N-grams ranked in terms of frequency.
• An example: 4-grams in all of Shakespeare's
plays
• Find all instances of word + word +
word + word combinations in the
collective body of Shakespeare's plays.
• Calculate frequency of word + word +
word + word combinations in the
collective body of shakespeare's plays.
• List the word + word + word + word
combinations in terms of frequency in the
collective body of shakespeare's plays.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams
•
N-gram retrieval is a text mining technique used in
the identification of N-grams in a data set, text, or
corpus.
• How it works (in a nutshell): ask a computer to
find strings of N words, and it returns a list of
N-grams ranked in terms of frequency.
• An example: 4-grams in all of Shakespeare's
plays
• Find all instances of word + word +
word + word combinations in the
collective body of Shakespeare's plays.
• Calculate frequency of word + word +
word + word combinations in the
collective body of shakespeare's plays.
• List the word + word + word + word
combinations in terms of frequency in the
collective body of shakespeare's plays.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams
•
N-gram analysis can be used for a wide range of
things:
•
Identification of "aboutness" of a text or
discourse.
•
Phraseological analysis.
•
Probablistic language modeling.
•
Analysis of aspects of style, genre and
register.
•
Identification of various types of anonymous
writers.
•
Identification of constructions and their
functionality in a text or discourse.
•
Etc.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis
• Network analysis sets up data points as nodes in a network and
calculates the strengths of association between them.
• As a text mining method, it represents word types (as opposed to
tokens) in a text as nodes and, based on N-gram relations, sets up
relations between the nodes, or words.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Analysis
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Wordclouds: a first look
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Wordclouds
Wordcloud HF
things
think
left
help water
people
want
comes
bed
lot
yet
now way
say
tell couldnt
nothing little going
dat
put
find
thing
money
duke
time good
done
back
made
night
look
place kind
run
mind
enough
hadnt
dark
talk
three
away
didnt
found
raft
take
long
ever
two
hear
old get
well
king
half
got
ill
looked
pap
wont
set
like one
man
see jim
house
laid
said
just
warnt
says
big
reckoned
can
wouldnt
thats head reckon first
along around
home
hands
took
right
know
dont
never
last
aint come went door
mile
Yoshikata Shibuya
Kyoto University of Foreign Studies
prettytold make day
tom
still
low
cook
let
river
three
knew
please
indeed
question
tried
word
long hand
heard mad
made
right youre
nigger
give
town
curious
certainly remark
anxiously
cat
come
upon head king
soon
used
nothing
mean
high
seen table words
mouse
air
find
enough
dear
caterpillar
white added
good
arm
ive
way
alic
time
duchess
beautiful
house
left
begin
hare
back
even
gryphon will
theres
going
wouldnt grow
cats
suddenly
thought
take
replied
much
far
jury
put
day
must
little
dont
thing see
took ever
old
found
garden
get
like went
know one think
now
alice
last
didnt eat
wonder
perhaps
room another
felt first
seemed
still
course
make
turtl
ran
queen
said
yet looked
dodo half two say
came
got well
tell
began
just
mock
end
voice
dormouse
better gave
try
round
quite
chapter voic
yes size
hastily
however turned
ask littl
behind
cant away sure
change
set
done
rather wish hear
sea sat use things
getting
talking
doesnt moment idea next might poor
spoke
wont marchnever thats large till
bit
sort may
rabbit
near without turtle
side
ill hatter door looking
eyes great
kept
can
look
let
shall
anything
talk
tone
soon
much
saying
feet
close
among
tea
beginning something speak
cried court saw
minute soup
till lay
Wordcloud AW
Kim Ebensgaard Jensen
Aalborg University
Wordclouds
Wordcloud HF
hands
things
think
left
help water
people
want
comes
lot
yet
mind
thing
money
find
enough
hadnt
now way
time good
kind look back done
No details (e.g. place
frequency)
run dukenight put made
mile
tell couldnt
nothing little going
dat
say
two
hear
dark
three
Informative to someaway
extent
didnt
found
talk
bed
ever
get
raft
take
long
king old
Visually attractive
well
half
ill
looked
pap
wont
set
like one
man
see jim
got
house
laid
said
just
warnt
says
big
reckoned
can
wouldnt
thats head reckon first
curious
along around
home
tom
still
low
Yoshikata Shibuya
Kyoto University of Foreign Studies
mean
cook
let
prettytold make day
took
right
know
dont
never
last
aint come went door
river
three
knew
please
indeed
question
tried
word
long hand
heard mad
made
right youre
nigger
give
town
cat
come
upon head king
soon
used
nothing
certainly remark
anxiously
mouse
air
find
high
seen table words
enough
dear
caterpillar
white added
good
arm
ive
way
alic
time
duchess
beautiful
house
left
begin
hare
back
even
gryphon will
theres
going
wouldnt grow
cats
suddenly
thought
take
replied
much
far
jury
put
day
must
little
dont
thing see
took ever
old
found
garden
get
like went
know one think
now
alice
last
didnt eat
wonder
perhaps
room another
felt first
seemed
still
course
make
turtl
ran
queen
said
yet looked
dodo half two say
came
got well
tell
began
just
mock
end
voice
dormouse
better gave
try
round
quite
chapter voic
yes size
hastily
however turned
ask littl
behind
cant away sure
change
set
done
rather wish hear
sea sat use things
getting
talking
doesnt moment idea next might poor
spoke
wont marchnever thats large till
bit
sort may
rabbit
near without turtle
side
ill hatter door looking
eyes great
kept
can
look
let
shall
anything
talk
tone
soon
much
saying
feet
close
among
tea
beginning something speak
cried court saw
minute soup
till lay
Wordcloud AW
Kim Ebensgaard Jensen
Aalborg University
N-grams in AW and HF
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams in AW
2-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
3-grams
4-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in AW
2-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
3-grams
4-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in AW
2-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
3-grams
4-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in AW
2-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
3-grams
4-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in AW
2-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
3-grams
4-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in AW
2-grams
3-grams
4-grams
speech / dialog
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams in AW
2-grams
3-grams
definite NP
4-grams
speech / dialog
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams in AW
2-grams
3-grams
definite NP
4-grams
speech / dialog
'said'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams in AW
3-grams
2-grams
speech / dialog
definite NP
4-grams
[DIALOG said NPdef/unique]/[TOPICALIZATION
OF DIALOG]
'said'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
4-grams
5-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
4-grams
5-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
4-grams
5-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
4-grams
5-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
4-grams
5-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
4-grams
5-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
4-grams
5-grams
productive
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
productive
Yoshikata Shibuya
Kyoto University of Foreign Studies
4-grams
5-grams
less productive
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
productive
4-grams
5-grams
less productive
Two different constructions
[it BEn't no X]
[there BEn't no X]
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
4-grams
5-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
4-grams
5-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
4-grams
5-grams
Kim Ebensgaard Jensen
Aalborg University
N-grams in HF
2-grams
3-grams
4-grams
5-grams
Event relating constructions
used to temporally structure events
in the narrative:
[X and then Y]/[EVENT1 FOLLOWED BY EVENT2]
[by and by X]/[EVENT HAPPENING AFTER SOME TIME]
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-gram analysis: pros and cons 1
Pro:
• Simple N-gram analysis can help us identify and address constructions and
their functionality in one text or discourse, as it identifies frequent
combinations of words in the text or discourse in question.
Con: Problem #1:
• What simple N-gram analysis does not tell us is whether or not those
frequent combination of words can also be found in other text.
• Our N-gram analyses of AW and HF do not tell us if the findings are actually
just general patterns in English or if they are actually good delineations of
the texts.
Solution to problem #1:
• To obtain a list of N-grams that really delineate a given text (so that we can
identify what constructions are characteristically associated with the text), a
comparative analysis can be useful.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams: AW vs. HF
• Note that here we focus on 2-grams
• Procedures
1. 2-gram analysis of AW and HF
2. Normalization of frequencies for AW and HF: per
10,000 words
3. Comparison between 2-grams of AW and those of HF
with Fisher’s exact test
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams: AW vs. HF
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-grams: AW vs. HF
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Other texts: a quick look at inaugural
speeches
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
What about other texts?
Political texts: inaugural speeches by US Presidents
• Procedures
1. 2-gram analysis of 4 inaugural speeches by George Bush (GB), Bill Clinton
(BC), George W. Bush (GWB), and Barack Obama (BO).
2. Normalization: per 1,000 words
3. Fisher’s exact test
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Results
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Results
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-gram analysis: pros and cons 2
•
Pros
• N-grams can help us identify and address constructions and their
functionality in one text or discourse
•
Comparative N-gram analysis can help us find N-grams that delineate texts or
discourses
• So far, we have found 2-grams that delineate AW and HF, and also some political
speeches.
•
Cons: Problem #2:
•
•
•
•
Shorter N-grams are embedded in longer N-grams (2-grams can be found inside
some of the 3-grams and 4-grams).
Consequently, our N-gram lists contain some redundancy.
Our comparative N-gram analyses focused on 2-grams, but texts contain longer
strings of words (3-grams, 4-grams, etc) and also shorter N-grams (1-grams).
From the perspective of construction grammar, we should also talk about those
longer/shorter N-grams, like we did in our isolated N-gram analyses.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
N-gram analysis: pros and cons 2
• Solution to problem #2
• We can use the methods we have used 2-grams with other Ngrams, but is there any simpler way to find both short and long
N-grams?
• That is, can we find 1-grams, 2-grams, 3-grams, 4-grams, etc. all
at one time?
• Is there any way to provide descriptively a more efficient analysis
on frequently co-occurring combinations of words
(constructions)?
• Our suggestion here is to use network analysis based on 2grams.
• Other N-grams emerge in the network, as a network includes all
N-grams.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW and HF
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
Top 96 2-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
A closer look at some N-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
A closer look at some N-grams:
•
'said the march hare' (4gram)
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
A closer look at some N-grams:
•
'said the march hare' (4gram)
•
'said the mock turtle' (4gram)
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
A closer look at some N-grams:
•
'said the march hare' (4gram)
•
'said the mock turtle' (4gram)
•
'said the king' (3-gram)
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
A closer look at some N-grams:
•
'said the march hare' (4gram)
•
'said the mock turtle' (4gram)
•
'said the king' (3-gram)
•
'said the caterpillar' (3gram)
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
A closer look at some N-grams:
•
'said the march hare' (4gram)
•
'said the mock turtle' (4gram)
•
'said the king' (3-gram)
•
'said the caterpillar' (3gram)
•
'said the cat' (3-gram)
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
A closer look at some N-grams:
•
'said the march hare' (4gram)
•
'said the mock turtle' (4gram)
•
'said the king' (3-gram)
•
'said the caterpillar' (3gram)
•
'said the cat' (3-gram)
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
A closer look at some N-grams:
•
'said the march hare' (4gram)
•
'said the mock turtle' (4gram)
•
'said the king' (3-gram)
•
'said the caterpillar' (3gram)
•
'said the cat' (3-gram)
•
'said alice' (2-gram)
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
A closer look at some N-grams:
•
'said the march hare' (4gram)
•
'said the mock turtle' (4gram)
•
'said the king' (3-gram)
•
'said the caterpillar' (3gram)
•
'said the cat' (3-gram)
•
'said alice' (2-gram)
[the N]/[DEFINITE NOMINAL REFERENCE]
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of AW
A closer look at some N-grams:
•
'said the march hare' (4gram)
•
'said the mock turtle' (4gram)
•
'said the king' (3-gram)
•
'said the caterpillar' (3gram)
•
'said the cat' (3-gram)
[DIALOG
said
NPdef/unique
]/[TOPICALIZATION
• 'said
alice'
(2-gram)
OF DIALOG]
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of HF
Top 99 2-grams
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of HF
Negation constructions
Including double negation
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis of HF
Negation constructions
Including double negation
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Two interesting
2-grams of 'i'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Two interesting
2-grams of 'i'
•
'i reckon'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Two interesting
2-grams of 'i'
•
'i reckon'
•
'says i'
•
'i says'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Two interesting
2-grams of 'i'
•
'i reckon'
•
'says i'
•
'i says'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Two interesting
2-grams of 'i'
•
'i reckon'
•
'says i'
•
'i says'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Three N-grams
of 'and':
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Three N-grams
of 'and':
•
'and then'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Three N-grams
of 'and':
•
'and then'
•
'by and by'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Three N-grams
of 'and':
•
'and then'
•
'by and by'
•
'and so'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Three N-grams
of 'and':
•
'and then'
•
'by and by'
•
'and so'
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis (HF)
Three N-grams
of 'and':
•
'and then'
•
'by and by'
•
'and so'
[X and so Y]/[EVENT1 AMOUNTS TO EVENT2]
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Network analysis
•
At the end of the day, short and long N-grams can be found automatically.
•
A list of N-grams can be used to compute p.values with Fisher’s exact test,
for instance.
•
Then, what’s good about using network analysis?
•
Firstly, it allows us to capture several N-gram types simulteneously without too much
redundancy.
•
Secondly, the real advantage that network analysis offers is not just its visual effects,
but in fact it tells you a lot about the internal functionality of the network.
•
For instance, it is possible to compute the centrality of nodes in a network. This
method provides estimates regarding the relationship between a network and the
functionality of nodes in it.
•
A simple N-gram analysis does not provide answers to these issues.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Centrality
•
Betweenness centrality: Dehmer & Subhash C. Basak (2012: 70-1)
... the betweenness centrality is based on the shortest path between nodes.
... The betweenness centrality focuses on the number of visits through the
shortest paths. If a walker moves from one node to another node via the
shortest path, then the nodes with a large number of visits by the walker
may have high centrality.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Centrality
•
Computing betweenness centrality: Top 15
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Concluding remarks
•
Can N-grams be used as means of identifying constructions?
•
•
Is network analysis useful in the identification of constructions?
•
•
Yes, partially.
Yes, partially.
What about functionality?
•
Neither N-grams nor N-gram based networks tell us much about functionality, as they
show us purely formal relations.
•
However, they guide us in terms of connections between words that are salient in a
given text and may be indicative of constructional as functional units.
•
We can then look at the discursive behavior of such N-grams and extrapolate
constructions and their functionality in the text or discourse (and, depending on the
corpus, in general).
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University
Bibliography
Baker, P. (2012). Acceptable bias? Using corpus linguistic methods withcritical discourse analysis. Discourse Studies, 9(3): 247-256.
Bergen, B. & K. Binsted (2004). The cognitive linguistics of scalar humor. In M. Achard & S. Kemmer (eds.). Language, Culture, and Mind. Stanford, CA: CSLI.
79-91.
Croft, W. A. (2001). Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford: Oxford University Press.
Culpeper, J. (2009). In G. Brône & J. Vandaele (eds). Cognitive Poetics: Goals, Gains and Gaps. Berling: Mouton de Gruyter.
Dehmer, M. & S. C. Basak (2012). Statistical and Machine Learning Approaches for Network Analysis. Chichester: Wiley-Blackwell.
Fillmore, C, P. Kay and M. C. O'Connor (1988). Regularity and idiomaticity in grammatical constructions: The case of let alone. Language, 64: 501–38.
Fowler, R. (1977). Linguistics and the Novel. London: Methuen.
Hart, C. (2013). Constructing contexts through grammar: Cognitive models and conceptualisation in British newspaper reports on political protests. In J.
Flowerdew (ed.), Discourse in Context. London: Bloomsbury. 159-183.
Goldberg, A. (1995). Constructions: A Construction Grammar Approach to Argument Structure. Chicago: Chicago University Press.
Lambrect, K. (1990). "What, me worry?": Mad Magazine sentences revisited. BLS, 16: 215-228.
Langacker, R. (1987). Foundations of Cognitive Grammar. Vol. 1: Theoretical Prerequisites. Stanford, CA: Stanford University Press.
Lipka, L. & H.-J. Schmid (1994). To begin with: Degrees of idiomaticity, textual functions and pragmatic exploitations of a fixed expression. ZAA, 42: 6-15.
Martínez, N. D. C. (2013). Illocutionary Construtions in English: Cognitive Motivation and Linguistic Realization. Bern: Peter Lang.
Miner, G., J. Elder, T. Hill, R. Nisbet, D. Delen & A. Fast,. (2012). Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications.
Elsevier Academic Press.
Semino, E. & M. Short (2004). Corpus Stylistics: Speech, Writing and Thought Presentation in a Corpus of English Writing. Londong: Routledge.
Short M. & G. Leech (2007). A Linguistic Introduction to English Fictional Prose (2nd ed.). Harlow: Pearson Longman.
Stockwell, P. (2002). Cognitive Poetics. London: Routledge.
Stubbs, M. (2009). Technology and phraseology. In U. Römer & R. Schulze (eds.), Exploring the Lexis-Grammar Interface. Amsterdam: John Benjamins. 1531.
Yoshikata Shibuya
Kyoto University of Foreign Studies
Kim Ebensgaard Jensen
Aalborg University