Word Vectors in the Eighteenth Century

25 May 2016 | IPAM Workshop IV: Mathematical Analysis of Cultural Expressive Forms: Text Data
Ryan Heuser | @quadrismegistus | heuser@stanford.edu
I. Introduction to Word Vectors
What is a vector?
"Vector" in Programming
= An array of numbers.
V(Virtue) = [0.024, 0.043, …]
"Vector" in Space
= A line with a direction and a length.
Traditional Word Vectors
Document-Term Matrix
V(Virtue) = [
appears 1023 times in Document 1,
appears 943 times in Document 2,
…
]
Term-Term Matrix
V(Virtue) = [
appears 343 times near Term 1 ["Honor"],
appears 101 times near Term 2 ["Truth"],
…
]
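As a minimal sketch (not from the talk), the term-term version of such a matrix can be counted directly; the toy corpus and the 5-word window below are illustrative assumptions.

    # Count a term-term co-occurrence matrix over a +/-5-word window.
    from collections import Counter, defaultdict

    corpus = ["virtue and honor walk together", "truth is the friend of virtue"]  # toy data
    window = 5
    cooc = defaultdict(Counter)
    for doc in corpus:
        tokens = doc.split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    cooc[word][tokens[j]] += 1

    # cooc["virtue"] is a sparse "traditional" word vector for "virtue"
    print(cooc["virtue"])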
Problems with traditional word vectors
The matrix is too large: word vectors are defined along thousands of dimensions, which makes statistics difficult and the representation of semantic relationships noisy.
Word Embedding Models (word2vec)
V(Virtue) = [Neuron 1 fires 0.012, Neuron 2 fires -0.013, …] [50-500 dimensions]
[Figure: a network of ~100 artificial "neurons" learns to judge word sequences: "cat sat on the mat" is VALID; "cat sat song the mat" is INVALID.]
V(Queen) = V(King) + V(Woman) - V(Man)
V(Woman) – V(Man)
V(Woman) = [0.001, 0.12, -0.15, 0.1, …]
V(Man) = [0.0012, 0.13, 0.14, 0.1, ...]
Conceptually:
V(Woman): [Human being, Adult, Female, Noun]
– V(Man): [Human being, Adult, Male, Noun]
= V(Woman-Man): [0, 0, Female–Male, 0]
V(Woman-Man) = [-0.0002, -0.01, -0.29, 0, ...]
V(Queen) – V(King)
V(Queen) = [0.001, 0.5, -0.15, 0.1, …]
V(King) = [0.0012, 0.51, 0.14, 0.1, ...]
Conceptually:
V(Queen): [Human being, Monarch, Female, Noun]
– V(King): [Human being, Monarch, Male, Noun]
= V(Queen-King): [0, 0, Female–Male, 0]
V(Queen-King) = [-0.0002, -0.01, -0.29, 0, ...]
V(Queen) – V(King) = V(Woman) – V(Man)
V(Queen) = V(King) + V(Woman) – V(Man)
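The same analogy can be queried with gensim's vector arithmetic once a model is loaded; this is a hedged sketch, not the talk's code, and the filename assumes the ECCO-TCP download described in Section II has been unzipped locally.

    # V(king) + V(woman) - V(man): gensim averages the (unit-length) input
    # vectors and returns the nearest words to the result.
    from gensim.models import KeyedVectors

    model = KeyedVectors.load_word2vec_format("word2vec.ECCO-TCP.txt")  # assumed filename
    print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=5))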
A close reading of word vectors in the eighteenth century
Literary Analogy
Edward Young, Conjectures on Original Composition: In a Letter to the Author of Sir Charles Grandison (1759):
Riches are to Virtue as Learning is to Genius
Vector Analogy
V(Virtue-Riches) + V(Learning) = ?
A close reading of word vectors in the eighteenth century
Interpreting V(Virtue-Riches)
What does V(Virtue-Riches) mean?
If (binary) gender is expressed through V(Woman-Man), what does V(Virtue-Riches) express?
A close reading of word vectors in the eighteenth century
Interpreting V(Virtue-Riches)
V(Virtue): [Form of Value, Noun, Immaterial, Abstraction, Inherent/immanent to individual]
– V(Riches): [Form of Value, Noun, Material, Abstraction, Contingent/exterior to individual]
= V(Virtue-Riches): [0, 0, Immaterial–Material, 0, Inherent–Contingent]
A close reading of word vectors in the eighteenth century
Interpreting V(Virtue-Riches)
What does V(Virtue-Riches) mean?
If (binaristic) gender is expressed through the semantic axis of difference of V(Woman-Man), what semantic axis does V(Virtue-Riches) express?
V(Virtue-Riches) ?
[Contingent/Material] → [Inherent/Immaterial]
A close reading of word vectors in the eighteenth century
Interpreting V(Virtue-Riches+Learning)
V(Virtue-Riches) + V(Learning) ?
Start from the semantic profile of V(Learning).
Move along the axis of V(Virtue-Riches): [Contingent/Material] → [Inherent/Immaterial]
Public materiality of one form of writerly attribute (Learning: academic, class-based) → immaterial immanence of another (inborn Genius).
II. Vector Experiments
Vector Experiments: Corpus
ECCO-TCP
Number of texts and words
~2,350 texts published between 1700 and 1799
~84 million words
Number of texts by genre
1,250 prose texts (53%)
605 drama texts (26%), ~50% of which in verse
498 poetry texts (21%)
Percentage of words by genre
Prose: 66%
Drama: 23%
Poetry: 11%
[Figure: Historical Distribution]
Vector Experiments: Model
gensim
Python implementation of word2vec.
Settings:
5-word skip-grams (non-overlapping n-grams)
All other settings default.
Download model at:
http://ryanheuser.org/data/word2vec.ECCOTCP.txt.zip
Load:
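The load step might look like the following sketch (the unzipped filename is an assumption):

    # Load the downloaded vectors with gensim's KeyedVectors.
    from gensim.models import KeyedVectors

    model = KeyedVectors.load_word2vec_format("word2vec.ECCO-TCP.txt", binary=False)
    print(model.most_similar("virtue", topn=10))  # e.g. nearest neighbours of "virtue"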
Vector Experiment 1: Semantic Fields in Vector Space
Semantic Cohort 1
Abstract Values
Moral Valuation
character, honour, conduct, respect, worthy
…
Social Restraint
gentle, pride, proud, proper, agreeable, …
Sentiment
heart, feeling, passion, bosom, emotion, …
Partiality
correct, prejudice, partial, disinterested, …
Literary Lab Pamphlet 4, Figure 8 (Heuser & LeKhac)
Vector Experiment 1: Semantic Fields in Vector Space
Semantic Cohort 1
Hard Seed
Action Verbs
see, come, go, came, look, let, looked, …
Body Parts
eyes, hand, face, head, hands, eye, arms, …
Physical Adjectives
round, hard, low, clear, heavy, hot, straight, …
Colors
white, black, red, blue, green, gold, grey, …
Locative Prepositions
out, up, over, down, away, back, through, …
Numbers
two, three, ten, thousand, four, five, hundred, …
Literary Lab Pamphlet 4, Figure 15 (Heuser & LeKhac)
Vector Experiment 1: Semantic Fields in Vector Space
Each word-vector in the semantic fields is compared with every other to find the cosine distance between them.
t-SNE (a dimensionality-reduction algorithm) is applied to these vector distances.
Words are colored by the semantic field from which each comes.
[Figure panels: All fields' words; Abstract Values; Hard Seed]
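A hedged sketch of this procedure, reusing `model` from the loading sketch above; the two short word lists are illustrative samples from the fields, not the full cohorts, and the t-SNE perplexity is lowered to suit them.

    # t-SNE over pairwise cosine distances between field words, with labels
    # retained so points can be colored by semantic field.
    import numpy as np
    from sklearn.manifold import TSNE
    from sklearn.metrics.pairwise import cosine_distances

    fields = {
        "Abstract Values": ["character", "honour", "conduct", "heart", "passion", "pride"],
        "Hard Seed": ["eyes", "hand", "face", "white", "black", "two", "three", "round"],
    }
    words, labels = [], []
    for field, members in fields.items():
        for w in members:
            if w in model:                      # `model` from the loading sketch above
                words.append(w)
                labels.append(field)

    X = np.array([model[w] for w in words])
    D = cosine_distances(X)                     # pairwise cosine distances
    coords = TSNE(metric="precomputed", init="random", perplexity=5).fit_transform(D)
    # `coords` gives 2-D positions; `labels` gives the field for coloring each point.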
Vector Experiment 2: Operationalizing Abstractness as a Vector
Step 1: Find the center of the vector space occupied by a range of abstract words. = V([Abstract words])
Step 2: Find the center of the vector space occupied by a range of concrete words. = V([Concrete words])
Step 3: Compute the vector-difference between them. = V([Abstract words]) – V([Concrete words]) = V(Abstract-Concrete)
Vector Experiment 2: Operationalizing Abstractness as a Vector
The cosine similarity between any word-vector and V(Abstract-Concrete) can then be computed.
[Figure: the top 1,000 most frequent words, by part of speech.]
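A hedged sketch of Steps 1-3 and of the cosine-similarity scoring, reusing `model` from above; the seed word lists are illustrative stand-ins for the talk's actual abstract and concrete ranges.

    # Build V(Abstract-Concrete) as a difference of centroids, then score any
    # word by its cosine similarity to that axis.
    import numpy as np

    abstract_seed = ["virtue", "truth", "honour", "wisdom", "justice"]   # illustrative
    concrete_seed = ["stone", "tree", "hand", "house", "horse"]          # illustrative

    def centroid(words):
        return np.mean([model[w] for w in words if w in model], axis=0)

    axis = centroid(abstract_seed) - centroid(concrete_seed)             # V(Abstract-Concrete)

    def abstractness(word):
        v = model[word]
        return float(np.dot(v, axis) / (np.linalg.norm(v) * np.linalg.norm(axis)))

    print(abstractness("genius"), abstractness("mat"))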
Vector Experiment 2: Operationalizing Abstractness as a Vector
Abstractness in the eighteenth century, according to V(Abstract-Concrete), can be compared to contemporary measures of abstractness.
Data drawn from the Mechanical Turk study "Concreteness ratings for 40 thousand generally known English word lemmas" (Brysbaert, Warriner, Kuperman).
R^2 = 0.32
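One way to reproduce such a comparison, as a sketch only, reusing `model` and `abstractness` from the previous block; the local filename and the column names "Word" and "Conc.M" are assumptions about a downloaded copy of the Brysbaert et al. norms.

    # Correlate model abstractness with crowd-sourced concreteness ratings.
    import pandas as pd
    from scipy.stats import pearsonr

    norms = pd.read_csv("brysbaert_concreteness.csv")        # assumed local file
    norms = norms[norms["Word"].isin(model.key_to_index)]
    scores = [abstractness(w) for w in norms["Word"]]
    r, _ = pearsonr(scores, -norms["Conc.M"])                # negate: concreteness vs. abstractness
    print("R^2 =", r ** 2)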
Vector Experiment 3: Networks of Abstraction
Step 1: Find the 1,000 most frequent "abstract" singular nouns, where "abstract" means a cosine similarity > 0 with V(Abstract-Concrete).
Step 2: Draw a link between two words if their cosine similarity > 0.7.
= A network of "slant" synonymy: semantic turns are allowed, but the degree of semantic swerve is limited.
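A hedged sketch of the two steps, reusing `model` and `abstractness` from the Experiment 2 sketch; the candidate list here is illustrative, since selecting the 1,000 most frequent singular nouns would additionally require a POS-tagged frequency list.

    # Link "abstract" words whose cosine similarity exceeds 0.7.
    import itertools
    import networkx as nx

    candidates = [w for w in ["virtue", "honour", "truth", "wisdom", "justice",
                              "genius", "learning", "riches", "reputation"]
                  if w in model and abstractness(w) > 0]

    G = nx.Graph()
    G.add_nodes_from(candidates)
    for a, b in itertools.combinations(candidates, 2):
        if model.similarity(a, b) > 0.7:
            G.add_edge(a, b)
    print(G.number_of_edges(), "links of 'slant' synonymy")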
Theoretical Interlude: Word Vectors in the Eighteenth Century
Formal Homology 1: Abstraction and Addition
Locke on Abstraction
Of the complex Ideas, signified by the names Man, and Horse, leaving out but those particulars in which they differ, and retaining only those in which they agree, and of those, making a new distinct complex Idea, and giving the name Animal to it, one has a more general term, that comprehends, with Man, several other Creatures (An Essay Concerning Human Understanding [1689], Chapter III, "Of General Terms").
Locke's Abstraction as Vector Operation
V(Man) + V(Horse) magnifies the semantics shared by Man and Horse, outstripping (ab-stracting) the semantics idiosyncratic to each.
V(Man) + V(Horse)
= AVERAGE([0.2, 0.1], [0.4, 0.2]) (toy two-dimensional vectors)
= [0.3, 0.15]
≈ V(Animal)
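As a hedged sketch of this homology in gensim, reusing `model` from above: most_similar with only positive terms averages the unit-length input vectors, so the nearest neighbours of V(Man)+V(Horse) can be inspected directly. Whether "animal" actually appears near the top in the ECCO-TCP model is an empirical question, not a claim here.

    # Nearest neighbours of the averaged vectors for "man" and "horse".
    print(model.most_similar(positive=["man", "horse"], topn=10))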
Theoretical Interlude: Word Vectors in the Eighteenth Century
Formal Homology 2: Contrast, Analogy, and Subtraction
Hume's "Of Simplicity and Refinement"
[W]e ought to be more on our Guard against the Excess of Refinement than that of Simplicity, because the former Excess is both less beautiful, and more dangerous than the latter. 'Tis a certain Rule, That Wit and Passion are intirely inconsistent. When the Affections are mov'd, there is no Place for the Imagination. The Mind of Man being naturally limited, 'tis impossible all its Faculties can operate at once: And the more any one predominates, the less Room is there for the others to exert their Vigour.
Analogical Network
Type of opposition | Associated with Simplicity | Associated with Refinement | Vector operation
Relative outcomes of excess of either | (more) Beautiful | (more) Dangerous | V(Beautiful-Dangerous)
Aesthetic faculties | Passion | Wit | V(Passion-Wit)
Mental faculties | Affections | Imagination | V(Affections-Imagination)
Vector Experiment 4: Measuring Analogy through Vector Correlation
Ancient(s) <> Modern(s)
Beautiful <> Sublime
Body <> Mind
Comedy <> Tragedy
[Concrete words] <> [Abstract words]
Folly <> Wisdom
Genius <> Learning
Human <> Divine
Judgment <> Invention
Law <> Liberty
Marvellous <> Common
[Nonevaluative words] <> [Evaluative words]
Parliament <> King
Passion <> Reason
Pity <> Fear
Private <> Public
Queen <> King
Romances <> Novels
Ruin <> Reputation
Simplicity <> Refinement
Tradition <> Revolution
Tyranny <> Liberty
Virtue <> Honour
Virtue <> Riches
Virtue <> Vice
Whig <> Tory
Woman <> Man
Vector Experiment 4: Measuring Analogy through Vector Correlation
V(Simplicity-Refinement)
V(Virtue-Vice)
Statistics: R^2 = 0.41, Pearson correlation = 0.64
Interpretation: Simplicity is consistently moralized in the period as greater than Refinement; Refinement is associated with the effeminacy and decline supposed to have befallen post-Augustan Rome.
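A hedged sketch of one reading of the method, reusing `model` from above: take the component-wise Pearson correlation between the two contrast vectors and square it for R^2. This is a reconstruction of the slides' computation, not the original code.

    # Correlate two contrast (difference) vectors component-wise.
    from scipy.stats import pearsonr

    def contrast(a, b):
        return model[a] - model[b]

    r, _ = pearsonr(contrast("simplicity", "refinement"), contrast("virtue", "vice"))
    print("Pearson r =", r, "R^2 =", r ** 2)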
Vector Experiment 4: Measuring Analogy through Vector Correlation
V(Simplicity-Refinement)
V(Human-Divine)
Statistics: R^2 = 0.30, Pearson correlation = 0.55
Interpretation: Divine simplicity vs. refined humanity.
Vector Experiment 4: Measuring Analogy through Vector Correlation
V(Simplicity-Refinement)
V(Ancient-Modern)
Statistics: R^2 = 0.23, Pearson correlation = 0.48
Interpretation: The ancients are associated with simplicity; refinement is associated with the moderns and modernity.
Vector Experiment 4: Measuring Analogy through Vector Correlation
V(Simplicity-Refinement)
V(Woman-Man)
Statistics: R^2 = 0.17, Pearson correlation = 0.42
Interpretation: V(Simplicity-Refinement) is gendered, with women associated with Simplicity and men with Refinement.
Vector Experiment 4: Measuring Analogy through Vector Correlation
V(Simplicity-Refinement)
Step 1: Train a word2vec model (5-word skip-gram) on each quarter-century of texts in ECCO-TCP.
Step 2: Measure the correlation (Pearson) between V(Simplicity-Refinement) and other selected vectors in each quarter-century model.
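A hedged sketch of the diachronic step: `corpora_by_quarter` is a placeholder for tokenized ECCO-TCP sentences grouped by quarter-century, and the correlation follows the reading sketched above.

    # Train one skip-gram model per quarter-century and track a correlation.
    from gensim.models import Word2Vec
    from scipy.stats import pearsonr

    results = {}
    for period, sentences in corpora_by_quarter.items():   # placeholder input
        m = Word2Vec(sentences, sg=1, window=5)            # 5-word skip-gram, other settings default
        if all(w in m.wv for w in ("simplicity", "refinement", "woman", "man")):
            v1 = m.wv["simplicity"] - m.wv["refinement"]
            v2 = m.wv["woman"] - m.wv["man"]
            results[period] = pearsonr(v1, v2)[0]
    print(results)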
Vector Experiment 4: Measuring Analogy through Vector Correlation
Analogical Network
Step 1: Add all manually selected semantic contrasts as nodes.
Step 2: Link nodes (semantic contrasts) when the R^2 of their linear regression is > 0.1.
Step 3: Color edges red (negative correlation) or blue (positive correlation).
Step 4: Size nodes by betweenness centrality.
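A hedged sketch of the network construction with networkx, reusing `model` from above; only a few of the listed contrasts are included here, as illustration.

    # Nodes are contrast vectors; edges where R^2 > 0.1, signed by correlation.
    import itertools
    import networkx as nx
    from scipy.stats import pearsonr

    contrasts = {
        "Simplicity-Refinement": ("simplicity", "refinement"),
        "Virtue-Vice": ("virtue", "vice"),
        "Woman-Man": ("woman", "man"),
        "Human-Divine": ("human", "divine"),
    }
    vecs = {name: model[a] - model[b] for name, (a, b) in contrasts.items()}

    G = nx.Graph()
    G.add_nodes_from(vecs)
    for n1, n2 in itertools.combinations(vecs, 2):
        r, _ = pearsonr(vecs[n1], vecs[n2])
        if r ** 2 > 0.1:
            G.add_edge(n1, n2, color="blue" if r > 0 else "red")

    size = nx.betweenness_centrality(G)                    # node sizing
    print(list(G.edges(data=True)), size)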
End
Theoretical Interlude: Word Vectors in the Eighteenth Century
Vector Operations and Virtual Concepts
Abstracted Vectors [Locke]
Might have a direct expression in language as a word ("animal").
But also might not: V(Virtue+Riches) might mean something like "forms of value".
Thus V(Virtue+Riches) is a virtual concept in its eighteenth-century sense: "Having the efficacy without the sensible or material part" (Johnson).
A virtual concept that might be called an abstracted concept, expressed by an abstracting vector operation.
Critical Note: Abstracted concepts are often invisibly and ideologically efficacious.
Spectral Vectors [Hume]
Might have a direct expression in language ("gender", [in 18C] "sex").
But also might not: V(Virtue-Riches) might mean something like "the semantic spectrum between immanent spirituality and public materiality." Not exactly a single word.
Thus V(Virtue-Riches) is a virtual concept that might be called a spectral concept.
Critical Note: Spectral in both senses: as a spectrum and as a specter. The spectral concept of V(Woman-Man) enacts a binaristic, misogynist ideology of gender. And the spectral concept of V(Black-White) enacts a distinctly American, binaristic racist ideology of race.
Vector Experiment 1: Semantic Fields in Vector Space
Each word-vector in the semantic fields is compared with every other to find the cosine distance between them.
Each node/word is linked to the three nodes/words closest to it in the vector space, among all words in the semantic fields.
Vector Experiment 1: Semantic Fields in Vector Space
Each node/word in the semantic fields is now linked to its three closest words in the vector space, even if those words are not included in the semantic fields.
New nodes/words brought into the network in this first step are themselves also connected to their closest words in the vector space.
Gray words/nodes are not included in the fields.
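A hedged, one-hop sketch of this nearest-neighbour network, reusing `model` and the illustrative `fields` dictionary from the Experiment 1 sketch; the slide's second hop, connecting newly added words onward, is omitted for brevity.

    # Link each field word to its three nearest neighbours in the whole space.
    import networkx as nx

    G = nx.Graph()
    for field, members in fields.items():
        for w in members:
            if w not in model:
                continue
            G.add_node(w, field=field)
            for neighbour, _ in model.most_similar(w, topn=3):
                G.add_edge(w, neighbour)   # neighbours outside the fields become "gray" nodes
    print(G.number_of_nodes(), G.number_of_edges())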