Word Vectors in the Eighteenth Century
25 May 2016 | IPAM Workshop IV: Mathematical Analysis of Cultural Expressive Forms: Text Data
Ryan Heuser | @quadrismegistus | heuser@stanford.edu

I. Introduction to Word Vectors

What is a vector?
"Vector" in programming = an array of numbers. V(Virtue) = [0.024, 0.043, ...]
"Vector" in space = a line with a direction and a length.

Traditional Word Vectors
Document-term matrix:
V(Virtue) = [appears 1023 times in Document 1, appears 943 times in Document 2, ...]
Term-term matrix:
V(Virtue) = [appears 343 times near Term 1 ("Honor"), appears 101 times near Term 2 ("Truth"), ...]

Problems with traditional word vectors
The matrix is too large: word vectors are defined along thousands of dimensions, which makes statistics difficult and makes the representation of semantic relationships too noisy.

Word Embedding Models (word2vec)
V(Virtue) = [Neuron 1 fires 0.012, Neuron 2 fires -0.013, ...] [50-500 dimensions]
A network of ~100 artificial "neurons" is trained to distinguish valid from invalid word sequences:
"cat sat on the mat" = VALID
"cat sat song the mat" = INVALID

Vector arithmetic: V(Queen) = V(King) + V(Woman) - V(Man)
V(Woman) - V(Man)
V(Woman) = [0.001, 0.12, -0.15, 0.1, ...]    (Human being, Adult, Female, Noun)
V(Man)   = [0.0012, 0.13, 0.14, 0.1, ...]    (Human being, Adult, Male, Noun)
V(Woman) - V(Man) = (0, 0, Female-Male, 0)
V(Woman-Man) = [-0.0002, -0.01, -0.29, 0, ...]

V(Queen) - V(King)
V(Queen) = [0.001, 0.5, -0.15, 0.1, ...]     (Human being, Monarch, Female, Noun)
V(King)  = [0.0012, 0.51, 0.14, 0.1, ...]    (Human being, Monarch, Male, Noun)
V(Queen) - V(King) = (0, 0, Female-Male, 0)
V(Queen-King) = [-0.0002, -0.01, -0.29, 0, ...]

Therefore V(Queen) - V(King) = V(Woman) - V(Man), which rearranges to the famous analogy:
V(Queen) = V(King) + V(Woman) - V(Man)

A close reading of word vectors in the eighteenth century
Literary Analogy
Edward Young, Conjectures on Original Composition: In a Letter to the Author of Sir Charles Grandison (1759): "Riches are to Virtue as Learning is to Genius."
Vector Analogy
V(Virtue-Riches) + V(Learning) = ?
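As a concrete illustration (not part of the original slides), these vector analogies can be queried with gensim's most_similar, which adds the positive vectors and subtracts the negative ones before returning the nearest words. A minimal sketch, assuming a recent gensim (4.x) and that the downloadable ECCO-TCP model linked below has been unzipped to a local file named word2vec.ECCOTCP.txt; the file name and the particular neighbors returned are assumptions, not claims from the talk.

```python
# Minimal sketch: querying vector analogies with gensim.
from gensim.models import KeyedVectors

# Load the plain-text word2vec model (file name assumed from the download link below).
model = KeyedVectors.load_word2vec_format("word2vec.ECCOTCP.txt", binary=False)

# V(Queen) ~ V(King) + V(Woman) - V(Man)
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=5))

# Young's analogy: does V(Virtue) - V(Riches) + V(Learning) land near V(Genius)?
print(model.most_similar(positive=["virtue", "learning"], negative=["riches"], topn=10))
```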
A close reading of word vectors in the eighteenth century
Interpreting V(Virtue-Riches)
What does V(Virtue-Riches) mean? If (binaristic) gender is expressed through the semantic axis of difference of V(Woman-Man), what semantic axis does V(Virtue-Riches) express?
V(Virtue)  (Form of Value, Noun, Immaterial Abstraction, Inherent/immanent to the individual)
V(Riches)  (Form of Value, Noun, Material Abstraction, Contingent/exterior to the individual)
V(Virtue) - V(Riches) = (0, 0, Immaterial-Material, Inherent-Contingent)
V(Virtue-Riches) thus runs along a semantic axis from [Contingent/Material] to [Inherent/Immaterial].

Interpreting V(Virtue-Riches) + V(Learning)
Start from the semantic profile of V(Learning) and move along the axis of V(Virtue-Riches), from [Contingent/Material] toward [Inherent/Immaterial]: from the public materiality of one form of writerly attribute (Learning: academic, class-based) toward the immaterial immanence of another (inborn Genius).

II. Vector-Experiments

Vector Experiments: Corpus
ECCO-TCP
Number of texts and words: ~2,350 texts published between 1700-99; ~84 million words.
Number of texts by genre: 1,250 prose texts (53%); 605 drama texts (26%), ~50% of which in verse; 498 poetry texts (21%).
Percentage of words by genre: Prose 66%; Drama 23%; Poetry 11%.
[Figure: historical distribution of the corpus]

Vector Experiments: Model
gensim Python implementation of word2vec.
Settings: 5-word skip-grams (non-overlapping ngrams); all other settings default.
Download model at: http://ryanheuser.org/data/word2vec.ECCOTCP.txt.zip
Load: see the gensim loading sketch above.

Vector Experiment 1: Semantic Fields in Vector Space
Semantic Cohort 1 ("Abstract Values"): Moral Valuation (character, honour, conduct, respect, worthy, ...); Social Restraint (gentle, pride, proud, proper, agreeable, ...); Sentiment (heart, feeling, passion, bosom, emotion, ...); Partiality (correct, prejudice, partial, disinterested, ...). [Literary Lab Pamphlet 4, Figure 8 (Heuser & Le-Khac)]
"Hard Seed": Action Verbs (see, come, go, came, look, let, looked, ...); Body Parts (eyes, hand, face, head, hands, eye, arms, ...); Physical Adjectives (round, hard, low, clear, heavy, hot, straight, ...); Colors (white, black, red, blue, green, gold, grey, ...); Locative Prepositions (out, up, over, down, away, back, through, ...); Numbers (two, three, ten, thousand, four, five, hundred, ...). [Literary Lab Pamphlet 4, Figure 15 (Heuser & Le-Khac)]
Method: each word-vector in the semantic fields is compared with every other to find the cosine distance between them; t-SNE (a dimensionality-reduction algorithm) is applied to these vector distances; each word is colored by the semantic field from which it comes. [Plots shown for all fields' words together, then highlighting Abstract Values, then Hard Seed.]
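The pipeline just described (pairwise cosine distances over the field words, then t-SNE on the distance matrix) might be sketched as follows. This is only an illustration: the field lists are abbreviated stand-ins for the full cohorts, and scikit-learn's TSNE with a precomputed metric is one reasonable implementation, not necessarily the one behind the original figures.

```python
# Sketch of Experiment 1: t-SNE over cosine distances among semantic-field words.
# Assumes `model` is the KeyedVectors object loaded above; field lists are truncated
# examples standing in for the full cohorts from Literary Lab Pamphlet 4.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import TSNE

fields = {
    "Abstract Values": ["character", "honour", "conduct", "heart", "feeling", "passion"],
    "Hard Seed": ["eyes", "hand", "face", "white", "black", "red", "two", "three"],
}

words = [w for ws in fields.values() for w in ws if w in model]
labels = [f for f, ws in fields.items() for w in ws if w in model]
X = np.array([model[w] for w in words])

# Pairwise cosine distances, then 2-D t-SNE on the precomputed distance matrix.
D = squareform(pdist(X, metric="cosine"))
coords = TSNE(n_components=2, metric="precomputed", init="random",
              perplexity=5, random_state=0).fit_transform(D)

for w, f, (x, y) in zip(words, labels, coords):
    print(f"{w}\t{f}\t{x:.2f}\t{y:.2f}")
```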
Vector Experiment 2: Operationalizing Abstractness as a Vector
Step 1: Find the center of the vector space occupied by a range of abstract words = V([Abstract words]).
Step 2: Find the center of the vector space occupied by a range of concrete words = V([Concrete words]).
Step 3: Compute the vector-difference between them: V([Abstract words]) - V([Concrete words]) = V(Abstract-Concrete).
The cosine similarity between any word-vector and V(Abstract-Concrete) can then be computed. [Figure: the top 1,000 most frequent words, by part of speech.]
Abstractness in the eighteenth century, according to V(Abstract-Concrete), can be compared to contemporary measures of abstractness. Data drawn from the Mechanical Turk study "Concreteness ratings for 40 thousand generally known English word lemmas" (Brysbaert, Warriner, Kuperman). R^2 = 0.32.

Vector Experiment 3: Networks of Abstraction
Step 1: Find the 1,000 most frequent "abstract" singular nouns, where "abstract" means a cosine similarity > 0 with V(Abstract-Concrete).
Step 2: Draw a link between two words if their cosine similarity > 0.7.
The result is a network of "slant" synonymy: semantic turns are allowed, but the degree of semantic swerve is constrained.
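A minimal sketch of Experiments 2 and 3 together, assuming the same loaded model. The abstract and concrete seed lists below are illustrative stand-ins (the slides do not give the original lists), and the short candidate list stands in for the 1,000 most frequent abstract singular nouns; only the thresholds follow the steps above.

```python
# Sketch of Experiments 2-3: an Abstract-Concrete axis, then a "slant synonymy" network.
import numpy as np
import networkx as nx

def centroid(words):
    """Mean vector of the words present in the model (the 'center' of their region)."""
    return np.mean([model[w] for w in words if w in model], axis=0)

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

abstract_seeds = ["virtue", "honour", "truth", "justice", "liberty", "reason"]   # assumed
concrete_seeds = ["hand", "tree", "stone", "river", "horse", "table"]            # assumed

# Experiment 2: V(Abstract-Concrete) as the difference of the two centroids.
axis = centroid(abstract_seeds) - centroid(concrete_seeds)
for probe in ["wisdom", "hat"]:
    if probe in model:
        print(probe, round(cos(model[probe], axis), 3))   # abstractness score

# Experiment 3: link "abstract" nouns (axis similarity > 0) whose mutual
# cosine similarity exceeds 0.7.
candidates = [w for w in ["virtue", "honour", "wisdom", "folly", "genius", "learning",
                          "riches", "beauty", "passion", "reason"]
              if w in model and cos(model[w], axis) > 0]
G = nx.Graph()
G.add_nodes_from(candidates)
for i, w1 in enumerate(candidates):
    for w2 in candidates[i + 1:]:
        if cos(model[w1], model[w2]) > 0.7:
            G.add_edge(w1, w2)
print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```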
Theoretical Interlude: Word Vectors in the Eighteenth Century
Formal Homology 1: Abstraction and Addition
Locke on Abstraction: "Of the complex Ideas, signified by the names Man, and Horse, leaving out but those particulars in which they differ, and retaining only those in which they agree, and of those, making a new distinct complex Idea, and giving the name Animal to it, one has a more general term, that comprehends, with Man, several other Creatures" (An Essay Concerning Human Understanding [1689], Chapter III, "Of General Terms").
Locke's Abstraction as Vector Operation: V(Man) + V(Horse) magnifies the semantics common to Man and Horse, outstripping—ab-stracting—the semantics idiosyncratic to each.
V(Man) + V(Horse) = AVERAGE(V(0.2, 0.1), V(0.4, 0.2)) = V(0.3, 0.15) ≈ V(Animal)

Formal Homology 2: Contrast, Analogy, and Subtraction
Hume, "Of Simplicity and Refinement": "[W]e ought to be more on our Guard against the Excess of Refinement than that of Simplicity, because the former Excess is both less beautiful, and more dangerous than the latter. Tis a certain Rule, That Wit and Passion are intirely inconsistent. When the Affections are mov'd, there is no Place for the Imagination. The Mind of Man being naturally limited, 'tis impossible all its Faculties can operate at once: And the more any one predominates, the less Room is there for the others to exert their Vigour."
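Returning for a moment to Formal Homology 1: gensim's most_similar with two positive terms effectively performs this Lockean averaging (it averages the normalized vectors before ranking neighbors), so the "Animal" prediction can be probed directly. A minimal sketch assuming the loaded model; which general terms actually surface depends on the corpus and is not claimed by the original slides.

```python
# Sketch: Locke's abstraction as vector averaging. most_similar() averages the
# normalized vectors for "man" and "horse" and returns their nearest neighbors;
# the Lockean hypothesis is that general terms such as "animal" or "creature" rank highly.
for word, similarity in model.most_similar(positive=["man", "horse"], topn=10):
    print(f"{word}\t{similarity:.3f}")
```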
Analogical Network
Type of opposition                      Associated with Simplicity   Associated with Refinement   Vector operation
Relative outcomes of excess of either   (more) Beautiful             (more) Dangerous             V(Beautiful-Dangerous)
Aesthetic faculties                     Passion                      Wit                          V(Passion-Wit)
Mental faculties                        Affections                   Imagination                  V(Affections-Imagination)

Vector Experiment 4: Measuring Analogy through Vector Correlation
Manually selected semantic contrasts: Ancient(s) <> Modern(s); Beautiful <> Sublime; Body <> Mind; Comedy <> Tragedy; [Concrete words] <> [Abstract words]; Folly <> Wisdom; Genius <> Learning; Human <> Divine; Judgment <> Invention; Law <> Liberty; Marvellous <> Common; [Nonevaluative words] <> [Evaluative words]; Parliament <> King; Passion <> Reason; Pity <> Fear; Private <> Public; Queen <> King; Romances <> Novels; Ruin <> Reputation; Simplicity <> Refinement; Tradition <> Revolution; Tyranny <> Liberty; Virtue <> Honour; Virtue <> Riches; Virtue <> Vice; Whig <> Tory; Woman <> Man.

V(Simplicity-Refinement) vs. V(Virtue-Vice)
Statistics: R^2 = 0.41, Pearson correlation = 0.64.
Interpretation: Simplicity is consistently moralized in the period as greater than Refinement; Refinement is associated with the effeminacy and decline supposed to have happened to post-Augustan Rome.

V(Simplicity-Refinement) vs. V(Human-Divine)
Statistics: R^2 = 0.30, Pearson correlation = 0.55.
Interpretation: divine simplicity vs. refined humanity.

V(Simplicity-Refinement) vs. V(Ancient-Modern)
Statistics: R^2 = 0.23, Pearson correlation = 0.48.
Interpretation: the ancients are associated with simplicity; refinement is associated with the moderns and modernity.

V(Simplicity-Refinement) vs. V(Woman-Man)
Statistics: R^2 = 0.17, Pearson correlation = 0.42.
Interpretation: V(Simplicity-Refinement) is gendered, with women associated with Simplicity and men associated with Refinement.

V(Simplicity-Refinement) over time
Step 1: Train a word2vec model (5-word skip-gram) on each quarter-century of texts in ECCO-TCP.
Step 2: Measure the correlation (Pearson) between V(Simplicity-Refinement) and other selected vectors in each quarter-century model.

Analogical Network
Step 1: Add all manually selected semantic contrasts as nodes.
Step 2: Link two nodes (semantic contrasts) when the R^2 of their linear regression is > 0.1.
Step 3: Color edges red (negative correlation) or blue (positive correlation).
Step 4: Size nodes by betweenness centrality.
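One plausible way to operationalize the correlations in Experiment 4 is to project every vocabulary word onto each difference vector via cosine similarity and correlate the two resulting series; the slides do not spell out the exact procedure, so this is an assumption. A sketch assuming the loaded model and a gensim 4.x vocabulary attribute:

```python
# Sketch of Experiment 4: correlating two semantic axes over the vocabulary.
# The projection-then-correlate procedure is an assumption about how the reported
# R^2 / Pearson values were obtained.
import numpy as np
from scipy.stats import pearsonr

def axis(pos, neg):
    """Difference vector V(pos - neg), e.g. V(Simplicity-Refinement)."""
    return model[pos] - model[neg]

def projections(vec, words):
    """Cosine similarity of each word to the axis vector."""
    vec = vec / np.linalg.norm(vec)
    return np.array([np.dot(model[w] / np.linalg.norm(model[w]), vec) for w in words])

vocab = model.index_to_key[:5000]          # most frequent words (gensim 4.x attribute)
a = projections(axis("simplicity", "refinement"), vocab)
b = projections(axis("virtue", "vice"), vocab)

r, p = pearsonr(a, b)
print(f"Pearson r = {r:.2f}, R^2 = {r**2:.2f}")
```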
End

Theoretical Interlude: Word Vectors in the Eighteenth Century
Vector Operations and Virtual Concepts

Abstracted Vectors [Locke]
Might have a direct expression in language as a word ("animal"). But also might not: V(Virtue+Riches) might mean something like "forms of value". Thus V(Virtue+Riches) is a virtual concept in its eighteenth-century sense: "Having the efficacy without the sensible or material part" (Johnson). A virtual concept of this kind might be called an abstracted concept, expressed by an abstracting vector operation.
Critical Note: Abstracted concepts are often invisibly and ideologically efficacious.

Spectral Vectors [Hume]
Might have a direct expression in language ("gender", [in 18C] "sex"). But also might not: V(Virtue-Riches) might mean something like "the semantic spectrum between immanent spirituality and public materiality." Not exactly a single word. Thus V(Virtue-Riches) is a virtual concept that might be called a spectral concept.
Critical Note: Spectral in both senses: as a spectrum and as a specter. The spectral concept of V(Woman-Man) enacts a binaristic, misogynist ideology of gender. And the spectral concept of V(Black-White) enacts a distinctly American binaristic racist ideology.

Appendix. Vector Experiment 1: Semantic Fields in Vector Space (network views)
Each word-vector in the semantic fields is compared with every other to find the cosine distance between them. Each node/word is linked to the three nodes/words closest to it in the vector space, among all words in the semantic fields.
In a second view, each node/word in the semantic fields is linked to its three closest words in the vector space, even if those words are not included in the semantic fields. New nodes/words brought into the network in this first step are themselves also connected to their closest words in the vector space. Gray words/nodes are not included in the fields.
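These appendix networks can be approximated with gensim's most_similar and networkx. A minimal sketch, assuming the loaded model and the abbreviated `fields` dict from the Experiment 1 sketch above; the three-neighbor rule and the two-pass expansion follow the slide descriptions, while everything else is implementation detail.

```python
# Sketch of the appendix networks: link each field word to its three nearest
# neighbors in the vector space, then expand once so newly introduced words are
# themselves linked to their nearest neighbors.
import networkx as nx

field_words = {w for ws in fields.values() for w in ws if w in model}

G = nx.Graph()
G.add_nodes_from(field_words, in_fields=True)

def link_to_neighbors(word):
    """Connect `word` to its 3 nearest neighbors in the vector space."""
    for neighbor, sim in model.most_similar(word, topn=3):
        if neighbor not in G:
            G.add_node(neighbor, in_fields=neighbor in field_words)
        G.add_edge(word, neighbor, weight=sim)

# First pass: every field word links to its three closest words (which may lie
# outside the fields; those appear gray in the slides).
for w in field_words:
    link_to_neighbors(w)

# Second pass: words newly drawn into the network are also linked to their neighbors.
for w in [n for n in list(G.nodes) if not G.nodes[n]["in_fields"]]:
    link_to_neighbors(w)

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```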