PHYLOGENY OF THE EARLY GERMANIC LANGUAGES
Dario Papavassiliou & Keith M. Briggs
University of Warwick; UWE & BT Research
Wednesday, 3 September 2014

Contents
- Introduction
  - Why is evolutionary linguistics interesting?
  - Quantitative linguistics
  - The Germanic languages
- Data
- Methods
  - Maximum parsimony
  - MCMC
- Results & conclusions

Introduction: Evolutionary linguistics
- Charles Darwin offered languages as an illustrative example of evolution
- Languages show analogies to genetic features: mutation and inheritance
- The history of languages has a close correspondence to the history of humanity
- The origin of language ... the origin of modern humanity?

Introduction: A brief history
Schleicher’s tree model
[Figure: Schleicher’s tree diagram, annotated (“Language stock”), (Language families) and Ursprache (Ancestral language). Source page header: "178 Liba Taub". Caption: Figure 1. A. Schleicher, Die Deutsche Sprache (Stuttgart, 1869), 2nd edn (1st edn, 1860), p. 28. Courtesy of the University of Chicago Library.]
Introduction: A brief history
Schleicher’s tree model of Indo-European
[Figure: Schleicher’s tree of Indo-European, rooted at "indogerm. Ursprache" (Proto-Indo-European), with branches annotated (German), (Lithuanian), (Slavic), (Celtic), (Italic), (Albanian), (Greek), (Iranian), (Indian). Source page header: "Evolutionary ideas and 'empirical' methods".]

Introduction: A brief history
Schmidt’s wave model
[Figure: wave-model diagram, built up over several slides]

Introduction: A brief history
Schmidt’s wave model - The Balkan Sprachbund
- 4 Indo-European language families (Greek, Romance, Albanian, Slavic) and the unrelated Turkish
- Share many grammatical (and lexical) features not seen elsewhere

Introduction: A brief history
- Real linguistic evolution is driven by a combination of these processes
- Analogous to genetic evolution: inheritance versus lateral transfer (in viruses)
- Inheritance is dominant in sparsely populated regions; lateral transfer becomes important when there is much contact between unrelated languages
- (Strong influence of technology: writing, printing, internet...)

Introduction: Challenges facing evolutionary linguists
- A (nearly) total absence of historical data!
- Analysis must depend on observation of modern (i.e. written) languages, plus (more recently) modelling

Introduction: Methodology - Swadesh lists
- Very common for analyses to be based on lexical data: Swadesh lists
- List of 100 common words thought to be particularly resistant to replacement by loanwords
- Example (words for "sea"): Italian (mare), Gaelic (muir), Russian (more), German (Meer, See), English (sea), Norwegian (sjø), Dutch (zee), Greek (thalassa)

Introduction: Quantitative linguistics
- Swadesh lists allow for construction of a “genome” for languages

  Language    M*r   S*
  Dutch        0     1
  English      0     1
  Gaelic       1     0
  German       1     1
  Italian      1     0
  Norwegian    0     1
  Russian      1     0
  Greek        0     0

- This is then used with similar machinery as used to compare amino acid or DNA sequences

Motivation
- What if non-lexical data are used?

Data: Data set - the early Germanic languages
[Figure labels: Vikings (C8th), Goths (C2nd - ), Angles (C5th), Saxons (C5th), Jutes (C5th), Franks (C4th), “Germans” (BC)]

Data: Data set - the early Germanic languages
- Old Norse
- Old English dialects: Anglian, West Saxon, Kentish
- Old Frisian
- Old Saxon
- Old High German
- Gothic

Data: Data set - the early Germanic languages
Proto-Germanic
A classic data set...
[Figure: the languages of the data set (Gothic, Old Norse, Old High German, Old Saxon, Old Frisian, and the Old English dialects Anglian, Kentish, West Saxon) shown beside Schleicher’s classification, whose labels are Gothic, German, Norse, Low German, High German, German “in a broader sense”, Saxon, Frisian, Old Saxon, English, Dutch]

Data: Data set - source
Old English and the Continental Germanic Languages: A Survey of Morphological and Phonological Interrelations
Hans Frede Nielsen

Data: Data set - source
Sample entry:
“The [Indo-European genitive singular] ō-stem ending -ãs is reflected in Gothic gibōs, ON skarar, OS geƀa and OHG geba, but not in OE giefe and OFris. ieve, where the original suffix has been analogically replaced by the [dative singular] ending ([reflecting Indo-European] -ãi)...”

Data: Data set - interpretation as binary genome
The sample entry above becomes two binary characters:

  Language      Reflects IE Gen   Reflects IE Dat
  OE Anglian          0                 1
  OE Kentish          0                 1
  OE W Saxon          0                 1
  O Frisian           0                 1
  O Saxon             1                 0
  O H German          1                 0
  O Norse             1                 0
  Gothic              1                 0

- Missing data marked with ?
- Omitted data (duplicate entries, “insignificant/late”, too subtle) marked with - and disregarded
- Results in a ‘genome’ of 531 characters for each language
- Can be filtered into sub-genomes for different linguistic categories (nouns, verbs, numerals..., vowels, consonants) and (in principle) weighted

Data: Statistics - traits per language
- A very basic indication of the completeness of the data
- Gothic under-represented (due to a lack of texts in Gothic)
- Old English dialects over-represented (due to subject of book)
[Figure: histogram "Traits per language"; x-axis from 160 to 300, y-axis "Frequency" from 0 to 3]

Data: Statistics - languages per trait
- Since book focuses on relationships between languages it does not discuss traits seen in only one language
- Traits seen in all, or none, of the species are uninformative
- Flat distribution → timescale of evolution is long
[Figure: histogram "Languages per trait"; x-axis from 0 to 8, y-axis "Frequency" from 0 to 90; shown with the "Traits per language" histogram]

Data: Statistics - distance matrix
[Figure: distance matrix with rows and columns labelled Anglian, Kentish, West Saxon, Old Frisian, Old Saxon, Old High German, Old Norse, Gothic]
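As a rough illustration of counting differences between genomes, the sketch below computes pairwise distances for four toy genomes. Only the first two characters follow the sample entry (the "Reflects IE Gen" and "Reflects IE Dat" columns); the remaining characters and all function names are invented here. The slides do not say how characters marked ? or - enter the count, so this sketch assumes such positions are simply skipped.

```python
from itertools import combinations

# Toy genomes. Characters 1-2 follow the sample entry (Gen, Dat);
# characters 3-5 are invented for illustration only.
# '?' = missing data, '-' = omitted character.
GENOMES = {
    "OEAn": "0110-",
    "OFri": "011?-",
    "OHGe": "1001-",
    "Goth": "10?1-",
}


def distance(a: str, b: str) -> int:
    """Count characters that are known (0 or 1) in both genomes and differ.

    Assumption: positions marked '?' or '-' in either genome are skipped.
    """
    return sum(
        1
        for x, y in zip(a, b)
        if x in "01" and y in "01" and x != y
    )


def distance_matrix(genomes: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the symmetric matrix of pairwise distances, keyed by name pairs."""
    matrix = {(name, name): 0 for name in genomes}
    for p, q in combinations(genomes, 2):
        d = distance(genomes[p], genomes[q])
        matrix[p, q] = matrix[q, p] = d
    return matrix


if __name__ == "__main__":
    names = list(GENOMES)
    matrix = distance_matrix(GENOMES)
    # Print the matrix as a small table, one row per language.
    print("      " + " ".join(f"{n:>5}" for n in names))
    for p in names:
        print(f"{p:>5} " + " ".join(f"{matrix[p, q]:>5}" for q in names))
```

With these toy genomes, OEAn and OFri are at distance 0, while OEAn and OHGe are at distance 4. The real genomes have 531 characters per language, so the actual distances are much larger.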
- Form distance matrix by counting differences in genome
- Some relationships immediately apparent

Data: Minimal spanning tree
- A very crude quantification of distances between languages
- Construct a full graph with edge weights defined as distance
- Delete edges with large weight to give minimal spanning tree
[Figure: minimal spanning tree over ONor, OEAn, OSax, Goth, OEWS, OEKt, OFri, OHGe; edge weights shown: 234, 223, 42, 126, 18, 70, 113]

Methods: Maximum parsimony
- Minimises number of changes over tree to obtain observed genomes
- Implemented using the Fitch algorithm
- Repeated for each character in genome, then for each possible tree topology
[Figure: worked example on a small tree with binary leaf states (0/1) and candidate state sets at the internal nodes]

Methods: Maximum parsimony
- Unless ancestral state is a leaf, the tree is unrooted
- Gothic chosen as outgroup due to distance from other languages
[Figure: a tree on leaves a, b, c drawn with the root x placed on different branches]

Methods: Maximum parsimony
- Gives a sensible tree topology, but unrooted tree → cannot resolve EG/WG/NG split!
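A minimal sketch of the Fitch counting step, under stated assumptions: trees are written as nested pairs of leaf names, the four toy genomes reuse the two characters of the sample entry plus two invented ones, and a character marked ? or - is treated as compatible with either state. None of the names below come from the original analysis.

```python
# Toy genomes: characters 1-2 follow the sample entry (Gen, Dat);
# characters 3-4 are invented for illustration. '?' = missing data.
GENOMES = {
    "OEAn": "0110",
    "OFri": "011?",
    "OHGe": "1001",
    "Goth": "10?1",
}

# The three unrooted topologies on four leaves, each written with an
# arbitrary root as a pair of pairs (hypothetical encoding). The Fitch
# change count does not depend on where the root is placed.
TOPOLOGIES = [
    (("OEAn", "OFri"), ("OHGe", "Goth")),
    (("OEAn", "OHGe"), ("OFri", "Goth")),
    (("OEAn", "Goth"), ("OFri", "OHGe")),
]


def fitch_site(tree, states):
    """Fitch bottom-up pass for one character.

    Returns (set of candidate states at this node, changes in the subtree).
    """
    if isinstance(tree, str):  # leaf
        s = states[tree]
        # Assumption: anything other than 0/1 is unknown and fits either state.
        return ({s} if s in ("0", "1") else {"0", "1"}), 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:  # children share a state: no extra change needed here
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, genomes):
    """Sum the Fitch change count over every character in the genome."""
    length = len(next(iter(genomes.values())))
    return sum(
        fitch_site(tree, {name: g[i] for name, g in genomes.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    for tree in TOPOLOGIES:
        print(parsimony_score(tree, GENOMES), tree)
    best = min(TOPOLOGIES, key=lambda t: parsimony_score(t, GENOMES))
    print("most parsimonious:", best)
```

For the toy data the grouping ((OEAn, OFri), (OHGe, Goth)) needs 4 changes and each of the other two topologies needs 6. Exhaustive scoring is feasible here because four leaves admit only three unrooted binary topologies; eight leaves admit 10,395.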
[Figure: maximum-parsimony tree with leaves ONor, OEAn, OSax, OEKt, Goth, OFri, OEWS, OHGe; values 99 and 94 shown on the tree, with question marks on the figure]
- Gives only information on topology, not chronology

Methods: Markov chain Monte Carlo - Dollo model
Evolution modelled as a collection of Poisson processes:
- Trait born with rate λ (●)
- Trait dies with rate μ (✖)
- Lineage splits with rate θ (★)
- Catastrophe occurs with rate ρ (▲): each trait dies with probability κ, and Poisson(κλ/μ) new traits are born
- A catastrophe is equivalent to an edge lengthening
[Figure: example tree annotated with birth (●), death (✖), split (★) and catastrophe (▲) events]

Methods: Markov chain Monte Carlo - Implementation
- Implemented using the TraitLab package*
[Figure credit, truncated in the source: "… d et al."]
- MCMC scheme example moves:
  - Vary model parameters
  - Vary locations of catastrophes
  - Change tree topology
*Geoff Nicholls, Oxford

Methods: Markov chain Monte Carlo - Implementation
- 1,000,000 steps performed
- First 100,000 discarded (equilibration)
- Remaining sampled every 100 steps
- Samples averaged to give a consensus tree

Methods: Consensus tree
Given a set of N trees, a consensus tree representing an ‘average’ topology is constructed:
[Figure: tree drawn from the root node, with the most common split labelled by its frequency (x%)]

Results: Results
- Obtain same tree (topologically) as from parsimony
- Chronological resolution groups NG with WG
- Very good consensus between samples
[Figure: consensus tree with leaves ONor, OEAn, OSax, Goth, OEKt, OFri, OEWS, OHGe; values 99 and 94 shown on the tree]

Results: Results
[Figure: two trees, "Morphology" and "Phonology - vowels", each with leaves ONor, OEAn, OSax, Goth, OEKt, OFri, OEWS, OHGe; values shown across the two panels: 71, 75, 81, 80, 98, 74, 97]

Results: Results
We obtain the following phylogeny...
[Figure: phylogeny rooted at Proto-Germanic, with branches labelled EG, NG and WG and leaves Gothic, Old Norse, Old High German, Old Saxon, Old Frisian and Old English (Anglian, West Saxon, Kentish)]

Results: Conclusions
- Compares (mostly) favourably to Schleicher’s classification
- as well as quantitative (lexical) analyses by others
[Figure: the phylogeny obtained here beside Schleicher’s classification (labels: Gothic, German, Norse, Low German, High German, German “in a broader sense”, Saxon, Frisian, Old Saxon, English, Dutch); a later build of the slide adds "??" to the comparison]

Results: Conclusions
- Phonetic, particularly vocalic, data emphasise later contact...
- ...criterion to determine breakdown of phylogenic model?
[Figure: the phylogeny obtained here, with the groupings North Sea Germanic and Continental Germanic marked]
Thanks
- Keith Briggs (UWE & BT Research)
- Dario Spanò (Warwick)
- Geoff Nicholls (Oxford)