PHYLOGENY OF THE EARLY GERMANIC LANGUAGES Dario Papavassiliou

PHYLOGENY OF THE
EARLY GERMANIC LANGUAGES
Dario Papavassiliou & Keith M. Briggs
University of Warwick
Dario Papavassiliou
Phylogeny of the early Germanic languages
Wednesday, 3 September 2014
UWE & BT Research
Contents
Introduction
Why is evolutionary linguistics interesting?
Quantitative linguistics
The Germanic languages
Data
Methods
Maximum parsimony
MCMC
Results & conclusions
Introduction
Evolutionary linguistics
Charles Darwin offered languages as an illustrative example of evolution
Languages show analogies to genetic features: mutation and inheritance
The history of languages has a close correspondence to the history of humanity
The origin of language
the origin of modern humanity?
Introduction
A brief history
Schleicher’s tree model
[Figure: Schleicher’s tree diagram, annotated “Language stock”, “Language families” and “Ursprache (Ancestral language)”.]
Figure 1. A. Schleicher, Die Deutsche Sprache (Stuttgart, 1869), 2nd edn (1st edn, 1860), p. 28. Courtesy of the University of Chicago Library. Source page header: Liba Taub, 178.
Introduction
A brief history
Schleicher’s tree model of Indo-European
[Figure: Schleicher’s tree rooted at “indogerm. Ursprache” (Proto-Indo-European), with branches labelled German, Lithuanian, Slavic, Celtic, Italic, Albanian, Greek, Iranian and Indian. Source page header: “Evolutionary ideas and 'empirical' methods”.]
Introduction
A brief history
Schmidt’s wave model
Introduction
A brief history
Schmidt’s wave model - The Balkan Sprachbund
4 Indo-European language families (Greek, Romance, Albanian, Slavic) and the unrelated Turkish
Share many grammatical (and lexical) features not seen elsewhere
Introduction
A brief history
Real linguistic evolution is driven by a combination of these processes
Analogous to genetic evolution: inheritance versus lateral transfer (in viruses)
Inheritance is dominant in sparsely populated regions; lateral transfer becomes important when there is much contact between unrelated languages
(Strong influence of technology: writing, printing, internet...)
Introduction
Challenges facing evolutionary linguists
A (nearly) total absence of historical data!
Analysis must depend on observation of modern (i.e. written) languages, plus (more recently) modelling
Introduction
Methodology - Swadesh lists
Very common for analyses to be based on lexical data: Swadesh lists
List of 100 common words thought to be particularly resistant to replacement by loanwords
Example, words for “sea”:
Italian: mare
Gaelic: muir
Russian: more
German: Meer, See
English: sea
Norwegian: sjø
Dutch: zee
Greek: thalassa
Introduction
Quantitative linguistics
Swadesh lists allow for construction of a “genome” for languages
Language    M*r   S*
Dutch       0     1
English     0     1
Gaelic      1     0
German      1     1
Italian     1     0
Norwegian   0     1
Russian     1     0
Greek       0     0
This is then used with similar machinery as used to compare amino
acid or DNA sequences
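As a minimal sketch of how such a “genome” could be built, the snippet below turns the word classes for “sea” from this slide into one binary string per language. The dictionary layout and the function name `binary_genome` are illustrative assumptions, not part of the original analysis.

```python
# Word classes for "sea", taken from the slide; the data layout is an assumption.
cognate_classes = {
    "Dutch": {"S*"},
    "English": {"S*"},
    "Gaelic": {"M*r"},
    "German": {"M*r", "S*"},
    "Italian": {"M*r"},
    "Norwegian": {"S*"},
    "Russian": {"M*r"},
    "Greek": set(),
}
CHARACTERS = ["M*r", "S*"]  # one binary character per word class


def binary_genome(classes, characters):
    """Return one 0/1 string per language: 1 if the class is present, else 0."""
    return {
        lang: "".join("1" if c in present else "0" for c in characters)
        for lang, present in classes.items()
    }


genomes = binary_genome(cognate_classes, CHARACTERS)
for lang, genome in genomes.items():
    print(f"{lang:10s} {genome}")
```

A full Swadesh list would simply contribute more characters to each string.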
Motivation
What if non-lexical data are used?
Data
Data set - the early Germanic languages
Vikings (C8th)
Goths (C2nd - )
Angles (C5th)
Saxons (C5th)
Jutes (C5th)
Franks (C4th)
“Germans” (BC)
Data
Data set - the early Germanic languages
Old Norse
Old English dialects: Anglian, West Saxon, Kentish
Old Frisian
Old Saxon
Old High German
Gothic
Data
Data set - the early Germanic languages
A classic data set...
Proto-Germanic and its descendants in the data set: Old English dialects (Anglian, Kentish, West Saxon), Old Frisian, Old Norse, Gothic, Old High German, Old Saxon
[Figure: Schleicher’s classification, with node labels Gothic, German, Norse, Low German, High German, “in a broader sense”, Saxon, Frisian, Old Saxon, English, Dutch, Low German.]
Data
Data set - source
Old English and the Continental Germanic Languages: A Survey of Morphological and Phonological Interrelations
Hans Frede Nielsen
Data
Data set - source
Sample entry:
“The [Indo-European genitive singular] ō-stem ending -ãs is reflected in Gothic gibōs, ON skarar, OS geƀa and OHG geba, but not in OE giefe and OFris. ieve, where the original suffix has been analogically replaced by the [dative singular] ending ([reflecting Indo-European] -ãi)...”
Data
Data set - interpretation as binary genome
Sample entry (as above), coded by which Indo-European ending each language reflects:
Language      Reflects IE Gen   Reflects IE Dat
OE Anglian    0                 1
OE Kentish    0                 1
OE W Saxon    0                 1
O Frisian     0                 1
O Saxon       1                 0
O H German    1                 0
O Norse       1                 0
Gothic        1                 0
Data
Data set - interpretation as binary genome
Missing data marked with ?
Omitted data (duplicate entries, “insignificant/late”, too subtle) marked with - and disregarded
Results in a ‘genome’ of 531 characters for each language
Can be filtered into sub-genomes for different linguistic categories (nouns, verbs, numerals..., vowels, consonants) and (in principle) weighted
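The filtering step could look roughly like the sketch below. It assumes each character (column) carries one category label and that an omitted entry is marked - in every language for that column; the toy genomes, labels and the function name `select_columns` are hypothetical and are not taken from the data set.

```python
# Hypothetical toy data: five characters, one category label per column.
categories = ["noun", "noun", "verb", "vowel", "verb"]
toy_genomes = {
    "Gothic":    "1?0-1",   # '?' = missing, '-' = omitted
    "Old Norse": "100-1",
}


def select_columns(genomes, categories, wanted):
    """Keep columns of the wanted category, skipping any column marked '-'."""
    keep = [
        i for i, cat in enumerate(categories)
        if cat == wanted and all(g[i] != "-" for g in genomes.values())
    ]
    return {lang: "".join(g[i] for i in keep) for lang, g in genomes.items()}


nouns = select_columns(toy_genomes, categories, "noun")
verbs = select_columns(toy_genomes, categories, "verb")
vowels = select_columns(toy_genomes, categories, "vowel")
print(nouns, verbs, vowels)
```

Missing entries (?) are kept in the sub-genome so that later steps can decide how to treat them.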
Data
Statistics - traits per language
[Figure: histogram “Traits per language”; x-axis ticks from 160 to 300, y-axis “Frequency” from 0 to 3.]
A very basic indication of the completeness of the data
Gothic under-represented (due to a lack of texts in Gothic)
Old English dialects over-represented (due to subject of book)
Data
Statistics - languages per trait
[Figure: histogram “Languages per trait”; x-axis ticks from 0 to 8, y-axis “Frequency” from 0 to 90. The “Traits per language” histogram is shown alongside.]
Since the book focuses on relationships between languages, it does not discuss traits seen in only one language
Traits seen in all, or none, of the species are uninformative
Flat distribution → timescale of evolution is long
Data
Statistics - distance matrix
[Figure: distance matrix with rows and columns Anglian, Kentish, West Saxon, Old Frisian, Old Saxon, Old High German, Old Norse, Gothic.]
Form distance matrix by counting differences in genome
Some relationships immediately apparent
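Counting differences between two genomes amounts to a Hamming distance. The sketch below assumes that positions where either language is missing (?) or omitted (-) are simply skipped; the slides do not say how such positions were treated, so that rule and the toy genomes are assumptions.

```python
from itertools import combinations

# Hypothetical toy genomes, not taken from the data set.
toy = {
    "A": "0110?",
    "B": "01011",
    "C": "1001-",
}


def distance(a, b):
    """Count positions where both states are known (0/1) and differ."""
    return sum(1 for x, y in zip(a, b) if x in "01" and y in "01" and x != y)


def distance_matrix(genomes):
    """Return {(lang1, lang2): distance} for every unordered pair."""
    return {
        (p, q): distance(genomes[p], genomes[q])
        for p, q in combinations(genomes, 2)
    }


dist = distance_matrix(toy)
print(dist)
```

Because skipped positions shorten the comparison, languages with many missing entries will tend to show smaller raw distances under this rule.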
Data
Minimal spanning tree
A very crude quantification of distances between languages
Construct a full graph with edge weights defined as distance
Delete edges with large weight to give minimal spanning tree
[Figure: minimal spanning tree on ONor, OEAn, OSax, Goth, OEWS, OEKt, OFri, OHGe; edge weights shown: 234, 223, 42, 126, 18, 70, 113.]
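A minimal spanning tree can be computed in several ways. The sketch below uses Kruskal's algorithm, which adds the lightest edges that do not close a cycle, rather than the edge-deletion description on the slide; with distinct weights both give the same tree. The four-node weights are hypothetical and are not the distances from the figure.

```python
# Hypothetical pairwise distances between four languages.
nodes = ["A", "B", "C", "D"]
weights = {
    ("A", "B"): 1, ("A", "C"): 4, ("A", "D"): 3,
    ("B", "C"): 2, ("B", "D"): 5, ("C", "D"): 6,
}


def minimal_spanning_tree(nodes, weights):
    """Kruskal's algorithm with a simple union-find structure."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for (a, b), w in sorted(weights.items(), key=lambda item: item[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, w))
    return tree


mst = minimal_spanning_tree(nodes, weights)
print(mst)
```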
Methods
Maximum parsimony
[Figure: example tree with leaf states 0 and 1 and candidate state sets at the internal nodes.]
Minimises number of changes over tree to obtain observed genomes
Implemented using the Fitch algorithm
Repeated for each character in genome, then for each possible tree topology
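The Fitch step for one character can be sketched as follows: moving from the leaves to the root, each internal node takes the intersection of its children's state sets if that is non-empty, otherwise their union at a cost of one change. The nested-tuple tree encoding, the toy genomes and the function names are illustrative assumptions, and missing data (?) are not handled here.

```python
# Hypothetical toy genomes with two characters each.
toy = {"A": "01", "B": "01", "C": "10", "D": "11"}


def fitch(tree, states):
    """Return (state set at this node, number of changes in its subtree)."""
    if isinstance(tree, str):                 # leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:                                # children can agree: no change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, genomes):
    """Sum the Fitch cost over every character in the genome."""
    length = len(next(iter(genomes.values())))
    return sum(
        fitch(tree, {lang: g[i] for lang, g in genomes.items()})[1]
        for i in range(length)
    )


score_1 = parsimony_score((("A", "B"), ("C", "D")), toy)
score_2 = parsimony_score((("A", "C"), ("B", "D")), toy)
print(score_1, score_2)
```

The topology with the lower score is preferred; an exhaustive search repeats this for every candidate topology.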
Methods
Maximum parsimony
Unless ancestral state is a leaf, the tree is unrooted
[Figure: trees on the leaves a, b, c with candidate root positions marked x.]
Gothic chosen as outgroup due to distance from other languages
Methods
Maximum parsimony
Gives a sensible tree topology, but unrooted tree → cannot resolve EG/WG/NG split!
Gives only information on topology, not chronology
[Figure: parsimony tree on ONor, OEAn, OSax, OEKt, Goth, OFri, OEWS, OHGe; node labels 99 and 94; positions marked ? and ??.]
Methods
Markov chain Monte Carlo - Dollo model
Evolution modelled as a collection of Poisson processes:
● Trait born with rate λ
✖ Trait dies with rate μ
★ Lineage splits with rate θ
[Figure: example tree annotated with ●, ✖, ★ and ▲ events.]
▲ Catastrophe occurs with rate ρ: each trait dies with probability κ, and Poisson(κλ/μ) new traits are born
Equivalent to an edge lengthening
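The edge-lengthening equivalence can be checked numerically. Along an edge of length t, a trait present at the start survives with probability exp(-μt), and the number of traits born on the edge and still alive at its end is Poisson with mean (λ/μ)(1 - exp(-μt)). Reading P(κ) as a death probability κ, a catastrophe then matches an extra edge length of -ln(1 - κ)/μ. The parameter values below are hypothetical.

```python
import math


def death_probability(mu, t):
    """Probability that a trait present at the start of an edge dies within t."""
    return 1.0 - math.exp(-mu * t)


def mean_new_traits(lam, mu, t):
    """Mean number of traits born on an edge of length t and alive at its end."""
    return (lam / mu) * (1.0 - math.exp(-mu * t))


def equivalent_edge_length(kappa, mu):
    """Extra edge length with the same effect as a catastrophe of severity kappa."""
    return -math.log(1.0 - kappa) / mu


lam, mu, kappa = 0.2, 0.05, 0.3          # hypothetical rates and severity
t_cat = equivalent_edge_length(kappa, mu)
print(t_cat, death_probability(mu, t_cat), mean_new_traits(lam, mu, t_cat))
```

With these values the death probability equals κ and the mean number of new traits equals κλ/μ, as the slide states.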
Methods
Markov chain Monte Carlo - Implementation
Implemented using the TraitLab package*
MCMC scheme example moves
Vary model parameters
Vary locations of catastrophes
Change tree topology
[Figure: illustrations of the example moves; credit truncated in source as “d et al.”]
*Geoff Nicholls, Oxford
Methods
Markov chain Monte Carlo - Implementation
1,000,000 steps performed
First 100,000 discarded (equilibration)
Remaining sampled every 100 steps
Samples averaged to give a consensus tree
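The sampling schedule above implies about 9,000 retained samples. The sketch below spells out that arithmetic; whether the first retained step is exactly step 100,000 is a convention assumed here, not something the slide specifies.

```python
TOTAL_STEPS = 1_000_000   # steps performed
BURN_IN = 100_000         # discarded for equilibration
THIN = 100                # keep one sample every 100 steps


def kept_steps(total, burn_in, thin):
    """Indices of the MCMC steps that are retained as samples."""
    return range(burn_in, total, thin)


kept = kept_steps(TOTAL_STEPS, BURN_IN, THIN)
print(len(kept), kept[0], kept[-1])
```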
Methods
Consensus tree
Given a set of N trees, a consensus tree representing an
‘average’ topology is constructed:
[Figure: sampled trees combined into a consensus tree, with labels “Root node”, “Most common split” and “x%”.]
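One common way to build such an ‘average’ topology is to count how often each clade (the set of leaves below a split) occurs among the sampled trees and report that frequency as a percentage. The sketch below does this for rooted trees written as nested tuples; the encoding, the four toy trees and the 50% majority threshold are assumptions, not details taken from the slides.

```python
from collections import Counter

# Hypothetical sample of four rooted trees on the leaves A, B, C, D.
samples = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "C"), "D"),
    ((("A", "C"), "B"), "D"),
    (("A", "B"), ("C", "D")),
]


def clades(tree):
    """Return (leaves of this subtree, list of clades found inside it)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def clade_support(trees):
    """Percentage of trees containing each clade (the full leaf set is skipped)."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clades(tree)
        counts.update(c for c in set(found) if c != all_leaves)
    return {c: 100.0 * n / len(trees) for c, n in counts.items()}


support = clade_support(samples)
majority = {c for c, pct in support.items() if pct > 50.0}
print(sorted((sorted(c), pct) for c, pct in support.items()))
```

Clades above the threshold are mutually compatible, so they can be assembled into a single tree whose nodes are labelled with their support.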
Results
Results
Obtain same tree (topologically) as from parsimony
Chronological resolution groups NG with WG
Very good consensus between samples
[Figure: consensus tree on ONor, OEAn, OSax, Goth, OEKt, OFri, OEWS, OHGe; split support values 99 and 94.]
Results
Results
[Figure: two consensus trees on ONor, OEAn, OSax, Goth, OEKt, OFri, OEWS, OHGe, with panel titles “Morphology” and “Phonology - vowels”. Split support values shown: 71, 75, 81, 80, 98, 74, 97.]
Results
Results
We obtain the following phylogeny...
[Figure: tree rooted at Proto-Germanic with branch labels EG, NG and WG and leaves Gothic, Old Norse, Old High German, Old Saxon, Old Frisian and Old English (Anglian, West Saxon, Kentish).]
Results
Conclusions
Compares (mostly) favourably to Schleicher’s classification as well as quantitative (lexical) analyses by others
[Figure: the phylogeny obtained here (Proto-Germanic; EG, NG, WG; Gothic, Old Norse, Old High German, Old Saxon, Old Frisian, Old English with Anglian, Kentish and West Saxon) beside Schleicher’s classification (Proto-Germanic; Gothic, German, Norse, Low German, High German, “in a broader sense”, Saxon, Frisian, Old Saxon, English, Dutch, Low German). The final build adds a “??” annotation.]
Results
Conclusions
Phonetic, particularly vocalic, data emphasise later contact...
[Figure: the phylogeny obtained here (Proto-Germanic; EG, NG, WG; Gothic, Old Norse, Old High German, Old Saxon, Old Frisian, Old English with Anglian, West Saxon and Kentish), with additional groupings labelled North Sea Germanic and Continental Germanic.]
...criterion to determine breakdown of phylogenic model?
Thanks
Keith Briggs (UWE & BT Research)
Dario Spanò (Warwick)
Geoff Nicholls (Oxford)