Five waves of Indo-European expansion from the South Asian Urheimat: An OIT Model
(some preliminary linguistic considerations and relative chronology)
by Igor A. Tonoyan-Belyayev
July 2018, Saint-Petersburg, Russia

§1. The present paper is an attempt to build a linguistically based model both for the South Asian origin of all Indo-Europeans and for the gradual expansion of the Indo-European languages outside South Asia during the Late Neolithic, Bronze, and Iron Ages. The core set of non-linguistic evidence in favour of the OIT (Out-of-India) or OSAT (Out-of-South-Asia) theory itself is discussed elsewhere. At the outset, only one point should be noted. As of 2018, the first documentary evidence of Indo-Europeans comes from Eastern Anatolia, c.2000-1800 BC (Hittite words in Assyrian texts); next come documents in Anatolian languages from Asia Minor, from c.1600 BC on; finally, we have both Mycenaean Greek documents from Greece and Crete from c.1500-1400 BC and the remnants of Aryan-like names, titles, and special terms in Hurrian and Hittite texts from c.1500-1400 BC. All this is well known and well established. It shows that no simple construction can be accepted as fact if it rests on the assumption that the presence of a "Caucasian" or "Europoid" race in Neolithic Europe "means" that Indo-European languages, tacitly equated with a definite racial type (we know which one), were also present there. Nothing but wishful thinking speaks in favour of any early Indo-European presence in either Western or Eastern Europe (this includes the ideas of the so-called Kurgan hypothesis, which cannot be regarded as based on facts, since in the absence of written documents there is no linguistic evidence for it).
On the other hand, there are two still-controversial pieces of evidence. The first is the idea of the earliest Indo-European presence in, or contact with, Southern Mesopotamia (as Whittaker, I myself, and others have tried to show), possibly via maritime trade with the Mature Harappan civilization (illustrated by some words of possible Indo-European origin in Sumerian). The second is the fact that the Mature Harappan script is still undeciphered. Since this civilization was among the greatest of the Bronze Age, and taking into account the facts of archaeology (no traces of an Aryan invasion during or after the Mature Harappan epoch, 2600-1900 BC), the character of the Old Indo-Aryan literary tradition (an abnormally large corpus containing texts from different stages or sub-stages of Old Indic), and so on, this script may turn out to be either the earliest Indo-European script (a thousand years older than those of the Hittites and the Mycenaeans) or the script of a separate archaic branch of Indo-European, that is, Proto-Indic. This idea has so far been rejected, tacitly or mockingly, by most Indo-European scholars, but there is nothing unverifiable (or unfalsifiable) about it, so it is in principle a scientific problem that still requires the elaboration of means for its solution, rather than being hushed up.

§2. The system of consonants as a starting point.

Lexical isoglosses and lexicostatistics are very useful in themselves, but when applied to short lists (like that of Swadesh) they yield almost nothing, especially without a precise and deep analysis of the diachronic shift of meanings, of sets of synonyms, and of sets of derivatives from the same root. Grammatical indicators and syntactic structures give more. But it is phonetics and phonology that reveal both the internal divergence of a group of related dialects and systematic substratum/adstratum influences. In any case, all of these should be subjected to complex analysis, but we still need something coherent yet relatively simple to begin with.
All Indo-European dialects can be roughly divided into four groups with respect to their consonant systems:
1) a system with only one series of plosives (no voiced/unvoiced and no aspirate/non-aspirate pairs), represented by Anatolian and Tocharian (with some further traces of an earlier weak~strong opposition);
2) a system with two series of plosives, voiced and unvoiced, represented by Baltic, Slavic, Nuristani, many late Eastern Iranian dialects, and for the most part by Celtic (though with some traces of a third series; cf. Italic in the next group below);
3) a system with three primordial series of plosives, forming two subgroups: [3A] the Greek type (with unvoiced aspirates in place of voiced ones) and [3B] the Germano-Armenian type (with a shift, and with simple voiced plosives in place of voiced aspirates). Italic, with its remnants of the third series, occupies a somewhat intermediate position, its "h" and "f" being of the Greek-like type, while its "d" and "b" are of the Germano-Armenian type;
4) a system with a secondarily developed "fourth", unvoiced aspirate series, again forming two subgroups: [4A] the Indic type (with four series of plosives, two non-aspirate and two aspirate) and [4B] the Dardo-Southern/Western-Iranian type (with new unvoiced aspirates present, but the older aspirates merged with simple voiced plosives).
It is difficult so far to classify the minor Indo-European branches with respect to this scheme, but the set of main branches is enough to begin a complex analysis. Even on a superficial view, and following the general rule of archaic periphery and innovative center, this list shows that the center (and hence the point of departure) of the Indo-European expansion lay in the territory of the later dialects of the fourth group. Of course, a superficial observation cannot be accepted as a strong argument on its own; it should be studied further.
What is "surprisingly" obvious here is that the old centum~satǝm division is less important than a systematic analysis of the whole plosive system: centum and satǝm forms may both be present in the same language, whereas two different plosive systems cannot coexist in one and the same language. Moreover, the satǝm~centum division, as well as the problem of labiovelars vs. palatals, is much more complicated and needs more case studies; it does not permit the sweeping generalizations that are often attempted. Returning to the rule of center and periphery, we can also notice that Anatolian and Tocharian, already accepted on other grounds as the first to split off, are shown here, too, as the first group to separate, and thus as preserving the archaic model of the plosive system (two series, weaks and strongs), though it was gradually dying out. Going further, we see that Nuristani, Balto-Slavic, and most of Celtic and Eastern Iranian, though superficially showing a different system, in fact represent the next stage of evolution, in which the archaic binary opposition was mostly reinterpreted rather than fully lost or expanded. Of course, this process was complex, and the dialects of the remaining two groups significantly influenced it, but these groups show long stability in preserving a binary, not a ternary, opposition. This makes us suppose that areally they represent the periphery of the second wave. It also speaks in favour of the idea that original Proto-Indo-European had no aspirate series, whether voiced or unvoiced, all of this being the product of later regional evolution; the voiced aspirates of Indic and the unvoiced aspirates of Greek (and the mixture in Italic) were thus most likely originally a strong series, reinterpreted first as a voiced one and only thereafter as an aspirated one.
Furthermore, Anatolian, Tocharian, Baltic, Slavic, Nuristani and the like together show that the distinction between voiced and unvoiced non-aspirates may well have been just a positional development of former allophones (in two frequent positions, namely before a pause and after s-mobile, this distinction was originally fully neutralized). Before we continue, let us look briefly at an example of a different kind, namely a lexical isogloss.

§3. Indo-European words for "fire".

There are a number of interesting isoglosses, but for now we are going to consider only one of them. Most Indo-European dialects cluster around two main words for fire, namely the *pur/paḫḫur and *agni/ogni stems. In the archaic periphery already established, the *pur/paḫḫur stem prevails, namely in Hittite. It follows that all the branches which split off later might have borrowed this word on reaching the new periphery, if it coincided with the area where the Anatolians (and Proto-Tocharians) had previously lived. As there are only two directions of the Out-of-India expansion, namely the Western Route and the Northern Route, with two end points for the most archaic IE languages, it was Eastern Anatolia and, roughly, Eastern Kazakhstan (east of the Aral Sea) that acted as "attractors" for the later dialects that borrowed this stem. Indeed, we have at least Greek pyr, Armenian hur, etc. in the west, and Germanic Feuer/fire, etc. in the north (note that Balto-Slavic does not have this word). Unless this stem was borrowed from some non-IE language by the Proto-Hittito-Tocharians at the moment of their departure from South Asia, it should have an Indo-European root. Indeed, this root is preserved in Aryan pū- "to purify", in Latin purus "pure", and so on. Thus this is simply a ritual epithet of fire, which was later substantivized.
Moreover, Indic already has other epithets of fire from the same root, such as pāvaka, thus representing (together with the early center of innovation) the richest original pool of elements, which is independent evidence in favour of South Asia being the first common Urheimat of all the Indo-Europeans. Had the exact stem been preserved in Indic, it would have had a form like *puvar. But Indic uses agni instead, which is not an epithet, as it cannot be analysed within Indo-European, though it may be distantly related to the Semitic root and word meaning "blazing" (cf. Arabic ajj-). Now, within the second old periphery the *agni/ogni stem is used as well, at least in Italic (sic!), Baltic, Slavic, etc. So, since the same stem is used both in the innovative center and in a considerable part of the periphery, it is likely to be the original common IE word for "fire", stable enough to be preserved without substitution in Indic as well. But, "surprisingly", this *agni stem is unknown to the youngest Indo-European branch (in terms of its departure from the South Asian Urheimat), that is, to Iranian. Iranian instead has a special stem, *athar/atar, shared by Indic and Iranian, though in Indic it was used only as a poetic name for the tongues of flame, cf. Vedic atharī. Only in Iranian does the stem *athar/atar function as the standard word for "fire", which sharply separates Indic from Iranian. Such cases are in fact numerous (cf. Avestan staman- vs. Greek stoma, Avestan z[u]rvan- vs.
Greek khronos, etc.). They cannot be consistently explained by the Aryan Invasion hypothesis and the idea of an earlier "Indo-Iranian stage", whereas they are well explained by the more plausible view that Iranian is a simplified Indic spread over the "substratum" territory of earlier waves of Indo-European in Iran and the Steppe, that is, over Balto-Slavic (with which it shares the sounds z and ž, never found in Indic) and over Greek, Armenian, and Phrygian. This also explains why Greek, such a classical centum language, has so much in common with Aryan (with which it is sometimes irrationally united into a separate "Graeco-Armeno-Aryan" group). In the western direction, it was Proto-Greek that received an "Iranian" (i.e. Aryan, of pre-Mitanni times) adstratum, even preserving some of its special archaic features (e.g. the adverbial function of -bhi, which was still a suffix and not an inflection). This adstratum was too weak to turn Proto-Greek into a satǝm language, but strong enough to turn Armenian into one and to heavily influence Luwian, in contrast to Hittite. It is through this process that Greek and Armenian acquired some adstratum traces of the Aryan secondary unvoiced aspirates, as in Armenian sxal ~ Indic skhalati, Greek konkhos ~ Indic śaṅkha, etc.; the same holds for extremely rare Slavic examples, such as Russian soxa vs. Indic śākhā. This is of particular interest because Balto-Slavic, like Iranian, has no traces of the much older and more widespread "voiced aspirates" as aspirates, this absence being, conversely, acquired by Iranian from the "post-Balto-Slavic tail substratum" in Western Central Asia and Northern Greater Iran. All this happened along the Western Route of Indo-European expansion.
But on the Northern Route, that is, in Central Asia and Kazakhstan, Baltic and Slavic were mostly deeply influenced by this satemization and, partially, by the RUKI rule; thus both processes can now be approximately dated to ca.1900-1400 BC. This means, in turn, that the Aryan loanwords in Uralic languages are also likely to have been borrowed during the same period, while before this time neither the proto-Balts nor the proto-Slavs had yet crossed the Urals westwards, and they were still "Asian branches of Indo-European". This places their original speakers in the area of the Southern Urals, Western Andronovo/Arkaim, and the like. So the Yamnaya culture could, of course, have had some early Indo-European-speaking admixture, but Indo-European dialects were hardly prevalent there, as its territory is too far from the Urheimat to provide feedback (not only did Iranian Aryan influence Balto-Slavic, but the ancestor of Balto-Slavic was also a substratum for Steppe Iranian, as opposed to both Indic and Western Iranian). Let us now return from this short excursus to the problems of systematic phonology.

§4. The myth of a homogeneous group of satǝm dialects.

"Satǝm Indo-European dialects" is too wide a grouping to represent some kind of parallel evolution. Its existence is much better explained by a model of radial spreading. This time, however, South Asia, formerly the center of innovation, is already becoming a new periphery. Let us look at some details. As is known from comparative studies, the original form of the "first palatal" series was plosive. Moreover, it was probably not even of the affricate type, but a pure plosive series; whether it was palatal or velar is of secondary importance here. From this point of view, all the so-called satǝm languages can be divided into the following groups:
1) the most archaic group, in which these consonants are mostly preserved as plosives. There is only one such group, namely Nuristani.
2) the older group, in which at least some of them are not represented by sibilants or spirants (affricates are allowed here), i.e. Indic (ś, h, but j, a plosive turned affricate, together with plosive ch for *sk', where Iranian has simple s, and archaic pausal sandhi forms such as dik, dṛk, and spṛk from stems ending in -ś), and Armenian (s, -z-, but ts (c) and dz (j) for *g' and *g'h); a form transitional to the next group is represented by Old Persian, which has a special spirant θ and an apparently plosive d, interpreted by some scholars as a kind of δ.
3) the younger group, which has its own "archaic periphery" (Baltic, with its š and ž) and "innovative center" (Avestan, Eastern and Steppe Iranian, and Slavic, with plain s and z); this center is the youngest and thus the actual center of "satemizing radiation". As this process is "clamped" between Armenia, Western South Asia, and some place north of Iran where the Balts originally lived (their later movement from east to west through modern Russia is attested in Central Russian hydronyms), it should have begun within the triangle between Armenia, the Southern Urals, and Afghanistan. Since Avestan is the oldest attested language within this group, with its own geography mentioning both Eastern Iran and North-Western Hindustan (under the name of Hapta Hendu, presumably the name used by the people of the Mature Harappan civilization for their own state or group of states and cultural space), this initial center of satemization can plausibly be placed somewhere in Northern Afghanistan / Southern Uzbekistan. A reasonably suitable event to explain the beginning of this trend is the collapse of the BMAC civilization ca.1700 BC, with a partial intrusion into the Indus valley. Thus the process of satemization most likely began after the collapse of the Mature Harappan civilization, after ca.1900-1700 BC, which means that before that time most Indo-European dialects were still centum-like (i.e.
with plosives for this consonantal series). This in turn means that before that time the ancestors of the Balts and Slavs were still mostly not in Europe (taking the Urals as the Europe-Asia border); and in earlier periods, such as the Early Harappan Era (3300-2600 BC), there were no Balto-Slavic ancestors in Europe at all, only the Western Route already touching Europe from the south-east, i.e. the Southern Balkans, Greece, and the Aegean-Ionic islands. To illustrate how much Indic and Iranian differ in their "satemness", here are some examples:
Vedic ajina- ~ Avestan izaēna-;
Vedic pracch- ~ Avestan pras-, Slavic pros-;
Vedic kṣam- ~ Avestan zam-, M.Persian zamin, Russian zemlia;
Vedic viś-, N.sg. viṭ (plosive sandhi) ~ Russian viesi;
Vedic aja- ~ Lithuanian ožys;
Vedic jar- ~ Russian zrie-ti, and so on.
The "labiovelars", which form the "second palatal" series in most satǝm languages (Lithuanian and Armenian again appearing relatively more archaic here), are also of special interest. Here is a short table from Szemerényi:

IE    Armenian   Lithuanian  Indic    Avestan       O.Slavonic
*kʷ   kh, ĉh     k           k, c     k, č          k, č, c
*gʷ   k          g           g, j     g, j, -ž-     g, ž, (d)z
*gʷh  g, ĵ       g           gh, h    g, j, -ž-     g, ž, (d)z

The special relation between Iranian (especially Avestan and Steppe Iranian) and Slavic is again too striking to be mere coincidence. As for Armenian and Indic, on the one hand they look somewhat similar to Iranian and Slavic (in having palatal affricates); on the other hand they preserve a much more archaic common feature, shared with the centum languages, namely three series of plosives. The two facts together appear contradictory only to those who abstractly postulate an Urheimat far from South Asia; under the OIT they receive a quite satisfactory explanation, simply as a shift of the center of innovation from South Asia to Iran and Central Asia.
Now let us return to the problem of plosives in general (the idea that the "labiovelars" were originally simple uvulars, with no etymologically separate labial element, is discussed elsewhere).

§5. Relative chronology of major Indo-European branching.

Let us first consider a preliminary table summarizing the above data (the table is not conclusive, as it rests mainly on the structure of the plosive system and still awaits refinement with the help of other data). It may further serve as a road map for detailed studies of particular inter-branch relations.

Wave  Western Route                       Zone contiguous with the original    Northern Route
                                          Urheimat, but forming a special kind
                                          of "sanctuary"
1     Proto-Anatolian                     IE Stem*                             Proto-Tocharian
2     Proto-Italo-Celtic, etc.            IE Stem*                             Proto-Balto-(Slavic?)
3     (Proto-Albanian?)                   Proto-Nuristani                      (Proto-Slavic?)
4     Proto-Graeco-Armenian               Proto-Indo-Aryan (+ IE Stem*)        Proto-Germanic
      (including Thracian and Phrygian)
5     Western Iranian                     Proto-Dardic (+ Classic Indo-Aryan)  Eastern Iranian

Now, before and during the first wave, the plosive system consisted only of pairs of weaks vs. strongs, i.e. k~kk, q~qq, t~tt, p~pp, indifferent to voicing and aspiration. In the periphery the uvulars mostly became labiovelars, thus acquiring an additional quasi-articulation, while this did not happen in the Stem zone. By the end of this period, voicing turned out to be the first allophonic feature to undergo some kind of phonemicization. During the second wave, the former opposition took the form k/g~kk/gg, k/kw/g/gw~kk/kkw/gg/ggw, t/d~tt/dd, p/b~pp/bb (b being a special case, subject to confusion with v and, more rarely, m); this picture can still be partly seen in the graphic form of Hittite. It was during this period that aspiration began to become a perceptible allophonic feature, while the first cases of the loss of s-mobile began to occur, producing additional types of alternation.
Namely, simple stops not preceded by s became voiced, while those preceded by it became unvoiced (this is only a statistical rule, with numerous exceptions). It was during this period that, along with the development of voicing, the future Indo-European "root structure" appeared, anticipating such later cases as Greek pythmēn and Indic budh-, and further yielding the so-called "aspiration shifts", which were inherited from an earlier simplification of the first of two strongs in consecutive syllables when the second did not stand before a pause or s (cf. Greek thriks, trikhos, Indic -dhuk[s], -duh-, etc.). During this period every plosive already had its own voicing feature, the process of selection (on the basis of the former weaks vs. strongs) having finished, while the emergence of aspiration within the remaining Stem* continued. From this point on, the reinterpretation of velars and uvulars began within the Stem area, eventually leading to the development of different palatalization and, much later, spirantization effects (the later-spreading dialects which "covered" the zones of the Proto-Tocharian and Proto-Anatolian waves developed labiovelar-like phonemes as well, while those never reaching their borders did not, developing only "split palatals" instead). For now the third wave may largely be treated as a later sub-wave of the previous one, as finer details are not discussed here. By the end of its period, aspiration in the Stem zone became phonological, yielding three classes of stops; and with the spread of the next wave, the previously peripheral dialects acquired some very rare cases of a specific "ternary plosive adstratum", though without any secondary development of a third series, but with some oscillation of the already selected voicing and with some secondary spirantization, leading to the development of previously non-existent phonemes (cf.
Italic f, h, Slavic x, etc.; later also z). The fourth wave shows some interesting features. Whereas its far-western edge yields unvoiced aspirates in place of the former strongs, both western Armenian and northern Germanic undergo a special kind of "consonant shift", in which mostly only the former strongs retained their voicing, while simple voiced stops lost their voicing and simple unvoiced stops acquired secondary unvoiced aspiration. All this leads to the supposition that in the previous period, when aspiration had only just appeared as an allophonic feature, it always yielded unvoiced aspiration; but in Germano-Armenian this developed in the former positionally free (non-clustered) original weaks, while in Graeco-Aryan it developed in the former original strongs. The voicelessness of the free aspirates in Greek and Phrygian might be explained by a special shift (archaic voiced strongs directly to unvoiced aspirates) or by a secondary loss of voicing (the latter is the mainstream explanation, although it leaves unexplained the fact that the simple "deaspirated" phonemes are voiceless, too). In addition, Indic shows some traces of an earlier voicelessness of its aspirates, in some rare oscillations. In any case, by the end of this fourth wave the Indic zone had gradually become a periphery, the center having moved to Iran and southern Central Asia (as briefly discussed above in the context of satemization or, to be more precise, "post-palatalization spirantization"). By the end of this period there were areas to the west (Graeco-Armeno-Thraco-Phrygian, partly overlying the former Italo-Celtic zone, whence the development of an adstratum-like third series in these groups), to the east (Indo-Aryan), and to the north and later north-west (Germanic) where fully developed three-series systems were present. In the area of the new center, i.e.
Greater Iran and most of southern Central Asia, there remained a zone that still had two series ("the second wave's legacy"), until the last exodus from the Indus valley took place after the end of the Mature Harappan period, when the newly developed fourth series spread over the new center zone, yielding secondary unvoiced aspirates and fricatives only in the Iranian and Dardic languages (the traces of this exodus lie on the north-western route, and this process was later partly redirected backwards, yielding the late loss of voiced aspiration in the New Indo-Aryan Punjabi language, which is at its core purely Indo-Aryan). The fifth wave may be called the "equestrian expanse", as it was during this Late-to-Post-Harappan period that the proto-Iranians covered an extremely large territory to the north and west within a relatively short span of historical time. It is since this time that the word with the IE stem *ek'wo came to mean only "horse", and no longer any transport, pack, or race animal (including bulls [the most archaic usage, in which *ek'wo was a masculine counterpart to *gwōu, "cow", only later assuming epicene-to-masculine gender locally], asses, and various types of "not so perfect" equids), as it had previously in various Indo-European dialects (and as the Rigveda, which predates this period, frequently shows in contexts where the word aśva is used). This automatically implies that all the previous waves were very slow, using mainly oxen and imperfect equids as pack animals.
This also implies that, until the Iranian expanse into the Steppe, all the other previously separated Northern Route dialect speakers lived much closer to South Asia. (This was not the case on the Western Route, as an intensive trade route existed between modern Pakistan and Western Anatolia from prehistoric times; I prefer to call this Western Route the Ancient [Indo-European spreading] Route of Silver and Lapis Lazuli, the "silver" component being explained by the rajata-isogloss along this path, but not in the north, cf. Indic rajata, Avestan ǝrǝzata, Armenian artsath, Latin argentum, Celtic argat, while Anatolian and Greek were less susceptible: Greek has only argyros for "silver" from the same root, and Hittite has ḫarki- merely for "white".)

§6. Preliminary considerations concerning the absolute chronology of the five waves.

Only the last wave can be dated with a certain degree of precision, i.e. between c.1900 and c.1200 BC. The dates for the previous waves are mostly speculative. Nevertheless, we would like to share some considerations. As no large expansion was likely to take place during the Mature Harappan period, it is possible that the population of the BMAC (c.2300-1700 BC) spoke some kind of Proto-Graeco-Armeno-Phrygian or, less probably, Proto-Germanic language or group of dialects. If so, the Out-of-India separation of the three groups of dialects should have taken place c.2800-2600 BC. This concerns the fourth wave. The second and third waves might have taken place somewhere between 3300 and 2800 BC, since placing them earlier runs into an insoluble problem, namely the too close relationship between the Indo-European dialects: had they split much earlier, they would not look, sound, and structurally resemble each other as closely as they actually do.
Here even rough glottochronology furnishes generally correct warnings: before the beginning of the Equestrian Era (see above) and, probably, the Iron Age, the path was too long and the number of non-Indo-European tribes to be met with too high, so the IE dialects would otherwise have diverged too much to resemble each other as closely as they do. Armenian and Albanian represent a special case of highly creolized languages, but this is not so with most other IE branches. By elimination, the first wave may provisionally be placed before c.3300 BC, but hardly earlier than c.3700 BC, so all these waves of exodus fit perfectly with the gradual and pulsating expansion of the ancient civilized areas between Mehrgarh, Bhirrana, and the Pamir-Hindukush. All this needs long and deep further investigation. I hope that such investigation will finally begin not only among archaeologists and individual linguists, but within most of the academic linguistic community, which is invited to leave aside, at last, fruitless speculations and dreams of an originally European Urheimat (while the Anatolian hypothesis is still partly valid, both as a secondary Urheimat for an important grouping of later Indo-European branches and as the possible ultimate Urheimat of the Pre-Proto-Indo-Europeans, when Proto-Indo-European was not yet a separate family and was much more closely related to the stems of other modern families such as Dravidian, Turkic, and, at a deeper level of linguistic prehistory, Burushaski, Uralic, Semitic, Tibeto-Burman, and North-Caucasian).