First names in the Netherlands: from preferences of parents to socio-geographic representations
Gerrit Bloothooft
Institute of Linguistics OTS, Utrecht University
Linguistics Groningen - 2005

Dutch studies on first names
Limited scientific work
– Dictionary (20,000 entries)
– Few socio-linguistic studies
  • Limited scope, small samples
The topic is extremely popular in the media

First names data
Hard to get from civil registration
– Privacy issues
New horizons because of the digitization of the population administration (and archives)
– But storage is distributed

A full population study
First names from the National Social Security Bank (SVB)
All children born since 1983:
– first name (official, not the call name, but ...)
– year of birth
– family code (separate table): unique!
– postal code

A very rich source
4.2 million children (1983-2002)
– about 200,000 per year
1.9 million families
176,800 different first names
– 108,500 unique names
– 3,120 names with a frequency > 100 represent 85% of the children

Data reduction needed
Far too many names to describe one by one
Group names with common properties:
– not from an etymological point of view
– not from a linguistic point of view
– based on the choices of parents: name use!

Naming and subcultures
Hypothesis: there are subcultures with their own naming preferences.
These subcultures may relate to
– culture/language (Frisian, Arabic, Turkish, Surinamese, Antillean, ...)
– religion (Catholic, Protestant, Islam, ...)
– sociological status (education, income, ...)
– geography (urban, rural, regional, ...)
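The figures on the "A very rich source" slide are plain frequency counts over one first name per child. A minimal Python sketch, assuming the data are available as a list of first names and reading "unique names" as names given to exactly one child (an assumption, not stated on the slides); the function name `name_statistics` and the toy list are hypothetical:

```python
from collections import Counter


def name_statistics(first_names, min_freq=100):
    """Frequency statistics for a list with one first name per child."""
    counts = Counter(first_names)
    n_children = len(first_names)
    # Names occurring more than min_freq times (the slide uses > 100).
    frequent = [name for name, c in counts.items() if c > min_freq]
    covered = sum(counts[name] for name in frequent)
    return {
        "children": n_children,
        "different_names": len(counts),
        # Assumption: a "unique" name is one given to exactly one child.
        "unique_names": sum(1 for c in counts.values() if c == 1),
        "frequent_names": len(frequent),
        "share_covered": covered / n_children if n_children else 0.0,
    }


# Toy example (hypothetical data, not SVB records).
names = ["Mark", "Mark", "Mark", "Linda", "Linda", "Jelle"]
print(name_statistics(names, min_freq=1))
```

With `min_freq=100` on the full data, `frequent_names` and `share_covered` would correspond to the slide's "3,120 names with a frequency > 100" and "85% of the children".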
Naming and subcultures
Research aims:
– Identification of subcultures (and their naming preferences) on the basis of the first names of children per family
– Study of the relation between these subcultures (first names) and sociocultural and geographic factors

Once again
Analysis (grouping) of first names is based on the choices of the parents, i.e. name use, NOT on any other scientific assumption

Contents
Method
Sets of first names
A map of name sets
Geographic distribution of name sets
Regional name profiles
Socio-cultural factors of name sets
Conclusions

Method (a chain of names)
Parents choose first names (with higher probability) from a set that is popular in their subculture (relatives, friends, neighbors, ...)
This is informative only if there is more than one child (more than one name) in a family
Pairs of first names from one family are the unit of analysis

Method (a chain of names)
Family: Mark, Peter, Linda
If Mark is popular in a subculture, then Peter and Linda may be popular there as well
Name pairs: Mark - Peter, Peter - Mark, Mark - Linda, Linda - Mark, Peter - Linda, Linda - Peter

Method (a chain of names)
Select all families with two or more children (1.17 million families, 2.81 million children)
Derive all pairs of first names within a single family (2.12 million different pairs in all)
Compute the frequency of each pair
The higher the frequency of a pair, the more likely it is that the two first names belong to the same set

Most frequent name pairs
Frequency: 1091, 790, 754, 727, …, 572, 459
Names in the pairs: Johannes, Johannes, Maria, Johanna, …, Jeroen, Johanna, Martijn, Maria, Mohamed, Lars, Fatima, Niels

Clustering of first names
Define a measure that reflects the relationship between two names
Combine names that have a strong mutual relationship into a set
– Johannes, Maria, Johanna, …

Name relationship measure
Esther
– 7,967 girls
– 12,973 brothers and sisters
– 276 times sister Judith (= 2.1%)
Judith
– 4,828 girls
– 8,033 brothers and sisters
– 276 times sister Esther (= 3.4%)
Geometric average: √(2.1% × 3.4%) ≈ 2.7%
– A symmetric measure of the relationship between the two names

Alternative measure
In terms of probabilities: Prob(name_pair) / independentProb(name_pair)
– independentProb = the probability of the pair if there is no specific preference
Series of problems
– High-frequency name pairs should get a stronger weight (estimates for low-frequency pairs are inaccurate)

Clustering of first names
Name pairs from a (subculture-related) set have the highest relationship measure:
Esther: Judith 2.7, Mirjam 2.4, Ruben 1.2, David 1.1
Judith: Esther 2.7, Mirjam 1.6, Ruben 1.0, Miriam 0.8

Clustering
Start with strongly related name pairs
Add each new name pair to an existing cluster or start a new cluster
Iterative procedure

Clustering results
4,013 first names
– Pair frequency > 4
Result: 340 name sets
– A limited number of large sets
– A high number of small sets
The top 25 sets are the most illustrative
– 2,887 first names
– 2.64 million children (75%)

Features of name sets
Period of maximum popularity
– Traditional, Pre-modern (1950-1980), Modern
Language
– Dutch, Frisian, English, American, French, Spanish, Italian, [Arabic, Turkish]
– Common Western
Topic area
– Nature, History & Culture, Old Testament
Length
– Short (one syllable), long

A map of name sets
Presentation of a map of name sets
– Based on mutual relations between name sets
The closer two name sets are on the map, the more related the sets

[Figure: map of name sets. Sets shown: Spanish & Italian; Long American & English; Short American & English; Pre-modern English & French; Long names from the Old Testament; Names from nature; Long names from history and culture; Short modern Common Western; Pre-modern Common Western; Long French; Scandinavian; Pre-modern Dutch; Short modern Dutch; Traditional Dutch (Latin | Dutch); Short traditional Dutch; Frisian]

Dimensions
[Figure: the same map with its dimensions labeled: Foreign, Long, Common Western, Short, Modern, Pre-modern, Traditional, Dutch/Frisian]

[Figure: the same map with an example name per set: Spanish & Italian (Ricardo), Long American & English (Michael), Short American & English (Kim), Pre-modern English & French (Dennis), Names from the Old Testament (Daniël), Names from nature (Iris), Names from history and culture (Laurens), Short modern Common Western (Tim), Pre-modern Common Western (Mark), French (Charlotte), Scandinavian (Niels), Pre-modern Dutch (Jeroen), Traditional Dutch (Johannes | Jan), Short modern Dutch (Bart), Short traditional Dutch (Teun), Frisian (Jelle)]

Geographical distribution
Postal code area level (3,584 areas)
– Big differences between pc areas
  • city neighborhoods
  • villages (religion)
– Enough children for characterization
  • ~1,200 births per pc area in 20 years
  • Some further name grouping needed

Further grouping (% of children)
Traditional names (Latin form): 8%
Traditional names (Dutch): 5%
Frisian names: 3%
Pre-modern names (Dutch, Western): 12%
Foreign names (English): 24%
Short modern names (Dutch, Western, Scandinavian): 13%
Names from the Old Testament, history, culture, nature: 7%
Arabic & Turkish names [unrelated group]: 5%
Other [low-frequency]: 23%

[Figure: the map of name sets with the further grouping overlaid: Foreign; Old Testament, History & Culture; Pre-modern; Traditional (Latin, Dutch); Short; Frisian]

Traditional (Dutch)
Aaltje, Barend, Dirkje, Evert, Geertje, Harm, Jantje, Klaas, Margje, Teunis

Traditional (Latin form)
Adriana, Bernardus, Christina, Eduard, Elisabeth, Franciscus, Geertruida, Hubertus, Johanna, Krijn, Maria

Frisian names
Aafke, Bauke, Douwe, Froukje, Joppe, Jitske, Jelle, Menno, Sietske, Onno, Wietske, Wiebe

Pre-modern names (Dutch, Western)
Anniek, Anita, Carla, Frank, Jochem, Jeroen, Linda, Mark, Marloes, Paul, Suzanne

Foreign names (English)
Amanda, Dennis, Danny, Chantal, Henry, Isabella, Kim, Kevin, Melissa, Ricardo, Samantha, Stephen

Short names (modern, Dutch, Western, Scandinavian)
Anne, Bart, Eva, Gijs, Lisa, Kaj, Niels, Sanne, Sofie, Tim

Religion
[Figure: short names compared with religion; religion legend from None to Catholic]

Names from the Old Testament, history, culture, nature
Daniël, Esther, Judith, Naomi, Willemijn, Diederik, Frederieke, Maurits, Iris, Fleur, Jasmijn

[Figure: income and religion; legend from Lowest to Highest]

Arabic and Turkish names
Fatima, Mohamed, Noura, Hamza, Sara, Yassin, Fatma, Mustafa, Hatice, Mehmet

Further geographical analysis
Per pc area: percentage of children per name group (8 values)
These percentages reflect the social composition of the pc area
Factor analysis on data from 3,584 pc areas
10 typical profiles

10 profiles
Traditional (Latin form)
Traditional (Dutch)
Transitional, traditional Dutch to pre-modern
Transitional, traditional Latin form to foreign
Pre-modern
Foreign
Short
Elite
Arabic-Turkish
Frisian

Example profile: Traditional (Latin form)
Traditional (Latin form): 37%
Traditional (Dutch): 18%
Frisian names: 1%
Pre-modern names: 8%
Foreign names: 12%
Short names: 6%
Names from the Old Testament, history, culture, nature: 6%
Arabic and Turkish names: 0%
Other: 12%

Naming map of the Netherlands
[Map: name profile per postal code area; legend: Frisian, pre-modern, elite, foreign, Arabic-Turkish, traditional Dutch, traditional Latin, short]

[Figure: EU constitution votes and education level]

Education level
[Figure: map of education level; legend from Lowest to Highest]

Naming map of Groningen province
[Map: legend: traditional Dutch > pre-modern, pre-modern, foreign, elite]

Naming map of Groningen city (typical city pattern)
[Map: % of households with income in the highest 20% class]

Typical Groningen names
Oldambt: Boelo, Doeko, Adzo, Elzo, Popko, Rienko, Wubbo | Grieto, Trienko
Frisian: Alke, Bouktje, Rikste, Eisse, Wiert
Peat colonies: Hinderika, Harmannes, Geessien, Hillechien
Regional names are becoming rare

Conclusions
Successful representation of

Further studies
Changes in naming
– Missing data 1940-1982: towards full population data
– Current study 1983-2002: towards a 5-year period analysis
  • Who starts name renewal, and how does it spread?
Names
– Call names & official names (using consumer questionnaires)
– Spelling choices
Social factors in naming
– Role of naming after relatives (in first, second, third name)
– Gender dependencies
– Income, education, religion
Mathematics of naming (chaos theory)
Name pronunciation (for speech synthesis)

Contact
E-mail: Gerrit.Bloothooft@let.uu.nl
Homepage: www.let.uu.nl/~Gerrit.Bloothooft/personal
Mail: Trans 10, 3512 JK Utrecht, The Netherlands
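The method slides (a chain of names, the name relationship measure, and the clustering procedure) can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the author's implementation. Families are given as lists of first names. The relationship measure is the geometric mean of the two conditional sibling shares, as in the Esther/Judith example. Clustering is a simple greedy pass over pairs sorted by that measure. The `threshold` value, and the rule of skipping a pair whose names are both already in a cluster, are assumptions. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def name_pairs(family):
    """All ordered pairs of first names within one family."""
    # Mark, Peter, Linda -> the six ordered pairs listed on the slide.
    return list(permutations(family, 2))


def pair_counts(families):
    """Frequency of each ordered name pair over families with 2+ children."""
    counts = Counter()
    for family in families:
        if len(family) >= 2:
            counts.update(name_pairs(family))
    return counts


def sibling_totals(counts):
    """Number of brothers and sisters per name (all pairs starting with it)."""
    totals = Counter()
    for (a, _b), n in counts.items():
        totals[a] += n
    return totals


def relationship(n_ab, siblings_a, siblings_b):
    """Geometric mean (in %) of the two conditional sibling shares."""
    share_a = n_ab / siblings_a  # e.g. 276 / 12973: Esther's siblings named Judith
    share_b = n_ab / siblings_b  # e.g. 276 / 8033: Judith's siblings named Esther
    return 100 * math.sqrt(share_a * share_b)


def greedy_clusters(counts, threshold, min_pair_count=4):
    """Greedy clustering: strongest pairs first, join or start a cluster.

    Only pairs occurring more than `min_pair_count` times are used
    (the slides report a pair frequency > 4). `threshold` is a
    hypothetical cut-off on the relationship measure.
    """
    totals = sibling_totals(counts)
    scored = [
        (relationship(n, totals[a], totals[b]), a, b)
        for (a, b), n in counts.items()
        if a < b and n > min_pair_count  # each unordered pair once
    ]
    scored.sort(reverse=True)

    cluster_of = {}  # name -> index into clusters
    clusters = []    # list of sets of names
    for score, a, b in scored:
        if score < threshold:
            break
        ca, cb = cluster_of.get(a), cluster_of.get(b)
        if ca is None and cb is None:   # start a new cluster
            cluster_of[a] = cluster_of[b] = len(clusters)
            clusters.append({a, b})
        elif ca is None:                # add a to b's cluster
            cluster_of[a] = cb
            clusters[cb].add(a)
        elif cb is None:                # add b to a's cluster
            cluster_of[b] = ca
            clusters[ca].add(b)
        # Both names already clustered: skip (an assumption; merging
        # clusters is another option the slides do not rule out).
    return clusters


# Toy families (hypothetical data).
families = [
    ["Mark", "Peter", "Linda"],
    ["Mark", "Linda"],
    ["Jelle", "Sietske"],
    ["Jelle", "Sietske", "Wiebe"],
]
clusters = greedy_clusters(pair_counts(families), threshold=30, min_pair_count=0)
print(clusters)
print(round(relationship(276, 12973, 8033), 1))  # → 2.7
```

On these toy families the sketch yields two sets, {Mark, Peter, Linda} and {Jelle, Sietske, Wiebe}, and it reproduces the 2.7% measure for Esther and Judith from the slide's counts. Because each family contributes both orders of every pair, the counts are symmetric, so scoring each unordered pair once loses nothing.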