Read-me file

Data associated with the publication entitled “Influence of ethnolinguistic diversity on the sorghum genetic patterns in subsistence farming systems in eastern Kenya”, by Vanesse Labeyrie, Monique Deu, Adeline Barnaud, Caroline Calatayud, Marylène Buiron, Peterson Wambugu, Stéphanie Manel, Jean-Christophe Glaszmann and Christian Leclerc.

Accepted for publication in PLOS ONE.

Corresponding author: Vanesse Labeyrie, vanesse.labeyrie@gmail.com

DATA FILES DESCRIPTION

Genotypic data (SSRs)
Allele sizes are given without the M13 tail. Genotyping was performed on a 24-capillary 3500xL system (Applied Biosystems), and genotypes were scored with GeneMapper v4.1 (Applied Biosystems). Missing data are coded “999”.

Morphological data
Morphological characteristics of the panicles collected from the genotyped plants. Eight morphological descriptors were selected from the IPGRI descriptors and complemented by seven additional descriptors of seed and glume characteristics that showed variability in the sorghum collected in our study area. Missing data are coded “NA”.

Field name        Morphological descriptor   No. of modalities  Modalities
Panicle_Shpe      Panicle shape              8                  Broom; Very loose; Loose; Semiloose; Semi-compact long; Semicompact; Compact elliptic; Very compact
Grain_shattering  Grain shattering           2                  Mid; High
Seed_color        Seed color                 5                  White; Cream; Grey; Brown; Red
Seed_shpe_side    Seed shape (side view)     2                  Asymmetric; Non-asymmetric
pericarp_thick    Pericarp thickness         2                  Thin; Thick
Subcoat           Subcoat                    2                  Present; Absent
Endo_texture      Endosperm texture          2                  Mainly vitreous; Mainly floury
Glume_adherence   Glume adherence            3                  High; Mid; Low
Glume_opening     Glume opening              4                  Half-open; Highly open; Mid; Tight
Glume_covering    Glume covering             2                  Full; Mid
Awn               Awn                        2                  Present; Absent
Glume_wrinkle     Glume transversal wrinkle  2                  Present; Absent
Glume_text        Glume texture              2                  Hard; Papery
Glume_color       Glume color                3                  Black; Red; Tan
Glume_hair        Glume hairiness            2                  Mid; High

Descriptive data (description)
Other data associated with the sorghum individuals that were genotyped and phenotyped.

Field name          Description
A_ACNUM             Accession number
date_collection     Month of on-farm collection
vty_collected       Variety name (as reported by farmers)
houehold_ethny      Ethnolinguistic group of the head of the farm household
Cycle_length        Growth cycle length of the variety, according to the farmer in charge of seed selection on the farm visited (S: short cycle; R: long cycle)
Origin              Improvement status of the variety (I: improved, from the formal breeding system; L: local landrace)
Structure_K4_q_0.8  Assignment according to the STRUCTURE software for K = 4. Individuals whose estimated proportion of genome originating from one population (q, hereafter admixture coefficient) was equal to or above 0.8 were assigned to that population (hereafter cluster). Individuals whose q was below the 0.8 threshold for every population were considered to result from admixture between populations (unassigned).
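Because the genotypic and morphological files use different missing-data codes (“999” for SSRs, “NA” for morphology), a script reading them should replace each code before any numeric conversion; otherwise 999 could be mistaken for a real allele size. The Python sketch below uses only the standard library and assumes comma-separated files with a header row. The `read_table` helper, the accession values and the locus column names (`locus1_a`, `locus1_b`) are hypothetical and not taken from the published files.

```python
import csv
import io

# Missing-data codes stated in this read-me.
SSR_MISSING = "999"    # genotypic (SSR) data
MORPHO_MISSING = "NA"  # morphological data


def read_table(handle, missing_code, int_fields=()):
    """Read a comma-separated table with a header row into a list of dicts.

    Cells equal to `missing_code` become None; cells of columns listed in
    `int_fields` are converted to int only AFTER the missing-code check,
    so the "999" sentinel is never treated as an allele size.
    Assumption: comma delimiter; adjust if the real files use tabs.
    """
    rows = []
    for record in csv.DictReader(handle):
        row = {}
        for field, value in record.items():
            value = value.strip()
            if value == missing_code:
                row[field] = None
            elif field in int_fields:
                row[field] = int(value)
            else:
                row[field] = value
        rows.append(row)
    return rows


# Hypothetical in-memory examples; real file names and locus columns
# are not given in this read-me.
ssr = read_table(
    io.StringIO("A_ACNUM,locus1_a,locus1_b\nACC001,152,158\nACC002,999,999\n"),
    SSR_MISSING,
    int_fields={"locus1_a", "locus1_b"},
)
morpho = read_table(
    io.StringIO("A_ACNUM,Seed_color,Awn\nACC001,Red,NA\n"),
    MORPHO_MISSING,
)
print(ssr[1])     # {'A_ACNUM': 'ACC002', 'locus1_a': None, 'locus1_b': None}
print(morpho[0])  # {'A_ACNUM': 'ACC001', 'Seed_color': 'Red', 'Awn': None}
```

With pandas, a similar result should be obtainable with `pandas.read_csv(..., na_values=["999"])` for the SSR file; pandas already treats the string “NA” as missing by default.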
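The assignment rule behind the `Structure_K4_q_0.8` field can be written as a small function. This is a sketch assuming the four admixture coefficients of an individual are available as a list (for instance from the STRUCTURE output, which is not described in this read-me); the `assign_cluster` name and the numbering of clusters 1 to 4 are hypothetical and may not match the labels stored in the data file.

```python
THRESHOLD = 0.8  # admixture-coefficient threshold stated in this read-me


def assign_cluster(q_values, threshold=THRESHOLD):
    """Apply the q >= 0.8 assignment rule to one individual.

    `q_values` holds the K = 4 admixture coefficients estimated by STRUCTURE.
    Returns the 1-based index of the cluster reaching the threshold, or
    "unassigned" when no cluster does (admixed individual).
    """
    best = max(range(len(q_values)), key=lambda k: q_values[k])
    if q_values[best] >= threshold:
        return best + 1  # hypothetical cluster numbering 1..K
    return "unassigned"


print(assign_cluster([0.85, 0.05, 0.05, 0.05]))  # 1
print(assign_cluster([0.50, 0.30, 0.10, 0.10]))  # unassigned
print(assign_cluster([0.00, 0.00, 0.80, 0.20]))  # 3 (q equal to 0.8 is assigned)
```

Because the q values of an individual sum to 1 and the threshold is above 0.5, at most one cluster can reach 0.8, so comparing only the largest q with the threshold is equivalent to testing every cluster.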
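The number of modalities listed in the morphological descriptor table allows a simple consistency check on the morphological file: a field with more distinct non-missing values than its stated modality count probably contains a typo or inconsistent coding. The sketch below makes no assumption about how modalities are encoded (labels or numeric codes), only that each row is a dictionary keyed by the field names given in this read-me; the `check_modalities` helper and the example rows are hypothetical.

```python
# Number of modalities per morphological field, transcribed from the
# descriptor table of this read-me.
N_MODALITIES = {
    "Panicle_Shpe": 8, "Grain_shattering": 2, "Seed_color": 5,
    "Seed_shpe_side": 2, "pericarp_thick": 2, "Subcoat": 2,
    "Endo_texture": 2, "Glume_adherence": 3, "Glume_opening": 4,
    "Glume_covering": 2, "Awn": 2, "Glume_wrinkle": 2,
    "Glume_text": 2, "Glume_color": 3, "Glume_hair": 2,
}


def check_modalities(rows, missing_code="NA"):
    """Return {field: sorted distinct values} for fields exceeding their count.

    Missing cells (None or `missing_code`) are ignored; fields absent from
    the rows are skipped.
    """
    problems = {}
    for field, expected in N_MODALITIES.items():
        observed = {
            row[field]
            for row in rows
            if field in row and row[field] not in (None, missing_code)
        }
        if len(observed) > expected:
            problems[field] = sorted(observed)
    return problems


# Hypothetical rows: "present" differs in case from "Present".
rows = [
    {"Awn": "Present", "Seed_color": "Red"},
    {"Awn": "Absent", "Seed_color": "NA"},
    {"Awn": "present", "Seed_color": "White"},
]
print(check_modalities(rows))  # {'Awn': ['Absent', 'Present', 'present']}
```

The check only flags an excess of distinct values; a field using fewer modalities than listed (because some were not observed in the sample) is not reported.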