Supplementary material

Supplementary Tables
Supplementary Figures
Appendix S1 – Details on vegetation plots: the DIVGRASS project
Appendix S2 – Details on phylogeny reconstruction
Appendix S3 – Details on trait preparation
Appendix S4 – Distance Metrics based on Co-occurrence to study invasions
Appendix S5 – Influence of evolutionary history on invasion success
Appendix S6 – Cross Validation of model for Invasion Success

Supplementary Tables

Table S1. Introduction pathways of alien species based on the DAISIE hierarchical classification system. Reproduced from Lambdon et al. (2008). Pathways are listed by level (Level 1, then indented Level 2 and Level 3), each followed by its description.

Intentional: Introductions have been introduced deliberately by humans, for commercial or recreational reasons.
    Released: Species have been released deliberately into the wild (e.g., for the enrichment of the native flora, landscaping, etc.).
    Escaped: Species have escaped into the wild from cultivation.
        Forestry: Species are cultivated for timber on a large scale, or as part of re-/afforestation programmes.
        Amenity: Species are cultivated on a large to moderate scale in public places for landscaping purposes (e.g., for soil stabilization or aesthetic enhancement).
        Ornamental: Species are cultivated for ornament on a small scale (especially in private gardens).
        Agricultural: Species are cultivated on a field scale as commercial non-timber crops.
        Horticultural: Species are cultivated for edible or other useful products on a small scale (e.g., in private gardens).
Unintentional: Introductions have arrived as a result of human actions but have not been introduced deliberately.
    Unaided: Species have spread via natural (spontaneous) means from introduced populations elsewhere in the non-native range.
    Transported: Species have been introduced accidentally via shipping, air, road or rail freight, directly by humans or with domestic animals.
        Seed: Have been introduced as a contaminant of crop seed or propagules.
        Mineral: Have been introduced during the deliberate movement of soil or other minerals.
        Commodity: Contaminants have been introduced as contaminants of non-seed crop commodities (e.g., wool, organic refuse).
        Stowaway: Have been introduced accidentally but are not known to be associated with any particular commodity, e.g., on car tyres or in the hulls of ships.

Table S2. The alien species used in the study, with their family, biogeographic origin, invasion success index and Rabinowitz commonness class. Anecophytes are species that have been created from their wild ancestors by plant breeding and have subsequently become alien; they therefore have no native range in the strict sense.
Alien species | Family | Origin | Freq (Nr plots) | Local abund (% cover) | Env volume | Invasion success | Rabinowitz class
Achillea crithmifolia | Asteraceae | Eurasia | 1 | 0.5 | 0.0000 | -2.117 | H
Aegilops cylindrica | Poaceae | Eurasia, Africa | 1 | 15 | 0.0000 | -1.533 | G
Agave americana | Asparagaceae | N America | 1 | 0.5 | 0.0000 | -2.117 | H
Agrostemma githago | Caryophyllaceae | Eurasia, Africa | 6 | 2.92 | 0.0323 | 0.92 | C
Allium cepa | Amaryllidaceae | Asia, S America | 5 | 9.9 | 0.0002 | -1.017 | G
Allium porrum | Amaryllidaceae | Anecophyte | 6 | 0.92 | 0.0047 | -0.279 | H
Allium sativum | Amaryllidaceae | Asia | 3 | 1.33 | 0.1814 | 0.855 | D
Amaranthus deflexus | Amaranthaceae | S America | 29 | 7.71 | 0.0393 | 1.813 | A
Amaranthus hybridus | Amaranthaceae | Americas | 4 | 4.75 | 0.0245 | 0.452 | C
Amaranthus retroflexus | Amaranthaceae | N America | 24 | 1.94 | 0.0761 | 2.135 | B
Ambrosia artemisiifolia | Asteraceae | Americas | 36 | 3.5 | 0.0060 | 1.087 | E
Anethum graveolens | Apiaceae | Asia temp | 1 | 0.5 | 0.0000 | -2.117 | H
Artemisia annua | Asteraceae | Eurasia | 24 | 2.54 | 0.0016 | 0.276 | E
Arundo donax | Poaceae | Asia, S America | 23 | 9.04 | 0.0038 | 0.517 | E
Asclepias syriaca | Apocynaceae | N America | 3 | 2.17 | 0.0004 | -0.907 | H
Avena sativa | Poaceae | Eurasia | 180 | 7.27 | 0.0353 | 3.114 | A
Avena strigosa | Poaceae | Europe | 2 | 7.75 | 0.0197 | 0.889 | C
Barbarea intermedia | Brassicaceae | Eurasia | 25 | 1.5 | 0.1929 | 2.402 | B
Barbarea stricta | Brassicaceae | Eurasia | 1 | 0.5 | 0.0000 | -2.117 | H
Berteroa incana | Brassicaceae | N America | 124 | 4.7 | 0.0040 | 2.194 | E
Bidens frondosa | Asteraceae | N America | 5 | 7.3 | 0.0001 | 0.601 | G
Brassica napus | Brassicaceae | Anecophyte | 4 | 0.5 | 0.0035 | 0.251 | H
Brassica rapa | Brassicaceae | Anecophyte | 5 | 2 | 0.0034 | 0.451 | H
Bromus catharticus | Poaceae | S America | 9 | 11.94 | 0.2385 | 2.382 | C
Bromus inermis | Poaceae | Eurasia, Americas | 21 | 6.43 | 0.0316 | 1.457 | A
Bunias orientalis | Brassicaceae | Eurasia | 21 | 5.64 | 0.0214 | 0.995 | A
Calendula officinalis | Asteraceae | Europe | 3 | 0.5 | 0.0001 | -1.209 | H
Camelina sativa | Brassicaceae | Eurasia | 2 | 0.5 | 0.0001 | -1.495 | H
Claytonia perfoliata | Montiaceae | Americas | 5 | 3.9 | 0.0000 | -1.282 | G
Collomia grandiflora | Polemoniaceae | N America | 16 | 5.34 | 0.0022 | 0.993 | E
Conringia orientalis | Brassicaceae | Eurasia, Africa | 6 | 0.92 | 0.0519 | 0.546 | D
Cortaderia selloana | Poaceae | S America | 11 | 0.95 | 0.0050 | 1.333 | H
Cotula coronopifolia | Asteraceae | Africa | 124 | 22.46 | 0.0373 | 2.928 | A
Crepis bursifolia | Asteraceae | Americas | 34 | 2.24 | 0.0037 | 0.427 | F
Cuscuta campestris | Convolvulaceae | Americas | 3 | 1.33 | 0.3516 | 1.230 | D
Cuscuta suaveolens | Convolvulaceae | Americas | 2 | 0.5 | 0.0003 | -1.517 | H
Cymbalaria muralis | Plantaginaceae | Europe | 16 | 1.44 | 0.0827 | 2.105 | B
Cyperus eragrostis | Cyperaceae | Americas | 6 | 3.75 | 0.0143 | 0.962 | C
Cyperus reflexus | Cyperaceae | Americas | 1 | 62.5 | 0.0000 | -1.288 | G
Datura stramonium | Solanaceae | N America | 6 | 0.92 | 0.0089 | 0.874 | D
Dianthus caryophyllus | Caryophyllaceae | Europe | 2646 | 1.57 | 0.0773 | 3.600 | B
Dipsacus sativus | Caprifoliaceae | Anecophyte | 1 | 3 | 0.0000 | -1.809 | G
Duchesnea indica | Rosaceae | Asia | 1 | 3 | 0.0000 | -1.809 | G
Epilobium ciliatum | Onagraceae | Asia, Americas | 7 | 0.5 | 0.0050 | 0.234 | H
Eragrostis pectinacea | Poaceae | Americas | 2 | 0.5 | 0.0001 | -1.322 | H
Erigeron annuus | Asteraceae | N America | 249 | 3.16 | 0.0178 | 2.446 | A
Erigeron karvinskianus | Asteraceae | Americas | 1 | 3 | 0.0000 | -1.809 | G
Erysimum cheiri | Brassicaceae | Anecophyte | 43 | 0.91 | 0.1532 | 2.477 | B
Eschscholzia californica | Papaveraceae | N America | 8 | 7.25 | 0.0128 | 1.230 | C
Euphorbia lathyris | Euphorbiaceae | Eurasia | 2 | 1.75 | 0.0003 | -1.076 | H
Galinsoga parviflora | Asteraceae | Americas | 1 | 0.5 | 0.0000 | -2.117 | H
Galinsoga quadriradiata | Asteraceae | N America | 3 | 0.5 | 0.0575 | 0.318 | D
Glycyrrhiza glabra | Fabaceae | Eurasia, Africa | 1 | 0.5 | 0.0000 | -2.117 | H
Helianthus tuberosus | Asteraceae | N America | 3 | 0.5 | 0.0072 | 0.002 | H
Heliotropium curassavicum | Boraginaceae | Australasia, Americas | 2 | 0.5 | 0.0074 | -1.002 | H
Hemerocallis fulva | Xanthorrhoeaceae | Asia | 1 | 0.5 | 0.0000 | -2.117 | H
Hordeum bulbosum | Poaceae | Eurasia, Africa | 1 | 0.5 | 0.0000 | -2.117 | H
Hordeum distichon | Poaceae | Asia temp | 1 | 0.5 | 0.0000 | -2.117 | H
Hordeum vulgare | Poaceae | Eurasia, Africa | 7 | 6.5 | 0.0131 | 1.568 | C
Hypericum hircinum | Hypericaceae | Eurasia, Africa | 3 | 0.5 | 0.0465 | -1.139 | D
Impatiens glandulifera | Balsaminaceae | Asia trop | 2 | 50 | 0.0035 | 0.135 | G
Iris germanica | Iridaceae | Eurasia | 11 | 10.55 | 0.0015 | 1.202 | G
Juncus tenuis | Juncaceae | Americas | 62 | 8.13 | 0.0270 | 2.380 | A
Lathyrus odoratus | Fabaceae | Europe | 1 | 0.5 | 0.0000 | -2.117 | H
Lathyrus sativus | Fabaceae | Anecophyte | 4 | 0.5 | 0.0065 | -0.108 | H
Lens culinaris | Fabaceae | Eurasia | 1 | 0.5 | 0.0000 | -2.117 | H
Linum austriacum | Linaceae | Eurasia, Africa | 36 | 4.1 | 0.0148 | 1.601 | A
Linum usitatissimum | Linaceae | Anecophyte | 560 | 3.33 | 0.0552 | 3.501 | A
Lycopersicon esculentum | Solanaceae | Anecophyte | 6 | 0.5 | 0.0025 | -0.63 | H
Matricaria discoidea | Asteraceae | Asia temp | 105 | 4.69 | 0.0424 | 2.875 | A
Medicago intertexta | Fabaceae | Europe, Africa | 1 | 3 | 0.0000 | -1.809 | G
Medicago sativa | Fabaceae | Eurasia, Africa | 1293 | 3.14 | 0.0332 | 3.655 | A
Mentha spicata | Lamiaceae | Eurasia | 8 | 7.56 | 0.0081 | 0.941 | C
Mesembryanthemum crystallinum | Aizoaceae | Eurasia, Africa | 25 | 37.26 | 0.1571 | 0.361 | A
Nothoscordum borbonicum | Amaryllidaceae | S America | 1 | 3 | 0.0000 | -1.809 | G
Oenothera biennis | Onagraceae | N America | 75 | 2.53 | 0.0500 | 2.444 | A
Oenothera glazioviana | Onagraceae | Anecophyte | 8 | 1.13 | 0.0029 | 0.458 | H
Oenothera laciniata | Onagraceae | N America | 1 | 0.5 | 0.0000 | -2.117 | H
Oenothera parviflora | Onagraceae | N America | 1 | 0.5 | 0.0000 | -2.117 | H
Onobrychis viciifolia | Fabaceae | Europe | 2364 | 6.29 | 0.0821 | 3.720 | A
Opuntia ficus-indica | Cactaceae | S America | 4 | 0.5 | 0.0614 | 0.095 | D
Opuntia monacantha | Cactaceae | S America | 1 | 0.5 | 0.0000 | -2.117 | H
Opuntia stricta | Cactaceae | Americas | 1 | 0.5 | 0.0000 | -2.117 | H
Oxalis articulata | Oxalidaceae | S America | 7 | 2.93 | 0.0113 | 1.360 | C
Oxalis pes-caprae | Oxalidaceae | Africa | 7 | 23.5 | 0.0031 | 0.609 | G
Panicum capillare | Poaceae | Americas | 1 | 3 | 0.0000 | -1.809 | G
Panicum miliaceum | Poaceae | Asia | 1 | 0.5 | 0.0000 | -2.117 | H
Papaver somniferum | Papaveraceae | Europe, Africa | 9 | 1.06 | 0.0505 | 0.462 | D
Paspalum dilatatum | Poaceae | S America | 19 | 3.32 | 0.0254 | 1.340 | A
Paspalum distichum | Poaceae | Americas | 11 | 18.91 | 0.0454 | 1.772 | C
Pennisetum villosum | Poaceae | Africa, Asia | 8 | 1.44 | 0.0014 | -0.274 | H
Petroselinum crispum | Apiaceae | Europe | 2 | 0.5 | 0.0007 | -1.657 | H
Phalaris canariensis | Poaceae | Europe, Africa | 3 | 2.17 | 0.0497 | 0.349 | D
Phytolacca americana | Phytolaccaceae | N America | 7 | 1.57 | 0.0292 | 1.169 | D
Portulaca oleracea | Portulacaceae | Eurasia | 22 | 7.36 | 0.1025 | 2.380 | A
Potentilla intermedia | Rosaceae | Eurasia | 2 | 1.75 | 0.0006 | -1.526 | H
Rubia tinctorum | Rubiaceae | Eurasia | 1 | 3 | 0.0000 | -1.809 | G
Rumex patientia | Polygonaceae | Eurasia | 31 | 2.81 | 0.0461 | 1.385 | A
Ruta graveolens | Rutaceae | Europe | 6 | 0.5 | 0.0247 | 1.050 | D
Salvia nemorosa | Lamiaceae | Eurasia | 1 | 15 | 0.0000 | -1.533 | G
Satureja hortensis | Lamiaceae | Eurasia | 2 | 0.5 | 0.0296 | -1.033 | D
Secale cereale | Poaceae | Eurasia, Africa | 4 | 1.75 | 0.0572 | 0.044 | D
Senecio squalidus | Compositae | Europe, Africa | 1 | 3 | 0.0000 | -1.809 | G
Setaria parviflora | Poaceae | Americas | 1 | 0.5 | 0.0000 | -2.117 | H
Silene dichotoma | Caryophyllaceae | Eurasia | 1 | 0.5 | 0.0000 | -2.117 | H
Sisymbrium altissimum | Brassicaceae | Eurasia | 1 | 15 | 0.0000 | -1.533 | G
Sisyrinchium montanum | Iridaceae | N America | 2 | 26.25 | 0.0001 | -0.312 | G
Solidago canadensis | Asteraceae | N America | 29 | 7.79 | 0.0048 | 1.506 | E
Solidago gigantea | Asteraceae | N America | 66 | 3.39 | 0.0068 | 1.480 | E
Solidago graminifolia | Asteraceae | N America | 1 | 0.5 | 0.0000 | -2.117 | H
Sorghum bicolor | Poaceae | Africa | 1 | 0.5 | 0.0000 | -2.117 | H
Sorghum halepense | Poaceae | Africa, Asia | 6 | 1.75 | 0.0096 | 0.059 | D
Sporobolus indicus | Poaceae | Africa, Asia, Americas | 2 | 9 | 0.0061 | -0.638 | G
Sporobolus vaginiflorus | Poaceae | N America | 1 | 62.5 | 0.0000 | -1.288 | G
Symphytum asperum | Boraginaceae | Asia temp | 1 | 0.5 | 0.0000 | -2.117 | H
Symphytum orientale | Boraginaceae | Eurasia | 9 | 6.5 | 0.0000 | -0.977 | G
Triticum aestivum | Poaceae | Asia temp | 8 | 0.81 | 0.0485 | 1.451 | D
Tropaeolum majus | Tropaeolaceae | S America | 2 | 0.5 | 0.0014 | -0.498 | H
Veronica filiformis | Plantaginaceae | Eurasia | 5 | 7.9 | 0.0856 | 0.93 | C
Veronica peregrina | Plantaginaceae | Americas | 3 | 1.33 | 0.0857 | 0.427 | D
Veronica persica | Plantaginaceae | Eurasia, Africa | 252 | 2.32 | 0.0609 | 3.170 | A
Vicia ervilia | Fabaceae | Eurasia | 2 | 0.5 | 0.0637 | -0.282 | D
Vicia faba | Fabaceae | Anecophyte | 1 | 0.5 | 0.0000 | -2.117 | H
Vicia tenuifolia | Fabaceae | Eurasia, Africa | 594 | 4.55 | 0.0362 | 3.072 | A
Xanthium spinosum | Asteraceae | S America | 14 | 2.93 | 0.0029 | 0.661 | E
Xanthium strumarium | Asteraceae | Americas | 2 | 1.75 | 0.0444 | 0.331 | D
Zea mays | Poaceae | Americas | 1 | 87.5 | 0.0000 | -1.231 | G

Supplementary Figures

[Figure S1: PCA biplot in which points are labelled with species names. Variable labels: env_volume, X_range, Y_range, Nr_releves, Ave_Cover. Inset: Eigenvalues. Grid scale: d = 2.]

Fig. S1.
Scatterplot of PCA of invasion success measures for all alien species occurring in the dataset (n = 203). Occasionally occurring shrub and tree species were subsequently removed in order to focus on herbaceous species.

[Figure S2: one panel per predictor, each with "fitted function" on the y-axis. Panel labels, with the percentage given in each label: MDMCO.pool.phylo (58.7%), MDMCO.pool.func (9%), start_year_EU (8%), SM_hier_Plot (6.1%), MDMCO.pool.plot.func (5.1%), PH_hier_Plot (4.8%), SLA_hier_Plot (2%), MDMCO.pool.plot.phylo (1.2%), logSM.pmm (1%), path.mostcommon (0.9%), SLA_hier_Hab (0.9%), PH_hier_Hab (0.8%), SM_hier_Hab (0.7%), logSLA.pmm (0.6%), logPH.pmm (0.1%), GrowthForm (0%). Categorical axis labels: AgriForest, OtherUnknown; graminoid, herb, herb/shrub.]

Fig. S2. Response curves of invasion success to all variables in the BRT. All variables are rescaled between zero and one (apart from the hierarchical indices, to highlight positive vs. negative trait differences). Invasion success is expressed by the first axis of a PCA. Negative values relate to low invasion success, while positive values relate to high invasion success.

[Figure S3: two panels with y-axis labels MDMCO.pool.func (MDMCS.hab.func) and MDMCO.pool.phylo (MDMCS.hab.phylo); x-axis: Rabinowitz classes in the order H, D, G, E, B, C, A.]

Fig. S3.
The two most important explanatory variables in the BRT, separated according to Rabinowitz classes. Means are shown, with bars indicating standard errors. The least successful aliens are in the class on the left of the graph (class H: small regional distribution, locally not abundant, small niche breadth) and the most successful aliens are on the right (class A: regionally widespread, locally abundant, large niche breadth). See the contingency table in Fig. 1 for a full explanation of the labels.

Appendix S1 – Details on vegetation plots: the DIVGRASS project

Permanent grasslands are broadly defined as “Land on which vegetation is composed of perennial or self-seeding annual forage species which may persist indefinitely. It may include either naturalized or cultivated forages” (Allen et al. 2011). According to European Union laws, this definition is further restricted to grasslands that have been used for at least five years to produce forage, and which have been neither ploughed nor re-seeded during this period (Plantureux et al. 2012). In France, permanent grasslands are mainly found in fodder regions, where they account for more than 20% of the total land surface area (cf. Fig. S4).

Here we give further details on the rationale and data structure of the DIVGRASS project conducted at the CESAB, the French Centre for the Synthesis and Analysis of Biodiversity. The DIVGRASS project aimed to integrate and share existing knowledge on both taxonomic and functional plant diversity, as well as on ecosystem properties and functioning of the C3 French permanent grasslands. Specifically, it made it possible to assemble plant community data (species’ occurrences and abundances) and plant trait data, along with multiple environmental layers relevant to characterizing climate, soil and land use, within a coherent platform. The resulting DIVGRASS database comprises 51,486 vegetation plots from multiple data sources (see Violle et al. 2015 for full data sources).
Plots, of 50 to 100 m2 on average, are homogeneous with respect to the type of vegetation sampled. These plots are representative not of a locality but of an ecological situation; in other words, each is an example of coexistence between a flora and an environment. The data consist of the visually estimated relative cover of all species present in each plot, using a 6-level abundance scale derived from the Braun-Blanquet (1932) cover scale: [0%,1%], ]1%,5%], ]5%,25%], ]25%,50%], ]50%,75%] and ]75%,100%]. We used the median of each class to derive a percentage cover for each species, i.e. 0.5%, 3%, 15%, 37.5%, 62.5% and 87.5%, respectively. The vegetation plots are dominated by graminoids, with ca. 20 species per plot.

Vegetation databases such as DIVGRASS undoubtedly have many advantages and great potential, but they can also have limitations and potential biases. For example, collations of historical vegetation plots are often biased by unbalanced sampling effort towards patrimonial plant communities (e.g. the species-rich dry calcareous meadows) and by differences in the timing of sampling across plots. However, while vegetation plots such as those included in DIVGRASS are generally not repeatedly resampled, in the large majority of cases they are visited in the period of greatest species detectability (late spring to summer), i.e. the best time for identifying flowering and vegetative parts. Moreover, the authors of the sampling were generally skilled botanists able to identify seedlings and dead parts, so that ephemeral and rare species should still be adequately sampled (if not over-represented). Finally, the great majority of plots included in this dataset were sampled after the 1990s, so that they should reflect current ecological conditions relatively well.

Soil characteristics and land use are key to assessing the drivers of vegetation changes and functioning, but there is little reliable information at large scales.
However, in the context of this project, we benefit from unique country-wide soil and land use databases available for France.

Figure S4. (A) Spatial distribution of French permanent grasslands (FPGs) and (B) location of the 51,486 vegetation relevés collated in the DIVGRASS database. In (A), the green colour scale represents the coverage (%) of French permanent grasslands (FPGs) in a 5 km x 5 km grid cell. In (B), the heat colour scale represents the number of relevés per pixel; red indicates ten or more relevés in a grid cell. Grid cells with a cover percentage of permanent grasslands lower than 20% are not shown (grey colour). Reproduced from Violle et al. (2015).

References
Allen, V.G., Batello, C., Beretta, E.J., Hodgson, J., Kothmann, M., Li, X. et al. (2011). An international terminology for grazing lands and grazing animals. Grass Forage Sci, 66, 2-28.
Braun-Blanquet, J. (1932). Plant sociology. McGraw-Hill, New York.
Plantureux, S., Pottier, E. & Carrere, P. (2012). Permanent grassland: new challenges, new definitions? Fourrages, 211, 181-193.
Violle, C., Choler, P., Borgy, B., Garnier, E., Amiaud, B., Debarros, G. et al. (2015). Vegetation ecology meets ecosystem science: Permanent grasslands as a functional biogeography case study. Science of The Total Environment, in press.

Appendix S2 – Details on phylogeny reconstruction

We reconstructed a genus-level phylogeny for the entire species pool using the procedure proposed by Roquet et al. (2013). We retrieved from GenBank three conserved chloroplast regions (matK, rbcL, ndhF) for all available genera, plus two further chloroplast regions (rpl16 and trnL-F) and the nuclear ribosomal ITS region for certain families (taxonomically clustered alignment at the family level was performed for these three regions). Sequences were aligned with MAFFT (Katoh et al. 2005), checked by eye, and trimmed with trimAl (Capella-Gutierrez et al. 2009).
Maximum likelihood phylogenetic inference was performed with RAxML (Stamatakis et al. 2008): 100 independent searches were carried out to retrieve 100 trees, applying a supertree constraint at the family level based on Davies et al. (2004) and Moore et al. (2010), and node support was assessed by bootstrap (BS) analysis with 1000 replicates. Given that the likelihoods of the 100 ML trees obtained varied only slightly (from -585751.9 to -585791.1 for the best ML tree), and that BS analysis showed high robustness for most of the nodes (71% of nodes obtained a BS support > 70%, and an additional 11% of nodes obtained a moderate BS support of 50-70%), we chose to conduct all analyses based only on the best maximum likelihood tree. Species of the same genus were included as polytomies. The tree was dated using penalized likelihood as implemented in r8s (Sanderson 2003), based on fossil information extracted from Smith et al. (2010) and Bell et al. (2010). To calculate distance-based phylogenetic metrics (see below), we extracted the cophenetic distances from the phylogenetic tree.

References
Bell C.D., Soltis D.E. & Soltis P.S. (2010). The age and diversification of the angiosperms re-revisited. American Journal of Botany, 97, 1296-1303.
Capella-Gutierrez S., Silla-Martinez J.M. & Gabaldon T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25, 1972-1973.
Davies T.J., Barraclough T.G., Chase M.W., Soltis P.S., Soltis D.E. & Savolainen V. (2004). Darwin's abominable mystery: Insights from a supertree of the angiosperms. Proceedings of the National Academy of Sciences of the United States of America, 101, 1904-1909.
Katoh K., Kuma K., Toh H. & Miyata T. (2005). MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research, 33, 511-518.
Moore M.J., Soltis P.S., Bell C.D., Burleigh J.G. & Soltis D.E. (2010).
Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots. Proceedings of the National Academy of Sciences of the United States of America, 107, 4623-4628.
Roquet C., Thuiller W. & Lavergne S. (2013). Building megaphylogenies for macroecology: taking up the challenge. Ecography, 36, 13-26.
Sanderson M.J. (2003). r8s: inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics, 19, 301-302.
Smith S.A., Beaulieu J.M. & Donoghue M.J. (2010). An uncorrelated relaxed-clock analysis suggests an earlier origin for flowering plants. Proceedings of the National Academy of Sciences of the United States of America, 107, 5897-5902.
Stamatakis A., Hoover P. & Rougemont J. (2008). A Rapid Bootstrap Algorithm for the RAxML Web Servers. Systematic Biology, 57, 758-771.

Appendix S3 – Details on trait preparation

We collated information on four key functional traits for both alien and native species: Specific Leaf Area (SLA; the ratio of leaf area to dry mass), plant maximum height at maturity (Height), seed mass (SM) and growth form. These trait data were extracted from the TRY database (Kattge et al. 2011), the Androsace database (see Thuiller et al. 2014) and additional local datasets (see Violle et al. 2015 for details). We kept only species for which we had information on at least two traits (2930 species from the initial 4280). Note that for all the studied traits, trait availability increased with species frequency (Fig. S5, reproduced from Violle et al. 2015). In other words, trait data were more often available for frequent native species than for rare species, so that we can be reasonably sure that community patterns were well represented. Also, note that growth form was available for all species.

Figure S5. Trait availability in the plant traits module of the DIVGRASS platform.
The x-axis represents the species’ frequency in the plots module of the DIVGRASS platform, based on occurrence data across vegetation plots. The y-axis represents the proportion of species for which trait values are available in the plant traits module of the DIVGRASS platform (black colour: available values; grey colour: lack of data). Traits are ordered by decreasing data availability (n = total number of species with available trait values). Partially reproduced from Violle et al. (2015).

For species with incomplete trait information, we used multivariate imputation by chained equations based on predictive mean matching to estimate the missing values from the relationships between the continuous traits, as implemented by the ‘mice’ package for R (van Buuren & Groothuis-Oudshoorn 2011). The algorithm imputes each incomplete trait column (the target column) by generating 'plausible' synthetic values given the other trait columns in the data. In the predictive mean matching method, the imputed value for each missing value is randomly chosen from a set of observed values whose predicted values are closest to the predicted value for the missing value, where predictions come from the simulated regression model based on the predictors (i.e. all other traits in the dataset) (van Buuren & Groothuis-Oudshoorn 2011). This is one of the most commonly used imputation methods (e.g., Baraloto et al. 2010; Paine et al. 2011) and it preserves non-linear relationships among traits (van Buuren & Groothuis-Oudshoorn 2011). In an ad-hoc evaluation study, Penone et al. (2014) artificially removed trait values from a trait dataset on mammals to simulate percentages of missing values ranging from 10% to 80%, and showed that ‘mice’ produced less biased results than other imputation methods and than analyses of datasets from which missing values were simply removed.
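The matching step described above can be illustrated with a minimal Python sketch. This is not the ‘mice’ implementation: it is a single-pass simplification that fits one ordinary least-squares regression with fixed coefficients, whereas ‘mice’ draws the regression parameters and iterates over all incomplete columns. The function name `pmm_impute`, the donor-pool size `k = 5` and the toy data are assumptions made purely for illustration.

```python
import numpy as np

def pmm_impute(y, X, k=5, seed=0):
    """Simplified predictive mean matching for one target column.

    y : 1-D array of trait values, np.nan where missing (hypothetical input)
    X : 2-D array of complete predictor traits, one row per species
    k : number of candidate donors (an assumption, not taken from the study)
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(y)

    # Linear regression of the target on the predictors, fitted on observed rows.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    predicted = design @ beta

    imputed = y.copy()
    observed_idx = np.flatnonzero(observed)
    for i in np.flatnonzero(~observed):
        # Donors: observed rows whose predicted value is closest to row i's.
        gaps = np.abs(predicted[observed_idx] - predicted[i])
        donors = observed_idx[np.argsort(gaps)[:k]]
        # The imputed value is an actually observed value from a random donor.
        imputed[i] = y[rng.choice(donors)]
    return imputed

# Toy example with made-up numbers (not data from the study).
y_toy = np.array([1.0, 2.0, np.nan, 4.0, 5.0, np.nan])
X_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_filled = pmm_impute(y_toy, X_toy, k=2)
```

Because every imputed value is copied from an observed donor, imputed values cannot fall outside the observed range.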
In our study, imputed values had means and ranges comparable to the observed values (Fig. S6) for all traits (the range of imputed values was only slightly narrower for Seed Mass). This indicates that our imputation approach did not introduce any directional bias in the dataset. Specifically, missing values were 35% for SLA, 14% for Height and 15% for Seed Mass (on average 20% per trait).

[Figure S6: three boxplot panels (SLA, Height, Seed Mass), each comparing Observed and Imputed values on a logarithmic y-axis.]

Figure S6. Boxplots for the observed (green) and imputed (orange) trait values used in the study. Width of the boxplots for imputed values is proportional to the number of values imputed (35% for SLA, 14% for Height and 15% for Seed Mass). Note that the y-axis is logarithmically scaled.

References
Baraloto C., Paine C.E.T., Poorter L., Beauchene J., Bonal D., Domenach A.M., Herault B., Patino S., Roggy J.C. & Chave J. (2010). Decoupled leaf and stem economics in rain forest trees. Ecology Letters, 13, 1338-1347.
Kattge J., Diaz S., Lavorel S., Prentice C., Leadley P., Bonisch G., Garnier E., Westoby M., Reich P.B., Wright I.J., Cornelissen J.H.C., Violle C., Harrison S.P., van Bodegom P.M., Reichstein M., Enquist B.J., Soudzilovskaia N.A., Ackerly D.D., Anand M., Atkin O., Bahn M., Baker T.R., Baldocchi D., Bekker R., Blanco C.C., Blonder B., Bond W.J., Bradstock R., Bunker D.E., Casanoves F., Cavender-Bares J., Chambers J.Q., Chapin F.S., Chave J., Coomes D., Cornwell W.K., Craine J.M., Dobrin B.H., Duarte L., Durka W., Elser J., Esser G., Estiarte M., Fagan W.F., Fang J., Fernandez-Mendez F., Fidelis A., Finegan B., Flores O., Ford H., Frank D., Freschet G.T., Fyllas N.M., Gallagher R.V., Green W.A., Gutierrez A.G., Hickler T., Higgins S.I., Hodgson J.G., Jalili A., Jansen S., Joly C.A., Kerkhoff A.J., Kirkup D., Kitajima K., Kleyer M., Klotz S., Knops J.M.H., Kramer K., Kuhn I., Kurokawa H., Laughlin D., Lee T.D., Leishman M., Lens F., Lenz T., Lewis S.L., Lloyd J., Llusia J., Louault F., Ma S., Mahecha M.D., Manning P., Massad T., Medlyn B.E., Messier J., Moles A.T., Muller S.C., Nadrowski K., Naeem S., Niinemets U., Nollert S., Nuske A., Ogaya R., Oleksyn J., Onipchenko V.G., Onoda Y., Ordonez J., Overbeck G., Ozinga W.A., Patino S., Paula S., Pausas J.G., Penuelas J., Phillips O.L., Pillar V., Poorter H., Poorter L., Poschlod P., Prinzing A., Proulx R., Rammig A., Reinsch S., Reu B., Sack L., Salgado-Negre B., Sardans J., Shiodera S., Shipley B., Siefert A., Sosinski E., Soussana J.F., Swaine E., Swenson N., Thompson K., Thornton P., Waldram M., Weiher E., White M., White S., Wright S.J., Yguel B., Zaehle S., Zanne A.E. & Wirth C. (2011). TRY - a global database of plant traits. Global Change Biology, 17, 2905-2935.
Paine C.E.T., Baraloto C., Chave J. & Herault B. (2011). Functional traits of individual trees reveal ecological constraints on community assembly in tropical rain forests. Oikos, 120, 720-727.
Penone C., Davidson A.D., Shoemaker K.T., Di Marco M., Rondinini C., Brooks T.M., Young B.E., Graham C.H. & Costa G.C. (2014). Imputation of missing data in life-history trait datasets: which approach performs the best? Methods in Ecology and Evolution, 5, 961-970.
Thuiller W., Guéguen M., Georges D., Bonet R., Chalmandrier L., Garraud L., Renaud J., Roquet C., Van Es J. & Zimmermann N.E. (2014). Are different facets of plant diversity well protected against climate and land cover changes? A test study in the French Alps. Ecography, 37, 1254-1266.
van Buuren S. & Groothuis-Oudshoorn K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45, 1-67.
Violle, C., Choler, P., Borgy, B., Garnier, E., Amiaud, B., Debarros, G. et al. (2015). Vegetation ecology meets ecosystem science: Permanent grasslands as a functional biogeography case study. Science of The Total Environment, in press.

Appendix S4 – Distance Metrics based on Co-occurrence to study invasions

For each alien species, we calculated a set of metrics of functional and phylogenetic similarity to the natives, based on co-occurrence information at two scales (invasibility metrics, Thuiller et al. 2010). We chose to calculate indices based on co-occurrence because, in our modeling framework at the scale of France, we needed a single index value for each alien species, representing its similarity to the overall native grassland assemblages. Classical distance metrics used in invasion ecology, such as the weighted mean distance to the native species (WMDNS) or the distance to the nearest native species (DNNS), are generally calculated at the level of single communities/plots, resulting in many different values for each alien species within a region (Gallien et al. 2014, Thuiller et al. 2010).
By contrast, a category of indices exists in community ecology which describes community-level similarity patterns by expressing the correlation between a matrix of phylogenetic/functional distances between species and a matrix of pairwise co-occurrence indices between species (Hardy 2008). Many examples of these indices exist, which differ only in the way co-occurrence is estimated (Cavender-Bares et al. 2004, 2006, Helmus et al. 2007). These indices allow a single value to be estimated for a region by relying on the information obtained from a large number of sampling plots or communities. Here we built on this latter approach to derive an index which expresses only the dissimilarity of a single invader to the rest of the species in the region, and which can be used in the context of invasion ecology. Apart from being based on a single focal alien species, our index differs in one further respect: instead of relating the dissimilarity to the natives with their co-occurrence through a correlation index, we focused only on the distance to the most often co-occurring native species (upper right corner in the correlation plot, Fig. S7). This should make the index less dependent on the number of observations available for each focal alien species. Also, conceptually, the species that most often co-occur with an alien are the ones expected to convey the maximum amount of information for that particular alien species. This parallels the classical index based on the distance to the most abundant species (DMAS), which is taken to be a good indicator of overall community resistance (Thuiller et al. 2010, Gallien et al. 2014). Note that when only one native species had the highest co-occurrence value, distances were calculated to that single native species, whereas when several natives shared the highest co-occurrence value, an average distance was calculated.
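The rule for handling ties described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name `mdmcs`, the tolerance `tol` and the toy values are assumptions, and the co-occurrence values and distances of the focal alien to each native are taken as given inputs.

```python
import numpy as np

def mdmcs(cooccurrence, distance, tol=1e-12):
    """Mean distance from a focal alien to its most often co-occurring native(s).

    cooccurrence: co-occurrence value of the alien with each native species
    distance: functional or phylogenetic distance of the alien to each native
    tol: hypothetical tolerance used to decide that two values are tied
    """
    cooccurrence = np.asarray(cooccurrence, dtype=float)
    distance = np.asarray(distance, dtype=float)
    if cooccurrence.shape != distance.shape:
        raise ValueError("cooccurrence and distance must have the same shape")
    # Natives whose co-occurrence equals the maximum (one or several if tied)
    top = np.isclose(cooccurrence, cooccurrence.max(), rtol=0.0, atol=tol)
    # A single top native gives its own distance; ties give the average
    return float(distance[top].mean())

# Toy values (not from the study)
single = mdmcs([0.10, 0.45, -0.20], [2.0, 3.5, 1.0])  # one top native
tied = mdmcs([0.45, 0.45, -0.20], [2.0, 3.0, 1.0])    # two tied natives
```

In the first call the index is simply the distance to the single most co-occurring native; in the second it is the mean of the two tied natives' distances.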
[Figure S7: two panels, COMPETITION (left) and ENVIRONMENTAL FILTERING (right); x-axis: Co-occurrence; y-axis: Functional Distance; the most co-occurring species are marked in each panel.]

Figure S7. Conceptual figure for the Mean Distance to the Most often Co-occurring Species (MDMCS) index. Represented are hypothetical regression lines of the pairwise functional distance of a focal alien to the natives as a function of its pairwise co-occurrence metric with the natives (Cavender-Bares et al. 2004, 2006). In the left panel the most often co-occurring natives are functionally distant from the focal alien, suggesting that the species is competitively filtered. In the right panel the most often co-occurring natives are functionally similar to the focal alien, suggesting environmental filtering.

In our specific analysis, for each alien species we calculated 1) the mean distance to the most often co-occurring species (MDMCS) and 2) whether it had higher or lower values than those species for each trait (i.e. its hierarchical position on each trait gradient). MDMCS was calculated based on both phylogenetic and multi-trait functional distances, whereas the hierarchical index was calculated for each trait independently. We identified the native species which most often co-occurred with each alien species by using the V-score as a measure of species co-occurrence (Lepš & Šmilauer 2003), modified to account for species abundances. For a pair of species A and B, the V-score based on presence/absence data is calculated as:

$$V = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$$

where a is the number of units where both species are present, b and c are the number of units where only species A or only species B is present, respectively, and d is the number of units where neither of the two species is present. Computationally, the value of the V-score is equivalent to the value of the Pearson correlation coefficient between the presence/absence vectors of the two species.
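The equivalence between the V-score and the Pearson correlation of presence/absence vectors can be checked numerically. The sketch below is illustrative only: the function name `v_score` and the two toy presence/absence vectors are assumptions, not data from the study.

```python
import numpy as np

def v_score(pres_a, pres_b):
    """V-score of two species from presence/absence vectors over sampling units."""
    in_a = np.asarray(pres_a, dtype=bool)
    in_b = np.asarray(pres_b, dtype=bool)
    a = int(np.sum(in_a & in_b))    # units with both species
    b = int(np.sum(in_a & ~in_b))   # units with species A only
    c = int(np.sum(~in_a & in_b))   # units with species B only
    d = int(np.sum(~in_a & ~in_b))  # units with neither species
    # Undefined (division by zero) if a species is present everywhere or nowhere
    return (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Toy presence/absence data for eight sampling units (not from the study)
species_a = [1, 1, 1, 0, 0, 0, 1, 0]
species_b = [1, 1, 0, 0, 0, 1, 1, 0]

v = float(v_score(species_a, species_b))
pearson = float(np.corrcoef(species_a, species_b)[0, 1])
```

For these vectors a = 3, b = 1, c = 1 and d = 3, so V = (9 - 1) / 16 = 0.5, which matches the Pearson coefficient returned by `np.corrcoef`.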
Hence, in order to take abundance into account, here we calculated the Pearson correlation coefficient for each pair of species using the abundance vectors of the species, normalized to a logarithmic scale, instead of just the presence/absence vectors. The values of this co-occurrence index range from -1 (complete segregation) to +1 (complete positive association). V-scores (and therefore MDMCS) were calculated at both the plot (local community) and the habitat scale. The habitat scale was the set of plots belonging to the same grassland type (one of the four categories defined in the main text).

Finally, in order to assess the congruence of our novel co-occurrence-based index with the more classical indices used in invasion community ecology, we compared MDMCS with the WMDNS (weighted mean distance to the native species) and DNNS (distance to the nearest native species) metrics. To allow a comparison, we calculated both WMDNS and DNNS based on multi-trait functional distances for each plot (and for each habitat) and then pooled them across plots (or habitats) in order to obtain a single value for each alien species, summarizing its plot-level and habitat-level functional similarity pattern in French grasslands.

[Figure S8: four scatterplots of pooled WMDNS and pooled DNNS (y-axes) against MDMCS (x-axis), at the plot level and at the habitat level. R values reported in the panels: 0.68, 0.8, 0.65 and 0.68.]

Figure S8. Comparison of the MDMCS.func (Mean Distance to the Most often Co-occurring Species) metric used in this paper with the more classical WMDNS (weighted mean distance to the native species) and DNNS (distance to the nearest native species) metrics, calculated per plot (or habitat) and averaged across plots (or habitats). Significant Pearson coefficients (R) are indicated in each panel.

We found that MDMCS.func was reasonably well correlated with both WMDNS and DNNS averaged across communities at both the habitat and plot scale (Fig. S8).
This suggests that our novel index is quite congruent with some of the more classical metrics used in invasion studies, provided these are averaged across sampling units. Nevertheless, we believe that our index is conceptually a more suitable and direct choice when patterns are of interest at a regional extent such as that of this study, because species that co-occur more often with the alien are emphasized, while the bias introduced by species that might be present just by chance in one or a few plots with the alien is avoided.

References
Cavender-Bares, J., Ackerly, D.D., Baum, D.A. & Bazzaz, F.A. (2004). Phylogenetic overdispersion in Floridian oak communities. American Naturalist, 163, 823-843.
Cavender-Bares, J., Keen, A. & Miles, B. (2006). Phylogenetic structure of Floridian plant communities depends on taxonomic and spatial scale. Ecology, 87, S109-S122.
Gallien, L., Carboni, M. & Muenkemueller, T. (2014a). Identifying the signal of environmental filtering and competition in invasion patterns - a contest of approaches from community ecology. Methods in Ecology and Evolution, 5, 1002-1011.
Hardy, O.J. (2008). Testing the spatial phylogenetic structure of local communities: statistical performances of different null models and test statistics on a locally neutral community. Journal of Ecology, 96, 914-926.
Helmus, M.R., Bland, T.J., Williams, C.K. & Ives, A.R. (2007). Phylogenetic measures of biodiversity. American Naturalist, 169, E68-E83.
Lepš, J. & Šmilauer, P. (2003). Multivariate Analysis of Ecological Data using CANOCO. Cambridge University Press, Cambridge.
Thuiller, W., Gallien, L., Boulangeat, I., de Bello, F., Munkemuller, T., Roquet, C. et al. (2010). Resolving Darwin's naturalization conundrum: a quest for evidence. Diversity and Distributions, 16, 461-475.

Appendix S5 – Influence of evolutionary history on invasion success

Since species share evolutionary history to some degree (Blomberg & Garland 2002, Felsenstein 1985), modeling species independently of each other may bias analyses if closely related species have very similar values for the response variable of interest. We thus tested for phylogenetic signal in our continuous indicator of invasion success (i.e. PCA1) on the alien species phylogenetic tree using Blomberg's K statistic. The significance test was based on the variance of phylogenetically independent contrasts (PIC) relative to tip-shuffling randomization (Blomberg & Garland 2002). We found no evidence that closely related alien species had similar success. Instead, the most successful species were distributed quite randomly throughout the tree (Fig. 2 in main text; K = 0.074, PIC variance p-value = 0.931). Hence, in further analyses we considered the success of alien species to be independent of phylogenetic relatedness.

References
Blomberg S.P. & Garland T. (2002). Tempo and mode in evolution: phylogenetic inertia, adaptation and comparative methods. Journal of Evolutionary Biology, 15, 899-910.
Felsenstein J. (1985). Confidence-limits on phylogenies - an approach using the bootstrap. Evolution, 39, 783-791.

Appendix S6 – Cross Validation of model for Invasion Success

In order to assess whether our modeling framework also had the potential to estimate the future invasion success of newly introduced species, we attempted to validate the predictive performance of our model. Given that we did not have any suitable independent data available, we ran a cross-validation procedure following two separate approaches: 1) repeated sample splitting, and 2) jackknifing.

Repeated sample splitting

For our first approach, we randomly split the data into a validation (80%) and a calibration group (20%) of species.
Then we fit the GBMs on the calibration subset of species using exactly the same procedure used to fit the full model, and used these models to predict invasion success for the rest of the species (validation data set). The splitting procedure was repeated 50 times and model predictive performance was evaluated on average across repetitions. For each split run, performance was assessed by comparing the predicted values of invasion success with the observed values using several goodness-of-prediction statistics: the R-squared of the relationship, Pearson's R of the relationship, the Mean Square Error (MSE), and the G-statistic (Guisan & Zimmermann 2000). The G-statistic measures how effective a prediction might be relative to that which could have been derived using the sample mean. A G-value equal to 100% indicates perfect prediction, a positive value indicates a more reliable model than if one had used the sample mean, and a negative value indicates a less reliable model than if one had used the sample mean.

Table S3. Goodness of prediction statistics for the GBM, averaged across 50 splitting runs.

Statistic      Value
R-squared      0.330
Pearson r      0.565
G-statistic    29.64 %
MSE            1.410

Jackknifing

For our second approach, we removed one species at a time from the full dataset and fit the GBM to model invasion success based only on the remaining species. Then we used the fitted model to predict the invasion success of the single removed species, based only on the predictors (traits, invasion history and invasibility metrics). For each prediction we calculated the squared error relative to the true value of invasion success. We repeated this procedure 1000 times (with replacement) in order to have at least a few predictions for each species. Finally, we averaged within species and checked how the predictions and the MSE values varied depending upon the Rabinowitz class and the frequency, abundance and specialization classes.

Figure S9.
Observed invasion success and predicted invasion success for the species "excluded" in fitting the model across classes of frequency (first row), abundance (second row) and generalism (third row). The significance (p-value) of the difference between the two classes, based on a t-test, is also indicated.

[Figure S10: panels "Obs" and "Pred" (y-axis: Invasion Success) and a panel of Mean Square Error, each across Rabinowitz classes H, D, G, E, B, C, A. Labelled species: Cotula coronopifolia, Amaranthus deflexus, Vicia tenuifolia, Oenothera biennis, Linum austriacum, Juncus tenuis, Paspalum dilatatum, Bromus inermis, Erigeron annuus, Rumex patientia. Class legend: H - NarrowSparseSpecialist; D - NarrowSparseGeneralist; G - NarrowAbundantSpecialist; E - WideSparseGeneralist; B - WideAbundantSpecialist; C - NarrowAbundantGeneralist; A - WideAbundantGeneralist.]

Figure S10. First row: observed and predicted values of invasion success for the "excluded" species across Rabinowitz classes. Second row: MSE for the excluded species across Rabinowitz classes. For the "WideAbundantGeneralist" category two groups of outlier species are highlighted: 1) species that have the highest MSE and are underpredicted (invasive species that are not well predicted), and 2) species that have the lowest MSE and have high predicted invasiveness (correctly identified highly invasive species).

Conclusions

We found that the model had a moderate predictive performance, but it was still on average about 30% better than using the sample mean to estimate the invasion success of species (Table S3). According to the Pearson r (0.565) and R-squared (0.330), performance was reasonable (Table S3). In particular, frequent species are on average predicted as more invasive than the narrow ones (Fig. S9, first row, second panel). The same is true for abundant species and generalist species (Fig. S9, second and third rows).
Some of the highest prediction errors occurred for the NarrowAbundantGeneralist group and for some of the WideAbundantGeneralist species (Fig. S10). The NarrowAbundantGeneralist species (class C) are the ones that are most underpredicted (Fig. S10, first row). These species are locally abundant and generalist, but they are at the moment restricted to a narrow extent, potentially close to introduction sources, which suggests that they are likely not to be in equilibrium. It is therefore intuitive that this group of species would be the most difficult to predict. The WideAbundantGeneralist species are mostly correctly predicted as having high invasion success (Fig. S10, first row), but the MSE for this group of species is nevertheless quite high. However, some of the most well-known invaders in this category (O. biennis, E. annuus, B. inermis, J. tenuis; Fig. S10, second row), which have the highest invasion success values, are very well predicted by the model (low MSE), whereas the species that are generally considered less problematic are less well predicted. This is an encouraging finding from a conservation point of view.
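The interpretation of the G-statistic in Table S3 can be illustrated with a short sketch. The formula is not given in this appendix, so the code assumes the commonly used form G = 100 × (1 − Σ(obs − pred)² / Σ(obs − mean(obs))²); the function name `g_statistic` and the toy values are likewise hypothetical.

```python
import numpy as np

def g_statistic(observed, predicted):
    """G-statistic in percent, under the assumed definition
    G = 100 * (1 - SSE_model / SSE_mean)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Squared error of the model predictions
    sse_model = np.sum((observed - predicted) ** 2)
    # Squared error obtained by predicting the sample mean everywhere
    sse_mean = np.sum((observed - observed.mean()) ** 2)
    return float(100.0 * (1.0 - sse_model / sse_mean))

# Toy invasion-success values (not from the study)
obs = np.array([-1.0, 0.0, 1.0, 2.0])

g_perfect = g_statistic(obs, obs)                        # perfect prediction
g_mean = g_statistic(obs, np.full_like(obs, obs.mean())) # sample mean as prediction
g_worse = g_statistic(obs, -obs)                         # worse than the sample mean
```

Under this definition a perfect prediction gives 100%, predicting the sample mean gives 0%, and predictions worse than the sample mean give negative values, in line with the description accompanying Table S3.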