Can Clone Size Serve As A Proxy For Clone Age? An Exploration using Microsatellite Divergence in Populus tremuloides

Ally, D., Ritland, K. and Otto, S.P.

Supplementary Material – Section A: Departures from a star-like coalescent history

In the text, we related average pairwise divergence per locus, $\pi_k$, and the proportion of polymorphic loci within a clone, $S_k$, to the age of a clone, $T_{CCA}$, assuming that the clone has been expanding and that the coalescent history of the ramets is nearly star-like (Slatkin and Hudson 1991). Here we discuss the robustness of these two estimators of clone age to departures from a star-like coalescent history.

In particular, we consider the case where there are $n$ sampled ramets, among which $m$ pairs happen to be very closely related (e.g., a parent ramet and its immediate descendant; Figure S1). Under a strict star-like history, there would be $\binom{n}{2}$ pairwise comparisons, each of which traces back to the most recent common cellular ancestor at time $T_{CCA}$. Under the nearly star-like history illustrated in Figure S1, only $\binom{n}{2} - m$ pairwise comparisons trace back to $T_{CCA}$; the remaining pairs are so closely related that we assume that they are genetically identical. Thus, the average pairwise divergence per locus across all sampled ramets, as measured by $\pi_k$, is expected to equal:

$$\frac{\binom{n}{2} - m}{\binom{n}{2}} \, 4 \mu T_{CCA}.$$

By contrast, the proportion of loci exhibiting a polymorphism will depend on the total number of lineages, $n - m$, tracing back to the common cellular ancestor (ignoring the short branches leading to the closely related ramets), because a mutation along any of these lineages will lead to a polymorphism within the clone. Thus, $S_k$ is expected to equal:

$$2 (n - m) \mu T_{CCA}.$$

Thus, if we assume a star-like history even though some sampled ramets are closely related, $T_{CCA}$ obtained from $\pi_k$ will be underestimated by a factor $\left[\binom{n}{2} - m\right] / \binom{n}{2}$, whereas $T_{CCA}$ obtained from $S_k$ will be underestimated by a factor $(n - m)/n$ (Figure S2).
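The two underestimation factors above can be compared numerically. The following is a minimal sketch, not the authors' code: the function name `underestimation_factors` is hypothetical, and it assumes, as in Figure S1, that each cluster of closely related ramets contains at most two ramets, so that $m$ cannot exceed $n/2$.

```python
from math import comb


def underestimation_factors(n, m):
    """Return the factors by which T_CCA is underestimated when a
    star-like history is assumed but m of the n sampled ramets form
    closely related (assumed genetically identical) pairs.

    First value: factor for the pairwise-divergence estimator.
    Second value: factor for the polymorphic-loci estimator.
    """
    # Assumption: pairs are disjoint, so there can be at most n // 2 of them.
    if not 0 <= m <= n // 2:
        raise ValueError("m must lie between 0 and n // 2")
    total_pairs = comb(n, 2)
    factor_pairwise = (total_pairs - m) / total_pairs
    factor_polymorphic = (n - m) / n
    return factor_pairwise, factor_polymorphic


if __name__ == "__main__":
    n = 50  # sample size used in Figure S2
    for m in (0, 5, 10, 25):
        f_pair, f_poly = underestimation_factors(n, m)
        print(f"m={m:2d}  pairwise factor={f_pair:.4f}  "
              f"polymorphic-loci factor={f_poly:.4f}")
```

For example, with $n = 50$ and $m = 5$ the pairwise factor is $1220/1225 \approx 0.996$, whereas the polymorphic-loci factor is $45/50 = 0.90$.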
Because $\pi_k$ is less sensitive to departures from a strict star-like coalescent history, we recommend that it be used to estimate clone age, rather than $S_k$.

Figure S1: Departures from a pure star-like coalescent history. (A) A star-like coalescent history involving $n$ sampled lineages, each of which traces back to around the point in time when the clone was established ($T_{CCA}$ time units in the past). (B) A nearly star-like ancestral history involving $n$ sampled lineages, only $n - m$ of which trace back to around the time when the clone was established. To simplify the presentation, we assume that each of the clusters of closely related ramets consists of at most two ramets, as illustrated.

Figure S2: Error introduced by assuming a star-like coalescent history. Using $\pi_k$ and $S_k$ to estimate the time since the clone was established assuming a star-like tree will underestimate $T_{CCA}$ if, in fact, the sample includes some closely related ramets (as illustrated in Figure S1). The extent of this underestimation is shown for a sample of size $n = 50$ as a function of $m$, the number of closely related ramet pairs. In general, $\pi_k$ provides a more robust estimator, which is less sensitive to small departures from a star-like coalescent history than $S_k$.

Supplementary Material – Section B: The relationship between microsatellite divergence and clone size

Figure S3: The relationship between microsatellite divergence ($\pi_k$) and the average pairwise distance between two ramets within the kth clone ($D_{pairs,k}$) in Riske Creek and Red Rock. a. Riske Creek: Although the relationship between $\pi_k$ and $D_{pairs,k}$ is significant ($F_{1,18} = 12.36$, p-value = 0.00247, $R^2 = 0.407$), it is driven by the presence of an outlier. b. Riske Creek: When this observation is removed, the relationship is no longer significant and the model explains very little of the variation in $\pi_k$ ($F_{1,17} = 0.3007$, p-value = 0.591, $R^2 = 0.017$). c.
Red Rock: We did not find a significant relationship between $\pi_k$ and $D_{pairs,k}$; however, the power to detect a relationship in this dataset was low ($F_{1,5} = 0.8013$, p-value = 0.412, $R^2 = 0.14$, power $1 - \beta = 0.12$).

Table S1: The relationship between microsatellite divergence and clone size for Riske Creek. Linear regression models with square-root-transformed $\pi_k$ and log-transformed size metrics are presented. The transformed data met all the assumptions of a linear regression. This table shows no significant relationship between neutral divergence and clone size.

RISKE CREEK

| Model | F statistic (p-value) | df | Intercept, estimate ± S.E. (p-value) | Slope term | Slope, estimate ± S.E. (p-value) | $R^2$ |
|---|---|---|---|---|---|---|
| Intercept only | | 19 | 0.072 ± 0.016 (p = 0.00030) | | | |
| Model A (Area) | 0.2975 (p = 0.59) | 1,18 | 0.0029 ± 0.025 (p = 0.907) | log(Area) | 0.0011 ± 0.003 (p = 0.440) | 0.016 |
| Model B (Perimeter) | 0.1941 (p = 0.67) | 1,18 | 0.0057 ± 0.035 (p = 0.878) | log(Perimeter) | 0.0031 ± 0.0069 (p = 0.665) | 0.0106 |
| Model C ($D_{max}$) | 3.809 (p = 0.07) | 1,18 | -0.038 ± 0.025 (p = 0.145) | log($D_{max}$) | 0.011 ± 0.006 (p = 0.07) | 0.17 |

Figure S4: The relationship between microsatellite divergence and maximum distance between two ramets across clones in Riske Creek. a. The relationship between microsatellite divergence and $D_{max}$ is driven by the presence of an extremely old clone (as indicated by the solid arrow). b. When this observation is removed, the relationship is no longer significant and the model explains very little of the variation in $\pi_k$ ($F_{1,17} = 0.0101$, p-value = 0.921, $R^2 = 0.0006$). Although the outlier may be a biologically important observation, it likely represents an extremely old clone that survived from an early and different pool of colonizers. c. On the transformed data, there is no significant relationship between microsatellite divergence (square-root-transformed $\pi_k$) and the log of $D_{max}$ ($F_{1,18} = 3.809$, p-value = 0.07, $R^2 = 0.17$).

Table S2: The relationship between microsatellite divergence and the most frequent clone in a patch for Riske Creek. The most frequent clone was the clone inhabiting the largest area of a patch. In a single case two clones were equally large.
We found that when either clone was chosen as the most frequent clone, the results did not change. We present here only one set of results.

RISKE CREEK

| Model | F statistic (p-value) | df | Intercept, estimate ± S.E. (p-value) | Slope term | Slope, estimate ± S.E. (p-value) | $R^2$ |
|---|---|---|---|---|---|---|
| Intercept only | | 16 | 0.0101 ± 0.0047 (p = 0.0471) | | | |
| Model A1 (Area) | 0.02968 (p = 0.865) | 1,15 | 9.39 × 10^-3 ± 6.42 × 10^-3 (p = 0.164) | Area | 3.15 × 10^-7 ± 1.83 × 10^-6 (p = 0.866) | 0.0020 |
| Model A2 (Sqrt(Area)) | 0.1278 (p = 0.726) | 1,15 | 7.33 × 10^-3 ± 9.17 × 10^-3 (p = 0.436) | Sqrt(Area) | 6.84 × 10^-5 ± 1.91 × 10^-4 (p = 0.726) | 0.0085 |
| Model B (Perimeter) | 0.0077 (p = 0.785) | 1,15 | 7.78 × 10^-3 ± 9.72 × 10^-3 (p = 0.364) | Perimeter | 1.22 × 10^-5 ± 4.38 × 10^-5 (p = 0.785) | 0.0051 |
| Model C ($D_{max}$) | 11.22 (p = 0.004) | 1,15 | -6.42 × 10^-3 ± 6.15 × 10^-3 (p = 0.314) | $D_{max}$ | 1.59 × 10^-4 ± 4.75 × 10^-5 (p = 0.00438) | 0.42 |
| Model C (without outlier) | 0.03994 (p = 0.845) | 1,14 | 6.44 × 10^-3 ± 3.87 × 10^-3 (p = 0.118) | $D_{max}$ | -7.33 × 10^-6 ± 3.67 × 10^-5 (p = 0.844) | 0.0028 |

Table S3. The number of somatic mutations detected was not affected by whether a ramet was found on the perimeter or within the centre of the patch. Our dataset consisted of 19 clones from the Riske Creek population, where aspen grew in discontinuous and clearly defined patches. Ramets for each clone were classified by location as either a perimeter ramet or a central ramet. Similarly, mutations for each clone were classified as either a perimeter mutation or a central mutation; each unique mutation was assigned to a single ramet, even if carried by a cluster of multiple neighboring ramets (no clusters straddled both locations). In each cell of the table, we compare the observed count (across 19 clones) with the expected count in parentheses. The expected count was calculated assuming that $H_0$ was true (i.e., the number of somatic mutations is independent of the location in the patch). If the birth of new ramets occurs primarily at the perimeter of a clone, then we expect more somatic mutations near the edge of a patch. Under the alternative hypothesis that there are more somatic mutations on the perimeter than in the centre of a clone, we performed a one-tailed Fisher exact test on the observed data (p-value of 0.1404).
| Ramet type | Perimeter | Central | Row totals |
|---|---|---|---|
| Mutant | 5 (2.845) | 15 (17.154) | 20 |
| Non-mutant | 66 (68.154) | 413 (410.846) | 479 |
| Column totals | 71 | 428 | 499 |

Figure S5. The observed data were not skewed towards more clones with only central ramets. Although a Fisher exact test suggests that there are no differences in the location of mutant ramets, a non-significant result might arise because one particular clone had only centrally located ramets. To test for this effect, we randomly allocated mutant and non-mutant ramets to a clone, without regard to location. Across the 5000 randomized datasets, we then asked how often we observed more extreme tables with more central ramets. For this we calculated a chi-square statistic ($\chi^2$) on the observed data and for each of the randomized datasets. The $\chi^2$ test statistic was calculated using the expected number of mutant or non-mutant ramets on the perimeter or centre obtained from Table S3. We found that there were 503 datasets that were more skewed towards clones with more perimeter ramets than what we observed (or 4497 datasets that had clones with an equal number or more central ramets than what we observed).
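The expected counts in Table S3 and the one-tailed Fisher exact test can be checked with a short calculation. The sketch below is illustrative only: the source does not state which software was used, the function names are hypothetical, and the one-tailed p-value is computed here as the upper tail of the hypergeometric distribution for the number of mutant ramets on the perimeter. The chi-square function covers only the statistic for a single 2 × 2 table; it does not reproduce the randomization procedure of Figure S5.

```python
from math import comb

# Observed counts from Table S3.
# Rows: mutant, non-mutant. Columns: perimeter, central.
OBSERVED = [[5, 15], [66, 413]]


def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square statistic for a 2 x 2 table (no correction)."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def fisher_upper_tail(table):
    """One-tailed Fisher exact p-value: probability of at least as many
    first-row, first-column counts as observed, given fixed margins."""
    (a, b), (c, d) = table
    row1 = a + b            # total mutant ramets
    col1 = a + c            # total perimeter ramets
    total = a + b + c + d   # all ramets
    k_max = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(a, k_max + 1)
    )
    return numerator / comb(total, row1)


if __name__ == "__main__":
    print("expected counts:", expected_counts(OBSERVED))
    print("chi-square statistic:", chi_square(OBSERVED))
    print("one-tailed Fisher p-value:", fisher_upper_tail(OBSERVED))
```

If this tail definition matches the test used, the printed p-value should be close to the reported 0.1404. For the randomization in Figure S5, the reported counts correspond to a proportion of 503/5000 = 0.1006 of randomized datasets that were more skewed towards perimeter ramets than the observed data.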