Data S1

Methods for house sparrow case study

Data selection for house sparrow case study

Many morphological traits change both between and within ages, but some are stable after birds mature (e.g. tarsus length). To test for the stability of measurements, we used all measurements of every adult male (≥ 1 year old) from 2000 to 2012 to calculate the within- and between-year repeatability values (intra-class correlations, or ICC; Nakagawa & Schielzeth 2010) of the seven morphological traits (five body-size indicators and two ornaments). Observer error was accounted for when estimating the within-year repeatability, but not the between-year repeatability, because in the latter case only the annual mean could be used. The within-year repeatability values of all the morphological traits were 0.56 or higher (Table S1). Moreover, when we accounted for the monthly differences in tail length and wing length (note that the length of feathers changes through time within a year), the within-year repeatability values of these two traits increased from 0.66 to 0.74, and from 0.77 to 0.83, respectively (Table S1). The between-year repeatability values of tarsus length, beak length and wing length were 0.62 or higher, but that of tail length was lower (0.45; Table S1). We note that, when calculating the between-year repeatability, we averaged all measurements within one year, so we could not account for monthly differences, which might have influenced wing and tail length. The relatively low between-year repeatability of tail length, a trait that is particularly subject to wear between moults, might be a result of this limitation, because it was averaged across months.

Table S1. The within- and between-year repeatability of phenotypic traits in the house sparrow population on Lundy Island from 2000 to 2012.
The number of years recorded (Ny), the number of individuals (Ni) and the number of records (Nr) for males were used in calculating the mean values and the within- and between-year repeatability. The within-year repeatability was the mean value over 13 (7 for mask) years.

| Trait | Male Ny | Male Ni | Male Nr | Male mean ± sd | Female Ny | Female Ni | Female Nr | Female mean ± sd | Repeatability (male), within year | Repeatability (male), within year* | Repeatability (male), between year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tarsus length | 13 (2000-2012) | 358 | 959 | 18.58 ± 0.86 | 13 (2000-2012) | 533 | 1261 | 18.37 ± 0.87 | 0.84 | - | 0.83 |
| Beak length | 10 (2003-2012) | 291 | 757 | 13.06 ± 0.75 | 10 (2003-2012) | 421 | 984 | 12.92 ± 0.93 | 0.62 | - | 0.62 |
| Tail length | 10 (2003-2012) | 286 | 741 | 58.58 ± 2.37 | 10 (2003-2012) | 371 | 861 | 56.59 ± 2.31 | 0.66 | 0.74 | 0.45 |
| Wing length | 13 (2000-2012) | 367 | 1018 | 78.52 ± 2.21 | 13 (2000-2012) | 551 | 1349 | 75.58 ± 2.45 | 0.77 | 0.83 | 0.71 |
| Mass | 13 (2000-2012) | 369 | 995 | 28.05 ± 1.98 | 13 (2000-2012) | 561 | 1365 | 27.65 ± 2.33 | 0.62 | 0.62 | 0.62 |
| Badge size | 13 (2000-2012) | 316 | 2841 | 35.51 ± 4.37 | - | - | - | - | 0.56 | 0.66 | 0.40 |
| Mask area | 7 (2006-2012) | 179 | 967 | 15.12 ± 3.20 | - | - | - | - | 0.75 | 0.76 | 0.37 |

*within-year repeatability after accounting for monthly differences

Where we had multiple measurements of a male, we included the measurements of body-size indicators that were temporally closest to a focal mating event, within a tolerance of ± two years, thus making measurements as relevant as possible to any given event. We also extracted another dataset that included only the measurements of body-size indicators taken in the year in which a particular mating event occurred, and we ran the same statistical analyses using this reduced dataset. The exploratory analyses showed that the results based on this reduced dataset were similar to the results based on the two-year-tolerance dataset. Hereafter, we present only results based on the two-year-tolerance dataset for body-size indicators.
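The two-year-tolerance selection described above can be sketched as follows. This is a minimal illustration only: the record layout `(year_measured, value)`, the function name `closest_measurement`, the year-level time resolution and the tie-breaking rule are all assumptions made for the example, not details taken from the original analysis.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical record layout: (year the measurement was taken, measured value).
Measurement = Tuple[int, float]


def closest_measurement(
    measurements: Sequence[Measurement],
    mating_year: int,
    tolerance: int = 2,
) -> Optional[Measurement]:
    """Return the measurement taken closest in time to mating_year.

    Only measurements within +/- tolerance years are eligible; None is
    returned when no measurement falls inside that window.
    """
    candidates = [m for m in measurements if abs(m[0] - mating_year) <= tolerance]
    if not candidates:
        return None
    # Assumption: ties (e.g. one year before vs one year after the event)
    # are broken in favour of the earlier measurement.
    return min(candidates, key=lambda m: (abs(m[0] - mating_year), m[0]))


# Illustrative tarsus records for one male (made-up values).
records = [(2003, 18.4), (2006, 18.7), (2010, 18.6)]
selected = closest_measurement(records, mating_year=2005)
```

Setting `tolerance=0` in this sketch corresponds to the reduced dataset, which keeps only measurements taken in the year of the mating event.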
Statistical analysis for house sparrow case study

We did not have measurements of every trait for every individual in our dataset, so there were missing values for the response variable in some of the models. Deleting these missing values might lead to biased estimates (Nakagawa & Freckleton 2008). MCMCglmm uses a data augmentation procedure to treat missing values in the response under the assumption of ‘missing at random’ (Hadfield 2010). Therefore, we kept the missing values in our dataset in most analyses (see below for an exception). The proportion of missing values in each dataset is listed in Table S3.

We used the MCMCglmm default priors for fixed effects. We specified an inverse Wishart prior for all random effects and residuals with V = 1 and nu = 0.05, where V defines the variance and nu defines the degree of belief in V. The only exception was the random slope, for which we specified V = diag(2) and nu = 0.05. For each model, we ran three parallel chains and used Gelman–Rubin diagnostics to check for convergence (Gelman & Rubin 1992). We also checked the within-chain independence by calculating the autocorrelation between successive samples for fixed and random effects, separately, in each chain.

Table S3. The sample size of the comparisons between extra-pair and cuckolded male house sparrows for each trait, including the number of broods (N[Broods]), the number of adult males (N[Males]), and the number of trios (N[Unique trios] and N[Trios], see the following explanation). A trio includes a female, her cuckolded male and her focal extra-pair male. N[Unique trios] indicates the number of unique trios in the dataset, and N[Trios] indicates the number of trios used in the comparisons. N[Trios] is usually larger than N[Unique trios] because some trios occurred in more than one year. The missing value (%) is the percentage of missing values in the record entries.
rQG is the genetic similarity measure following Queller and Goodnight’s method (Queller & Goodnight 1989), and rLR is the measure following Lynch and Ritland’s method (Lynch & Ritland 1999).

| Category | Trait | N[Broods] | N[Males] | N[Unique trios] | N[Trios] | N[Records entry] | Missing value (%) |
|---|---|---|---|---|---|---|---|
| Genetic similarity | rQG | 387 | 274 | 445 | 461 | 922 | 0.3 |
| Genetic similarity | rLR | 387 | 274 | 445 | 461 | 922 | 0.3 |
| Body-size indicator | Tarsus length | 356 | 237 | 400 | 414 | 828 | 2.2 |
| Body-size indicator | Beak length | 356 | 237 | 400 | 414 | 828 | 15.2 |
| Body-size indicator | Mass | 356 | 237 | 400 | 414 | 828 | 5.4 |
| Body-size indicator | Wing length | 356 | 237 | 400 | 414 | 828 | 0.6 |
| Body-size indicator | Tail length | 356 | 237 | 400 | 414 | 828 | 15.6 |
| Ornaments | Badge size | 247 | 163 | 225 | 231 | 1524 | 1.6 |
| Ornaments | Mask area | 234 | 161 | 223 | 229 | 649 | 48.3 |
| Age | Age | 423 | 274 | 445 | 461 | 922 | 0 |

Specific statistical models

In the models for genetic similarity, we ran random-slope generalized linear mixed models (GLMMs) with the variable combination described in the main text. In the models for the body-size indicators, we included the following four extra random effects: the observers who measured the traits (to account for measurement bias), male age at capture (because body size may change with age), the year when the males were captured (because body sizes may change between years), and male identity. For tail length and wing length, we also included capture month, counted from the end of the breeding season (Oct = 0, Nov = 1, Dec = 2, Jan = 3, and so on), as a fixed effect rather than a random effect. We did so because sparrows undergo a complete prebasic moult each year; in our population they usually finish this moult in October, and the length of feathers may gradually decline after this point due to abrasion. In the models for the ornaments, we included all the random effects used in the models analysing body-size indicators, except the year of capture. In addition, we included capture event as another random effect because we measured each ornament three times once a male was caught.
To improve MCMC model convergence, we used a subset of data for these analyses, where we removed trios in which both extra-pair and cuckolded males had no capture record in the year of mating. For mask size, the models with the predictor combination listed above did not converge with the subset of data, potentially due to the high proportion of missing values (~48%). Therefore, to help the model converge, we reduced the number of random effects by taking out the less important effects (according to the magnitude of the variance components). Consequently, we included only observers, sire identity and the capture event as random effects in the final model. Results from the model with fewer random effects were quantitatively similar to results from the full model. Thus, we report only the results from the reduced model.

In the models for male mating age, we included female mating age as an extra fixed effect (because there could be assortative mating on age; Potti 2000; Auld et al. 2013) and female identity and the year of mating as random effects.

Meta-analyses

Data collection

Article collection

We screened the references included in previous meta-analytic studies, comparative studies or reviews (i.e., Arnqvist & Kirkpatrick 2005; Akçay & Roughgarden 2007; Kempenaers 2007; Cleasby & Nakagawa 2012). We also conducted a keyword search, using “extra-pair / extrapair / extra pair”, “paternity” and “polyandry”, on the Web of Science and Google Scholar from 1987 to 2013. We screened the titles and/or abstracts of all the articles in the search results and examined the texts in detail only for the studies whose titles and abstracts seemed relevant. For the database, we included only articles in which the authors conducted pairwise comparisons between extra-pair and cuckolded males on at least one of the four trait categories.
Estimating Zr for each comparison

To acquire Zr from each comparison, we first calculated the standardized mean difference (d) between the measurements of extra-pair males and those of cuckolded males. When the means (m) and standard deviations (s) of the measurements of extra-pair and cuckolded males (mE, mW and sE, sW, respectively) were provided, we used these values to calculate d using the equations below (cf. Nakagawa & Cuthill 2007):

$$ d = \frac{m_E - m_W}{s_{pooled}}, \qquad \text{(eqn 1)} $$

$$ s_{pooled} = \sqrt{\frac{(n - 1)(s_W^2 + s_E^2)}{2n - 2}}, \qquad \text{(eqn 2)} $$

where n is the number of pairs of extra-pair and cuckolded males. The proportions of effect sizes calculated from m and s are 58% in the dataset of genetic similarity, 66% in body size, 36% in secondary sexual traits, and 39% in age; we note that, when available, we always used m and s to obtain effect sizes. For articles without m and s, we used the reported test statistics (e.g. paired t or its non-parametric equivalent, z, from matched-pair tests) to estimate d using the following equation:

$$ d = t_{paired} \sqrt{\frac{2(1 - r_{WE})}{n}}, \qquad \text{(eqn 3)} $$

where rWE is Pearson’s correlation coefficient between the measurements of extra-pair males and those of cuckolded males. However, very few articles provided the actual rWE values, which are required for the d calculations; when rWE was provided (the number of effect sizes with actual rWE provided: n[Genetic similarity] = 3, n[Body size] = 7, n[Secondary sexual trait] = 6, and n[Age] = 6), the values were distributed mostly between 0 and 0.8. We note that a high proportion of the effect sizes in the dataset of each trait category (37% in genetic similarity, 33% in body size, 54% in secondary sexual traits and 46% in age) required this calculation using Equation 3, but we did not have the actual rWE to conduct such a calculation.
Therefore, we prepared three different datasets for each trait category: (1) assuming an rWE value of 0 for all the effect sizes, (2) assuming an rWE value of 0.8, and (3) using a representative rWE value for each trait category. For articles with actual rWE, the d values in each of these datasets are the values estimated from the actual rWE. To obtain a representative rWE value for each trait category for comparisons without actual rWE, we conducted a meta-analysis in each trait category to obtain meta-analytic means for rWE. In each of these meta-analyses, we used the Zr-transformed rWE values as the response, weighted using their corresponding sampling variances, 1/(n – 3). The meta-analytic mean rWE value is 0.2262 for genetic similarity, 0.0482 for body size, 0.5367 for secondary sexual traits, and 0.1394 for age (note that these meta-analytic means were obtained by using random-effects models with the R package ‘metafor’; Viechtbauer 2010). These meta-analytic means of rWE were used as representative values of rWE in further calculations.

Once we had calculated all the d values in each of the three datasets for each trait category, we transformed d into a correlation coefficient, r, using the formula

$$ r = \frac{d}{\sqrt{d^2 + 4}}. \qquad \text{(eqn 4)} $$

Then, we converted r into Zr using the following equation:

$$ Z_r = 0.5 \ln\left(\frac{1 + r}{1 - r}\right). \qquad \text{(eqn 5)} $$

We also obtained the sampling variance for each Zr using 1/(n – 3). In our analyses, a positive Zr indicates that extra-pair males have larger measurements on a particular trait than the cuckolded males; that is, extra-pair males are genetically more similar to focal females, have larger body size, have more exaggerated secondary sexual traits, or are older. In addition, we note that r-values of 0.1, 0.3 and 0.5 are considered to be small, moderate and large, respectively (sensu Cohen 1988); these r-values translate into Zr values of 0.10, 0.31 and 0.55, respectively.
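The conversion chain from summary statistics to Zr (Equations 1 to 5) can be sketched in a few lines of Python. The function names are hypothetical, and the square roots and squared terms follow the standard forms of these formulas in Nakagawa & Cuthill (2007), which is an assumption where the equations are not fully legible in this document.

```python
import math


def pooled_sd(s_e: float, s_w: float, n: int) -> float:
    """Pooled standard deviation for n pairs of males (eqn 2)."""
    return math.sqrt((n - 1) * (s_w ** 2 + s_e ** 2) / (2 * n - 2))


def d_from_means(m_e: float, m_w: float, s_e: float, s_w: float, n: int) -> float:
    """Standardized mean difference from means and SDs (eqn 1)."""
    return (m_e - m_w) / pooled_sd(s_e, s_w, n)


def d_from_paired_t(t_paired: float, r_we: float, n: int) -> float:
    """Standardized mean difference from a paired t statistic (eqn 3).

    r_we is the correlation between extra-pair and cuckolded males; the
    text uses 0, 0.8, or a representative value when it is not reported.
    """
    return t_paired * math.sqrt(2 * (1 - r_we) / n)


def r_from_d(d: float) -> float:
    """Convert d into a correlation coefficient r (eqn 4)."""
    return d / math.sqrt(d ** 2 + 4)


def zr_from_r(r: float) -> float:
    """Fisher's z-transformation of r (eqn 5)."""
    return 0.5 * math.log((1 + r) / (1 - r))


def zr_variance(n: int) -> float:
    """Sampling variance of Zr, 1 / (n - 3)."""
    return 1 / (n - 3)


# Cohen's benchmark r-values expressed on the Zr scale.
benchmarks = [round(zr_from_r(r), 2) for r in (0.1, 0.3, 0.5)]
```

In this sketch, changing `r_we` from 0 to 0.8 for the same `t_paired` and `n` shrinks d by a factor of sqrt(0.2), which illustrates why the three rWE datasets were compared.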
Meta-analyses and meta-regression

For each meta-analysis and meta-regression model, we conducted all analyses for each of the three datasets (rWE = 0, 0.8, or representative values). The exploratory results from the three datasets (differing in the setting of rWE) were similar within the same trait category, so we present only results based on the dataset of representative rWE values, using the tree with Hackett’s backbone for the publication bias tests.

For each of the meta-analytic models in MCMCglmm, we ran 5,000,000 burn-in iterations, followed by 5,000,000 iterations with a thinning interval of 500, which resulted in 10,000 samples for each parameter’s posterior distribution. We specified an inverse Wishart prior for all random effects and residuals with V = 1 and nu = 0.002. For all models, we ran three parallel chains and used Gelman–Rubin diagnostics to check for convergence (Gelman & Rubin 1992). We also calculated the autocorrelation between successive samples for fixed and random effects, separately, in each chain to check the within-chain independence. For each meta-analytic model, we report the means of the posterior distributions and their 95% credible intervals (95% CIs) as our parameter estimates.

We conducted a phylogenetic meta-analysis and a meta-regression in each of the four trait categories using the method described by Hadfield & Nakagawa (2010). For each meta-analysis, we used the topology of two phylogenetic trees from Jetz et al. (2012): one based on Hackett’s backbone and the other on Ericson’s backbone (Ericson et al. 2006; Hackett et al. 2008). The exploratory results from these two phylogenetic trees are similar to each other; this is expected because parameter estimates in phylogenetic comparative analyses are known to be robust to tree misspecification to a certain degree (Rohlf 2006; Stone 2011).
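As a rough illustration of the chain settings and convergence check described above, the sketch below computes the number of stored posterior samples and a basic Gelman–Rubin potential scale reduction factor for several parallel chains. It is written in plain Python for illustration; the function names are hypothetical, and the formula is the simple textbook version without the degrees-of-freedom correction, so it is not necessarily the exact variant computed by the software the authors used.

```python
import math
from statistics import mean, variance
from typing import Sequence


def n_stored_samples(n_iterations: int, thin: int) -> int:
    """Number of posterior samples kept after thinning post-burn-in iterations."""
    return n_iterations // thin


def gelman_rubin(chains: Sequence[Sequence[float]]) -> float:
    """Basic potential scale reduction factor for one parameter.

    chains holds equally long sample sequences from parallel chains.
    Values close to 1 suggest the chains sample the same distribution.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return math.sqrt(pooled / within)


# Settings quoted in the text: 5,000,000 post-burn-in iterations, thinning of 500.
stored = n_stored_samples(5_000_000, 500)

# Toy chains (made-up numbers) whose means clearly disagree.
separated = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0], [20.0, 21.0, 20.0, 21.0]]
r_hat = gelman_rubin(separated)
```

With chains that overlap well, this statistic approaches 1; the toy chains above are deliberately separated, so the value is far larger than 1.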
We calculated a heterogeneity statistic, I², for multilevel meta-analytic models, as described by Nakagawa & Santos (2012), which is modified from Higgins and Thompson’s I² (Higgins & Thompson 2002). Low, moderate and high heterogeneities refer to I² of 25%, 50% and 75%, respectively (Higgins et al. 2003). We also calculated the phylogenetic heritability, H², as the proportion of total variance in Zr that can be explained by the variance of additive genetic values (i.e. phylogenetic variance; Lynch 1991), which is equivalent to Pagel’s λ (Pagel 1999; Hansen & Orzack 2005; Hadfield & Nakagawa 2010).

To test for evidence of publication bias in our datasets, we conducted Egger’s regression on the meta-analytic residuals (sensu Nakagawa & Santos 2012; Egger et al. 1997). Intercepts that differ significantly from zero indicate a possibility of publication bias. To quantify the potential publication bias, we performed trim-and-fill tests on the meta-analytic residuals with the R0 estimator (Duval & Tweedie 2000a, b). We obtained the required adjustment by estimating the difference between zero and the intercept after applying the trim-and-fill procedure. Also, to visualize the distribution of the data points for potential publication bias, we drew funnel plots with both the raw data (original effect sizes) and the meta-analytic residuals.

Results

Publication bias for meta-analyses

Egger’s regression tests showed that there was only weak, if any, evidence for publication bias in any of our datasets (genetic similarity: the intercept, b0 = -0.30, 95% CI = -0.73 to 0.13; body size: b0 = -0.10, 95% CI = -0.35 to 0.16; secondary sexual traits: b0 = 0.30, 95% CI = -0.15 to 0.75; age: b0 = 0.50, 95% CI = -0.13 to 1.21). The trim-and-fill tests did not identify any missing studies in any trait category apart from secondary sexual traits (8 missing studies, p = 0.002).
The meta-analytic mean for the residuals in the secondary sexual trait dataset incorporating ‘filled data points’ (or missing data points) was -0.037; this indicates that the original meta-analytic mean might be slightly overestimated. The funnel plots also showed little sign of publication bias for both the raw data and the meta-analytic residuals (Figure S3).

Figure S3. Funnel plots of Zr against its precision (1/standard error for Zr) and of residuals against the precision in each of the four trait categories. The dashed grey lines in the plots of original Zr indicate the meta-analytic means; the dotted grey lines in the plots of residual Zr indicate the adjusted means according to the results of trim-and-fill tests. Solid circles represent the collected data points, whereas the empty circles are data points filled by the trim-and-fill method.

Data S2: Details of paternity assignments and genetic pedigree

We assigned paternity and constructed the genetic pedigree using the genotypes at 13 microsatellite loci (Dawson et al. 2012; Schroeder et al. 2012). We assigned the genetic fathers with 95% confidence for approximately 90% of all offspring (Table S2) using the software CERVUS 3.0 (Marshall et al. 1998; Hadfield et al. 2006; Kalinowski et al. 2006; Schroeder et al. 2012). We genotyped at least two different DNA samples for the great majority of adult birds.

Where we obtained two mismatching genotypes from two separate tissue samples from a single individual, the respective samples were genotyped repeatedly. This allowed for precise genotyping and detection of any sample mix-ups. After confirming the genetic maternity of the observed social mothers, we allowed all potentially-alive males to be assigned as sires. We carefully checked each offspring–sire combination where the assigned sire was not the female’s pair-bonded mate. We took an iterative approach and compared the assigned sire with the next-best matching sire.
If both sires mismatched at ≤1 locus, then we assigned the social mate if it was among the two best sires. If the social mate was not one of these two matching sires, then we compared the observation records of both individuals and assigned paternity to the one that was seen or caught closest in time to the respective breeding season (in years). This cleared up all ambiguities, and allowed us to distinguish between brothers where both were assigned.

In cases where two or more loci mismatched, if the social sire was not assigned by the software, we took the conservative approach and refrained from assigning a sire or the extra-pair status. This is the reason we did not assign complete parentage for all sampled chicks. Among those individuals that could not be assigned parentage with fewer than two mismatches, 77% of the DNA samples were extracted from very small tissue or blood samples, dead embryos in rotten eggs, or similarly compromised samples. In these cases, the DNA may have been severely degraded, or present in very low concentration, such that fewer than six loci amplified and we did not assign parentage. This also implies that a larger proportion of unassigned individuals died early in life (Table S2), which should be considered when interpreting these results. However, we currently have no reason to believe that early-life mortality is linked with extra-pair status (Hsu et al. 2014). A more detailed description of the pedigree construction, including information on the probability of sample mix-ups and genotyping error, is available in Schroeder et al. (In revision).

Data S3: Tables and figures with supporting information

*Note: Table S1, S3 and Figure S3 are embedded in corresponding places in Data S1.

Table S2. Sample size of offspring in house sparrow case study. *complete pedigree means that we identified all social and genetic parents for that individual.

S2a.
Sample collection of all offspring including unhatched eggs and chicks

| Year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N[Egg laid] | 244 | 259 | 385 | 515 | 862 | 1041 | 499 | 409 | 136 | 241 | 291 | 593 |
| N[Offspring sampled] | 155 | 219 | 323 | 459 | 761 | 857 | 430 | 356 | 117 | 222 | 213 | 550 |
| Offspring sampled (%) | 63.5 | 84.6 | 83.9 | 89.1 | 88.3 | 82.3 | 86.2 | 87.0 | 86.0 | 92.1 | 73.2 | 92.7 |
| N[Offspring with genetic paternity assignment] | 143 | 202 | 300 | 403 | 736 | 844 | 391 | 312 | 104 | 165 | 168 | 478 |
| Sampled offspring with genetic paternity assignment (%) | 92.3 | 92.2 | 92.9 | 87.8 | 96.7 | 98.5 | 90.9 | 87.6 | 88.9 | 74.3 | 78.9 | 86.9 |
| N[Offspring with *complete pedigree] | 138 | 200 | 278 | 393 | 733 | 844 | 323 | 257 | 71 | 156 | 155 | 391 |
| Sampled offspring with *complete pedigree (%) | 89.0 | 91.3 | 86.1 | 85.6 | 96.3 | 98.5 | 75.1 | 72.2 | 60.7 | 70.3 | 72.3 | 71.1 |

S2b. Sample collection of unhatched eggs

| Year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N[Egg hatched] | 166 | 225 | 317 | 373 | 664 | 776 | 328 | 254 | 93 | 184 | 180 | 474 |
| N[Egg unhatched] | 78 | 34 | 68 | 142 | 198 | 265 | 171 | 155 | 43 | 57 | 111 | 119 |
| N[Sampled unhatched eggs] | 0 | 14 | 40 | 75 | 145 | 144 | 113 | 128 | 32 | 50 | 36 | 76 |
| Sampled unhatched eggs (%) | 0.0 | 41.2 | 58.5 | 52.8 | 73.2 | 54.3 | 66.1 | 82.6 | 74.4 | 87.7 | 32.4 | 63.9 |
| N[Unhatched eggs with genetic paternity assignment] | 0 | 6 | 32 | 58 | 134 | 135 | 81 | 92 | 20 | 0 | 3 | 25 |
| Sampled eggs with genetic paternity assignment (%) | 0.0 | 42.9 | 80.0 | 77.3 | 92.4 | 93.8 | 71.7 | 71.9 | 62.5 | 0.0 | 8.3 | 32.9 |
| N[Unhatched eggs with *complete pedigree] | 0 | 5 | 25 | 52 | 132 | 135 | 47 | 66 | 8 | 0 | 3 | 20 |
| Sampled eggs with *complete pedigree (%) | 0.0 | 35.7 | 62.5 | 69.3 | 91.0 | 93.8 | 41.6 | 51.6 | 25.0 | 0.0 | 8.3 | 26.3 |

S2c. Sample collection of chicks that died before fledging

| Year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N[Chicks died before fledging] | 70 | 63 | 139 | 154 | 436 | 436 | 169 | 110 | 25 | 69 | 50 | 112 |
| Chicks died before fledging (%) | 42.2 | 28.0 | 43.8 | 41.3 | 65.7 | 56.2 | 51.5 | 43.3 | 26.9 | 37.5 | 27.8 | 23.6 |
| N[Sampled dead chicks] | 59 | 43 | 105 | 130 | 388 | 373 | 158 | 84 | 17 | 57 | 47 | 112 |
| Dead chicks sampled (%) | 84.3 | 68.3 | 75.5 | 84.4 | 89.0 | 85.6 | 93.5 | 76.4 | 68.0 | 82.6 | 94.0 | 100.0 |
| N[Dead chicks with genetic paternity assignment] | 47 | 39 | 96 | 129 | 377 | 369 | 151 | 78 | 16 | 52 | 37 | 107 |
| Sampled dead chicks with genetic paternity assignment (%) | 79.7 | 90.7 | 91.4 | 99.2 | 97.2 | 98.9 | 95.6 | 92.9 | 94.1 | 91.2 | 78.7 | 95.5 |
| N[Dead chicks with *complete pedigree] | 45 | 38 | 87 | 127 | 376 | 369 | 124 | 70 | 13 | 47 | 31 | 83 |
| Sampled dead chicks with *complete pedigree (%) | 76.3 | 88.4 | 82.9 | 97.7 | 96.9 | 98.9 | 78.5 | 83.3 | 76.5 | 82.5 | 66.0 | 74.1 |

S2d. Sample collection of fledglings

| Year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N[Fledglings] | 96 | 162 | 178 | 219 | 228 | 340 | 159 | 144 | 68 | 115 | 130 | 362 |
| Chicks fledged (%) | 57.8 | 72.0 | 56.2 | 58.7 | 34.3 | 43.8 | 48.5 | 56.7 | 73.1 | 62.5 | 72.2 | 76.4 |
| N[Sampled fledgling] | 96 | 162 | 178 | 216 | 228 | 340 | 159 | 144 | 68 | 115 | 130 | 362 |
| Sampled fledgling (%) | 100.0 | 100.0 | 100.0 | 98.6 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| N[Fledgling with genetic paternity assignment] | 96 | 157 | 172 | 216 | 225 | 340 | 159 | 142 | 68 | 113 | 128 | 346 |
| Sampled fledgling with genetic paternity assignment (%) | 100.0 | 96.9 | 96.6 | 100.0 | 98.7 | 100.0 | 100.0 | 98.6 | 100.0 | 98.3 | 98.5 | 95.6 |
| N[Fledgling with *complete pedigree] | 93 | 157 | 166 | 214 | 225 | 340 | 152 | 121 | 50 | 109 | 121 | 288 |
| Sampled fledgling with *complete pedigree (%) | 96.9 | 96.9 | 93.3 | 99.1 | 98.7 | 100.0 | 95.6 | 84.0 | 73.5 | 94.8 | 93.1 | 79.6 |

Table S4. Results from the random slope GLMMs, explaining variation in each male house sparrow genotypic or phenotypic trait in the Lundy population. Male status is the male mating status with extra-pair male = 1 and cuckolded male = 0 (the baseline). Posterior means and 95% credible intervals (95% CIs) are presented.
rQG is the genetic similarity measurements following Queller and Goodnight’s method, and rLR is the measurements following Lynch and Ritland’s method. Characteristics Fixed effects (Intercept) Mean (95%CI) Genetic similarity rQG rLR Body-size indicators Tarsus Beak length -0.04 (-0.13 to 0.06) 0.09 (-0.02 to 0.21) - -0.01 (-0.10 to 0.08) - 0 (-0.37 to 0.36) 0.01 (-0.05 to 0.08) - Ornaments Badge size Mask area 0.53 (-0.24 to 1.28) 0 (-0.06 to 0.06) -0.09 (-0.11 to 0.07) - -0.61 (-2.01 to 0.89) 0 (-0.04 to 0.03) - 0.22 (-0.93 to 1.47) 0 (-0.06 to 0.05) - -0.5 (-0.82 to -0.2) - - 0.19 (0.13 to 0.25) 0 (0 to 0) 0 (0 to 0) 0 (0 to 0.01) 0 (0 to 0) 0.31 (0.19 to 0.43) -0.46 (-0.6 to -0.33) 0 (0 to 0) 0 (0 to 0) -0.46 (-0.6 to -0.33) 0 (0 to 0.01) 0.01 (0 to 0.01) 1.5 (1.21 to 1.78) Mass Wing length Tail length -0.1 (-0.52 to 0.33) -0.01 (-0.05 to 0.02) - -0.17 (-0.88 to 0.54) 0.04 (-0.02 to 0.1) -0.11 (-0.69 to 0.47) -0.02 (-0.05 to 0.01) -0.04 (-0.05 to 0.03) - Male age ♂ status Mean (95%CI) Month for feather# Mean (95%CI) ♀ age Mean (95%CI) - - - - - Random effects (Intercept) : (Intercept).trio ♂ status : (Intercept).trio Mean (95%CI) Mean (95%CI) ♂ status : ♂ status.trio Mean (95%CI) 0.02 (0.01 to 0.03) 0.01 (0 to 0.02) -0.01 (-0.02 to 0.01) -0.01 (-0.02 to 0.01) 0.06 (0.02 to 0.1) 0.02 (0.01 to 0.03) 0.01 (0 to 0.03) 0 (-0.02 to 0.01) 0 (-0.02 to 0.01) 0.04 (0.01 to 0.08) Observer Mean (95%CI) Mean (95%CI) - - 0.19 (0.01 to 0.49) 0.03 (0 to 0.09) 0.58 (0.07 to 1.48) 0.05 (0 to 0.14) 0.39 (0.05 to 0.97) 0.17 (0.03 to 0.39) 0.49 (0.04 to 1.45) 0.44 (0.08 to 1.04) 3.62 (0.65 to 8.88) 0.04* (0 to 0.13)* 2.16 (0.24 to 5.86) - - - ♂ capture Year Mean (95%CI) - - 0.13 (0.03 to 0.28) 0.23 (0.04 to 0.54) 0.13 (0.02 to 0.28) 0.06 (0.01 to 0.15) - - - ♂ identity Mean (95%CI) - - 1.16 (0.95 to 1.39) 0.66 (0.52 to 0.81) 0.94 (0.77 to 1.13) 0.56 (0.45 to 0.7) 0.07 (0 to 0.16) 0.13 (0 to 0.37) - ♂ capture event ♂ capture month Mean (95%CI) Mean (95%CI) - - 0.01 (0 to 0.02) 0 
(-0.02 to 0.01) 0 (-0.02 to 0.01) 0.04 (0.01 to 0.08) 0.1 (0.01 to 0.3) 0.06 (0.01 to 0.14) 0.11 (0.03 to 0.23) 0.8 (0.64 to 0.97) - 0.01 (0 to 0.01) -0.01 (-0.01 to 0) Mean (95%CI) 1.01 (0.88 to 1.14) -0.74 (-0.89 to 0.61) -0.74 (-0.89 to 0.61) 1.50 (1.30 to 1.70) 0.01 (0 to 0.01) 0 (-0.01 to 0) (Intercept) : ♂ status.trio 1 (0.83 to 1.08) -0.72 (-0.85 to 0.59) -0.72 (-0.85 to 0.59) 1.53 (1.34 to 1.74) - - - - - - - - - - 0.41 (0.15 to 0.63) - - - 0.33 (0.23 to 0.43) 0.06 (0 to 0.15) ♂ capture age 0.03 (-0.08 to 0.14) 0 (-0.01 to 0) - -0.01 (-0.01 to 0) 0.22 (0.09 to 0.33) - - - 16 Mating year ♀ identity Dispersion Male (95%CI) Male (95%CI) - - - - - - - - - - - - - - - - - - Mean (95%CI) 0 (0 to 0) 0 (0 to 0) 0.03 (0.02 to 0.03) 0.11 (0.09 to 0.13) 0.07 (0.06 to 0.09) 0.02 (0.02 to 0.03) 0.09 (0.07 to 0.11) 0.03 (0.03 to 0.03) 0.01 (0.01 to 0.01) 0.21 (0.04 to 0.44) 0.26 (0.17 to 0.35) 0.11 (0.04 to 0.2) For ornaments, the male capture age is the male mating age. 17 Table S5. Results from the meta-analyses, explaining the pairwise difference of male genotypic or phenotypic traits between extra-pair and cuckolded males. Positive values indicates that extra-pair males are genetically more similar to focal females, have larger body size, more exaggerated secondary sexual traits or are older than cuckolded males who mated with the same female(s). Here we present results based on Zr with the estimated correlation between extra-pair and cuckolded males (rEW) with Hackett’s phylogenetic tree (see Method). Results from two models were presented in each trait category. The meta-analyses with intercept and no fixed effects show the overall effect size in that trait category. The meta-regression has no intercept but fixed effects (Trait type), showing the trait-specific effect sizes. In the analysis of genetic similarity, trait type indicates the genetic markers used to estimate genetic similarity. 
In the analyses of body size and secondary sexual trait, trait type indicates which phenotypes were measured. In cases where the original author did not specify which phenotype they measured, we classified such comparisons as (general) body size. In age, the trait type indicates how the original authors classify male mating age, either by age class (first year or older), known specific age, or that the authors did not provide such information. Characteristics Fixed effects (Intercept) Mean (95%CI) Trait type Genetic similarity MetaMeta-regression analysis Body size Metaanalysis -0.03 (-0.15 to 0.09) 0.05 (-0.05 to 0.14) Mean (95%CI) DNA fingerprinting MHC Mean (95%CI) Mean (95%CI) Microsatellite -0.02 (-0.27 to 0.26) -0.07 (-0.30 to 0.15) -0.03 (-0.16 to 0.10) Meta-regression Beak Body size Tarsus Mean (95%CI) Weight Mean (95%CI) Wing length Random effects Phylogeny Mean (95%CI) Species Mean (95%CI) Study Mean (95%CI) 0.01 (0 to 0.02) 0.01 (0 to 0.02) 0.004 (0 to 0.01) 0.003 (0 to 0.01) 0.003 (0 to 0.01) Secondary sexual trait MetaMeta-regression analysis Age Metaanalysis 0.10 (-0.02 to 0.21) 0.09 (-0.01 to 0.20) 0.03 (-0.11 to 0.17) 0.04 (-0.10 to 0.17) 0.06 (-0.05 to 0.18) 0.03 (-0.09 to 0.16) 0.05 (-0.06 to 0.17) 0.004 (0 to 0.01) 0.003 (0 to 0.01) 0.003 (0 to 0.01) Ornament Song 0.07 (-0.04 to 0.18) 0.25 (0.07 to 0.42) Meta-regression Age class Known age Unknown 0.01 (0 to 0.02) 0.01 (0 to 0.02) 0.01 (0 to 0.03) 0 (0 to 0.02) 0 (0 to 0.01) 0.01 (0 to 0.02) 0.01 (0 to 0.02) 0.06 (-0.10 to 0.22) 0.11 (-0.02 to 0.23) -0.05 (-0.57 to 0.49) 0.01 (0 to 0.02) 18 Dispersion Mean (95%CI) 0.003 (0 to 0.01) 0.003 (0 to 0.01) 0.002 (0 to 0.004) 0.002 (0 to 0.01) 0.01 (0 to 0.02) 0.01 (0 to 0.02) 0.01 (0 to 0.02) 0.01 (0 to 0.02) 19 Table S6. The heterogeneity (I2 %) explained by random components of meta-analytic models in each of the four trait categories. Each I2 was shown in the form of a posterior mean with 95% credible intervals, 95% CIs, in parentheses. 
I² values of 25%, 50% and 75% are referred to as low, moderate and high heterogeneity, respectively. H² is the phylogenetic heritability, which is equivalent to Pagel’s λ (Pagel 1999; Hansen & Orzack 2005; Hadfield & Nakagawa 2010).

| Trait category | I²[Article] | I²[Species] | I²[Phylogeny] | I²[Residual] | I²[Total] | H² |
|---|---|---|---|---|---|---|
| Genetic similarity | - | - | 17.30 (0.68 to 47.31) | 10.50 (0.68 to 27.72) | 27.79 (5.51 to 56.38) | 58.01 (13.09 to 97.95) |
| Body size | 8.24 (0.66 to 22.06) | 8.42 (0.48 to 22.98) | 11.80 (0.70 to 33.60) | 4.78 (0.51 to 11.91) | 33.25 (13.12 to 56.69) | 33.25 (16.04 to 72.83) |
| Secondary sexual trait | 11.08 (0.28 to 32.49) | 7.16 (0.29 to 20.37) | 7.86 (0.25 to 24.24) | 8.64 (0.30 to 25.42) | 34.74 (12.26 to 58.26) | 22.60 (0.63 to 61.35) |
| Age | - | - | 16.38 (0.47 to 47.17) | 21.38 (0.65 to 49.57) | 37.76 (8.93 to 66.11) | 43.03 (1.99 to 92.61) |

Figure S1. Heterozygosities in each cohort year based on the 13 microsatellite loci that we used for paternity analysis. In each cohort year, we included only individuals born or first caught in that year, and excluded the known immigrants, to estimate the heterozygosities. The numbers in parentheses represent sample sizes in each cohort year. The means and standard errors of the heterozygosities are presented. The solid line represents the observed heterozygosity values and the dashed line represents the expected heterozygosity values.

Figure S2. The proportion of extra-pair offspring among all offspring (EPP %) per cohort in Lundy Island house sparrows. The sample sizes (n) represent the total number of offspring with known social and genetic parents in each year.

References

Akçay E, Roughgarden J (2007) Extra-pair paternity in birds: Review of the genetic benefits. Evolutionary Ecology Research 9, 855-868.
Arnqvist G, Kirkpatrick M (2005) The evolution of infidelity in socially monogamous passerines: The strength of direct and indirect selection on extrapair copulation behavior in females. American Naturalist 165 Suppl 5, S26-37.
Auld JR, Perrins CM, Charmantier A (2013) Who wears the pants in a mute swan pair? Deciphering the effects of male and female age and identity on breeding success. Journal of Animal Ecology 82, 826-835.
Cleasby IR, Nakagawa S (2012) The influence of male age on within-pair and extra-pair paternity in passerines. Ibis 154, 318-324.
Cohen J (1988) Statistical power analysis for the behavioral sciences. Academic Press, New York.
Dawson DA, Horsburgh GJ, Krupa AP, et al. (2012) Microsatellite resources for Passeridae species: A predicted microsatellite map of the house sparrow Passer domesticus. Molecular Ecology Resources 12, 501-523.
Duval S, Tweedie R (2000a) A nonparametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association 95, 89-98.
Duval S, Tweedie R (2000b) Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics 56, 455-463.
Egger M, Smith GD, Schneider M, Minder C (1997) Bias in meta-analysis detected by a simple, graphical test. British Medical Journal 315, 629-634.
Ericson PGP, Anderson CL, Britton T, et al. (2006) Diversification of Neoaves: Integration of molecular sequence data and fossils. Biology Letters 2, 543-547.
Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Statistical Science 7, 457-511.
Hackett SJ, Kimball RT, Reddy S, et al. (2008) A phylogenomic study of birds reveals their evolutionary history. Science 320, 1763-1768.
Hadfield JD (2010) MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. Journal of Statistical Software 33, 1-22.
Hadfield JD, Nakagawa S (2010) General quantitative genetic methods for comparative biology: Phylogenies, taxonomies and multi-trait models for continuous and categorical characters. Journal of Evolutionary Biology 23, 494-508.
Hadfield JD, Richardson DS, Burke T (2006) Towards unbiased parentage assignment: Combining genetic, behavioural and spatial data in a Bayesian framework. Molecular Ecology 15, 3715-3730.
Hansen TF, Orzack SH (2005) Assessing current adaptation and phylogenetic inertia as explanations of trait evolution: The need for controlled comparisons. Evolution 59, 2063-2072.
Higgins JPT, Thompson SG (2002) Quantifying heterogeneity in a meta-analysis. Statistics in Medicine 21, 1539-1558.
Higgins JPT, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. British Medical Journal 327, 557-560.
Hsu YH, Schroeder J, Winney I, Burke T, Nakagawa S (2014) Costly infidelity: Low lifetime fitness of extra-pair offspring in a passerine bird. Evolution 68, 2873-2884.
Jetz W, Thomas GH, Joy JB, Hartmann K, Mooers AO (2012) The global diversity of birds in space and time. Nature 491, 444-448.
Kalinowski ST, Wagner AP, Taper ML (2006) ML-RELATE: A computer program for maximum likelihood estimation of relatedness and relationship. Molecular Ecology Notes 6, 576-579.
Kempenaers B (2007) Mate choice and genetic quality: A review of the heterozygosity theory. Advances in the Study of Behavior 37, 189-278.
Lynch M (1991) Methods for the analysis of comparative data in evolutionary biology. Evolution 45, 1065-1080.
Marshall TC, Slate J, Kruuk LEB, Pemberton JM (1998) Statistical confidence for likelihood-based paternity inference in natural populations. Molecular Ecology 7, 639-655.
Nakagawa S, Cuthill IC (2007) Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews 82, 591-605.
Nakagawa S, Freckleton RP (2008) Missing inaction: The dangers of ignoring missing data. Trends in Ecology & Evolution 23, 592-596.
Nakagawa S, Santos ESA (2012) Methodological issues and advances in biological meta-analysis. Evolutionary Ecology 26, 1253-1274.
Nakagawa S, Schielzeth H (2010) Repeatability for Gaussian and non-Gaussian data: A practical guide for biologists. Biological Reviews 85, 935-956.
Pagel M (1999) Inferring the historical patterns of biological evolution. Nature 401, 877-884.
Potti J (2000) Causes and consequences of age-assortative pairing in pied flycatchers (Ficedula hypoleuca). Etologia 8, 29-36.
Rohlf FJ (2006) A comment on phylogenetic correction. Evolution 60, 1509-1515.
Schroeder J, Burke T, Mannarelli ME, Dawson DA, Nakagawa S (2012) Maternal effects and heritability of annual productivity. Journal of Evolutionary Biology 25, 149-156.
Schroeder J, Nakagawa S, Rees M, Burke T (In revision) Reduced fitness in progeny from old parents in a wild population.
Stone EA (2011) Why the phylogenetic regression appears robust to tree misspecification. Systematic Biology 60, 245-260.
Viechtbauer W (2010) Conducting meta-analyses in R with the metafor package. Journal of Statistical Software 36, 1-48.
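
Illustrative note on Table S6. The text does not give the formulas behind the I² and H² columns of Table S6, so the following is only a minimal sketch of one common formulation (after Higgins & Thompson 2002 and Nakagawa & Santos 2012), which is assumed here rather than taken from the authors' analysis. Each I² is a variance component divided by the sum of all variance components plus a "typical" sampling variance, and H² is the phylogenetic variance divided by the sum of the variance components without the sampling variance. The function names and the example numbers are hypothetical. In a Bayesian analysis the calculation would be applied to each posterior sample and then summarised as a mean and 95% CI.

```python
def typical_sampling_variance(sampling_variances):
    """'Typical' sampling variance across effect sizes.

    Uses inverse-variance weights w = 1/v and needs at least two effect sizes.
    """
    weights = [1.0 / v for v in sampling_variances]
    k = len(weights)
    sum_w = sum(weights)
    sum_w_sq = sum(w * w for w in weights)
    return sum_w * (k - 1) / (sum_w ** 2 - sum_w_sq)


def heterogeneity(components, sampling_variances):
    """Return (I2 per component plus 'total', H2), all in percent.

    components: dict of variance components; must include 'phylogeny'.
    """
    s2_m = typical_sampling_variance(sampling_variances)
    between = sum(components.values())   # all non-sampling variance
    denom = between + s2_m               # total variance
    i2 = {name: 100.0 * var / denom for name, var in components.items()}
    i2["total"] = 100.0 * between / denom
    h2 = 100.0 * components["phylogeny"] / between
    return i2, h2


# Hypothetical variance components and sampling variances (not from the paper).
example_components = {"phylogeny": 0.01, "residual": 0.01}
example_sampling_variances = [0.02, 0.02, 0.02, 0.02]
example_i2, example_h2 = heterogeneity(example_components, example_sampling_variances)
print(example_i2, example_h2)
```

When all sampling variances are equal, the typical sampling variance reduces to that common value, which gives a quick check on the calculation.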