Comparison of effect size estimates

Effect size estimates describe the proportion of variation in the dependent variable that can be explained by variation in the independent variable. Although probability values tend to decline as effect sizes increase, effect size estimates are not tests of statistical significance (the significance tests here are χ² for Kendall’s coefficient of concordance and the F-ratio for ANOVA). It is therefore possible to obtain a test statistic with a significant probability value, indicating that the independent variable has an effect on the dependent variable, together with a low effect size estimate, indicating that the magnitude of this significant effect is small. In the case of Kendall’s coefficient of concordance, ‘W’ gives the proportion of ‘agreement’ in the ranking of responses between trials: W = 0 indicates no agreement, meaning that ranks are completely random, whereas W = 1 indicates complete consistency between trials. Thus, W gives a scaled measure of the effect size of rank consistency between trials. This may be compared directly with equivalent scaled measures of effect size used in other statistical tests, such as the eta squared values associated with ANOVA. Eta squared is given by η² = SS_effect / SS_total, where SS denotes a sum of squares. This measure estimates the proportion of variance in responses that is due to the effect of treatment group (e.g. ‘trial type’ in this study). If η² = 0, none of the variance in response is due to the effect of treatment group, whereas if η² = 1, all of the variance in response is due to the effect of treatment group. In one-way ANOVA, the partial eta squared value normally provided by statistical software (e.g. SPSS), η²p = SS_effect / (SS_effect + SS_error), also scales to 1 and gives the same value as ‘classical’ η², because with a single factor SS_total = SS_effect + SS_error. Thus, the η²p values we report for the one-way tests used for comparison with Kendall’s W (see Table 1) are identical to η².
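As a rough illustration of the two scaled measures described above, the Python sketch below computes Kendall’s W from a matrix of responses with one row per trial and one column per individual (responses are ranked within each trial), together with the usual χ² approximation, χ² = m(n − 1)W on n − 1 degrees of freedom for m trials and n individuals. It also computes η² and η²p for a one-way design. The function names, the data layout and the omission of a tie correction for W are assumptions made for this example; it is not the procedure used to produce Table 1.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendalls_w(scores):
    """Kendall's W for m trials (rows) x n individuals (columns).

    Returns (W, chi2), where chi2 = m * (n - 1) * W with n - 1 df.
    Assumption: no correction for tied ranks is applied.
    """
    m, n = len(scores), len(scores[0])
    ranked = [average_ranks(trial) for trial in scores]
    # Sum of ranks for each individual across all trials.
    rank_sums = [sum(trial[j] for trial in ranked) for j in range(n)]
    expected = m * (n + 1) / 2  # mean rank sum under no agreement
    s = sum((r - expected) ** 2 for r in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    return w, m * (n - 1) * w


def eta_squared_one_way(groups):
    """Return (eta squared, partial eta squared) for a one-way design."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    ss_total = sum((x - grand) ** 2 for x in all_values)
    ss_effect = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_effect / ss_total, ss_effect / (ss_effect + ss_error)
```

With three hypothetical trials in which every individual keeps the same rank order, for example kendalls_w([[3.1, 4.5, 2.0, 5.2, 1.1], [2.9, 4.8, 2.5, 5.0, 1.4], [3.5, 4.1, 1.9, 5.5, 1.0]]), the sketch returns W = 1 and χ² = 12; for any one-way grouping, the two values returned by eta_squared_one_way agree to floating-point precision, because SS_total = SS_effect + SS_error.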
In models with more than one effect (e.g. multifactor or multivariate designs), however, η²p values no longer share a common denominator, so the η²p values for the different effects can together sum to more than 1. It is therefore important that studies relying on comparisons between such tests use η² rather than η²p (see Levine & Hullett 2002 for a discussion of this point). Note also that although we are interested in ‘consistent individual variation in behaviour’, the statistical tests and effect size estimates used here and in previous studies must relate to samples containing multiple individuals. In the case of behavioural plasticity, it is intuitive that variation between situations can only be tested statistically by comparing average responses, generated from enough individuals to provide adequate replication. Similarly, consistency in behavioural differences between individuals is a property of a sample of individuals, where tests of correlation or concordance between situations indicate the stability of individual responses relative to one another.
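To make the point about models with more than one effect concrete, the sketch below computes classical η² and partial η²p for each effect in a balanced two-way ANOVA. The function name two_way_effect_sizes, the nested-list data layout and the toy data are hypothetical, and the sums-of-squares formulas used here hold only for balanced designs (equal replicates per cell).

```python
from statistics import mean


def two_way_effect_sizes(data):
    """Classical and partial eta squared for a balanced two-way ANOVA.

    data[i][j] holds the replicate observations for level i of factor A
    and level j of factor B. Assumes every cell has the same number of
    replicates; these SS formulas do not apply to unbalanced designs.
    """
    a, b, r = len(data), len(data[0]), len(data[0][0])
    cell = [[mean(data[i][j]) for j in range(b)] for i in range(a)]
    all_values = [x for row in data for c in row for x in c]
    grand = mean(all_values)
    # Marginal means (valid as means of cell means in a balanced design).
    a_means = [mean(cell[i]) for i in range(a)]
    b_means = [mean([cell[i][j] for i in range(a)]) for j in range(b)]
    ss = {
        "A": b * r * sum((m - grand) ** 2 for m in a_means),
        "B": a * r * sum((m - grand) ** 2 for m in b_means),
        "AxB": r * sum(
            (cell[i][j] - a_means[i] - b_means[j] + grand) ** 2
            for i in range(a)
            for j in range(b)
        ),
    }
    ss_error = sum(
        (x - cell[i][j]) ** 2
        for i in range(a)
        for j in range(b)
        for x in data[i][j]
    )
    ss_total = sum((x - grand) ** 2 for x in all_values)
    # Classical eta squared: every effect shares the denominator SS_total.
    eta2 = {k: v / ss_total for k, v in ss.items()}
    # Partial eta squared: each effect has its own denominator.
    partial = {k: v / (v + ss_error) for k, v in ss.items()}
    return eta2, partial
```

For the toy data [[[1, 2], [5, 6]], [[5, 6], [9, 10]]], the classical η² values for A, B and their interaction sum to 64/66, below 1, whereas the η²p values for A and B are each 32/34, so the η²p values together exceed 1. This is why η²p values from different effects, or from tests with different numbers of factors, are not directly comparable on the same 0 to 1 scale.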