Group size versus individual group size frequency distributions: a nontrivial distinction

Roger Jovani a,*, Roddy Mavor b,1

a Estación Biológica de Doñana, CSIC
b Seabird Monitoring Programme, JNCC

Keywords: colony size; crowding; group living; group size; individual group size; seabird

Understanding group size variation is a major challenge in animal ecology. However, we argue that understanding group sizes from an individual point of view (i.e. individual group sizes) and the relationship with population group sizes may be even more important. This may seem redundant, but in the present study we show that it is not. We analysed colony sizes of 20 seabird species breeding in Britain and Ireland from the Seabird 2000 project (19 978 colonies; 3 779 919 nests) comparing group (= colony) size frequency distributions (GSFDs) with their individual group size frequency distribution (IndGSFD) counterparts. We did so for the first time for a number of species with semilogarithmic plots, and correlated eight statistics from each GSFD–IndGSFD pair. Shape-related variables (e.g. skewness) of GSFD–IndGSFD pairs were highly unrelated with only 1–15% of redundancy. In fact, species with similar GSFDs had individuals concentrating in either the largest or the medium-sized groups. There was a trend towards those species with higher group size variation having individuals living in a narrower range of group sizes. Some group size-related measures (e.g. mean group size) showed a tight linear correlation in log–log scatterplots between GSFDs and IndGSFDs. However, this correlation disappeared in linear scatterplots for two of the four measures. Moreover, group size-related measures were always a poor surrogate of corresponding individual group size measures. We discuss how animal grouping research could benefit from similar comparisons between GSFDs and IndGSFDs and how this can be carried out in a meaningful way.

Most animals live in groups either temporarily or permanently.
Group size shapes the cost/benefit payoff of group living, with some group sizes often conferring higher fitness than others (Krause & Ruxton 2002). However, empirical and modelling approaches have shown that even when there is a clear peak in the fitness function of group sizes (i.e. there is an ‘optimal’ group size), a huge variation in group sizes still tends to exist. After decades of study, understanding this variation remains an unsolved challenge in animal ecology research (Giraldeau & Caraco 2000; Gerard et al. 2002; Krause & Ruxton 2002; Safran et al. 2007; Sumpter 2010). A major driver of this research agenda has been the description of group size frequency distributions (hereafter GSFDs; e.g. Götmark 1982; Wirtz & Lörscher 1983; Brown et al. 1990; Stacey & Koenig 1990; Avilés & Tufiño 1998; Krause & Ruxton 2002; Jovani & Tella 2007; Serrano & Tella 2007; Jovani et al. 2008a,b). These studies examined group sizes from a population point of view. However, group sizes can be viewed from an individual point of view as well. Describing individual group size selection, the reasons behind these choices and their constraints has proved to be a powerful mechanistic approach to explaining population group size patterns (Brown & Brown 2000; Safran 2004; Safran et al. 2007; Serrano & Tella 2007; Jovani et al. 2008b). However, surprisingly few studies have analysed, per se, individual group size frequency distribution patterns (IndGSFD; but see Jarman 1974; Wirtz & Lörscher 1983; Wesołowski et al. 1985; Reiczigel et al. 2005, 2008).

* Correspondence: R. Jovani, Department of Evolutionary Ecology, Estación Biológica de Doñana, CSIC, Américo Vespuccio s/n, E-41092 Sevilla, Spain. E-mail address: jovani@ebd.csic.es (R. Jovani).
1 R. Mavor is at the Seabird Monitoring Programme, JNCC, Inverdee House, Baxter Street, Aberdeen AB11 9QA, U.K.
An illustrative example of this uneven attention to GSFDs versus IndGSFDs is the book on cooperative breeding in birds edited by Stacey & Koenig (1990), in which 14 of 18 chapters (each covering a study species) show a histogram of the GSFD of the population, but only one chapter (Emlen 1990) shows both the GSFD and the IndGSFD.

[Figure 1: panels (a)–(t). Left Y axis: number of groups (colonies); right Y axis: number of individuals (nests); X axis: group size.]
Figure 1. Semilogarithmic group (colony) size frequency distributions (in black; left Y axis) and corresponding individual frequency distributions (in grey; right Y axis) for 20 seabird species breeding in Britain and Ireland. Logarithmic bins of the form [X^n, X^(n+1) - 1] with n = 0, 1, 2, 3, ... are used; for instance, for X = 2, bins are [1–1], [2–3], [4–7], [8–15]. The X axis shows the logarithmic midpoint of the bin (i.e. 10^((log(minimum group size of the bin) + log(maximum group size of the bin))/2)), and the linear Y axis shows the number of groups (or individuals

This previous lack of attention paid to IndGSFDs could be because the properties (e.g. mean) of GSFDs and their IndGSFD counterparts are biologically redundant, thus presenting only a mathematical subtlety without biological relevance. In fact, some evidence would suggest that this might be the case. First, a given GSFD has a unique IndGSFD counterpart. For instance, in a hypothetical population of 16 individuals distributed among five groups of sizes 2, 2, 3, 4 and 5, specific individuals will be present in (i.e. experience) groups of sizes 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5 and 5 (individual group sizes). Thus, GSFD–IndGSFD pairs are completely interlocked, and thus potentially redundant. Second, although the mean of an IndGSFD is always larger than that of its GSFD counterpart (Preston 1948, 1962; Lloyd 1967), the distinction between the two may be biologically meaningless. For instance, in the above example, mean group size is (2 + 2 + 3 + 4 + 5)/5 = 3.2 and the mean individual group size is (2 + 2 + 2 + 2 + 3 + 3 + 3 + 4 + 4 + 4 + 4 + 5 + 5 + 5 + 5 + 5)/16 = 3.625; surely not a large difference in biological terms. Finally, Lloyd (1967) showed that the mean of an IndGSFD exceeds the mean of its GSFD by the variance/mean ratio of its GSFD, thus showing that one is the trivial predictable outcome of the other [e.g. in the above example mean IndGSFD = 3.2 + (1.36/3.2) = 3.625]. Moreover, Iwao (1968) and recently Reiczigel et al. (2005, 2008) have shown a very tight linear correlation between log(mean GSFD) and log(mean IndGSFD) across different taxa, suggesting that mean GSFDs and mean IndGSFDs hold essentially the same biological information.

However, we show here that GSFD measures should not be used as surrogates of corresponding IndGSFD measures, and that the direct study of IndGSFDs combined with GSFDs can reveal interesting nonredundant information about group living. First, understanding IndGSFDs may be biologically even more important than understanding group size variation. This is because most of the processes shaping the ecology and evolution of species (natural selection/demography) have the individual rather than the group as the unit.
For instance, if breeding success is lowered at large group sizes (negative density dependence), an important measure of the impact of these processes upon population demography will not be the proportion of large/small group sizes in the population, but rather the proportion of individuals breeding within such group sizes. Second, contrary to the evidence stated above, IndGSFDs may not yield redundant information about their GSFD counterparts. This is because natural GSFD patterns do not all follow the same ideal theoretical distribution, but are considerably more complex. For instance, in a previous study we showed that similarly shaped GSFDs from 20 seabird species when plotted in standard histograms hide contrasting patterns that are unravelled when the same data are plotted with logarithmic bins (Jovani et al. 2008a). Thus, we predicted that GSFDs with different combinations of skewness, variability or maximum group sizes could have nontrivial impacts on their IndGSFDs. We reanalysed this seabird data set by comparing GSFDs and their IndGSFD counterparts.

METHODS

We built on a previous study by Jovani et al. (2008a) in which we analysed the colony sizes (here also called group sizes) of seabird species breeding in Britain and Ireland. This is a data set from Seabird 2000, a collaboration between the Joint Nature Conservation Committee, U.K., and the Royal Society for the Protection of Birds, U.K. The project involved over 1000 surveyors following detailed instructions for the census of each seabird species. No less important was the meticulous checking during the process of data entry, both by routine quality control by the Recorder 2000 software, and later by data entry personnel. The result is the highest-quality data on a snapshot (mainly 1998–2002) of bird colony sizes for a large area, and possibly the largest data set on animal group sizes considering different species in a large area.
Overall, it covers 20 seabird species, 19 978 colonies and 3 779 919 nests. For further details of Seabird 2000 see Mitchell et al. (2004), and of the data set analysed here see Jovani et al. (2008a).

Plotting Frequency Distributions

The data set of individual group sizes for each species was created from their GSFDs as explained in the hypothetical example in the Introduction, that is, with one value (the colony size in which the breeding pair was nesting) for each breeding pair of the species. Our unit of measure is typical in bird coloniality studies, that is, the breeding pair (the nest), and thus we used the nest as the ‘individual’ of IndGSFDs to lend comparability to other studies on group living. IndGSFDs were plotted in semilogarithmic plots (Fig. 1) following the same procedures as for GSFDs detailed in Jovani et al. (2008a), where we used Preston’s (1962) methods with slight modifications (see Fig. 1 and Pueyo & Jovani 2006 for details).

GSFD and IndGSFD Statistics

Seabird GSFDs are clearly not Gaussian distributions, but show distributions closer to log-normal and power laws (Jovani et al. 2008a). Thus, parametric measures such as the mean (and even lognormal measures such as the geometric mean) are not the most appropriate. The only parametric measure used was the mean to compare our results with those of Iwao (1968) and Reiczigel et al. (2005, 2008). Overall, we calculated eight statistics from each GSFD and IndGSFD. Our aim was to achieve a general description of the characteristics of GSFDs and IndGSFDs to be able to compare these two ways of looking at group size frequency distributions. For the size of the groups (or of individual group sizes) of each species we calculated the 5th percentile, the median, the mean and the 95th percentile. Minimum and maximum group sizes and individual group sizes were not measured because they are, by definition, the same for GSFD–IndGSFD pairs.
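As a minimal sketch of the construction and size-related statistics described above (this is not the authors' MatLab code, which they provide in the Supplementary Material), the following Python snippet expands a list of group sizes into one value per individual and computes the four size statistics for both distributions. The function names and the use of NumPy's default percentile interpolation are assumptions made for illustration; the data are the hypothetical population of the Introduction.

```python
import numpy as np

def individual_group_sizes(group_sizes):
    """Expand group sizes into one value per individual (the IndGSFD data)."""
    sizes = np.asarray(group_sizes, dtype=int)
    # A group of size s contributes s individuals, each experiencing size s.
    return np.repeat(sizes, sizes)

def size_statistics(values):
    """5th percentile, median, mean and 95th percentile of a set of sizes."""
    values = np.asarray(values, dtype=float)
    return {
        "p5": float(np.percentile(values, 5)),
        "median": float(np.median(values)),
        "mean": float(values.mean()),
        "p95": float(np.percentile(values, 95)),
    }

# Hypothetical population from the Introduction: five groups, 16 individuals.
groups = [2, 2, 3, 4, 5]
individuals = individual_group_sizes(groups)
gsfd_stats = size_statistics(groups)
ind_stats = size_statistics(individuals)

# Lloyd's (1967) relation, using the population variance (ddof=0).
lloyd_mean = float(np.mean(groups) + np.var(groups) / np.mean(groups))

print(individuals.tolist())
print(gsfd_stats["mean"], ind_stats["mean"], lloyd_mean)
```

For this population the sketch reproduces the values given in the Introduction: a mean group size of 3.2 and a mean individual group size of 3.625, the latter also recovered from the variance/mean relation.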
To characterize the shape of the distributions we calculated the skewness (a measure of asymmetry), the fit of GSFDs and IndGSFDs to a log-normal distribution as measured by the Kolmogorov–Smirnov statistic, kurtosis (a measure of ‘peakedness’ around the mean), and population variability, a nonparametric counterpart of the coefficient of variation (CV) which quantifies the mean deviation of all group size pairs within populations (see Heath 2006 for details). Statistics were calculated with standard MatLab (MathWorks, Natick, MA, U.S.A.) functions applied to nontransformed group (and individual group) sizes. The fit to a log-normal distribution was calculated with log-transformed group sizes and individual group sizes. Population variability was calculated by modifying the code in version 1.1 of the variability calculator for MatLab by Heath (2006). All measures along with the data necessary to plot the frequency distributions were retrieved from the MatLab algorithm freely available from the Supplementary Material.

We used the Pearson product–moment correlation coefficient to calculate the linear correlation of each statistic between each GSFD–IndGSFD pair across the 20 analysed species. Although it is impossible to determine accurately whether our data (20 values for each statistic) follow a normal distribution, we used Pearson instead of rank correlation coefficients (e.g. Spearman correlation) because the latter do not test how tightly the data fit a linear relationship (which is what we wanted to test) but rather the level of correlation in the increase in x relative to y. From the scattering of data in Figs 2 and 3, we thought Pearson correlations were better suited for Fig. 3, and thus we interpreted Pearson correlations from Fig. 2 with caution. Pearson correlation r = 1 (or -1) would indicate that all data fall along the linear trend line fitted to the data, and values closer to 0 would indicate a complete scatter of values around the fitted trend line. The coefficient of determination, R² (r squared), was calculated as a measure of the variance in each IndGSFD statistic (e.g. median individual group size) explained as a linear function of the corresponding GSFD counterpart (e.g. median group size of the population), that is, a measure of the redundancy, with r = 1 meaning that GSFD–IndGSFD pairs provide essentially the same information about the grouping patterns of the species.

for IndGSFDs) for each bin. Note that all black bars must have their corresponding grey bar below, but because of the highly right-skewed distributions there are some grey bars that are too narrow to be visualized, e.g. in (g). Note the log scale only in the X axis. (a) Uria aalge; (b) Rissa tridactyla; (c) Fulmarus glacialis; (d) Alca torda; (e) Sterna paradisaea; (f) Hydrobates pelagicus; (g) Fratercula arctica; (h) Phalacrocorax aristotelis; (i) Chroicocephalus (= Larus) ridibundus; (j) Phalacrocorax carbo; (k) Larus argentatus; (l) Cepphus grylle; (m) Larus canus; (n) Sterna albifrons; (o) Sterna hirundo; (p) Larus fuscus; (q) Puffinus puffinus; (r) Larus marinus; (s) Stercorarius skua; (t) Stercorarius parasiticus.

RESULTS

GSFDs versus IndGSFDs Histograms

Figure 1 (black bars) shows the same GSFDs as those previously reported in Jovani et al. (2008a). These are semilogarithmic plots in which a distribution with a Gaussian shape thus corresponds to a log-normal distribution. Seabirds in Britain and Ireland show contrasting GSFDs, from clear log-normal distributions (e.g. Fig. 1a, b; Jovani et al. 2008a) to very skewed log-normal distributions (Fig. 1r–t; following power laws as detailed in Jovani et al. 2008a). However, species show different combinations of kurtosis and skewness (Figs 1, 2) so that many of the distributions depart from neat log-normals (e.g. Fig. 1g, k). This is important because if all species followed the same distribution, each IndGSFD would be easy to predict from its GSFD (e.g. compare Fig. 1a and b). However, Fig. 1 shows that this is not so trivial for real animal grouping patterns. For instance, GSFDs in Fig. 1d and g are similar, but their IndGSFDs (grey bars in Fig. 1) are very different, while Fig. 1g and p show contrasting GSFDs but similar IndGSFDs. Note that these patterns remain hidden when we plot the same data in linear (standard) histograms (Appendix Fig. A1). This apparent lack of a general rule linking GSFD–IndGSFD pairs leads us to ask whether GSFD–IndGSFD pairs provide redundant or complementary information.

GSFD versus IndGSFD Statistics

[Figure 2: panels (a)–(d), one labelled point per species. X axis: group size frequency distribution; Y axis: individual group size frequency distribution.]
Figure 2. Correlation of shape-related statistics describing group size frequency distributions and their corresponding individual group size frequency distributions for the 20 seabird species studied. (a) Skewness, (b) Kolmogorov–Smirnov, (c) kurtosis and (d) population variability. Species codes are the same as in Fig. 1.

[Figure 3: panels (a)–(d), one labelled point per species, logarithmic axes. X axis: group size frequency distribution; Y axis: individual group size frequency distribution.]
Figure 3. Correlation of log(size-related statistics) describing group size frequency distributions and their corresponding individual group size frequency distributions. See Table 1 for correlation statistics. Species codes are the same as in Fig. 1. (a) 5th percentile, (b) median, (c) mean and (d) 95th percentile. Grey vertical lines show the potential range of log(individual group size) values for a species according to its group size frequency distribution (see Discussion for more details). For instance, since the mean group size of species t was 3.3 and its maximum group size was 107, this species could only have a mean individual group size between 3.3 and 107, and shows an intermediate empirical value of 22. However, species q could have values between 6269.3 and 120 000 and shows a value close to its potential maximum (82 431.7).

Kurtosis and population variability showed a negative correlation between IndGSFDs and GSFDs (ca. -0.4), explaining ca. 15% of variance (Fig. 2, Table 1). This did not reach statistical significance, possibly because of low sample size, but also note the potential effect of outliers. In any case, the scattering of data around the trend line was considerable. In general, the shape of the IndGSFDs was not redundant with their GSFDs: only 1–15% of the variance in IndGSFD characteristics was explained by corresponding GSFD characteristics (Fig. 2, Table 1). As expected, all IndGSFDs were more left skewed (with lower skewness values) than their GSFD counterparts (Fig. 2). This is because the mean IndGSFD is constrained to being larger than the mean GSFD (see above), and thus the left tail of IndGSFDs extends further than in the corresponding GSFDs (Fig. 1). However, skewness of IndGSFDs and GSFDs was uncorrelated across species (Table 1, Fig. 2). The other three shape-related characteristics showed indistinctly higher or lower values in GSFDs than IndGSFDs (Fig. 2). The fit to a log-normal distribution was uncorrelated between GSFDs and IndGSFDs. In other words, any combination was possible.
This is easy to visualize comparing Fig. 1a, g and r. In Fig. 1a, a neat log-normal GSFD leads to a slightly left-skewed log-normal IndGSFD, but in Fig. 1g a slight departure from a log-normal shape of the GSFD produces a highly skewed IndGSFD. The opposite occurs in Fig. 1r.

Group size-related measures were also analysed for their correlation between GSFDs and corresponding IndGSFDs. This was done for raw variables and also for their logarithms (to compare with Reiczigel et al. 2008), because these are two approaches that give complementary information. Mean and 95th percentile group sizes were highly correlated in both linear and log–log plots (Fig. 3, Table 2, Appendix Fig. A2), with 75–95% of redundancy (Table 1). However, despite this strong linear correlation, mean and 95th percentile group sizes were very different between GSFDs and corresponding IndGSFDs (Table 2). Median and 5th percentile group sizes were significantly correlated in log–log plots but not when raw data were analysed (compare Fig. 3 and Fig. A2, Table 1), and raw data were highly different between GSFD and IndGSFDs (Table 2).

Table 1
Pearson correlation coefficients (r), coefficient of determination (i.e. variance explained by the linear model, R²), and P values for each graph in Figs 2, 3 and A2

                            Raw data                     Log(data)
                            r        R²      P           r       R²      P
Shape-related statistics
  Skewness                  0.112    0.013   0.639
  Kolmogorov–Smirnov        0.219    0.048   0.354
  Kurtosis                 -0.388    0.151   0.091
  Population variability   -0.392    0.154   0.087
Size-related statistics
  5th percentile            0.201    0.041   0.395       0.610   0.372   <0.001
  Median                    0.179    0.032   0.449       0.734   0.539   <0.001
  Mean                      0.973    0.950   <0.001      0.903   0.815   <0.001
  95th percentile           0.865    0.749   <0.001      0.838   0.702   <0.001

DISCUSSION

Our results show the first comparison of GSFD versus IndGSFD in semilogarithmic histograms.
They have revealed a nontrivial relationship between the group sizes of a population and the group sizes in which individuals live, something difficult to appreciate in standard histograms (compare Fig. 1 and Fig. A1). This challenges previous evidence suggesting the redundancy of this double approach to animal group sizes (see Introduction).

Table 2
Group size-related measures for group size frequency distributions (GSFDs) and corresponding individual group size frequency distributions (IndGSFDs)

5th percentile           Median                   Mean                     95th percentile
SC  GSFD  IndGSFD        SC  GSFD  IndGSFD        SC  GSFD  IndGSFD        SC  GSFD    IndGSFD
d   1     1              t   1     8              t   3     22             t   12      98
l   1     2              r   2     41             r   9     142            s   30      2293
g   1     2              s   2     195            s   13    994            r   37      983
k   1     4              o   6     157            l   15    50             l   50      208
m   1     4              m   6     300            n   18    67             m   80      11 219
p   1     6              p   6     3309           h   23    217            h   84      1720
r   1     7              l   7     31             m   32    3767           n   85      220
i   1     9              n   8     50             o   33    310            o   122     1033
h   1     14             h   8     68             j   51    186            e   166     4000
j   1     14             k   10    295            e   52    740            k   190     10 129
t   1     17             e   14    200            k   52    1565           j   200     558
s   1     35             i   16    2500           p   117   8005           p   234     19 487
n   1     46             j   25    125            d   166   2728           d   613     11 384
o   1     65             d   27    961            c   181   3055           c   710     12 276
e   1     579            g   29    40 000         i   210   4424           i   800     14 575
c   2     50             c   31    950            b   613   3894           b   2759    11 077
f   2     309            f   59    6800           f   845   12 191         g   3104    59 471
q   2     3286           q   61    101 800        g   1226  33 250         f   4866    27 297
a   5     517            b   154   2361           a   1534  14 856         a   7344    75 493
b   6     165            a   205   8679           q   6269  82 432         q   41 697  120 000

SC: species codes are the same as in Fig. 1.

Mean Group Size

We confirm the unavoidable mathematical fact that individuals live in larger groups than the average group size in their population, and the strong linear correlation between log(mean group sizes) and log(individual mean group sizes) previously reported by Iwao (1968) and Reiczigel et al. (2005, 2008). Reiczigel et al. (2008, page 719) argued that ‘Since mean group size tends to predict mean crowding (Fig. 3), this approach may also be useful as a rough approximation’. However, our results contradict this interpretation for the following reasons.
First, note that the apparent good fit shown in Fig. 3 (Table 1) and the similar Figure 3 in Reiczigel et al. (2008) is not so surprising when considering the potential individual group size values that a species can have with a given GSFD. This is what we have attempted to illustrate in Fig. 3 with the vertical grey lines. Given that a statistic (e.g. mean) of individual group sizes is always larger than its group size counterpart (Fig. 3; Preston 1962; Lloyd 1967), and that individual group sizes can never be larger than the maximum group size of the population (i.e. individuals cannot live in larger groups than the largest group of the population), these grey lines show the range of individual group size values that a given population can have. This shows that a tight linear fit to log(data) is simply a mathematical constraint imposed by the interlocked nature of GSFD–IndGSFD pairs: any random distribution of dots within the grey lines would create a tight linear correlation.

Second, even within the narrow range of values that a species could exhibit in Fig. 3, species differ considerably in their relative position within their corresponding grey line. This is very biologically relevant because of the logarithmic scale (compare Fig. 3 and Fig. A2), even hiding paradoxical situations (also acknowledged by Reiczigel et al. 2005, 2008): species with clearly larger group sizes can have individuals living in clearly smaller groups. For instance, Rissa tridactyla has a median group size of 154 nests, clearly larger than the 29 nests for Fratercula arctica (species b and g in Table 2, respectively). However, the median individual of species b lives in groups of 2361 nests and that of species g in groups of 40 000 nests. This could imply a huge difference in the ecology of the population (e.g. for the strength of negative density-dependent processes) and in the evolution of the species (e.g. behavioural adaptations to living in a given social scenario).
Third, even for variables showing a tight linear correlation between GSFDs and IndGSFDs on log–log (Fig. 3) and linear axes (Fig. A2), GSFD measures (e.g. mean group size) were a poor approximation of corresponding IndGSFD measures; to be at least a rough approximation they would need to be close to the x = y line in Fig. A2 (see also raw data in Table 2). Note that if one is interested in the mean group size experienced by individuals in a given species, the mean group size is a poor predictor (Table 2). For instance, suppose Chroicocephalus ridibundus (species i) suffers a strong negative density dependence on breeding success when nesting in colonies larger than 2500 pairs. In that case, the median colony size would be highly misleading (i.e. 16 nests) in the evaluation of the demographic consequences of this density dependence, because it would suggest a negligible effect. However, in fact, 50% of the population breeds in colonies larger than 2500 nests (i.e. median individual colony size = 2500; Table 2), thus having a probable effect on individual fitness and population demography.

Overall, this shows that while it is true that in a comparative (interspecific) study on seabirds, one can infer the log(mean individual group size) from the log(mean group size) of the species, it is also true that for a given species, one can only predict that mean individual group size will be larger than mean group size and lower than the maximum group size in the population, thus losing relevant information on the group sizes experienced by individuals. In any case, GSFD measures are a poor approximation of corresponding IndGSFD measures (Table 2).

Other Group Size Statistics

We have analysed not only the mean but also several other statistics of GSFD–IndGSFD pairs and we have found interesting new information potentially linking individual behaviour and population patterns.
For instance, in half of the studied species, individuals live in a definite range of intermediate to large group sizes (e.g. Fig. 1a, h, l, r), avoiding the lower half of their GSFDs. In almost another half of the species, individuals cluster in the largest group sizes (e.g. Fig. 1f, g, p, q). In others, there does not seem to be a clear preference (e.g. Fig. 1t). In fact, Fig. 2 shows either a lack of correlation or a negative correlation between group size variation and individual group size variation. This challenges our view about the link between individual behaviour and group size population patterns and poses a paradox: species with larger group size variation have a larger proportion of individuals concentrated in particular group sizes.

The Relevance of Logarithmic Binning

Animal GSFDs often do not follow normal distributions but show highly right-skewed frequency histograms, with many small groups and very few large ones (Götmark 1982; Wirtz & Lörscher 1983; Brown et al. 1990; Stacey & Koenig 1990; Avilés & Tufiño 1998; Krause & Ruxton 2002; Jovani & Tella 2007; Serrano & Tella 2007; Jovani et al. 2008a,b). However, this apparent uniformity among species in their GSFDs is not real, but the result of a weak plotting history in animal grouping research. For populations/species with small ranges of group sizes (ca. 1–50), using linear bins (e.g. [1,5], [6,10], [11,15], ...) clearly highlights the underlying distribution even for highly skewed distributions (see several good examples in Stacey & Koenig 1990). Often, however, group sizes range from a few individuals to several hundred or even hundreds of thousands. This makes linear bins a poor choice for detecting differences in GSFD properties across time, space or taxa, because very large groups inevitably confine most of the groups in the smallest one/few bins. Here, we have used logarithmic bins (see Methods).
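The logarithmic bins of the form [X^n, X^(n+1) - 1] given in the caption of Fig. 1 can be sketched as follows. This is an illustrative Python version rather than the authors' implementation: the function names, the default base X = 2 and the toy colony sizes are assumptions. For each bin it counts groups (the GSFD) and the individuals living in those groups (the IndGSFD), together with the logarithmic midpoint used on the X axis.

```python
import math

def log_bin_index(size, base=2):
    """Index n of the bin [base**n, base**(n+1) - 1] containing `size`."""
    if size < 1:
        raise ValueError("group sizes must be >= 1")
    n = 0
    # Integer arithmetic avoids rounding problems from floating-point logs.
    while base ** (n + 1) <= size:
        n += 1
    return n

def log_binned_counts(group_sizes, base=2):
    """Return {n: (lower, upper, midpoint, n_groups, n_individuals)}."""
    bins = {}
    for size in group_sizes:
        n = log_bin_index(size, base)
        lower, upper = base ** n, base ** (n + 1) - 1
        # Logarithmic midpoint: 10**((log(lower) + log(upper)) / 2)
        midpoint = 10 ** ((math.log10(lower) + math.log10(upper)) / 2)
        _, _, _, groups, individuals = bins.get(n, (lower, upper, midpoint, 0, 0))
        bins[n] = (lower, upper, midpoint, groups + 1, individuals + size)
    return dict(sorted(bins.items()))

# Hypothetical colony sizes (nests per colony), for illustration only.
colonies = [1, 1, 2, 3, 3, 5, 7, 12, 900]
binned = log_binned_counts(colonies)
for n, (lower, upper, mid, groups, individuals) in binned.items():
    print(f"[{lower}-{upper}] mid={mid:.1f} groups={groups} individuals={individuals}")
```

In this toy example the single 900-nest colony is one group out of nine but holds most of the nests, which is the kind of contrast between black and grey bars that Fig. 1 displays.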
This approach has been key to unravelling the surprising nontrivial relationship between GSFDs and IndGSFDs. This is easy to appreciate in Fig. A1, where we have plotted in standard histograms the same data as in Fig. 1, and where the difficulty of visualizing the contrasting patterns within and between GSFDs and their IndGSFDs found in Fig. 1 is apparent. Also, semilogarithmic plots are a direct way of assessing how individuals are distributed across group sizes. This is a powerful way of identifying either possible preferences of individuals for particular group sizes (something difficult to appreciate from GSFD alone) or the relevance that particular processes (e.g. high negative density dependence in survival in large group sizes) could have upon a population.

Individual Group Sizes versus Crowding

We have not used the term ‘crowding’ recently coined by Reiczigel et al. (2005). ‘Crowding’ has been an interesting contribution that opens the analysis to any statistic of IndGSFDs instead of only focusing on the mean group size experienced by individuals (i.e. the ‘typical group size’ of Jarman 1974). However, we prefer ‘individual group size’ because it is the logical individual counterpart of ‘group size’ without any connotation about the consequences of group size. ‘Crowding’ suggests that larger group sizes imply greater density. This is true when space is finite as occurs when parasite intensity increases in a host, or similarly sized hosts show different parasite intensities (Poulin 2007). However, this need not be the case in other situations. For instance, nest spacing in seabird colonies is often constant despite colony sizes ranging from tens to thousands of nests (Nelson 1980, page 125).
Conservation Implications It was not necessary to plot IndGSFD for seabirds in Britain and Ireland to know that there are some species such as Puffinus puffinus in which a few colonies harbour a large proportion of the total population, and, thus, are colonies of special conservation concern (Mitchell et al. 2004). However, we believe that plotting IndGSFDs of all species together as in Fig. 1 gives a more informed point of view on the degree in which this occurs in the different species. This is especially important because by only knowing the sizes of colonies (black lines in Fig. 1) it is difficult, without plotting them, to predict how concentrated, in a few large colonies, the population is (e.g. compare black and grey bars in Fig. 1d versus g or in r versus s). For instance, knowing the maximum colony sizes of a species is not enough to know the proportion of the total breeding population that will be lost if, for instance, the largest five colonies are destroyed (and birds do not move to other colonies; Fig. 4). Obviously, species with the largest colonies are more sensitive to losing one of their five largest colonies. However, the correlation (r ¼ 0.498, P ¼ 0.026) only explained R 2 ¼ 0.248 of variation and 20e80% of a population can be contained in the five largest colonies of a large-colony species (e.g. a, g, q), clearly a significant range for the purposes of conservation. 1 q Proportion of nests in largest group Group size variation could come from two sources: from individual behaviour (e.g. owing to a larger underlying genetic predisposition for contrasting group sizes, Brown & Brown 2000; Serrano & Tella 2007), or because of formationedestruction dynamics (e.g. all large colonies started with a few nests). What these analyses tell us is that in species with larger colony size variation, individuals live in more specific colony sizes. 
The paradox is potentially solved by colony size dynamics: intraspecifically, variability in individual behaviour (e.g. owing to underlying genetics) could promote colony size variation. However, because of the imperative colony size dynamics, when a species shows a preference for breeding in large colonies, all smaller colonies also exist in the population (i.e. very small colony sizes are pervasive even in bird species with huge colonies; Brown et al. 1990; see Table 1 in Jovani et al. 2008a), thus leading to higher colony size variation even when most individuals prefer to live in some particularly large colonies.

The 5th percentile showed the weakest correlation for group size-related variables between GSFDs and IndGSFDs (Fig. 3, Table 1, Fig. A2). In fact, all species had a minimum group size of fewer than 10 nests, but even species with a minimum group size of one nest showed contrasting 5th percentile individual group sizes, from one to 579 nests. Since minimum group sizes of species often show very low values, often close to one nest, that is, solitary breeders (e.g. Brown et al. 1990; Krause & Ruxton 2002; this study), minimum colony sizes could scarcely be seen as a species-specific trait. However, our analyses show that seabirds in Britain and Ireland differ substantially in the smaller (5th percentile) group sizes in which individuals live, thus presenting the possibility that this could be a species-specific trait. Testing this would require a study comparing populations of the same species in different parts of the world.

Figure 4. Correlation between the maximum group (colony) size of each species and the proportion of all the breeding pairs of the species found in their largest five colonies. [Plot not reproduced: x axis, maximum group size; points are labelled with species codes a to t.]

Modelling Implications

Theoretical approaches are aimed at understanding group size variation, but not individual group size variation.
An important (and apparently trivial and obvious) starting point of animal grouping models is that, ideally, mean group size in a population should be the group size conferring the highest fitness to the individuals (reviewed in Clark & Mangel 1986). Subsequent modelling approaches, however, have questioned the validity of this assumption, showing, for instance, that the difference between optimal and realized mean group sizes depends on whether group members have control over the entrance of newcomers to the group (Giraldeau & Caraco 2000). However, the initial assumption (i.e. that mean group sizes should be close to optimal group sizes) has not been questioned. This could be misleading because in genetically unrelated animals (e.g. a huge seabird colony) what should be expected is not that groups should be of an optimal size, but that most of the individuals of the population should live in such optimal group sizes, that is, show an adaptive behaviour. If group size variation is low, mean population group size and mean individual group size may be essentially the same (see hypothetical example in the Introduction; Lloyd 1967), and thus approaches modelling these kinds of GSFDs remain essentially equally valid. However, our results show that mean individual group sizes could be many times larger than population mean group sizes (e.g. Fig. 1f, g, i, p, q; Table 2). Thus, the empirical finding that group size is often larger than the optimal group size (Giraldeau & Caraco 2000; Krause & Ruxton 2002; Sumpter 2010) is even more intriguing when examined from the individual point of view. More generally, since GSFDs and IndGSFDs have been shown to yield different information, models could be tested (and their design aided, Grimm & Railsback 2005) by how well they reproduce not only mean group sizes of GSFDs but also several of their properties (e.g. skewness), as well as those of their IndGSFDs.
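The statement above that the two means nearly coincide when group size variation is low, and can diverge many-fold otherwise, follows from a standard identity. The notation is ours and is introduced only for illustration: a population of G groups with sizes n_1, ..., n_G, mean group size \bar{n}, population variance \sigma^2, and mean individual group size \bar{n}_{\mathrm{ind}} (each group of size n_i contributes n_i individuals, each experiencing size n_i).

```latex
\bar{n} = \frac{1}{G}\sum_{i=1}^{G} n_i ,
\qquad
\sigma^2 = \frac{1}{G}\sum_{i=1}^{G} \left(n_i - \bar{n}\right)^2 ,
\qquad
\bar{n}_{\mathrm{ind}}
  = \frac{\sum_{i=1}^{G} n_i \cdot n_i}{\sum_{i=1}^{G} n_i}
  = \frac{\sigma^2 + \bar{n}^2}{\bar{n}}
  = \bar{n} + \frac{\sigma^2}{\bar{n}}
```

When \sigma^2 is small relative to \bar{n}, the second term is negligible and the two means are essentially equal; when group sizes are highly variable, the mean individual group size can exceed the mean group size many times over.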
These new approaches will surely benefit from current advances in the statistical treatment of IndGSFDs (Reiczigel et al. 2008; Neuhäuser 2009; Neuhäuser et al. 2010). Finally, fitting theoretical models (e.g. power laws or truncated power laws) to empirical data has been shown to unravel interesting factors shaping population grouping patterns (e.g. Bonabeau et al. 1999; Sjöberg et al. 2000; Lusseau et al. 2004; Jovani et al. 2008b). Our study clearly shows that contrasting results can be found if individual group sizes are studied instead of population group sizes. Therefore, where the aim of the study demands it, it would be interesting to take this dual approach to group sizes, either to complement group size analyses or to gain a new perspective on the causes and consequences of group living.

References

Avilés, L. & Tufiño, P. 1998. Colony size and individual fitness in the social spider Anelosimus eximius. American Naturalist, 152, 403–418.
Bonabeau, E., Dagorn, L. & Fréon, P. 1999. Scaling in animal group-size distributions. Proceedings of the National Academy of Sciences, U.S.A., 96, 4472–4477.
Brown, C. R. & Brown, M. B. 2000. Heritable basis for choice of group size in a colonial bird. Proceedings of the National Academy of Sciences, U.S.A., 97, 14825–14830.
Brown, C. R., Stutchbury, B. J. & Walsh, P. D. 1990. Choice of colony size in birds. Trends in Ecology & Evolution, 5, 398–403.
Clark, C. W. & Mangel, M. 1986. The evolutionary advantages of group foraging. Theoretical Population Biology, 30, 45–75.
Emlen, S. T. 1990. White-fronted bee-eaters: helping in a colonially nesting species. In: Cooperative Breeding in Birds. Long-Term Studies of Ecology and Behavior (Ed. by P. B. Stacey & W. D. Koenig), pp. 487–526. Cambridge: Cambridge University Press.
Gerard, J.-F., Bideau, E., Maublanc, M.-L., Loisel, P. & Marchal, C. 2002. Herd size in large herbivores: encoded in the individual or emergent? Biological Bulletin, 202, 275–282.
Giraldeau, L.-A. & Caraco, T. 2000. Social Foraging Theory. Princeton, New Jersey: Princeton University Press.
Götmark, F. 1982. Coloniality in five Larus gulls: a comparative study. Ornis Scandinavica, 13, 211–224.
Grimm, V. & Railsback, S. F. 2005. Individual-Based Modeling and Ecology. Princeton, New Jersey: Princeton University Press.
Heath, J. P. 2006. Quantifying temporal variability in population abundances. Oikos, 115, 573–581.
Iwao, S. 1968. A new regression method for analyzing the aggregation pattern of animal populations. Researches on Population Ecology, 10, 1–20.
Jarman, P. J. 1974. The social organization of antelope in relation to their ecology. Behaviour, 48, 215–268.
Jovani, R. & Tella, J. L. 2007. Fractal bird nest distribution produces scale-free colony sizes. Proceedings of the Royal Society B, 274, 2465–2469.
Jovani, R., Mavor, R. & Oro, D. 2008a. Hidden patterns of colony size variation in seabirds: a logarithmic point of view. Oikos, 117, 1774–1781.
Jovani, R., Serrano, D., Ursúa, E. & Tella, J. L. 2008b. Truncated power laws reveal a link between low-level behavioral processes and grouping patterns in a colonial bird. PLoS ONE, 3, e1992, doi:10.1371/journal.pone.0001992.
Krause, J. & Ruxton, G. D. 2002. Living in Groups. Oxford: Oxford University Press.
Lloyd, M. 1967. Mean crowding. Journal of Animal Ecology, 36, 1–30.
Lusseau, D., Williams, R., Wilson, B., Grellier, K., Barton, T. R., Hammond, P. S. & Thompson, P. M. 2004. Parallel influence of climate on the behaviour of Pacific killer whales and Atlantic bottlenose dolphins. Ecology Letters, 7, 1068–1076.
Mitchell, P. I., Newton, S. F., Ratcliffe, N. & Dunn, T. E. 2004. Seabird Populations of Britain and Ireland. London: T. & A. D. Poyser.
Nelson, B. 1980. Seabirds. Their Biology and Ecology. Toronto: Hamlyn.
Neuhäuser, M. 2009. The importance of the biological system underlying the data when choosing a statistical test: why penguins need to be treated differently to parasites. Animal Behaviour, 77, e1–e3.
Neuhäuser, M., Kotzmann, J., Walier, M. & Poulin, R. 2010. The comparison of mean crowding between two groups. Journal of Parasitology, 96, 477–481.
Poulin, R. 2007. Evolutionary Ecology of Parasites. 2nd edn. Princeton, New Jersey: Princeton University Press.
Preston, F. W. 1948. The commonness, and rarity, of species. Ecology, 29, 254–283.
Preston, F. W. 1962. The canonical distribution of commonness and rarity, Part I. Ecology, 43, 185–215.
Pueyo, S. & Jovani, R. 2006. Comment on ‘A Keystone Mutualism Drives Pattern in a Power Function’. Science, 313, 1739c.
Reiczigel, J., Lang, Z., Rózsa, L. & Tóthmérész, B. 2005. Properties of crowding indices and statistical tools to analyze parasite crowding data. Journal of Parasitology, 91, 245–252.
Reiczigel, J., Lang, Z., Rózsa, L. & Tóthmérész, B. 2008. Measures of sociality: two different views of group size. Animal Behaviour, 75, 715–721.
Safran, R. J. 2004. Adaptive site selection rules and variation in group size of barn swallows: individual decisions predict population patterns. American Naturalist, 164, 121–131.
Safran, R. J., Doerr, V. A. J., Sherman, P. W., Doerr, E. D., Flaxman, S. M. & Winkler, D. W. 2007. Group breeding in vertebrates: linking individual and population-level approaches. Evolutionary Ecology Research, 9, 1163–1185.
Serrano, D. & Tella, J. L. 2007. The role of despotism and heritability in determining settlement patterns in the colonial lesser kestrel. American Naturalist, 169, E53–E67.
Sjöberg, M., Albrectsen, B. & Hjältén, J. 2000. Truncated power laws: a tool for understanding aggregation patterns in animals? Ecology Letters, 3, 90–94.
Stacey, P. B. & Koenig, W. D. 1990. Cooperative Breeding in Birds. Long-term Studies of Ecology and Behavior. Cambridge: Cambridge University Press.
Sumpter, D. J. T. 2010. Collective Animal Behavior. Princeton, New Jersey: Princeton University Press.
Wesołowski, T., Głażewska, E., Głażewski, L., Hejnowicz, E., Nawrocka, B., Nawrocki, P. & Okońska, K. 1985. Size, habitat distribution and site turnover of gull and tern colonies on the middle Vistula. Acta Ornithologica, 21, 46–67.
Wirtz, P. & Lörscher, J. 1983. Group sizes of antelopes in an east African national park. Behaviour, 84, 135–156.

Acknowledgments

This and previous work on the Seabird 2000 data set would not have been possible without the collaboration between the Joint Nature Conservation Committee and the Royal Society for the Protection of Birds, and the over 1000 volunteers that have gathered the data. We also thank José L. Tella, Daniel Oro, José A. Donázar, Olga Ceballos, David Serrano, Jaime Potti and Ainara Cortés-Avizanda for discussion, and David Lusseau, Steve Oswald and an anonymous referee for interesting contributions. R.J. is supported by a Ramón y Cajal research contract (RYC-2009-03967) from the Ministerio de Ciencia e Innovación.

Supplementary Material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.anbehav.2011.07.037.

Appendix

[Figure A1 panels (a) to (t) not reproduced. Left Y axis: Number of groups (colonies); right Y axis: Number of individuals (nests); X axis: Group size, from 0 to Max.]

Figure A1.
Group (colony) size frequency distribution (in black; left Y axis) and corresponding individual group size frequency distribution linear histograms (in grey; right Y axis) for 20 seabird species breeding in Britain and Ireland. Grey bars are always shown in their full length, but the first group size bin (the left-hand black bar of each graph) has been cut and its real value depicted by the left-hand value inside each graph. Each graph shows from left to right, 20 linear bins ranging from the smallest to the largest group size in the species. Bins are of different length between graphs, but constant within graphs. Bins are calculated as (maximum group size/20). For instance, in (e) the maximum group size (the number above the largest bin) is 4000. Thus, in (e), bin length is 4000/20 = 200; thus, the first bin is (0–200], the second bin (200–400] and the last bin (3800–4000] (‘(’ means that the number is not included in the bin and ‘]’ that it is included). See Fig. 1 for label details.

[Figure A2 panels (a) to (d) not reproduced. X axis: Group size frequency distribution; Y axis: Individual group size frequency distribution.]

Figure A2. Correlation of size-related statistics describing group size frequency distributions and their corresponding individual group size frequency distributions. (a) 5th percentile, (b) median, (c) mean and (d) 95th percentile. Species codes are the same as in Fig. 1. See Table 1 for correlation statistics. Dashed lines depict the x = y line.
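The linear binning described in the caption of Fig. A1 (20 right-closed bins of width maximum group size/20) can be sketched as follows, counting both groups and individuals per bin. This is a minimal sketch under stated assumptions, not the code used to produce the figure: the function name `linear_histograms` and the colony sizes are hypothetical, with the maximum set to 4000 only to mirror the worked example for panel (e).

```python
import numpy as np

def linear_histograms(group_sizes, n_bins=20):
    """Count groups and individuals in n_bins right-closed linear bins (lo, hi]."""
    sizes = np.asarray(group_sizes, dtype=float)
    # Bin edges run from 0 to the maximum group size; width = maximum / n_bins.
    edges = np.linspace(0.0, sizes.max(), n_bins + 1)
    # right=True gives edges[i-1] < x <= edges[i]; subtract 1 for a 0-based bin index.
    idx = np.digitize(sizes, edges, right=True) - 1
    idx = np.clip(idx, 0, n_bins - 1)  # guard against floating-point edge effects
    groups = np.bincount(idx, minlength=n_bins)                      # GSFD
    individuals = np.bincount(idx, weights=sizes, minlength=n_bins)  # IndGSFD
    return edges, groups, individuals

# Hypothetical colony sizes with a maximum of 4000 nests, so bin width is 200.
colonies = [1, 150, 200, 201, 400, 3900, 4000]
edges, groups, individuals = linear_histograms(colonies)

for lo, hi, g, n in zip(edges[:-1], edges[1:], groups, individuals):
    if g > 0:
        print(f"({lo:.0f}-{hi:.0f}]: {g} colonies, {n:.0f} nests")
```

A colony of exactly 200 nests falls in the first bin and one of 201 nests in the second, matching the right-closed convention of the caption. Weighting each group by its own size is what turns the group count into the individual count.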