Journal of Animal Ecology 2010, 79, 548–555
doi: 10.1111/j.1365-2656.2010.01670.x

Mixed conditional logistic regression for habitat selection studies

Thierry Duchesne1*, Daniel Fortin2 and Nicolas Courbin2

1Département de Mathématiques et de Statistique, Université Laval, Sainte-Foy, QC, Canada G1V 0A6; and 2Chaire de Recherche Industrielle CRSNG-Université Laval en Sylviculture et Faune, Département de Biologie, Université Laval, Sainte-Foy, QC, Canada G1V 0A6

*Correspondence author. E-mail: thierry.duchesne@mat.ulaval.ca

© 2010 The Authors. Journal compilation © 2010 British Ecological Society

Summary

1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies.

2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison.

3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions.

4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variation in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effects model simply suggested an overall selection for farmlands.

5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.

Key-words: case–control location sampling, farmland, Global Positioning System, likelihood-ratio test, mixed multinomial logit model, Prince Albert National Park, Spatially Explicit Landscape Event Simulator

Introduction

The resource selection function (RSF) is currently one of the dominant tools used to quantify habitat selection (McLoughlin et al. 2010). RSFs link animal distribution to spatial patterns of habitat heterogeneity by contrasting the characteristics of animal locations with those of a set of random locations (Manly et al. 2002). Random locations are often drawn across home-ranges of individuals (Compton, Rhymer & McCollough 2002), in which case observed (response variable coded as ones) and random (response variable coded as zeros) locations are generally contrasted with unconditional logistic regressions. Under such a sampling design, however, estimation methods must consider that a certain number of random locations might have been visited, in which case they do not all represent true absences (Keating & Cherry 2004; Johnson et al. 2006). The use of a matched design can then become advantageous. With a matched design, each observed location is associated with a specific set of random locations drawn within a limited spatial domain (Boyce 2006), often corresponding to the distance that the animal could have travelled during the relocation time interval (Boyce et al. 2003). Because the animal could not also be at the random locations when its actual location was acquired, random locations represent true absences. Furthermore, matched designs are appropriate when evaluating the habitat selection of animals with home-ranges that are either not well defined or large relative to the distance individuals move between relocations (Arthur et al. 1996; Compton et al. 2002). RSFs based on a matched design are estimated by conditional logistic regression (Compton et al. 2002; Boyce et al. 2003; Boyce 2006; McDonald et al. 2006), an approach that is increasingly used in habitat selection analysis.

Despite difficulties in assigning a variance–covariance structure (Craiu, Duchesne & Fortin 2008; Koper & Manseau 2009), the value of random effects in RSFs has been widely recognized in the case of models developed from non-matched designs (Gillies et al. 2006; Hebblewhite & Merrill 2008). Mixed-effects models should be better suited to analysing unbalanced data sets or situations where selection for the different landscape attributes varies among individuals (Gillies et al. 2006). Moreover, mixed-effects models can handle the situation where several matched sets of locations come from the same animal and are thus correlated. 
The addition of random effects also provides these advantages in studies based on matched sampling designs, but mixed-effects conditional logistic regressions have been largely overlooked in ecological research (but see Bruun & Smith 2003; Fortin et al. 2009). Moreover, unlike mixed-effects conditional logistic regression, fixed-effects models rely on assumptions that might not faithfully represent certain ecological systems. Fixed-effects models assume that the strength of selection is homogeneous among individuals within the population and thus estimate the population-averaged selection. Fixed-effects conditional logistic regression also implies independence from irrelevant alternatives (IIA, Revelt & Train 1998). The IIA hypothesis states that the strength of preference for (i.e. the odds of choosing) habitat type A over habitat type B does not depend on the other habitat types also available. Because behavioural decisions reflect trade-offs among multiple competing demands, changes in available options may alter individual preferences, thereby violating the IIA assumption. For example, prey often make greater use of patches located in relatively safe areas (Hay & Fuller 1981; Morrison et al. 2004; Hochman & Kotler 2007). The foraging efforts and selectivity of Nubian Ibex (Capra nubiana F. Cuvier, 1825) vary with the distance from the safety of a cliff (Hochman & Kotler 2007). In other words, the strength of preference for a given type of food patch over the baseline patch type depends on the presence or absence of a cliff in close proximity, a spatial dependency that might violate the IIA hypothesis. In this context, fixed-effects models may yield inappropriate conclusions, potentially leading to unfavourable management actions. 
In this study, we illustrate how departures from the assumption of homogeneous selection among individuals, due either to inter-individual variability in movement rules or to the violation of the IIA assumption, may influence the estimation of RSF parameters under a matched sampling design. We begin by showing how mixed effects can be incorporated into conditional logistic regression models. We follow an approach based on random utility theory (Cooper & Millspaugh 1999) because it is easily interpretable in the resource selection context and because the exponential form of the RSF is robust to misspecification (McFadden & Train 2000). We then explain why, unlike the fixed-effects model, its mixed-effects counterpart remains appropriate under some types of violation of IIA. We follow with a simulation-based investigation of the impact of departures from the homogeneity in selection probabilities and from the IIA assumption on the estimation of RSFs. Finally, we illustrate the methods with an analysis of habitat selection by the free-ranging bison Bison bison (Linnaeus, 1758) of Prince Albert National Park (Saskatchewan, Canada) during the springs of 2005–2008. In the spring, bison occasionally leave the park for adjacent private lands where they sometimes damage fences and crops, disturb livestock, and get killed by hunters. It can be beneficial for management to evaluate whether bison use of farmlands results from an active selection, and to quantify whether cross-boundary movements are made by few individuals or whether it is a widespread behaviour. We show that conditional mixed-effects RSFs are better suited than marginal fixed-effects RSFs to achieve this goal.

Materials and methods

RANDOM EFFECTS IN CONDITIONAL LOGISTIC REGRESSION

As with ordinary (unconditional) logistic regression, random effects can be included in conditional logistic regression models by replacing fixed regression coefficients with random coefficients. 
Because of the conditioning involved, conditional models have no intercept term and random effects are included as random regression coefficients. In the resulting mixed multinomial logit model (sensu Revelt & Train 1998), each animal assigns a value, termed utility ($U$), to all landscape locations, and among the locations available at a given time selects the one with the highest utility (Cooper & Millspaugh 1999; McDonald et al. 2006). Let $n = 1,\dots,K$ represent the individuals, $t = 1,\dots,t_n$ the time steps for individual $n$ and $j = 1,\dots,J$ the available locations (or a sample of all available locations, McDonald et al. 2006) for animal $n$ at time step $t$. The mixed multinomial logit model considers utilities as random variables, with $U_{njt}$ being the utility that animal $n$ assigns to the $j$th location available at time step $t$. Let $x_{njt1},\dots,x_{njtm}$ represent the values of $m$ covariates (e.g. habitat attributes) measured at the $j$th location available to animal $n$ at time step $t$. Now let us assume that the utility assigned to a location depends on its attributes, viz.

$$U_{njt} = \beta_1 x_{njt1} + \beta_2 x_{njt2} + \cdots + \beta_m x_{njtm} + b_{n1} z_{njt1} + \cdots + b_{nq} z_{njtq} + \varepsilon_{njt} = \mathbf{x}_{njt}'\boldsymbol{\beta} + \mathbf{z}_{njt}'\mathbf{b} + \varepsilon_{njt}, \qquad \text{(eqn 1)}$$

where $\beta_1,\dots,\beta_m$ are the fixed regression coefficients, $b_{n1},\dots,b_{nq}$ are animal-level random effects, $z_{njt1},\dots,z_{njtq}$ are fixed values specifying the structure of the random effects (usually equal to the subset of the covariates $x_{njti}$ for which coefficients are random), $\varepsilon_{njt}$ are independent and identically distributed random error terms, $\boldsymbol{\beta} = (\beta_1,\dots,\beta_m)'$, $\mathbf{x}_{njt} = (x_{njt1},\dots,x_{njtm})'$, $\mathbf{b} = (b_{n1},\dots,b_{nq})'$ and $\mathbf{z}_{njt} = (z_{njt1},\dots,z_{njtq})'$. 
We make the assumption that the random errors follow an extreme value distribution, which reduces to the usual exponential RSF when there are no random effects (see below); this assumption is mild and the model thereby specified is very flexible (McFadden & Train 2000). Let the random effects $\mathbf{b}$ be independent and identically distributed with density $f(\mathbf{b};\boldsymbol{\theta})$, with $\boldsymbol{\theta}$ a vector of unknown parameters. The probability that an animal chooses location $j$ within the set of $J$ locations $\{1, 2,\dots,J\}$, i.e. $U_{njt} > U_{nit}$ for all $i \neq j$, is

$$P(\mathbf{x}_{njt}) = \int \frac{\exp(\mathbf{x}_{njt}'\boldsymbol{\beta} + \mathbf{z}_{njt}'\mathbf{b})}{\sum_{i=1}^{J}\exp(\mathbf{x}_{nit}'\boldsymbol{\beta} + \mathbf{z}_{nit}'\mathbf{b})}\, f(\mathbf{b};\boldsymbol{\theta})\, d\mathbf{b}. \qquad \text{(eqn 2)}$$

Though the distribution of the random effects is typically chosen as the multivariate normal distribution with mean vector 0 and variance–covariance parameters to be estimated (Gillies et al. 2006; Hebblewhite & Merrill 2008), other distributions such as the lognormal, uniform or triangular can be used (Bhat 2001). When all $z_{njti}$ in eqn (1) take on value zero or when the variance of $\mathbf{b}$ is null (i.e. $\mathbf{b}$ is identically 0), eqn (2) simplifies to

$$P(\mathbf{x}_{njt}) = \frac{\exp(\mathbf{x}_{njt}'\boldsymbol{\beta})}{\sum_{i=1}^{J}\exp(\mathbf{x}_{nit}'\boldsymbol{\beta})}, \qquad \text{(eqn 3)}$$

and we get the ordinary (i.e. fixed-effects) conditional logistic regression model (McDonald et al. 2006).

RANDOM EFFECTS, HETEROGENEITY IN SELECTION AND THE INDEPENDENCE FROM IRRELEVANT ALTERNATIVES

The addition of individual-level random effects in RSFs relaxes the assumption of homogeneous selection among animals. For example, adding an animal-level random regression coefficient allows for inter-individual variations in the response to covariate $x$, which means that each individual may respond differently to changes in $x$. Because the random effects are unobserved random variables that are common to all the locations of a given individual, the mixed-effects model does not assume that the observations of that individual are uncorrelated (Revelt & Train 1998). 
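A minimal numerical sketch of eqns (2) and (3) may help fix ideas. It is not the authors' code: the function names, the three-location choice sets, the two hypothetical indicator covariates (food patch, cliff) and the values 0.5, 2.0 and 1.5 for the two coefficients and the random-slope SD are all illustrative assumptions, the random slope is assumed normal, and the integral in eqn (2) is approximated by plain Monte Carlo averaging. The sketch compares the odds of choosing the food patch over the baseline location when a cliff is, or is not, among the other available locations.

```python
import numpy as np

def conditional_logit_probs(X, beta):
    """Eqn (3): selection probabilities for one choice set (rows of X = locations)."""
    eta = X @ beta
    w = np.exp(eta - eta.max())          # subtract the max for numerical stability
    return w / w.sum()

def mixed_logit_probs(X, Z, beta, sigma, n_draws=20000, seed=0):
    """Eqn (2) with a single N(0, sigma^2) random coefficient, by Monte Carlo.

    Z is the covariate (length J) whose coefficient varies among animals.
    """
    rng = np.random.default_rng(seed)
    b = sigma * rng.standard_normal(n_draws)               # draws of the random effect
    eta = (X @ beta)[None, :] + b[:, None] * Z[None, :]    # shape (draws, J)
    w = np.exp(eta - eta.max(axis=1, keepdims=True))
    return (w / w.sum(axis=1, keepdims=True)).mean(axis=0) # average over draws

beta = np.array([0.5, 2.0])   # assumed coefficients: [food patch, cliff]
# Columns: [is_patch, is_cliff]; row 0 = food patch, row 1 = baseline habitat.
no_cliff = np.array([[1, 0], [0, 0], [0, 0]], dtype=float)
with_cliff = np.array([[1, 0], [0, 0], [0, 1]], dtype=float)

odds_fixed, odds_mixed = {}, {}
for name, X in [("no_cliff", no_cliff), ("with_cliff", with_cliff)]:
    p = conditional_logit_probs(X, beta)
    odds_fixed[name] = p[0] / p[1]                 # patch vs baseline, fixed effects
    q = mixed_logit_probs(X, X[:, 0], beta, sigma=1.5)
    odds_mixed[name] = q[0] / q[1]                 # population-averaged odds
```

Under these assumed values the fixed-effects odds equal exp(0.5), about 1.65, in both choice sets, whereas the population-averaged odds from the mixed model differ between the two sets, which is the population-level relaxation of IIA that random coefficients provide.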
Note that, though they do not explicitly model the animal-level heterogeneity, fixed-effects models estimated by methods such as generalized estimating equations can handle correlated matched sets (Craiu et al. 2008).

The mixed multinomial logit model relaxes the IIA assumption, but only at a population level. It does so by inducing correlation over alternatives in the stochastic portion of utility (Revelt & Train 1998; Skrondal & Rabe-Hesketh 2003). To illustrate this, we consider a forager, such as the Nubian Ibex (Hochman & Kotler 2007), responding to spatial patterns of risk. Suppose that each location is of one of three types, which we code using covariates $x_{jP}$ and $x_{jC}$: location $j$ may be a risky food patch, coded as $x_{jP} = 1$, $x_{jC} = 0$; a safe cliff, coded as $x_{jP} = 0$, $x_{jC} = 1$; or a baseline habitat that offers no food or protection, coded as $x_{jP} = 0$, $x_{jC} = 0$. We assume that $J > 2$ locations are available and that location $j = 1$ is a food patch ($x_{1P} = 1$, $x_{1C} = 0$), and location $j = 2$ is the baseline habitat ($x_{2P} = 0$, $x_{2C} = 0$). If we assume a fixed-effects conditional logistic regression model (McDonald et al. 2006) with RSF proportional to $\exp(\beta_P x_{jP} + \beta_C x_{jC})$, then the ratio of the probability that the animal selects location $j = 1$ to the probability that the same animal (or another animal chosen at random) selects location $j = 2$ is given by

$$\frac{\exp(\beta_P)\big/\sum_{j=1}^{J}\exp(\beta_P x_{jP} + \beta_C x_{jC})}{\exp(0)\big/\sum_{j=1}^{J}\exp(\beta_P x_{jP} + \beta_C x_{jC})} = \frac{\exp(\beta_P)}{1} = \exp(\beta_P), \qquad \text{(eqn 4)}$$

which does not depend on whether there is a cliff among the other available locations. Now let us assume the same model, but this time with a random slope $\beta_P + b$ for covariate $x_P$ instead of the fixed slope $\beta_P$. Because $b$ remains fixed for all the locations of a given animal, the ratio of the probability that the animal chooses location $j = 1$ to the probability that it selects location $j = 2$ is given by

$$\frac{\exp(\beta_P + b)\big/\sum_{j=1}^{J}\exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}}{\exp(0)\big/\sum_{j=1}^{J}\exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}} = \frac{\exp(\beta_P + b)}{1} = \exp(\beta_P + b),$$

which still does not depend on the attributes of the alternate locations, but it depends on $b$, the unobserved animal-specific random effect. Now, if we consider the ratio of the probability that an animal chosen at random selects location $j = 1$ to the probability that another animal, again chosen at random, selects location $j = 2$, then we get

$$\frac{\int \exp(\beta_P + b)\big/\sum_{j=1}^{J}\exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}\, f(b)\, db}{\int \exp(0)\big/\sum_{j=1}^{J}\exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}\, f(b)\, db} = \exp(\beta_P)\left[\frac{\int \exp(b)\big/\sum_{j=1}^{J}\exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}\, f(b)\, db}{\int 1\big/\sum_{j=1}^{J}\exp\{(\beta_P + b) x_{jP} + \beta_C x_{jC}\}\, f(b)\, db}\right],$$

where the quantity in square brackets now depends on the characteristics of all available locations (Train 2003). This model thus relaxes the IIA assumption at the population level. In other words, by adding random coefficients in the conditional logistic regression model, the population-averaged probability of choosing a given habitat type depends on the local alternatives.

ESTIMATION AND INFERENCE

We now consider maximum likelihood estimation of the parameters of the model described by eqns (1) and (2) on the basis of data obtained with a matched sampling design. To simplify the notation and without loss of generality, we assume that the location chosen by animal $n$ at time step $t$ among the $J$ available locations is assigned label $j = 1$ (and thus the locations not chosen are assigned labels $j = 2, 3,\dots,J$). Maximum-likelihood estimates of the RSF and random effects distribution parameters are obtained by finding the values of $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ maximizing:

$$L(\boldsymbol{\beta}, \boldsymbol{\theta}) = \prod_{n=1}^{K} \int \prod_{t=1}^{t_n} \frac{\exp(\mathbf{x}_{n1t}'\boldsymbol{\beta} + \mathbf{z}_{n1t}'\mathbf{b})}{\sum_{j=1}^{J}\exp(\mathbf{x}_{njt}'\boldsymbol{\beta} + \mathbf{z}_{njt}'\mathbf{b})}\, f(\mathbf{b}; \boldsymbol{\theta})\, d\mathbf{b}. \qquad \text{(eqn 5)}$$

Because eqn (5) is a valid likelihood function, any likelihood-based inference method for $\boldsymbol{\beta}$, such as Wald confidence intervals based on inverting the Hessian of the negative log-likelihood, likelihood-ratio tests, or AIC-based model selection, can be applied (McFadden & Train 2000). 
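The likelihood in eqn (5) can be approximated by replacing the integral over the random effect with an average over random draws. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, all names and parameter values are hypothetical, there is a single normal random slope placed on the first covariate, and ordinary pseudo-random draws are used for simplicity. Within each animal the product over time steps is taken inside the average, as in eqn (5).

```python
import numpy as np

def simulated_loglik(beta, sigma, X, animal, draws):
    """Simulated log of eqn (5) with one N(0, sigma^2) random slope on covariate 0.

    X      : (S, J, m) covariates of S matched sets; location 0 is the used one.
    animal : (S,) integer animal label of each matched set.
    draws  : (R,) standard normal draws, reused for every animal.
    """
    b = sigma * draws                                              # (R,)
    eta = (X @ beta)[None, :, :] + b[:, None, None] * X[None, :, :, 0]  # (R, S, J)
    top = eta.max(axis=2, keepdims=True)
    log_den = top[..., 0] + np.log(np.exp(eta - top).sum(axis=2))  # log-sum-exp
    log_p = eta[:, :, 0] - log_den           # (R, S): log prob. of the used location
    total = 0.0
    for n in np.unique(animal):
        # Product over the animal's time steps, evaluated draw by draw.
        per_draw = log_p[:, animal == n].sum(axis=1)               # (R,)
        c = per_draw.max()
        total += c + np.log(np.mean(np.exp(per_draw - c)))         # log of the average
    return total

# Synthetic matched-design data (all values below are assumptions).
rng = np.random.default_rng(42)
K, T, J = 30, 20, 6                  # animals, matched sets per animal, locations per set
beta_true, sigma_true = np.array([1.0, -0.5]), 1.5
animal = np.repeat(np.arange(K), T)
slopes = sigma_true * rng.standard_normal(K)        # one random effect per animal
X = rng.standard_normal((K * T, J, 2))
eta = X @ beta_true + slopes[animal][:, None] * X[:, :, 0]
# Extreme value (Gumbel) errors: the location with the highest utility is used.
used = np.argmax(eta + rng.gumbel(size=eta.shape), axis=1)
for s, j in enumerate(used):
    X[s, [0, j]] = X[s, [j, 0]]      # relabel so that the used location comes first

draws = rng.standard_normal(200)
ll_fixed = simulated_loglik(beta_true, 0.0, X, animal, draws)    # sigma = 0: eqn (3)
ll_mixed = simulated_loglik(beta_true, sigma_true, X, animal, draws)
```

Setting `sigma` to zero recovers the fixed-effects conditional logistic log-likelihood exactly. In practice a function of this kind would be passed to a numerical optimizer, with the same draws held fixed across evaluations so that the objective stays smooth.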
According to parsimony principles, the need for random effects in RSFs should be assessed. If random effects are not needed, then fixed-effects conditional regression would improve estimation efficiency and model interpretability (Verbeke & Molenberghs 2000). A fixed-effects model can be considered as a special case of a mixed-effects model where the variance and covariance parameters in $f(\mathbf{b}; \boldsymbol{\theta})$ are zero. A likelihood-ratio test for nested models can thus be used to evaluate the need to increase model complexity through the use of random effects. The likelihood-ratio statistic that tests whether the fixed-effects model is reasonable is given by $r = 2(\ell_1 - \ell_0)$, where $\ell_0$ and $\ell_1$ are the values of the maximized log-likelihoods of the fixed- and mixed-effects models, respectively. Because the value zero is on the boundary of the parameter space for variance parameters, the P-value is not simply based on the usual chi-squared distribution but rather on a mixture of chi-squared distributions, with the number of chi-squared variables in the mixture and their respective numbers of degrees of freedom depending on the structure of the variance and covariance parameters set to zero (Verbeke & Molenberghs 2000). Consider for example a mixed-effects model with a single random effect $b$ with distribution $N(0, \sigma^2)$. The likelihood-ratio statistic to test whether $b$ is needed follows, under the null model, a mixture of two chi-squared distributions with zero and one degree of freedom, respectively. This reduces the P-value to $0.5\,\Pr[\chi^2_1 > r]$, with $\chi^2_1$ representing a chi-squared random variable with 1 degree of freedom.

Direct numerical maximization of $L(\boldsymbol{\beta}, \boldsymbol{\theta})$ given by eqn (5) can be difficult, as it involves integrals that cannot be solved analytically. 
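The boundary-corrected likelihood-ratio test for a single random effect, described above, can be sketched in a few lines. The function name is hypothetical; the only inputs are the two maximized log-likelihoods, and the example values are the ones reported for the bison models in Table 2. The sketch uses the identity that, for 1 degree of freedom, the chi-squared tail probability equals erfc(sqrt(r/2)).

```python
import math

def boundary_lrt_pvalue(loglik_fixed, loglik_mixed):
    """P-value for H0: sigma^2 = 0 when the mixed model adds one random effect.

    Under H0 the statistic follows a 50:50 mixture of chi-squared
    distributions with 0 and 1 degrees of freedom.
    """
    r = 2.0 * (loglik_mixed - loglik_fixed)
    if r <= 0.0:
        return 1.0                      # no improvement in fit
    # For 1 df, Pr[chi2 > r] = erfc(sqrt(r / 2)).
    return 0.5 * math.erfc(math.sqrt(r / 2.0))

# Maximized log-likelihoods of the fixed- and mixed-effects bison RSFs (Table 2).
p_bison = boundary_lrt_pvalue(-5947.846, -5930.033)
```

For the bison models the statistic is r = 2(−5930.033 + 5947.846) = 35.626, and the resulting P-value falls far below 0.0001, in line with the reported test. This halving of the usual P-value applies only to the single-random-effect case; other variance structures lead to different mixtures.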
The numerical maximization of the likelihood is often more likely to converge for a fixed-effects RSF than for its mixed-effects counterpart. Bhat (2001) described simulation methods based on Halton quasi-random numbers that can efficiently evaluate the likelihood function. Maximization of the likelihood from eqn (5) can be implemented with this method using the mxlmsl package (Train 2006) for MATLAB R2008a (MathWorks Inc. 2008). We provide the MATLAB code used for our bison case study in Appendix S3.

There are other, albeit less direct, means of maximizing the likelihood from eqn (5). Chen & Kuo (2001) showed how to build a nonlinear Poisson model with random effects whose likelihood is equivalent to a closely related multinomial formulation of eqn (5). The required Poisson model can be fitted by maximum likelihood, where the integrals are evaluated with adaptive Gaussian quadrature or penalized quasi-likelihood. Bruun & Smith (2003) used the latter approach to evaluate habitat selection by European starlings (Sturnus vulgaris Linnaeus, 1758). Mixed conditional logistic regression models can also be fitted with Bayesian methods, but the approach then requires specifying prior distributions (informative or not) for $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$. R.V. Craiu, T. Duchesne, D. Fortin & S. Baillargeon (unpublished data) propose a numerically stable and efficient two-step method that gives accurate approximations to the maximum-likelihood estimates for mixed-effects conditional logistic regression. Methods based on the results of the first step (i.e. separate models fitted to each animal) of such a two-step approach could perhaps help in determining whether the need for random effects arises from between-animal heterogeneity or from the violation of IIA.

Example 1: Simulation of patch selection under predation risk

We use computer simulations to investigate the effect of departures from the assumption of homogeneous habitat selection among individuals. 
Deviations from the assumption were induced by imposing inter-individual variations in movement rules and by forcing movement decisions that violated the IIA assumption. Individual-based, spatially explicit modelling was conducted using the Spatially Explicit Landscape Event Simulator (Fall & Fall 2001). We simulated the movements of 200 virtual foragers, with each individual starting (time 0) at a random location within the landscape (1000 × 1000 cells), and followed for 50 consecutive moves. Landscapes comprised four types of randomly distributed habitat patches: patch type H1 offered the most food, followed by H2. Neither H3 nor H4 offered any food. H1 was risky, unless located <15 cells from H3, in which case H1 became safe. H2 was always safe.

We tested four scenarios differing in the movement rules of individuals, with distinct statistical implications. Movements for scenarios 1 and 2 were both consistent with the IIA hypothesis; scenario 1 assumed a homogeneous movement rule, whereas scenario 2 involved inter-individual variation in the rules. Scenarios 3 and 4 both led to violation of the IIA hypothesis at the individual level, because the preference for H1 over H2 depended on whether H3 occurred within 15 cells; a homogeneous movement rule was used for scenario 3 whereas inter-individual variation in movement rules characterized scenario 4. The movement rules as well as the landscape used for each of the four scenarios are described in detail in Appendix S1. To assess the effect of varying patch availability on inferences, scenario 3 was applied to five additional landscapes, where the proportions of H1 and H2 remained unchanged but those of H3 and H4 varied according to Landscape 1: 0.01%, 69.99%; Landscape 2: 0.02%, 69.98%; Landscape 3: 0.03%, 69.97%; Landscape 4: 0.05%, 69.95%; and Landscape 5: 0.06%, 69.94%, respectively. 
In all scenarios, each observed location was matched to 10 locations randomly drawn within a 30-cell radius, which was enough to encompass all step distances (Forester, Im & Rathouz 2009). Patch type (H1–H4) was identified at all observed and random locations. Fixed- and mixed-effects conditional logistic regressions were used to build RSFs. Mixed-effects RSFs allowed the coefficient of H1 to vary among individuals according to $N(\beta_1, \sigma^2)$. In all models, H2 was used as the baseline patch type. Models were fitted by maximizing the likelihood given by eqn (5) using a publicly available MATLAB R2008a (MathWorks Inc. 2008) package (Train 2006).

Example 2: Habitat selection by free-ranging bison

The field study was conducted in the springs of 2005–2008 (9 March–31 May 2005, 1 March–31 May in 2006 and 2007, and 1 March–10 March 2008) in Prince Albert National Park, where the bison population comprised 385 individuals. The bison range is mostly composed of forests (85%), meadows (10%) and water bodies (5%). The range is adjacent to farmlands, where bison are occasionally found. We followed 24 female bison equipped with Global Positioning System collars (GPS collar 4400M from Lotek Engineering, Newmarket, ON, Canada) taking locations at 06:00 and 18:00 hours. Each observed location was paired with 10 random locations sampled within a 1.6-km radius circle (>90% of all travelled distances between relocations). Land-cover types at observed and random locations were characterized based on classified Landsat ETM+ satellite images (Fortin et al. 2009). Land-cover types were (i) meadow, including areas near lakes and rivers dominated by grasses, forbs and sedges (MEADOW); (ii) riparian areas largely composed of shrubs and located near streams and rivers (RIPARIAN); (iii) forest consisting of deciduous, conifer and mixed stands (FOREST); (iv) water bodies (WATER); (v) road including the areas located <15 m from a human-made trail or a road (ROAD); and (vi) farmlands (AGRIC). 
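The matched sampling step used in both examples, in which each observed location is paired with 10 random locations within a fixed radius, can be sketched as follows. This is an assumption-laden illustration rather than the procedure actually used: the text states only that locations were sampled within the circle, so drawing them uniformly by area, the planar coordinates in metres, and the function name are all hypothetical choices.

```python
import math
import random

def draw_matched_locations(x, y, radius, n=10, rng=None):
    """Return n random points within `radius` of the observed location (x, y)."""
    rng = rng or random.Random()
    points = []
    for _ in range(n):
        # The square root spreads points uniformly per unit area of the disc.
        r = radius * math.sqrt(rng.random())
        angle = rng.uniform(0.0, 2.0 * math.pi)
        points.append((x + r * math.cos(angle), y + r * math.sin(angle)))
    return points

# One observed location (hypothetical coordinates) and its 10 matched
# random locations within a 1.6-km radius.
matched = draw_matched_locations(500.0, 500.0, radius=1600.0, n=10,
                                 rng=random.Random(1))
```

Each observed location and its matched random locations then form one stratum of the conditional logistic regression, with covariates such as land-cover type extracted at every point.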
Fixed- and mixed-effects conditional logistic regressions fitted by maximum likelihood were used to build RSFs. Random effects assuming $N(0, \sigma^2)$ were investigated for AGRIC, with FOREST as the baseline land-cover type.

Results

EXAMPLE 1: SIMULATION OF PATCH SELECTION UNDER PREDATION RISK

Scenario 1 represented a situation where the IIA hypothesis was valid and where the movement strategy was fixed within the population of simulated foragers. As expected in such cases, the fixed- and mixed-effects RSFs yielded a similar coefficient estimate for H1 of −0.91 ± 0.03 (±SE) (Table 1), which agrees with the theoretical approximation (Appendix S2). Moreover, the standard deviation (SD) of the random coefficient associated with the mixed-effects model did not differ significantly from 0 (likelihood-ratio test: P = 0.46), indicating that a random coefficient for H1 was not required (Table 1).

The IIA assumption remained valid in scenario 2, but this time each animal had a different probability of choosing H1. In this context, the mixed-effects RSF received greater empirical support (Table 1) as its random coefficient was an important addition to the model fit (likelihood-ratio test: P < 0.0001).

We now consider a situation (scenario 3) where all individuals displayed the same movement strategy, but where the IIA assumption was violated because the odds of choosing H1 depended on whether a refuge patch H3 was in close proximity. Selection coefficients for H1 were then systematically lower when estimated by mixed-effects conditional logistic regression than by their fixed-effects counterpart (Fig. 1). Whether H1 was selected or avoided remained generally consistent with both models, except when H3 made up 0.04% of the landscape. In this case, the fixed-effects RSF suggested a significant selection for H1, whereas the better fitting (likelihood-ratio test: P < 0.0001) mixed-effects model revealed that the average simulated forager had no overall selection for H1 (Fig. 1). In the most complex scenario 4, virtual foragers not only violated the premise of IIA, but the strength of selection for H1 also differed among them. Modelling habitat selection under this scenario required, once again, the use of a random coefficient for H1 (likelihood-ratio test: P < 0.0001).

Table 1. Patch selection estimated by fixed- or mixed-effects conditional logistic regressions with normally distributed coefficients, for virtual foragers travelling in landscapes according to four scenarios. The scenarios differed depending on whether movement rules were similar among all individuals of the population and whether the assumption of independence from irrelevant alternatives (IIA) was violated. H2 was the baseline patch type in all resource selection functions

                          Fixed-effects model                  Mixed-effects model
Variable                  β          SE      95% CI            β          SE      95% CI
Scenario 1: no inter-individual variation, IIA assumption respected
 Fixed coefficient
  H1                      −0.908     0.031   −0.969, −0.847    –          –       –
  H4                      −1.520     0.024   −1.567, −1.473    −1.520     0.024   −1.567, −1.473
 Random coefficient
  H1                      –          –       –                 −0.908     0.031   −0.969, −0.847
  SD of coefficient       –          –                         0.000      0.174
 Max. log likelihood      −22030.021                           −22030.020
Scenario 2: inter-individual variation, IIA assumption respected
 Fixed coefficient
  H1                      −0.835     0.031   −0.896, −0.774    –          –       –
  H4                      −1.528     0.024   −1.575, −1.481    −1.528     0.024   −1.575, −1.481
 Random coefficient
  H1                      –          –       –                 −0.873     0.041   −0.953, −0.793
  SD of coefficient       –          –                         0.368      0.041
 Max. log likelihood      −22023.359                           −21999.292
Scenario 3: no inter-individual variation, IIA assumption violated
 Fixed coefficient
  H1                      0.073      0.027   0.020, 0.126      –          –       –
  H3                      −0.736     0.528   −1.771, 0.299     −0.712     0.530   −1.751, 0.327
  H4                      −1.458     0.026   −1.509, −1.407    −1.462     0.027   −1.515, −1.409
 Random coefficient
  H1                      –          –       –                 0.006      0.060   −0.112, 0.124
  SD of coefficient       –          –                         0.752      0.046
 Max. log likelihood      −21557.935                           −21273.523
Scenario 4: inter-individual variation, IIA assumption violated
 Fixed coefficient
  H1                      0.019      0.028   −0.036, 0.074     –          –       –
  H3                      −1.540     0.724   −2.959, −0.121    −1.461     0.726   −2.884, −0.038
  H4                      −1.454     0.026   −1.505, −1.403    −1.450     0.026   −1.501, −1.399
 Random coefficient
  H1                      –          –       –                 −0.062     0.061   −0.182, 0.058
  SD of coefficient       –          –                         0.764      0.047
 Max. log likelihood      −21655.263                           −21362.713

Fig. 1. Changes in the selection coefficient (±95% confidence intervals) for patch type H1 by simulated foragers as a function of the percentage of the landscape composed of refuge patch H3, as assessed by resource selection functions estimated from fixed- or mixed-effects conditional logistic regression. Simulations were made according to scenario 3, where the probability that a forager selects H1 compared to H2 increased when a refuge H3 was in close proximity. We also indicate the expected proportion of the population having a positive coefficient for patch type H1, based on the $N(\hat{\beta}, \hat{\sigma}^2)$ estimate of the distribution of $\beta + b$. Notice that values for fixed and random coefficients were slightly offset from one another to increase clarity.

EXAMPLE 2: HABITAT SELECTION OF FREE-RANGING BISON

Compared to the forest matrix, female bison selected meadows, water bodies and roads, but displayed no preference for riparian areas (Table 2). The response of bison to farmlands differed depending on whether fixed- or mixed-effects RSFs were used. The population-averaged fixed-effects RSF indicated a general selection for farmlands over forest areas, whereas the mixed-effects model revealed that bison had no preference for one land-cover type over the other. The mixed-effects model provided a better depiction of bison selection than the fixed-effects RSF (likelihood-ratio test: P < 0.0001). The mixed-effects RSF revealed important heterogeneity in the response to farmlands within the population (Table 2), with 41% (N[−0.275, 1.538]) of female bison having a positive selection coefficient for farmlands.

Table 2. Resource selection functions for radiocollared female bison in Prince Albert National Park during the springs of 2005–2008, as estimated with fixed- or mixed-effects conditional logistic regressions, with normally distributed coefficients

                          Fixed-effects model                  Mixed-effects model
Variable                  β          SE      95% CI            β          SE      95% CI
Fixed coefficient
 Meadow                   2.024      0.046   1.934, 2.114      2.024      0.046   1.934, 2.114
 Water                    0.399      0.094   0.215, 0.583      0.401      0.094   0.217, 0.585
 Riparian area            −0.315     0.163   −0.635, 0.005     −0.301     0.163   −0.620, 0.018
 Road                     0.942      0.143   0.663, 1.222      0.953      0.143   0.673, 1.233
 Farmlands                0.348      0.118   0.117, 0.579      –          –       –
Random coefficient
 Farmlands                –          –       –                 −0.275     0.377   −1.014, 0.464
 H4                       −1.520     0.024   −1.567, −1.473    −1.520     0.024   −1.567, −1.473
 SD of coefficient        –          –                         1.243      0.344
Max. log likelihood       −5947.846                            −5930.033
Likelihood-ratio test                                          P < 0.0001

Discussion

We used spatially explicit simulations to demonstrate how mixed-effects conditional logistic regression can capture inter-individual variation in selection induced by differences in movement rules among simulated foragers and by the presence or absence of refuge patches (which led to the violation of the IIA assumption). When the relative preference of resource patches was the same for all individuals and IIA was true (scenario 1), the fixed- and mixed-effects models estimated almost identical regression coefficients. Fixed-effects RSFs then provided an accurate representation of habitat selection within the population and were more parsimonious than mixed-effects RSFs. In contrast, when habitat selection probabilities varied among individuals but the IIA was still a valid assumption (scenario 2), the likelihood-ratio test indicated that the selection for H1 varied significantly within the population, thereby rejecting the fixed-effects model. These conclusions for scenario 2 also held under scenarios 3 (no inter-individual variation in selection and violation of the IIA assumption) and 4 (inter-individual variability in selection and violation of the IIA assumption). In these cases, RSFs that include random effects gave a more accurate representation of habitat selection in the population.

The simulation study also demonstrated that individual-level heterogeneity can be identified and taken into account in RSFs, even when data are collected under a matched sampling design. Furthermore, the simulations (i.e. scenario 3) stress that inter-individual variations in movement rules are only one potential source of heterogeneity which may entail the use of mixed-effects conditional logistic regression to analyse habitat selection data gathered from matched sampling designs. The trade-offs between food intake and predator avoidance can shape movement decisions, potentially leading to the violation of the IIA assumption. A faulty assumption of IIA may introduce sufficient heterogeneity in the response of animals to their habitat for random effects to be needed to adequately model animal distribution in response to spatial heterogeneity. 
Situations where the observed selection violates the IIA assumption can still be modelled with fixed-effects models when animals are homogeneous in their landscape preference. In this situation, the strength of preference for one habitat type over another depends on available alternatives and this dependence has to be modelled correctly and explicitly in the RSF using proper interaction terms. This precise knowledge is likely to be missing a priori in many studies and mixed-effects models offer a robust safeguard in such cases.

Findings from the simulations imply that the heterogeneity in selection for farmlands expressed by the female bison of Prince Albert National Park can be due to several factors, including inter-individual variations in movement decisions and the violation of the IIA assumption. Mixed-effects logistic regression can conveniently handle both sources of heterogeneity and thereby provide a robust framework for ecological inference.

We concurrently modelled the response of bison to multiple habitat attributes before drawing conclusions about their response to farmlands. For example, we found that bison selected roads, as well as meadows where individuals can find large quantities of high-quality food (Fortin, Fryxell & Pilote 2002; Craiu et al. 2008; Fortin et al. 2009). Fixed- and mixed-effects RSFs then pointed to distinct responses of bison to farmlands. Fixed-effects models implied that bison generally made selective use of farmlands, whereas the mixed-effects RSFs refuted this assessment by revealing heterogeneous selection for farmlands. A likelihood-ratio test revealed that the mixed-effects RSF was superior to its fixed-effects counterpart. We thus conclude that the problem of cross-boundary movements is linked to a subset, though a fairly large one, of individuals within the population, with c. 40% of female bison making selective use of farmlands.
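The two quantitative statements above can be checked against the values reported in Table 2. The sketch below rests on two assumptions that the text does not state explicitly: that the proportion of bison selecting farmlands is the probability that a normally distributed farmland coefficient (mean and SD from the mixed-effects model) exceeds zero, and that the likelihood-ratio statistic is compared with a chi-square distribution with one degree of freedom. Because an SD of zero lies on the boundary of the parameter space, a 50:50 mixture of chi-square distributions with 0 and 1 degrees of freedom is sometimes used instead, which would halve the P-value computed here. Variable names are illustrative.

```python
import math

# Values read from Table 2.
mean_farmland = -0.275    # mean of the random farmland coefficient
sd_farmland = 1.243       # SD of the random farmland coefficient
loglik_fixed = -5947.846  # maximised log likelihood, fixed-effects model
loglik_mixed = -5930.033  # maximised log likelihood, mixed-effects model

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Assumed reading of "c. 40%": share of individuals with a positive coefficient.
share_selecting = 1.0 - normal_cdf((0.0 - mean_farmland) / sd_farmland)

# Likelihood-ratio statistic comparing the mixed- and fixed-effects models.
lr_stat = 2.0 * (loglik_mixed - loglik_fixed)

# Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr_stat / 2.0))

print(f"share with positive farmland coefficient: {share_selecting:.3f}")
print(f"likelihood-ratio statistic: {lr_stat:.3f}")
print(f"P-value (chi-square, 1 df): {p_value:.2e}")
```

Under these assumptions the share of individuals with a positive farmland coefficient comes out close to the c. 40% quoted in the text, and the P-value falls well below the 0.0001 threshold reported in Table 2.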
The mixed-effects RSF thus gives park managers a very different picture from the general selection for farmlands that was implied by the population-averaged fixed-effects RSF. Solving human-wildlife conflicts may depend on whether the problem originates from a restricted number of individuals. In this case, the translocation of ‘problematic’ individuals can be the solution (Sukumar 1991; Jones & Nealson 2003). On the other hand, this management approach might not be as effective when all members of the population adopt an ‘unacceptable’ behaviour. Management or conservation actions should be tailored to the nature of the problem, and mixed-effects models are often better suited than fixed-effects models to evaluate the situation adequately.

Our study stressed how drawing robust inference from RSFs may require the use of random effects in conditional logistic regression models. We demonstrated how fixed and mixed conditional logistic regression can lead to different conclusions about animal–habitat interactions. Our simulations illustrated that in some situations models with random coefficients, which yield individual-specific inferences, can provide a more accurate assessment of resource selection by animals compared with fixed-effects models that provide population-averaged inference (Fieberg et al. 2009; Koper & Manseau 2009). We found that the inferred selection for agricultural lands by the population of free-ranging bison of Prince Albert National Park can differ depending on whether random coefficients are used or not. Such differences could have important management and conservation implications. Indeed, habitat selection is commonly used to identify critical resources (Arthur et al. 1996), suitable habitat (Fortin et al. 2008), responses to anthropogenic disturbances (Hebblewhite & Merrill 2008) and ecological consequences of species reintroduction (Whittaker & Lindzey 2004; Mao et al. 2005).
A biased assessment of habitat selection may therefore result in inadequate management or conservation actions.

Matched sampling designs and conditional logistic regressions are increasingly used in ecological research (e.g. for RSFs, Boyce 2006; for step selection functions, Fortin et al. 2005), and fixed-effects models may lead to mistaken inferences about selection whenever hypotheses such as IIA or homogeneous strength of selection among animals are not respected. We suggest that mixed-effects conditional logistic regression should become a valuable, and sometimes necessary, statistical tool for valid inference in ecological research.

Acknowledgements

Funding for this study was provided by Parks Canada Species at Risk Recovery Action and Education Fund, a program supported by the National Strategy for the Protection of Species at Risk, Natural Sciences and Engineering Research Council of Canada, Canada Foundation for Innovation, and l’Université Laval. We are grateful to L. O’Brodovich, D. Frandsen, M.-E. Fortin, K. Dancose and S. Courant for their assistance in the field, to Pierre Racine for his help with SELES, and to James Hodson for his editorial comments on the study.

References

Arthur, S.M., Manly, B.F.J., McDonald, L.L. & Garner, G.W. (1996) Assessing habitat selection when availability changes. Ecology, 77, 215–227.
Bhat, C.R. (2001) Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research Part B-Methodological, 35, 677–693.
Boyce, M.S. (2006) Scale for resource selection functions. Diversity and Distributions, 12, 269–276.
Boyce, M.S., Mao, J.S., Merrill, E.H., Fortin, D., Turner, M.G., Fryxell, J. & Turchin, P. (2003) Scale and heterogeneity in habitat selection by elk in Yellowstone National Park. Ecoscience, 10, 421–431.
Bruun, M. & Smith, H.G. (2003) Landscape composition affects habitat use and foraging flight distances in breeding European starlings. Biological Conservation, 114, 179–187.
Chen, Z. & Kuo, L. (2001) A note on the estimation of the multinomial logit model with random effects. The American Statistician, 55, 89–95.
Compton, B.W., Rhymer, J.M. & McCollough, M. (2002) Habitat selection by wood turtles (Clemmys insculpta): an application of paired logistic regression. Ecology, 83, 833–843.
Cooper, A.B. & Millspaugh, J.J. (1999) The application of discrete choice models to wildlife resource selection studies. Ecology, 80, 566–575.
Craiu, R.V., Duchesne, T. & Fortin, D. (2008) Inference methods for the conditional logistic regression model with longitudinal data. Biometrical Journal, 50, 97–109.
Fall, A. & Fall, J. (2001) A domain-specific language for models of landscape dynamics. Ecological Modelling, 141, 1–18.
Fieberg, J., Rieger, R.H., Zicus, M.C. & Schildcrout, J.S. (2009) Regression modelling of correlated data in ecology: subject-specific and population averaged response patterns. Journal of Applied Ecology, 46, 1018–1025.
Forester, J.D., Im, H.K. & Rathouz, P.J. (2009) Accounting for animal movement in estimation of resource selection functions: sampling and data analysis. Ecology, 90, 3554–3565.
Fortin, D., Fryxell, J.M. & Pilote, R. (2002) The temporal scale of foraging decisions in bison. Ecology, 83, 970–982.
Fortin, D., Beyer, H.L., Boyce, M.S., Smith, D.W., Duchesne, T. & Mao, J.S. (2005) Wolves influence elk movements: behavior shapes a trophic cascade in Yellowstone National Park. Ecology, 86, 1320–1330.
Fortin, D., Courtois, R., Etcheverry, P., Dussault, C. & Gingras, A. (2008) Winter selection of landscapes by woodland caribou: behavioural response to geographical gradients in habitat attributes. Journal of Applied Ecology, 45, 1392–1400.
Fortin, D., Fortin, M.E., Beyer, H.L., Duchesne, T., Courant, S. & Dancose, K. (2009) Group-size-mediated habitat selection and group fusion-fission dynamics of bison under predation risk. Ecology, 90, 2480–2490.
Gillies, C.S., Hebblewhite, M., Nielsen, S.E., Krawchuk, M.A., Aldridge, C.L., Frair, J.L., Saher, D.J., Stevens, C.E. & Jerde, C.L. (2006) Application of random effects to the study of resource selection by animals. Journal of Animal Ecology, 75, 887–898.
Hay, M.E. & Fuller, P.J. (1981) Seed escape from heteromyid rodents – the importance of microhabitat and seed preference. Ecology, 62, 1395–1399.
Hebblewhite, M. & Merrill, E. (2008) Modelling wildlife–human relationships for social species with mixed-effects resource selection models. Journal of Applied Ecology, 45, 834–844.
Hochman, V. & Kotler, B.P. (2007) Patch use, apprehension, and vigilance behavior of Nubian Ibex under perceived risk of predation. Behavioral Ecology, 18, 368–374.
Johnson, C.J., Nielsen, S.E., Merrill, E.H., McDonald, T.L. & Boyce, M.S. (2006) Resource selection functions based on use-availability data: theoretical motivation and evaluation methods. Journal of Wildlife Management, 70, 347–357.
Jones, N.D. & Nealson, T. (2003) Management of aggressive Australian magpies by translocation. Wildlife Research, 30, 167–177.
Keating, K.A. & Cherry, S. (2004) Use and interpretation of logistic regression in habitat selection studies. Journal of Wildlife Management, 68, 774–789.
Koper, N. & Manseau, M. (2009) Generalized estimating equations and generalized linear mixed-effects models for modelling resource selection. Journal of Applied Ecology, 46, 590–599.
Manly, B.F.J., McDonald, L.L., Thomas, D.L., McDonald, T.L. & Erickson, W.P. (2002) Resource Selection by Animals: Statistical Design and Analysis for Field Studies, 2nd edn. Kluwer Academic, Dordrecht.
Mao, J.S., Boyce, M.S., Smith, D.W., Singer, F.J., Vales, D.J., Vore, J.M. & Merrill, E.H. (2005) Habitat selection by elk before and after wolf reintroduction in Yellowstone National Park. Journal of Wildlife Management, 69, 1691–1707.
MathWorks Inc. (2008) MATLAB Software: The Language of Technical Computing, Version R2008a. MathWorks Inc., Natick, MA, USA.
McDonald, T.L., Manly, B.F.J., Nielson, R.M. & Diller, L.V. (2006) Discrete-choice modelling in wildlife studies exemplified by Northern Spotted Owl nighttime habitat selection. Journal of Wildlife Management, 70, 375–383.
McFadden, D. & Train, K. (2000) Mixed MNL models for discrete response. Journal of Applied Econometrics, 15, 447–470.
McLoughlin, P.D., Morris, D.W., Fortin, D., Vander Wal, E. & Contasti, A.L. (2010) Considering ecological dynamics in resource selection functions. Journal of Animal Ecology, 79, 4–12.
Morrison, S., Barton, L., Caputa, P. & Hik, D.S. (2004) Forage selection by collared pikas, Ochotona collaris, under varying degrees of predation risk. Canadian Journal of Zoology, 82, 533–540.
Revelt, D. & Train, K. (1998) Mixed logit with repeated choices: households’ choices of appliance efficiency level. Review of Economics and Statistics, 80, 647–657.
Skrondal, A. & Rabe-Hesketh, S. (2003) Multilevel logistic regression for polytomous data and rankings. Psychometrika, 68, 267–287.
Sukumar, R. (1991) The management of large mammals in relation to male strategies and conflict with people. Biological Conservation, 55, 93–102.
Train, K.E. (2003) Discrete Choice Models With Simulation. Cambridge University Press, Edinburgh.
Train, K.E. (2006) Mixed Logit Estimation by Maximum Simulated Likelihood. Matlab package. Available at: http://elsa.berkeley.edu/Software/abstracts/train1006mxlmsl.html, accessed 3 February 2010.
Verbeke, G. & Molenberghs, G. (2000) Linear Mixed Models for Longitudinal Data. Springer-Verlag, New York.
Whittaker, D.G. & Lindzey, F.G. (2004) Habitat use patterns of sympatric deer species on Rocky Mountain Arsenal, Colorado. Wildlife Society Bulletin, 32, 1114–1123.

Received 26 October 2009; accepted 15 January 2010
Handling Editor: Fanie Pelletier

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Appendix S1. Detailed description of the four simulation scenarios used to assess the effect of heterogeneous habitat selection among animals.
Appendix S2. Calculation of the long-run probabilities of being in a given patch type and theoretical value of the RSF coefficient for patch type H1 under simulation scenario 1.
Appendix S3. MATLAB code to estimate mixed-effects resource selection function for the free-ranging bison of Prince Albert National Park, Saskatchewan, Canada.

As a service to our authors and readers, this journal provides supporting information supplied by the authors. Such materials may be re-organized for online delivery, but are not copy-edited or typeset. Technical support issues arising from supporting information (other than missing files) should be addressed to the authors.