This site contains statistical tools for analyzing data from satellite dive recorders, as
described in:
Simpkins, M. A., K. L. Laidre and P. J. Heagerty. 2005. Multivariate regression of
satellite-linked dive recorder data: simultaneous analysis of all bins. Marine Mammal
Science in press.
Specifically, this site includes statistical code and descriptions for two integrated R or
Splus 6.1 functions that use the output from the R / Splus glm function and estimate an
empirical variance-covariance matrix using “working independence” generalized
estimating equations (GEE). Similar analyses can also be conducted using GEE
options of the SAS procedure GENMOD, or using the “cluster(id)” option with the
Poisson regression methods in STATA. The functions shown below were developed by
Thomas Lumley PhD (UW Biostatistics) and used by Simpkins et al. (2005) in their
analysis of harbor seal diving behavior.
To utilize the functions below, one must first create a generalized linear model (GLM)
using the R / Splus function glm. Simpkins et al. (2005) assumed complete
independence among data points for the GLM of harbor seal diving behavior. The
empirical variance estimation conducted within GEE, using the functions shown below,
provides a valid large-sample estimate of the variance-covariance matrix for the
regression estimates even when an incorrect correlation model (e.g., complete
independence) is specified.
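In symbols, the empirical ("sandwich") estimate computed by the functions on this page can be written as follows. The notation is introduced here for illustration and is not taken from Simpkins et al. (2005): K is the number of individuals, x_ij is the covariate vector for record j of individual i, and w_ij and r_ij are the corresponding working weight and working residual from the fitted GLM.

```latex
\widehat{\mathrm{Var}}(\hat{\beta})
  = A^{-1} \left( \sum_{i=1}^{K} U_i U_i^{\top} \right) A^{-1},
\qquad
A = X^{\top} W X,
\qquad
U_i = \sum_{j \in \text{individual } i} x_{ij} \, w_{ij} \, r_{ij}
```

Here A^{-1} is the unscaled model-based covariance matrix from the GLM fit, and U_i is the score contribution summed over all records for individual i. No small-sample correction appears in this form.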
After the GLM analysis is complete, the resulting glm object can be used as input into
the infjack.glm function shown below. This function, in turn, calls the estfun.glm function
and estimates an empirical variance-covariance matrix for the GLM, using GEE
techniques. The resulting empirical variance-covariance matrix is estimated based on
the observed variance among individuals. The infjack.glm function requires that each
data point be coded by individual, using a “group” variable (IDvector below). The group
variable should be a single column that codes each data record from the original
database with an identification code for the associated individual (unique code for each
individual). After a GLM is created and the resulting glm object is saved (e.g., as “fit”
below), the empirical variance-covariance matrix for the model can be estimated using
the infjack.glm and estfun.glm functions. Both functions are shown below and can be
copied and pasted into R / Splus at the command line to create the functions within an
active R / Splus 6.1 workspace. Text files infjack.txt and estfun.txt are available for
download via links at the bottom of this page.
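One practical point about the group variable: it needs one entry per record actually used in the fit. If glm drops records with missing values, a group vector taken straight from the original database will be too long. The following is a minimal sketch in R (not tested in Splus) of one way to align the two. The data frame and the column names seal, depth and dives are hypothetical.

```r
# Hypothetical data: 'seal' identifies the individual; one covariate value is missing
dat <- data.frame(
  seal  = c("A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "D", "D"),
  depth = c(1.2, 0.8, NA, 2.1, 1.9, 2.4, 0.5, 0.7, 0.6, 1.5, 1.1, 1.3),
  dives = c(4, 3, 5, 9, 7, 10, 2, 3, 1, 6, 4, 5)
)

fit <- glm(dives ~ depth, family = poisson, data = dat)

# Row numbers dropped because of missing values (empty if none were dropped)
dropped <- as.integer(na.action(fit))

# Keep only the ID codes for the rows that entered the model
IDvector <- if (length(dropped) > 0) dat$seal[-dropped] else dat$seal

# The group vector must have one entry per row used in the fit
stopifnot(length(IDvector) == nrow(model.frame(fit)))
```

The final check is cheap and is probably worth running before every call to infjack.glm.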
Interested parties should refer to Simpkins et al. (2005) for clarification regarding the
use of these functions for analyzing SDR data. Further questions should be addressed
to the authors directly.
Example Usage:
#
# Fit a regression model
#
fit <- glm( y ~ x1 + x2 )
#
# Assume IDvector is a vector that
# denotes unique individuals (clusters)
#
# Calculate empirical covariance
#
EmpCov <- infjack.glm( fit, IDvector )
#
# Print coefficient estimate, standard error, and Z statistic
#
beta.estimate <- fit$coef
EmpSE <- sqrt( diag( EmpCov ) )
Z.stat <- beta.estimate/EmpSE
#
print( cbind( beta.estimate, EmpSE, Z.stat ) )
#
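The table printed above can be extended with two-sided p-values and Wald confidence intervals. These use the same large-sample normal approximation that justifies the empirical variance estimate. The helper wald.table below is a hypothetical addition, not part of the original code, and the numbers passed to it are made up for illustration.

```r
# Hypothetical helper (not part of the original code): Wald summary from a
# coefficient vector and a covariance matrix such as EmpCov
wald.table <- function(beta, covmat, level = 0.95) {
  se   <- sqrt(diag(covmat))            # standard errors
  z    <- beta / se                     # Z statistics
  crit <- qnorm(1 - (1 - level) / 2)    # normal critical value
  cbind(
    estimate = beta,
    SE       = se,
    Z        = z,
    p.value  = 2 * pnorm(-abs(z)),      # two-sided p-value
    lower    = beta - crit * se,
    upper    = beta + crit * se
  )
}

# Made-up values for illustration only
beta   <- c("(Intercept)" = 1.0, x1 = 0.5)
covmat <- matrix(c(0.04, 0.00, 0.00, 0.0625), nrow = 2)
print(wald.table(beta, covmat))
```

With the objects from the example usage, the call would be wald.table(fit$coef, EmpCov).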
Function 1: estfun.glm
estfun.glm <- function(glm.obj)
{
    ########################################
    # Original author: T. Lumley
    #                  Biostatistics
    #                  University of Washington
    ########################################
    #
    ##
    # Create X matrix from glm object
    ##
    if(is.matrix(glm.obj$x))
        xmat <- glm.obj$x
    else {
        mf <- model.frame(glm.obj)
        xmat <- model.matrix(terms(glm.obj), mf)
    }
    ##
    # Calculate variance weight and residual (Y-fitted)
    ##
    output <- residuals(glm.obj, "working") * glm.obj$weights * xmat
    ##
    # Output this matrix
    ##
    output
}
Function 2: infjack.glm
infjack.glm <- function(glm.obj, groups)
{
    ########################################
    # Original author: T. Lumley
    #                  Biostatistics
    #                  University of Washington
    ########################################
    #
    ##
    # Run estfun.glm to get GLM score function
    ##
    umat <- estfun.glm(glm.obj)
    ##
    # Sum scores within each cluster (individual)
    ##
    usum <- rowsum(umat, groups, reorder=FALSE)
    ##
    # Calculate the empirical variance of the summed scores
    # and then compute sandwich variance-covariance matrix
    ##
    modelv <- summary(glm.obj)$cov.unscaled
    output <- modelv %*% (t(usum) %*% usum) %*% modelv
    ##
    # Output empirical variance-covariance matrix
    ##
    output
}
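To see what the two functions compute, the same steps can be run inline on simulated data. This is only a sketch in R: the simulated data set, the seed, the Poisson family and all variable names other than those used in the functions above are assumptions made for illustration, not part of the original analysis.

```r
set.seed(1)

# Simulated clustered counts: 20 individuals, 10 records each (illustrative only)
n.id  <- 20
n.rec <- 10
id <- rep(seq_len(n.id), each = n.rec)
x1 <- rnorm(n.id * n.rec)
# A shared per-individual effect induces within-individual correlation
b  <- rep(rnorm(n.id, sd = 0.5), each = n.rec)
y  <- rpois(n.id * n.rec, lambda = exp(0.5 + 0.3 * x1 + b))

# GLM fitted as if all records were independent
fit <- glm(y ~ x1, family = poisson)

# Per-record score contributions (the calculation in estfun.glm)
xmat <- model.matrix(fit)
umat <- residuals(fit, "working") * fit$weights * xmat

# Sum scores within individuals, then form the sandwich (as in infjack.glm)
usum   <- rowsum(umat, id, reorder = FALSE)
modelv <- summary(fit)$cov.unscaled
EmpCov <- modelv %*% crossprod(usum) %*% modelv

# Compare model-based and empirical standard errors
print(cbind(naive.SE = sqrt(diag(modelv)), EmpSE = sqrt(diag(EmpCov))))
```

When records within an individual are positively correlated, as simulated here, the empirical standard errors would typically be expected to differ noticeably from the model-based ones, particularly for the intercept.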
Download