Krishna Pacifici
Department of Applied Ecology
NCSU
January 10, 2014
Why, what, and how?
Why collect the data?
What type of data to collect?
How should the data be collected in the field and then analyzed?
Clear objectives help relate all three components.
Why?
Clear objectives
How will the data be used to discriminate between scientific hypotheses about a system?
How will the data be used to make management decisions?
For example:
Determine the overall level of occupancy for a species in a particular region.
Compare the level of occupancy in two different habitat types within that region.
What?
Many kinds of data
Population-level
Population size/density
Survival
Immigration & emigration
Presence/absence
Community-level
Persistence
Colonization & extinction
Species richness/diversity
How?
Sampling and Modeling
Interest lies in making inference from a sample to a population
Statistics!
Want it to be repeatable and accurate
Others should understand what you have done and be able to replicate
Many different modeling/analysis approaches
Distance sampling, multiple observer, capture-recapture, occupancy modeling…
ESTIMATE ATTRIBUTES (PARAMETERS)
Abundance/density
Survival
Occurrence probability
ALLOW LEGITIMATE EXTRAPOLATION FROM DATA TO POPULATIONS
PROVIDE MEASURES OF STATISTICAL RELIABILITY
ACCURATE: LEADING TO UNBIASED ESTIMATES
REPEATABLE: ESTIMATES LEAD TO SIMILAR ANSWERS
EFFICIENT: DO NOT WASTE RESOURCES
HOW GOOD “ON AVERAGE” AN ESTIMATE IS
CANNOT TELL FROM A SINGLE SAMPLE
DEPENDS ON SAMPLING DESIGN, ESTIMATOR, AND ASSUMPTIONS
[Figure: SAMPLE ESTIMATES (*) plotted around the TRUE VALUE; BIAS is the difference between the AVERAGE ESTIMATE and the TRUE VALUE]
CAN BE IMPRECISE BUT UNBIASED... OR
[Figure: further panels of SAMPLE ESTIMATES, AVERAGE ESTIMATE, and TRUE VALUE]
ACCURATE = UNBIASED & PRECISE
[Figure: SAMPLE ESTIMATES, AVERAGE ESTIMATE, and TRUE VALUE]
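The ideas of bias, precision, and accuracy can be checked with a small calculation. This is a minimal sketch that is not part of the original slides: the function name `summarize_estimates` and all numbers are hypothetical. It relies on the standard identity that mean squared error equals bias squared plus variance.

```python
from statistics import mean, pvariance


def summarize_estimates(estimates, true_value):
    """Return bias, variance, and mean squared error of repeated estimates."""
    average_estimate = mean(estimates)
    bias = average_estimate - true_value          # average estimate minus truth
    variance = pvariance(estimates)               # spread around the average estimate
    mse = mean([(e - true_value) ** 2 for e in estimates])  # spread around the truth
    return {"bias": bias, "variance": variance, "mse": mse}


# Hypothetical repeated estimates of a population whose true size is 100.
unbiased_imprecise = summarize_estimates([60, 140, 80, 120], true_value=100)
biased_precise = summarize_estimates([118, 122, 119, 121], true_value=100)

print(unbiased_imprecise)
print(biased_precise)
```

An estimator is accurate only when both the bias and the variance terms are small, which is why the mean squared error is a convenient single summary.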
HOW DO WE MAKE ESTIMATES ACCURATE?
KEEP BIAS LOW
SAMPLE TO ADEQUATELY REPRESENT POPULATION
ACCOUNT FOR DETECTION
KEEP VARIANCE LOW
REPLICATION (ADEQUATE SAMPLE SIZE)
STRATIFICATION, RECORDING OF COVARIATES, BLOCKING
Spatial sampling
Proper consideration and incorporation of detectability
What is the objective?
What is the target population?
What are the appropriate sampling units?
Size, shape, placement
Quantities measured
Field sampling must be representative of the population of inference
Incomplete detection MUST be accounted for in sampling and estimation
What is the objective?
Unbiased estimate of population density of snakes (e.g., cobras) in Corbett National Park
Coefficient of variation of estimate < 20%
As cost efficient as possible
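The precision target above (coefficient of variation < 20%) can be checked from replicate sampling units. This is a minimal sketch under stated assumptions: the units are a simple random sample, the estimate is the mean density across units, and the densities listed are hypothetical values, not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def coefficient_of_variation(unit_densities):
    """CV of the mean density: standard error divided by the estimate."""
    n = len(unit_densities)
    estimate = mean(unit_densities)
    standard_error = stdev(unit_densities) / sqrt(n)
    return standard_error / estimate


# Hypothetical densities (snakes per hectare) from 8 sampling units.
densities = [2.0, 3.5, 1.5, 4.0, 2.5, 3.0, 1.0, 2.5]
cv = coefficient_of_variation(densities)
meets_target = cv < 0.20

print(f"CV = {cv:.3f}, meets 20% target: {meets_target}")
```

Because the standard error shrinks with the square root of the number of units, this calculation is also a quick way to gauge how many more units a pilot study suggests are needed.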
What is the target population?
Population in the NP
What are the appropriate sampling units?
Quadrats?
Point samples?
Line transects?
Sampling units: nonrandom placement (e.g., along a road)
Advantages
Easy to lay out
More convenient to sample
Disadvantages
Do not represent other (off road) habitats
Road may attract (or repel) snakes
Sampling units: random placement
Advantages
Valid statistical design
Represents study area
Replication allows variance estimation
Disadvantages
May be logistically difficult
Harder to lay out
May not work well in heterogeneous study areas
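Random placement can be sketched as drawing units from a list of all candidate units (the sampling frame). The example assumes the study area has been divided into a hypothetical 10 x 10 grid of equal-sized cells; the function name and grid size are illustrative only.

```python
import random


def select_random_units(n_rows, n_cols, n_units, seed=None):
    """Draw n_units grid cells without replacement from an n_rows x n_cols grid."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    all_cells = [(row, col) for row in range(n_rows) for col in range(n_cols)]
    return sorted(rng.sample(all_cells, n_units))


units = select_random_units(n_rows=10, n_cols=10, n_units=12, seed=42)
print(units)
```

Recording the seed alongside the design lets others reproduce exactly which units were selected.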
Stratified sampling
Advantages
Controls for heterogeneous study area
Allows estimation of density by strata
More precise estimate of overall density
Disadvantages
More complex design
May require larger total sample
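A stratified estimate of overall density weights each stratum's mean density by its share of the total area. The sketch below assumes simple random sampling within each stratum and ignores finite population corrections; the strata names, areas, and densities are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def stratified_density(strata):
    """Return (overall density, its standard error, density by stratum)."""
    total_area = sum(s["area"] for s in strata)
    overall = 0.0
    overall_var = 0.0
    by_stratum = {}
    for s in strata:
        weight = s["area"] / total_area            # stratum share of the study area
        n = len(s["densities"])
        by_stratum[s["name"]] = mean(s["densities"])
        overall += weight * by_stratum[s["name"]]
        overall_var += weight ** 2 * variance(s["densities"]) / n
    return overall, sqrt(overall_var), by_stratum


strata = [
    {"name": "forest", "area": 300, "densities": [4, 5, 6]},
    {"name": "grassland", "area": 100, "densities": [1, 2, 3]},
]
overall, se, by_stratum = stratified_density(strata)
print(overall, se, by_stratum)
```

When density differs strongly between strata, the within-stratum variances are small relative to the overall variance, which is where the gain in precision comes from.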
Single, unreplicated line
Some violations of assumptions can be OK – and even necessary (idea of “robustness”)
These are ideals to strive toward
Good if you can achieve them
If you can’t, you can’t, but study results may need a different interpretation
Estimation: from Count Data to Population (I)
Geographic variation (can’t look everywhere)
Frequently counts/observations cannot be conducted over entire area of interest
Proper inference requires a spatial sampling design that permits inference about entire area, based on a sample
A valid sampling design
Allows valid probability inference about the population
Statistical model
Allows estimates of precision
Replication, independence
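Inference from sampled plots to the entire area, with an estimate of precision, can be sketched as follows. The example assumes a simple random sample of equal-sized plots and, for now, that every animal on a sampled plot is counted; the plot counts and the number of plots in the area are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def expand_counts(plot_counts, n_plots_total):
    """Estimate the total over all plots, with its standard error."""
    n = len(plot_counts)
    total_hat = n_plots_total * mean(plot_counts)
    fpc = 1 - n / n_plots_total                    # finite population correction
    se = n_plots_total * sqrt(fpc * variance(plot_counts) / n)
    return total_hat, se


# Hypothetical counts on 5 plots sampled at random from 50 plots in the area.
total_hat, se = expand_counts([2, 0, 3, 1, 4], n_plots_total=50)
print(total_hat, se)
```

The standard error is only available because the plots are replicated; a single plot gives a count but no measure of precision.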
Systematic sampling
Can approximate random sampling in some cases
Cluster sampling
When the biological units come in clusters
Double sampling
Very useful for detection calibration
Adaptive sampling
More efficient when populations are distributed “clumpily”
Dual-frame sampling
Estimation: from Count to Population (II)
Detectability (can’t see everything in places where you do look)
Counts represent some unknown fraction of animals in sampled area
Proper inference requires information on detection probability
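A count can be turned into an abundance estimate by dividing by the estimated detection probability and, where only part of the area was surveyed, by the fraction of area sampled. This is a minimal sketch of that adjustment; the function name and the numbers are hypothetical, and in practice the detection probability must itself be estimated from the survey design.

```python
def adjust_count(count, detection_prob, area_fraction=1.0):
    """Abundance estimate: count / (detection probability * fraction of area sampled)."""
    if not 0 < detection_prob <= 1:
        raise ValueError("detection_prob must be in (0, 1]")
    if not 0 < area_fraction <= 1:
        raise ValueError("area_fraction must be in (0, 1]")
    return count / (detection_prob * area_fraction)


# Hypothetical survey: 30 animals counted, 60% detection, 25% of the area sampled.
n_hat = adjust_count(count=30, detection_prob=0.6, area_fraction=0.25)
print(round(n_hat))
```

Treating the raw count as abundance is equivalent to assuming a detection probability of 1, which biases the estimate low whenever animals are missed.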
Field sampling must be designed to meet study or conservation objectives
Field sampling must be representative of the population of inference
Incomplete detection MUST be accounted for in sampling and estimation
Species status = present or absent
Coarse measure of population status
Proportion of occupied patches
Data can be collected efficiently over large spatial and temporal extents
Species and community-level dynamics
Surveys of geographic range
Habitat relationships
Metapopulation dynamics
Observed colonization and extinction
Extensive monitoring programs: 'trends' or changes in occupancy over time
Conduct “presence-absence” (detection-nondetection) surveys.
Estimate what fraction of sites (or area) is occupied by a species when the species is not always detected with certainty, even when present (p < 1).
‘Site’: Arbitrarily defined spatial unit (forest patch of a specified size) or discrete naturally occurring sampling units (ponds).
MacKenzie et al. 2002 (Ecology)
Key design issues: Replication
Temporal replication: repeat visits to sample units
Replicate visits occur within a relatively short period of time
(e.g., a breeding season)
Spatial replication: randomly selected ‘sites’ or sample units within area of interest
Basic Sampling Scheme: Single Season
s sites are surveyed, each at k distinct sampling occasions.
Species is detected/not detected at each occasion.
Necessary information:
Data summary → Detection histories
Detection history: Record for each visited site or sample unit
1 denotes detection
0 denotes nondetection
Example detection history: h_i = 1 0 0 1 0
Denotes 5 visits to the site
Target species detected during visits 1 and 4
0 does not necessarily mean the species was absent
Not detected, but could be there!
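Detection histories are easy to hold as lists of 0s and 1s. The sketch below parses the example history and computes the naive occupancy (the proportion of sites with at least one detection); the other three histories are hypothetical. Because a 0 does not mean absence, the naive proportion tends to underestimate true occupancy when p < 1.

```python
def parse_history(text):
    """Convert a string such as '1 0 0 1 0' into a list of integers."""
    return [int(value) for value in text.split()]


def naive_occupancy(histories):
    """Proportion of sites where the species was detected at least once."""
    sites_with_detection = sum(1 for history in histories if any(history))
    return sites_with_detection / len(histories)


histories = [
    parse_history("1 0 0 1 0"),  # the example history: detected on visits 1 and 4
    parse_history("0 0 0 0 0"),  # never detected (absent, or present but missed)
    parse_history("0 1 0 0 0"),
    parse_history("0 0 0 0 0"),
]
visits_with_detection = [j + 1 for j, y in enumerate(histories[0]) if y == 1]
naive = naive_occupancy(histories)
print(visits_with_detection, naive)
```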
Model Parameters: Single-Season Models
ψ_i: probability that site i is occupied.
p_ij: probability of detecting the species at site i at time j, given that the species is present.
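These two parameters combine into the probability of each detection history. The sketch below assumes the simplest case, with occupancy (psi) and detection (p) constant across sites and occasions, and finds the estimates by a coarse grid search rather than the numerical optimizer a dedicated package would use. The function names and the six detection histories are hypothetical.

```python
from math import log


def history_probability(history, psi, p):
    """Probability of one detection history under constant psi and p."""
    prob_occupied = psi
    for y in history:
        prob_occupied *= p if y == 1 else (1 - p)
    if any(history):
        return prob_occupied
    # All zeros: occupied but never detected, or truly unoccupied.
    return prob_occupied + (1 - psi)


def log_likelihood(histories, psi, p):
    return sum(log(history_probability(h, psi, p)) for h in histories)


def fit_by_grid(histories, steps=99):
    """Return the (psi, p) pair on a 0.01..0.99 grid with the highest likelihood."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    candidates = ((psi, p) for psi in grid for p in grid)
    return max(candidates, key=lambda pair: log_likelihood(histories, *pair))


# Hypothetical data: 6 sites, 3 visits each.
data = [[1, 0, 1], [0, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0], [0, 0, 1]]
psi_hat, p_hat = fit_by_grid(data)
print(psi_hat, p_hat)
```

For the history 1 0 0 1 0 with psi = 0.8 and p = 0.5, `history_probability` gives 0.8 × 0.5⁵ = 0.025, while an all-zero history of the same length gives 0.025 + 0.2 = 0.225.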
• Sites are closed to changes in occupancy state between sampling occasions
• No heterogeneity that cannot be explained by covariates
• The detection process is independent at each site (e.g., sites > 500 meters apart)
Usually conducted as multiple discrete visits (e.g., on different days)
Can also use multiple surveys within a single visit
Multiple independent observers
Potentially introduce heterogeneity into data
Single visit to each site vs. multiple visits to each site
Rotate observers amongst sites on each day
Rotate order each site is sampled within a day
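Rotating observers among sites and rotating the order in which sites are visited can both be laid out as a simple schedule. This is a minimal sketch with hypothetical observer and site names; it shifts the pairing by one position each day so that no observer or time of day is tied to a single site.

```python
def rotation_schedule(observers, sites, n_days):
    """Return, for each day, the site visit order and the observer at each site."""
    schedule = []
    for day in range(n_days):
        shift = day % len(sites)
        visit_order = sites[shift:] + sites[:shift]   # rotate which site is first
        observer_at = {
            site: observers[(i + day) % len(observers)]  # rotate observers
            for i, site in enumerate(sites)
        }
        schedule.append(
            {"day": day + 1, "visit_order": visit_order, "observer_at": observer_at}
        )
    return schedule


schedule = rotation_schedule(
    observers=["Obs A", "Obs B", "Obs C"],
    sites=["Site 1", "Site 2", "Site 3"],
    n_days=3,
)
for entry in schedule:
    print(entry)
```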
Several important issues to consider:
1. Clear objectives that are explicitly linked to science or management
2. Selection of sampling units
   Probabilistic sampling design
   Size of unit relative to species of interest
3. Timing of repeat surveys
   Sites assumed “closed” between surveys
   Relaxed for lab project
4. Allocation of survey effort
   Survey all of the sites an equal number of times?
Getting To Know PRESENCE
PRESENCE is software that has been developed to apply these models to collected data.
Within PRESENCE you can fit multiple models to your data.
PRESENCE stores the results from each model and presents a summary of the results in a model selection table using AIC.
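The arithmetic behind an AIC model selection table can be sketched in a few lines. This is an illustration only and does not reproduce PRESENCE's own output or layout; the model names, log-likelihoods, and parameter counts are hypothetical.

```python
from math import exp


def aic_table(models):
    """models maps a model name to (log-likelihood, number of parameters)."""
    aic = {name: 2 * k - 2 * loglik for name, (loglik, k) in models.items()}
    best = min(aic.values())
    relative = {name: exp(-0.5 * (value - best)) for name, value in aic.items()}
    total = sum(relative.values())
    rows = [
        (name, aic[name], aic[name] - best, relative[name] / total)
        for name in aic
    ]
    return sorted(rows, key=lambda row: row[1])   # lowest AIC first


table = aic_table({
    "psi(.),p(.)": (-120.0, 2),
    "psi(habitat),p(.)": (-117.5, 3),
})
for name, aic, delta, weight in table:
    print(f"{name:20s} AIC={aic:.1f} dAIC={delta:.1f} w={weight:.3f}")
```

Each row gives the AIC, the difference from the best model, and the AIC weight, which together indicate how much support the data give each model relative to the others in the set.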
The analysis is stored in a project file (created from the File menu).
A project consists of 3 files, *.pao, *.pa2 and *.pa2.out
*.pao is the data file
*.pa2 stores a summary of the models fit to the data
*.pa2.out stores the full results for all the models
PRESENCE consists of 2 main windows:
Point and click window
Number crunching window
When you create a new project, you must specify the data file (if previously created), or input the data to be analysed.
Once the data file has been defined and selected, the filename for the project file will be the same as the data file.
To enter data, specify the number of sites, survey occasions, and site-specific and survey-specific (sampling) covariates.
Then select the Input Data Form.
The No. Occasions/season box is used for multi-season data. You must list the number of surveys per season, separated by commas.
Data can be copied and pasted (via the menus only) from a spreadsheet into each respective tab.
You can also enter data directly, or insert from a comma delimited text (.csv) file.
Note the number of PRESENCE-related windows now open.
Once data has been entered, you must save the data before closing the window!
After saving your data and closing the data window, check that the correct data filename appears here. If not, you will have to select the file manually.
Make sure you click OK before proceeding.
After setting up your project, an empty Results Browser window should appear.
The type of analysis to perform is selected from the Run menu.
Make sure you see this before attempting to run any models!