Stat 534
Homework #1, September 10, 2015
Fall 2015
Due: 4 pm, 28 Sept 2015, to my office (2121 Snedecor) or mailbox (in 1121 Snedecor)
Reminders / Notes:
1. Choose 3 of the 4 problems. You are not expected to do all four.
2. Please see the Homework guidelines page of the class web site for homework policies and the format for answering a data question like problem 3.
3. You are encouraged to work together but you must write up your own answers.
4. This is not a programming class and not a calculus class. I will not write R functions for you, but I will certainly help you get your functions working. You are welcome to use Program MARK or other software if it will answer the question. If you do use something other than R, please include that information in your answer.
5. You do not need to include your code, but I might ask for it if your answer is quite different from mine and everyone else’s.
6. I will be in Ireland Sept 24 - 27. No lecture on Thursday 9/24.
7. I will check e-mail while gone, so e-mail if you have any questions. I won’t be able to respond Friday morning (PhD examination) or most of Sunday (traveling).
Problem 1. The following data come from a study of cottontail rabbits, Sylvilagus floridanus.
Animals were captured on 6 occasions. Captured animals were uniquely marked and released.
Because there are so many different capture histories, I’m giving you the data in summarized form. Remember, the number marked in the population ($M_i$) is the number of marked animals prior to that capture occasion.
Capture occasion            1    2    3    4    5    6
# caught, $n_i$            18   23   28   22   25   24
# marked caught, $m_i$      0    7   10   12   19   22
# newly caught, $u_i$      18   16   18   10    6    2
# marked in pop, $M_i$      0   18   34   52   62   68   70
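If you work in R, one possible way to enter the summary table is sketched below. The object names (n, m, u, M) are illustrative choices, not required names. The checks only confirm that the rows of the table agree with each other: newly caught equals caught minus marked caught, and the number marked in the population is the running total of newly caught animals.

```r
# Summary counts from the table above (object names are illustrative)
n <- c(18, 23, 28, 22, 25, 24)   # number caught on each occasion
m <- c(0, 7, 10, 12, 19, 22)     # number of marked animals among those caught

u <- n - m                        # newly caught = caught minus marked caught
M <- c(0, cumsum(u))              # marked before each occasion; last entry is the total ever marked

# Consistency checks against the last two rows of the table
stopifnot(all(u == c(18, 16, 18, 10, 6, 2)))
stopifnot(all(M == c(0, 18, 34, 52, 62, 68, 70)))
```

The vector M has one more entry than the number of occasions because its final element counts the animals marked after the last occasion.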
Assume that the population is closed over the study period. The investigators want to estimate the number of cottontails in this population.
1. Estimate the population size and capture probability under model M0.
2. Estimate the population size and capture probabilities under model Mt.
3. Estimate the population size and capture probabilities under model Mb.
4. Which of these three models is the most appropriate for these data? Briefly explain your choice.
5. Using the most appropriate model, calculate 99% confidence intervals using Wald’s method and the profile method.
6. These two confidence intervals are somewhat different. Using these data or quantities derived from these data, explain why they differ.
7. Calculate the model averaged estimate of the population size and its standard error.
8. The best model that you identified in question 4 may not fit the data well at all. Describe a reasonable method to assess whether a particular model actually fits the data. You don’t need to carry out that assessment.
Note: I do not have a specific answer in mind. This is a “think about something you haven’t seen before question” (although you may have seen something related).
Problem 2. We have discussed model Mt with time-varying parameters and model Mb with behavioural heterogeneity. These can be combined. The general form of model Mtb allows both P[capture]’s to vary by trapping occasion. That is, for each of the i trapping occasions, P[capture | not previously caught] = $c_i$ and P[capture | has been previously caught] = $r_i$. Consider three capture occasions (i.e. a total of 8 possible capture histories).
1. How many parameters are in the general model Mtb for 3 capture occasions?
2. Write out the 8 possible capture histories and their associated probabilities.
3. Write out the log-likelihood function for a multinomial model using 8 possible capture histories. Combine terms wherever possible.
4. Use the simplified expression of the log-likelihood to determine the sufficient statistics.
How many sufficient statistics are there? See notes at the end of the problem description.
5. Are the parameters in the general Mtb identifiable? Explain why or why not.
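The count of 8 capture histories for three occasions can be checked by enumeration. The following is a minimal sketch in R. The object names and the column labels o1 to o3 are illustrative assumptions, and the sketch lists the histories only, not their probabilities.

```r
# Enumerate every 0/1 capture history for k occasions (names are illustrative)
k <- 3
histories <- expand.grid(rep(list(0:1), k))   # one row per history, one column per occasion
names(histories) <- paste0("o", seq_len(k))   # label columns o1, o2, o3

nrow(histories)   # number of histories, 2^k

# The all-zero row is the history of animals that were never caught,
# so its count is not observed in the data.
never_caught <- rowSums(histories) == 0
sum(never_caught)
```

Changing k gives the corresponding list for a different number of occasions.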
One simplification of the general Mtb model is to assume that the initial capture probability differs among occasions, but the recapture probability is the same at all times. That is, for each of the i trapping occasions, P[capture | not previously caught] = $c_i$ and P[capture | has been previously caught] = $r$. Again consider three capture occasions.
6. What are the sufficient statistics for the “simplified” Mtb?
7. Are the parameters in the “simplified” Mtb identifiable? Explain why or why not.
Notes: remember that linear combinations of sufficient statistics do not count as additional sufficient statistics. I.e., if A and B are sufficient statistics, A-B is not a third sufficient statistic.
Sometimes this can be spotted easily. For example, if the log likelihood includes the terms $(x_1 + x_2) \log c_1$, $(x_3 + x_4) \log c_2$, and $(x_1 + x_2 + x_3 + x_4) \log r_2$, it should be clear that there are only two sufficient statistics, $x_1 + x_2$ and $x_3 + x_4$, because $x_1 + x_2 + x_3 + x_4$ is not “new” information.
Sometimes, it can be hard to spot all the linear dependencies. Here’s a way to do that numerically: Collect all the necessary terms, e.g. $x_1 + x_2$, into a vector, then write that vector as the product of a matrix of coefficients times the vector of multinomial cell counts. For the situation in the previous paragraph, that means:
\[
\begin{pmatrix} x_1 + x_2 \\ x_3 + x_4 \\ x_1 + x_2 + x_3 + x_4 \end{pmatrix}
= C \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix},
\quad \text{where} \quad
C = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}
\]
The number of sufficient statistics is the row rank of $C$.
This can be computed in R either by doing a QR decomposition, qrx = qr(C) and looking at qrx$rank or by doing a singular value decomposition of C and looking at the number of non-zero singular values: svdx = svd(C); round(svdx$d, 6) . The round function is used to print values of d that are really close to 0, e.g. 4.69E-16, as 0.
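As a minimal sketch, here are both rank calculations applied to the example matrix C from the previous paragraph. The object name rank_svd is an illustrative addition. Counting the singular values that are non-zero after rounding to 6 decimals is one simple convention, not the only reasonable tolerance.

```r
# Coefficient matrix from the example: one row per term in the log likelihood
C <- matrix(c(1, 1, 0, 0,
              0, 0, 1, 1,
              1, 1, 1, 1),
            nrow = 3, byrow = TRUE)

# Method 1: QR decomposition
qrx <- qr(C)
qrx$rank

# Method 2: singular value decomposition, counting non-zero singular values
svdx <- svd(C)
round(svdx$d, 6)                        # values that are numerically 0 print as 0
rank_svd <- sum(round(svdx$d, 6) > 0)   # illustrative helper: count of non-zero values
rank_svd
```

Both methods should report a rank of 2 here, because the third row of C is the sum of the first two.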
Problem 3. One of the classic data sets in the mark-recapture world was collected by French scientists on the European Dipper. This is a small drab bird that lives along mountain streams.
Birds are individually marked. The primary data set is annual data for many years, where the population is not closed. The data set in dipper.csv contains multiple censuses within one year, and the population can be assumed to be closed. The columns are the capture history (o1-o4), the number of females (f), and the number of males (m) with that capture history. The capture history is coded as caught on that occasion (1) or not caught (0).
The investigators are interested in: a) The number of dippers in the study area b) Whether the sex ratio deviates from 1:1, and c) Whether the capture probability or probabilities are the same for males and females.
What can you tell the investigators?
Problem 4. This problem evaluates the consequences of design choices. If you suspect that you will get a large se for $\hat{N}$, you could get more data in one of two ways: a) sample more occasions, or b) put more effort into sampling each occasion to increase the capture probability.
This problem illustrates how the relative merits of each approach could be evaluated.
We consider model Mb for a population with N=200 individuals. The standard capture method has a first-time capture probability of c = 0.2. The “baseline” scenario has 3 capture occasions.
Because recaptures contain no information about N , the recapture probability is not specified.
1. Determine the 2 x 2 observed information matrix for N and c. This will be a function of N, c, and one or more sufficient statistics.
2. Calculate the expected values of the sufficient statistics for the “baseline” design, in which N = 200 and c = 0.2.
3. Substitute these into the observed information matrix and calculate the standard error for $\hat{N}$.
4. Repeat parts 2 and 3 when there are 4 capture occasions.
5. Repeat parts 2 and 3 when you increase sampling effort so that the first-time capture probability increases to 0.3.
6. What advice will you give to the biologist?