WTCHG course in statistical modelling and data analysis
Likelihood: Practical exercises using Matlab
Gil McVean
1. Calculating MLEs and likelihood surfaces
In population genetics the distribution of the number of mutational differences, k, between the copies of a
gene in two randomly sampled chromosomes is described by the formula
$$P(k \mid \theta) = \left(\frac{1}{1+\theta}\right)\left(\frac{\theta}{1+\theta}\right)^{k},$$
where $\theta$ is a quantity called the population mutation rate. Answer the following questions.
• What name is given to this form of distribution? [Hint: look back at the distributions lecture]
• What is the expected number of mutational differences between two randomly sampled genes?
• Write down the log-likelihood function and find an expression for the maximum likelihood estimate.
• If I observe 23 differences between a pair of genes at one locus and 15 at another, what is the maximum likelihood estimate of $\theta$?
• Draw the log-likelihood surface for $\theta$ for the above example. Find the values of $\theta$ for which the log-likelihood is 2 units less than the log-likelihood at the MLE.
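One possible way to approach the last two points numerically is sketched below. It assumes the two loci are independent, so that their log-likelihoods add, and it evaluates the log-likelihood on a grid of $\theta$ values. The grid range and spacing, and the variable names (`loglik`, `thetaHat`, `thetaLo`, `thetaHi`), are arbitrary choices made for illustration; the grid maximum should be checked against your own analytical expression for the MLE.

```matlab
% Observed numbers of differences at the two loci (from the exercise).
k = [23 15];

% Log-likelihood of theta, assuming independent loci, under
% P(k|theta) = (1/(1+theta)) * (theta/(1+theta))^k.
loglik = @(theta) sum(k) .* log(theta) - (sum(k) + numel(k)) .* log(1 + theta);

% Evaluate on a grid (range and spacing are arbitrary choices).
theta = 0.5:0.01:300;
ll = loglik(theta);

% Numerical maximum on the grid.
[llMax, iMax] = max(ll);
thetaHat = theta(iMax);

% Grid values whose log-likelihood lies within 2 units of the maximum.
inside = theta(ll >= llMax - 2);
thetaLo = min(inside);
thetaHi = max(inside);

fprintf('MLE approx %.2f; within 2 units for theta in approx [%.2f, %.2f]\n', ...
        thetaHat, thetaLo, thetaHi);
```

Calling `plot(theta, ll)` afterwards draws the surface. With only two observations the curve is quite flat to the right of its maximum, so a wide grid is needed to capture the upper limit.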
2. Likelihood ratio tests
In the previous question we informally compared different values for the parameter using the log-likelihood
surface. In this question we are going to look more formally at the idea of likelihood ratio tests. The pdf for
the Normal distribution is
$$f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^{2}}(x-\mu)^{2}}$$
i. Simulate 10 random variables from a Normal(0,1) distribution using Matlab (either using the in-built function randn or using the polar transformation method).
ii. Find the MLE for the parameter $\mu$ given your sample, assuming that you know the variance. Record the log-likelihood at the MLE.
iii. Find the log-likelihood for $\mu = 0$. Calculate twice the difference in log-likelihood between the MLE and the truth. Record this number.
iv. Repeat steps i-iii 1000 times and plot a histogram of the quantity calculated in iii. Using a qqplot, compare this distribution to that of a chi-squared distribution with one degree of freedom (you can do this by simulating normal random variables and squaring them). What is the probability of observing twice the difference in log-likelihood of more than 3.84?
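Steps i-iv could be organised along the following lines. This is only a sketch: it uses `randn` rather than the polar method, takes the sample mean as the MLE of $\mu$ when the variance is known, and the variable names (`stat`, `ref`, `pStat`, `pRef`) are illustrative rather than prescribed.

```matlab
n = 10;        % sample size per replicate
nRep = 1000;   % number of replicates
sigma = 1;     % known standard deviation

% Log-likelihood of mu for a sample x (column vector) with known sigma.
loglik = @(x, mu) sum(-0.5 * log(2 * pi * sigma^2) - (x - mu).^2 / (2 * sigma^2));

stat = zeros(nRep, 1);
for r = 1:nRep
    x = randn(n, 1);                                  % step i: Normal(0,1) sample
    muHat = mean(x);                                  % step ii: MLE of mu, sigma known
    stat(r) = 2 * (loglik(x, muHat) - loglik(x, 0));  % step iii: twice the difference
end

% Step iv: reference sample from chi-squared(1), as squared standard normals.
ref = randn(nRep, 1).^2;

% Proportion of values exceeding 3.84 in each sample.
pStat = mean(stat > 3.84);
pRef = mean(ref > 3.84);
fprintf('P(stat > 3.84) approx %.3f; reference approx %.3f\n', pStat, pRef);
```

`hist(stat, 50)` gives the histogram. Because `stat` and `ref` have the same length, `plot(sort(ref), sort(stat), '.')` is a simple quantile-quantile comparison that does not need any toolbox function.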
v. Repeat steps i-iv, but estimate both the mean and the variance of the normal distribution for each sample of size 10. How does estimating two parameters affect the distribution of the quantity calculated in iii?
vi. Why is the index parameter of the chi-squared distribution referred to as the degrees of freedom?
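For step v, a minimal sketch is given below. It assumes the MLE of the variance is the mean squared deviation about the sample mean (divisor n rather than n - 1) and compares the maximised log-likelihood with the log-likelihood at the true values $\mu = 0$, $\sigma^2 = 1$. The names `loglik2`, `s2Hat` and `stat2` are illustrative.

```matlab
n = 10;        % sample size per replicate
nRep = 1000;   % number of replicates

% Log-likelihood of (mu, s2) for a sample x, where s2 is the variance.
loglik2 = @(x, mu, s2) sum(-0.5 * log(2 * pi * s2) - (x - mu).^2 / (2 * s2));

stat2 = zeros(nRep, 1);
for r = 1:nRep
    x = randn(n, 1);                 % Normal(0,1) sample
    muHat = mean(x);                 % MLE of the mean
    s2Hat = mean((x - muHat).^2);    % MLE of the variance (divisor n)
    % Twice the difference between the MLE and the truth (mu = 0, s2 = 1).
    stat2(r) = 2 * (loglik2(x, muHat, s2Hat) - loglik2(x, 0, 1));
end

% Proportion of replicates exceeding the one-parameter threshold 3.84.
p2 = mean(stat2 > 3.84);
fprintf('P(stat2 > 3.84) approx %.3f\n', p2);
```

Comparing `hist(stat2, 50)` with the histogram from step iv shows how the distribution shifts when a second parameter is estimated.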
3. Sufficient statistics
[Hard]. Let $X_1, X_2, \dots, X_n$ be a series of n random variables sampled from the distribution
$$f(x \mid \theta) = \theta x^{-2}, \qquad 0 < \theta \le x < \infty.$$
Answer the following.
• What is a sufficient statistic for $\theta$?
• Find the MLE of $\theta$.
• Find the method of moments estimator of $\theta$.
• Obtain the cdf of the distribution and use this to sample 10 random variables from the distribution with $\theta = 1$. Obtain the MLE for $\theta$. Repeat this 100 times and comment on the performance of the estimator in terms of bias, variance and distribution.
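The simulation in the last point can be sketched using inverse-cdf sampling. The code below assumes the cdf works out to $F(x) = 1 - \theta/x$ for $x \ge \theta$, so that $x = \theta/(1-u)$ with $u$ uniform on (0,1), and that the MLE is the sample minimum. Both should be checked against your own derivations before relying on the output; the variable names are illustrative.

```matlab
theta = 1;     % true parameter value
n = 10;        % sample size per replicate
nRep = 100;    % number of replicates

est = zeros(nRep, 1);
for r = 1:nRep
    u = rand(n, 1);            % uniform(0,1) draws
    x = theta ./ (1 - u);      % inverse-cdf transform, assuming F(x) = 1 - theta/x
    est(r) = min(x);           % assumed MLE: the smallest observation
end

% Summaries for commenting on bias and variance.
fprintf('mean of estimates %.3f, variance %.4f\n', mean(est), var(est));
```

`hist(est, 20)` shows the shape of the sampling distribution. Note that every estimate is at least as large as the true $\theta$, which is relevant when discussing bias.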
Gil McVean
Last modified 01/11/2008