Statistical Distributions Practical

1. QQ Plots and Quantile Normalisation, Fitting a distribution to data.
This exercise uses the data in the file mt.txt to explore qqplots and quantile
normalization. The file contains two columns, giving case-control status and
mitochondrial levels for a set of subjects.
(i) Generate and plot empirical densities and cumulative distribution functions
for the cases and the controls, using the R functions density() and ecdf().
(ii) Test whether the distributions differ using the Kolmogorov-Smirnov test and
the t-test, and compare the variances using the F-test.
(iii) Use the R function fitdistr() (from the MASS package) to try to find
parametric distributions that model the distributions for cases and controls.
Note: I don't know the answer to this. I have tried fitting the Gamma
distribution without much success, but you should try others such as the
log-normal and the extreme value distributions (using the R package gev).
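As a starting point for parts (i) to (iii), the steps can be sketched in R as follows. This is only a minimal sketch: the layout of mt.txt is not specified here, so the column names `status` and `mt` are assumptions, and simulated log-normal data stands in for the real file so that the code runs on its own. The simulated values say nothing about what the real data look like.

```r
library(MASS)  # provides fitdistr()

# Simulated stand-in for mt.txt. The column names 'status' and 'mt' are
# assumptions; with the real file, a call such as
# read.table("mt.txt", header = TRUE) would take the place of this block
# (whether the file has a header row is also an assumption).
set.seed(1)
dat <- data.frame(
  status = rep(c("case", "control"), each = 200),
  mt     = c(rlnorm(200, meanlog = 1.0, sdlog = 0.4),
             rlnorm(200, meanlog = 1.1, sdlog = 0.5))
)

cases    <- dat$mt[dat$status == "case"]
controls <- dat$mt[dat$status == "control"]

# (i) Empirical densities and empirical CDFs for the two groups
plot(density(cases), main = "Empirical densities", xlab = "mt level")
lines(density(controls), lty = 2)
plot(ecdf(cases), main = "Empirical CDFs", xlab = "mt level")
plot(ecdf(controls), add = TRUE, col = "red")

# (ii) Compare the two groups
ks <- ks.test(cases, controls)   # difference in distribution
tt <- t.test(cases, controls)    # difference in means (Welch test by default)
ft <- var.test(cases, controls)  # F-test for the ratio of variances

# (iii) Fit candidate parametric families by maximum likelihood
fit_gamma <- fitdistr(cases, "gamma")
fit_lnorm <- fitdistr(cases, "lognormal")

# Lower AIC indicates the better-fitting of the two candidates
print(AIC(fit_gamma, fit_lnorm))
```

A fitted model can be checked visually by comparing the sample quantiles with quantiles of the fitted distribution, for example with qqplot(). Extreme value fits are left out of the sketch because they need an additional package.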
2. The Central Limit Theorem
The CLT says that the distribution of the sample mean $\frac{1}{N}\sum_{i=1}^{N} x_i$ converges to
$N(\mu, \sigma^2/N)$, where $\mu$ and $\sigma^2$ are the mean and variance of the observations. Write
an R program to demonstrate this. For $N = 1, 5, 10, 25, 50, 60, 75, 100$, the program
should generate 1000 sets of samples of size $N$ from a given distribution (try
sampling from $N(10, 2)$, $\chi^2_k$, $Po(\lambda)$, $T_k$, where $k = 1, 2, 10$ and $\lambda = 1, 10$). There are two
types of convergence to investigate: (a) convergence of the sample mean to the
true mean $\mu$. This can be explored by computing the standard deviation of the
sample means, and their quantiles; (b) convergence of the distribution of the
sample means to $N(\mu, \sigma^2/N)$. This can be visualized by a series of qqplots,
comparing the distribution of the sample means to a Normal, using qqnorm().
What do you notice about the behavior of data sampled from the $T_1$ distribution?
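One possible shape for such a program is sketched below, using the chi-squared distribution with k = 2 as the example. The helper name `sample_means` and the generator `rdist` are illustrative choices, not part of the exercise. The theoretical column relies on the chi-squared distribution with k degrees of freedom having variance 2k.

```r
set.seed(42)
Ns    <- c(1, 5, 10, 25, 50, 60, 75, 100)
nsets <- 1000

# Means of 'nsets' independent samples of size N drawn by the generator
# 'rdist' (hypothetical helper). Each matrix column is one sample.
sample_means <- function(rdist, N, nsets = 1000) {
  colMeans(matrix(rdist(N * nsets), nrow = N))
}

# Example generator: chi-squared with k = 2 (mean k, variance 2k)
k     <- 2
rdist <- function(n) rchisq(n, df = k)

means <- lapply(Ns, function(N) sample_means(rdist, N, nsets))
names(means) <- Ns

# (a) Spread of the sample means compared with sigma / sqrt(N)
summary_tab <- data.frame(
  N         = Ns,
  mean_obs  = sapply(means, mean),
  sd_obs    = sapply(means, sd),
  sd_theory = sqrt(2 * k / Ns)
)
print(summary_tab)
print(sapply(means, quantile, probs = c(0.025, 0.5, 0.975)))

# (b) One normal qqplot per sample size
op <- par(mfrow = c(2, 4))
for (i in seq_along(Ns)) {
  qqnorm(means[[i]], main = paste("N =", Ns[i]))
  qqline(means[[i]])
}
par(op)
```

To try the other distributions, replace `rdist` with, for example, function(n) rpois(n, lambda = 10) or function(n) rt(n, df = 1), and adjust or drop the `sd_theory` column accordingly. For $N(10, 2)$ the exercise does not say whether 2 is the variance or the standard deviation, so the `sd` argument passed to rnorm() depends on the reading chosen.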