Exercises for Thursday

Exercises for 4th day
AM
1. Get familiar with the following distributions and their random number generators:
   rnorm    normal distribution
   rexp     exponential distribution
   rpois    Poisson distribution
   runif    uniform distribution
   rbinom   binomial distribution
Simulate random number vectors with different parameters and plot histograms of them. Explore how
the density, distribution and quantile functions work (e.g. dnorm, pnorm, qnorm).
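As a starting point, the following minimal sketch shows the four function families for the normal distribution. The seed, sample size and parameter values are arbitrary choices for illustration, not part of the exercise.

```r
set.seed(1)  # arbitrary seed, used only to make the run reproducible

# Simulate 1000 draws from a normal distribution (illustrative parameters)
x <- rnorm(1000, mean = 10, sd = 2)
hist(x, breaks = 30, main = "rnorm(1000, mean = 10, sd = 2)")

# Density, distribution and quantile functions of the standard normal
d <- dnorm(0)       # density at 0, about 0.399
p <- pnorm(1.96)    # P(X <= 1.96), about 0.975
q <- qnorm(0.975)   # 97.5% quantile, about 1.96 (inverse of pnorm)
c(density = d, probability = p, quantile = q)
```

The same r/d/p/q prefixes apply to the other distributions in the list (for example rexp, dexp, pexp, qexp).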
2. Get familiar with creating a function in R by coding a simple function mymean that calculates
the mean of two numbers given in the function call. Then proceed to a more complex function:
Code a function that takes any matrix as input and adds a Poisson random number to each
positive matrix element and an exponential random number to each negative element. The parameters of
these distributions are defined by the user in the function call. (Advice: the nrow() and ncol() functions
return the number of matrix rows and columns, respectively.)
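One possible shape for these two functions is sketched below. The name add_noise, its argument names and the default parameter values are hypothetical, and leaving zero elements unchanged is an assumption, since the exercise only mentions positive and negative elements.

```r
# Mean of two numbers given in the function call
mymean <- function(a, b) {
  (a + b) / 2
}

# Add a Poisson draw to each positive element and an exponential draw to
# each negative element. Zeros are left unchanged (an assumption).
add_noise <- function(m, lambda = 1, rate = 1) {
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      if (m[i, j] > 0) {
        m[i, j] <- m[i, j] + rpois(1, lambda)
      } else if (m[i, j] < 0) {
        m[i, j] <- m[i, j] + rexp(1, rate)
      }
    }
  }
  m
}

set.seed(2)  # arbitrary seed
m <- matrix(c(-2, 3, 0, -1, 5, 4), nrow = 2)
out <- add_noise(m, lambda = 3, rate = 0.5)
out
```

The double loop follows the advice about nrow() and ncol(); a vectorised version using logical indexing (m > 0, m < 0) would also work.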
3. Code a function that simulates random numbers from a left-truncated normal distribution,
i.e. only random numbers exceeding a fixed threshold. Let this threshold, the number of
random numbers to be generated and the distribution parameters be freely defined in the function
call. Illustrate the simulated patterns. (Advice: first create a vector whose length is the requested
number of random numbers, and initially draw the random numbers from a non-truncated distribution.
Then use a while loop to simulate a new random number for each vector element that does not
fulfill the given condition.)
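The advice above can be sketched roughly as follows. The function name rtnorm_left and its argument names are hypothetical, and the threshold and parameters in the example call are arbitrary.

```r
# Draw n values from a normal distribution, keeping only values above
# `threshold` (simple rejection, as suggested in the advice)
rtnorm_left <- function(n, threshold, mean = 0, sd = 1) {
  x <- rnorm(n, mean, sd)            # initial draws, not yet truncated
  for (i in seq_along(x)) {
    while (x[i] <= threshold) {      # redraw until the condition holds
      x[i] <- rnorm(1, mean, sd)
    }
  }
  x
}

set.seed(3)  # arbitrary seed
z <- rtnorm_left(1000, threshold = 1, mean = 0, sd = 1)
hist(z, breaks = 30, main = "Left-truncated normal, threshold = 1")
```

Note that this approach becomes slow when the threshold lies far in the upper tail, because most candidate draws are then rejected.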
4. The data file rats.txt contains the lifetimes (age at death) of 1582 rats. Calculate the mean lifetime. Produce a
95% confidence interval (CI) for the mean using bootstrap sampling. Advice: Sample 1582 measurements
from the dataset with replacement and calculate the mean. Repeat this e.g. 1000 times, each time
recording the mean. The 2.5% and 97.5% quantiles of the distribution of these means then give the 95%
CI.
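A minimal sketch of the percentile bootstrap described in the advice is given below. Because the layout of rats.txt is not described here, the sketch uses simulated stand-in lifetimes; the exponential distribution and its rate are arbitrary assumptions made only so that the code runs.

```r
set.seed(4)  # arbitrary seed

# Stand-in data (assumption). With the real file, the vector would instead
# be read from rats.txt, e.g. with read.table(), depending on its layout.
lifetime <- rexp(1582, rate = 1 / 700)

B <- 1000                      # number of bootstrap replicates
boot_means <- numeric(B)
for (b in seq_len(B)) {
  resample <- sample(lifetime, size = length(lifetime), replace = TRUE)
  boot_means[b] <- mean(resample)
}

mean(lifetime)                                   # sample mean
ci <- quantile(boot_means, probs = c(0.025, 0.975))
ci                                               # percentile 95% CI
```

A histogram of boot_means is a quick way to inspect the bootstrap distribution behind the interval.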
PM
1. The following dataset summarizes lung disease occurrence in relation to employment in a
hazardous, dusty place.
mydata = data.frame(
  n   = c(37, 139, 5, 22, 16, 75, 4, 24, 21, 30, 8, 9, 77, 31, 47, 15),
  y   = c(3, 25, 0, 2, 0, 6, 0, 1, 8, 8, 2, 1, 31, 10, 5, 3),
  smk = c(1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0),
  sex = c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
  yrs = c(5, 5, 5, 5, 5, 5, 5, 5, 15, 15, 15, 15, 25, 25, 25, 25))
Here n is the size of the group, y is the number of people in the group affected by lung disease, smk is a dummy
variable coded 1 for smokers, sex is coded 1 for males, and yrs is the years of employment in a hazardous, dusty workplace.
a) Write the data as a data frame with appropriate variable types.
b) Fit a logistic regression to assess the link between the years of exposure and lung disease.
c) Extend the model to include sex. Does sex make a difference to the probability of lung disease? How
do you interpret the intercept in this model?
d) Extend the model further to include smk. What is the interpretation of the intercept now?
e) As an extra, fit the model with yrs and sex*smk.
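The sequence of models in a) to e) could be set up along the following lines. This is only a sketch: treating smk and sex as factors and modelling the grouped counts with cbind(y, n - y) are choices made here, and the object names (fit_yrs and so on) are hypothetical.

```r
mydata <- data.frame(
  n   = c(37, 139, 5, 22, 16, 75, 4, 24, 21, 30, 8, 9, 77, 31, 47, 15),
  y   = c(3, 25, 0, 2, 0, 6, 0, 1, 8, 8, 2, 1, 31, 10, 5, 3),
  smk = c(1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0),
  sex = c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
  yrs = c(5, 5, 5, 5, 5, 5, 5, 5, 15, 15, 15, 15, 25, 25, 25, 25))

# a) dummy-coded variables as factors (one possible choice of types)
mydata$smk <- factor(mydata$smk)
mydata$sex <- factor(mydata$sex)

# b) grouped binomial response: affected vs. not affected
fit_yrs <- glm(cbind(y, n - y) ~ yrs, family = binomial, data = mydata)

# c), d) add sex, then smk
fit_sex <- update(fit_yrs, . ~ . + sex)
fit_smk <- update(fit_sex, . ~ . + smk)

# e) yrs plus the sex-by-smoking interaction
fit_int <- glm(cbind(y, n - y) ~ yrs + sex * smk,
               family = binomial, data = mydata)

summary(fit_sex)
anova(fit_yrs, fit_sex, test = "Chisq")   # does adding sex improve the fit?

# Intercept on the probability scale: the fitted probability when yrs = 0
# and all factors are at their reference level
plogis(coef(fit_sex)[["(Intercept)"]])
```

Since the coefficients are on the log-odds scale, plogis() (or exp() for odds ratios) is helpful when interpreting them.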
2. The average number of bacteria in 1 ml of a liquid is known to be 4. Assuming that the number
of bacteria follows a Poisson distribution, use R to find the probability that in 1 ml of the liquid there
will be: a) no bacteria, b) 4 bacteria, c) no more than 3 bacteria. Also find the probability that there
will be no more than 2 bacteria in 3 ml of the liquid.
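These probabilities map directly onto dpois (point probabilities) and ppois (cumulative probabilities), as the sketch below shows. For the 3 ml case it uses the fact that a Poisson count over three times the volume has mean 3 * 4 = 12; the variable names are hypothetical.

```r
lambda <- 4                        # mean count per 1 ml

p_none  <- dpois(0, lambda)        # a) P(X = 0), about 0.0183
p_four  <- dpois(4, lambda)        # b) P(X = 4), about 0.1954
p_max3  <- ppois(3, lambda)        # c) P(X <= 3), about 0.4335

# 3 ml: the count is Poisson with mean 3 * lambda = 12
p_max2_3ml <- ppois(2, 3 * lambda) # P(X <= 2), about 0.00052

c(p_none, p_four, p_max3, p_max2_3ml)
```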
3. The effects of a contaminant with regard to a health defect outcome have been examined using
the distance to the putative contaminant as surrogate for exposure. The numbers of cases per 1000
individuals at different levels of exposure have been found as in the table below.
# of cases   Distance
11           5.3
6            13.1
9            11.2
5            15.4
4            21.7
3            28
2            31.3
0            33.1
1            38.2
Model the number of cases with distance as an explanatory variable and make a plot of the predicted
number of cases for distances ranging from 15 to 42 in increments of 3.
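Since the response is a count, one natural option is a Poisson regression with a log link; this is a modelling choice made for the sketch below rather than something the exercise prescribes. The object names (dat, fit, new_d) are hypothetical.

```r
dat <- data.frame(
  cases    = c(11, 6, 9, 5, 4, 3, 2, 0, 1),
  distance = c(5.3, 13.1, 11.2, 15.4, 21.7, 28, 31.3, 33.1, 38.2))

# Poisson GLM: log(expected cases) is linear in distance
fit <- glm(cases ~ distance, family = poisson, data = dat)
summary(fit)

# Predicted number of cases for distances 15, 18, ..., 42
new_d <- data.frame(distance = seq(15, 42, by = 3))
new_d$pred <- predict(fit, newdata = new_d, type = "response")

# Observed counts with the predicted values overlaid
plot(cases ~ distance, data = dat, xlim = c(5, 42),
     xlab = "Distance", ylab = "Cases per 1000 individuals")
lines(new_d$distance, new_d$pred, type = "b", pch = 19)
```

Using type = "response" returns predictions on the count scale instead of the log scale.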