Exercises for 4th day

AM

1. Get familiar with the following distributions and their random number generators:
   rnorm   normal distribution
   rexp    exponential distribution
   rpois   Poisson distribution
   runif   uniform distribution
   rbinom  binomial distribution
Simulate random number vectors with different parameters and plot their histograms. Try out how the density, distribution and quantile functions work (e.g. dnorm, pnorm, qnorm).

2. Get familiar with creating a function in R by coding a simple function mymean that calculates the mean of two numbers given in the function call. Then proceed to a more complex function: code a function that takes any matrix as input and adds a Poisson random number to each positive matrix element and an exponential random number to each negative element. The parameters of these distributions are defined by the user at the function call. (Advice: the functions nrow() and ncol() return the number of rows and columns of a matrix, respectively.)

3. Code a function that simulates random numbers from a left-truncated normal distribution, i.e. random numbers that only exceed a fixed threshold. Let the threshold, the number of random numbers to be generated and the distribution parameters be freely defined at the function call. Illustrate the simulated patterns. (Advice: first create a vector whose length is the requested number of random numbers, and initially fill it with random numbers from the non-truncated distribution. Then use a while loop to simulate a new random number for each vector element that does not fulfill the given condition.)

4. The data file rats.txt contains the lifetimes (age at death) of 1582 rats. Calculate the mean lifetime. Produce a 95% confidence interval (CI) for the mean using bootstrap sampling. (Advice: sample 1582 measurements from the dataset with replacement and calculate their mean. Repeat this e.g. 1000 times, each time recording the mean. The 2.5% and 97.5% quantiles of the distribution of these means then give the 95% CI.)

PM

1.
The following dataset summarizes the occurrence of lung disease in relation to employment in a hazardous, dusty workplace.

   mydata=data.frame(n=c(37, 139, 5, 22, 16, 75, 4, 24, 21, 30, 8, 9, 77, 31, 47, 15),
                     y=c(3, 25, 0, 2, 0, 6, 0, 1, 8, 8, 2, 1, 31, 10, 5, 3),
                     smk=c(1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0),
                     sex=c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
                     yrs=c(5, 5, 5, 5, 5, 5, 5, 5, 15, 15, 15, 15, 25, 25, 25, 25))

n = size of the group, y = number of people in the group affected with lung disease, smk = dummy code (1 for smoker), sex = coded 1 for male, yrs = years of employment in the hazardous, dusty workplace.
a) Write the data as a data frame with appropriate variable types.
b) Fit a logistic regression to assess the link between years of exposure and lung disease.
c) Extend the model to include sex. Does sex make a difference to the probability of failure (i.e. of having lung disease)? How do you interpret the intercept in this model?
d) Extend the model further to include smk. What is the interpretation of the intercept now?
e) As an extra, fit the model with yrs and sex*smk.

2. The average number of bacteria in 1 ml of a liquid is known to be 4. Assuming that the number of bacteria follows a Poisson distribution, use R to find the probability that 1 ml of the liquid contains:
a) no bacteria,
b) 4 bacteria,
c) no more than 3 bacteria.
Also find the probability that there will be no more than 2 bacteria in 3 ml of the liquid.

3. The effects of a contaminant on a health defect outcome have been examined using the distance to the putative contaminant as a surrogate for exposure. The numbers of cases per 1000 individuals at different levels of exposure are given in the table below.

   # of cases   Distance
   11            5.3
    6           13.1
    9           11.2
    5           15.4
    4           21.7
    3           28
    2           31.3
    0           33.1
    1           38.2

Model the number of cases with distance as the explanatory variable, and plot the predicted number of cases for distances ranging from 15 to 42 in increments of 3.
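For AM exercise 1, a possible starting point is sketched below. The parameter values (means, rates, sample size, seed) are arbitrary illustrative choices, not values prescribed by the exercise. The last lines show how the d-, p- and q-functions relate: pnorm is the cumulative distribution function and qnorm is its inverse.

```r
# Simulate vectors from the five distributions and compare their histograms.
set.seed(1)   # arbitrary seed, only for reproducibility
n <- 1000     # illustrative sample size

x_norm  <- rnorm(n, mean = 0, sd = 1)
x_exp   <- rexp(n, rate = 2)
x_pois  <- rpois(n, lambda = 3)
x_unif  <- runif(n, min = 0, max = 1)
x_binom <- rbinom(n, size = 10, prob = 0.3)

op <- par(mfrow = c(2, 3))   # 2 x 3 panel layout, old settings saved in op
hist(x_norm,  main = "rnorm(mean = 0, sd = 1)")
hist(x_exp,   main = "rexp(rate = 2)")
hist(x_pois,  main = "rpois(lambda = 3)")
hist(x_unif,  main = "runif(min = 0, max = 1)")
hist(x_binom, main = "rbinom(size = 10, prob = 0.3)")
par(op)                      # restore the previous layout

# Density, distribution (CDF) and quantile functions of N(0, 1)
dnorm(0)            # density at 0, 1/sqrt(2*pi), about 0.399
pnorm(1.96)         # P(X <= 1.96), about 0.975
qnorm(0.975)        # quantile function, about 1.96
qnorm(pnorm(1.3))   # qnorm undoes pnorm, giving back 1.3
```

Try changing the parameters (e.g. a larger lambda in rpois) and watch how the shape of each histogram changes.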
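For AM exercise 2, one possible sketch follows the advice and loops over rows and columns with nrow() and ncol(). The name add_noise and its arguments lambda (Poisson mean) and rate (exponential rate) are hypothetical. The exercise does not say what to do with zero elements, so this sketch assumes they are left unchanged.

```r
# Mean of two numbers given in the function call
mymean <- function(a, b) {
  (a + b) / 2
}
mymean(3, 8)   # 5.5

# Hypothetical function: adds a Poisson(lambda) draw to positive elements and an
# Exponential(rate) draw to negative elements; zeros are left as they are (assumption).
add_noise <- function(m, lambda, rate) {
  stopifnot(is.matrix(m), !anyNA(m))   # if() would fail on NA comparisons
  for (i in seq_len(nrow(m))) {
    for (j in seq_len(ncol(m))) {
      if (m[i, j] > 0) {
        m[i, j] <- m[i, j] + rpois(1, lambda)
      } else if (m[i, j] < 0) {
        m[i, j] <- m[i, j] + rexp(1, rate)
      }
    }
  }
  m
}

m <- matrix(c(-2, 0, 3, 5, -1, 4), nrow = 2)
add_noise(m, lambda = 2, rate = 1)
```

A vectorized version using logical indexing (m[m > 0] <- m[m > 0] + rpois(sum(m > 0), lambda)) would avoid the loops; the loop form is kept here because it mirrors the advice in the exercise.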
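For AM exercise 3, the advice describes a simple rejection scheme: draw everything from the untruncated normal, then redraw each element with a while loop until it exceeds the threshold. The sketch below implements that scheme; the name rtnorm_left and the default mean = 0, sd = 1 are hypothetical choices. The red curve overlays the theoretical truncated density dnorm(x) / (1 - pnorm(threshold)) for comparison.

```r
# Hypothetical function: left-truncated normal by redrawing until x > threshold
rtnorm_left <- function(n, threshold, mean = 0, sd = 1) {
  x <- rnorm(n, mean, sd)            # start from the non-truncated distribution
  for (i in seq_len(n)) {
    while (x[i] <= threshold) {      # redraw elements that fail the condition
      x[i] <- rnorm(1, mean, sd)
    }
  }
  x
}

set.seed(3)
z <- rtnorm_left(5000, threshold = 1, mean = 0, sd = 1)
hist(z, breaks = 50, freq = FALSE, main = "N(0, 1) truncated below at 1")
# Theoretical density of the truncated distribution, rescaled to integrate to 1
curve(dnorm(x) / (1 - pnorm(1)), from = 1, to = max(z), add = TRUE, col = "red")
```

Each redraw succeeds with probability 1 - pnorm(threshold, mean, sd), so a threshold far out in the tail makes the loop very slow; for this exercise that cost is acceptable.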
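For AM exercise 4, the bootstrap steps in the advice can be sketched as below. The name boot_mean_ci is hypothetical, and the file layout of rats.txt is not described in the exercise: the sketch assumes one number per line with no header (if there is a header, read.table("rats.txt", header = TRUE) would be needed instead).

```r
# Hypothetical function: percentile bootstrap CI for the mean
boot_mean_ci <- function(x, B = 1000, level = 0.95) {
  n <- length(x)
  boot_means <- numeric(B)
  for (b in seq_len(B)) {
    # resample n observations with replacement; indexing via sample.int avoids
    # the sample() pitfall when x happens to have length 1
    boot_means[b] <- mean(x[sample.int(n, n, replace = TRUE)])
  }
  alpha <- 1 - level
  quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2))
}

if (file.exists("rats.txt")) {
  lifetime <- scan("rats.txt")   # assumption: plain column of numbers, no header
  print(mean(lifetime))
  print(boot_mean_ci(lifetime, B = 1000))
}
```

Because the bootstrap is random, the interval limits change slightly between runs unless set.seed() is called first.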
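For PM exercise 1, a sketch of parts a) to e) follows. Because each row is a group of n people with y cases, the response is given to glm as cbind(y, n - y) (cases, non-cases). Coding smk and sex as factors is one way to give them "appropriate variable types"; the label "female" for sex = 0 is an assumption, since the exercise only states that 1 means male.

```r
# a) Data frame with smk and sex as factors (sex = 0 assumed to be female)
mydata <- data.frame(
  n   = c(37, 139, 5, 22, 16, 75, 4, 24, 21, 30, 8, 9, 77, 31, 47, 15),
  y   = c(3, 25, 0, 2, 0, 6, 0, 1, 8, 8, 2, 1, 31, 10, 5, 3),
  smk = factor(c(1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0),
               levels = c(0, 1), labels = c("nonsmoker", "smoker")),
  sex = factor(c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1),
               levels = c(0, 1), labels = c("female", "male")),
  yrs = c(5, 5, 5, 5, 5, 5, 5, 5, 15, 15, 15, 15, 25, 25, 25, 25)
)

# b) to e) Grouped logistic regressions
m1 <- glm(cbind(y, n - y) ~ yrs,             family = binomial, data = mydata)
m2 <- glm(cbind(y, n - y) ~ yrs + sex,       family = binomial, data = mydata)
m3 <- glm(cbind(y, n - y) ~ yrs + sex + smk, family = binomial, data = mydata)
m4 <- glm(cbind(y, n - y) ~ yrs + sex * smk, family = binomial, data = mydata)

summary(m2)
anova(m1, m2, test = "Chisq")   # likelihood-ratio test: does adding sex help?

# Intercept of m2: log-odds of disease for the reference sex at yrs = 0.
# yrs = 0 lies outside the observed range (5 to 25), so this is an extrapolation.
plogis(coef(m2)["(Intercept)"])
```

In m3 the intercept refers to the reference level of both factors (female non-smoker here) at yrs = 0; in m4 the sexmale:smksmoker coefficient shows whether the smoking effect differs between the sexes.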
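For PM exercise 2, dpois gives point probabilities and ppois gives cumulative probabilities. For 3 ml, the sketch assumes (as is standard for a Poisson model) that counts in disjoint volumes are independent, so the count in 3 ml is Poisson with mean 3 x 4 = 12.

```r
lambda <- 4                 # mean number of bacteria per 1 ml

dpois(0, lambda)            # a) P(X = 0) = exp(-4), about 0.0183
dpois(4, lambda)            # b) P(X = 4), about 0.1954
ppois(3, lambda)            # c) P(X <= 3), about 0.4335
ppois(2, 3 * lambda)        # P(Y <= 2) with Y ~ Poisson(12), about 0.00052
```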
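For PM exercise 3, one natural choice (an assumption, since the exercise does not name a model) is a Poisson regression with a log link. Every count refers to the same denominator of 1000 individuals, so no offset is used; with differing group sizes an offset of log(group size) would be needed. The sketch predicts on the requested grid 15, 18, ..., 42 and overlays the observed counts.

```r
contam <- data.frame(
  cases    = c(11, 6, 9, 5, 4, 3, 2, 0, 1),
  distance = c(5.3, 13.1, 11.2, 15.4, 21.7, 28, 31.3, 33.1, 38.2)
)

# Poisson GLM with log link (modeling choice assumed here)
fit <- glm(cases ~ distance, family = poisson, data = contam)
summary(fit)

# Predicted expected counts for distances 15, 18, ..., 42
newd <- data.frame(distance = seq(15, 42, by = 3))
pred <- predict(fit, newdata = newd, type = "response")

plot(newd$distance, pred, type = "b",
     xlim = c(0, 42), ylim = c(0, max(contam$cases, pred)),
     xlab = "Distance", ylab = "Cases per 1000")
points(contam$distance, contam$cases, pch = 16, col = "grey40")   # observed data
```

Note that 42 lies beyond the largest observed distance (38.2), so the last predictions are a mild extrapolation.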