Bayesian and Penalised Regression Methods for Epidemiological Analysis

Lab 1. Information-weighted averaging

Consider the association between chocolate consumption and risk of stroke in a prospective cohort of middle-aged and elderly men (Larsson et al., Neurology, 2012). From a Cox proportional-hazards model, the adjusted rate ratio for stroke comparing the highest quartile of chocolate consumption (median 62.9 g/week) with the lowest quartile (median 0 g/week) was 0.83 (95% CL 0.70, 0.99). Assuming the chocolate-consumption distribution is skewed upward, the means must be higher than the medians, so we assume that the RRs at issue are for roughly 70 g (2.5 oz) per week (units are especially important to keep in mind when considering priors). Because stroke incidence was under 10% over the study period, we will ignore distinctions among risk, rate, and odds ratios.

Question 1 Prior with null centre

Although a few studies had reported an inverse association between chocolate consumption and risk of stroke, strong associations seemed unlikely. We start by modeling this a priori idea by placing 2:1 odds on an RR between ½ and 2, and 0.95 probability on an RR between ¼ and 4, assuming a normal distribution for our prior. The implied prior distribution for the natural-log rate ratio, ln(RR), is a normal distribution that satisfies:

exp(prior mean - 1.96 × prior standard deviation) = ¼
exp(prior mean + 1.96 × prior standard deviation) = 4

a) What are the prior mean and prior variance of ln(RR)?
b) What are the estimated ln(RR) and its estimated variance from the observed (actual) data?
c) What are the approximate posterior median and 95% posterior limits for RR (the 50th, 2.5th and 97.5th posterior percentiles for RR) based on information-weighted averaging?
[Note: assuming normality for both the prior and the estimate allows us to calculate the posterior mean ln(RR) as a weighted average of the prior mean and the maximum-likelihood estimate from the data, where the weights are the inverse variances (the inverse variance is a measure of precision of or information in an estimate). The variance of the posterior distribution for ln(RR) is then one over the sum of the weights.]
d) Check the distribution of the prior. Is the specified null prior compatible with the results from the analysis of the data alone?

Note: ln(RR) and var(ln(RR)) can be derived from upper and lower bounds using the following formulae:

ln(RR) = [ln(RR_upper) + ln(RR_lower)] / 2
var(ln(RR)) = [(ln(RR_upper) - ln(RR_lower)) / (2 × 1.96)]²

Greenland S., Orsini N., Sullivan S., Simpson J.A. Melbourne, July 24-25, 2014

Question 2 Prior with non-null centre

Four cohort studies had reported an inverse association between chocolate consumption and risk of stroke. Previous findings are pooled with a meta-analysis.

Study                     |     RR    [95% Conf. Interval]   % Weight
--------------------------+--------------------------------------------
Mink PJ 2007              |   0.850     0.700     1.030        46.99
Janszky I 2009            |   0.620     0.330     1.160         4.44
Buijsse B 2010            |   0.520     0.300     0.890         5.93
Larsson SC 2011           |   0.800     0.660     0.990        42.64
--------------------------+--------------------------------------------
Fixed-effect Pooled RR    |   0.793     0.695     0.906       100.00
Random-effects Pooled RR† |   0.786     0.677     0.913       100.00
--------------------------+--------------------------------------------
† - DerSimonian & Laird method

Heterogeneity chi-squared = 3.41 (d.f. = 3) p = 0.333
I-squared (variation in ES [percent variance in ln(RR)] attributable to heterogeneity) = 11.9%
Estimate of between-study variance Tau-squared = 0.0031

The random-effects pooled relative risk of stroke for approximately 70 g per week of chocolate consumption was 0.786 (95% CL 0.677, 0.913).
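The two calculations described in the notes to Question 1 (recovering ln(RR) and its variance from a pair of 95% limits, then combining prior and data by inverse-variance weights) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original lab materials: the function names `log_rr_from_limits`, `information_weighted_average` and `rr_percentiles` are hypothetical, and the inputs are the Question 1 values quoted in the text (prior limits ¼ and 4; data limits 0.70 and 0.99).

```python
import math

Z95 = 1.96  # normal multiplier the lab uses for 95% limits


def log_rr_from_limits(rr_lower, rr_upper, z=Z95):
    """Return (mean, variance) of ln(RR) implied by a pair of 95% limits."""
    log_lower, log_upper = math.log(rr_lower), math.log(rr_upper)
    mean = (log_upper + log_lower) / 2
    sd = (log_upper - log_lower) / (2 * z)
    return mean, sd ** 2


def information_weighted_average(prior_mean, prior_var, data_mean, data_var):
    """Combine a normal prior and a normal estimate using inverse-variance weights."""
    w_prior, w_data = 1 / prior_var, 1 / data_var
    post_var = 1 / (w_prior + w_data)  # one over the sum of the weights
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, post_var


def rr_percentiles(mean, var, z=Z95):
    """Back-transform to the RR scale: (median, 2.5th, 97.5th percentile)."""
    sd = math.sqrt(var)
    return math.exp(mean), math.exp(mean - z * sd), math.exp(mean + z * sd)


# Question 1 inputs as quoted in the lab text.
prior_mean, prior_var = log_rr_from_limits(1 / 4, 4)   # null-centred prior
data_mean, data_var = log_rr_from_limits(0.70, 0.99)   # Cox-model estimate
post_mean, post_var = information_weighted_average(
    prior_mean, prior_var, data_mean, data_var
)
median, lower, upper = rr_percentiles(post_mean, post_var)

print(f"prior ln(RR):     mean {prior_mean:.4f}, variance {prior_var:.4f}")
print(f"data ln(RR):      mean {data_mean:.4f}, variance {data_var:.5f}")
print(f"posterior RR:     median {median:.3f}, 95% limits {lower:.3f}, {upper:.3f}")
```

Because the prior variance here is far larger than the data variance, the data receive almost all of the weight and the posterior stays close to the estimate from the data alone. The same helpers could be reused with a different prior, provided the prior is again taken to be normal on the ln(RR) scale.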
Ordinarily, we should derive our prior from the random-effects results because they refer to the distribution of RRs across studies, and a good prior will allow for any potential RR variation across studies (the fixed-effects model simply assumes the RRs are the same across studies). However, the random-effects results are for the average ln(RR) across studies; they are not a prediction (prior) for a new study. To get a prediction for the current study, we will add the estimated variance of ln(RR) across studies, tau-squared (τ² = 0.0031), to the variance we calculate from the random-effects interval, to get our prior variance for the current study.

a) What are the prior mean, prior variance, and 95% prior limits for ln(RR) using the random-effects results (variance expanded to allow for prediction to a new study)?
b) What are the approximate posterior median and 95% posterior limits for RR?

Question 3 Reverse-Bayes analysis

a) What normal prior would make the 95% posterior interval include RR = 1?
b) What is the hypothetical RCT result corresponding to such a prior if the incidence rate of stroke in the study population is about 5 per 1000 person-years?
[Hint: Work out the number of stroke cases and person-time for an RCT with a 1:1 allocation of participants to high and low intake of chocolate. Var(ln(RR)) = 2/A, where A is the number of events in each exposure group (A = A1 = A0, since the prior for ln(RR) is symmetric).]
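One way to approach the reverse-Bayes calculation in Question 3 is sketched below in Python. It is a hedged illustration rather than the lab's official solution. It assumes the prior is normal and centred on the null (mean ln(RR) = 0), and it looks for the largest prior variance at which the upper 95% posterior limit just reaches RR = 1; any more precise null-centred prior would also include 1. The function name `reverse_bayes_prior_var` is hypothetical. The conversion to a hypothetical trial uses the hint's relation Var(ln(RR)) = 2/A and further assumes the quoted rate of 5 per 1000 person-years applies to both arms.

```python
import math

Z95 = 1.96  # normal multiplier the lab uses for 95% limits

# Data estimate from the lab text: RR 0.83 with 95% limits (0.70, 0.99).
log_lower, log_upper = math.log(0.70), math.log(0.99)
data_mean = (log_upper + log_lower) / 2
data_var = ((log_upper - log_lower) / (2 * Z95)) ** 2


def reverse_bayes_prior_var(data_mean, data_var, z=Z95):
    """Largest variance of a mean-zero normal prior for which the 95%
    posterior interval still reaches ln(RR) = 0 (assumption: null-centred prior).
    Returns None when the data interval already includes the null."""
    z_data_sq = data_mean ** 2 / data_var
    if z_data_sq <= z ** 2:
        return None
    return data_var / (z_data_sq / z ** 2 - 1)


prior_var = reverse_bayes_prior_var(data_mean, data_var)
prior_sd = math.sqrt(prior_var)
prior_limits = (math.exp(-Z95 * prior_sd), math.exp(Z95 * prior_sd))

# Check: the posterior under this prior should have an upper limit of RR = 1.
w_prior, w_data = 1 / prior_var, 1 / data_var
post_var = 1 / (w_prior + w_data)
post_mean = post_var * w_data * data_mean  # prior mean is 0, so it adds nothing
post_upper_rr = math.exp(post_mean + Z95 * math.sqrt(post_var))

# Hypothetical 1:1 trial carrying the same information, from Var(ln(RR)) = 2/A.
events_per_arm = 2 / prior_var
rate_per_person_year = 5 / 1000  # assumed equal in both arms
person_years_per_arm = events_per_arm / rate_per_person_year

print(f"critical prior variance: {prior_var:.4f}")
print(f"prior 95% limits for RR: {prior_limits[0]:.2f}, {prior_limits[1]:.2f}")
print(f"posterior upper limit:   {post_upper_rr:.3f}")
print(f"events per arm:          {events_per_arm:.1f}")
print(f"person-years per arm:    {person_years_per_arm:.0f}")
```

Expressing the prior as a trial makes it easier to judge whether that much prior information in favour of the null is plausible. A prior with a non-null centre would need a different derivation from the one sketched here.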