Department of Public Health and Primary Care, Cardiovascular Epidemiology Unit, Strangeways Research Laboratory, Cambridge, UK

Mendelian randomization: The use of genetic variants as an instrumental variable for assessing causal associations in observational data

Presenting author: Stephen Burgess

Problem: How to assess the causal effect of a factor on an outcome if the data available is observational, not experimental?
[Diagram: Factor of interest → Outcome, with the causal link marked "?"]

Difficulties:
Confounding: an association between the factor of interest and competing risk factors means that those with different levels of the factor of interest cannot be directly compared.
[Diagram: Competing risk factors → Factor of interest and Outcome; Factor of interest → Outcome marked "?"]
Reverse causation: the factor may not only affect the outcome, but the outcome may also affect the factor.
[Diagram: Factor of interest → Outcome marked "?", with a reverse arrow from Outcome to Factor of interest]

Instrumental variables (IVs): An instrumental variable is a variable which is:
1) associated with the factor of interest (so the instrument defines groups differing in the factor),
2) not associated with any other risk factor (so the instrument gives a fair test),
3) not associated with the outcome conditional on the factor of interest and the other risk factors (so the effect of the instrument must be via the factor of interest).
[DAG: Instrument → Factor → Outcome; Competing risks → Factor and Outcome. 1): association between instrument and factor. 2): no association between instrument and competing risks. 3): no direct association between instrument and outcome.]
These conditions, as shown in the directed acyclic graph (DAG) above, ensure that instrumental variable estimates are not biased by confounding.

Mendelian randomization: Genetic variants are ideal candidates to be used as instrumental variables as genes are:
1) generally specifically associated with biological factors,
2) determined at conception.
These characteristics motivate the use and validity of genetic instrumental variables and ensure that estimates are not subject to bias due to reverse causation.

Estimation: If all associations are linear and not subject to interactions, the causal effect of a factor on an outcome can be estimated by the ratio of the regression coefficient of the outcome (Y) on the instrument (G) to the regression coefficient of the factor (X) on the instrument (G):
$\beta_{XY} = \beta_{GY} / \beta_{GX}$
[Diagram: G → X → Y, with the G to X arrow labelled $\beta_{GX}$, the X to Y arrow labelled $\beta_{XY}$, and the overall G to Y association labelled $\beta_{GY}$]

Example: What is the causal effect of lipid levels on coronary heart disease (CHD)? Observationally, low-density lipoprotein cholesterol (LDL-C) shows an injurious association with CHD and high-density lipoprotein cholesterol (HDL-C) a protective one.
Confounding: If richer, healthier people have decreased intake of LDL-C, then an association between lower LDL-C and lower CHD incidence may simply reflect that richer, healthier people have lower incidence of CHD. LDL-C may be a marker of good health, not a cause.
Reverse causation: If people with poor coronary health decrease their intake of LDL-C in response to subclinical disease (early warning signs of disease), then an association between LDL-C and CHD will be induced.
Instrumental variable: Suppose there is a common genetic variant which causes the body to retain more LDL-C from the diet, dividing the population into absorbers and non-absorbers.
[Diagram: Population split into Absorbers and Non-absorbers; all other factors equal between groups; compare outcome between groups]
We see from the diagram that the groups defined by the instrumental variable are similar to arms in a randomized controlled trial.
Assumptions for analysis: We assume that the instrument is only associated with lipid levels. This analysis would be invalid if, for example, the genetic variant were correlated with another variant associated with, say, triglyceride levels.

Current work: If cross-sectional data is available on a number of different factors, each of which has an associated instrumental variable, how can the network of associations between the factors be efficiently estimated?
For example, if we are interested in the causal effect of lipid levels on CHD, and have measured instruments which affect LDL-C, HDL-C and triglycerides, how would we estimate a causal association? What if we believe that LDL-C levels may affect triglyceride levels? Could we estimate a direct effect of LDL-C on CHD, or an indirect effect of the increase in LDL-C on CHD via triglycerides? How would you account for structural uncertainty in the model?

Take-home message: Current methods for instrumental variable analysis enable causal effects to be estimated in a limited and often unrealistic context, where an instrumental variable is only associated with a single factor. More sophisticated methodology is required to estimate causal effects in a more realistic situation, where a range of instruments are associated with a range of interacting factors. Such analysis requires detailed cross-sectional observational and genetic data, and lots of it!

Contact details: E: sb452@medschl.cam.ac.uk, T: 01223 740002
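
Worked illustration of the ratio estimate: The ratio described under Estimation (regression coefficient of Y on G divided by regression coefficient of X on G) can be sketched in a few lines of Python. This is a minimal sketch on simulated data, not an analysis from the poster: the function names, the variant frequency of 0.3, the effect sizes and the unmeasured confounder `u` are all assumptions chosen for illustration. The simulation is linear with no interactions, and the variant affects the outcome only through the factor, so the three instrumental variable conditions hold by construction.

```python
import random


def slope(x, y):
    """Least-squares slope from regressing y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def ratio_estimate(g, x, y):
    """Ratio estimate of the effect of X on Y: beta_GY / beta_GX."""
    return slope(g, y) / slope(g, x)


def simulate(n=20000, true_effect=0.5, seed=1):
    """Simulate a hypothetical variant G, factor X and outcome Y (all values assumed)."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = 1 if rng.random() < 0.3 else 0          # 1 = "absorber", 0 = "non-absorber"
        u = rng.gauss(0, 1)                          # unmeasured confounder of X and Y
        xi = 1.0 * gi + u + rng.gauss(0, 1)          # factor depends on variant and confounder
        yi = true_effect * xi + u + rng.gauss(0, 1)  # outcome depends on factor and confounder only
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()
observational = slope(x, y)   # naive regression of Y on X, biased by the confounder
iv = ratio_estimate(g, x, y)  # ratio estimate using the variant as instrument

print("true effect:          0.5")
print("observational slope: ", round(observational, 3))
print("ratio (IV) estimate: ", round(iv, 3))
```

Under these assumptions the observational slope should land well above the simulated effect of 0.5, because the confounder raises both X and Y, while the ratio estimate should land close to 0.5. Letting `g` also enter the line that generates `yi` directly would break condition 3) and bias the ratio estimate, which mirrors the triglyceride caveat under Assumptions for analysis.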