Data Analysis Workshop
Chuck Spiekerman (cspieker@u)
Karl Kaiyala (kkaiyala@u)

Course Outline
• February 20
  – How to describe your study
  – Choosing an Analysis method
• March 13
  – Student presentations of study designs and data-analysis plans
• March 20
  – Student presentations of data analyses

Describing your study
• Next session (3/13) we are asking you to present a description of your planned study
• The next few slides give an outline of suggested components of this description
• Attention to all these components should help you (and/or a consultant) decide on appropriate methods of statistical analysis

Study Design Description
• Specific Aims (what?)
• Background (why?)
• Previous work (who?) *
• Study methods (how?)
  – several components
* optional for student presentations

Specific Aims
• Describe the scientific question(s)
• Be specific and precise
• Stick to the study at hand

Background and Motivation
• Relevance of this research
  – Existing knowledge
  – Identify gap this research will fill
• Relate to specific aims
• If part of a larger study, where does this study fit?

Study Methods Components
• Primary outcomes
• Study population
• Methods and procedures *
• Data analysis plan
* optional for student presentations

Primary Outcomes
• Precise definition of key measurement (individual data item) of interest
• Justify why this outcome and not something else.
  – Relate to specific aim
• Details of collection can be left to methods and procedures section

Study population
• How were the subjects selected?
  – Exclusion and inclusion criteria
  – Group classification?
  – Matching?
  – Randomization?

Data analysis plan
• Outline data analysis for each specific aim
• Make clear which procedures are being used toward which aim
• Usually some simple tables and plots should be sufficient
• Keep it simple

Forming an analysis plan
Two important questions
1. What do you want to do/show?
2. What kind of data …
   i. … will answer your question best?
   ii. … can you get?
   iii. … do you have?

Types of data
• Continuous
  – Differences between values have meaning, and are interpretable independent of the values themselves
  – E.g. difference between 8 and 9 basically the same as difference between 1 and 2.
• Ordinal
  – Values have an order, but differences are not easily interpretable (e.g. good, fair, poor)

Types of data (cont.)
• Categorical
  – Values are descriptive but do not have any obvious ordering. E.g. tx A, tx B, tx C.
• Binary, Dichotomous
  – Fancy names for categorical variables with only two possible values.

Types of data (sampling)
• one-sample
  – Refers to situation when values of interest all come from one group and will be compared to a known quantity (e.g. “change greater than zero”)
• two-sample
  – When data are divided/sampled in two groups and observed values compared between groups.

What do you want to do?
• Show evidence of differences
• Estimate population parameters
• Demonstrate equivalence
• Show evidence of association
• Create/validate a predictive model
• Assess agreement or reliability
• Other?

Showing evidence of differences
• Standard hypothesis testing procedures, usually comparing means or proportions
• Which test will depend on type of data. Usual suspects (YMMV)
  – T-test or ANOVA for Continuous data
  – Chi-square test for Categorical data
  – Rank-based tests (e.g. Wilcoxon) for Ordinal data
• Use Rosner flowchart for guidance
• Supplement p-value with estimate of difference (with confidence interval)

Estimate Population Parameters
• P-values and hypothesis tests aren’t always necessary
• Sometimes you don’t really want to compare things but only estimate values
• Estimate parameters of interest and supplement with confidence intervals (IMPORTANT!)

Demonstrate equivalence
• In some instances the goal is to show equivalence of, say, two treatments.
• Failing to show a difference using a standard hypothesis test is usually not sufficient evidence of equivalence
• Two strategies
  – Estimate difference and show ‘worst cases’ with confidence interval
  – Compute a standard hypothesis test with very good power (> 95%)

Demonstrate associations
Suggested methods by type of independent variable and type of outcome variable
• Independent variable categorical
  – Outcome dichotomous: Chi-square, Logistic regression
  – Outcome continuous: T-test/ANOVA, Linear regression
• Independent variable continuous
  – Outcome dichotomous: Logistic regression, T-test/ANOVA (backwards)
  – Outcome continuous: Correlation, Linear regression, Scatterplots

Prediction
• Dichotomous outcome
  – Logistic regression*
  – Sensitivities, specificities†
  – ROC curves† (continuous predictor)
• Continuous outcome
  – Linear regression*
  – “Leave one out” statistics or cross validation†
* Predictive model building
† Assessing predictive model

Reliability/Agreement
• Kappa statistic is commonly used for categorical data and two raters.
• Intra-class correlation coefficient for multiple raters
• If you have a ‘gold standard’ it makes the most sense to tabulate percent correct or average distance from correct.

more Reliability/Agreement
• If trying to demonstrate agreement between two continuous measures the correlation coefficient is tangential at best
• Better to tabulate statistics related to mean pairwise differences between judges
• See
  – Bland JM, Altman DG. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, i, 307-310.
  – Available at http://www-users.york.ac.uk/~mb55/meas//ba.htm

Other?
• Time-to-event data
  – Kaplan-Meier survival estimate
  – Cox regression
• Other other?

Correlated Data Issues
• Data consist of “clusters” of correlated observations. This is common in dental studies (many teeth from same mouth)
• Common Solutions?
  – Collapse data to independent units (patient-level averages)
  – Adjust for correlation using generalized estimating equations (GEE) or mixed model regression approaches

Homework for Feb. 29
• Following the guidelines presented in class today, present a concise description of your study and planned data analysis to the class.
• Plan to keep your talk under ____ minutes
• Limited office hours will be available with me and Dr. Kaiyala to help. Call or email us for appointments.
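
Worked example for the Reliability/Agreement slides
• A minimal sketch, not part of the original workshop material, of two summaries named on those slides: the kappa statistic for two raters with categorical ratings, and the mean pairwise difference between two continuous measures with approximate 95% limits of agreement in the style of Bland and Altman.
• The function names (cohen_kappa, limits_of_agreement) and all data values are hypothetical and chosen only for illustration. Only the Python standard library is assumed.

```python
from collections import Counter
from statistics import mean, stdev


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (categorical data)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (p_observed - p_chance) / (1 - p_chance)


def limits_of_agreement(method_1, method_2):
    """Mean pairwise difference and approximate 95% limits of agreement."""
    if len(method_1) != len(method_2) or len(method_1) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(method_1, method_2)]
    bias = mean(diffs)              # mean pairwise difference
    spread = 1.96 * stdev(diffs)    # 1.96 sample standard deviations of the differences
    return bias, bias - spread, bias + spread


# Hypothetical data, made up for illustration only.
ratings_a = ["sound", "sound", "decayed", "filled", "decayed", "sound", "filled", "decayed"]
ratings_b = ["sound", "decayed", "decayed", "filled", "decayed", "sound", "filled", "filled"]
measure_1 = [10.2, 11.5, 9.8, 12.1, 10.9]
measure_2 = [10.0, 11.9, 9.5, 12.0, 11.3]

kappa = cohen_kappa(ratings_a, ratings_b)
bias, lower, upper = limits_of_agreement(measure_1, measure_2)
print(f"kappa = {kappa:.3f}")
print(f"mean difference = {bias:.3f}, limits of agreement = ({lower:.3f}, {upper:.3f})")
```

• With only a handful of pairs the limits of agreement are themselves imprecise, so treat this as a starting point for a table or plot rather than a finished analysis.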