Data Analysis Workshop

Chuck Spiekerman (cspieker@u)
Karl Kaiyala (kkaiyala@u)
Course Outline
• February 20
– How to describe your study
– Choosing an Analysis method
• March 13
– Student presentations of study designs and
data-analysis plans
• March 20
– Student presentations of data analyses
Describing your study
• Next session (3/13) we are asking you to
present a description of your planned study
• The next few slides give an outline of
suggested components of this description
• Attention to all these components should
help you (and/or a consultant) decide on
appropriate methods of statistical analysis
Study Design Description
• Specific Aims (what?)
• Background (why?)
• Previous work (who?) *
• Study methods (how?)
– several components
*optional for student presentations
Specific Aims
• Describe the scientific question(s)
• Be specific and precise
• Stick to the study at hand
Background and Motivation
• Relevance of this research
– Existing knowledge
– Identify gap this research will fill
• Relate to specific aims
• If part of a larger study, where does this
study fit?
Study Methods Components
• Primary outcomes
• Study population
• Methods and procedures *
• Data analysis plan
*optional for student presentations
Primary Outcomes
• Precise definition of key measurement
(individual data item) of interest
• Justify why this outcome and not something
else.
– Relate to specific aim
• Details of collection can be left to methods
and procedures section
Study population
• How were the subjects selected?
– Exclusion and inclusion criteria
– Group classification?
– Matching?
– Randomization?
Data analysis plan
• Outline data analysis for each specific aim
• Make clear which procedures are being
used toward which aim
• Usually some simple tables and plots should
be sufficient
• Keep it simple
Forming an analysis plan
Two important questions
1. What do you want to do/show?
2. What kind of data …
i. …will answer your question best?
ii. … can you get?
iii. … do you have?
Types of data
• Continuous
– Differences between values have meaning, and
are interpretable independent of the values
themselves
– E.g. difference between 8 and 9 basically the
same as difference between 1 and 2.
• Ordinal
– Values have an order, but differences are not
easily interpretable (e.g. good, fair, poor)
Types of data (cont.)
• Categorical
– Values are descriptive but do not have any
obvious ordering. E.g. tx A, tx B, tx C.
• Binary, Dichotomous
– Fancy names for categorical variables with only
two possible values.
Types of data (sampling)
• one-sample
– Refers to situation when values of interest all
come from one group and will be compared to a
known quantity (e.g. “change greater than
zero”)
• two-sample
– When data are divided/sampled in two groups
and observed values compared between groups.
What do you want to do?
• Show evidence of differences
• Estimate population parameters
• Demonstrate equivalence
• Show evidence of association
• Create/validate a predictive model
• Assess agreement or reliability
• Other?
Showing evidence of differences
• Standard hypothesis testing procedures, usually
comparing means or proportions
• Which test will depend on type of data. Usual
suspects (YMMV)
– T-test or ANOVA for Continuous data
– Chi-square test for Categorical data
– Rank-based tests (e.g. Wilcoxon) for Ordinal data
• Use Rosner flowchart for guidance
• Supplement p-value with estimate of difference
(with confidence interval)
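As one illustration of pairing a p-value with an estimate of the difference, the sketch below compares two proportions. It assumes two independent groups, uses the Pearson chi-square test without continuity correction and a Wald interval, and runs on made-up counts; the function name `compare_proportions` is hypothetical.

```python
import math

def compare_proportions(success_a, n_a, success_b, n_b, z=1.96):
    """Chi-square test (1 df, no continuity correction) plus a Wald ~95% CI
    for the difference in proportions between two independent groups."""
    p_a, p_b = success_a / n_a, success_b / n_b
    diff = p_a - p_b
    # For a 2x2 table the Pearson chi-square equals the squared pooled z statistic.
    pooled = (success_a + success_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    chi2 = (diff / se_pooled) ** 2
    # Upper-tail probability of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Unpooled standard error for the confidence interval.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, (diff - z * se, diff + z * se), chi2, p_value

# Made-up counts: 30/50 responders on tx A, 18/50 on tx B.
diff, ci, chi2, p_value = compare_proportions(30, 50, 18, 50)
print(f"difference = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p_value:.3f}")
```

With small cell counts an exact test would usually be preferred over this approximation.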
Estimate Population Parameters
• P-values and hypothesis tests aren’t always
necessary
• Sometimes you don’t really want to
compare things but only estimate values
• Estimate parameters of interest and
supplement with confidence intervals
(IMPORTANT!).
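A minimal sketch of estimation without a hypothesis test: a sample proportion reported with a confidence interval. The Wilson score interval is used here as one reasonable choice, not the only one; the counts are made up and the function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_hat, (center - half_width, center + half_width)

# Made-up data: 12 of 40 sampled patients have the condition.
estimate, (low, high) = wilson_interval(12, 40)
print(f"estimate = {estimate:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```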
Demonstrate equivalence
• In some instances the goal is to show
equivalence of, say, two treatments.
• Failing to show a difference using a
standard hypothesis test is usually not
sufficient evidence of equivalence
• Two strategies
– Estimate difference and show ‘worst cases’
with confidence interval
– Compute a standard hypothesis test with very
good power (> 95%)
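The first strategy can be sketched as follows: estimate the difference, then check whether the whole confidence interval (the 'worst cases') stays inside a margin judged clinically unimportant. The margin values, the data, and the function names are assumptions for illustration, and the interval uses a normal approximation; with samples this small a t-based interval would be more appropriate.

```python
import math
from statistics import mean, variance

def mean_difference_ci(group_a, group_b, z=1.96):
    """Difference in means with a large-sample (normal approximation) ~95% CI."""
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return diff, (diff - z * se, diff + z * se)

def within_margin(ci, margin):
    """True when the whole interval lies inside (-margin, +margin)."""
    low, high = ci
    return -margin < low and high < margin

# Made-up measurements under two treatments.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1, 5.0]
group_b = [5.0, 5.2, 4.9, 5.1, 5.0, 4.9, 5.2, 5.1]

diff, ci = mean_difference_ci(group_a, group_b)
# The equivalence margin must be chosen on scientific grounds before looking at data.
print(within_margin(ci, margin=0.2), within_margin(ci, margin=0.1))
```

The same interval supports equivalence under the wider margin but not the narrower one, which is why the margin has to be justified in advance.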
Demonstrate associations
Independent       Outcome variable
variable          dichotomous                  continuous
categorical       • Chi-square                 • T-test/ANOVA
                  • Logistic regression        • Linear regression
continuous        • Logistic regression        • Correlation
                  • T-test/ANOVA (backwards)   • Linear regression
                                               • Scatterplots
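For the continuous-by-continuous cell, a minimal sketch of the correlation coefficient and the least-squares line, written out by hand so the arithmetic is visible. The data are made up and the function name `pearson_and_line` is hypothetical.

```python
import math
from statistics import mean

def pearson_and_line(x, y):
    """Pearson correlation plus slope and intercept of the least-squares line."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    return r, slope, intercept

# Made-up paired measurements.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r, slope, intercept = pearson_and_line(x, y)
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```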
Prediction
• Dichotomous outcome
– Logistic regression*
– Sensitivities, specificities†
– ROC curves† (continuous predictor)
• Continuous outcome
– Linear regression*
– “Leave one out” statistics or cross validation†
* Predictive model building
† assessing predictive model
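A minimal sketch of assessing a dichotomous-outcome prediction: sensitivity and specificity at one cutoff, and the area under the ROC curve for a continuous predictor. The scores, labels, cutoff, and function names are made up for illustration; in practice these statistics should be computed on data not used to build the model.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive; labels are 1 (event) or 0."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(scores, labels):
    """Area under the ROC curve: the fraction of (event, non-event) pairs in
    which the event has the higher score, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up predicted scores and observed outcomes.
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 0, 1, 1]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
auc = roc_auc(scores, labels)
print(sens, spec, auc)
```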
Reliability/Agreement
• Kappa statistic is commonly used for
categorical data and two raters.
• Intra-class correlation coefficient for
multiple raters
• If you have a ‘gold standard’ it makes the
most sense to tabulate percent correct or
average distance from correct.
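For two raters and categorical ratings, Cohen's kappa compares observed agreement with the agreement expected by chance from each rater's marginal frequencies. A minimal sketch on made-up ratings (the function name and category labels are hypothetical):

```python
from collections import Counter

def cohen_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(rater_1)
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    # Chance agreement from the product of the raters' marginal proportions.
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1) / n**2
    return (observed - expected) / (1 - expected)

# Made-up ratings of ten teeth by two examiners.
rater_1 = ["caries", "caries", "sound", "sound", "caries",
           "sound", "sound", "caries", "sound", "sound"]
rater_2 = ["caries", "sound", "sound", "sound", "caries",
           "sound", "caries", "caries", "sound", "sound"]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 80% of items, but chance alone would give 52%, so kappa is noticeably lower than the raw agreement.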
more Reliability/Agreement
• If trying to demonstrate agreement between two
continuous measures the correlation coefficient is
tangential at best
• Better to tabulate statistics related to mean
pairwise differences between judges
• See
– Bland JM, Altman DG. (1986). Statistical methods for assessing
agreement between two methods of clinical measurement. Lancet, i, 307-310.
– Available at http://www-users.york.ac.uk/~mb55/meas//ba.htm
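A minimal sketch of the pairwise-difference summary in the spirit of the Bland and Altman reference: the mean difference (bias) and approximate 95% limits of agreement, taken here as the mean difference plus or minus 1.96 standard deviations of the differences. The paired readings are made up and the function name is hypothetical.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Mean pairwise difference and approximate 95% limits of agreement."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Made-up paired readings of the same subjects by two methods.
method_a = [10.2, 11.5, 9.8, 12.1, 10.9, 11.0]
method_b = [10.0, 11.9, 9.5, 12.0, 11.3, 10.7]

bias, (lower, upper) = bland_altman(method_a, method_b)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

Whether limits of this width are acceptable is a clinical judgment, not a statistical one.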
Other?
• Time-to-event data
– Kaplan-Meier survival estimate
– Cox regression
• Other other?
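For time-to-event data, the Kaplan-Meier estimate multiplies, at each observed event time, the fraction of at-risk subjects who survive that time. A minimal sketch with made-up follow-up times (the function name is hypothetical; censored subjects are coded 0):

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.
    events[i] is 1 if the event was observed, 0 if the subject was censored."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t leave the risk set afterwards.
        at_risk -= leaving
    return curve

# Made-up follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 6, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```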
Correlated Data Issues
• Data consist of “clusters” of correlated
observations. This is common in dental
studies (many teeth from same mouth)
• Common Solutions?
– Collapse data to independent units (patient-level averages)
– Adjust for correlation using generalized
estimating equations (GEE) or mixed model
regression approaches
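The first solution can be sketched as below: tooth-level measurements are averaged within each patient so that each patient contributes one independent value to the analysis. The records and the function name are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

def patient_level_means(records):
    """Collapse (patient_id, value) records to one mean per patient."""
    by_patient = defaultdict(list)
    for patient_id, value in records:
        by_patient[patient_id].append(value)
    return {pid: mean(values) for pid, values in by_patient.items()}

# Made-up pocket depths (mm), several teeth per patient.
records = [("p1", 2.0), ("p1", 3.0), ("p1", 4.0),
           ("p2", 1.0), ("p2", 2.0),
           ("p3", 5.0)]

collapsed = patient_level_means(records)
print(collapsed)
```

Collapsing is simple but discards within-patient information; GEE or mixed models keep the tooth-level data and adjust for the correlation instead.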
Homework for March 13
• Following the guidelines presented in class
today, present a concise description of your
study and planned data analysis to the class.
• Plan to keep your talk under ____ minutes
• Limited office hours will be available with Dr. Kaiyala
and me to help. Call or email us for
appointments.