Introduction to Biostatistics
for Clinical and Translational
Researchers
KUMC Departments of Biostatistics & Internal Medicine
University of Kansas Cancer Center
FRONTIERS: The Heartland Institute of Clinical and Translational Research
Course Information
 Jo A. Wick, PhD
 Office Location: 5028 Robinson
 Email: jwick@kumc.edu
 Lectures are recorded and posted at
http://biostatistics.kumc.edu under ‘Events &
Lectures’
Objectives
 Understand the role of statistics in the scientific
process and how it is a core component of
evidence-based medicine
 Understand features, strengths and limitations of
descriptive, observational and experimental
studies
 Distinguish between association and causation
 Understand roles of chance, bias and
confounding in the evaluation of research
Course Calendar
 July 5: Introduction to Statistics: Core Concepts
 July 12: Quality of Evidence: Considerations for
Design of Experiments and Evaluation of Literature
 July 19: Hypothesis Testing & Application of
Concepts to Common Clinical Research Questions
 July 26: (Cont.) Hypothesis Testing & Application
of Concepts to Common Clinical Research
Questions
“No amount of experimentation can ever prove me
right; a single experiment can prove me wrong.”
Albert Einstein (1879-1955)
Vocabulary
Basic Concepts
 Statistics is a collection of procedures and principles for gathering data and analyzing information to help people make decisions when faced with uncertainty.
 In research, we observe something about the real
world. Then we must infer details about the
phenomenon that produced what we observed.
 A fundamental problem is that, very often, more
than one phenomenon can give rise to the
observations at hand!
Example: Infertility
Suppose you are concerned about the difficulties
some couples have in conceiving a child.
 It is thought that women exposed to a particular
toxin in their workplace have greater difficulty
becoming pregnant compared to women who are
not exposed to the toxin.
 You conduct a study of such women, recording the
time it takes to conceive.
Example: Infertility
 Of course, there is natural variability in time-to-
pregnancy attributable to many causes aside
from the toxin.
 Nevertheless, suppose you finally determine that
those females with the greatest exposure to the
toxin had the most difficulty getting pregnant.
Example: Infertility
 But what if there is a variable you did not consider
that could be the cause?
 No study can consider every possibility.
Example: Infertility
 It turns out that women who smoke while they are pregnant reduce the chance their daughters will be able to conceive, because the toxins involved in smoking affect the eggs in the female fetus.
 If you didn't record whether or not the women had mothers who smoked while pregnant, you may draw the wrong conclusion about the industrial toxin.
[Diagram: fertility is influenced by the mother's smoking behaviors, environmental toxins, and natural variability]
Example: Infertility
Lurking (confounding) variable → bias
[Diagram: the group unexposed to the toxin was mostly unexposed to smoke in the womb, while the exposed group was mostly exposed to smoke in the womb; time-to-conceive is measured and a prolonged time-to-conceive is found in the exposed group → Type I error!]
Example: Infertility
Lurking (confounding) variable → "noise"
[Diagram: both the unexposed and exposed groups have some smoking exposure; time-to-conceive is measured and an insignificant change in time-to-conceive is found → Type II error!]
The Role of Statistics
 The conclusions (inferences) we draw always come
with some amount of uncertainty due to these
unobserved/unanticipated issues.
 We must quantify that uncertainty in order to know
how “good” our conclusions are.
 This is the role that statistics plays in the scientific
process.
 P-values (significance levels)
 Level of confidence
 Standard errors of estimates
 Confidence intervals
 Proper interpretation (association versus causation)
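As a concrete illustration of two of the quantities listed above (standard errors and confidence intervals), here is a minimal Python sketch; the data values are invented purely for illustration.

```python
# Minimal sketch: standard error and 95% confidence interval for a sample mean.
# The data values below are invented for illustration.
import numpy as np
from scipy import stats

x = np.array([24.9, 26.1, 27.3, 25.8, 26.7, 25.2, 26.9, 26.4])

n = len(x)
mean = x.mean()
se = x.std(ddof=1) / np.sqrt(n)          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)    # two-sided 95% critical value

ci_lower, ci_upper = mean - t_crit * se, mean + t_crit * se
print(f"mean = {mean:.2f}, SE = {se:.2f}, 95% CI = ({ci_lower:.2f}, {ci_upper:.2f})")
```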
The Role of Statistics
Scientists use statistical inference to help model the
uncertainty inherent in their investigations.
[Diagram: statistical inference connects a sample (observation, summarized by a histogram and statistics such as x̄ and S) to a population model (imagination) about the population (reality); the uncertainty is measured by probability]
Evidence-based Medicine
Evidence-based practice in medicine involves
 gathering evidence in the form of scientific data.
 applying the scientific method to inform clinical
practice, establishment or development of new
therapies, devices, programs or policies aimed at
improving health.
Types of Evidence
Scientific evidence: "empirical evidence, gathered in accordance with the scientific method, which serves to support or counter a scientific theory or hypothesis"
 Type I: descriptive, epidemiological
 Type II: intervention-based
 Type III: intervention- and context-based
Evidence-based Medicine
 Evidence-based practice results in a high
likelihood of successful patient outcomes and
more efficient use of health care resources.
The Scientific Method
[Diagram: the scientific method as a cycle linking observation, design & hypothesis, running the experiment, evidence (data), clinical evaluation, and revision]
Types of Studies
 Purpose of research (moving from ambiguity toward control):
1) To explore
2) To describe or classify
3) To establish relationships
4) To establish causality
 Strategies for accomplishing these purposes:
1) Naturalistic observation
2) Case study
3) Survey
4) Quasi-experiment
5) Experiment
Generating Evidence
[Diagram: hierarchy of study designs, increasing in complexity and confidence — descriptive studies of populations or of individuals (case reports, case series, cross-sectional studies) and analytic studies, which are either observational (case-control, cohort) or experimental (randomized controlled trials, RCTs)]
Observation versus Experiment
 A designed experiment involves the investigator
assigning (preferably randomly) some or all
conditions to subjects.
 An observational study includes conditions that
are observed, not assigned.
Example: Heart Study
 Question: How does serum total cholesterol vary by age, gender, education, and use of blood pressure medication? Does smoking affect any of the associations?
 Recruit n = 3000 subjects over two years
 Take blood samples and have subjects answer a CVD risk factor survey
 Outcome: serum total cholesterol
 Factors: BP meds (observed, not assigned)
 Confounders?
Example: Diabetes
 Question: Will a new treatment help overweight people with diabetes lose weight?
 N = 40 obese adults with Type II (non-insulin-dependent) diabetes (20 female/20 male)
 Randomized, double-blind, placebo-controlled study of treatment versus placebo
 Outcome: weight loss
 Factor: treatment versus placebo
How to Talk to a Statistician?
 “It’s all Greek to me . . .”
Why Do I Need a Statistician?
 Planning a study
 Proposal writing
 Data analysis and interpretation
 Presentation and manuscript development
When Should I Seek a
Statistician’s Help?
 Literature interpretation
 Defining the research questions
 Deciding on data collection instruments
 Determining appropriate study size
What Does the Statistician Need
to Know?
 General idea of the research
 Specific Aims and hypotheses would be ideal
 What has been done before
 Literature review!
 Outcomes under consideration
 Study population
 Drug/Intervention/Device
 Rationale for the study
 Budget constraints
“No amount of experimentation can ever prove me
right; a single experiment can prove me wrong.”
Albert Einstein (1879-1955)
Vocabulary
 Hypothesis: a statement of the research question that sets forth the appropriate statistical evaluation
 Null hypothesis “H0”: statement of no differences or
association between variables
 Alternative hypothesis “H1”: statement of differences
or association between variables
Disproving the Null
 If someone claims that all swans are white,
confirmatory evidence (in the form of lots of white
swans) cannot prove the assertion to be true.
 Contradictory evidence (in the form of a single
black swan) makes it clear the claim is invalid.
The Scientific Method
[Diagram: observation → hypothesis → experiment → results; if the evidence supports H, testing continues; if the evidence is inconsistent with H, revise H]
Hypothesis Testing
 By hypothesizing that the mean response of a population is 26.3, I am saying that I expect the mean of a sample drawn from that population to be 'close to' 26.3.
[Figure: distribution of the sample mean x̄, centered at 26.3 (axis from 24.5 to 28.0)]
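To make this concrete, here is a minimal simulation sketch of what sample means look like when the population mean really is 26.3; the population SD (1.5) and sample size (n = 30) are assumptions chosen purely for illustration.

```python
# Sketch: behavior of sample means when H0 (population mean = 26.3) is true.
# The population SD (1.5) and sample size (n = 30) are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)
sample_means = rng.normal(loc=26.3, scale=1.5, size=(10_000, 30)).mean(axis=1)

print("average of the sample means:", round(sample_means.mean(), 2))    # close to 26.3
print("spread (SD) of the sample means:", round(sample_means.std(), 2)) # about 1.5/sqrt(30) ≈ 0.27
```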
Hypothesis Testing
 What if, in collecting data to test my hypothesis, I observe a sample mean of 26?
 What conclusion might I draw?
[Figure: the observed sample mean of 26 marked near the center of the distribution centered at 26.3]
Hypothesis Testing
 What if, in collecting data to test my hypothesis, I observe a sample mean of 27.5?
 What conclusion might I draw?
[Figure: the observed sample mean of 27.5 marked toward the upper tail of the distribution centered at 26.3]
Hypothesis Testing
 What if, in collecting data to test my hypothesis, I observe a sample mean of 30?
 What conclusion might I draw?
[Figure: the observed sample mean of 30 falls beyond the plotted range of the distribution centered at 26.3]
Hypothesis Testing
 If the observed sample mean seems odd or
unlikely under the assumption that H0 is true, then
we reject H0 in favor of H1.
 We typically use the p-value as a measure of the
strength of evidence against H0.
What is a P-value?
A p-value is the probability of getting a sample mean as favorable or more favorable to H1 than the sample mean that was actually observed, assuming H0 is true.
 The tail of the null distribution it is in is determined by H1.
 If H1 states that the mean is greater than 26.3, the p-value is the area under the curve for values of the sample mean more extreme than the observed sample mean, as shown.
 If H1 states that the mean is less than 26.3, the p-value is the area to the left of the observed sample mean.
 If H1 states that the mean is different than 26.3, the p-value is twice the area shown, accounting for the area in both tails.
[Figure: null distribution of the sample mean centered at 26.3, with the observed sample mean marked and the p-value shown as the shaded tail area beyond it]
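A minimal sketch of how these one- and two-tailed p-values can be computed with a one-sample t-test; the hypothesized mean of 26.3 follows the slides, but the sample values are invented for illustration.

```python
# Sketch: one-sample t-test of H0: mean = 26.3, with one- and two-tailed p-values.
# The sample below is invented for illustration.
import numpy as np
from scipy import stats

x = np.array([27.1, 26.8, 27.5, 26.2, 27.9, 26.6, 27.3, 27.0])
mu0 = 26.3

t_stat, p_two_sided = stats.ttest_1samp(x, popmean=mu0)                       # H1: mean != 26.3
p_greater = stats.ttest_1samp(x, popmean=mu0, alternative='greater').pvalue   # H1: mean > 26.3
p_less = stats.ttest_1samp(x, popmean=mu0, alternative='less').pvalue         # H1: mean < 26.3

print(f"t = {t_stat:.2f}")
print(f"two-tailed p = {p_two_sided:.4f}, "
      f"one-tailed (greater) p = {p_greater:.4f}, one-tailed (less) p = {p_less:.4f}")
```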
Vocabulary
 One-tailed hypothesis: outcome is expected in a
single direction (e.g., administration of
experimental drug will result in a decrease in
systolic BP)
 Two-tailed hypothesis: the direction of the effect
is unknown (e.g., experimental therapy will result in
a different response rate than that of current
standard of care)
Vocabulary
 Type I Error (α): a true H0 is incorrectly rejected
 “An innocent man is proven GUILTY in a court of law”
 Commonly accepted rate is α = 0.05
 Type II Error (β): failing to reject a false H0
 “A guilty man is proven NOT GUILTY in a court of law”
 Commonly accepted rate is β = 0.2
 Power (1 – β): correctly rejecting a false H0
 “Justice has been served”
 Commonly accepted rate is 1 – β = 0.8
Decisions

                   Truth: H1          Truth: H0
Conclude H1        Correct (Power)    Type I Error
Conclude H0        Type II Error      Correct
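A minimal simulation sketch of the Type I error cell of this table: when H0 is actually true, testing at α = 0.05 still rejects in roughly 5% of repeated experiments. All of the settings below are invented for illustration.

```python
# Sketch: when H0 is true, about alpha = 5% of experiments still reject (Type I errors).
# Population mean, SD, and sample size are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_experiments, n = 10_000, 25

rejections = 0
for _ in range(n_experiments):
    sample = rng.normal(loc=26.3, scale=1.5, size=n)   # H0 (mean = 26.3) is true here
    if stats.ttest_1samp(sample, popmean=26.3).pvalue < 0.05:
        rejections += 1

print("observed Type I error rate:", rejections / n_experiments)   # close to 0.05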
Statistical Power
 Primary factors that influence the power of your
study:
 Effect size: as the magnitude of the difference you wish
to find increases, the power of your study will increase
 Variability of the outcome measure: as the variability
of your outcome decreases, the power of your study will
increase
 Sample size: as the size of your sample increases, the
power of your study will increase
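A minimal simulation sketch of how the three factors above play out for a hypothetical two-group comparison; the effect sizes, SD, and sample sizes are assumptions chosen purely for illustration.

```python
# Sketch: simulated power of a two-sample t-test as effect size and sample size change.
# Group means, SD, and sample sizes are assumptions for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def simulated_power(effect_size, n_per_group, sd=1.0, n_sims=2000, alpha=0.05):
    rejections = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, sd, n_per_group)
        treated = rng.normal(effect_size, sd, n_per_group)
        if stats.ttest_ind(control, treated).pvalue < alpha:
            rejections += 1
    return rejections / n_sims

for effect in (0.2, 0.5, 0.8):        # larger difference -> more power
    for n in (20, 50, 100):           # larger sample -> more power
        print(f"effect = {effect}, n per group = {n}: power ≈ {simulated_power(effect, n):.2f}")
```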
Statistical Power
 Secondary factors that influence the power of your
study:
 Dropouts
 Nuisance variation
 Confounding variables
 Multiple hypotheses
 Post-hoc hypotheses
Hypothesis Testing
 We will cover these concepts more fully when we
discuss Hypothesis Testing and Quality of
Evidence
Descriptive Statistics
Field of Statistics
Statistics
 Descriptive Statistics: methods for processing, summarizing, presenting and describing data
 Experimental Design: techniques for planning and conducting experiments
 Inferential Statistics: evaluation of the information generated by an experiment or through observation
Field of Statistics
Statistics
 Descriptive: graphical and numerical summaries
 Experimental Design
 Inferential: estimation and hypothesis testing
Field of Statistics
 Descriptive statistics
 Summarizing and describing the data
 Uses numerical and graphical summaries to characterize
sample data
 Inferential statistics
 Uses sample data to make
conclusions about a broader
range of individuals—a
population—than just those who
are observed (a sample)
 The principal way to guarantee that the sample represents the population is to select it at random.
[Diagram: inference from a sample to the population]
Field of Statistics
 Experimental Design
 Formulation of hypotheses
 Determination of experimental conditions, measurements,
and any extraneous conditions to be controlled
 Specification of the number of subjects required and the
population from which they will be sampled
 Specification of the procedure for assigning subjects to
experimental conditions
 Determination of the statistical analysis that will be
performed
Descriptive Statistics
 Descriptive statistics is one branch of the field of
Statistics in which we use numerical and graphical
summaries to describe a data set or distribution of
observations.
[Diagram: Statistics branches into Descriptive (graphs, statistics) and Inferential (hypothesis testing, interval estimates)]
Types of Data
 All data contains information.
 It is important to recognize that the hierarchy
implied in the level of measurement of a variable
has an impact on
(1) how we describe the variable data and
(2) what statistical methods we use to analyze it.
Levels of Measurement
 Nominal: difference (discrete, qualitative)
 Ordinal: difference, order (discrete, qualitative)
 Interval: difference, order, equivalence of intervals (continuous, quantitative)
 Ratio: difference, order, equivalence of intervals, absolute zero (continuous, quantitative)
Types of Data
[Diagram: nominal → ordinal → interval → ratio; information increases from left to right]
Ratio Data
 Ratio measurements provide the most information about an outcome.
 Different values imply difference in outcomes.
 6 is different from 7.
 Order is implied.
 6 is smaller than 7.
Ratio Data
 Intervals are equivalent.
 The difference between 6 and 7 is the same as the
difference between 101 and 102.
 Zero indicates a lack of what is being measured.
 If item A weighs 0 ounces, it weighs nothing.
Ratio Data
 Ratio measurements provide the most information about an outcome.
 Can make statements like: "Person A (t = 10 minutes) took twice as long to complete a task as Person B (t = 5 minutes)."
 This is the only type of measurement where statements of this nature can be made.
 Examples: age, birth weight, follow-up time, time to complete a task, dose
Interval Data
Interval measurements are one step down on the
“information” scale from ratio measurements.
 Difference and order are implied and intervals
are equivalent.
 BUT, zero no longer implies an absence of the
outcome.
 What is the interpretation of 0°C? Of 0 K?
 The Celsius and Fahrenheit temperature scales are interval measurements; Kelvin is a ratio measurement.
Interval Data
Interval measurements are one step down on the
“information” scale from ratio measurements.
 You can tell what is better, and by how much, but
ratios don’t make sense due to the lack of a
‘starting point’ on the scale.
 60°F is greater than 30°F, but not twice as hot, since 0°F doesn't represent an absence of heat.
 Examples: temperature, dates
Ordinal Data
 Ordinal measurements are one step down on the
“information” scale from interval measurements.
 Difference and order are implied.
 BUT, intervals are no longer equivalent.
 For instance, the difference in performance between the 1st- and 2nd-ranked teams in basketball isn't necessarily equivalent to the difference between the 2nd- and 3rd-ranked teams.
 The ranking only implies that 1st is better than 2nd, 2nd is
better than 3rd, and so on . . . but it doesn’t try to quantify
the ‘betterness’ itself.
Ordinal Data
Ordinal measurements are one step down on the
“information” scale from interval measurements.
 Examples: highest level of education achieved, tumor grading, survey questions (e.g., Likert-scale quality of life)
Nominal Data
Nominal measurements collect the least amount of
information about the outcome.
 Only difference is implied.
 Observations are classified into mutually exclusive categories.
 Examples: gender, ID numbers, pass/fail response
Levels of Measurement
 It is important to recognize that the hierarchy
implied in the level of measurement of a variable
has an impact on
(1) how we describe the variable data and
(2) what statistical methods we use to analyze it.
 The levels are in increasing order of mathematical
structure—meaning that more mathematical
operations and relations are defined—and the
higher levels are required in order to define some
statistics.
Levels of Measurement
 At the lower levels, assumptions tend to be less
restrictive and the appropriate data analysis
techniques tend to be less sensitive.
 In general, it is desirable to have a higher level
of measurement.
 The appropriate statistical summaries and mathematical relations or operations for each level are given in the next table.
Levels of Measurement
Level      Statistical Summary                         Mathematical Relation/Operation
Nominal    Mode                                        one-to-one transformations
Ordinal    Median                                      monotonic transformations
Interval   Mean, Standard Deviation                    positive linear transformations
Ratio      Geometric Mean, Coefficient of Variation    multiplication by c > 0

We must know where an outcome falls on the measurement scale: this not only determines how we describe the data (descriptive statistics) but also how we analyze it (inferential statistics).
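As a minimal illustration of the table above, the sketch below computes one appropriate summary for each level of measurement; the variables and values are invented examples.

```python
# Sketch: summaries appropriate to each level of measurement (invented example data).
import numpy as np
import pandas as pd

blood_type = pd.Series(["A", "O", "O", "B", "A", "O"])           # nominal  -> mode
tumor_grade = pd.Series([1, 3, 2, 2, 1, 3])                      # ordinal  -> median
temp_celsius = pd.Series([36.5, 37.1, 38.0, 36.8, 37.4])         # interval -> mean, SD
followup_months = pd.Series([2.0, 4.5, 8.0, 16.0, 32.0])         # ratio    -> geometric mean, CV

print("mode:", blood_type.mode().tolist())
print("median grade:", tumor_grade.median())
print("mean temp:", round(temp_celsius.mean(), 2), "SD:", round(temp_celsius.std(), 2))

geometric_mean = np.exp(np.log(followup_months).mean())
coef_of_variation = followup_months.std() / followup_months.mean()
print("geometric mean:", round(geometric_mean, 2),
      "coefficient of variation:", round(coef_of_variation, 2))
```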
Using Graphs to Describe Data
 Nominal and ordinal measurements are discrete
and qualitative, even if they are represented
numerically.
 Rank: 1, 2, 3
 Gender: male = 1, female = 0
 We typically use frequencies, percentages, and
proportions to describe how the data is distributed
among the levels of a qualitative variable.
 Bar and pie charts are even more useful.
Example: Myopia
 A survey of n = 479 children found that those who
had slept with a nightlight or in a fully lit room
before the age of 2 had a higher incidence of
nearsightedness later in childhood.
              No Myopia    Myopia      High Myopia   Total
Darkness      155 (90%)    15 (9%)     2 (1%)        172 (100%)
Nightlight    153 (66%)    72 (31%)    7 (3%)        232 (100%)
Full Light    34 (45%)     36 (48%)    5 (7%)        75 (100%)
Total         342 (71%)    123 (26%)   14 (3%)       479 (100%)
Example: Myopia
[Figure: stacked bar chart of the percentage of children with no, some, or high myopia for each nighttime lighting condition (darkness, nightlight, full light)]
Example: Myopia
 As the amount of light during sleep increases, the incidence of myopia increases.
 This study does not prove that sleeping with the light on causes myopia in more children.
 There may be some confounding factor that isn't measured or considered, possibly genetics.
 Children whose parents have myopia are more likely to
suffer from it themselves.
 It’s also possible that those parents are more likely to
provide light while their children are sleeping.
Example: Nausea
 How many subjects experienced drug-related
nausea?
Dose     Nausea    No Nausea
0 mg     0         9
10 mg    1         10
20 mg    3         10
50 mg    3         11

[Figure: bar chart of the number of subjects with nausea vs. no nausea at each dose]
Example: Nausea
 With unequal sample sizes across doses, it is more meaningful to use percent rather than frequency.

Dose     Nausea     No Nausea
0 mg     0 (0%)     9 (100%)
10 mg    1 (9%)     10 (91%)
20 mg    3 (23%)    10 (77%)
50 mg    3 (21%)    11 (79%)

[Figure: bar chart of the percent of subjects with nausea vs. no nausea at each dose]
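A minimal sketch of this calculation: the counts from the table above are converted to row percentages (pandas is used here purely for illustration).

```python
# Sketch: converting the nausea counts above into row percentages.
import pandas as pd

counts = pd.DataFrame(
    {"Nausea": [0, 1, 3, 3], "No Nausea": [9, 10, 10, 11]},
    index=["0 mg", "10 mg", "20 mg", "50 mg"],
)

# Divide each row by its total number of subjects, then scale to percent.
row_percent = counts.div(counts.sum(axis=1), axis=0) * 100
print(row_percent.round(0))
```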
Bar & Pie Charts

Race/Ethnicity       Percent
Caucasian            30
African American     20
Hispanic             17
Asian American       13
Native American      13
Other                7

[Figure: the same race/ethnicity distribution shown as a bar chart and as a pie chart]
Using Graphs to Describe Data
 Interval and Ratio variables are continuous and
quantitative and can be graphically and
numerically represented with more sophisticated
mathematical techniques.
 Height
 Survival Time
 We typically use means, standard deviations,
medians, and ranges to describe how the
variables tend to behave.
 Histograms and boxplots are even more useful.
Example: Time-to-death
 Suppose that we record the variable x = time-to-death of n = 100 patients in a study.
[Figure: histogram of x (time to death in months), with frequency on the vertical axis and x ranging from 0 to about 15]
Example: Time-to-death
 We can quickly observe several characteristics of
the data from the histogram:
 For most subjects, death occurred between 0 and 5
months
 For a few subjects, death occurred past 15 months
 From this picture, we may wish to identify the
distinguishing characteristics of the individuals with
unusually long times.
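A minimal sketch of how such a histogram might be produced; the time-to-death values below are simulated, right-skewed stand-ins, not the study data.

```python
# Sketch: histogram of a right-skewed time-to-event variable.
# The values are simulated stand-ins, not the actual study data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
time_to_death = rng.exponential(scale=3.0, size=100)   # months; skewed like the slide

plt.hist(time_to_death, bins=15, edgecolor="black")
plt.xlabel("Time to death (months)")
plt.ylabel("Frequency")
plt.title("Histogram of x = time to death (simulated illustration)")
plt.show()
```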
Example: Weight
 Suppose we record the weight in pounds of n =
100 subjects in a study.
[Figure: boxplot of the weights, annotated with Q1, Q2 (the median), Q3, the IQR, the fences at Q1 − 1.5·IQR and Q3 + 1.5·IQR, and outliers marked with asterisks]
Example: Tooth Growth
 Boxplots represent the
same information, but
are more useful for
comparing
characteristics
between several data
sets.
[Figure: distributions of tooth growth for two supplements at three dose levels]
Using Numbers to Describe Data
 Nominal and ordinal measurements are discrete
and qualitative, even if they are represented
numerically.
 Rank: 1, 2, 3
 Gender: male = 1, female = 0
 Interval and Ratio variables are continuous and
quantitative and can be graphically and
numerically represented with more sophisticated
mathematical techniques.
 Height
 Survival Time
Using Numbers to Describe Data
 Nominal and ordinal measurements are qualitative, even if they are represented numerically.
 We typically describe qualitative data using frequencies
and percentages in tables.
 Measures of central tendency and variability don’t
make as much sense with categorical data, though the
mode can be reported.
Describing Data
 Interval and ratio measurements are quantitative. When dealing with a quantitative measurement, we typically describe three aspects of its distribution.
 Central tendency: a single value around which the data tend to fall.
 Variability: a value that represents how scattered the data are around that central value; large values indicate high scatter.
 We also want to describe the shape of the distribution of the sample data values.
Central Tendency
Mean: arithmetic average of the data
Median: approximate middle of the data
Mode: most frequently occurring value
(All three are measures of location.)
Central Tendency
 Mode, Mo
 The most frequently occurring value in the data set.
 May not exist or may not be uniquely defined.
 It is the only measure of central tendency that can be
used with nominal variables, but it is also meaningful for
quantitative variables that are inherently discrete (e.g.,
performance of a task).
 Its sampling stability is very low (i.e., it varies greatly
from sample to sample).
Central Tendency: Mode
[Figure: histogram of x with the mode (Mo) marked at the peak of the distribution]
Central Tendency: Mode
[Figure: distributions for females and males, each with its mode (Mo) marked]
Central Tendency
 Median, M
 The middle value (Q2, the 50th percentile) of the variable.
 It is appropriate for ordinal measures and for skewed
interval or ratio measures because it isn’t affected by
extreme values.
 It’s unaffected (robust to outliers) because it takes into
account only the relative ordering and number of
observations, not the magnitude of the observations
themselves.
 It has low sampling stability.
Example: Median
 Suppose we have a set of observations:
1 2 2 4
 The median for this set is M = 2.
 Now suppose we accidentally mismeasured the
last observation:
1 2 2 9
 The median for this new set is still M = 2.
Central Tendency: Median
[Figure: histogram of x with the mode (Mo) and median (M) marked]
Central Tendency
 Mean, x̄
 The arithmetic average of the variable x.
 It is the preferred measure for interval or ratio variables
with relatively symmetric observations.
 It has good sampling stability (i.e., it varies the least from sample to sample), implying that it is better suited for making inferences about population parameters.
 It is affected by extreme values because it takes into
account the magnitude of every observation.
 It can be thought of as the center of gravity of the
variable’s distribution.
Example: Mean
 Suppose we have a set of observations:
1 2 2 4
 The median for this set is M = 2; the mean is x̄ = 2.25.
 Now suppose we accidentally mismeasured the last observation:
1 2 2 9
 The median for this new set is still M = 2, but the new mean is x̄ = 3.5.
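A quick numeric check of this example (the values 1, 2, 2, 4 and 1, 2, 2, 9 are from the slides):

```python
# Quick check: the median resists the mismeasured value, the mean does not.
import numpy as np

original = np.array([1, 2, 2, 4])
mismeasured = np.array([1, 2, 2, 9])

print(np.median(original), np.mean(original))        # 2.0, 2.25
print(np.median(mismeasured), np.mean(mismeasured))  # 2.0, 3.5
```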
Central Tendency: Median
[Figure: histogram of x with the mode (Mo), median (M), and mean (x̄) marked]
Variability
Range: difference between the minimum and maximum values
Standard deviation: measures the spread of the data about the mean, in the same units as the data
(Both are measures of spread.)
Variability
 Measures of variability depict how similar
observations of a variable tend to be.
 Variability of a nominal or ordinal variable is rarely
summarized numerically.
 The more familiar measures of variability are
mathematical, requiring measurement to be of the
interval or ratio scale.
Variability
 Range, R
 The distance from the minimum to the maximum
observation.
 Easy to calculate.
 Influenced by extreme values (outliers).
1 2 3 4 10 → R = 10 − 1 = 9
1 2 3 4 100 → R = 100 − 1 = 99
Variability
 Interquartile Range, IQR
 The distance from the 1st quartile (25th percentile) to the
3rd quartile (75th percentile), Q3 - Q1.
 Unlike the range, IQR is not influenced by extreme
values.
Variability: IQR
[Figure: boxplot annotated with Q1, Q2 (the median), Q3, the IQR, the fences at Q1 − 1.5·IQR and Q3 + 1.5·IQR, and outliers marked with asterisks]
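A minimal sketch of the quantities annotated in this boxplot; the data values are invented for illustration.

```python
# Sketch: quartiles, IQR, and the 1.5*IQR outlier fences used in a boxplot (invented data).
import numpy as np

x = np.array([3, 4, 4, 5, 5, 6, 6, 7, 8, 21])   # 21 is an outlier

q1, q2, q3 = np.percentile(x, [25, 50, 75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
print("outliers:", x[(x < lower_fence) | (x > upper_fence)])
```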
Variability
 Standard deviation, s
 Represents the average spread of the data around the
mean.
 Expressed in the same units as the data.
 “Average deviation” from the mean.
Variability
 Variance, s2
 The standard deviation squared.
 “Average squared deviation” from the mean.
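A minimal sketch computing the sample standard deviation and variance (note ddof=1 for the sample versions); the data values are invented.

```python
# Sketch: sample standard deviation and variance (invented data).
import numpy as np

x = np.array([4.0, 5.5, 6.1, 5.0, 7.2, 4.8])

s = np.std(x, ddof=1)    # sample standard deviation, same units as the data
s2 = np.var(x, ddof=1)   # sample variance = s squared
print(f"s = {s:.2f}, s^2 = {s2:.2f}")
```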
Shape
[Figure: common distribution shapes]
Summary
 Basic Concepts
 Definition and role of statistics
 Vocabulary lesson
• Brief introduction to Hypothesis Testing
• Brief introduction to Design concepts
 Descriptive Statistics
 Levels of Measurement
 Graphical summaries
 Numerical summaries
 Next time: Study Design Considerations and
Quality of Evidence