Basic Biostatistics in Medical Research: What (Not) to Do
November 7, 2013
Leah J. Welty, PhD
Biostatistics Collaboration Center
Welcome to Basic Biostatistics in Medical Research: What (Not) to Do. This is part of a biannual lecture
series presented by the Biostatistics Collaboration Center.
A laudable goal for today would be for you to come away understanding everything you might want or
need to know about biostatistics in medical research. Unfortunately, that’s highly unlikely in an hour,
especially with an audience of varied specialties and backgrounds. Even condensing introductory
biostatistics into an hour-long lecture would be impossible. So, rather than trying to cover all the
background and methodology that's out there, I will focus instead on areas in which people are prone to
making mistakes, applying biostatistical methods incorrectly or inadequately, or misunderstanding what
their results mean.
This lecture is accordingly divided into four sections:
1. A good picture is worth 1,000 words: the importance of statistical graphics
2. Not all observations are created independent
3. What is a p-value really?
4. How to collaborate with a biostatistician.
For those who get excited about this today, please come back next week. If you’re serious about
expanding your own biostatistics repertoire, there are a number of excellent biostatistics courses
offered by the graduate school. If instead you’re looking for some guidance on the methods appropriate
for your own research, I urge you to listen carefully in section 4, and consider visiting the Biostatistics
Collaboration Center.
I. A good picture is worth 1,000 words.
Statistical graphics can do two very important things: (1) guide appropriate choice of statistical
analyses; and (2) provide a powerful illustration of results. My first piece of advice to investigators is to
look at their data -- not to “fish” for results -- but to understand how individual variables are distributed
and best summarized. Then, once data have been (appropriately) analyzed and are being prepared for
publication, my next advice is to think about (creative) ways to graphically display results.
A. Graphics guiding appropriate analysis choices
Example 1: Correlation and Anscombe’s Quartet
Correlation measures the strength of the linear association between two variables. It is often denoted
by “r” and takes values between -1 and 1. The values have the following interpretations:
r near -1: Strong negative linear association
r near 0: No linear association
r near 1: Strong positive linear association.
Suppose I have two variables A and B, and I tell you that their correlation is 0.82. What impression does
that make? Hopefully that A and B are fairly strongly linearly associated. The picture we associate with
this relationship might look something like what is shown below (Figure 1), where the variables A
and B do in fact have a correlation of 0.82.
Figure 1:
However, it’s also possible for the relationship between the two variables A and B to actually be much
different, but for their correlation to still be 0.82.
First, the variables A and B may be related in a non-linear fashion. Figure 2 illustrates variables A and B
which are quadratically related. Although r = 0.82 because there is still a linear trend, correlation is not
an accurate description of the strength of the relationship.
Figure 2: [scatterplot of Variable B versus Variable A]
Second, variables A and B may either have no relationship at all, or we may not have adequate
information to capture the relationship, yet still r = 0.82. In Figure 3, for all but one observation, it
appears that A is completely unrelated to B, or at least that B may vary substantially without any change
in A.
Figure 3: [scatterplot of Variable B versus Variable A]
The single value on the right side of the plot is what we refer to as an “influential observation.”
Correlation is notorious for not being “robust,” in the sense that it can depend heavily on just a few
observations. If I were presented with this data, I would recommend two courses of action: (1)
investigate the ‘influential point’ (is it an obvious mistake in coding or measurement?) and (2) if
possible, collect more data in which observations don’t all have the same values for A. Don’t throw out
the influential observation unless you can determine it was clearly an error (and not just an error in
the sense that it doesn’t match the rest of the data). Sometimes the most unusual observations end up
giving us the most insight. It may well be the case that B increases as A increases; we just can’t
determine that from this limited amount of information. Reporting r = 0.82 would be highly misleading
for this data.
Our third and final example involves another unusual observation. In Figure 4 below, variables A and B
appear to have a perfect linear relationship, minus one observation, and the correlation is 0.82. As
above, it would be wise to investigate this observation a bit more -- is it an error, or are there some people
(or observational units) for which A and B don’t have the same relationship? It’s also incredibly rare to
see such a perfect linear relationship in practice, so I would also recommend investigating the points
that appear perfectly related as well.
Figure 4: [scatterplot of Variable B versus Variable A]
In only one of the above four cases was correlation a reasonable summary measure of the relationship
between A and B. Have you ever computed a correlation coefficient without first making sure it's an
appropriate summary measure?
As a final note, there are many kinds of correlation. We’ve been discussing the most common version,
known as Pearson correlation. Spearman rank correlation and Kendall’s tau are somewhat common as
well, but they do not measure strength of linear association.
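
If you want to explore this yourself, R ships with Anscombe's quartet as the built-in data frame
anscombe (columns x1-x4 and y1-y4). A minimal base R sketch:

    # All four x-y pairs have correlation of about 0.816, yet only the
    # first is well described by that number.
    sapply(1:4, function(i) cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]]))

    # Always plot before you summarize:
    par(mfrow = c(2, 2))
    for (i in 1:4) {
      plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
           xlab = paste0("x", i), ylab = paste0("y", i))
    }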
Example 2: Means, Medians, and Days in Corrections
The most common summary measures of continuous data are undoubtedly the mean and the
associated standard deviation. However, the mean may not always be the most accurate or appropriate
summary statistic to associate with continuous data. Have you ever computed means and standard
deviations without actually looking at the distribution of the variable first? The next example illustrates
why that’s not such a good idea.
This example comes from some research I do with the group called Health Disparities and Public Policy
in the Department of Psychiatry and Behavioral Sciences. In particular, we have a prospective
longitudinal study of juvenile delinquents after detention. A number of participants cycle in and out of
the correctional system (jail, prison), and one of the measures we are interested in is the amount of
time they spend incarcerated.
For this mock data (a subsample of 1000 participants from an interview 5 years after detention as a
juvenile), we found that the average number of days spent in corrections during the past year was 84.
However, the median number of days in corrections in the past year was 0. Figure 5, below, illustrates
what’s going on. Over half the participants (544) had no correctional stays during the past year, and the
next largest chunk of participants (99) were in a correctional facility for the entire year. The remaining
participants are distributed between 1 and 364 days in a fairly uniform way. However, the 99 participants
who were in a correctional facility the entire time “pull” the mean to 84.
Figure 5:
The mean is not “robust” to outlying values, but the median is “robust.” The mean is actually the
balance point of the distribution: if you can imagine putting the histogram on a fulcrum, you’d need to
put the fulcrum at 84 to balance the two ends – those 99 values are ‘far away’ from the rest of the data,
which gives them disproportionate weight.
The lesson here is not to blindly compute means without first making sure that they’re an appropriate
summary measure. If not, don’t be afraid to report the median. Just as the mean is generally reported
with the standard deviation, the median should be reported with the range and quartiles (often the 25th
and 75th percentiles, along with the interquartile range, the distance between the two).
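
To make the robustness point concrete, here is a small R sketch on mock data shaped like the
description of Figure 5 (544 zeros, 99 full-year stays, the remainder spread between 1 and 364 days);
it is illustrative only, not the study data:

    set.seed(42)
    days <- c(rep(0, 544), rep(365, 99), round(runif(357, min = 1, max = 364)))
    mean(days)                      # pulled upward by the 99 full-year stays
    median(days)                    # 0, since over half of the values are 0
    quantile(days, c(0.25, 0.75))   # quartiles to report alongside the median
    hist(days, xlab = "Days in corrections during past year", main = "")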
As a final note, the two histograms below illustrate simulated data in which both variables have a mean
of 2.0. For the symmetric (and normally distributed) data on the left, the median is also 2.0. For the
skewed data on the right, the median is 1.4. For the data on the right, I would pause before reporting
just the mean and the standard deviation.
Figure 6:
B. Graphics providing a powerful illustration of results.
The graphics you use to generate helpful summaries of your data are generally not the same ones you’ll
want to use in presentations or publications. Figures are an exciting opportunity to convey your results
in a visual fashion, and they may be far more convincing than text or even the numbers themselves.
Unfortunately, programs like Microsoft Excel or PowerPoint don’t always provide good guidance on
what makes an effective figure for publication. Or, in the event that they can be coerced into making a
nice figure, it’s not trivial to figure out how. Biostatisticians hardly ever use Excel to generate figures.
Common statistical programs such as R, SAS, Stata, and SPSS all have reasonable graphics packages that
can easily produce more appropriate graphical summaries.
The example below illustrates what’s possible with different options and increasing levels of
sophistication. As in the previous section, this example uses data on time incarcerated. The purpose of
the figure is to illustrate the racial/ethnic differences in time spent incarcerated.
The first example was created using Excel with the help of a graduate student who was highly proficient
in Excel from her former life as a management consultant. At first glance, are you overwhelmed by the
racial/ethnic differences in incarceration? Can you tell what these differences are? Does this type of
figure look familiar?
Although such figures are commonplace, there are a number of ways in which this figure doesn’t work.
Criticisms include: (1) the x-axis divides up a continuous variable – the number of months in corrections
– into categories; (2) the horizontal lines are distracting; and (3) perhaps most importantly, to understand
the relationship between race/ethnicity and months in corrections, you need to digest racial/ethnic
comparisons in six different categories -- the first and the last being the most relevant.
This second presentation of the exact same data was generated using Stata, with no alterations to the
standard boxplot command:
Side-by-side boxplots are a powerful way of conveying differences in the distributions of continuous
variables, but are sadly underused. The boxes constitute the middle 50% of the data (from the 25th to
the 75th percentile), the line within the boxes shows the median, and the whiskers reach to the upper
and lower limits of the data. In the case of non-Hispanic whites, some of the large observations are
considered ‘outliers’ (more than 1.5 times the length of the box beyond the 75th percentile), so they’re
shown as dots rather than included as parts of the whiskers.
It’s clear from looking at this boxplot that non-Hispanic whites are generally spending less time
incarcerated than minorities.
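
For those who want to try this, side-by-side boxplots take a single command in most statistical
packages. A base R sketch on simulated data (the variable names months and race here are hypothetical
stand-ins, not the study data):

    set.seed(1)
    race   <- factor(rep(c("Non-Hispanic white", "African American", "Hispanic"),
                         each = 100))
    months <- c(rexp(100, rate = 1/2),   # simulated, right-skewed times
                rexp(100, rate = 1/5),
                rexp(100, rate = 1/4))
    boxplot(months ~ race, ylab = "Months in corrections")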
Finally, our last figure, which is close to what was submitted for publication, shows a slightly different
view of the data:
Note that it’s not a conventional plot, but is effective at demonstrating racial/ethnic and sex differences
in two variables: (1) who had spent any time incarcerated, and (2) the length of those incarcerations.
This figure was generated using R, which is open source and freely available statistical software. It has
excellent graphics capabilities. For the technically inclined, it is certainly accessible. For others, know
that this is the sort of figure your friendly neighborhood biostatistician can create.
C. Good and bad examples of graphics.
Edward Tufte has written extensively (in very accessible language) and elegantly about what makes
good statistical graphics. Visit http://www.edwardtufte.com/tufte/ for more information. Much of
what follows in this section is influenced by his work.
Here are some ideas to keep in mind when you’re generating graphics:
1. Graphics should have maximum information with minimum ink. The ubiquitous tower and
antenna plots are horrible offenders in this category. One tower and antenna uses a lot of ink
to illustrate just two numbers. Why not a dot and a line instead (see the sketch after this list)?
All the extra ink is distracting. Only use color if color is necessary.
2. Graphics should have no more dimensions than exist in the data. The 3-d bar charts in Excel
may look fancy, but they’re horrible when it comes to actually reading information from the
plot.
3. Labels should be informative, not distracting, and axes should have a sensible range.
4. Avoid pie charts (especially the 3-d kind). Humans are horrible at comparing areas, and even
worse at comparing volumes. I recommend bars instead (see the final figure in the previous
section). We’re better at comparing lengths.
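
On the first point, here is a minimal base R sketch of the dot-and-line alternative to tower and
antenna plots; the group means and standard errors are made up for illustration:

    means <- c(3.1, 4.6, 2.8)   # hypothetical group means
    ses   <- c(0.4, 0.5, 0.3)   # hypothetical standard errors
    x <- seq_along(means)
    plot(x, means, pch = 19, xaxt = "n", xlab = "", ylab = "Outcome",
         ylim = range(means - 2 * ses, means + 2 * ses))
    axis(1, at = x, labels = c("Group A", "Group B", "Group C"))
    segments(x, means - ses, x, means + ses)   # one line per group: mean +/- 1 SE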
If you’re looking for examples of good and creative graphics, check out the New York Times. Below is a
picture of an interactive plot illustrating how people spend their time. Although few of us have the
technical expertise to generate such a figure, it’s important to note that this is highly illustrative even
though it’s not what we’re used to seeing or have likely seen in a statistics textbook. It’s also worth
noting that the New York Times graphics department relies heavily on R for the first versions of many of
the graphics they create.
In contrast, the example on the left below comes from USA Today. It’s heavily laden with what Tufte
refers to as “chartjunk” – the gratuitous and cartoonish decoration of statistical graphics. The USA
Today example is particularly bad because it combines chartjunk with a pie chart drawn in perspective.
II. Not all observations are created independent.
The majority of methods taught in introductory and intermediate biostatistics courses assume that
observations are independent. However, in medical research especially, we encounter data in which
our observations are not independent. Examples of non-independent data include:
1. pre- and post-measurements on the same individuals or experimental units
2. measurements on cases and matched controls
3. longitudinal data (measurements taken repeatedly on the same individuals over time)
4. nested samples (e.g., in a random sample of hospitals, patients are sampled randomly within
each hospital)
In each of these cases, some measurements are more related to others. For example, measurements
within the same person are probably more similar than measurements across individuals.
Measurements on patients within the same hospital may be more similar than measurements on
patients treated at different hospitals.
Note that nearly all data has dependencies – for example, one variable is associated with another. This
isn’t the type of dependency we mean here. Rather, the dependency we’re talking about is introduced
by how the data were sampled or collected.
It is key to recognize that there are different biostatistical methods for observations that are not
independent. Using methods designed for independent data on observations that are not independent
can result in erroneous findings. Paired and dependent data can be very powerful, but as the following
example illustrates, it’s critical to choose the appropriate analysis method.
Example: Hodgkin’s and Tonsillectomy
This example comes from the early 1970s, and involves two studies investigating whether having a
tonsillectomy is associated with risk of Hodgkin’s Lymphoma. Although dated, the example uses a nice
assortment of biostatistics methods that are at least familiar and accessible to many people in the room.
The first study was published by Vianna, Greenwald, and Davies in 1971 in the Lancet (citations at the end of
this section). They conducted a case-control study in which controls were unmatched. They recruited
101 Hodgkin’s patients, and 107 controls who were as a group similar to the cases but not matched
on an individual basis. The data are summarized in the following 2 x 2 table:
                      Hodgkin's    Control
  Tonsillectomy          67           43
  No Tonsillectomy       34           64
An appropriate summary of the association between the exposure (tonsillectomy) and the outcome
(Hodgkin’s) is the odds ratio. The odds ratio captures the association between two binary variables, and
is often used in medical research. Note that because this was a case-control study, and the authors
recruited cases, we can’t say anything about the risk of disease in the exposed group compared to the
risk of disease in the unexposed group. You may have seen studies report a measure called ‘relative
risk,’ but we can’t do that here. One of the advantages of the odds ratio is that it is still valid in case-control
studies where the participants are selected on the outcome of interest.
The odds ratio comparing disease in the exposed group to the unexposed group is simply the odds of
disease in the exposed group divided by the odds of disease in the unexposed group. In the
tonsillectomy group, the odds of Hodgkin’s are 67 to 43, or 67/43 – for every 67 Hodgkin’s cases, we have
43 controls. In the no tonsillectomy group, the odds are 34/64 – for every 34 Hodgkin’s cases, we have
64 controls. So the odds ratio is (67/43)/(34/64) = 2.93.
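
In R, this is one line of arithmetic:

    (67 / 43) / (34 / 64)   # odds ratio = 2.93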
Odds ratios greater than one suggest that the exposure is associated with increased odds of the outcome.
Odds ratios less than one suggest that the exposure is protective for the outcome. The odds ratio is
always greater than zero.
An odds ratio of 2.93 is reasonably large, and suggests that having a tonsillectomy may be associated
with developing Hodgkin’s. However, the size alone is not enough to convince us of an association.
Instead, we need a test for association between the columns of the table (case/control status) and the
rows (tonsillectomy/no tonsillectomy). The appropriate test should be familiar – it’s the chi-squared
test of homogeneity. It’s worth noting that we could also examine the association via logistic regression,
in which we model the log odds of disease, but that’s beyond the scope of today’s discussion.
The chi-squared test compares the observed cell counts to the counts we would expect if the
exposure were independent of the outcome.
Observed data:

                      Hodgkin's    Control    Total
  Tonsillectomy          67           43       110
  No Tonsillectomy       34           64        98
  Total                 101          107       208

Expected data (if rows and columns independent):

                      Hodgkin's    Control    Total
  Tonsillectomy         53.4         56.6      110
  No Tonsillectomy      47.6         50.4       98
  Total                101          107        208
The formula for the expected cell counts is (row total x column total)/total, but this is just as easy to
reason out. If there are 101 people with Hodgkin’s, and Hodgkin’s is unrelated to tonsillectomy, we
would expect that 101 x (110/208) = 53.4 of them would fall in the upper left-hand corner, because
overall, 110/208 of our participants have had tonsillectomies.
The chi-squared statistic is equal to Σ (observed – expected)² / expected, summed over all four cells. It
has a chi-squared distribution with 1 degree of freedom if the expected cell counts are all greater than 5
and the total number of subjects is “large” (more than 20 or 40; here we have 208). For this example,
the chi-squared statistic is 14.26, which has an associated p-value less than 0.001. This is statistically
significant evidence for an association between Hodgkin’s and tonsillectomy.
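
The same test is one call in R. Note that chisq.test applies Yates' continuity correction to 2 x 2 tables
by default, so correct = FALSE is needed to reproduce the uncorrected statistic discussed here:

    tab <- matrix(c(67, 34, 43, 64), nrow = 2,
                  dimnames = list(c("Tonsillectomy", "No tonsillectomy"),
                                  c("Hodgkin's", "Control")))
    chisq.test(tab, correct = FALSE)   # X-squared = 14.26, df = 1, p < 0.001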
The second study was published a year later in the New England Journal of Medicine, and reported on
another case-control study examining the same association. In this study, however, the cases and
controls were matched, and consisted of 85 pairs in which Hodgkin’s patients were matched to siblings
of the same sex and within 5 years of age.
The data were summarized in a 2 x 2 table analogous to the tables above:

                      Hodgkin's    Control
  Tonsillectomy          41           33
  No Tonsillectomy       44           52
The associated odds ratio is (41/33)/(44/52) = 1.47 – not as large as what was observed before.
Furthermore, the associated chi-squared statistic was 1.53, with an associated p-value of 0.22. The
authors concluded that their data failed to support the association published by Vianna, Greenwald,
and Davies, and their contradictory finding was reported in NEJM. But what’s wrong with the analysis?
The problem is that the analysis ignored the pairings in the data. Although the 2 x 2 table above is not
technically incorrect, it is incredibly misleading: it suggests that there are 170 independent
observations when there are only 85 pairs! The odds ratio is not correct at all. A much better table
shows the pairings:
                                     Hodgkin's patient
                              Tonsillectomy    No Tonsillectomy
  Sibling: Tonsillectomy           26                  7
  Sibling: No Tonsillectomy        15                 37
We can think of each of the 85 pairs as falling into one of four categories: (1) both had tonsillectomies;
(2) neither had tonsillectomies; (3) the sibling had a tonsillectomy and the Hodgkin’s patient did not; (4)
the Hodgkin’s patient had a tonsillectomy but the sibling did not. Only the last two categories – the
discordant pairs – tell us anything about the association between Hodgkin’s and tonsillectomy.

The appropriate test in this case is to compare the percent of pairs in which the sibling had a
tonsillectomy but the Hodgkin’s patient did not (7/85 = 8%) to the percent of pairs in which the
Hodgkin’s patient had the tonsillectomy but the sibling did not (15/85 = 17%). If Hodgkin’s and tonsillectomy
were unrelated, we would expect the discordant pairs to split evenly between the two categories (so
about 11/85 = 13% in each group). Are our percentages different enough from what we would expect
that we ought to be suspicious? The correct test in this case is called McNemar’s test, and when applied
to this data, it results in a p-value of 0.09. Although not statistically significant, this certainly does not
cast the doubt on the previous study that was originally reported. We could also use conditional logistic
regression to estimate an accurate odds ratio.
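
In R, McNemar's test is also a single call; correct = FALSE gives the uncorrected statistic, which matches
the p-value of about 0.09. Note that only the discordant counts (15 and 7) drive the test:

    pairs <- matrix(c(26, 15, 7, 37), nrow = 2,
                    dimnames = list(Sibling = c("Tonsillectomy", "No tonsillectomy"),
                                    Patient = c("Tonsillectomy", "No tonsillectomy")))
    mcnemar.test(pairs, correct = FALSE)   # chi-squared = 2.91, df = 1, p = 0.09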
References:

Vianna, N. J., Greenwald, P., and Davies, J. N. P. “Tonsillectomy and Hodgkin's disease—The lymphoid
tissue barrier.” Lancet i: 431–432, 1971.

Johnson, S. K., and Johnson, R. E. “Tonsillectomy History in Hodgkin's Disease.” N Engl J Med 287:
1122–1125, 1972.

Rice, J. A. Mathematical Statistics and Data Analysis. Duxbury, 1995.
Final thoughts on paired versus independent data:
I find it useful to think about independent versus dependent data in terms of sources of variation.
Suppose we are allotted a total of 10 measurements of a quantity of interest. We can take these
measurements across 10 different people, 2 measurements each on 5 different people, 5
measurements each on 2 people, or 10 measurements all on the same person. As illustrated below,
when we have single measurements across many (independently sampled) people, we know a lot about
how a measurement varies across a population, but nothing about how it varies within a person. When
we take 10 (independent) measurements on the same person, we know a lot about how the
measurement varies within a person, but nothing about how it varies across a population. For the
in-between scenarios, we learn something about within-person variation and something about
across-person variation. These two in-between scenarios are the ones in which we need to be especially
careful not to treat the 10 measurements as independent.
You are probably already aware of some methods for data that aren’t independent – the paired t-test is
perhaps the most common. There are many other methods that are suitable for dependent
observations, such as generalized linear mixed models or conditional logistic regression. These are
standard practice in biostatistics and also relatively accessible to someone with intermediate
biostatistics knowledge. If you have dependencies in your observations and you’re not sure how to
account for them, be sure to consult your friendly neighborhood biostatistician. It’s important to remember
that paired data can actually be incredibly powerful, but you must do the analysis correctly.
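
Here is a quick simulated illustration of that power, assuming 10 subjects each measured before and
after treatment:

    set.seed(7)
    pre  <- rnorm(10, mean = 100, sd = 15)      # baselines vary a lot across people
    post <- pre + rnorm(10, mean = 5, sd = 3)   # true within-person change of about 5
    t.test(post, pre, paired = TRUE)   # correct: tests the 10 within-person differences
    t.test(post, pre)                  # wrong here: treats the 20 values as independent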
III. What is a p-value really?
1. An illustrative example
There may be no more universally confusing or misunderstood idea in statistics than the p-value.
Although I’m not a poker player, I find it useful to think of interpreting p-values using the following
scenario:
Dr. X and I are playing poker.
Dr. X is winning. In fact, Dr. X’s last two hands were a flush and a straight.
I’m forced to wonder – is Dr. X. cheating?
This scenario leads me to set up a hypothesis test:
1. Suppose Dr. X is playing fairly (note that this is the opposite of what I suspect). This is called the
null hypothesis, or H0.
2. I observe the data: Dr. X’s next hand is two pair. It is critical to note that the data that led to the
generation of the hypothesis CANNOT be used to test this hypothesis. That would be ludicrous.
Unfortunately, people are tempted to do it all the time. This is akin to “fishing” for results, and
will result in erroneous and unreplicable findings.
3. I next figure out the probability of Dr. X having a hand that is two pair or better if the null
hypothesis is true (i.e. if Dr. X is indeed playing fairly). This is called the p-value, and for this
example it is approximately 0.08 (see the simulation sketch after this list).
4. If this probability is “small,” I conclude that my original supposition (the null hypothesis) might
not be right. This would lead me to believe that Dr. X is indeed cheating. In formal statistical
language, I reject the null hypothesis in favor of the alternative hypothesis. The alternative
hypothesis, H1 or Ha, is in most cases the opposite of the null hypothesis.
If the probability is not “small,” I conclude that I don’t have sufficient evidence to reject the null
hypothesis. It is important to note that this is not the same as “accepting” the null hypothesis,
or showing that the null hypothesis is true. Dr. X may indeed have been cheating; we just didn’t
detect it.
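
Where does the 0.08 come from? The exact probability of being dealt two pair or better in five cards is
about 0.076, which a short Monte Carlo simulation in R can verify:

    # Estimate P(two pair or better) for a 5-card poker hand
    set.seed(1)
    rank <- rep(1:13, 4)          # card ranks (ace = 1)
    suit <- rep(1:4, each = 13)   # card suits

    two_pair_or_better <- function(idx) {
      counts <- sort(table(rank[idx]), decreasing = TRUE)
      # two pair, three of a kind, full house, or four of a kind:
      paired   <- counts[1] >= 3 || (counts[1] == 2 && counts[2] == 2)
      flush    <- length(unique(suit[idx])) == 1
      r        <- sort(unique(rank[idx]))
      straight <- length(r) == 5 &&
        (r[5] - r[1] == 4 || all(r == c(1, 10, 11, 12, 13)))
      paired || flush || straight
    }

    mean(replicate(1e5, two_pair_or_better(sample(52, 5))))   # roughly 0.08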
It’s also important to note that we haven’t ‘proved’ anything, nor have we computed the probability
that Dr. X is cheating. Maybe Dr. X is just very lucky, and isn’t cheating at all. Or maybe Dr. X is clever
enough to cheat just enough that we don’t detect it as statistically significant. All we’ve discovered is
that Dr. X would get a hand of two pair or better only about 8% of the time if Dr. X were playing fairly.
One final note on this example. It’s important to define what “small” means before you actually
conduct the test. For most analyses, “small” means 0.05 (this is called the “alpha” level). Some clinical
trials may use 0.01, some epidemiological studies may use 0.10. The choice of “small” depends very
much on the end objectives of the analysis and should never be decided after looking at the results of
the test. This is again akin to fishing.
2. The p-value defined
The p-value is the probability of the observed data (or of more ‘extreme’ data), under the assumption
that the null hypothesis is true:

p-value = Pr(data | H0)
This actually doesn’t tell us what we’d really like to know: the probability of our null hypothesis given
the data, or the probability of the alternative hypothesis given the data – namely Pr(H0 | data) or
Pr(H1 | data).

If Dr. X is not cheating, we would expect Dr. X to get two pair or better about 8% of the time. Note
that nowhere in this computation is Pr(Dr. X is cheating | we observe two pair).
3. A significance test
(Adapted from “The Null Ritual: What You Always Wanted to Know About Significance Testing but Were
Afraid to Ask.” Gigerenzer, G., Krauss, S., and Vitouch, O., in The Sage Handbook of Quantitative
Methodology for the Social Sciences (2004), David Kaplan, editor.)
Suppose you have a treatment that you suspect may alter performance on a task. You compare the
means of your control and experimental groups (say, 20 subjects per group). You use a simple
independent means t-test and your result is significant (t = 2.7, df = 18, p = 0.01).
H0: µ1 = µ2
H1: µ1 ≠ µ2
Please answer each of the following TRUE or FALSE:
1. You have disproved the null hypothesis (i.e. there is no difference between population
means).
2. You have found the probability of the null hypothesis being true.
3. You have proved your alternative hypothesis (i.e. that there is a difference between the
population means).
4. You can deduce the probability of the alternative hypothesis being true.
5. If you reject the null hypothesis, you know the probability that you are making the wrong
decision.
6. If the experiment were repeated thousands of times, you would obtain a significant result
about 99% of the time.

(All six statements are false.)
4. Final thoughts
A few parting thoughts to keep in mind about hypothesis testing:
1. Statistics can’t “prove” anything.
2. The p-value is not the probability of a hypothesis.
3. Unfortunately, we can reject the hypothesis that most p-values are interpreted correctly.
IV. How to collaborate with a biostatistician.
As I mentioned at the beginning, I am part of the Biostatistics Collaboration Center, housed in the
Department of Preventive Medicine. We’re a group of faculty and master’s level biostatisticians who
love collaborating with investigators. Here’s our mission statement:
The primary goal of the BCC is to collaborate and consult with FSM researchers in order to
produce studies and statistical analyses that ultimately result in funded grants, peer-reviewed
publications and presentations at professional meetings. Typically the best results come from
researchers and statisticians working hand-in-hand as collaborators in these activities.
We help investigators in a number of different ways, so I would encourage you to check out our website,
http://www.feinberg.northwestern.edu/sites/bcc/. We offer everything from free one-hour
consultations to helping develop proposals to long term collaborations in which our faculty members
become key co-investigators in research groups.
There are also a number of resources on how we can best help you:
Guideline Summary: Know what your biostatistician needs from you.
http://www.feinberg.northwestern.edu/sites/bcc/docs/StatsCollaborationGuideSummary.pdf
Part I: Preliminary Help (Grants and Power)
Prepare for your statistical collaboration pertaining to grant applications.
http://www.feinberg.northwestern.edu/sites/bcc/docs/PowerGuide.pdf
Part II: Database Issues
Collect and/or organize your data in the most effective way for statistical analysis.
http://www.feinberg.northwestern.edu/sites/bcc/docs/DataGuide.pdf
Part III: Analysis and Write-Up
Work efficiently with a biostatistician through the analysis and write-up phase.
http://www.feinberg.northwestern.edu/sites/bcc/docs/ProjectGuide.pdf
Investigators often think of us as the people to come to when a grant deadline is approaching and they
need a power calculation or sample size justification, or as folks who can help once data have been
collected. Although some of what we do may appear mysterious, it’s important to remember that we’re
not magicians! Unfortunately, we can’t conduct a power analysis when we haven’t had a chance to
thoroughly learn about the science at hand, nor can we wave a statistical wand to salvage poorly
collected data. But we can do a lot when investigators come early in the grant development phase and
we participate fully in the planning and execution of the proposed research.
Here is a schematic for developing biomedical research, and how a biostatistician can help (with thanks to
Dr. Denise Scholtens):
Dr. Roger Peng, a biostatistician, and his collaborator Dr. Elizabeth Matsui, both Associate Professors at
Johns Hopkins Bloomberg School of Public Health, have written informative and entertaining posts on
how to collaborate with a biostatistician and, in turn, how to collaborate with a scientist:
http://simplystatistics.org/2013/10/08/the-care-and-feeding-of-the-biostatistician/
http://simplystatistics.org/2013/10/09/the-care-and-feeding-of-your-scientist-collaborator/
Next week’s lecture information:
Lecture #2 - Basic Biostatistics in Medical Research: Emerging Trends
Thursday, November 14th from 1:30-3pm
Lurie Hughes Auditorium
303 E. Superior St., Chicago
Contact information:
Biostatistics Collaboration <bcc@northwestern.edu>
http://www.feinberg.northwestern.edu/sites/bcc/