3 How to identify different types of data
In the previous section, I described the different types of data (metric, ordinal and
nominal) that need to be categorized for SPSS (if not done automatically by the program
itself). Further categorization of the data is not part of the statistics programme but has to
be made by the user in order to decide which statistical test to run. This decision is crucial,
as the wrong statistical test can give false experimental results (e.g. finding differences
between groups when in fact there are none, or not finding differences between groups when
in fact they exist). To help decide which statistical test to use, we will use a decision tree
(a first example is below). You might find it useful to print out these decision trees instead
of constantly scrolling back to an electronic version.
This decision tree is for the comparison of 2 experimental groups, for example a patient
group and a control group, an unstimulated and a stimulated cell culture, or a patient group
from which samples have been taken at 2 different time points. Throughout the course, we
will work through each point of the decision tree, learning how to make the right decisions
and how to end up at the appropriate statistical test. Further on, I will introduce a more
complex decision tree for comparisons between more than 2 groups.
Independent samples, dependent samples and replicates
From the decision tree, the first decision or first question we will need to ask ourselves is:
Are our data derived from independent or from dependent samples? An answer to this
question will put you immediately on either the left half of the decision tree or the right half
of the decision tree. From the decision tree, you can see that depending on the answer, we
would end up using completely different statistical tests.
Independent samples
Let’s imagine we have 2 rats. We feed one rat a lot of cheese and it ends up weighing
15 kg (I know that no rat will weigh 15 kg… the numbers are completely arbitrary). We feed
the other rat only a standard diet and it ends up weighing 5 kg.
Clearly, these rats have not only been treated differently, they are also two completely
separate animals. They are independent samples.
Dependent (related) samples
Let’s imagine we take our 15 kg rat, which we have fed a lot of cheese, and put it on a
diet. We measure the weight of this rat before the diet (15 kg) and again after the diet,
and find that it now weighs only 10 kg.
Clearly, this is the same rat, whose weight we have measured at 2 different timepoints.
The two weight measurements are dependent (sometimes called related) samples.
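To make the difference concrete, here is a minimal sketch of how the two rat examples could be written down as data. The rat names and the layout are my own assumptions for illustration; only the weights come from the examples above.

```python
# Weights in kg, using the arbitrary numbers from the rat examples.
# The identifiers "rat_A" and "rat_B" are hypothetical labels.

# Independent samples: two different rats, one value per rat.
cheese_diet = {"rat_A": 15.0}
standard_diet = {"rat_B": 5.0}

# Dependent (related) samples: the same rat measured at two timepoints.
# Each animal contributes one (before, after) pair.
before_after = {"rat_A": (15.0, 10.0)}

# For dependent samples, we can look at the change within each animal.
weight_change = {
    rat: after - before for rat, (before, after) in before_after.items()
}

# Independent groups do not share any animal.
shared_animals = set(cheese_diet) & set(standard_diet)

print(weight_change)
print(shared_animals)
```

The point of the layout is that dependent data always come as pairs belonging to one animal, whereas independent groups have no animal in common.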
Replicates
A third type of sample, which is not included in the decision tree, is the replicate. In
general, replicates should not be used as separate samples in statistical analysis, which is
why they are not included in the decision tree. However, it is important to understand what
replicates are, to make sure they are not counted as individual samples in the analysis.
Let’s imagine we take our 15 kg rat again, which we have fed a lot of cheese, and weigh
it 5 times. Each time, our scale gives us a slightly different reading (due to a small error
margin of the scale and likely the rat wiggling…).
So instead of 1 measurement (i.e. n=1), we now have 5 measurements from the same rat.
However, these measurements have not been taken at different times or after a different
treatment; they are replicates. We cannot treat these 5 measurements as individual samples
(i.e. n=5). However, we can take the average of the 5 measurements and so improve the
accuracy of what would otherwise be a single reading of the weight of the rat, compensating
for the slight error margin of the scale.
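As a small sketch of this averaging step: the five readings below are made-up numbers scattered around 15 kg, not real data, and the variable names are my own.

```python
from statistics import mean

# Hypothetical scale readings in kg for ONE rat weighed five times.
readings_kg = [15.1, 14.9, 15.0, 15.2, 14.8]

# The five readings are replicates: collapse them into one value for this rat.
rat_weight_kg = mean(readings_kg)

# The sample size counts animals, not readings.
n = 1

print(round(rat_weight_kg, 2), n)
```

The averaged value is what would enter the statistical analysis, and it still counts as a single sample.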
With individual animals or people, deciding between dependent samples, independent samples
and replicates is in general straightforward. In cell culture experiments (or similar
experimental setups), this decision can be much more challenging, as the following
exercise will highlight.
Exercise 3.1:
Let’s imagine we are running some cell culture experiments using a monocyte cell line (i.e.
all the cells originally come from one individual and are immortalized to keep replicating).
We have three different treatment conditions (designated with different colours). Each
circle represents one well of a cell culture dish, and each well contains hundreds of the
same cells (a typical experimental cell culture setup):
Treatment 1:
Treatment 2:
Treatment 3:
Please select the sample type for each of the following experimental setups:
A: We have set up an experiment with three different treatment
conditions. The samples are
a) independent
b) dependent
c) replicates
B: We have set up an experiment where we treat the cells with an enzyme
and then look 24 h later at how many of the cells the enzyme has
destroyed. The samples are
a) independent
b) dependent
c) replicates
C: We have set up an experiment with three different treatment
conditions, however we have set up each of the different conditions
twice. The samples are
a) independent
b) dependent
c) replicates
D: We have set up an experiment with three different treatment
conditions, however we have set up each of the different conditions
twice. In addition, we have set up the whole experiment 3 different
times to study 3 different timepoints. We are collecting the cells at
each of the timepoints, therefore we cannot use just one setup but have
to use three (as otherwise there won’t be any cells left after each
collection). The samples are
a) independent
b) dependent
c) replicates
E: We have set up an experiment with three different treatment
conditions. We collect the supernatants of the cells (i.e. the medium in
which the cells have been growing) and transfer 8 samples from each
treatment condition supernatant to an ELISA plate (a test for proteins).
The samples on the ELISA plate are
a) independent
b) dependent
c) replicates
Exercise 3.1 solution:
A: We have three different, individual treatment conditions. The samples are
independent.
B: We have one well of cells, which we have monitored before and after 24 h. The cells
are still the same, just analysed at different timepoints. The samples are dependent.
C: We have three different, individual treatment conditions. These are independent
samples. However, we have set up each treatment condition twice, using the same (i.e.
genetically identical) cells. So, are our two wells for each treatment condition
independent samples or should we count them as replicates? My advice would be to
count them as individual samples, as the cells of each well are grown apart from each
other, comparable, for example, to using twins in human studies. However,
because technically speaking we are using genetically identical cells, one could
also argue that each treatment condition has been set up as replicates. This is a grey
area and different laboratories will argue this point differently. The best thing to do is to
go with whatever your laboratory or your research field routinely does. Most
importantly, you have to be consistent across experimental designs: you cannot switch
from one experiment to the next between regarding duplicate treatment conditions as
independent samples and regarding them as replicates.
D: Similar to B, but we are not monitoring the same cells over time. Our experiment had
to be done in such a way that we set up new cells for each timepoint we are interested
in. So we could argue that the samples are independent. However, I suggest still treating
this experiment as dependent samples, because the aim is to follow the development of
the same cells over time and we were unable to do this only for technical reasons. If
our experiment had been in rats, there would have been no issue in collecting,
for example, new blood from the same rat at each timepoint. As a rule of thumb, as soon
as you are looking at something across time, treat your samples as dependent (i.e.
related).
E: We are collecting supernatants from 3 different treatment conditions, and these
treatments are independent. However, when we transfer 8 samples of the exact same
supernatant for each condition over to the ELISA plate, we are creating replicates on the
ELISA plate. These cannot be counted as individual samples and are only used to
improve the accuracy of the ELISA measurement (similar to weighing the rat several times
to improve accuracy within the error margin of the scale).
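For setup E, the step of collapsing the ELISA replicates could be sketched as follows. The readings are invented values in arbitrary units, and `collapse_replicates` is a hypothetical helper I wrote for this illustration, not part of SPSS or any ELISA software.

```python
from statistics import mean

# Hypothetical ELISA readings (arbitrary units): 8 wells per supernatant.
elisa_wells = {
    "treatment_1": [0.50, 0.52, 0.49, 0.51, 0.50, 0.48, 0.52, 0.48],
    "treatment_2": [0.81, 0.79, 0.80, 0.82, 0.78, 0.80, 0.81, 0.79],
    "treatment_3": [1.20, 1.22, 1.18, 1.21, 1.19, 1.20, 1.23, 1.17],
}


def collapse_replicates(wells_by_sample):
    """Average the replicate wells so that each sample contributes one value."""
    return {sample: mean(values) for sample, values in wells_by_sample.items()}


per_sample = collapse_replicates(elisa_wells)

# 24 wells were measured, but only 3 values remain for the analysis.
total_wells = sum(len(values) for values in elisa_wells.values())
print(total_wells, len(per_sample))
```

Although 24 wells are read on the plate, each treatment condition still contributes only one value.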
Metric and categorical data
We have learned how to make a decision between independent and dependent samples.
The next decision on the type of data in the decision tree is between metric and categorical
data. This decision has to be made whether we are on the left or the right side of the
decision tree.
On the decision tree, the choice between metric and categorical data again immediately
points towards completely different statistical tests that should be used.
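The two questions asked so far can be thought of as a simple two-step lookup. The sketch below is only an illustration of that structure: the test names are common textbook pairings that I am assuming here, they are not read off the decision tree of this course, and the full tree contains further decisions before a single test is chosen.

```python
# ASSUMPTION: candidate tests per branch are common textbook pairings,
# not taken from the decision tree in this text.
CANDIDATE_TESTS = {
    ("independent", "metric"): ["independent-samples t-test", "Mann-Whitney U test"],
    ("dependent", "metric"): ["paired-samples t-test", "Wilcoxon signed-rank test"],
    ("independent", "categorical"): ["chi-square test", "Fisher's exact test"],
    ("dependent", "categorical"): ["McNemar test"],
}


def candidate_tests(sample_type, data_type):
    """Return the candidate tests for one branch of a 2-group comparison."""
    key = (sample_type, data_type)
    if key not in CANDIDATE_TESTS:
        raise ValueError(f"Unknown combination: {key}")
    return CANDIDATE_TESTS[key]


# Example: the rat weighed before and after the diet (dependent, metric).
print(candidate_tests("dependent", "metric"))
```

Replicates deliberately have no entry in the lookup, since they are averaged before any test is chosen.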
Metric data
Metric data is all data that can be measured on a scale and can take on any value on this
scale. A typical example would be a length measured with a ruler.
Examples from the laboratory could be results from an ELISA, a Bradford protein assay, a cell
proliferation assay, flow cytometry or real-time RT-PCR.
Categorical data
Categorical data (ordinal or nominal) is data which can only ever fall into certain categories.
A typical example would be age groups. A person can only ever be a child, a teenager or
an adult; the categories are mutually exclusive. Examples from the laboratory
could be states of disease severity, cancer classifications or staining categories.
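One practical way to see the difference is in how the two kinds of data are summarized. In the sketch below, the values, the units and the category labels are all invented for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical metric data: protein concentrations that can take any value.
protein_ng_per_ml = [12.4, 15.1, 9.8, 13.3]

# Hypothetical categorical data: each patient falls into exactly one category.
severity = ["mild", "severe", "mild", "moderate", "mild"]

# Metric data can be summarized with a number such as the mean.
metric_summary = mean(protein_ng_per_ml)

# Categorical data are summarized by counting how many fall into each category.
category_counts = Counter(severity)

print(round(metric_summary, 2))
print(dict(category_counts))
```

An average of "mild" and "severe" has no meaning, which is one reason the two data types lead to different statistical tests.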