3 How to identify different types of data

In the previous section, I described the different types of data (e.g. metric, ordinal, nominal) that need to be categorized for SPSS (if not done automatically by the programme itself). Further categorization of the data is not part of the statistics programme but has to be made by the user in order to decide which statistical test to run. This decision is crucial, as a wrong statistical test will give false experimental results (e.g. finding differences between groups when in fact there are none, or not finding differences between groups when in fact they are there).

To help with deciding which statistical test to use, we will use a decision tree (a first example is below). You might find it useful to print out these decision trees instead of constantly scrolling back to an electronic version.

This decision tree is for the comparison of 2 experimental groups, for example a patient group and a control group, an unstimulated and a stimulated cell culture, or a patient group from which samples have been taken at 2 different time points. Throughout the course, we will work through each point of the decision tree, learning how to make the right decisions and how to end up at the appropriate statistical test. Further on, I will introduce a more complex decision tree for comparisons between more than 2 groups.

Independent samples, dependent samples and replicates

From the decision tree, the first question we need to ask ourselves is: are our data derived from independent or from dependent samples? The answer to this question puts you immediately on either the left half or the right half of the decision tree. From the decision tree, you can see that, depending on the answer, we end up using completely different statistical tests.

Independent samples

Let’s imagine we have 2 rats.
One rat we feed a lot of cheese, and it ends up weighing 15 kg (I know that no rat will weigh 15 kg… the numbers are completely arbitrary). The other rat we feed only a standard diet, and it ends up weighing 5 kg. Clearly, these rats have not only been treated differently, they are also two completely separate animals. They are independent samples.

Dependent (related) samples

Let’s imagine we take our 15 kg rat, which we have fed a lot of cheese, and put it on a diet. We measure the weight of this rat before the diet (15 kg) and again after the diet, and find that it now weighs only 10 kg. Clearly, this is the same rat, whose weight we have measured at 2 different timepoints. The weight measurements are dependent (sometimes called related) samples.

Replicates

A third type of sample, which is not included in the decision tree, is the replicate. In general, replicates should not be used as individual samples in a statistical analysis, which is why they are not included in the decision tree. However, it is important to understand what replicates are, to make sure they are not counted as samples in the analysis.

Let’s imagine we take our 15 kg rat again, which we have fed a lot of cheese, and weigh it 5 times. Each time, our scale gives us a slightly different reading (due to a small error margin of the scale and, most likely, the rat wiggling…). So instead of 1 measurement (i.e. n=1), we now have 5 measurements from the same rat. However, these measurements have not been taken at different times or after different treatments; they are replicates. We cannot treat these 5 measurements as individual samples (i.e. n=5). What we can do is take the average of the 5 measurements, which improves the accuracy of what would otherwise be a single reading of the rat’s weight and compensates for the slight error margin of the scale.

With individual animals or people, deciding between dependent samples, independent samples and replicates is in general straightforward.
In cell culture experiments (or similar experimental setups), this decision can be much more challenging, as the following exercise will highlight.

Exercise 3.1:

Let’s imagine we are running some cell culture experiments using a monocyte cell line (i.e. all the cells come originally from one individual and are immortalized to keep replicating). We have three different treatment conditions (designated with different colours). Each circle represents one well of a cell culture dish, and each well contains hundreds of the same cells (i.e. a typical experimental cell culture setup):

Treatment 1: Treatment 2: Treatment 3:

Please select the sample type for each of the following experimental setups:

A: We have set up an experiment with three different treatment conditions. The samples are
a) independent b) dependent c) replicates

B: We have set up an experiment where we treat the cells with an enzyme and then look 24 h later at how many of the cells the enzyme has destroyed. The samples are
a) independent b) dependent c) replicates

C: We have set up an experiment with three different treatment conditions; however, we have set up each of the conditions twice. The samples are
a) independent b) dependent c) replicates

D: We have set up an experiment with three different treatment conditions; however, we have set up each of the conditions twice. In addition, we have set up the whole experiment 3 separate times to study 3 different timepoints. We are collecting the cells at each of the timepoints, therefore we cannot use just one setup but have to use three (as otherwise there won’t be any cells left after each collection). The samples are
a) independent b) dependent c) replicates

E: We have set up an experiment with three different treatment conditions. We collect the supernatants of the cells (i.e.
the medium in which the cells have been growing) and transfer 8 samples of the supernatant from each treatment condition to an ELISA plate (a test for proteins). The samples on the ELISA plate are
a) independent b) dependent c) replicates

Exercise 3.1 solution:

A: We have three different, individual treatment conditions. The samples are independent.

B: We have one well of cells, which we have monitored before treatment and 24 h after. The cells are still the same, just analysed at different timepoints. The samples are dependent.

C: We have three different, individual treatment conditions. These are independent samples. However, we have set up each treatment condition twice, using the same (i.e. genetically identical) cells. So, are our two wells for each treatment condition independent samples, or should we count them as replicates? My advice would be to count them as individual samples, as the cells of each well are grown apart from each other, which is comparable, for example, to using twins in human studies. However, because we are, technically speaking, using genetically identical cells, one could also argue that each treatment condition has been set up as replicates. This is a grey area, and different laboratories will argue this point differently. The best thing to do is to go with whatever your laboratory or your research field routinely does. Most importantly, you have to be consistent across experimental designs and cannot switch from one experiment to the next between regarding the duplicate treatment conditions as independent samples and regarding them as replicates.

D: Similar to B, but we are not monitoring the same cells over time. Our experiment had to be done in such a way that we set up new cells for each timepoint we are interested in. So we could argue that the samples are independent.
However, I suggest still treating this experiment as dependent samples, because the aim is to follow the development of the same cells over time, and we were only unable to do this for technical reasons. Had our experiment been in rats, there would have been no issue in collecting, for example, new blood from the same rat at each timepoint. As a rule of thumb, as soon as you are looking at something across time, treat your samples as dependent (i.e. related).

E: We are collecting supernatants from 3 different treatment conditions; these treatments are independent. However, when we transfer 8 samples of the exact same supernatant for each condition to the ELISA plate, we are creating replicates on the ELISA plate. These cannot be counted as individual samples and are only used to improve the accuracy of the ELISA measurement (similar to weighing the rat several times to improve accuracy within the error margin of the scale).

Metric and categorical data

We have learned how to decide between independent and dependent samples. The next decision in the decision tree concerns the type of data: metric or categorical. This decision has to be made whether we are on the left or the right side of the decision tree. On the decision tree, the choice between metric and categorical data again points immediately towards completely different statistical tests.

Metric data

Metric data is all data that can be measured on a scale and can take on any value on this scale. A typical example would be a length measured with a ruler. Examples from the laboratory could be results from an ELISA, a Bradford protein assay, a cell proliferation assay, flow cytometry or real-time RT-PCR.

Categorical data

Categorical data (ordinal or nominal) is data which can only ever fall into certain categories. A typical example would be age groups. A person can only ever be a child, a teenager or an adult; the categories are mutually exclusive.
Examples from the laboratory could be states of disease severity, cancer classifications or staining categories.
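
To make the difference between metric and categorical data more concrete, here is a minimal Python sketch based on the age group example. It is only an illustration: the ages are made up, and the names AgeGroup and age_group as well as the cut-offs of 13 and 18 years are my own assumptions for this example, not something defined by SPSS or by the decision tree.

```python
from collections import Counter
from enum import Enum


class AgeGroup(Enum):
    """Categorical data: a fixed set of mutually exclusive categories."""
    CHILD = "child"
    TEENAGER = "teenager"
    ADULT = "adult"


def age_group(age_years: float) -> AgeGroup:
    """Sort a metric value (age in years) into exactly one category.

    The cut-offs 13 and 18 are assumptions chosen for this example only.
    """
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years < 13:
        return AgeGroup.CHILD
    if age_years < 18:
        return AgeGroup.TEENAGER
    return AgeGroup.ADULT


# Metric data: any value on the scale is possible (made-up ages in years).
ages = [4.5, 9.0, 15.2, 17.9, 31.0, 64.3]

# Categorical data: each person falls into exactly one age group.
groups = [age_group(a) for a in ages]

# The metric values are summarised by their mean,
# the categories by counting how many people fall into each one.
mean_age = sum(ages) / len(ages)
counts = Counter(groups)

print(f"mean age: {mean_age:.1f} years")
for group in AgeGroup:
    print(f"{group.value}: {counts[group]}")
```

Note that going from metric to categorical data loses information: 15.2 and 17.9 years both end up simply as "teenager".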