BIL 151: Enzymes & Enzymatic Reactions
Data Analysis

Once your team has collected an adequate sample size of raw data, you will be ready to do some calculations. This lab chapter provides a brief guide.

I. Data Analysis

Since you are comparing rates of reaction between two experimental groups, it is important to understand how to properly calculate and present reaction rates. You will then be able to perform a statistical test to see whether the average reaction rates of your two experimental groups are significantly different.

Once your calculations are finished and your results are clear, your team must meet to discuss how to explain your observations, logically and completely, in your presentation.

A. Calculating the Rate of Reaction

Biological functions may take any number of shapes. In the case of the breakdown of hydrogen peroxide by catalase, a plot of cumulative oxygen volume against time is approximately linear while the reaction is in progress. To analyze your results, you will calculate the slope of this linear function.

A slope, with units of y over x, is an expression of rate. A rate is an expression of change in a dependent variable per unit of an independent variable such as time. "Miles per hour," "millimeters per second," and "pizzas per semester" are all expressions of rate.

When you measured the volume of oxygen gas generated during the breakdown of hydrogen peroxide by catalase, you obtained raw data (i.e., data straight from a measuring apparatus, which have not undergone any type of mathematical transformation) similar to those shown in Table 1.

Table 1. The cumulative volume of oxygen generated by the decomposition of hydrogen peroxide by catalase under control conditions (pH 7.0, 25 °C)

time (seconds)    cumulative O2 volume (cc)
      0                  0
      5                  5
     10                 12
     15                 20
     20                 37
     25                 44
     30                 50
     35                 56
     40                 64
     45                 72
     50                 75
     55                 78
     60                 82

Each of the data points consists of an independent (time) and a dependent (cc O2) variable. These are specific coordinates, corresponding to x and y on a graph.
The coordinates of the data in Table 1, from top to bottom, are (0, 0), (5, 5), (10, 12), (15, 20), (20, 37), (25, 44), (30, 50), (35, 56), (40, 64), (45, 72), (50, 75), (55, 78) and (60, 82). These data (remember: data is the plural of datum) are plotted in Figure 1.

B. Slope of the Line is Equal to the Reaction Rate

The horizontal axis of a graph is known as the abscissa or x-axis. It is labeled with the units of the independent variable. The independent variable is so named because, although it changes over the course of the experiment, it is not affected by anything else that changes in the experiment. Time is a commonly used independent variable. Its units may be seconds, minutes, hours, months, years, etc.

The vertical axis is known as the ordinate or y-axis. It is labeled with the units of the dependent variable, which changes depending upon the progression of the independent variable. An example of a dependent variable is the volume of oxygen generated by a chemical reaction over time (the independent variable).

Notice in Figure 1 that the straightest part of the function does not pass through (0, 0). Evidently, this particular experimental run started slowly, then increased to a more consistent rate. The dotted line in the figure shows a somewhat "J" shaped relationship at the beginning of the experiment. The best fit line in Figure 1 does not necessarily pass through every data point. Rather, it should reflect the rate of the reaction at its optimum.

To calculate the slope of a line (which corresponds to the rate of reaction), determine the change in y (Δy) and divide it by the corresponding change in x (Δx). Because a change in y is vertical and a change in x is horizontal, you may recall that the calculation of slope is sometimes referred to as the calculation of "rise" over "run." Your rate will be expressed as the units of y over the units of x.

$$\text{rate} = \text{slope} = \frac{\Delta y}{\Delta x}$$

Figure 1. Cumulative volume of oxygen generated by the decomposition of hydrogen peroxide by catalase under control conditions (pH 7.0, 25 °C).

In our example, the rise (volume of O2 generated) is plotted against time. Thus, the rate is expressed in cc O2 (y axis) per second (x axis), or more simply, cc O2/sec. Because the slope of a straight line is the same no matter where it is measured, you may choose any two points on the line.

Once you have plotted your data points, study their relationship. Do they form a straight line? An "S" curve? A "J" curve? A parabola? Fortunately, you won't have to do any guesswork, as the Vernier software you used to collect your data will also calculate the rate of your reaction. But if you were to determine the function by hand and eye, it would be important to note that it's not as simple as "connect the dots." Notice whether the reaction started slowly, picked up speed, and then leveled off. If this is the case, then your rate calculation will be less accurate if you include the more horizontal "start up" and "taper off" portions of such an "S" shaped curve.

To calculate the rate of a reaction that is linear, use the points of the function that best approximate a straight line, when the reaction is proceeding at its maximum rate.

1. A best fit line through the data points has already been drawn. Notice that the line passes near (but not necessarily through) the points that appear to be most linear with respect to each other. Although the first data point should occur at (0, 0), the best fit line does not pass through it, apparently because the reaction did not begin immediately at its most consistent rate (i.e., it took a moment to really get going).
2. Choose any two points along the line and determine their coordinates. For example, coordinates (35, 56) and (50, 75).
3. Subtract the smaller y value from the larger y value. This quantity is Δy, or "rise." In our example, Δy = 75 - 56 = 19.
4. Next, subtract the smaller x value from the larger x value. This quantity is Δx, or "run." In our example, Δx = 50 - 35 = 15.
5. Divide rise by run (Δy/Δx), being certain to include the units of each variable. The result is the slope of the line, which is equal to the rate of the reaction. In our example, slope = 19 cc O2/15 seconds = 1.3 cc O2/sec.

Your team no doubt ran several experimental trials for each of your variables. You should calculate a rate for each experimental trial in each of your groups (e.g., treatment and control). These rates can then be analyzed with a Student's t-test, which will tell you whether there is a significant difference between the mean rates of reaction of your experimental groups.

C. Statistical Testing

Probability calculations form the basis of one of the scientist's most important tools: the statistical test. Once data have been collected, it's not enough to merely "eyeball" them and say, "Eeeyup. This is different from what we expected! Something weird is going on here!" Investigators use statistics generated from their data sets to determine whether their results differ sufficiently from the expected results to conclude that the difference is unlikely to have arisen as a matter of chance.

Over the decades, many different probability distributions have been devised by mathematicians, each one appropriate for different types of data. Enough statistical tests and their associated probability distributions have been invented to fill many textbooks. Some of these, such as the Chi-square test, the Student t-test, the Analysis of Variance (ANOVA), the Mann-Whitney U test and Fisher's exact test, may sound familiar to you. The specific probability distribution and statistical test appropriate in a given situation depend upon the type of data collected and the nature of your hypothesis.
One oft-utilized probability distribution is Student's t-distribution, used to determine whether the observed difference between the (continuous numerical) means of two samples is unlikely to have arisen if they were in fact drawn from the same population (or from populations with identical means); that is, whether there is a significant difference between the means of the two groups under study.

To make a very long and complex story short, an investigator can use the mean, variance, and standard deviation of his/her data sets to calculate a t-statistic. Every possible value of the t-statistic is linked to a certain probability that the observed difference in sample means arose simply as a matter of chance.

1. Calculation of mean, variance and standard deviation

Your reaction rates are a form of continuous numerical data. To analyze them correctly, you will need to determine the values of several important quantities:

x  = data point            the individual values of a measured parameter (= x_i)
x̄  = mean                  the average value of a measured parameter
n  = sample size           the number of individuals in a particular test group
df = degrees of freedom    the number of independent quantities in a system
s² = variance              a measure of individual data points' variability from the mean
s  = standard deviation    the positive square root of the variance

To calculate the mean rate of reaction of either the treatment or control group, sum the rates of all individual trials in a particular group and divide the sum by the number of trials.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

When you are studying some measurable aspect of a sample of a population (such as the index:ring finger ratio), it is important to understand how much variation around the mean your sample exhibits. In biological systems in particular, there is almost always a great deal of variation around the mean. Variability is part of nature and there is nothing "wrong" with it.
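As a rough illustration of the quantities listed above, the short Python sketch below computes n, df, the mean, the variance and the standard deviation for one group. The five rates are invented for this example and are not real data. The sketch also assumes the sample form of the variance (dividing by n - 1), which is what Python's `statistics.variance` uses and which matches the degrees of freedom (n - 1) used for each group later in this chapter.

```python
import statistics

# Hypothetical reaction rates (cc O2/sec) from five trials in one group.
# These numbers are invented for illustration only.
rates = [1.3, 1.1, 1.4, 1.2, 1.5]

n = len(rates)                         # sample size
df = n - 1                             # degrees of freedom for a single group
mean = statistics.mean(rates)          # sum of the rates divided by n
variance = statistics.variance(rates)  # sample variance s^2 (divides by n - 1)
std_dev = statistics.stdev(rates)      # positive square root of the variance

print(f"n = {n}, df = {df}")
print(f"mean = {mean:.3f} cc O2/sec")
print(f"variance = {variance:.4f}")
print(f"standard deviation = {std_dev:.4f}")
```

Running the same few lines once for your treatment rates and once for your control rates gives the values you need for the t-test.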
In many biological studies, estimating the variance is as important as, if not more important than, estimating the mean. Natural variability is part of life, and it is very important to measure and understand how biological systems vary.

Measurements of dispersion around the mean include the range, variance and standard deviation. The simplest of these is the range, which is defined as the highest value minus the lowest value. Unfortunately, the range tends to increase with sample size, and because it employs only the two extreme values, a great deal of information about variation between those extremes is lost. More useful are the variance and standard deviation, which are measures of deviations from the mean.

The variance (s²) is calculated as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

(If you're not sure what the symbols mean, go back and review the formula for the mean.)

The standard deviation (s) is the square root of the variance, and is calculated as

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

2. The Student's t-test

The Student's t-test can be used to determine whether a difference between two means is significant. Note that "significant" in this sense is NOT the same as "biologically meaningful." It refers only to whether the observed difference is unlikely to be due to chance ("statistically significant").

These means may be calculated from observations that are either paired (as when individuals in a single group are subjected to "before and after" measurements, and data points are paired for each tested individual) or independent (as when individuals in two similar sample populations are measured, but each individual in each sample population is measured only once). Slightly different calculations of the t-statistic must be used in each case.

If you take "before" and "after" measurements from the same experimental system, the two values obtained would not be independent of one another. Rather, they would be paired. Statistically, paired data must be analyzed differently from independent samples.
In a paired sample t-test, the separate means of two different sample populations are not compared. Instead, the difference between the first measurement and second measurement of the same individual is calculated and used to generate a t-statistic. In the paired sample t-test, the underlying hypothesis is that the mean difference among all samples is zero.

In your experiment, you ran each trial with new reagents in your two different systems (e.g., treatment and control). Thus, the means of your two sample populations (e.g., treatment and control) are not paired. They are independent because a single rate is calculated for each unique trial run. An independent sample t-test is appropriate for analysis of this type of data. In the independent sample t-test, the underlying hypothesis is that the difference between the two sample means is zero. (This is a subtle, but crucial, difference from the underlying hypothesis of the paired sample t-test.)

Paired designs are generally preferable because they eliminate the added influence of a difference in means arising from variation between samples due solely to their containing different individuals. However, in some experiments, including the ones you performed with yeast and hydrogen peroxide, it is simply not possible to use a paired design because the individual is permanently changed in the process of the experiment.

To analyze your data, you will use an independent sample t-test. The critical values for the t-distribution are the same for either paired or independent samples, and the table of critical values (Table 2) can be used for either one to determine the P value associated with your t-statistic at the degrees of freedom in your system.

Use the independent sample t-test to calculate a t-statistic for your two means:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$

...in which $\bar{x}_1$ and $\bar{x}_2$ are the means of your two groups, $n_1$ and $n_2$ are the numbers of trials you ran in each group, and $s_p^2$ is the pooled variance.
Pooled variance is calculated as:

$$s_p^2 = \frac{\mathit{df}_1 \, s_1^2 + \mathit{df}_2 \, s_2^2}{\mathit{df}_1 + \mathit{df}_2}$$

...in which $s_1^2$ is the variance of group 1, $s_2^2$ is the variance of group 2, $\mathit{df}_1$ is the degrees of freedom for group 1 ($\mathit{df}_1 = n_1 - 1$) and $\mathit{df}_2$ is the degrees of freedom for group 2 ($\mathit{df}_2 = n_2 - 1$).

The degrees of freedom for a two-sample t-test with independent means is calculated as the sum of the degrees of freedom of each test group:

df = (n1 - 1) + (n2 - 1)

What is your t-statistic?

What are your degrees of freedom?

What is the P value associated with your t-statistic and degrees of freedom? (Use the table of critical values for the t-test from Lab #1.)

_____ > P > _____

D. Drawing Conclusions

When all teams have analyzed their data and drawn conclusions about their results, your lab instructor will lead a class discussion. Answer the following carefully.

Does your P value indicate that your two means are sufficiently different from one another that the difference is probably not due to chance?

Do you accept or reject your null hypothesis?

Briefly, what is your group's conclusion about your experiment and the original question you posed about your experimental system?

II. Data Presentation

In a scientific presentation, a grid containing values corresponding to data is known as a table. A photograph, line drawing or graph is known as a figure. Either type of graphic must be properly labeled (as a Table or a Figure) and accompanied by an informative legend. Note that the legend of a table is placed at the top of the graphic, whereas the legend of a figure is placed below.

In our example above, Table 1 and Figure 1 present the same data. To include both representations of the same data in your presentation would be redundant. Choose only one. In this case, the figure provides more information, and is the better choice.

If you ran replicate experiments in which you varied a single factor to determine whether that factor affected reaction rate, you must calculate reaction rate (slope) for each experimental trial.
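If you want to check a rate by hand or in a script, the rise-over-run calculation from section I.B can be sketched in Python as follows. The data are the Table 1 values. The function names, and the choice of the 20 to 45 second readings as the "linear portion," are assumptions made for this illustration. The least-squares fit is one common way to draw a best fit line; this chapter does not say which method the Vernier software uses, so treat the second number only as a comparison.

```python
# Cumulative O2 volume (cc) at each time (seconds), copied from Table 1.
times = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60]
volumes = [0, 5, 12, 20, 37, 44, 50, 56, 64, 72, 75, 78, 82]


def two_point_slope(p1, p2):
    """Rise over run between two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Hand calculation from the text: points (35, 56) and (50, 75).
hand_rate = two_point_slope((35, 56), (50, 75))

# Hypothetical choice of "linear portion": the readings from 20 s to 45 s.
start, stop = times.index(20), times.index(45) + 1
fit_rate = least_squares_slope(times[start:stop], volumes[start:stop])

print(f"two-point rate: {hand_rate:.2f} cc O2/sec")
print(f"least-squares rate (20-45 s): {fit_rate:.2f} cc O2/sec")
```

Whichever method you use, apply the same one to every trial so that the rates in your two groups are comparable.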
To graphically represent the difference between your two experimental groups, you might wish to create a figure in which you plot reaction rate as the dependent variable (y axis) against the independent variable (e.g., temperature, pH, chemical concentration) for each group (e.g., treatment and control). The resulting figures should show a relationship between reaction rate and the variable.

If you are comparing reaction rate in a treatment and control system, then your figure will be more informative if both curves (treatment and control) are shown on the same graph, for comparison. Be sure to differentiate them clearly, using different symbols, colors, or whatever distinction your team thinks is effective.

1. The purpose of a figure is to allow your readers to more easily comprehend your data. Label your axes clearly with the appropriate units of measure. Figures in a PowerPoint presentation or a poster should be large, clearly labeled, and central to the presentation.
2. Each figure must be numbered and be accompanied by a descriptive legend placed underneath the figure. In a scientific paper, all figures must be referred to in the text of the paper. In a PowerPoint or poster presentation, the figures should stand alone.

More information about the nature of your research symposium presentation can be found in the next chapter of your online lab manual.

Table 2. Table of critical values for the two-sample t-test. The P levels (0.05) indicating rejection of the null hypothesis are shown in bold for both one-tailed and two-tailed hypotheses. (From Pearson and Hartley in Statistics in Medicine by T. Colton, 1974. Little, Brown and Co., Inc. publishers.)
2-tail -->    0.10      0.05      0.02      0.01      0.001
1-tail -->    0.05      0.025     0.01      0.005     0.0005
df
  1           6.314    12.706    31.821    63.657    636.619
  2           2.920     4.303     6.965     9.925     31.598
  3           2.353     3.182     4.541     5.841     12.941
  4           2.132     2.776     3.747     4.604      8.610
  5           2.015     2.571     3.365     4.032      6.859
  6           1.943     2.447     3.143     3.707      5.959
  7           1.895     2.365     2.998     3.499      5.405
  8           1.860     2.306     2.896     3.355      5.041
  9           1.833     2.262     2.821     3.250      4.781
 10           1.812     2.228     2.764     3.169      4.587
 11           1.796     2.201     2.718     3.106      4.437
 12           1.782     2.179     2.681     3.055      4.318
 13           1.771     2.160     2.650     3.012      4.221
 14           1.761     2.145     2.624     2.977      4.140
 15           1.753     2.131     2.602     2.947      4.073
 16           1.746     2.120     2.583     2.921      4.015
 17           1.740     2.110     2.567     2.898      3.965
 18           1.734     2.101     2.552     2.878      3.922
 19           1.729     2.093     2.539     2.861      3.883
 20           1.725     2.086     2.528     2.845      3.850
 21           1.721     2.080     2.518     2.831      3.819
 22           1.717     2.074     2.508     2.819      3.792
 23           1.714     2.069     2.500     2.807      3.767
 24           1.711     2.064     2.492     2.797      3.745
 25           1.708     2.060     2.485     2.787      3.725
 26           1.706     2.056     2.479     2.779      3.707
 27           1.703     2.052     2.473     2.771      3.690
 28           1.701     2.048     2.467     2.763      3.674
 29           1.699     2.045     2.462     2.756      3.659
 30           1.697     2.042     2.457     2.750      3.646
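To see how the independent sample t-test from section I.C and the critical values above fit together, here is a minimal Python sketch. It assumes the standard pooled-variance form of the t-statistic, t = (mean1 - mean2) / sqrt(sp² (1/n1 + 1/n2)), with sp² and df calculated as described in the text. The two sets of rates are invented for illustration, and the function and variable names are hypothetical.

```python
import math
import statistics

# Hypothetical reaction rates (cc O2/sec); invented for illustration only.
control = [1.3, 1.1, 1.4, 1.2, 1.5]
treatment = [0.9, 1.0, 0.8, 1.1, 0.7]

# Two-tailed critical values at P = 0.05, copied from the table for a few df.
CRITICAL_T_05 = {4: 2.776, 6: 2.447, 8: 2.306, 10: 2.228}


def independent_t(group1, group2):
    """Return (t, df) for an independent sample t-test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    df1, df2 = n1 - 1, n2 - 1
    var1 = statistics.variance(group1)  # sample variance of group 1
    var2 = statistics.variance(group2)  # sample variance of group 2
    pooled = (df1 * var1 + df2 * var2) / (df1 + df2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    t = mean_diff / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df1 + df2


t_stat, df = independent_t(control, treatment)
critical = CRITICAL_T_05[df]
significant = abs(t_stat) > critical

print(f"t = {t_stat:.3f}, df = {df}")
print(f"critical t (two-tailed, P = 0.05) = {critical}")
print("significant at P = 0.05" if significant else "not significant at P = 0.05")
```

With these made-up numbers the t-statistic exceeds the tabled critical value at 8 degrees of freedom, so the difference between the two means would be called significant at P = 0.05. Your own rates will of course give a different result.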