APPENDIX B
Statistics for Science Research: Hints, Tips, Examples

Scientists analyze data collected in an experiment to look for patterns or relationships among variables. If we think we see a pattern or a relationship, we must complete one more step before we can be sure of the results: we must test the perceived pattern for significance. Statistical analysis allows scientists to test whether a pattern is real rather than the result of chance or of the observer's preconceived notions. We can never be 100% sure, but we can attach a level of certainty to our observations. The level of certainty accepted by most scientists is 95%, and we will be using tests that allow us to say we are 95% confident in our results.

STEP ONE
Types of Data

1. Qualitative: data recorded using non-standard scales (for example, descriptions of leaf quality). Qualitative data are placed into discrete categories, which may be represented by word or number labels. They can also be measurements made with a nonstandard scale that has unequal intervals.
Levels of Measurement:
NOMINAL DATA: objects are placed into categories that cannot be ranked (male/female; brown, black, or red hair).
ORDINAL DATA: objects are placed into categories that can be ranked (Mohs hardness scale; animal activity ranked 1 to 5).

2. Quantitative: measurements made using a scale with equal intervals (for example, the temperature of water in degrees Celsius). Quantitative data consist of numbers representing counts or measurements.
Levels of Measurement:
RATIO DATA: data collected using a scale with equal intervals and an absolute zero (temperature in kelvins, temperature change, velocity).
INTERVAL DATA: data collected using a scale with equal intervals but no absolute zero (temperature in degrees Celsius, pH).

Decide which of the above types of data you have collected and record here: ____________________________

STEP TWO
Descriptive Statistics

Type of Descriptive Statistic            Quantitative (Interval, Ratio)         Qualitative (Nominal, Ordinal)
Central tendency: the most typical #     Mean                                   Mode (nominal); Median or Mode (ordinal)
Variation: the spread of the data        Range, Variance, Standard Deviation    Frequency Distribution

Mode: the value that occurs most often (in a tie, use both).
Median: the middle value when the data are ranked from highest to lowest.
Mean: the mathematical average,
$$\bar{x} = \frac{\sum x}{n}$$
Range: the difference between the smallest and largest values.
Variance: the average squared distance from the mean (how spread out the values in a set of data are). For a sample, the sum of the squared distances is divided by n − 1.
Standard Deviation: the square root of the variance; a measure of how closely the individual data points cluster around the mean.
Frequency Distribution: the number of cases falling into each category of the variable.
Number (n): the number of data points.

Use the table above to decide which descriptive statistics you will calculate and list them here: ____________________________

Using the TI-84 Plus to Find Descriptive Statistic Values
Push ON.
Push 2nd (blue key), then LIST.
Push >> until MATH is highlighted.
Arrow down until mean( is selected and hit ENTER.
Type your data points in this format so the screen reads mean({5,6,7,3,10}), then hit ENTER.
The answer is displayed on the right.
Push 2nd, then LIST.
Push >> until MATH is highlighted.
Arrow down until median( is selected and hit ENTER.
Type your data points in the same format so the screen reads median({5,6,7,3,10}), then hit ENTER.
The answer is displayed on the right.
Repeat these steps with stdDev( for the standard deviation and variance( for the variance. Both give the sample values (dividing by n − 1).
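The calculator steps above can be cross-checked with a short script. This is a minimal sketch assuming Python 3.8 or later (for statistics.multimode); describe_quantitative and describe_ordinal are hypothetical helper names, not TI-84 or Excel features. The quantitative helper uses sample variance and standard deviation (dividing by n − 1), which is what the TI-84 variance( and stdDev( functions and Excel's VAR and STDEV return. The ordinal helper keeps every tied mode, following the "in a tie, use both" rule.

```python
import statistics
from collections import Counter


def describe_quantitative(values):
    """Mean, median, range, sample variance and sample standard deviation."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        # Sample statistics divide by n - 1, like TI-84 variance(/stdDev( and Excel VAR/STDEV
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


def describe_ordinal(ratings):
    """Mode(s), median and frequency distribution for ranked categories such as 1-5 ratings."""
    return {
        "mode": sorted(statistics.multimode(ratings)),  # keeps every tied value ("use both")
        "median": statistics.median(ratings),
        "frequency": dict(sorted(Counter(ratings).items())),
    }


# The example list used in the TI-84 steps
print(describe_quantitative([5, 6, 7, 3, 10]))
# Control-group plant-health ratings from the qualitative Excel example in this appendix
print(describe_ordinal([3, 4, 3, 3, 4, 3, 3, 3, 4, 3, 3, 4, 4, 3, 3]))
```

For the list {5,6,7,3,10} the mean is 6.2, the median 6, the range 7 and the sample variance 6.7. Nominal data (such as hair color) should only be summarized with the mode and the frequency count, since a median of unranked categories has no meaning.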
__________________________________________________________________

STEP THREE
For Quantitative: Follow the directions above for using the TI-84 Plus and record these values here:
Mean ______________ Range _______________ Variance _____________ Standard Deviation ___________
For Qualitative: Determine the mode, median, and frequency distribution and record them here:
Mode _____________ Median __________________ Frequency Distribution ___________________________

STEP FOUR
Inferential Statistics
Inferential statistics are used to determine whether differences in the data are statistically significant. They limit the possibility that the differences occurred by random chance. If the data are statistically significant and the experiment controlled all other variables, the differences can be explained by changes in the independent variable.

Statistical Tests
1. The t-test (and Analysis of Variance): The t-test analyzes the difference between the means of two groups. An analysis of variance (ANOVA) is used when you have three or more groups and want to compare the measurements of each group.
2. The Chi-square test: This test is used when you have counts that can be placed into yes-or-no categories or other simple categories, such as quadrats.
3. The Pearson r Correlation: The Pearson r correlation tests how the values of one event or object relate to the values of another event or object.

Inferential Statistics by type of data:
Quantitative: t-Test (t), Pearson r Correlation
Qualitative: Chi-Square (χ²)

Decide which of the inferential statistics you will be doing and calculate your degrees of freedom. Record here:
Stats ____________________ Degrees of Freedom ______________

Level of Significance (α): We will use 0.05, which means we accept a 5 in 100 chance of concluding that there is a difference when there really is none (a 95% confidence level).
Degrees of Freedom (df): the number of values in the calculation that are free to vary. It depends on the number of observations and on the test being used.
To calculate:
For the t-test: df = (n1 − 1) + (n2 − 1)
For the chi-square test: df = (number of rows − 1) × (number of columns − 1); for a single row of categories, df = number of categories − 1
For the Pearson r correlation: df = n − 2 (subtract 2 from the number of pairs of data)

Null Hypothesis: states that there is no difference between the mean of your control group and the mean of your experimental group, so any observed difference between the two sample means occurred by chance and is not significant. If you can reject your null hypothesis, there is a significant difference between your control and experimental groups.

STEP FIVE
Three options for your hypothesis test (group 1 = experimental, group 2 = control). In each case the null hypothesis is that the means do not differ; the option you choose states the difference you are testing for.
μ1 ≠ μ2: the two means differ, in either direction (two-tailed test). To reject the null hypothesis μ1 = μ2, the absolute value of your t-value must be greater than the table value.
μ1 < μ2: the mean of your experimental group is lower than the mean of the control group. For example, in golf, the lower score is the better score. To reject the null hypothesis, your t-value must be negative and beyond the table value (t < −t table).
μ1 > μ2: the mean of your experimental group is higher than the mean of the control group. For example, plants with fertilizer grow taller than those without. To reject the null hypothesis, your t-value must be greater than the table value (t > t table).
For the chi-square test the direction does not matter: reject the null hypothesis whenever χ² calculated > χ² table.

Write your null hypothesis here: ________________________________________________________________________

Graphing calculators are helpful in determining t-test and chi-square values.

Using the TI-84 Plus to Find Inferential Statistic Values
T-TEST
Push ON.
Push 2nd, then MEM.
Arrow down to 4:ClrAllLists and hit ENTER.
Hit ENTER again. The screen should say Done.
Push STAT.
Select 1:Edit by hitting ENTER.
Under L1, type in the data from your experimental group, hitting ENTER after each number.
Arrow over to L2 and type in the data from your control group.
When done, push STAT again.
Push >> to get to TESTS.
Arrow down to 4:2-SampTTest and hit ENTER.
Make sure that Data is highlighted.
Arrow down and select the alternative hypothesis that matches your prediction: µ1 ≠ µ2, µ1 < µ2, or µ1 > µ2.
Make sure Pooled is set to No.
Arrow down to Calculate and hit ENTER.
Your t-value is shown as t = and the p-value as p =.

CHI-SQUARE
Push 2nd, then MEM.
Arrow down to 2 (Mem Mgmt/Del) and hit ENTER.
Arrow down to 5:Matrix and hit ENTER.
Hit ENTER for each matrix that is listed ([A], [B], ...) to clear it.

Example: A researcher tested the hypothesis that there was no significant difference in the amount of graphing calculator use demanded by the different tests given to the three senior classes at Roosevelt High. She analyzed each of the three 50-item tests and classified each item as inactive, neutral, or active, depending on the extent of calculator use required. Use the tallies shown in the 3 × 3 matrix to test the hypothesis.

            Inactive   Neutral   Active
Test A         16         14        20
Test B         19         10        21
Test C         13         26        11

To enter the data in your matrix:
Note: Your matrix must be at least 2 × 2. If you have a 1 × 2, please ask your teacher for additional instructions.
Push 2nd, then MATRIX.
Push >> to get to EDIT (you must set up a matrix to record the data for the χ²-test).
Select 1:[A] by hitting ENTER.
Set the dimensions of your matrix (rows × columns); the matrix for the example is 3 × 3.
Enter the data for the rows and columns exactly as they appear in your table.
Push STAT and push >> to get to TESTS.
Arrow down to C:χ²-Test and hit ENTER.
Arrow down to Calculate and hit ENTER.
Your chi-square value is shown as χ² =.
To view your expected values: push 2nd, then MATRIX, arrow over to EDIT, and select 2:[B].
Hit ENTER, and your expected values will be listed in matrix [B].
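The chi-square arithmetic the TI-84 performs in the steps above can be sketched in a few lines. This is a minimal sketch assuming only the Python standard library; expected_counts, chi_square and chi2_right_tail are hypothetical helper names. Expected counts use the usual test-of-independence rule (row total × column total ÷ grand total), which should match what the calculator stores in matrix [B]. chi2_right_tail computes the right-tail probability of the chi-square distribution, the same quantity Excel's CHIDIST reports, using a power series for the regularized incomplete gamma function rather than any calculator routine.

```python
import math


def expected_counts(observed):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over matching cells (flat lists)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_right_tail(x, df):
    """P(chi-square >= x) with df degrees of freedom (the value CHIDIST reports).

    Uses the power series for the regularized lower incomplete gamma P(df/2, x/2).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, 10_000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


# Roosevelt High example: rows are Tests A-C, columns are inactive, neutral, active
observed = [[16, 14, 20],
            [19, 10, 21],
            [13, 26, 11]]
expected = expected_counts(observed)  # should match the calculator's matrix [B]
x2 = chi_square([o for row in observed for o in row],
                [e for row in expected for e in row])
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {x2:.3f}, df = {df}, p = {chi2_right_tail(x2, df):.4f}")
```

With these tallies the script gives χ² = 12.945 with df = 4 (p about 0.012), which exceeds the table value of 9.488 at α = 0.05, so the null hypothesis of no difference among the tests would be rejected. The same chi_square function works for a single row of categories: for the cricket counts in the manual example (30, 40, 12, 18 against 25 each) it gives 18.72; the 18.6 quoted in this appendix comes from rounding each term to one decimal place before adding, and either value exceeds 7.815. chi2_right_tail(36, 25) gives roughly 0.07, above 0.05, as in the CHIDIST example.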
To Calculate Chi-square Manually
Use the formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where χ² = chi-square, Σ = the sum of the values, O = observed frequency distribution, and E = expected frequency distribution.

Example: Mary read that bees were attracted to the color yellow rather than red, blue, or white. She wondered if crickets would show a color preference. To test her hypothesis that crickets would be differentially attracted to colors, she placed 100 crickets in a container. The bottom of the container was divided into four equal sections covered by red, blue, yellow, or white paper. She observed the number of crickets on each color one hour after placing them in the container. The distribution of crickets was: 30 red, 40 blue, 12 yellow, 18 white. By chance alone, an equal number of crickets would be expected on each color of paper.

Determine the Observed Frequency Distribution:
Red 30    Blue 40    Yellow 12    White 18
Determine the Expected Frequency Distribution:
Red 25    Blue 25    Yellow 25    White 25
Use the formula to calculate χ².

PEARSON R CORRELATION COEFFICIENT
In this appendix, the Pearson r value is calculated with the Microsoft Excel program on the computer. (The TI-84 can also report r from its LinReg(ax+b) regression when DiagnosticOn is turned on.)

Calculate your t-value, chi-square, or Pearson r and record here: ____________________ (Note: you will have a different value for each of your experimental groups.)

STEP SIX
Deciding to Accept or Reject the Null Hypothesis
Use the tables for the t-test and the chi-square test to find the table value, using your calculated degrees of freedom and the level of significance of 0.05 (95%). Determine whether the calculated value is greater or less than the table value.
For the t-test: Refer to the null hypothesis descriptions in STEP FIVE to decide whether to accept or reject the null hypothesis.
For chi-square: If χ² calculated > χ² table, the null hypothesis is rejected.
For the Pearson r correlation: If the absolute value of the calculated r is greater than the table value, reject the null hypothesis.
If r = 0.00, there is zero correlation.
If r = 1.00 (or −1.00), there is a perfect correlation. Values can be positive or negative: positive values indicate that an increase in X corresponds to an increase in Y, and negative values indicate that increases in one variable are associated with decreases in the other.

Decide whether to accept or reject your null hypothesis. Accept _________ Reject ________

STEP SEVEN
What Does It Mean to Accept or Reject the Null Hypothesis?
The null hypothesis generally states that there is no significant difference between your two sets of data. If it is accepted (strictly, not rejected), any differences in your data are not significant and are probably due to random chance. If the null hypothesis is rejected, there is a significant difference between your two sets of data, and in a well-controlled experiment these differences can be attributed to the factor (independent variable) that you changed.

Make a statement regarding your null hypothesis. For example (from the cricket example above): At df = 3 and α = 0.05, χ² = 7.815 is needed for significance; the calculated χ² of 18.6 > 7.815 and is significant. The null hypothesis is rejected, and the hypothesis that crickets would be differentially attracted to colors was supported.
Your statement: ________________________________________________________________________
________________________________________________________________________

ANOVA Statistical Tests (to compare 3 or more groups)
Websites for free online calculators:
1. http://www.danielsoper.com/statcalc/calc43.aspx
2. http://www.physics.csbsju.edu/stats/anova.html
3. For an explanation of ANOVA, see Wikipedia, the paragraphs below, or these websites:
http://www.stats.gla.ac.uk/steps/glossary/anova.html
http://www.statisticallysignificantconsulting.com/Anova.htm

ANOVA (Analysis of Variance): Like the two-sample t-test, ANOVA lets us test hypotheses about the mean (average) of a dependent variable across different groups. While the t-test is used to compare the means of two groups, ANOVA is used to compare the means of three or more groups.
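The F-statistic that one-way ANOVA produces can be computed directly from each group's mean, sample variance and size. This is a minimal sketch assuming only the Python standard library; one_way_anova_from_summary is a hypothetical helper name. It is applied here, purely as an illustration, to the summary statistics of Table 10.5 (Commercial, Compost, Control, 10 bean plants each); the source analyzes that table with pairwise t-tests instead.

```python
def one_way_anova_from_summary(means, variances, sizes):
    """One-way ANOVA F statistic from each group's mean, sample variance and size.

    Returns (F, df_between, df_within).
    """
    k = len(means)
    n_total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    # Between-groups sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Within-groups sum of squares: (n - 1) x sample variance, added over the groups
    ss_within = sum((n - 1) * v for n, v in zip(sizes, variances))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Commercial, Compost and Control columns of Table 10.5
f_stat, df1, df2 = one_way_anova_from_summary([7.0, 5.0, 4.0], [3.6, 2.2, 2.0], [10, 10, 10])
print(f"F = {f_stat:.2f} with df = ({df1}, {df2})")
```

This gives F ≈ 8.97 with df = (2, 27), well above the F table value of about 3.35 at α = 0.05, so the null hypothesis that all three means are equal would be rejected. As the ANOVA discussion notes, that only says at least two groups differ; post-hoc tests are still needed to find which ones.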
There are several varieties of ANOVA, such as one-factor (one-way) ANOVA, two-factor (two-way) ANOVA, and so on, as well as repeated-measures ANOVA. The factors are the independent variables, each of which must be measured on a categorical scale; that is, the levels of each independent variable must define separate groups.

One-Way ANOVA Example
One-factor ANOVA, also called one-way ANOVA, is used when the study involves 3 or more levels of a single independent variable. For example, we might look at average test scores for students exposed to one of three different teaching techniques (three levels of a single independent variable).

ANOVA Statistics
The null hypothesis for ANOVA is that the mean (average value of the dependent variable) is the same for all groups. The alternative, or research, hypothesis is that the average is not the same for all groups. The ANOVA procedure produces an F-statistic, which is used to calculate the p-value. If p < 0.05, we reject the null hypothesis and conclude that the average of the dependent variable is not the same for all groups. When the null hypothesis is rejected, all we know is that at least 2 groups differ from each other. To determine which groups differ, post-hoc t-tests are performed using some form of correction (such as the Bonferroni correction) to adjust for the inflated probability of a Type I error.

Examples of Statistical Data Tables

Quantitative
TABLE 10.5 Effect of Fertilizer on the Mean Height (cm) of Bean Plants

Descriptive Information    Commercial    Compost    Control
Mean                          7.0          5.0        4.0
Variance                      3.6          2.2        2.0
Standard Deviation            1.9          1.5        1.4
Number                        10           10         10

Results of t-test (df = 18, α = 0.01; t = 2.878 needed for significance):
Commercial vs. Compost     t = 2.6    0.01 < p < 0.05
Compost vs. Control        t = 1.5    p > 0.01
Commercial vs. Control     t = 4.0    p < 0.001

Qualitative
TABLE 10.7 Attraction of Crickets to Various Colors

Information (frequency distribution)    Red    Blue    Yellow    White    Number
Observed Distribution                    30     40      12        18       100
Expected Distribution (Chance)           25     25      25        25       100
Calculated χ²                            1.0    9.0     6.7       1.9
Mode: Blue

Results of the chi-square test: χ² = 18.6 at df = 3; 18.6 > 7.815 (table value), p < 0.001.

Tables from "Students and Research," 2nd Edition, by Julia Cothron, Ronald Giese, and Richard Rezba. Kendall/Hunt Publishing Company, Dubuque, Iowa, 1993.

Sample Statistical Analysis for Quantitative Data
FOR EXAMPLE: A student tested the effect of different types of fertilizers on plants. Below are his data for the control and Fertilizer A.

Trial    Control Group: height of plant (mm)    Fertilizer A: height of plant (mm)
1        45.0                                    47.4
2        46.2                                    48.5
3        51.4                                    55.2
4        43.2                                    49.1
5        44.1                                    52.3
6        42.7                                    56.2
7        41.8                                    51.9
8        42.6                                    52.9
9        41.8                                    51.6
10       42.4                                    49.8
11       43.1                                    52.7
12       44.3                                    56.1
13       43.2                                    57.3
14       42.6                                    56.2
15       43.4                                    58.2

Steps for Using Excel for Statistics
1. Enter the data above into your Excel spreadsheet. [screenshot of the completed spreadsheet]
2. Set up a table below your data table for your descriptive statistics. Include the mean, range, variance, and standard deviation.
3. Click in the cell for the mean of the control.
4. Click Formulas on the ribbon, then click fx; the Insert Function box will open. This lets you insert a formula into the spreadsheet. The mean of a set of numbers is the average. In the category box, select Statistical. Under Select a function, select AVERAGE and then click OK.
5. A box titled Function Arguments will open.
6. Use the mouse to highlight the numbers. A dotted line will appear around the column.
7. The average has been calculated to be 43.85333. Click OK. The average will be transferred to the mean cell in the spreadsheet.
8. Repeat steps 3–7 to calculate the mean of the Fertilizer A data. The mean you calculate for Fertilizer A should be 53.02667.
9. To calculate the range, subtract the smallest number from the largest number. Enter the value into the cell for the range.
10. To calculate the variance, repeat steps 3–7, selecting VAR from the function list.
11. To calculate the standard deviation, repeat steps 3–7, selecting STDEV from the function list.
12. Your calculations should give you the following values:

                        Control    Fertilizer A
Mean                    43.8533    53.0267
Range                    9.6000    10.8000
Variance                 5.7627    11.5192
Standard Deviation       2.4006     3.3940

13. Next, calculate a value for the t-test. In the area below the standard deviation value, type the word T-Test.
14. Click on the cell next to the T-Test cell.
15. Click Formulas on the ribbon, then click fx; the Insert Function box will open.
16. In the selection area, select TTEST. Your screen should look like this: [screenshot]
17. Click OK. Your screen should look like this: [screenshot]
18. Click in the box next to Array1. Highlight the numbers in the Control column.
19. Click in the box next to Array2. Highlight the numbers in the Fertilizer A column.
20. Click in the box next to Tails. For a one-tailed test, type 1. For a two-tailed test, type 2.
21. What is the meaning of a two-tailed test? If you are using a significance level of α = 0.05, a two-tailed test allots half of your alpha to testing statistical significance in one direction and half to testing it in the other direction, so 0.025 is in each tail of the distribution of your test statistic. When using a two-tailed test, regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions.
22. For a one-tailed test, you are testing for the possibility of the relationship in only one direction: either the left-tail area or the right-tail area.
23. We are doing a two-tailed test, so enter 2 next to Tails.
24. Click in the box next to Type. For a paired test, enter 1. For a t-test in which the two samples have equal variances, enter 2. If the two samples have unequal variances, enter 3. Our variances are not equal, so enter 3.
25. Your screen should look like this: [screenshot]
26. Click OK.
27. You get a value of 6.46129E-09. This is the p-value: the probability of getting a difference at least this large by chance alone if the null hypothesis were true. Since the p-value is far below 0.05, you would reject the null hypothesis.

Making a graph of your data
You want to graph your descriptive statistics. Highlight your descriptive statistics.
1. Click Insert on the ribbon.
2. Click the type of graph you want: Column.
3. Click 2-D Column.
4. If your graph covers your data, you can click on the graph and move it.

Sample Statistical Analysis for Qualitative Data
FOR EXAMPLE: A student tested the effect of different types of fertilizers on plants. The student developed a rubric for the health of the plants: a 1 was not very healthy and a 5 was very healthy. Below are his data (health of plant ratings) for the control and the different strengths of Fertilizer A.

Trial    Control    Fertilizer A 2%    Fertilizer A 4%    Fertilizer A 6%
1        3          4                  4                  5
2        4          4                  4                  4
3        3          4                  4                  5
4        3          4                  5                  5
5        4          4                  5                  5
6        3          4                  5                  5
7        3          4                  4                  4
8        3          4                  5                  5
9        4          4                  5                  5
10       3          4                  4                  5
11       3          4                  5                  5
12       4          4                  4                  5
13       4          4                  5                  4
14       3          3                  4                  5
15       3          3                  5                  5

Steps for Using Excel for Statistics
1. Enter the data above into your Excel spreadsheet. [screenshot of the completed spreadsheet]
2. Set up a table below your data table for your descriptive statistics. Include the mode and the median.
3. Click in the cell for the mode of the control.
4. Click Formulas on the ribbon, then click fx; the Insert Function box will open. This lets you insert a formula into the spreadsheet.
5. In the category box, select Statistical. Under Select a function, select MODE and then click OK.
6. A box titled Function Arguments will open.
7. Use the mouse to highlight the numbers. A dotted line will appear around the column.
8. The mode has been calculated to be 3. Click OK. The mode will be transferred to the mode cell in the spreadsheet. Your spreadsheet should look like this: [screenshot]
9. Repeat steps 3–8 to find the mode for the different percentages of Fertilizer A. The mode is the number that appears most often. If no number appears more than once, the column has no mode and Excel returns an error message. Your results should look like this:

           Control    Fertilizer A 2%    Fertilizer A 4%    Fertilizer A 6%
Mode       3          4                  5                  5

10. To calculate the median, repeat steps 3–8, selecting MEDIAN from the function list. Your results should look like this:

           Control    Fertilizer A 2%    Fertilizer A 4%    Fertilizer A 6%
Mode       3          4                  5                  5
Median     3          4                  5                  5

Making a graph of your data
You want to graph your descriptive statistics. Highlight your descriptive statistics.
1. Click Insert on the ribbon.
2. Click the type of graph you want: Column.
3. Click 2-D Column.
4. If your graph covers your data, you can click on the graph and move it.

Doing Chi-Square in Excel
There is a function in Excel called CHITEST. CHITEST does not return a value for chi-square. It skips that step and returns the probability of getting a chi-square at least as high as the one calculated from the observed and expected values. The problem is that CHITEST does not always use the correct degrees of freedom: depending on the case, the correct analysis can have one or two fewer degrees of freedom than CHITEST assumes. Because CHITEST bases its answer on more degrees of freedom than it should, it can give an inappropriately large probability. After chi-square has been calculated by hand, you can use the CHIDIST worksheet function to make a judgment about the chi-square value.
1. Select a cell to store the result.
2. From the Statistical functions, select CHIDIST to open the Function Arguments dialog box for CHIDIST.
3. In the Function Arguments dialog box, type the values asked for.
4. In the X box, type the calculated chi-square value. For example, put 36 in the X box.
5. In the Deg_freedom box, type the degrees of freedom. For the example we are doing, type 25. After you type the degrees of freedom, the dialog box shows the one-tailed (right-tail) probability of obtaining a chi-square at least this large.
6. The Function Arguments dialog box should look like this: [screenshot]
7. Click OK to close the dialog box and put the answer in the selected cell.
8. The value is greater than 0.05, so the decision is not to reject the null hypothesis.

Pearson Correlation
Excel reports the Pearson correlation coefficient as part of the t-Test: Paired Two Sample for Means analysis.
1. Enter the data for each sample into a separate column. For example, we have the before data in column B and the after data in column C.
2. Select Data, then Data Analysis, to open the Data Analysis dialog box. The Analysis ToolPak must be loaded as an add-in.
3. In the Data Analysis dialog box, scroll down the Analysis Tools list and select t-Test: Paired Two Sample for Means.
4. Click OK to open this tool's dialog box.
5. In the Variable 1 Range box, enter the cell range that holds the data for one of the samples: click in the Variable 1 Range box, then highlight the data in column B. The range will appear in the box.
6. In the Variable 2 Range box, enter the cell range that holds the data for the other sample.
Click in the Variable 2 Range box, then highlight the data in column C. The range will appear in the box.
7. In the Hypothesized Mean Difference box, type the difference between µ1 and µ2 that H0 specifies. In this example, the difference is 0.
8. If the cell ranges include column headings, check the Labels checkbox. They were included here, so the box needs to be checked.
9. The Alpha box has 0.05 as a default. Change that value if you want to use a different α.
10. In Output Options, select a radio button to indicate where you want the results. For this example, New Worksheet Ply was selected to put the results on a new sheet in the workbook.
11. Click OK. Because New Worksheet Ply was selected, a new sheet opens with the results.
12. After the new sheet opens, widen the columns so you can read the results.
13. Cell B7 shows the Pearson correlation coefficient, a number between −1 and +1. It shows the strength of the relationship between the data in the first sample and the data in the second sample.
14. If this number is close to +1, high scores in one sample are associated with high scores in the other, and low scores with low scores. If it is close to −1, high scores in the first sample are associated with low scores in the second, and low scores in the first with high scores in the second.
15. If the number is close to zero, the scores in the first sample are not linearly related to the scores in the second sample. Our example gives a value close to one.
16. Cell B8 shows the H0-specified difference between the population means.
17. Cell B9 shows the degrees of freedom.
18. Cell B10 gives the calculated value of the test statistic (t Stat).
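The Excel t-test results in this appendix (TTEST with Type 3, and the t Stat and Pearson Correlation rows of the paired analysis) can be cross-checked with a short script. This is a minimal sketch assuming only the Python standard library; welch_t, paired_t and pearson_r are hypothetical helper names. welch_t is the unequal-variance (Welch) t statistic, which is what Excel's Type 3 and the TI-84's Pooled: No setting use. Its degrees of freedom come from the Welch-Satterthwaite formula, so they are smaller than the (n1 − 1) + (n2 − 1) = 28 given in STEP FOUR. The script returns statistics only; compare them with a t table or the p-value Excel reports.

```python
import math
import statistics


def welch_t(a, b):
    """Unequal-variance (Welch) two-sample t statistic and its Welch-Satterthwaite df."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def paired_t(x, y):
    """Paired t statistic on the differences x - y, with df = n - 1."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def pearson_r(x, y):
    """Pearson correlation coefficient, always between -1 and +1."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Plant heights (mm) from the quantitative Excel example
control = [45.0, 46.2, 51.4, 43.2, 44.1, 42.7, 41.8, 42.6, 41.8, 42.4, 43.1, 44.3, 43.2, 42.6, 43.4]
fertilizer_a = [47.4, 48.5, 55.2, 49.1, 52.3, 56.2, 51.9, 52.9, 51.6, 49.8, 52.7, 56.1, 57.3, 56.2, 58.2]
t, df = welch_t(fertilizer_a, control)
print(f"Welch t = {t:.2f}, df = {df:.1f}")
```

For the fertilizer data this gives t ≈ 8.55 with df ≈ 25.2, far beyond any two-tailed table value at α = 0.05, which is consistent with the very small TTEST p-value reported in step 27. paired_t and pearson_r expect two equal-length lists in matching order (for example, before and after measurements on the same subjects).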