S6. Plant Ecology Research-Lab 4-basic stats lab

Fleming, 2014, plant ecology research lesson, S6 Lab 4

Lab 4: Basic Statistics

There are many metrics by which we can quantify aspects of plant communities. In today’s lab you will use the files “supplemental table SI-2” and “supplemental table SI-1” to quantify and compare selected characteristics of sampled plant species. The big questions to answer today are:

(1) What major trends or patterns can be discerned in these data sets?
(2) What differences among major groups of plants can be demonstrated with values from these data sets?

Today, you will use the concepts of central tendency and variance to analyze data. Additionally, you will compare pairs of data sets using a method called a t-test, and three or more data sets using a method called analysis of variance (ANOVA).

MS Excel is handy for these simple statistical tests if you have the Analysis ToolPak add-in installed (it adds the “Data Analysis” command), and of course it is also very useful for calculating averages of data in rows and/or columns. However, we can all use a free statistics website called “vassarstats” to do these analyses as well. It is robust, easy to use, costs nothing, provides links to the logical and computational details of each statistical method, and gives every user the exact same output format.

In groups of 2-3, work through the following exercise. If you have a personal laptop, please bring it and use it in lab so that everyone in your group can work in parallel. If you don’t have a laptop, please work closely with someone in your group who does. Or, if you prefer, the group can go to the computer lab to complete its work, with some people working on their laptops and some on the computer lab desktops. Answers to the following questions are due next week in lab, one set of answers per group.

1. As a simple warm-up, consider the following scenario. Ten college students are riding a bus to the CSU Stanislaus campus.
Assume each of them, by some astonishing coincidence, makes exactly $10,000 per year. What would the average income of this group of students be? Now assume that one student gets off and Bill Gates gets on the bus! His income is about $1,000,000,000 per year (wow!). Now what is the average income of all the people on the bus?

“Average” can be expressed three different ways. The most common expression of an “average” is the mean, which is simply the sum of the sample values divided by the sample size. For example, if you have the values 3, 6, 7 then the mean is (3+6+7)/3 = 5.33.

Another expression of the “average” is the median. The median of a sample is simply the middle value of all the samples. For example, if you have the values 3, 6, 7 then the median is 6. If you have the values 3, 6, 7, 9 then the median is 6.5 (take the mean of the middle two values). One tip: finding the median is much easier if you first order the samples from smallest to largest, then look for the middle value.

The final expression of an “average” is the mode. The mode is simply the value that occurs most often in your sample. For example, if you have the values 3, 6, 7 then there is no mode! If you have 3, 3, 6, 7 then the mode = 3.

Central Tendency Method    Raw Data      Central Tendency Value
Mean                       3, 6, 7       5.33
Median                     3, 6, 7       6
Median                     3, 6, 7, 9    6.5
Mode                       3, 6, 7       none
Mode                       3, 3, 6, 7    3

Of course, we also need to report the amount of variation in the data. As with averages, there are several ways to do this, but we will consider two here: the standard deviation and the standard error. Standard deviation is, roughly, the typical amount by which any one point in your data set deviates from the sample mean. If there is much “spread” in the data, then the standard deviation will be large compared to that of a precise data set with little “spread”. By extension, standard error takes sample size into account.
If the sample size is large, then the standard error will be small compared to that of a small data set with the same standard deviation, which would have a larger standard error.

Let’s return to our example of the bus passengers and the values we might compute for their average salaries. Calculate the mean, median, mode, standard deviation and standard error of these two samples. See below for hints!

                      Income (only students)    Income (students + Bill Gates)
                      10,000                    10,000
                      10,000                    10,000
                      10,000                    10,000
                      10,000                    10,000
                      10,000                    10,000
                      10,000                    10,000
                      10,000                    1,000,000,000
                      10,000                    10,000
                      10,000                    10,000
                      10,000                    10,000
Mean                  ________                  ________
Median                ________                  ________
Mode                  ________                  ________
Standard Deviation    ________                  ________
Standard Error        ________                  ________

NOTE: Standard deviation is calculated as the square root of variance. Variance is calculated by summing the squares of the deviations of each individual observation from the sample mean and dividing by one less than the sample size. Before squaring, individual deviations can be negative, and added together they would cancel out. We square the deviation of each data point to get positive values, but that leaves us with “units squared”, which is why we take the square root of variance to express variation in our original units. A helpful tutorial can be found here: http://www.youtube.com/watch?v=qqOyy_NjflU

Variance = $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

so standard deviation = $s = \sqrt{s^2}$

Standard error is calculated as the standard deviation divided by the square root of the sample size: $SE = s / \sqrt{n}$

Another important concept in statistical analyses is “significance”. You have probably heard or used the expression “there is no significant difference between these data sets.” What does “significant” mean in a statistical context? That the results are meaningful or important? Not exactly. In statistics, a significant result means that there is a low probability that the observed effect is attributable to chance alone.
Said another way, a significant result in statistical testing tells us that the observed effect is very likely attributable to the variable(s) we manipulated in an experiment, or that two or more groups truly do differ from each other on average most of the time.

2. Download the file “sample_speciesv2”. This is a simplified data set much like the one you will generate and use for your Red Hills project. For this exercise you can disregard the first three rows in the file. The value in each cell represents the percent (%) cover for a particular species in a particular stand (location). Use vassarstats.net to explore the data for patterns (your instructor will walk you through a simple exercise to get you started with this website), and answer the following questions. A tutorial on using vassarstats for a different type of statistical analysis can be found here: https://www.youtube.com/watch?v=qrdMDnwFapE. Be sure to graph your findings in terms of averages and standard deviation!

   a. What is the mean (average) cover for each species? For each stand? What is the standard deviation and standard error for each species and each stand?
   b. What is the mean cover for each functional group? What is the mean functional group cover for each stand? Provide standard deviations and standard errors as well.
   c. Is tree cover significantly different from shrub cover for the 20 plots? A t-test will help you answer this question. Be sure to use the “independent sample” option for t-tests. Should you use the “equal variance” or “unequal variance” output? Why?
   d. Is Grass 3 cover significantly different from Moss 1 cover? A t-test will help you answer this question. Be sure to use the “independent sample” option for t-tests. Should you use the “equal variance” or “unequal variance” option? Why?
   e. Are there significant differences among mean cover of all the functional groups? ANOVA is appropriate here (be sure to use the “single factor independent sample” option).
      Which functional group(s) differ from the others? How do you know?
   f. Are Shrub 1, Grass 3, and Moss 1 cover significantly different from each other? An ANOVA test (single factor independent sample) will help you answer this question.

3. Use the greenhouse data (entered into S3 Table 1 – greenhouse activity data template) to answer the following questions. Again, use vassarstats.net to explore the data. Be sure to graph your findings.

   a. Is the average per-stem biomass of radishes grown on flats of 8 species in 2013 significantly different from that in 2014?
   b. Are there significant differences in per-stem biomass among corn grown in each species combination? For this question, lump together the 2013 and 2014 data so that you have only 4 columns of data (1, 2, 4 and 8 species combinations).
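If you would like to cross-check hand or vassarstats calculations of the measures defined in Question 1, the short Python sketch below may help. It is not part of the lab procedure. The function names `modes` and `describe` are made up for this illustration, and the sample values are the small worked examples from this handout (3, 6, 7 and so on), not the lab data. It uses the n - 1 variance formula given in the NOTE above, so each sample needs at least two values.

```python
import math
import statistics
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    top = max(counts.values())
    # If several distinct values all occur equally often, report "no mode",
    # matching the handout's 3, 6, 7 example.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [v for v, c in counts.items() if c == top]


def describe(values):
    """Mean, median, mode(s), sample variance, standard deviation, standard error."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the mean, divided by (n - 1).
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    sd = math.sqrt(variance)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "modes": modes(values),
        "variance": variance,
        "sd": sd,
        "se": sd / math.sqrt(n),  # standard deviation / sqrt(sample size)
    }


if __name__ == "__main__":
    for sample in ([3, 6, 7], [3, 6, 7, 9], [3, 3, 6, 7]):
        print(sample, describe(sample))
```

Calling `describe` on a list of your own values returns all the measures at once, which makes it easy to compare against the numbers you report.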

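For the t-test questions, it may help to see what the “equal variance” and “unequal variance” choices change. The sketch below assumes they correspond to the standard pooled-variance t-test and to Welch's t-test, respectively; it is an illustration, not a description of how vassarstats computes its output. The function names are hypothetical, and the two tiny samples at the bottom are made-up numbers, not values from the lab files. It returns only the t statistic and degrees of freedom; a p-value still has to come from a t table or from vassarstats.

```python
import math


def sample_mean_var(values):
    """Return (n, mean, sample variance) using the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    return n, mean, var


def pooled_t(a, b):
    """Independent-samples t-test assuming equal variances."""
    n1, m1, v1 = sample_mean_var(a)
    n2, m2, v2 = sample_mean_var(b)
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


def welch_t(a, b):
    """Independent-samples t-test without the equal-variance assumption."""
    n1, m1, v1 = sample_mean_var(a)
    n2, m2, v2 = sample_mean_var(b)
    q1, q2 = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(q1 + q2)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    group_a = [1, 2, 3]  # made-up illustration values
    group_b = [2, 4, 6]  # made-up illustration values
    print("pooled:", pooled_t(group_a, group_b))
    print("welch: ", welch_t(group_a, group_b))
```

With equal sample sizes the two versions give the same t statistic and differ only in degrees of freedom; when the group variances are very different, the unequal-variance version has fewer degrees of freedom and is therefore more conservative.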