STATISTICAL ANALYSIS
IB Biology Year 1

CURRICULUM OUTCOMES
Topic 1: Statistical analysis (2 hours)
1.1.1 State that error bars are a graphical representation of the variability of data.
1.1.2 Calculate the mean and standard deviation of a set of values.
1.1.3 State that the term standard deviation is used to summarize the spread of values around the mean, and that 68% of the values fall within one standard deviation of the mean.
1.1.4 Explain how the standard deviation is useful for comparing the means and the spread of data between two or more samples.
1.1.5 Deduce the significance of the difference between two sets of data using calculated values for t and the appropriate tables.
1.1.6 Explain that the existence of a correlation does not establish that there is a causal relationship between two variables.

LET’S START WITH AN EXAMPLE
Imagine you want to study some aspect of bean plants.
What sorts of things could you study?
Create a hypothesis.
How will you test and measure your hypothesis?
Obviously you can’t measure every bean plant that exists! Even thousands of bean plants are unrealistic in terms of time...
We must use samples of bean plants that represent the entire population.
So what we do is grow enough bean plants to get a sample that is small enough to collect our data efficiently but large enough to represent the population as a whole.

STATISTICS IS A BRANCH OF MATH!
It allows us to take small samples from habitats, communities and populations and draw conclusions about the larger population.
Stats measures the differences and relationships between sets of data.

As for our experiment...
Small sample compared to large population.
Depending on our sample size, we can draw conclusions with a certain level of confidence.
We can be 95% confident...
We may even be 99% confident...
But nothing is 100% certain in science (Yikes... That makes my scientific brain hurt... TOK application!!!).

DESCRIPTIVE STATISTICS
The mean and the standard deviation describe the data: they show us a picture that helps with interpretation of the data.

MEAN
The MEAN is the average of your data points.
It is calculated by adding your data points together and dividing by how many points there are.
Example: Look at these numbers: 3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29
The sum of these numbers is equal to 330.
There are fifteen numbers.
The mean is equal to 330 ÷ 15 = 22

RANGE
The RANGE is a measure of the spread of data.
It is calculated by finding the difference between the largest and smallest values.
The range can give us an idea of how variable the data is.
Example: Largest value is 15, smallest value is 5
The range is 10 (15 - 5 = 10)
Note: very large and very small values, called outliers, can have a very dramatic effect on the range.

STANDARD DEVIATION (SD)
Measures how the individual observations of a data set are dispersed (spread) around the mean.
We will learn how to calculate SD by hand, but usually you will use your graphing calculator or Excel.

CALCULATING SD
Try the example on your handout.
I find a table helps to organize the calculation of SD.
NOTE: In Biology, we are calculating the sample standard deviation. In math, you will calculate the population standard deviation. On your graphing calculators, that is σx (the "funny s") for math and Sx for biology!
We use standard deviation to measure the spread of our values around the mean.
If our data has a normal distribution (meaning our values are clustered symmetrically around the mean), then we assume that:
About 68% of our values lie within ± 1 SD of the mean.
This number rises to about 95% for ± 2 SD from the mean.

ERROR BARS
Error bars are graphical representations of the variability of the data.
Error bars can show either the range of the data or the SD on a graph.

BACK TO OUR BEAN PLANT...
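Before looking at the bean plant sample, here is a minimal sketch of the mean, range and SD calculations described above. It is written in Python purely for illustration (the slides themselves use a graphing calculator or Excel), and it reuses the fifteen numbers from the mean example rather than real bean plant measurements. The variable names are hypothetical.

```python
import math
import statistics

# The fifteen numbers from the mean example on the slides.
values = [3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]

n = len(values)
mean = sum(values) / n                    # 330 / 15 = 22.0
data_range = max(values) - min(values)    # 56 - 3 = 53

# The "table" method for SD: one row per value.
deviations = [x - mean for x in values]   # column 1: value minus mean
squared = [d ** 2 for d in deviations]    # column 2: deviation squared

# Sample SD (Sx on the calculator) divides by n - 1.
sample_sd = math.sqrt(sum(squared) / (n - 1))

# Cross-check the by-hand result against the standard library.
assert math.isclose(sample_sd, statistics.stdev(values))

# Population SD (the one used in math class) divides by n instead.
population_sd = statistics.pstdev(values)

print(f"mean = {mean}, range = {data_range}")
print(f"sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```

The sample SD comes out slightly larger than the population SD because it divides by n - 1 rather than n.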
A sample of 100 bean plants
Some extremes (some very small, some very large)
But when plotted, our data should look something like a bell curve, with the majority of our data centred around the mean.

THE NORMAL DISTRIBUTION
A flat bell curve indicates that the data is spread out widely from the mean. Thus, the standard deviation would be large.
A bell curve that is very tall and narrow shows that the data is very close to the mean. Thus, the standard deviation would be very small.

SIGNIFICANT DIFFERENCE BETWEEN TWO MEANS
To determine if a difference between two data sets is significant, a t-test is commonly used.
A t-test compares two sets of data.

T TABLES
Along one side of the table of critical values of t, you see probability (p). This is the likelihood that chance alone could produce a difference as large as the one in your results.
If p = 0.50, chance alone would produce a difference this large 50% of the time. This is not significant.
If p = 0.05, chance alone would produce a difference this large only 5% of the time, so we can be 95% confident that one set of data really is different from the other. This is considered to be a significant difference.
The mean, standard deviation and sample size are all used to calculate the value of t.
In the left column you will notice the “degrees of freedom”. This is calculated by adding the two sample sizes together and subtracting 2.
Line up the degrees of freedom and the 0.05 level of significance, and this will give you the critical value of t.
Compare this critical value with the calculated value of t.
***If the calculated t value is larger than the number on the chart, then the two groups are significantly different from each other!

LET’S DO AN EXAMPLE TO CLEAR THE MUD...
Ms. Chris conducted an experiment. She wanted to study the effect of a hair growth product on the length of toe hair.
She measured the length of the hair on the toes of students in her biology class (Sample X), and then she had the students apply the growth product daily for one week and measured the length of the hair again (Sample Y).
Both groups were normally distributed.
N for sample X was 23.
N for sample Y was 19.
Ms. Chris did some fancy math and calculated t = 2.956.
Use a level of significance of 0.05.
Can we conclude that the hair growth product resulted in significant hair growth?
We will be testing the null hypothesis, which is that the two groups are the same.
Step 1: Calculate the degrees of freedom:
df = (N1 + N2) - 2
df = 23 + 19 - 2 = 40
Step 2: Use the chart of critical values of t.
Line up 0.05 with 40.
This gives us a critical value of t of 2.021.
Step 3: Evaluate
Remember*** If the calculated t value is larger than the number on the chart, then the groups are significantly different from each other.
So... We were given a calculated t of 2.956, which is larger than the table value, so...
We can conclude that the two groups are significantly different from each other and that the hair growth formula resulted in significant hair growth!

TRY ANOTHER...
Are two samples with df = 17 and a calculated t value of 1.935 significantly different at 0.05?

CORRELATION DOES NOT MEAN CAUSATION!
We make observations about the world around us all the time.
We might notice that our grass turns yellow when its soil is dry. This is a simple observation.
We might do an experiment to see if watering our grass prevents yellowing.
Observing that the yellowing occurs when soil is dry is a simple correlation, but the experiment gives us evidence that a lack of water is the cause of the yellowing.
Experiments provide a test that shows cause; observations without a test only show a correlation.

ODD EXAMPLES...
Ice cream sales and the number of shark attacks on swimmers are correlated.
The number of cavities in elementary school children and vocabulary size have a strong positive correlation.
Clearly there is no direct causal link between the factors involved. The correlation is simply a coincidence in the data or the result of a third factor (warm weather, for example, increases both ice cream sales and swimming).

APPLICATION AND CONSOLIDATION
Now it’s time to apply what we have learned.
Please complete the worksheet called “Statistical Analysis Application Practice”.
You can put your heads together with the person sitting next to you.
We will go over everything when you have had a chance to try. The best way to learn is to try!
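
As a companion to the worksheet, the three steps from the toe hair example can be sketched in Python to check answers. This is a minimal sketch and not part of the original slides. The function names are hypothetical, and the critical value is still read by hand from the t table. Taking the absolute value of t is an added assumption, so that a negative t is judged by its size.

```python
def degrees_of_freedom(n1, n2):
    """Step 1: df for comparing two samples, (N1 + N2) - 2."""
    return n1 + n2 - 2


def significantly_different(t_calculated, t_critical):
    """Step 3: significant if the calculated t is larger than the table value.

    abs() is an assumption added here so a negative t is compared by size.
    """
    return abs(t_calculated) > t_critical


# Values from the toe hair example on the slides.
df = degrees_of_freedom(23, 19)   # 23 + 19 - 2 = 40
t_critical = 2.021                # Step 2: read from the t table at df = 40, p = 0.05
result = significantly_different(2.956, t_critical)

print(f"df = {df}, significantly different: {result}")
```

To test another pair of samples, look up the critical value for the new df in the table and pass it in along with the calculated t.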