FIELDSTON BIOLOGY STATISTICS SUMMARY PACKET

MEAN
The average: the sum of a set of data divided by the number of data points.
When to use:
- Use to describe the middle of a set of data; most useful when the data set does not have outliers and is not skewed toward one extreme.
- Useful when comparing sets of data.
- Affected by extreme values or outliers.
Example: 4, 5, 6, 5, 4
Mean: (4 + 5 + 6 + 5 + 4) / 5 = 4.8

MEDIAN
The middle value, or the mean of the middle two values, when the data are arranged in numerical order.
When to use:
- Use the median to describe the middle of a set of data; more helpful when there are outliers or the data are skewed.
- Useful when comparing sets of data.
- Not as affected by extreme values or outliers.
- If the numbers in the data are clustered, without outliers or skewed data, the mean provides more information than the median.
Example: 4, 5, 6, 5, 4
Median: 4, 4, 5, 5, 6; the middle value is 5.

MODE
The value that appears most often; it is possible to have more than one mode, or for no mode to exist.
When to use:
- Use the mode when the data are non-numeric or when asked to choose the most popular item.
- Not as affected by outliers or skewed data.
- When no value repeats in the data set, every value is a mode and the mode is not useful.
- When there is more than one mode, it can be difficult to interpret and/or compare.
Example: 4, 5, 6, 5, 4
Mode = 4 and 5

DISTRIBUTION
Normal distribution vs. skewed distribution of data:
[Figures: normal distribution, skewed distribution, outliers]

NORMAL DISTRIBUTION: in a normal distribution, extremely large values and extremely small values are rare. The most frequent values are clustered around the mean (which here is the same as the median and mode).

SKEWED DISTRIBUTION: when the data are shifted toward one extreme, this may be due to a real phenomenon that you are studying in your experiment. Another statistical tool (a line of best fit) may be helpful in demonstrating that the skewed pattern is REAL and not due to chance. A skewed distribution is not necessarily the same thing as outliers.

OUTLIERS:
- An outlier is an observation that is numerically distant from the rest of the data.
- Outliers can occur by chance in any distribution, but they are often indicative of a measurement error.
- There is no rigid mathematical definition of what constitutes an outlier; ultimately the determination is subjective and must be discussed openly.
- If the mean is equal (or close) to the median, there are probably no outliers; if the mean is not close to the median, the gap is most likely due to an outlier.
- Often you will "know one when you see one."
- Estimators capable of coping with outliers are said to be robust: the median is a robust statistic, while the mean is not (see the sketch at the end of this section).

How to deal with OUTLIERS (not necessarily SKEWED DATA):
If you can identify a concrete reason for why an outlier occurred, then you can simply throw it out.
WHEN TO THROW OUT DATA, examples:
- One group performed their experiment on a different day than the rest of the groups, and the conditions of the experiment were affected by running it on a different day (e.g., using reactive solutions that are a day older).
- The group with the outlier noted that they had spilled some of their solution; you could throw out their data.
CAUTION: whatever criterion you used to throw out a piece of data must be applied to ALL other groups. EXAMPLE: a procedure called for you to keep the solutions at 23 degrees C. If you threw out data from a group because they performed the experiment at 33 degrees C, then you must throw out any other data obtained at 33 degrees, even if that data "looks good."
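As a concrete illustration of the point above that the median is robust while the mean is not, here is a minimal Python sketch (not part of the original packet; the data set and the added outlier value are made up) that computes the mean, median, and mode of a sample, then repeats the calculation with a single outlier included:

```python
# Minimal sketch (made-up data): mean, median, and mode computed
# with and without a single outlier, using only the standard library.
from statistics import mean, median, multimode

sample = [4, 5, 6, 5, 4]          # example data set from the packet
with_outlier = sample + [40]      # hypothetical outlier (e.g., a measurement error)

for label, data in [("original", sample), ("with outlier", with_outlier)]:
    print(f"{label}: data = {data}")
    print(f"  mean   = {mean(data):.2f}")
    print(f"  median = {median(data):.2f}")
    print(f"  modes  = {multimode(data)}")

# The mean shifts noticeably when the outlier is included, while the
# median barely moves; a mean that sits far from the median is one
# hint that an outlier may be present.
```

Running this shows the mean jumping from 4.8 to about 10.7 while the median stays at 5, which is exactly the mean-versus-median check described above.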
If you can't find a legitimate way of eliminating an outlier, you can always perform the statistical analysis WITH and WITHOUT the outlier included AND report both analyses. You could describe how, while the data did not support your hypothesis, eliminating this one outlier changes the data enough to support your hypothesis.

ACCURACY vs PRECISION
Accuracy: how close a measured value is to the actual (true) value.
Precision: how close the measured values are to each other.
If you are playing soccer and you always hit the left goal post instead of scoring, then you are not accurate, but you are precise!

ERRORS
Systematic error: impacts the accuracy of the experiment; errors that may be reduced by some form of modification to the experimental design or execution.
Random error: impacts the precision of the experiment; reduce its effect by averaging multiple trials.
Sources of error:
- Experimental design / method error: errors in the underlying assumptions about how the DV is being measured.
- Human error: e.g., putting in 2 drops when 3 drops were called for.
- Instrumental error: e.g., a balance that is calibrated correctly but gives two different measurements for the same object.

ROUNDING
When calculating, the rounded value should not go beyond the precision of the instrument that was used to record the measurement. If the digit being dropped is between 0 and 4, round down; if it is between 5 and 9, round up.
Example: 3.15 + 2.7 + 1.624 = 7.474, which rounds to 7.5 (the least precise measurement, 2.7, was recorded to only one decimal place).

% CHANGE
Describes the amount of change relative to the original value.
% change = (Final – Initial) / Initial * 100
Example: Initial = 2 grams, Final = 3 grams
% change: (3 – 2) / 2 * 100 = 50%

ABSOLUTE AVERAGE DEVIATION
The absolute average deviation tells you the average spread (range) of the numbers around the mean of a sample. To calculate the absolute average deviation, you:
1. Take the difference between each score and the mean.
2. Average the absolute values of all the deviations.
The value you get represents the spread, or range, of your data.

% ABS AVE DEV RELATIVE TO MEAN
The percentage that the absolute average deviation represents relative to the mean = (ABS AVE DEV / MEAN) * 100
Example: Mean = 30, Abs Ave Dev = 5
% of the Abs Ave Dev relative to the mean = (5 / 30) * 100 = 16.7%

T TEST
When to use: when comparing two or more different data sets, the t test helps us see whether two data sets are different enough from each other that we don't think the difference is due to chance. If they are different enough, we say the difference is statistically significant.
The calculation takes into consideration both the difference in means and the range/deviation within each data set.
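The arithmetic in the last few tools is simple enough to script. Below is a short Python sketch (not from the packet: the percent_change and abs_ave_dev helpers and the control/treatment data sets are made up for illustration, and the t test assumes the SciPy library is installed) showing percent change, absolute average deviation, that deviation as a percent of the mean, and an independent two-sample t test:

```python
# Minimal sketch (made-up data sets) of the calculations described above:
# percent change, absolute average deviation, % of the mean, and a t test.
from statistics import mean
from scipy import stats   # assumes SciPy is installed (pip install scipy)

def percent_change(initial, final):
    """(Final - Initial) / Initial * 100"""
    return (final - initial) / initial * 100

def abs_ave_dev(data):
    """Average of the absolute deviations of each score from the mean."""
    m = mean(data)
    return mean(abs(x - m) for x in data)

# Percent change example from the packet: 2 grams -> 3 grams
print(percent_change(2, 3))   # 50.0

# Hypothetical control and treatment measurements (made up for illustration)
control   = [24, 26, 25, 27, 23, 25]
treatment = [29, 31, 28, 30, 32, 30]

for label, data in [("control", control), ("treatment", treatment)]:
    m, aad = mean(data), abs_ave_dev(data)
    print(f"{label}: mean = {m:.1f}, abs ave dev = {aad:.2f}, "
          f"% of mean = {aad / m * 100:.1f}%")

# Independent two-sample t test: a small p-value suggests the difference
# in means is unlikely to be due to chance (statistically significant).
t_stat, p_value = stats.ttest_ind(control, treatment)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Note that SciPy's ttest_ind assumes equal variances by default (Student's t test); passing equal_var=False switches to Welch's test, which does not require that assumption.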