Data Analysis Guidelines

MEASURES OF CENTRAL TENDENCY

Mean
Generally, the mean is the best measure of central tendency for quantitative data. Use it unless there are outliers that would distort the mean value.
To calculate mean:
1. Add all data values in the set.
2. Divide the sum of the data values by the number of data in the set. This value is the mean.
3. Record this value in the appropriate column of the data table.

Mean = (Sum of the data) / (Number of data in the set)

Summary: Mean is the average of the data set.

Median
Median is an appropriate measure of central tendency for quantitative data that is "skewed" and for qualitative data with ranked categories. The median is the data value that falls in the exact middle of a data set that is ordered from smallest to largest.
To determine the median:
1. List all data values in numerical order from least to greatest.
2. If the number of data in the set is odd, find the value that falls in the exact middle of the data set. This value is the median.
3. If the number of data in the set is even, calculate the mean of the two middle data values. This value is the median.
4. Record this value in the appropriate column of the data table.

1, 4, 4, 7, 8, 8, 9, 10, 42          Median = 8
1, 7, 8, 8, 10, 10, 42, 52           Median = (8 + 10) / 2 = 9

Summary: Median is the middle number in an ordered data set.

Mode
Mode is an appropriate measure of central tendency for qualitative, non-ranked data. The mode of a data set is the data value that occurs most frequently. If no data value in a data set occurs more than once, the data set has no mode. If two or more data values tie for the highest count, each of those values is a mode.
To determine mode:
1. Look over the data set and count how many times each data value occurs.
2. Identify which data value occurs the most. This value is the mode.
3. Record this value in the appropriate column of the data table.
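The three procedures above can be checked with a short Python sketch. This is an illustration only and is not part of the original guidelines. The function names mean, median and modes are hypothetical. modes returns a list, which lets it represent both a data set with no mode (an empty list) and a data set with several tied modes.

```python
from collections import Counter


def mean(values):
    # Add all data values, then divide by the number of data in the set.
    return sum(values) / len(values)


def median(values):
    # List the data values from least to greatest.
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of data: the value in the exact middle is the median.
        return ordered[middle]
    # Even number of data: the mean of the two middle values is the median.
    return (ordered[middle - 1] + ordered[middle]) / 2


def modes(values):
    # Count how many times each data value occurs.
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # No value occurs more than once, so the data set has no mode.
        return []
    # Every value tied for the highest count is a mode.
    return [value for value, count in counts.items() if count == highest]


odd_set = [1, 4, 4, 7, 8, 8, 9, 10, 42]
even_set = [1, 7, 8, 8, 10, 10, 42, 52]
colors = ["white", "white", "red", "brown", "brown", "red", "white", "white"]

print(round(mean(odd_set), 2), median(odd_set))  # 10.33 8
print(median(even_set))                          # 9.0
print(modes(colors))                             # ['white']
```

For the odd-numbered example, the mean is about 10.33 but the median is 8. The single outlier (42) pulls the mean upward, which is why the median is preferred for skewed data.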
white, white, red, brown, brown, red, white, white          Mode = white

Summary: Mode is the most common data value in a data set.

MEASURES OF VARIATION

Standard Deviation
Standard deviation is a calculated value that describes the variation (or spread) of values in a data set. It is calculated using a formula that compares each piece of data to the mean. It is useful to think of this number as a "plus or minus" value. A larger standard deviation indicates that the data are spread further from the mean, so the mean is a less reliable summary of the data. In general, a large standard deviation indicates less confidence in the data set and its mean. A small standard deviation suggests greater confidence in the data and its mean.
To calculate standard deviation:
1. Calculate the mean.
2. Calculate the difference between the mean and each data value.
3. Square each difference.
4. Add the squared values together.
5. Divide the sum by the total number of data in the set.
6. Take the square root of the value calculated in the previous step. This is the standard deviation.
7. Record standard deviation as a value in the appropriate column of the data table.

Plant Biomass (g)   Difference between Biomass and Mean (g)   Squared Value of Difference
3                   1                                         1
5                   1                                         1
6                   2                                         4
4                   0                                         0
6                   2                                         4
2                   2                                         4
2                   2                                         4

Mean: 4 g
Sum of Squared Values: 18
Sum / Total # of Data: 18/7 = 2.57
Square Root of Quotient: √2.57
Standard Deviation: 1.6 g

Summary: Standard deviation describes how far the data typically lie from the mean.

To calculate mean and standard deviation using a graphing calculator* (Texas Instruments):
1. Press the Stat key, then choose Edit and press Enter.
2. Enter data values in list one (L1).
3. Press the Stat key again, choose Calc, then choose 1-Var Stats and press Enter.
4. Press 2nd, then L1 (the 1 key), and press Enter.
5. The x̄ value is the mean and the σx value is the standard deviation.
*Any scientific calculator can calculate mean and standard deviation.
Refer to the instruction manual for your scientific calculator for specific instructions.

To calculate standard deviation using Microsoft Excel:
1. Create a data table in an Excel spreadsheet. Include all relevant data values and column headings for data analysis (e.g., mean, std. dev.).
2. Highlight a cell where you want to add a calculation.
3. Choose "Insert" and then "Function…" from the menu.
   Note: There is no "Function" to calculate frequency distribution. You will either need to determine frequency distribution yourself and enter the calculated value into the data table, OR create a formula in Excel to calculate frequency distribution.
4. Select the cells with the appropriate data to be analyzed. Either select and highlight the cells, or type the cell references into the window that appears.
   Note: When calculating standard deviation, do not include the mean value when selecting the data to be analyzed.
   Note: Excel's STDEV function divides by one less than the number of data (n - 1). This gives a slightly larger value than the hand method above, which divides by the number of data (n). STDEVP divides by n. The standard deviations in the example table below match the n - 1 calculation.

Soil Type (IV)            Height in cm (DV)             Mean    Std. Dev. (+/-)
                          Plant 1   Plant 2   Plant 3
Potting Soil (Control)    20.5      22.2      25.1      22.6    2.3
Marsh                     10.5      12.2      9.5       10.7    1.4
Mt. Tam                   15.2      16.3      14.2      15.2    1.1

Frequency Distribution
When using qualitative data, frequency distribution is a good expression of variation. Frequency distribution is a decimal that represents how often a particular data value occurs, expressed as a fraction of all the data for each experimental group. It is calculated by dividing the number of times a particular data value occurs by the total number of data in the set.
To calculate frequency distribution:
1. Identify and count the number of times a particular data value occurs in a data set.
2. Divide this number by the total number of data in the set.
3. Report this value as a decimal.
4. Calculate and record frequency distribution for all data categories in the appropriate column of the data table.
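The standard deviation steps and the frequency distribution steps above can be sketched in Python. This is an illustration only and is not part of the original guidelines. All function names are hypothetical.

- population_sd follows the seven hand-calculation steps, dividing by the number of data, n.
- sample_sd divides by n - 1 instead. This is what Excel's STDEV function does, and it reproduces the 2.3 reported for Potting Soil in the soil-type table.
- The leaf-quality scores passed to frequency_distribution are a hypothetical example.

```python
import math
from collections import Counter


def _sum_of_squared_differences(values):
    # Steps 1-4: find the mean, subtract it from each value, square, and add.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def population_sd(values):
    # Steps 5-6: divide by the number of data, then take the square root.
    return math.sqrt(_sum_of_squared_differences(values) / len(values))


def sample_sd(values):
    # Same calculation, but divide by one less than the number of data.
    return math.sqrt(_sum_of_squared_differences(values) / (len(values) - 1))


def frequency_distribution(values, categories):
    # Count each category, then divide by the total number of data.
    counts = Counter(values)
    total = len(values)
    return {category: counts[category] / total for category in categories}


biomass = [3, 5, 6, 4, 6, 2, 2]       # Plant biomass (g) from the worked example
potting_soil = [20.5, 22.2, 25.1]     # Plant heights (cm) from the soil-type table
leaf_scores = [4, 3, 4, 4, 4]         # Hypothetical leaf-quality scores (1-4 scale)

print(round(population_sd(biomass), 1))  # 1.6
print(round(population_sd(potting_soil), 1), round(sample_sd(potting_soil), 1))  # 1.9 2.3
print(frequency_distribution(leaf_scores, categories=[4, 3, 2, 1]))  # {4: 0.8, 3: 0.2, 2: 0.0, 1: 0.0}
```

For the three potting-soil plants, dividing by n gives about 1.9 cm, while dividing by n - 1 gives about 2.3 cm. With small groups this choice visibly changes the result, so it is sensible to use one method consistently within a data table.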
To present frequency distribution in a data table:

The Effect of the Amount of Compost on the Leaf Quality of Bean Plants

            Leaf Quality - 30 Days
Amount of   Trials                   Median   Frequency Distribution
Compost     1    2    3    4    5             4      3      2      1
0.0 g       4    4    4    4    4    4        1.0    0      0      0
10.0 g      4    3    4    4    4    4        0.8    0.2    0      0
20.0 g      3    3    2    4    2    3        0.2    0.4    0.4    0
30.0 g      2    2    3    2    4    2        0.2    0.2    0.6    0
40.0 g      2    1    1    1    1    1        0      0      0.2    0.8
50.0 g      2    3    3    3    2    3        0      0.6    0.4    0

Overall Plant Health
4 = green color, firm, no curled edges
3 = yellow-green color, firm, no curled edges
2 = yellow color, limp, with curled edges
1 = brown color, limp, with curled leaves

Graphing Guidelines

Types of Graphs
The type of graph used to display data depends largely on the independent variable.
Category of Data for IV     Graph Type
Continuous                  Line graph
Discontinuous               Bar graph

General rules for all graphs:
• Always use graph paper when graphing by hand.
• Use a ruler for all lines.
• Graph in pencil first, then trace over it in ink.
• Title every graph by stating the relationship between the independent and dependent variables.
• Label each axis, with the appropriate unit in parentheses.
• The x-axis represents the independent variable, while the y-axis represents the dependent variable.
• If multiple data sets are included on a graph, it must include a key.

The Line/Scatter-Plot Graph
Constructing a line graph or scatter-plot:
• Draw and label the x and y axes of the graph. Indicate the appropriate units for each variable.
• Determine an appropriate scale for each axis. First find the range of the values to be graphed (maximum - minimum). Then divide this range by the number of grid squares available along that axis, and round up to a convenient increment. To display the data most clearly, increments should increase by even values (e.g., 2, 4, 6, 8, not 1, 3, 5, 7).
• Carefully plot the data.
• Line Graph: Connect the points with straight lines, starting with the first data point and ending with the last data point. Do not start at the origin unless you have a data point at (0, 0).
• Scatter-Plot: If you are interested in a linear relationship between the IV and DV, draw a line of best fit.

The Bar Graph
Constructing a bar graph:
• Draw and label the x and y axes of the graph. Indicate the appropriate units for each variable.
• Evenly space the levels (treatments) of the independent variable along the x-axis, leaving a space between each value.
• Determine the appropriate scale for the y-axis, using the method described for line/scatter-plot graphs above. Subdivide the y-axis accordingly.
• For each level of the independent variable on the x-axis, draw a vertical bar up to the corresponding value of the dependent variable on the y-axis. Leave space between each bar.

Graphing Variation

Plotting Standard Deviation (SD)
Add the SD to each mean or median, and draw a thin horizontal line at that value above the point/bar on the graph. Subtract the SD from each mean or median, and draw a thin horizontal line at that value below the point/bar. Draw a thin vertical line to connect the two thin horizontal lines. See the diagram below.

[Figure: The Effect of Compost on Plant Height. Bar graph of mean plant height. x-axis: Type of Compost (No Compost, Grass Compost, Food Compost). y-axis: Plant Height (cm), 0 to 25.]

Plotting Frequency Distribution (FD)
A separate bar graph or pie chart can be used to show FD.

[Figure: The Effect of Compost on Plant Health. Bar graph. x-axis: Type of Compost (No Compost, Grass Compost, Food Compost). y-axis: Frequency Distribution, 0% to 100%. The key includes an "Unhealthy" category.]

Note on Graphing: Use common sense. Think about how to show your data to someone who has never seen it before. Make it as clear as possible.

Computer Graphing
Use the instructions below to enter, analyze, and graph data in Excel.
Graphing in Excel:
1. Create a data table in an Excel spreadsheet complete with all relevant data values.
2. Highlight the data to appear on the y-axis (the DV). If appropriate, you can also highlight the data for the x-axis (the IV).
   Note: In many cases, these will be the values in the central tendency column.
3. Select the Chart Wizard (from the toolbar) or "Insert" > "Chart" from the menu.
   Select the appropriate type of graph (the recommended type for bar graphs is Clustered Column) and click 'Next'.
   Click on the "Series" tab and change 'series titles' and 'category x-axis labels' if applicable.
   Note: 'Category x-axis labels' represent the levels of the independent variable. Click on the grid icon next to the window labeled 'category x-axis labels', then highlight the cells containing the levels of the IV on your data table. Click 'Next'.
   Title the graph in the window labeled 'Chart Title'.
   Label the x-axis by typing the independent variable into the window labeled 'Category x-axis'. Don't forget units!
   Label the y-axis by typing the dependent variable into the window labeled 'Value y-axis'. Don't forget units!
4. Click 'Next', select 'As new sheet', and click 'Finish'.

Note: Changing the scale of the y-axis can often show the differences between data values more clearly. To do this, double-click on the y-axis of your graph. Under "Scale", change the minimum value when appropriate.

Graphing Y Error Bars (standard deviation) in Excel:
1. If the graph is a bar graph, double-click on one of the bars. If the graph is a line graph, double-click on one of the data points in the graph.
   Note: This may not work if you created a three-dimensional bar graph. In that case, change the chart type to Clustered Column.
2. Choose the tab titled "Y Error Bars" from the dialog box.
3. Click on the grid icon next to the window labeled Custom (+). Highlight the cells containing the standard deviation values on your data table.
4. Click on the grid icon next to the window labeled Custom (-). Highlight the cells containing the standard deviation values on your data table.
5. Click OK.

[Figure: The effect of soil type on plant height. Bar graph. x-axis: Soil Type (Potting Soil (Control), Marsh, Mt. Tam). y-axis: Height (cm), 0 to 30.]
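You can check the error bars Excel draws by computing their endpoints (the mean plus and minus one SD) with Python's standard statistics module. The sketch below uses the plant heights from the soil-type table. It is an illustration only and is not part of the Excel instructions. The helper name error_bar is hypothetical. statistics.stdev divides by n - 1, the same as Excel's STDEV function.

```python
import statistics

# Plant heights (cm) for Plants 1-3 from the soil-type data table.
heights = {
    "Potting Soil (Control)": [20.5, 22.2, 25.1],
    "Marsh": [10.5, 12.2, 9.5],
    "Mt. Tam": [15.2, 16.3, 14.2],
}


def error_bar(values):
    # The bar (or point) is drawn at the mean.
    center = statistics.mean(values)
    # statistics.stdev divides by n - 1, like Excel's STDEV.
    sd = statistics.stdev(values)
    # Bottom of the error bar: mean - SD; top of the error bar: mean + SD.
    return center, center - sd, center + sd


for soil, values in heights.items():
    center, low, high = error_bar(values)
    print(f"{soil}: bar at {center:.1f} cm, error bar from {low:.1f} to {high:.1f} cm")
```

For Potting Soil, this puts the bar at 22.6 cm, with the error bar running from about 20.3 to 24.9 cm. These are the same endpoints that Custom (+) and Custom (-) values of 2.3 produce in Excel.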