Data Analysis Guidelines

MEASURES OF CENTRAL TENDENCY
Mean
Generally, the mean is the best measure of central tendency for quantitative data and should be used unless there are
outliers that would distort the mean value.
To calculate mean:
1. Add all data values in the set
2. Divide the sum of the data values by the number of data in the set. This value is the mean.
3. Record this value in the appropriate column of the data table.
Mean = (Sum of the data) / (Number of data in the set)
Summary: Mean is the average of the data set.
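The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `mean` is chosen here, and the sample values are the plant biomass measurements used in this guide's standard deviation example.

```python
def mean(values):
    """Return the sum of the data divided by the number of data in the set."""
    if not values:
        raise ValueError("mean requires at least one data value")
    return sum(values) / len(values)


# Plant biomass values (g) from this guide's standard deviation example
biomass = [3, 5, 6, 4, 6, 2, 2]
print(mean(biomass))  # 4.0
```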
Median
Median is an appropriate measure of central tendency for quantitative data that is “skewed” and for qualitative data
with ranked categories. The median is the data value that falls in the exact middle of a data set that is ordered from
smallest to largest.
To determine the median:
1. List all data values in numerical order from least to greatest.
2. If the number of data in the set is odd, find the value that falls in the exact middle of the data set. This value is
the median.
3. If the number of data in the set is even, calculate the mean of the two middle data values. This value is the
median.
4. Record this value in the appropriate column of the data table.
Example (odd number of data): 1, 4, 4, 7, 8, 8, 9, 10, 42
Median = 8
Example (even number of data): 1, 7, 8, 8, 10, 10, 42, 52
Median = (8 + 10) / 2 = 9
Summary: Median is the middle number in an ordered data set.
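As a rough sketch, the same procedure can be written in Python. The function name `median` is an assumption made for this example; the two data sets are the ones shown above.

```python
def median(values):
    """Return the middle value of the ordered data (mean of the two middle values if the count is even)."""
    if not values:
        raise ValueError("median requires at least one data value")
    ordered = sorted(values)          # step 1: order from least to greatest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # step 2: odd count, take the exact middle value
        return ordered[middle]
    # step 3: even count, take the mean of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 4, 4, 7, 8, 8, 9, 10, 42]))  # 8
print(median([1, 7, 8, 8, 10, 10, 42, 52]))   # 9.0
```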
Mode
Mode is an appropriate measure of central tendency for qualitative non-ranked data. The mode of a data set is the data
value that occurs most frequently. If no data value in a data set occurs more than once, then we say that the data set has
no mode. If two or more data values tie for the most occurrences, each of those values is a mode.
To determine mode:
1. Look over the data set and count how many times each data value occurs.
2. Identify which data value occurs the most. This value is the mode.
3. Record this value in the appropriate column of the data table.
white, white, red, brown, brown, red, white, white
Mode = white
Summary: Mode is the most common data value in a data set.
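One possible way to automate the counting is sketched below in Python. The function name `modes` is hypothetical. It returns a list so that ties can be reported, and an empty list when no value occurs more than once, following the rules stated above.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, or [] if no value repeats."""
    counts = Counter(values)              # step 1: count each data value
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:                      # no value occurs more than once: no mode
        return []
    # step 2: keep every value tied for the highest count
    return [value for value, count in counts.items() if count == highest]


colors = ["white", "white", "red", "brown", "brown", "red", "white", "white"]
print(modes(colors))  # ['white']
```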
MEASURES OF VARIATION
Standard Deviation
Standard deviation is a calculated value that describes the variation (or spread) of values in a data set. It is calculated
using a formula that compares each piece of data to the mean. It is useful to think of this number as a “plus or minus”
value, where a larger standard deviation indicates that the data are spread further from the mean and thus the mean is
less reliable. In general, a large value of standard deviation indicates less confidence in the data set and its mean, while
a small value of standard deviation suggests greater confidence in the data and its mean.
To calculate standard deviation:
1. Calculate the mean.
2. Calculate the difference between the mean and each data value.
3. Square each difference.
4. Add the squared values together.
5. Divide the sum by the total number of data in the set.
6. The square root of the value calculated in the previous step is the standard deviation.
7. Record standard deviation as a ± value in the appropriate column of the data table.
Plant Biomass (g)    Difference between Biomass and Mean (g)    Squared Value of Difference
3                    1                                          1
5                    1                                          1
6                    2                                          4
4                    0                                          0
6                    2                                          4
2                    2                                          4
2                    2                                          4
Mean: 4
Sum of squared values: 18
Sum/Total # of Data: 18/7 ≈ 2.57
Square Root of Quotient: √2.57 ≈ 1.6
Standard Deviation: 1.6
Summary: Standard deviation describes how far the majority of the data is from the mean.
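The seven steps can be sketched in Python as follows. This is a minimal illustration using the plant biomass values from the table; the function name `standard_deviation` is chosen for this example. Note that it divides by the total number of data, exactly as step 5 describes.

```python
import math


def standard_deviation(values):
    """Standard deviation following the steps above (divides by the number of data)."""
    n = len(values)
    if n == 0:
        raise ValueError("standard deviation requires at least one data value")
    mean = sum(values) / n                                    # step 1
    squared_diffs = [(value - mean) ** 2 for value in values]  # steps 2 and 3
    total = sum(squared_diffs)                                # step 4
    return math.sqrt(total / n)                               # steps 5 and 6


biomass = [3, 5, 6, 4, 6, 2, 2]  # plant biomass (g)
print(round(standard_deviation(biomass), 1))  # 1.6
```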
To calculate mean and standard deviation using a graphing calculator* (Texas Instruments):
1. Press the Stat key, then choose Edit and press Enter.
2. Enter data values in list one (L1).
3. Press the Stat key again, choose Calc, then choose 1-Var Stats and press Enter.
4. Choose 2nd, then L1 (the number 1 key) and press Enter.
5. The x̄ value is the mean and the σx value is the standard deviation.
*Any scientific calculator can calculate mean and standard deviation. Refer to the instruction manual for your
scientific calculator for specific instructions.
To calculate standard deviation using Microsoft Excel:
1. Create a data table in an Excel spreadsheet complete with all relevant data values and column headings for
data analysis (e.g. mean, std. dev.).
2. Highlight a cell where you want to add a calculation.
3. Choose “Insert” and then “Function…” from the menu.
• Note: There is no “Function” to calculate frequency distribution. You will either need to determine frequency
distribution and input the calculated value into the data table OR create a formula in Excel to calculate
frequency distribution.
4. Select the cells with the appropriate data to be analyzed.
• Either select and highlight the cells, or type the cell numbers into the window that appears.
• Note: When calculating standard deviation, do not include the mean value when selecting the data to be
analyzed.
Soil type (IV)            Height in cm (DV)                Mean    Std. Dev. (+/-)
                          Plant 1    Plant 2    Plant 3
Potting Soil (Control)    20.5       22.2       25.1       22.6    2.3
Marsh                     10.5       12.2       9.5        10.7    1.4
Mt. Tam                   15.2       16.3       14.2       15.3    1.1
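For readers who want to check spreadsheet results, here is a hedged Python sketch that recomputes the mean and standard deviation for each soil type in the table above. It uses `statistics.stdev`, which divides by one less than the number of data (the "sample" standard deviation), as Excel's STDEV function does. That differs from the hand method earlier in this guide, which divides by the total number of data. The dictionary and function names are assumptions made for this example.

```python
import statistics

# Plant heights (cm) for each soil type, taken from the table above
heights_cm = {
    "Potting Soil (Control)": [20.5, 22.2, 25.1],
    "Marsh": [10.5, 12.2, 9.5],
    "Mt. Tam": [15.2, 16.3, 14.2],
}


def summarize(heights):
    """Return (mean, sample standard deviation), each rounded to one decimal place."""
    return round(statistics.mean(heights), 1), round(statistics.stdev(heights), 1)


for soil, heights in heights_cm.items():
    mean, sd = summarize(heights)
    print(f"{soil}: mean = {mean}, std. dev. = +/- {sd}")
```

Run on these values, the standard deviations come out as 2.3, 1.4, and 1.1, in agreement with the table. The Mt. Tam mean computes to 15.2 from the three listed heights.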
Frequency Distribution
When using qualitative data, frequency distribution is a good expression of variation. Frequency distribution is a
decimal that represents the number of times a particular data value occurs for each experimental group. It is calculated
by dividing the number of times a particular data value occurs by the total number of data in the set.
To calculate frequency distribution:
1. Identify and count the number of times a particular data value occurs in a data set.
2. Divide this number by the total number of data in the set.
3. Report this value as a decimal.
4. Calculate and record frequency distribution for all data categories in the appropriate column of the data table.
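The calculation can be sketched in Python as shown below. The function name `frequency_distribution` is hypothetical, and the sample data are five leaf-quality scores on the 1 to 4 scale used in this guide's compost example.

```python
from collections import Counter


def frequency_distribution(values, categories):
    """Return {category: occurrences / total number of data} for each category."""
    if not values:
        raise ValueError("frequency distribution requires at least one data value")
    counts = Counter(values)                 # step 1: count each data value
    total = len(values)
    # steps 2 to 4: divide each count by the total and report a decimal per category
    return {category: counts[category] / total for category in categories}


scores = [4, 3, 4, 4, 4]  # leaf-quality scores for one experimental group
print(frequency_distribution(scores, [4, 3, 2, 1]))
# {4: 0.8, 3: 0.2, 2: 0.0, 1: 0.0}
```

Passing the full list of categories ensures that values which never occur are still reported as 0, which keeps every row of the data table complete.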
To present frequency distribution in a data table:
The Effect of the Amount of Compost on the Leaf Quality of Bean Plants
             Leaf Quality - 30 Days
Amount of    Trials                      Median    Frequency Distribution
Compost      1    2    3    4    5                 4      3      2      1
0.0 g        4    4    4    4    4       4         1.0    0      0      0
10.0 g       4    3    4    4    4       4         0.8    0.2    0      0
20.0 g       3    3    2    4    2       3         0.2    0.4    0.4    0
30.0 g       2    2    3    2    4       2         0.2    0.2    0.6    0
40.0 g       2    1    1    1    1       1         0      0      0.2    0.8
50.0 g       2    3    3    3    2       3         0      0.6    0.4    0
Overall Plant Health
4 = green color, firm, no curled edges
3 = yellow-green color, firm, no curled edges
2 = yellow color, limp, with curled edges
1 = brown color, limp, with curled leaf
Graphing Guidelines
Types of Graphs
The type of graph used to display data depends largely on the independent variable.
Category of Data for IV    Graph Type
continuous                 line graph
discontinuous              bar graph
General rules for all graphs.
• Always use graph paper when graphing by hand. Use a ruler for all lines. Graph in pencil first, then trace over it in
ink.
• All graphs should be titled by stating the relationship between the independent and dependent variables.
• Label each axis, with the appropriate unit in parentheses. The x-axis represents the independent variable, while the
y-axis represents the dependent variable.
• If multiple data sets are included on a graph, it must include a key.
The Line/Scatter-Plot Graph
Constructing a line graph or scatter-plot.
• Draw and label the x and y axes of the graph. Indicate the appropriate units for each variable.
• Determine an appropriate scale for each axis. Do this by finding the range of the values to be graphed
(maximum - minimum), then dividing this range by the number of grids on the graph paper. To display the data most
clearly, increments should increase using even values (e.g. 2, 4, 6, 8, not 1, 3, 5, 7).
• Carefully plot the data.
• Line Graph: Connect the points (straight line) starting with the first data point and ending with the last data
point. Do not start at the origin unless you have that data (0,0).
• Scatter-Plot: If interested in a linear relationship between the IV and DV, then draw a line of best-fit.
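The scale calculation described above can be sketched in Python. This is one possible reading of the rule, not a definitive method: the function name `axis_increment` is hypothetical, and rounding up to the next even whole number is an assumption that suits data spanning several whole units. Data with a very small range would need decimal increments instead.

```python
import math


def axis_increment(values, grid_count):
    """Suggest an even whole-number increment per grid line (an assumed rounding rule)."""
    if grid_count <= 0:
        raise ValueError("grid_count must be positive")
    data_range = max(values) - min(values)      # range = maximum - minimum
    raw_increment = data_range / grid_count     # range divided by the number of grids
    # Round up to the next even whole number, with 2 as the smallest increment
    return max(2, 2 * math.ceil(raw_increment / 2))


mean_heights_cm = [22.6, 10.7, 15.3]
print(axis_increment(mean_heights_cm, 10))  # 2
```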
The Bar Graph
Constructing a bar graph.
• Draw and label the x and y axes of the graph. Indicate the appropriate units for each variable.
• Evenly space the x-axis with the levels (treatments) of the independent variable. Evenly distribute the values
along the axis, leaving a space between each value.
• Determine the appropriate scale for the y-axis, using the method described for line/scatter-plot graphs above.
Subdivide the y-axis accordingly.
• Draw a vertical bar from the value of the independent variable on the x-axis to the corresponding value of the
dependent variable on the y-axis. Leave space between each bar.
Graphing Variation
Plotting Standard Deviation (SD)
• Add the SD to each mean or median and draw a thin horizontal line above the point/bar on the graph.
• Subtract the SD from each mean or median and draw a thin horizontal line below the point/bar on the graph.
• Draw a thin vertical line to connect the two thin horizontal lines. See the diagram below.
[Figure: "The Effect of Compost on Plant Height". y-axis: Plant Height (cm), 0 to 25. x-axis: Type of Compost (No Compost, Grass Compost, Food Compost). Legend: Mean Plant…]
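To find where the two horizontal lines of each error bar belong, the add-and-subtract steps above can be sketched in Python. The function name `error_bar_limits` is an assumption for this example, and the means and standard deviations are the values from the soil-type table earlier in this guide.

```python
def error_bar_limits(center, sd):
    """Return (lower, upper) positions for an error bar: center - SD and center + SD."""
    return center - sd, center + sd


# (mean height in cm, standard deviation) for each soil type
groups = {
    "Potting Soil (Control)": (22.6, 2.3),
    "Marsh": (10.7, 1.4),
    "Mt. Tam": (15.3, 1.1),
}

for soil, (mean, sd) in groups.items():
    lower, upper = error_bar_limits(mean, sd)
    print(f"{soil}: draw lines at {lower:.1f} cm and {upper:.1f} cm")
```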
Plotting Frequency Distribution (FD)
• A separate bar graph or pie chart can be used to show FD.
[Figure: "The Effect of Compost on Plant Health". y-axis: Frequency Distribution, 0% to 100%. x-axis: Type of Compost (No Compost, Grass Compost, Food Compost). Legend: Unhealthy]
Note on Graphing: Use common sense. Think about how to show your data to someone who has never seen it before.
Make it as clear as possible.
Computer Graphing
Use the instructions below to enter, analyze, and graph data in Excel.
Graphing in Excel:
1. Create a data table in an Excel spreadsheet complete with all relevant data values.
2. Highlight data to appear on the y-axis (the DV). If appropriate, you can also highlight the data for the x-axis
(the IV).
• Note: In many cases, this will be the values in the central tendency column.
3. Select the chart wizard (from the toolbar) or “Insert” and then “Chart” from the menu.
• Select the appropriate type of graph (recommended type for bar graphs is Clustered Column) and click ‘Next’.
• Click on the “Series” tab and change ‘series titles’ and ‘category x-axis labels’ if applicable.
• Note: ‘category x-axis labels’ represent the levels of the independent variable.
• Click on the grid icon next to the window labeled ‘category x-axis labels’ and then simply highlight the cells
containing the levels of the IV on your data table. Click ‘Next’.
• Title the graph in the window labeled ‘Chart Title’.
• Label the x-axis by typing the independent variable into the window labeled ‘Category x-axis’. Don’t forget
units!
• Label the y-axis by typing the dependent variable into the window labeled ‘Value y-axis’. Don’t forget
units!
4. Click ‘Next’, and Finish as New Sheet.
• Note: Graphs can often better express the difference between data values by changing the scale of the y-axis.
To do this, double click on the y-axis of your graph. Under “scale”, change the minimum value when
appropriate.
Graphing Y Error Bars (standard deviation) in Excel:
1. If the graph is a bar graph, double click on one of the bars. If the graph is a line graph, double click on one of the
data points in the graph.
• Note: This may not work if you created a 3-dimensional bar graph. Change the chart type to Clustered
Column.
2. Choose the tab titled “y error bars” from the dialogue box.
3. Click on the grid icon next to the window labeled Custom (+). Highlight the cells containing values for standard
deviation on your data table.
4. Click on the grid icon next to the window labeled Custom (-). Highlight the cells containing values for standard
deviation on your data table.
5. Click O.K.
[Figure: "The effect of soil type on plant height". y-axis: Height (cm), 0 to 30. x-axis: Soil Type (Potting Soil (Control), Marsh, Mt. Tam)]