GK-12 Seminar: using Excel for graphing and basic statistics

We will cover…

1) cell functions (e.g. mean, median, variance, standard deviation)

2) graphing functions (histograms, scatter plots)

3) statistical analysis (regression, paired t-tests)

Cell Functions

Excel 2007 – a simple drop-down menu gives quick access to cell functions

Excel 2003 – insert a function in one of three ways…

1) select “function” from the Insert menu

2) click on the fx button to the left of the formula bar

3) manually type an “=” or a “+” sign and the function in a cell; once the function is recognized, the syntax summary is provided for you…

Let’s practice. Open the GK-12 template and click on the “Cell Functions” tab at the bottom.

1) Convert “# of worms per sample” to “# of worms per m²” by multiplying the values in column C by 9.

a. Do it in one cell, then copy the formula to the cells below using either

i. the “copy” & “paste” commands, or

ii. the “autofill” drag down

2) Now, for the two sample locations (lawn & forest), calculate…

a. The sample size (n) using the “count” function

b. The mean using the “average” function

c. The median using the “median” function

d. The standard deviation (use the insert function options to find the correct function)

e. The variance (use insert function to find it)

f. The 95% confidence interval using the “confidence” function. This one is tricky because it uses the desired alpha level (0.05 for a 95% interval), the standard deviation and the sample size to calculate the confidence interval. You can enter those parameters manually or by selecting the appropriate cells.

g. The upper and lower 95% confidence values (add the 95% interval to, and subtract it from, the mean)
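If you want to check your spreadsheet answers outside Excel, the same calculations can be sketched in Python. This is only an illustration under stated assumptions: the worm counts are made up (they are not the template data), the function name `describe` is hypothetical, and the confidence interval is computed the way the “confidence” function is understood to work (a normal-distribution half-width from alpha, standard deviation and sample size).

```python
import math
import statistics

# Hypothetical "# of worms per sample" values (NOT the seminar's data).
lawn_per_sample = [3, 5, 4, 6, 2, 5, 4, 3]

# Step 1: convert to "# of worms per m²" by multiplying by 9, as in the exercise.
lawn_per_m2 = [count * 9 for count in lawn_per_sample]


def describe(values, alpha=0.05):
    """Return the summary statistics from exercise 2 for one location."""
    n = len(values)                          # like COUNT
    mean = statistics.mean(values)           # like AVERAGE
    median = statistics.median(values)       # like MEDIAN
    sd = statistics.stdev(values)            # sample standard deviation
    variance = statistics.variance(values)   # sample variance
    # Half-width of the interval: z * sd / sqrt(n), with z taken from the
    # standard normal distribution (assumed to match the "confidence" function).
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    interval = z * sd / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "median": median,
        "sd": sd,
        "variance": variance,
        "interval": interval,
        "lower": mean - interval,   # lower 95% confidence value
        "upper": mean + interval,   # upper 95% confidence value
    }


for name, value in describe(lawn_per_m2).items():
    print(f"{name}: {value:.2f}")
```

Note that the interval is a single amount; the upper and lower confidence values come from adding it to and subtracting it from the mean, exactly as in step g.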

Questions? On to graphing…

Graphing

1) Bar graph with error bars

Note how, in the examples below, changes in the scale and addition of error bars may change your impressions of the data.

[Figure: two versions of the bar graph “Earthworm abundance in two habitat types on the LSC campus”, both with the X axis “habitat type” (high, low); the Y axis runs from 0 to 120 in the first and from 0 to 200 in the second.]

First, we will create a simple bar graph, then add 95% confidence error bars, using the worm data you have been working with. To create a graph...

In 2003, select “chart” from the Insert menu or click the Chart Wizard icon. From the Chart Wizard, select the type of graph (chart) you want to create, in this case “clustered column”.

Using the “data range” tab, you can let the Wizard create the graph:

i) tell it whether the data is in columns or rows

ii) select the cells with the labels (lawn, forest) and data (means)

iii) click next

Add the chart title and labels for the X and Y axis, click next

Indicate the desired chart location (as an object in “graphing”), click finish

Now it’s clean up time…

i) delete the “series” box (if you have multiple data series, this would be your legend)

ii) change the chart title to “Earthworm abundance in two habitat types on the Lake Superior College Campus”

iii) change the Y axis scale: hover over the axis until you see “value axis”, then right click, select “format axis”, select the “scale” tab, change the scale to 0 to 100 and the major unit to 20, and uncheck the minor unit

Click OK and you have a basic bar graph! This is the simplest graph & a good place to start.

Advanced graphing…adding error bars

To add appropriate error bars to your data (note the standard deviations are very different for the two habitat types), you have to graph each data set as a “series”.

Start the Chart Wizard again, but this time select the “series” tab

i) click the “add” button

ii) select the cells for (or type manually) the name and data values for “series 1” (e.g. name=Lawn; data=mean)

iii) click the “add” button

iv) select the cells for (or type manually) the name and data values for “series 2”

You could go on if you have more data, but we only have two series, so click NEXT

Add the chart title and labels for the X and Y axis, click next

Indicate the desired chart location (as an object in “graphing”), click finish

Notice that the two bars are different colors; this indicates that they are different “series”.

Clean up time…

i) delete the “1” on the X axis (hover over the axis, right click on category axis, select format axis, on the patterns tab, select tick mark labels “none”)

ii) change the chart title to “Earthworm abundance in two habitat types on the Lake Superior College Campus”. Click OK

Now let’s add error bars…

i) select one data column (lawn), right click and select “Format Data Series”

ii) select the “Y error bar” tab

iii) select the display option “both”; select the “custom” error amount option and, for both the “+” and “-” boxes, select the cell that contains the 95% confidence interval you calculated for that data series (Excel expects the amount to add to and subtract from the mean, not the upper and lower confidence values themselves)

iv) repeat for the second data series

Notice that the Y-axis adjusts to accommodate the error bars.

[Figure: bar graph with error bars; X axis: habitat types (lawn, forest); Y axis: worms, 0 to 160.]

So, do these populations look like they are different? Why or why not?

Let’s test it statistically…

Statistical Analysis – the “Analysis ToolPak”

Excel 2007

1) click the Office Button (the Excel 'big button'), then Excel Options.

2) then Add-ins and then click GO (making sure Excel Add-ins is selected in dropdown)

3) then check the box by the desired add-in(s) (here, the Analysis ToolPak) and click OK...

Excel 2003

1) in the Tools menu, select Add-ins…

2) Select the Analysis ToolPak add-in(s)…

3) Now, the Data Analysis tool should be listed under the Tools menu…

4) From here on, the Analysis ToolPak looks the same in the 2003 & 2007 versions…

On to using the Analysis ToolPak…

Comparing means – the Paired T-test

Under the Tools menu (in Excel 2007, on the Data tab), select “data analysis”, and you will see the analytical tools available.

Scroll down and select “t-test: Paired two sample for means”

Select the cells with variable 1, variable 2, your desired alpha level, and the output range (select the cell that will be the upper left corner of the output table, for example, F6). Click OK.

Re-label “variable1” and “variable2” in the output table (lawn & forest).

Note: if the data is arranged in columns with the label as the first row, be sure to include the label cell at the top of the data when you select the “variable”, then simply click on the “labels” box on the t-test dialogue box and it will automatically provide the variable labels in the output table...

Try it.
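To see what the paired t-test is doing behind the dialogue box, here is a minimal Python sketch. It rests on assumptions: the two data lists are invented for illustration (not the worm data), and the function name `paired_t` is hypothetical. It only computes the t statistic and degrees of freedom; the p-values and critical values in Excel's output table come from the t distribution and are not reproduced here.

```python
import math
import statistics


def paired_t(variable_1, variable_2):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(variable_1) != len(variable_2):
        raise ValueError("paired samples must have the same length")
    # A paired test works on the difference within each pair.
    differences = [a - b for a, b in zip(variable_1, variable_2)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    sd_diff = statistics.stdev(differences)   # sample standard deviation
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error, n - 1


# Hypothetical paired measurements, for illustration only.
variable_1 = [5, 7, 9, 6]
variable_2 = [3, 4, 8, 5]
t_stat, df = paired_t(variable_1, variable_2)
print(f"t Stat = {t_stat:.4f}, df = {df}")
```

The larger the absolute t statistic for a given df, the less likely it is that the two means differ only by chance.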

Do you know about “paste special”? How to “wrap text” in a cell?

Data Distributions: Histograms

To evaluate the relative distribution of data, we use bar graphs that show the frequency of each data value. These graphs are called histograms…

[Figure: two example histograms, “Biased Distribution” and “Random Distribution”; X axis: number (1 to 10); Y axes run from 0 to 7 and from 0 to 10.]

To create a histogram in Excel, you will use the Analysis ToolPak, but first you will need to create a column that indicates the categories you want on your X axis (they call it a “bin range”).

For a frequency distribution, this would be the range of the data you are assessing.

In our example, BIN 1 represents the range of the data when converted to “worms per m²” and BIN 2 represents the range of data for “worms per sample”.

To create the histogram…Select “data analysis” in the Tools menu, then select “histogram”

i) Select the input range (the data you want to see in the histogram). Let’s start with “# worms per sample” and select all the data in that column (lawn & forest). NOTE: if you want to examine the distributions of the data for each location, you need to do them separately.

ii) Select the “bin range”, including the label

iii) Check the “labels” box

iv) Select the output range (upper left corner of where you want the histogram table and graph placed, like H2)

v) Select “chart output” if you want to have a graphed histogram created for you…

Now try it with the “# of worms per m²” data.
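The counting behind the histogram table can be sketched in a few lines of Python. Treat this as an illustration under assumptions: the data and bin values are invented, the function name `histogram_counts` is hypothetical, and the binning rule (a value is counted in the first bin whose number is greater than or equal to it, with everything above the last bin counted as “More”) is how the histogram tool is understood to work.

```python
from bisect import bisect_left


def histogram_counts(values, bins):
    """Count values per bin; the final count is the "More" category."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)   # one extra slot for "More"
    for value in values:
        # Index of the first bin number that is >= value;
        # equals len(bins) when the value exceeds every bin.
        counts[bisect_left(bins, value)] += 1
    return counts


# Hypothetical data and bin range, for illustration only.
worms = [1, 2, 2, 5, 9, 12]
bin_range = [2, 5, 10]
labels = [str(b) for b in bin_range] + ["More"]
for label, count in zip(labels, histogram_counts(worms, bin_range)):
    print(f"{label}: {count}")
```

Running the sketch on each location separately mirrors the NOTE above about examining the lawn and forest distributions one at a time.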

Scatter plots

There are many different kinds of scatter plots. The example here is a “performance curve”, which can be used to assess the reliability of your data.

From these curves, what might you conclude about the data collected in the “lawn” vs. “forest” locations?

“Species area curves” are a special kind of performance curve that plots the number of plots (or cumulative area) sampled vs. the cumulative number of unique species detected.

[Figure: example curve; X axis: cumulative area sampled (m²), 1 to 10; Y axis runs from 0 to 25.]

Both of these types of curves can be used to assess whether or not you have collected enough samples for reliable data, given the organisms of interest.
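The values plotted on a species area curve are just a running count of unique species. A minimal Python sketch, assuming invented plot data and a hypothetical function name `species_accumulation`:

```python
def species_accumulation(plots):
    """Return the cumulative number of unique species after each plot."""
    seen = set()
    curve = []
    for species_in_plot in plots:
        seen.update(species_in_plot)   # add any species not seen before
        curve.append(len(seen))
    return curve


# Hypothetical species lists for four 1 m² plots, for illustration only.
plots = [["a", "b"], ["b"], ["c", "a"], []]
for area, total in enumerate(species_accumulation(plots), start=1):
    print(f"cumulative area {area} m²: {total} species")
```

When the running count stops rising as more plots are added, the curve has flattened, which is the visual cue that additional samples are turning up few new species.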

Regression

When you want to test the relationship of one variable to another (for example, earthworm length vs. biomass), regression is usually the tool of choice. Excel provides a robust version with quite a few options, depending on your needs and interests…

To start, select “data analysis” from the Tools menu, then select “regression”

i) select the “input Y range”, your dependent variable (earthworm biomass), with the label

ii) select the “input X range”, your independent variable (worm length), with the label

iii) check the “labels” box

iv) what is “constant is zero”? Click on help to find out, then select it or not

v) select a confidence level

vi) select an output range (upper left corner of where you want all your output to go)

vii) select all the residual options and the normal probability plot

viii) click OK

This is pretty much the maximum output you can get. Go ahead and play with the options until you get what you want, and no more, so as not to confuse students any more than they already are!
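The core of the regression output (slope, intercept and R²) can be reproduced with a short least-squares sketch in Python. The assumptions here: the length and biomass numbers are invented, the function name `linear_fit` is hypothetical, and the sketch fits a straight line with an intercept (that is, “constant is zero” left unchecked). It does not produce the residual tables or the normal probability plot.

```python
import statistics


def linear_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: share of the variation in y explained by the line.
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2
                      for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared


# Hypothetical worm length (X) and biomass (Y) values, for illustration only.
length = [4.0, 5.5, 6.0, 7.5, 9.0]
biomass = [0.8, 1.3, 1.4, 2.0, 2.6]
slope, intercept, r_squared = linear_fit(length, biomass)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R² = {r_squared:.3f}")
```

Here the dependent variable (biomass) is `y` and the independent variable (length) is `x`, matching the “input Y range” and “input X range” in the dialogue box.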
