4ExcelTutorial

advertisement
Content Materials – Excel Tutorial
Using Statistical Analysis to Verify Conclusions
Overview
This tutorial will provide students with instructions on how to organize their data into a
contingency table. Students will also be able to calculate expected frequencies using HIV
mortality data. This tutorial will also guide students through the calculation of the chisquare statistic.
Tutorial
Open Excel
Create a title for your contingency table. Enter the title in the first row.
Use the printout for the data table from California to create your own contingency table.
Enter the variables you will be considering in your table.
Once the data is entered, click on B11 and then the summation symbol to calculate row
totals. Press the enter key when you are ready to have the total calculated. Do this for
each of the columns and rows. Totals should be in the following cells: C11 (total for 3544 age group), D11 (total for 45-54 age group), E9(total for number of males), E10(total
for number of females), and E11(grand total).
Enter totals as follows:
Now that the observed and total frequencies have been entered, the expected frequencies
must be calculated.
Create a similar table below, modifying it to include the categories representing the
expected frequencies.
The new table should look like this:
You may link the data by cell referencing. This is beneficial in case you decide to make
modifications to the information in your contingency table. An example of cell
referencing is as follows:
Press enter for each cell reference to be recorded.
The table should look like this at this stage.
Now the expected observations will be calculated as follows:
((row total for specific gender)X(column total for a specific age group))/grand total of
population
For example, Expected Male, 25-34 age group would be calculated as: ((454)(101))/529.
To calculate this in Excel, select the Exp M, 25-34 cell and type in the = symbol. Type in
a left parentheses, click on the first cell to be referenced, include the multiplication
symbol and click on the second cell to be referenced and then type in a right parentheses.
To complete the calculation, type the division symbol and reference the grand total of the
population.
This process is done for each cell. The table should look like this:
The calculation for the chi-square statistic is found in cell E14.
This formula is (Observed frequency-Expected frequency)^2/Expected frequency.
This is applied to each cell. After entering the calculation in each cell, press the enter key.
The table should look like the following:
To calculate the Chi-Square statistic for the data in the contingency table, the sum of all
of the calculated differences between the observed and expected values (cells: E14, F14,
G14, E16, F16, G16) is computed and compared to the Chi-Square critical value at a pvalue of .005 or any p-value of your choice. For any comparison, the Chi-Square critical
value is determined by the degrees of freedom (df) for the contingency table. The rule is
the following:
df = (number of rows minus one) x (number of columns minus one),
not including the “total” rows or columns.
For our California HIV mortality rate data, the df = (3-1) x (3-1), therefore our degrees of
freedom is 4. Now, you can refer to the appendix of any statistics text for critical values
of the chi-square distribution or to the Chi-Square distribution table from Camosun
College, Victoria, BC, Canada website. This table may be found in the content materials
folder.
The cells used to calculate the chi-square statistic is found below.
The final value for the calculated chi-square statistic is shown below:
Compare the expected frequencies with the observed frequencies. What can you deduce
from this information? Do you believe that the two variables will be independent of each
other? Why or why not?
Compare the critical value to the calculated chi-square value. If the chi-square value is
larger than the critical value, the data describes a statistically significant relationship
between the variables chosen in your table. In comparing the critical value of 14.860
with 4 degrees of freedom at a p-value of .005 and calculated chi-square statistic,
4.318699213 in our example, there is reason to believe that there is no association
between age group and gender with respect to mortality counts due to HIV in California
in 2001.
Download