DR320 COMPUTER PRACTICAL 1

LIS570 Research Methods
Hazel Taylor
LIS570 QUANTITATIVE ANALYSIS PRACTICAL 4 HYPOTHESIS TESTING FOR NOMINAL DATA
STUDENT NAME: ……………………………………………..
The purpose of this exercise is two-fold:
1. To apply what we have learned in class about testing hypotheses about an underlying
population from nominal sample data.
2. To use a spreadsheet package (MS Excel) to perform simple non-parametric inferential
statistical tests.
You will be working through a research case exercise, using a raw data set extracted from a
graduate survey at a New Zealand community college.
Read the instructions in each section carefully, to avoid errors that will, in the end, slow you
down or prevent you from completing the assignment successfully. Please note that full step-by-step
instructions are NOT given. Instead, at each point you are given some guidelines about
where to look for the commands you need, and then you are expected to try and work out the
detail of what’s required by yourself – use the Help functions when you are not sure! Having said
that, please do NOT spend hours struggling if you get stuck, and can’t find a solution through
the Help functions. Instead, put the exercise aside, come back to it the next day and if you still
can’t see any way through, send me an email!
NOTE: This exercise involves the use of MS Excel pivot tables. Review Practical 2 for basic
instructions on using pivot tables.
You can submit this exercise via eSubmit, if you wish. Fill in the answers to the questions on this
document, and submit the document and your Excel spreadsheet.
Your document should contain answers to: RC4.1a, RC4.1b, RC4.1c, RC4.2a, RC4.2b, RC4.3a,
RC4.3b, RC4.3c, RC4.4a, RC4.4b.
Your spreadsheet should show results for: RC4.2, RC4.3, RC4.4
RESEARCH CASE 4: GENDER IMBALANCE IN COMPUTING
GRADUATES
For this exercise, you are provided with a data set, gndrqual.xls, which has been extracted
from a graduate survey of students graduating from a New Zealand community college. The data
set contains a random sample of responses from 1998 computing graduates, recording the
gender and qualification of each graduate.
Load the gndrqual.xls file into MS Excel, and examine the data set. You should see a
spreadsheet with 3 columns: Response; Qual; Gender. Note the coding key is shown on a
separate worksheet. The community college offered many certificate and diploma qualifications
in computing, in addition to a bachelor’s degree.
As usual, protect your raw data sheet and create a new worksheet for your analyses.
RC4.1 Hypotheses
We have a couple of alternative hypotheses, listed below, that we wish to test. What is the
null hypothesis for each alternative hypothesis?
Alternative hypothesis 1: From this sample we can conclude that there is an imbalance in
gender in the underlying population of 1998 computing graduates from this community college.
RC4.1a Answer - Null hypothesis 1:
Alternative hypothesis 2: From this sample we can conclude that there is a relationship
between gender and qualification in the underlying population of 1998 computing graduates
from this community college.
RC4.1b Answer - Null hypothesis 2:
What p value (significance level) will you choose for this analysis?
RC4.1c Answer – p value:
RC4.2 Hypothesis 1
Take another look at the first hypothesis. What variable(s) are we considering in this
hypothesis? What type of variable? What sort of test can we use in this situation?
RC4.2a Answer:
The first step is to establish a count of the numbers of males and numbers of females in the
sample. You can do this using the function COUNTIF, or you can prepare a pivot table (see
Practical 2 for instructions on pivot tables).

Prepare a table of the count of responses by gender.
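If you would like to cross-check your spreadsheet count outside Excel, the tally that COUNTIF or a pivot table produces can be sketched in Python. The rows and the codes "M" and "F" below are invented for illustration; the real codes are listed on the coding-key worksheet of gndrqual.xls.

```python
from collections import Counter

# Hypothetical (Response, Qual, Gender) rows; NOT taken from gndrqual.xls.
rows = [
    (1, "BSc", "M"),
    (2, "Cert", "F"),
    (3, "BSc", "M"),
    (4, "Dip", "F"),
    (5, "Cert", "M"),
]

# Equivalent of one COUNTIF per gender code: tally the third column.
gender_counts = Counter(gender for _, _, gender in rows)

for gender, count in sorted(gender_counts.items()):
    print(gender, count)
```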
Your results should show a difference between the numbers of male and female graduates.
We want to find out whether this difference in the sample is enough to infer that there is a
difference in the whole (1998) population of computing graduates. In order to do this we
compare our observed sample results with the results we would expect, if in fact there were no
difference in the whole population.

Add another column to your table to show the expected results. Make sure that you double
check to ensure there are no calculation errors.
We now need to add a column to calculate the values for the one-way chi-square statistic.
You can find this formula in the lecture notes on inferential statistics.

Add another column to hold the chi-square calculation. Enter the calculation in each cell in
the chi-square column, and total this column to find the value of the chi square statistic.
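As a rough cross-check on the column layout described above, here is a minimal Python sketch of the standard one-way (goodness-of-fit) chi-square calculation, the sum of (observed - expected)^2 / expected over the categories. The counts are invented, and the equal split used for the expected values is an assumption about what "no difference" means here; use the formula in your lecture notes if it differs.

```python
# Illustrative observed counts only; substitute the counts from your own table.
observed = {"Male": 60, "Female": 40}

# Assumption: under a null hypothesis of no gender imbalance, each category
# is expected to hold an equal share of the sample.
total = sum(observed.values())
expected_each = total / len(observed)

# One contribution per row, mirroring the extra spreadsheet column.
contributions = {
    category: (count - expected_each) ** 2 / expected_each
    for category, count in observed.items()
}

# The column total is the chi-square statistic.
chi_square = sum(contributions.values())
print(contributions)
print(chi_square)
```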
On its own, the chi-square statistic tells us little, because we don't know the probability of getting this value of chi
square for a sample if the null hypothesis is in fact true. MS Excel has a function, CHIDIST,
which enables you to calculate the probability. However, you first need to work out the degrees
of freedom.

Enter the degrees of freedom in a cell on your spreadsheet, and type an informative label
next to this cell, in order to identify its content.

Now use the CHIDIST function to calculate the probability of getting your value of chi
square with the degrees of freedom you have. Make sure you label this cell too.
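CHIDIST returns the right-tail probability of the chi-square distribution. If you want to see what that calculation involves, the sketch below reproduces it in Python for whole-number degrees of freedom, using only the standard library. The function name chi_square_right_tail, the statistic 4.0 and the choice of two categories are illustrative assumptions, not values from the data set.

```python
import math

def chi_square_right_tail(x, df):
    """Right-tail probability P(X >= x) for chi-square with integer df."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    z = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-z) * sum of z^i / i! for i = 0 .. df/2 - 1
        terms = (z ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-z) * sum(terms)
    # Odd df: start from the df = 1 case, erfc(sqrt(z)), then add one
    # term for every further two degrees of freedom.
    tail = math.erfc(math.sqrt(z))
    for i in range((df - 1) // 2):
        tail += math.exp(-z) * z ** (i + 0.5) / math.gamma(i + 1.5)
    return tail

categories = 2                       # illustrative: two categories in a one-way table
df = categories - 1                  # one-way test: number of categories minus one
p = chi_square_right_tail(4.0, df)   # 4.0 is an illustrative statistic
print(df, round(p, 4))
```

The result plays the same role as the CHIDIST cell: it is the value you compare with the p value you chose in RC4.1c.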
Compare your calculated probability with the p-value you set at the beginning. Can we
conclude that there is a statistically significant difference in gender in the underlying population
of 1998 computing graduates from this community college? Why or why not?
RC4.2b Answer:
RC4.3 Hypothesis 2
Our second hypothesis requires a different sort of test - what is it? (HINT: how many
variables are involved?)
RC4.3a Answer:
First we need to create a contingency or frequency table of qualifications by gender.

Use a Pivot Table to do this.
Take a look at the resulting table. Remember that chi-square requires an expected value of at
least 5 in every cell. Will you meet this condition?
RC4.3b Answer:
One option to deal with this is to collapse the categories. Of course, that can only be done
with suitable categories. For our example, we could choose to compare just three categories:
Bachelor’s degree; certificate and diploma in computing education (both teaching related); others
(all vocational certificates and diplomas).

In order to do this, make sure you have the Pivot Table toolbar displayed.

Hold down the control key and click on the categories in the Pivot Table that you want to
select for the first group.

Select the Pivot Table drop-down menu from the Pivot Table toolbar, and select Group
and Show Detail/Group to group these categories.

Click on the new Group 1 cell in the table, and click on the Minus sign on the Pivot Table
toolbar to hide details.

Finally, rename the group with a more meaningful name.

Repeat for the other groups.
Your Pivot Table should now show only 3 categories of qualification, and the totals for these
categories should be summarised.
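The grouping steps above amount to mapping each qualification code to one of three groups and then counting by group and gender. A minimal Python sketch of that idea follows. The qualification codes, the GROUPS mapping and the records are all hypothetical; the real codes are on the coding-key worksheet of gndrqual.xls.

```python
from collections import Counter

# Hypothetical qualification codes and grouping, for illustration only.
GROUPS = {
    "BSc": "Bachelor's degree",
    "CertEd": "Computing education",
    "DipEd": "Computing education",
    "CertPC": "Other vocational",
    "DipIT": "Other vocational",
}

# Invented (Qual, Gender) pairs; NOT taken from gndrqual.xls.
records = [
    ("BSc", "M"), ("BSc", "F"), ("CertEd", "F"), ("DipEd", "F"),
    ("CertPC", "M"), ("DipIT", "M"), ("DipIT", "F"),
]

# Count each (group, gender) combination, as the grouped pivot table does.
table = Counter((GROUPS[qual], gender) for qual, gender in records)

for (group, gender), count in sorted(table.items()):
    print(group, gender, count)
```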
We now need to create the expected values for this table. You can find this formula in the
lecture notes on inferential statistics.

Create the expected values alongside your table. Double check for errors.
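For comparison with your spreadsheet, the sketch below uses the standard expected-count formula for a two-way table (row total x column total / grand total) and then applies the "at least 5 in every cell" check mentioned earlier. The observed counts are invented, and the formula is the usual textbook one; check it against your lecture notes.

```python
# Illustrative observed counts (rows: qualification group; columns: male, female).
observed = [
    [30, 10],
    [5, 15],
    [25, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for a cell = row total * column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(value, 2) for value in row])

# Rule of thumb from the text: every expected count should be at least 5.
all_cells_ok = min(min(row) for row in expected) >= 5
print(all_cells_ok)
```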
We are now ready to do the appropriate chi-square calculation. MS Excel has a function
called CHITEST that will return the probability for the two-way chi square value related to the
observed and expected results you have entered. For this function, you don’t need to enter the
degrees of freedom because Excel will work it out from your arrays of observed and expected
values.

Now use the CHITEST function to calculate the probability of getting your value of chi
square with the degrees of freedom that you have.

If necessary, format the CHITEST cell as a number with at least 6 decimal places.

Make sure you label this cell too.
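To see roughly what CHITEST does with the two arrays, here is a minimal Python sketch: it sums (observed - expected)^2 / expected over all cells, takes the degrees of freedom as (rows - 1) x (columns - 1), and converts the statistic to a right-tail probability. The function name chitest_like and the counts are illustrative assumptions, and this simplified version only handles 1 or 2 degrees of freedom (for example 2x2 and 3x2 tables).

```python
import math

def chitest_like(observed, expected):
    """Return (chi_square, df, p) for equally shaped observed/expected tables."""
    rows, cols = len(observed), len(observed[0])
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Assumes at least two rows and two columns.
    df = (rows - 1) * (cols - 1)
    if df == 1:
        p = math.erfc(math.sqrt(chi_square / 2))
    elif df == 2:
        p = math.exp(-chi_square / 2)
    else:
        raise NotImplementedError("this sketch covers df = 1 and df = 2 only")
    return chi_square, df, p

# Invented counts for illustration; NOT results from gndrqual.xls.
observed = [[30, 10], [5, 15], [25, 15]]
expected = [[24.0, 16.0], [12.0, 8.0], [24.0, 16.0]]

chi_square, df, p = chitest_like(observed, expected)
print(round(chi_square, 4), df, f"{p:.6f}")
```

Printing the probability with six decimal places mirrors the formatting advice above: very small probabilities otherwise display as 0.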
Compare your calculated probability with the p-value you set at the beginning. Can we
conclude that there is a statistically significant relationship between gender and qualification in
the whole population of 1998 computing graduates from this community college? Why or why
not?
RC4.3c Answer:
RC4.4 Further Analysis of Hypothesis 2
Take another look at the pivot table you created in section RC4.3. Some rows show more
males than females, while one row should show more females than males. At this stage, we may
wonder whether any relationship between gender and qualification in computing graduates is due
to the computing education graduates, who seem to be predominantly female, even though our
whole sample has more males than females. Can we conclude that there is a relationship between
gender and qualification for the remaining qualifications?
In order to investigate this, we first need to sub-set our data, and select just the rows that
don't contain computing education graduates.

First of all select your grouped Pivot Table and copy it to a new location (so that you don't
do anything to affect the calculations you've already done).

Click your cursor in the computing education qualification cell in the new Pivot Table, right
click, and select Hide. Your Pivot Table should now display just the categories for the
bachelor’s degree and vocational qualifications.

Do a chi-square analysis on the new pivot table, and find the corresponding CHITEST
value.
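The sub-setting step can be sketched in Python as dropping one row from the grouped table and repeating the calculation on what remains. The group names and counts below are invented, and with two rows and two columns left the sketch uses the closed form for 1 degree of freedom; treat it as a cross-check on your spreadsheet, not as the expected answer.

```python
import math

# Invented grouped counts (male, female) per qualification group.
grouped = {
    "Bachelor's degree": [30, 10],
    "Computing education": [5, 15],
    "Other vocational": [25, 15],
}

# Equivalent of hiding the computing education row in the copied pivot table.
subset = {name: counts for name, counts in grouped.items()
          if name != "Computing education"}

observed = list(subset.values())
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts must be recalculated from the reduced table's own totals.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# A 2 x 2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
p = math.erfc(math.sqrt(chi_square / 2))
print(round(chi_square, 4), round(p, 4))
```

Note that the expected values are rebuilt from the totals of the reduced table; reusing the expected values from the full three-group table would give a wrong result.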
What is your conclusion from this test?
RC4.4a Answer:
A final point relates to the conclusions you can draw from your results. Can you conclude
that there is a relationship between gender and the computing education qualifications for all
computing graduates from this community college? Explain why or why not.
RC4.4b Answer:
Congratulations! You have completed the final practical for this course.
You can submit this exercise via eSubmit, if you wish. Fill in the answers to the questions on this
document, and submit the document and your Excel spreadsheet.
Your document should contain answers to: RC4.1a, RC4.1b, RC4.1c, RC4.2a, RC4.2b, RC4.3a,
RC4.3b, RC4.3c, RC4.4a, RC4.4b.
Your spreadsheet should show results for: RC4.2, RC4.3, RC4.4