Practical 5: Tables and graphs

Module 1 Session 5/6
(Using a larger data set)
Work in pairs
1. Type of power used for lighting
The data are from a large agricultural survey in Tanzania and cover one region, Tanga, where
3223 households were surveyed.

Open the workbook for the Tanzania agriculture survey.

Copy the whole sheet called R00 into a new sheet (called R00temp in the figure above).
Your first problem is to find what type of power is used for lighting in households.
The question came from a group with a general interest in this, and a particular interest in the
potential for solar energy. They also wished to know whether the districts in the region were
similar in their use of sources of power.

Produce a simple one-way table of percentages for the column Q3431. Display the
percentages without decimals.

Sort the sources of power into descending order of frequency, and merge the categories that
had very small frequencies (Firewood, Candles, Solar and Biogas) into a single category.
The resulting table should look roughly as follows:
Districts Training Programme
Module 1 Session 5/6 – Page 1
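For readers who want to check the arithmetic outside Excel, the steps for the one-way table (count each category, merge the rare ones, sort in descending order, show whole-number percentages) can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the Excel workflow: the `responses` list and its counts are hypothetical, "Paraffin" is a hypothetical category label, and "Other" is simply the name chosen here for the merged group. In the practical, the values come from column Q3431.

```python
from collections import Counter

def one_way_percentages(values, merge, merged_label="Other"):
    """Whole-number percentage for each category, with the categories in
    `merge` combined into one group, in descending order of frequency."""
    counts = Counter(merged_label if v in merge else v for v in values)
    total = sum(counts.values())
    # most_common() lists categories from most to least frequent
    return [(cat, round(100 * n / total)) for cat, n in counts.most_common()]

# Hypothetical responses standing in for column Q3431 (not survey data)
responses = (["Paraffin"] * 90 + ["Electricity"] * 3 + ["Firewood"] * 3
             + ["Candles"] * 2 + ["Solar"] * 1 + ["Biogas"] * 1)
small = {"Firewood", "Candles", "Solar", "Biogas"}

for category, pct in one_way_percentages(responses, small):
    print(f"{category:<12}{pct:>4}%")
```

Because each percentage is rounded separately, the rounded values will not always add to exactly 100.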

Add the district information and re-orientate the table as shown in the figure below.

Explain why these are the wrong percentages for the problem posed.

Produce the row percentages instead, as shown below.
You now decide that the differences between the districts do not warrant the complexity of the
two-way table.
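The choice between percentage tables is a choice of denominator. A minimal Python sketch, with hypothetical district names and counts: row percentages divide each cell by its district (row) total, so each row describes the households in one district; column percentages divide by the column total, and instead describe how the users of one source of power are spread across the districts.

```python
# Hypothetical counts: rows are districts, columns are sources of power
counts = {
    "District A": {"Paraffin": 180, "Electricity": 14, "Other": 6},
    "District B": {"Paraffin": 95, "Electricity": 1, "Other": 4},
}

def row_percentages(table):
    """Percent of each row total, so each row sums to (about) 100."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: round(100 * n / row_total) for col, n in cells.items()}
    return result

def column_percentages(table):
    """Percent of each column total, so each column sums to (about) 100."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: round(100 * n / col_totals[col]) for col, n in cells.items()}
        for row, cells in table.items()
    }

print(row_percentages(counts))
print(column_percentages(counts))
```

With these hypothetical counts, the row percentages say that 7% of households in District A use electricity, while the column percentages say that 93% of electricity users live in District A: two different statements from the same counts.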

Produce a simple graph for a report that shows the same information as the one-way
table above. An example pivot chart is shown below.

Write a brief report of your findings, using the information in the chart and
mentioning results from the districts table where they are important.
For example, Tanga district stands out, with 7% of households using electricity compared
with 2% overall.
2. Size of land holdings
Column Q0123 gives the sex of the head of the household, and column Q041C2_H gives the
size of the holding. The column CLASS! gives the holding size in categories.
It is thought that female-headed households may have smaller land holdings. If so, is this true
in all the districts, or just in a few of them?

Do NOT produce the tables and graphs at this stage. Instead, outline the layout of
the table(s) that would permit an investigation of this problem. If you use the actual
size of the holdings, would you use the average or some other summary, and if so,
which?

Might boxplots be useful? If so, outline which boxplots you would use.
Layout of the table(s): (Do a sketch, or state the row and column variables.)
(Also state the content. What exactly would you like to tabulate?)
If you would use boxplots, they would be of the field size. Would they be by district,
by sex of the household head, or by both? Does the software easily allow boxplots by two factors?

Now produce tables to investigate whether this is the case. What are your conclusions?
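One possible layout for the land-holding question is a two-way table with districts as rows, sex of the household head as columns, and in each cell the number of households and a summary of holding size. The Python sketch below uses the median, on the assumption that holding sizes are skewed; it is one option, not the prescribed answer. The records are hypothetical; in the practical the values would come from Q0123 and Q041C2_H.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (district, sex of household head, holding size)
records = [
    ("District A", "Male", 3.0), ("District A", "Male", 5.5),
    ("District A", "Female", 2.0), ("District A", "Female", 1.5),
    ("District B", "Male", 4.0), ("District B", "Male", 2.5),
    ("District B", "Female", 3.5), ("District B", "Female", 4.5),
    ("District B", "Female", 40.0),
]

def summary_table(rows):
    """Count and median holding size for each (district, sex) cell."""
    groups = defaultdict(list)
    for district, sex, size in rows:
        groups[(district, sex)].append(size)
    return {key: (len(sizes), median(sizes)) for key, sizes in sorted(groups.items())}

for (district, sex), (n, med) in summary_table(records).items():
    print(f"{district:<12}{sex:<8}n={n:<4}median={med}")
```

The count is kept beside each median because cells with few households give unstable summaries. In this hypothetical data the single holding of 40 leaves the District B female median at 4.5, whereas it would pull the mean up to 16.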
3. Indigenous chickens
Many households keep indigenous chickens, and the number kept is recorded in column Q230C2ind.
Suppose the extension service plans to improve its support for such households. To help in
its planning, it asks for information from the survey that includes the following:
a. What proportion of households keeps indigenous chickens?
b. When they do, how many do they keep?
c. Are these values roughly the same, or do they differ by district?
d. Do they differ by other factors, in particular the sex of the household head or the
type of agricultural household (Q021)?

Start your analysis by looking at the data column. As examples, the figures below give
some summary statistics (using SSC-Stat => Analysis => Descriptive statistics) and a
histogram produced with Excel’s pivot table. (You may have trouble getting the histogram.
If so, leave it until later.)

The data are clearly very skewed. What does this imply for your use of summaries
like the mean and standard deviation?
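A small hypothetical example shows why skewness matters. With one large flock in the data, the mean is pulled well above the median, and the standard deviation is larger than the mean. The numbers below are invented for illustration and are not from the survey.

```python
from statistics import mean, median, stdev

# Hypothetical flock sizes, sorted, with a long right tail (not survey data)
flock = [0, 0, 0, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 30, 120]

print("mean:  ", round(mean(flock), 1))    # pulled up by the single flock of 120
print("median:", median(flock))            # middle value, unaffected by the 120
print("sd:    ", round(stdev(flock), 1))   # larger than the mean for these data
# The list is sorted, so flock[:-1] drops the largest flock
print("mean without the largest flock:", round(mean(flock[:-1]), 1))
```

For data like these, the median and quartiles usually describe a typical household better than the mean and standard deviation.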

Does the summary above also indicate any other issue that you should investigate?
The additional issue you should have noted is the statement above that there were 174 blank
values. Blanks are treated as missing values, so before proceeding we should check whether
these values really are missing.

Scroll down the data to see some of these values. An example is below:
In this set there were just four questions in the survey that concerned the chickens. It seems
likely that some of the families with no chickens were not asked those questions.
This needs further checking, perhaps with the staff most closely involved in the survey. For now,
as all the blank values looked like this, we will assume that the missing values are really zeros.
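The check described above, that households with a blank in Q230C2ind are also blank in the other chicken questions, can be sketched as follows. Only Q230C2ind is a column name from the survey; `chicken_q2` to `chicken_q4` are hypothetical placeholders for the other three chicken columns, which are not named here, and the rows are invented.

```python
# Hypothetical rows; None marks a blank cell. chicken_q2..chicken_q4 are
# placeholder names, not the survey's real column names.
CHICKEN_COLUMNS = ["Q230C2ind", "chicken_q2", "chicken_q3", "chicken_q4"]

rows = [
    {"Q230C2ind": 12, "chicken_q2": 3, "chicken_q3": 1, "chicken_q4": 0},
    {"Q230C2ind": None, "chicken_q2": None, "chicken_q3": None, "chicken_q4": None},
    {"Q230C2ind": 4, "chicken_q2": None, "chicken_q3": 0, "chicken_q4": 0},
    {"Q230C2ind": None, "chicken_q2": None, "chicken_q3": None, "chicken_q4": None},
]

def blank_pattern(rows, target="Q230C2ind", columns=CHICKEN_COLUMNS):
    """Return (rows blank in target, how many of those are blank in every chicken column)."""
    blank_rows = [r for r in rows if r[target] is None]
    all_blank = sum(all(r[c] is None for c in columns) for r in blank_rows)
    return len(blank_rows), all_blank

print(blank_pattern(rows))
```

If the two numbers match, every household with a blank skipped the whole set of chicken questions, which supports (but does not prove) the assumption that the blanks are zeros.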

So, add a column after the one you were analysing. Give it a name and make it
equal to the value in the previous column, as shown below. (Note: because of the blank values in
the neighbouring columns, you cannot double-click to fill the formula down; drag it down to row
3224 instead.) When you do this, Excel automatically replaces the blanks by zeros, which is what
we want here.
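The new Excel column works because a formula that refers to a blank cell returns 0. The equivalent step can be sketched in Python as below. It encodes the assumption made above that a blank means no chickens; representing blanks as `None` or empty strings is itself an assumption for this sketch.

```python
def blanks_to_zero(values):
    """Replace blank cells (None or empty text) by 0, assuming blank means no chickens."""
    return [0 if v is None or v == "" else v for v in values]

print(blanks_to_zero([5, None, "", 0, 12]))
```

This is only valid under the stated assumption; if the blanks turn out to be genuinely missing, they should be left out of the summaries instead.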

Now produce the summary statistics again. If you were not able to do the histogram
earlier, it should now be easier, without the blanks.

Then plan your analysis of the data to answer the questions posed. You may wish to
produce an additional column with the flock size in categories, e.g. 1-4, 5-9, 10-19,
20-50, >50. You can produce tables or graphs.
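The suggested flock-size categories can be assigned with a simple function. This is a sketch: the boundaries follow the categories listed above, while the separate "None" group for households with no chickens is an added assumption, and the example values are hypothetical.

```python
from collections import Counter

ORDER = ["None", "1-4", "5-9", "10-19", "20-50", ">50"]

def flock_category(size):
    """Map a flock size to a category; 'None' for zero is an added assumption."""
    if size == 0:
        return "None"
    if size <= 4:
        return "1-4"
    if size <= 9:
        return "5-9"
    if size <= 19:
        return "10-19"
    if size <= 50:
        return "20-50"
    return ">50"

def category_counts(sizes):
    """Frequency of each category, in the natural order rather than by size."""
    counts = Counter(flock_category(s) for s in sizes)
    return [(cat, counts.get(cat, 0)) for cat in ORDER]

print(category_counts([0, 0, 3, 7, 25, 60]))
```

Listing the categories in a fixed order, rather than by frequency, keeps the table readable as a distribution.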
To start your discussion, we did a preliminary analysis, getting descriptive statistics
including the proportions of households with different flock sizes. We then quickly produced a set
of boxplots to explore possible effects of district and of the sex of the household head.
We found:
Two thirds of the families kept indigenous chickens.
Of those that kept chickens, 15% had a flock size of 1 or 2, and 30% had 4 or fewer. About 20% of chicken-keeping households had a flock size of 20 or more, and 3% had more than 50 chickens.
The boxplots (see below), by sex of the household head and by district, do not indicate evidence of a difference by sex, but there do seem to be differences between districts.
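The percentages above use two different denominators: the share keeping chickens is out of all households, while the flock-size percentages are out of chicken-keeping households only. A Python sketch of that calculation, with hypothetical flock sizes (so the output does not reproduce the survey figures):

```python
def flock_summary(sizes):
    """Whole-number percentages: keepers out of all households,
    flock-size bands out of chicken-keeping households only."""
    keepers = [s for s in sizes if s > 0]
    if not keepers:
        raise ValueError("no chicken-keeping households in the data")

    def pct(count, total):
        return round(100 * count / total)

    k = len(keepers)
    return {
        "keep chickens": pct(k, len(sizes)),
        "1 or 2": pct(sum(s <= 2 for s in keepers), k),
        "4 or fewer": pct(sum(s <= 4 for s in keepers), k),
        "20 or more": pct(sum(s >= 20 for s in keepers), k),
        "over 50": pct(sum(s > 50 for s in keepers), k),
    }

# Hypothetical flock sizes, with blanks already replaced by zeros
sizes = [0, 0, 0, 0, 0, 1, 2, 3, 4, 6, 8, 10, 15, 22, 60]
print(flock_summary(sizes))
```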

What is your plan?
Now do the analysis. What do you conclude?