Districts Training Programme
Module 1 Session 5/6

Practical 5: Tables and graphs (Using a larger data set)

Work in pairs

1. Type of power used for lighting

The data are from a large agricultural survey in Tanzania. The data are for one region, Tanga, where the survey covered 3223 households.

Open the workbook for the Tanzania agriculture survey. Copy the whole sheet called R00 into a new sheet. It is called R00temp in the figure above.

Your first problem is to find what type of power is used for lighting in households. This is for a group who are interested generally, but particularly in the potential for solar energy. They also wish to know whether districts in the region are similar in their use of sources of power.

Produce a simple one-way table of percentages for the column Q3431. Display the percentages without decimals. Sort the energy uses into descending order, and merge the categories (Firewood, Candles, Solar and Biogas) that have very small frequencies. The resulting table should look roughly as follows:

Add the district information and re-orientate the table as shown in the figure below. Explain why these are the wrong percentages for the problem posed.

Produce the row percentages instead, as shown below.

You now decide the differences between the districts do not warrant the complexity of the 2-way table. Produce a simple graph for a report that shows the same information as in the 1-way table above. An example pivot chart is below:

Write a brief report of your findings, using the information in the chart, and also mentioning the information from the districts table where it is important. For example, Tanga district stands out, with 7% of households using electricity compared with 2% overall.

2. Size of land holdings

Column Q0123 gives the sex of the head of the household, and question Q041C2_H gives the size of the holding. It is given in categories in a column called CLASS!. It is thought that female-headed households may have smaller land holdings. If so, is this the case in all the districts, or just in a few of them?

Do NOT produce the tables and graphs at this stage. Instead, outline the layout of the table(s) that would permit an investigation of this problem. If you use the actual size of the holdings, then would you use the average, or some other summary? If so, then which? Might boxplots be useful? If so, then outline which boxplots you would use.

Layout of the table(s):
(Do a sketch, or say what are the row variables and column variables.)
(Also state the content. What exactly would you like to tabulate?)

If you would use boxplots, then they would be for the field size. Would they be by district, or by sex of household head, or both? Does the software easily allow boxplots by 2 factors?

Produce tables to investigate whether this is the case. What are your conclusions?

3. Indigenous chickens

Many households keep indigenous chickens and the number is recorded in column Q230C2ind. Suppose the extension service plans to improve its support of such households. To help in their planning they ask for information from the survey that includes the following:

a. What proportion of households keeps indigenous chickens?
b. When they do, how many do they keep?
c. Are these values roughly the same, or do they differ by district?
d. Do they differ by other factors, in particular the sex of the household head or the type of agricultural household (Q021)?
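Questions a to c can be read as two calculations repeated for each group: the percentage of households with at least one chicken, and a summary of flock size among those households. A minimal Python sketch of that reading is given here for those who want to see the arithmetic spelled out. The records, the district labels "A" and "B" and the function name keeper_summary are all made up for illustration, and the median is only one possible choice of summary; deciding which summary is appropriate remains part of the exercise.

```python
from collections import defaultdict
from statistics import median

# Made-up records for illustration only: (district, number of indigenous chickens).
# These are NOT values from the Tanzania survey.
households = [
    ("A", 0), ("A", 3), ("A", 12), ("A", 0),
    ("B", 6), ("B", 25), ("B", 2), ("B", 8),
]


def keeper_summary(counts):
    """Return (percentage of households with at least one chicken,
    median flock size among those households)."""
    keepers = [c for c in counts if c > 0]
    pct = 100 * len(keepers) / len(counts)
    typical = median(keepers) if keepers else None
    return pct, typical


# Group the counts by district so the same summary can be repeated per group.
by_district = defaultdict(list)
for district, count in households:
    by_district[district].append(count)

overall = keeper_summary([count for _, count in households])
per_district = {d: keeper_summary(c) for d, c in sorted(by_district.items())}

print("Overall:", overall)
for district, summary in per_district.items():
    print(district, summary)
```

The same grouping step could be repeated with sex of household head or type of household in place of district to address question d.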
Start your analysis by looking at the data column. As examples, the figures below give some summary statistics (using SSC-Stat => Analysis => Descriptive statistics) and a histogram following Excel’s pivot table. (You may have trouble getting the histogram. If so, then leave it till later.)

The data are clearly very skewed. What does this indicate for your use of summaries like the mean and standard deviation? And does the summary above indicate any other issue that you should investigate?

The additional issue you should have noted is the statement above that there were 174 blank values. Blanks are normally treated as missing values, so before proceeding we should check whether these values are really missing. Scroll down the data to see some of these values. An example is below:

There were just 4 questions in the survey (in this set) that concerned the chickens. It looks likely that some of the families that had no chickens were not asked those questions. We do need to check further, and perhaps ask the staff more closely concerned with the survey. For now, as all the blank values looked like this, we will assume that the missing values were really zeros.

So, add a column after the one you were analysing. Give it a name and just make it equal to the value before, see below. (Note: because of the blank values in the neighbouring columns, you cannot double-click to fill down. Drag the formula down to row 3224 instead.) When you do this, Excel automatically replaces the blanks by zeros, which is what we want here.

Now produce the summary statistics again. If you were not able to do the histogram earlier, it is now easier, without the blanks.

Then plan your analysis of the data to answer the questions posed. You may wish to produce an additional column with the flock size in categories, e.g. 1-4, 5-9, 10-19, 20-50, >50. You can produce tables or graphs.

To start you in your discussion, we did a preliminary analysis getting descriptive statistics, including the proportions of households with different flock sizes. Then we quickly did a set of boxplots to explore possible effects of district and of sex of household head. We found:

Two thirds of the families kept indigenous chickens. Of those that kept chickens, 15% had a flock size of 1 or 2, and 30% had 4 or fewer. About 20% of chicken-keeping households had a flock size of 20 or more, and 3% had over 50 chickens.

The boxplots by sex of household head and by district (see below) show no evidence of a difference by sex of household head, but there do seem to be differences between districts.

What is your plan?

Now do the analysis. What do you conclude?
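For anyone who wants to check their Excel results another way, the blank-to-zero step and the flock-size categories suggested earlier might be sketched in Python as follows. This is only an illustration under stated assumptions: the values are made up (None stands for a blank cell), the function name flock_category is hypothetical, and the separate "0" category for households with no chickens is an addition that the practical does not prescribe.

```python
from collections import Counter

# Made-up column values for illustration only; None stands for a blank cell.
# These are NOT values from the Tanzania survey.
raw_counts = [None, 2, 4, 7, None, 15, 20, 60, 1, 9]

# Assumption taken from the text: a blank means the household had no chickens.
counts = [0 if c is None else c for c in raw_counts]


def flock_category(n):
    """Map a flock size to the categories suggested in the text.
    The "0" category is an added assumption for households with no chickens."""
    if n == 0:
        return "0"
    if n <= 4:
        return "1-4"
    if n <= 9:
        return "5-9"
    if n <= 19:
        return "10-19"
    if n <= 50:
        return "20-50"
    return ">50"


order = ["0", "1-4", "5-9", "10-19", "20-50", ">50"]
tally = Counter(flock_category(c) for c in counts)
keepers = sum(1 for c in counts if c > 0)

# Percentages among chicken-keeping households only, in category order.
pct_of_keepers = {
    cat: 100 * tally[cat] / keepers for cat in order if cat != "0"
}

for cat in order:
    print(cat, tally[cat])
print(pct_of_keepers)
```

Keeping the zero households in their own category makes it easy to report both the proportion keeping chickens and the distribution of flock sizes among those that do.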