National FSA Training Module 14: Statistical analysis
The objective of the module is to enable trainees to understand:
Different approaches to statistical analysis of on-farm experiments
The environmental index and its interpretation for flexible recommendations
Simple calculation modules for different experimental designs.
14.1 Background
14.2 Basic statistical analysis
14.3 Experimentation in diverse environments
14.4 Approaches to statistical analysis of on-farm experiments
14.5 An example: maize variety testing in Karagwe District
Keywords: diverse environment; environment-technology interaction; stability analysis; flexible recommendation; environmental index; ANOVA
In agricultural research, variation in yield and in other agricultural products has long been observed, and researchers have been working on the problems associated with this variation since the early 1900s (Vieira et al., 1982). When experimental research at agricultural stations began (the first station was founded in France in 1834, followed by the Rothamsted Experiment Station in the United Kingdom in 1843; Salmon and Hanson, 1964), undesired yield variation was not yet of any concern. Experimental plots were laid out systematically in the field, and treatment means were compared but not tested. Often one treatment gave results so much better than the others that deviations from the mean seemed of minor importance and its superiority appeared obvious. As a consequence, the selected treatment could give unexpected results when the experiment was repeated or applied under different environmental conditions. One of the first researchers to pay explicit attention to this problem of undesired variation in field experiments, L.H. Smith (1909), stated in reference to his variety experiments with corn that:
“The topic proposed embraces some vital questions, and it is one which should certainly be of great interest and importance to every agronomist who has to do with field experiments. The elimination or control of the variable factors, so essential in experimentation of any sort, becomes unusually difficult in field experiments where we have to deal with so many uncontrollable conditions”. The perception of the importance of these “uncontrollable conditions” was rather new. In this period, agronomists openly expressed their concern about the perturbations in their experiments, which caused numerous difficulties of interpretation. In 1913, after six years of experimentation with winter wheat, great variation was found in the check rows, even where conditions appeared quite uniform (Montgomery, 1913). Montgomery also underlined the problem of how to handle the factors causing this variation: “All things being equal, the yield of the 47 plots should have been the same. But all factors can never be equal, so in row-breeding work, owing to unequal environment, we must expect a wide degree of error. The only practical way so far suggested to overcome this error is to repeat the plots, according to some systematic method, enough times to equalise variations in soil or climatic effects” (Montgomery, 1913). Methods and techniques to deal with the undesired variation inherent to field experimentation were lacking, and the only known solution was repetition. The existence of factors causing variation in yield was known, but until the early 1920s only a few writers on field experimentation sufficiently recognised their importance, and none adequately emphasised it (Harris, 1920).
The statistical solution
In mathematics, similar problems of probability, population distributions and testing of errors of observation, especially in astronomy and evolutionary biology, were the object of debate (Fisher Box, 1978). The journal Biometrika played an important role in this debate. In his article “The Probable Error of a Mean”, Student aimed to determine the significance of the means of a series of experiments, especially when large numbers of repetitions were not feasible, as in agricultural experiments (Student, 1908). He underlined the importance of randomisation and introduced the assumption of normality for small samples, which he based on an empirical analysis of random sampling from a large population of finger measurements of 3000 criminals.
Perhaps Student's major contribution to the development of agricultural research was the introduction of the z-distribution (z being a function of the mean and the standard deviation). It was Fisher who provided the actual mathematical proof, leading to acceptance of the z-distribution (later called the t-distribution) and of the normality assumption. With its tabulation and diffusion, testing the significance of means in field experimentation became possible. It still took some time before the use of tables, distributions and significance tests was generally adopted in agricultural research. Fisher's “Statistical Methods for Research Workers” may be regarded as the major breakthrough that led to the general acceptance of testing in experimentation (Fisher, 1925). Fisher called his method “analysis of variance” (ANOVA), and Snedecor simplified and tabulated the distribution, calling it the F-distribution in honour of Fisher (Fisher Box, 1978). Analysis of variance and covariance were now ready for dissemination and use in experimental research.
The development of statistical methods provided agricultural researchers with immediate solutions to their problems of “uncontrollable conditions” and undesired variation. In the 1930s, testing based on distribution tables was widely adopted, giving an important stimulus to agricultural research. The undesired variation could be eliminated from the results by following simple techniques. However, the uncritical way in which analysis of variance was applied in experimental fieldwork was questioned: “Many observers, in such a case, consult a manual offering them numerous prescriptions for calculation, often without a proof. They try to find out which case has the greatest affinity to their own, and then, fully relying on the error-doctor, meekly follow the given recipe. For such blindly submissive natures this book has not been written” (van Uven, 1935). Sceptics like van Uven could not prevent analysis of variance from becoming commonly adopted in agricultural science, standardising the procedures for treating any deviation from the mean. For the time being, variation was no longer of concern to agricultural scientists, and the debate on its treatment was closed.
On-farm experimentation and the need for unconventional statistics
Until the late 1970s, most agricultural research was carried out on-station, where conditions can be controlled and single treatments applied. With the adoption of FSA as an approach to improve the impact of agricultural research, on-farm experimentation gained in popularity.
Testing innovations under farmers' conditions and with farmer participation proved to result in better adoption. However, one major constraint emerged: how do you analyse data from experiments in diverse environments? Unlike in on-station experiments, numerous factors affecting the experiment cannot be controlled. When analysis of variance is applied, the error term is very large and treatment effects hardly ever reach significance. Many scientists do not know how to deal with data from on-farm experiments and have become frustrated by the lack of significance. They have responded to the variability of on-farm trials in various ways, such as:
Increasing the number of repetitions (and thus increase the costs);
Repeating the trial over several years (and thus delay the dissemination of results);
Controlling the farmer’s conditions (and thus create an artificial environment);
Applying complicated ANOVA techniques (and thus estrange themselves from common scientists without advanced training in statistics);
Seeking advice from a biometrician (and thus lose control and responsibility);
Cooking up the anticipated results (and thus violate the ethics of science).
One conclusion which was drawn from the experience with on-farm trials was that new statistical techniques were needed to analyse data derived from this type of experimentation.
Without specific techniques and procedures for on-farm trials, scientists' frustration and the lack of results would eventually lead to the abandonment of this type of research.
Brainstorming exercise: Discuss with your fellow scientists how you were trained to analyse (on-farm) data and how you used to deal with variation in your work as an agricultural scientist.
Generally, data obtained from experiments or surveys consists of raw, unorganised sets of numbers. For these numbers to be readable, they must be summarised and analysed in such a way that pertinent information can be extracted and interpreted. Statistical analysis is the basis for doing this. Statistical analysis enables us to:
Separate the effect of different treatments
Determine the magnitude and significance of differences
Provide information for subsequent statistical analysis
The form of statistical analysis normally depends on the objective of the study, the type of experiment (or survey) and the design used. The basic aspects discussed here are:
1. Data scrutiny
2. Methods to summarise and present data
3. Calculation of descriptive measures
4. Overview of data from different experimental designs
This section assumes that trainees have taken a basic statistics course during their formal training; for further detail, readers are referred to standard textbooks and the relevant literature. Computers and statistical software have made statistical analysis simple, since the computer performs most of the tedious calculations. However, it remains important for trainees to understand the process and to interpret the results.
1. Data scrutiny
This is always the first step in processing data. It should be done as early as possible, while the memory of the situation in the field and of the data collection procedures is still fresh. Inconsistencies in collected data can have a variety of causes, such as:
Incorrect measurement of data
Incorrect data transcription
Errors in trial implementation
Farm conditions
Treatment behaviour (may differ from researcher's expectations)
Table 14.1 Yield of sunflower varieties
Block   Variety A   Variety B   Variety C   Variety D
I       614         642         59          680
II      63          602         657         592
III     504         55          673         600
IV      628         616         645         583
Consider the hypothetical example in Table 14.1. Scrutiny of the data shows that:
Yields are very low for variety C in Block I (59), variety A in Block II (63) and variety B in Block III (55).
If the data are analysed as they stand, the CV will be very high.
Checking the field record book showed that errors had been made in transcribing the data. The correct values were 590, 630 and 550 respectively.
It is therefore very important to scrutinise the data before attempting any analysis.
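A simple screen can support this kind of scrutiny. The Python sketch below flags every value in Table 14.1 that is below half of the overall median so that it can be checked against the field book. The 50% threshold and the function name flag_suspects are illustrative assumptions, not a standard rule; a flagged value should only be corrected when the field records confirm an error.

```python
from statistics import median

# Table 14.1 as first entered: rows are blocks, columns are varieties A-D.
VARIETIES = ["A", "B", "C", "D"]
YIELDS = {
    "I":   [614, 642, 59, 680],
    "II":  [63, 602, 657, 592],
    "III": [504, 55, 673, 600],
    "IV":  [628, 616, 645, 583],
}


def flag_suspects(table, fraction=0.5):
    """Return (block, variety, value) for values below fraction x the overall median.

    The cut-off is an arbitrary screening rule (an assumption), not a statistical test.
    """
    values = [v for row in table.values() for v in row]
    threshold = fraction * median(values)
    return [
        (block, VARIETIES[j], v)
        for block, row in table.items()
        for j, v in enumerate(row)
        if v < threshold
    ]


for block, variety, value in flag_suspects(YIELDS):
    print(f"Check field book: block {block}, variety {variety}, value {value}")
```

For these data the screen flags exactly the three values discussed above (59, 63 and 55).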
2. Methods to summarise and present data
Useful methods for presenting data are:
Frequency distribution
Cumulative frequency distribution
Histogram
Ogive
Bar charts
Pie diagrams
The numbers below, the weight gains of 55 pigs summarised in Table 14.2, are used to illustrate the following techniques.
22 26 42 39 17 46 35 31 34 22 34 20 11 33
20 29 27 30 33 19 39 41 19 18 14 57 28 24
33 48 9 41 27 39 21 12 24 53 36 51 42 49
44 32 25 39 43 37 32 40 36 35 30 30 21
(i) Frequency distribution
A tabular presentation of data showing the number of observations (frequency) in each class interval. It is constructed as follows:
Determine the range (maximum and minimum values)
Divide the range into suitable intervals
Specify the class limits (upper and lower values of each class)
Select the class boundaries
Specify the class mark (mid-point of the class)
Count the number of observations in each class
Calculate the relative frequency (class frequency/total number of observations)
Table 14.2 Frequency distribution of weight gain of 55 pigs
Class limits   Class boundaries   Class mark   Class frequency   Relative frequency
6-10           5.5-10.5           8            1                 0.0182
11-15          10.5-15.5          13           3                 0.0545
16-20          15.5-20.5          18           5                 0.0909
21-25          20.5-25.5          23           7                 0.1273
26-30          25.5-30.5          28           9                 0.1636
31-35          30.5-35.5          33           10                0.1818
36-40          35.5-40.5          38           8                 0.1455
41-45          40.5-45.5          43           6                 0.1091
46-50          45.5-50.5          48           3                 0.0545
51-55          50.5-55.5          53           2                 0.0364
56-60          55.5-60.5          58           1                 0.0182
(ii) Cumulative frequency
The number of observations whose value is less than a given value.
Example
Weight gain            <6   <11   <16   <21   <26   <31   <36   <41   <46   <51   <56   <61
Cumulative frequency    0     1     4     9    16    25    35    43    49    52    54    55
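The construction steps above can be sketched in Python as follows, assuming classes of width 5 starting at 6 as in Table 14.2; the function name frequency_table is hypothetical. Counting the 55 values exactly as printed gives 6 observations in class 16-20 and 8 in class 26-30, whereas Table 14.2 and the cumulative frequencies give 5 and 9 (all other classes agree), so one value may have been mis-transcribed in either the list or the table and is worth checking against the original records.

```python
from itertools import accumulate

# The 55 weight-gain values listed above.
DATA = [
    22, 26, 42, 39, 17, 46, 35, 31, 34, 22, 34, 20, 11, 33,
    20, 29, 27, 30, 33, 19, 39, 41, 19, 18, 14, 57, 28, 24,
    33, 48, 9, 41, 27, 39, 21, 12, 24, 53, 36, 51, 42, 49,
    44, 32, 25, 39, 43, 37, 32, 40, 36, 35, 30, 30, 21,
]


def frequency_table(data, lower=6, width=5, n_classes=11):
    """Rows of (class limits, class boundaries, class mark, frequency, relative frequency)."""
    n = len(data)
    rows = []
    for k in range(n_classes):
        lo = lower + k * width          # lower class limit, e.g. 6
        hi = lo + width - 1             # upper class limit, e.g. 10
        freq = sum(lo <= x <= hi for x in data)
        rows.append(((lo, hi), (lo - 0.5, hi + 0.5), (lo + hi) / 2, freq, freq / n))
    return rows


table = frequency_table(DATA)
for limits, bounds, mark, freq, rel in table:
    print(f"{limits[0]}-{limits[1]}  {bounds[0]}-{bounds[1]}  {mark:g}  {freq}  {rel:.4f}")

# Cumulative frequency: number of observations below each upper class boundary.
cumulative = list(accumulate(row[3] for row in table))
print(cumulative)
```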
(iii) Histogram
A graphic presentation of a frequency distribution, constructed by erecting rectangles on the class intervals, with heights proportional to the class frequencies.
[Histogram: class frequency plotted against class interval]
(iv) Ogive
A graphic presentation of a cumulative frequency distribution.
[Ogive: cumulative frequency (0-60) plotted against class interval]
(v) Bar charts
A graphical presentation in which bars of proportional length show the size of each constituent contributing to the total amount of the item being considered.
(vi) Pie diagram
A circular presentation of statistical data showing the relative size of the component parts of the total.
[Pie diagram with four segments: 43%, 30%, 16% and 11%]
3. Calculation of descriptive measures
Calculation of descriptive measures is the next step in transforming raw data into useful information. These measures include:
a. Measures of central tendency (mean, median, mode)
b. Measures of dispersion (range, variance, standard deviation)
With the available computers and software, calculating these statistics is no longer a problem; what matters more for trainees is knowing how to apply these measures.
a. Measures of central tendency:
Mean (simple arithmetic mean): the sum of the observation values divided by the number of observations:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Weighted mean: used when the numbers being averaged differ in importance, for example when classes contain different numbers of observations:
$$\bar{X}_w = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$$
Table 14.3 Maize yield in different agro-ecological zones
Agro-ecological zone   Average maize yield (kg/ha)   Average land size (ha)
I                      4500                          4
II                     4000                          12
III                    3800                          15
$$\text{Weighted mean} = \frac{(4 \times 4500) + (12 \times 4000) + (15 \times 3800)}{4 + 12 + 15} = \frac{123000}{31} \approx 3968 \text{ kg/ha}$$
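The un-weighted mean would be:
$$\bar{X} = \frac{4500 + 4000 + 3800}{3} = 4100 \text{ kg/ha}$$
The un-weighted mean therefore gives a biased (here, too high) estimate of the average maize yield, because zone I, with the highest yield but the smallest land area, counts as much as zones II and III.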
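The two calculations from Table 14.3 can be reproduced with a short Python sketch; the helper name weighted_mean is hypothetical. Land size is used as the weight, as in the worked example.

```python
# Table 14.3: zone -> (average maize yield in kg/ha, average land size in ha)
ZONES = {"I": (4500, 4), "II": (4000, 12), "III": (3800, 15)}


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


yields = [y for y, _ in ZONES.values()]
areas = [a for _, a in ZONES.values()]

print(round(weighted_mean(yields, areas)))  # 3968 (area-weighted mean)
print(sum(yields) / len(yields))            # 4100.0 (un-weighted mean)
```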
The median is the middle value (for an odd number of observations) or the arithmetic mean of the two middle values (for an even number of observations). The median divides a histogram into two parts of equal area. When data are skewed, the median may be more characteristic than the mean and provide a better description.
The mode is the value that occurs with the greatest frequency.
b. Measures of dispersion
These statistical measures describe the spread of the observation values and hence the precision of the measurements. They include:
Range is the difference between the largest and smallest values. It is the least suitable measure of variation, because it depends only on the two extreme values.
Variance (S²) is a measure of the deviation of the observations from the mean:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
Standard deviation (S) is the square root of the variance:
$$S = \sqrt{S^2}$$
The standard deviation of the mean, $S/\sqrt{n}$, is referred to as the standard error.
Coefficient of variation (CV): the magnitude of experimental error expressed as a percentage of the mean:
$$CV = \frac{S}{\bar{X}} \times 100\%$$
In a well laid-out on-station trial, the CV should not exceed 10-15%. In on-farm trials, however, the CV can go up to 30%. In these situations, field notes explaining the causes of variation are important to support the interpretation of the results.
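All of these measures are available in Python's standard statistics module. The sketch below applies them to the 55 weight-gain values used for Table 14.2; the function name describe is hypothetical, and the variance uses the n - 1 divisor of the formula above.

```python
import math
import statistics

# The 55 weight-gain values used for Table 14.2.
DATA = [
    22, 26, 42, 39, 17, 46, 35, 31, 34, 22, 34, 20, 11, 33,
    20, 29, 27, 30, 33, 19, 39, 41, 19, 18, 14, 57, 28, 24,
    33, 48, 9, 41, 27, 39, 21, 12, 24, 53, 36, 51, 42, 49,
    44, 32, 25, 39, 43, 37, 32, 40, 36, 35, 30, 30, 21,
]


def describe(data):
    """Central tendency and dispersion measures for a list of observations."""
    n = len(data)
    mean = statistics.mean(data)
    variance = statistics.variance(data)     # sample variance, divisor n - 1
    sd = math.sqrt(variance)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "variance": variance,
        "sd": sd,
        "se": sd / math.sqrt(n),             # standard error of the mean
        "cv_percent": 100 * sd / mean,
    }


for name, value in describe(DATA).items():
    print(f"{name:>10}: {value:.2f}")
```

For these data the median is 32, the mode 39 and the range 48 (57 - 9).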
Setting the hypothesis and the levels of significance
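A hypothesis is a statement about the value of one or more parameters of a population. There are two types of hypothesis:
H0, the null hypothesis: there is no difference (e.g. between treatment means)
Ha, the alternative hypothesis: there is a difference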
Commonly used levels of significance are 1% and 5%. In on-farm trials, however, significance levels of up to 10% are acceptable. When interpreting the data it is important to consider the following:
The distinction between technical (practical) significance and statistical significance
Conclusions should be of practical importance, and this should guide the interpretation of the data.
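To illustrate H0, Ha and the choice of significance level, the Python sketch below compares varieties A and B of Table 14.1 (after the corrections to 630 and 550) with a paired t-test, pairing the two varieties within each block. This is an illustration only: with four varieties a full analysis of variance would normally be used. The critical values are the standard two-sided t-table values for 3 degrees of freedom; the name paired_t is hypothetical.

```python
import math
import statistics

# Table 14.1 after correcting the transcription errors.
A = [614, 630, 504, 628]   # variety A, blocks I-IV
B = [642, 602, 550, 616]   # variety B, blocks I-IV

# Two-sided critical values of Student's t for 3 degrees of freedom (t-table).
T_CRITICAL_DF3 = {0.10: 2.353, 0.05: 3.182, 0.01: 5.841}


def paired_t(x, y):
    """t statistic for H0: mean within-block difference = 0."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


t = paired_t(A, B)
for alpha, t_crit in sorted(T_CRITICAL_DF3.items(), reverse=True):
    verdict = "reject H0" if abs(t) > t_crit else "do not reject H0"
    print(f"alpha = {alpha:.0%}: |t| = {abs(t):.2f} vs {t_crit} -> {verdict}")
```

Here |t| is about 0.5, below even the 10% critical value, so H0 (no difference between A and B) is not rejected at any of the usual levels.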
Other statistical analysis methods
Analysis of data from two-sample and paired-sample techniques
Analysis of single-factor experiments (ANOVA and mean comparisons)
Completely Randomised Design (CRD)
Randomised Complete Block Design
Latin Square Design
Analysis of Factorial Experiments
Regression analysis
Correlation analysis
Analysis of covariance
Combined Analyses
Stability Analysis
Detailed analysis procedures for these methods can be found in any standard statistical analysis reference book or manual.
During the 1980s and 1990s agricultural scientists became aware that variation in on-farm trials is not merely due to random factors, but results largely from systematic and sometimes deliberate interaction between the treatment and the environment (De Steenhuijsen Piters, 1995). In other sciences, such as ecology, pedology and sociology, a comparable awareness emerged, and concepts such as biodiversity, kriging, actor-in-context and gender analysis were adopted. In the agricultural sciences, much effort went into understanding the causes of variation. Some scientists concentrated on analysing the diverse environment in which agricultural production takes place (Harlan, 1975; Zimmerer, 1991; McBratney, 1992; Paoletti & Pimentel, 1992; de Steenhuijsen Piters, 1995). Others contributed to the development of new statistical techniques; the most prominent of these is undoubtedly Hildebrand, one of the founding fathers of FSA and, more recently, the inventor of adaptability analysis. What these authors have in common is that they consider variation as a source of information.
Understanding the production environment
Causes of variation can be classified as natural deterministic, deliberate and random. The first category reflects the given situation at a certain moment, the second reflects human response or action, and the third reflects chance, non-systematic events (de Steenhuijsen Piters, 1995).
Natural deterministic causes include the biotic, abiotic and climatic environment of production (e.g. soils, weeds, rainfall). Deliberate causes include the crop genotype and the management (practices, techniques and inputs) applied by the farmer. Random causes include non-systematic damage to the crop or livestock (e.g. elephants walking through a field, a thief shooting a cow).
Exercise: Farmers in Mbulu District have a problem with drought in maize: their maize frequently does not reach maturity because of erratic rainfall. You propose to test three new varieties under farmers' conditions. Five farmers volunteer, and at the onset of the rains they sow the three new varieties and one local variety. You standardise the fertiliser application at a rate and timing that is common practice among farmers.
Question: which major sources of variation do you expect in this trial? Which are deterministic, deliberate or random?
Trial design
Conventional trials are designed to exclude as many causes of variation as possible. However, when the objective is to test a potential innovation under real conditions, the trial must be designed to include as many relevant causes of variation as possible. If we excluded systematic causes of variation from our trial, how could we understand the response of our innovation when it is confronted with them? Still, one should not include more variation than necessary.
Random variation is not of interest because it is not predictable. Likewise, well-drained sandy soils are of no interest where irrigated rice is concerned. So include only those production environments that are relevant for a specific crop or type of livestock. A short survey and interviews with farmers are helpful tools to identify these relevant production environments; because production environments are often linked to particular types of farmers, different types of farmers must be included in your trial.
Example: In Bukoba District farmers grow maize in homegardens as well as in annual crop fields. Results from a short survey and interviews with farmers indicate that homegardens can be categorised into two types according to their soil fertility ('poor' and 'fertile'). Annual crop fields can be categorised into three types ('exhausted', 'after fallow' and 'new'). Farmers explain that resource-poor households have 'poor' homegardens and grow most of their maize on fields that have been fallowed. Resource-rich farmers grow maize on new annual crop fields. Female farmers grow maize in any type of homegarden or, if not allowed by their husbands, on annual crop fields. These results indicate that there is a systematic relationship between production environment and type of farmer. Six major production environments can be identified, and a maize trial should include all of them. Because there is some interaction with the sex of the producer, it is advisable to include both male and female farmers. The proposed trial will therefore include 12 trial sites.
If you plan to apply adaptability analysis (see 14.5), no repetitions are needed within a site (farmer); instead, the number of farmers in different environments can be increased. If you include all relevant production environments, you do not even have to repeat your trial (Hildebrand and Russell, 1994). It is advisable not to make your plots too small, to limit losses due to random damage.
Recording observations
In order to explain variation in a trial, you must record its most prominent causes. This implies that conducting a trial involves more than making observations on the treatment; you should also observe, record and quantify the non-treatment factors that influence it. Some observations only have to be made once (e.g. soil type, slope of the field), while others need to be recorded continuously (e.g. crop growth, disease incidence). That is why on-farm trials need frequent visits. However, many observations can be made and even recorded by the farmer. This requires good instruction and sometimes some training. Once farmers are capable of recording events and phenomena in the trial fields, you cannot wish for a better field officer than a farmer: after all, who else will look at the site 10 times a day? To facilitate farmers' observations, treatments should always be labelled and scientific names of varieties should be avoided (e.g. 'Bahati' instead of 'FHIA 3'). All observations should be recorded on standardised forms and entered into a spreadsheet as soon as possible; errors in data files often occur because of negligence and late processing. Make sure that your hard copies (recording forms) are legible by, and accessible to, fellow scientists. Keep a log-book with the meaning of codes and abbreviated variable names. Have the data entry checked, and checked again, by somebody who was not involved in the execution of the trial (e.g. your secretary). Keep copies of your data files on hard disk and floppy disk.
Farmer assessment
Involving farmers in the evaluation of a trial provides a lot of relevant information. It not only increases your knowledge about the treatment effects, but also helps to predict future adoption of the tested innovation by farmers. When assessing trials in diverse environments, remember to (1) assess the performance of the tested innovation(s) in multiple production environments (if relevant) and (2) include all stakeholders in the assessment. It may be necessary to group stakeholders according to their gender, interest and socio-economic position. The results of the assessment may then reveal group-specific preferences, which are extremely important for future adoption of the tested innovations.
Example: A bean variety trial was performed in a village with two major production environments. Included in the trial were small- and large-seeded beans, as well as white-seeded beans. For the farmer assessment, resource-poor and resource-rich farmers, female farmers and traders were invited and grouped accordingly. The results revealed that resource-poor farmers preferred a different bean variety from resource-rich farmers, owing to differences in their production environments. Female farmers preferred white-seeded beans because of their palatability and rejected one variety selected by male farmers. Traders selected a variety they considered suitable for export to other regions and urban centres. The results of the farmer assessment emphasised the need for a flexible recommendation of new bean varieties.
In most cases analysis of variance (ANOVA) is not the most appropriate technique to analyse the results of on-farm experiments. As Stroup et al. (1991) stated: "traditional methods simply cannot accommodate the complexity of on-farm trials". Data from experiments in diverse environments therefore require a specific approach to statistical analysis.
The environment-technology interaction
In reality, technologies interact strongly with the environment in which they are used. Once it is agreed that our trials should include a wide range of conditions, the best design and techniques of analysis have to be selected. Additional criteria for our approach include cost-effectiveness and rapid dissemination of results. Statistical techniques for on-farm trials must be analytic, because we need to understand the interaction of our technology with its environment and we often have no testable hypothesis about this interaction. We need intensive farmer participation and environment analysis to obtain continuous feedback from the field and to speed up adoption.
No existing approach completely satisfies these conditions. What we need is a hybrid approach which selects useful elements from different techniques. The approach presented here is inspired by Hildebrand (1984), Stroup et al. (1991) and Hildebrand and Russell (1994).
Adaptability Analysis is derived from Stability Analysis, which was developed during the 1960s and first published by Eberhart and Russell (1966).
Stability Analysis was applied by crop breeders to identify varieties producing a stable yield when grown in diverse environments. Although this was a great improvement, it still aimed at a blanket recommendation for all farmers. During the 1980s Hildebrand (1984) adjusted the technique and called it Modified Stability Analysis. He and Russell improved the technique again and now call it Adaptability Analysis: "The procedure, as we use it, is not related to stability but rather to adaptability of technologies to different environments and Socio-economic conditions" (Hildebrand and Russell, 1994).
Guidelines for statistical analysis of data from experiments in diverse environments
The statistical procedure presented here was extensively tested and further improved in the Lake Zone of Tanzania.
(1) Evaluate your data visually.
Print the data from the spreadsheet and simply look at them. Do you observe any obvious discrepancies or recording mistakes? Are there interesting cases and sites with peculiar data? In contrast to conventional statistics, do not exclude outliers (cases with extreme values), but correct only for errors (see 14.3a).
(2) Calculate simple statistics, such as the average, maximum, minimum and coefficient of variation (CV), per treatment. Low CV values indicate that the response of the treatment is stable across diverse environments, but they may also indicate that the environments were not so heterogeneous after all. High CV values indicate strong interaction of the treatment with the environments. Some important basic statistics:
Mean/Average: $\bar{X} = \sum X_i / n$
Coefficient of Variation (CV): the standard deviation expressed as a percentage of the grand mean, $CV = (S/\bar{X}) \times 100\%$
ANOVA (Analysis of Variance): the method and the layout of the ANOVA table differ depending on the design used (e.g. RCBD, split-plot design, Latin square).
Example:
Source       Degrees of freedom   Sum of squares   Mean square   F value
Treatments   2                    24               12.0          0.83
Error        4                    58               14.5
Total        6                    82
Example: Make an ANOVA with the following data.
          I     II    III    Totals
A         22    17    22
B         17    23    15
C         21    26    20
Totals
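A Python sketch of the calculations for this exercise follows, assuming that columns I-III are blocks of a randomised complete block design (the exercise does not state this explicitly); the function name rcbd_anova is hypothetical. Under that assumption the treatment and error lines reproduce the example table above (SS 24 and 58, MS 12.0 and 14.5, F = 0.83), while blocks account for a further SS of 14 with 2 degrees of freedom, so the full RCBD total is SS 96 with 8 degrees of freedom.

```python
# Exercise data: treatments A-C (rows) in blocks I-III (columns).
DATA = {
    "A": [22, 17, 22],
    "B": [17, 23, 15],
    "C": [21, 26, 20],
}


def rcbd_anova(data):
    """Degrees of freedom, sums of squares, mean squares and F values for an RCBD."""
    rows = list(data.values())
    t, r = len(rows), len(rows[0])                  # treatments, blocks
    grand = sum(map(sum, rows))
    cf = grand ** 2 / (t * r)                       # correction factor
    ss_total = sum(x * x for row in rows for x in row) - cf
    ss_treat = sum(sum(row) ** 2 for row in rows) / r - cf
    block_totals = [sum(col) for col in zip(*rows)]
    ss_block = sum(b ** 2 for b in block_totals) / t - cf
    ss_error = ss_total - ss_treat - ss_block
    df = {"Blocks": r - 1, "Treatments": t - 1, "Error": (r - 1) * (t - 1), "Total": t * r - 1}
    ss = {"Blocks": ss_block, "Treatments": ss_treat, "Error": ss_error, "Total": ss_total}
    ms = {k: ss[k] / df[k] for k in ("Blocks", "Treatments", "Error")}
    f = {k: ms[k] / ms["Error"] for k in ("Blocks", "Treatments")}
    return df, ss, ms, f


df, ss, ms, f = rcbd_anova(DATA)
for source in ("Blocks", "Treatments", "Error", "Total"):
    line = f"{source:<11} {df[source]:>2} {ss[source]:>6.1f}"
    if source in ms:
        line += f" {ms[source]:>6.2f}"
    if source in f:
        line += f" {f[source]:>5.2f}"
    print(line)
```

The F value for treatments (0.83) is far below the tabulated F value for 2 and 4 degrees of freedom at the 5% level (6.94), so the treatments do not differ significantly.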
(3) Perform a one-way analysis of variance with a Tukey-B (Tukey's b) procedure to test for group differences. This technique reveals whether there are groups of treatments that differ significantly from each other. For example, group A, composed of three treatments, may have significantly lower yields than group B, composed of two treatments.
(4) Calculate the site averages and CVs for the dependent variable (e.g. yield, pest incidence, damage). The CV of the site averages indicates how diverse your environments were. The following CV values and their interpretation can be used as a rule of thumb; note, however, that the maximum acceptable CV differs among parameters and with the precision required.
Environment CV   Interpretation
0-10%            Uniform environment
10-30%           Moderately diverse environment
30-50%           Diverse environment
50-100%          Very diverse environment
100-200%         Extremely diverse environment
> 200%           Chaos
(5) Calculate the Environmental Index (i.e. the site average).
The site average can be regarded as a measure of the suitability of a site for a specific crop, type of livestock, pest, etc. This value is called the Environmental Index (EI). Low EI values indicate that the site (or that specific environment) was not suitable for that crop, type of livestock or pest; high EI values reflect favourable conditions. Create a new variable called EI in your spreadsheet, containing the average value per site.
Table 14.4 Environmental Index (site averages) of bean foliage beetle (Ootheca spp.) incidence and its interpretation
Site     EI (number of beetles per 250 cm²)   Interpretation
Site 1   7                                    The site is favourable for beetles
Site 2   1                                    The site is unfavourable for beetles
Site 3   0                                    The site is very unfavourable for beetles
Site 4   20                                   The site is very favourable for beetles
Site 5   80                                   The site is extremely favourable for beetles
Site 6   3                                    The site is moderately favourable for beetles
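The Python sketch below applies the rule of thumb from step (4) to the site averages of Table 14.4. It uses the sample standard deviation (divisor n - 1) and, as an assumption, assigns a CV that falls exactly on a class limit to the lower class; the name classify_environment is hypothetical. For these six sites the CV of the site averages is about 168%, i.e. an extremely diverse environment for beetles.

```python
import statistics

# Table 14.4: Environmental Index (beetles per 250 cm2) for sites 1-6.
EI = [7, 1, 0, 20, 80, 3]

# Rule of thumb from step (4): upper CV limit (%) of each class and its interpretation.
CV_CLASSES = [
    (10, "Uniform environment"),
    (30, "Moderately diverse environment"),
    (50, "Diverse environment"),
    (100, "Very diverse environment"),
    (200, "Extremely diverse environment"),
]


def classify_environment(cv_percent):
    """Interpretation of a CV of site averages; values on a class limit go to the lower class."""
    for upper, label in CV_CLASSES:
        if cv_percent <= upper:
            return label
    return "Chaos"


cv = 100 * statistics.stdev(EI) / statistics.mean(EI)
print(f"CV of site averages = {cv:.0f}% -> {classify_environment(cv)}")
```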
(6) For each treatment, plot the treatment value (y) as a function of the Environmental Index (x) and perform a linear regression analysis (y = ax + b). Test the goodness of fit by calculating R² and its significance. The slope (a) of each equation indicates the extent of interaction of the treatment with the environment (flat slope: little interaction; steep slope: strong interaction). The flatter the slope, the more stable the treatment across environments. The intercept (b), the predicted value at EI = 0, reflects the performance of the treatment in unfavourable environments.
Figure 14.1 Plot of treatment A as a function of Environmental Index and results of linear regression analysis
Treatment   Equation                  R² and significance
A           Yield A = 0.86 EI + 120   87% and p<0.001
B           Yield B = 0.56 EI + 200   76% and p<0.01
C           Yield C = 1.10 EI - 80    57% and p<0.05
D           Yield D = 1.23 EI + 10    68% and p<0.005
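A minimal Python sketch of the regression in step (6), using ordinary least squares. The helper fit_line is hypothetical, and the EI and yield values in the usage example are invented for illustration only; they are not data from the trial. The significance of R² (or of the slope) would still have to be tested, for example with an F-test on 1 and n - 2 degrees of freedom, which is not shown.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx                      # slope: response to a better environment
    b = mean_y - a * mean_x            # intercept: predicted value at EI = 0
    r_squared = sxy ** 2 / (sxx * syy)
    return a, b, r_squared


# Hypothetical EI values and yields of one treatment at eight sites (illustration only).
ei = [150, 220, 310, 380, 450, 520, 600, 680]
yield_a = [250, 300, 390, 430, 520, 560, 640, 700]

a, b, r2 = fit_line(ei, yield_a)
print(f"Yield = {a:.2f} EI + {b:.0f}   (R2 = {100 * r2:.0f}%)")
```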
(7) Plot all treatments as a function of the Environmental Index in one diagram, and interpret the equations in relation to each other.
Figure 14.2 Plot of yield of three treatments (A, B and C) as a function of Environmental Index
In the above figure, treatment A has the strongest interaction with the production environment. In unfavourable environments the treatment does not perform well; however, when production conditions improve, its yields also improve. Treatment B has a more stable response. It performs rather well under unfavourable conditions, but it does not respond as well as treatment A to improvements.
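When interpreting two regression lines together, it helps to know the Environmental Index at which they cross, because that is where the ranking of the treatments changes. The sketch below solves a1·EI + b1 = a2·EI + b2 for the equations of treatments A and B listed with Figure 14.1; the function name crossover is hypothetical, and this calculation is an aid to interpretation rather than a step prescribed by the module.

```python
def crossover(line1, line2):
    """EI at which two lines y = a*EI + b predict the same value (None if parallel)."""
    (a1, b1), (a2, b2) = line1, line2
    if a1 == a2:
        return None
    return (b2 - b1) / (a1 - a2)

treatment_a = (0.86, 120)   # Yield A = 0.86 EI + 120
treatment_b = (0.56, 200)   # Yield B = 0.56 EI + 200
ei_cross = crossover(treatment_a, treatment_b)
# Below this EI treatment B gives the higher value; above it treatment A does.
print(f"Lines of A and B cross at EI = {ei_cross:.0f}")
```

If the crossing point lies outside the range of EI values actually observed, one treatment was better at every site tested, and no trade-off between the two needs to be explained to farmers.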
(8) Explain the variation in the Environmental Index.
Because you recorded many observations on the trial sites and on events that occurred during the trial, it is now possible to explain why certain sites are more or less favourable than others. The most suitable technique is multiple regression analysis, in which EI is the dependent variable and measured (numeric) environmental and management variables may explain its variation (EI = ax + bz + … + e). Be careful with the independent variables: they should be measured on an ordinal or ratio scale. For example, soil types are nominal ('sand', 'clay') and cannot be included. However, if they are categorised according to their texture (1 = light, 10 = heavy), they are suitable as independent variables in the analysis. The independent variables should also not be strongly related to each other. For example, you cannot include both organic matter content and available nitrogen in one multiple regression analysis, because the two are closely linked.
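A plain-Python sketch of this step is given below, assuming every variable has already been coded numerically with one value per site. It first flags pairs of variables that are strongly correlated (the threshold |r| > 0.8 is an illustrative assumption), then fits ordinary least squares through the normal equations and reports the cumulative R² as variables are added in a fixed order, the layout used in Table 14.5. The variable names and data are hypothetical; a statistics package would normally also report the significance of each step and choose the order of entry for you.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two variables."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols_r2(columns, y):
    """R^2 of the least-squares fit of y on the given predictor columns plus a constant."""
    n, k = len(y), len(columns) + 1
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Augmented normal equations [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    my = mean(y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - my) ** 2 for v in y)
    return 1 - ss_res / ss_tot

# Hypothetical site data: EI (kg/ha) and three numerically coded variables
ei = [900, 1800, 1300, 700, 2100, 1600, 1200, 800]
variables = {
    "beetle incidence": [20, 5, 12, 30, 2, 8, 15, 25],
    "soil moisture": [18, 22, 12, 10, 25, 14, 20, 16],
    "years after fallow": [2, 4, 1, 3, 5, 1, 4, 2],
}

# Flag strongly related pairs before fitting (none are flagged for these data)
for (n1, x1), (n2, x2) in combinations(variables.items(), 2):
    r = pearson(x1, x2)
    if abs(r) > 0.8:
        print(f"Warning: {n1} and {n2} are strongly correlated (r = {r:.2f})")

names = list(variables)
cumulative = [ols_r2([variables[v] for v in names[:j]], ei) for j in range(1, len(names) + 1)]
for name, r2 in zip(names, cumulative):
    print(f"{name:20s} cumulative R2 = {100 * r2:.1f}%")
```

Because each model contains the previous one, the cumulative R² can only stay equal or increase as variables are added; what matters is whether each increase is large enough to be significant.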
Table 14.5 Results of multiple regression on EI of beans
Independent variables    Cumulative R² (%)    Significance
Beetle incidence         34                   < 0.0001
Soil moisture            56                   < 0.0001
Date of sowing           63                   < 0.0005
Years after fallow       69                   < 0.005
Frequency of weeding     81                   < 0.01
Interpretation: Five variables explain 81% of the variation in the suitability of the environment for beans. Some variables are related to the biotic environment (beetle incidence), others to the abiotic environment (soil properties) or to the interventions of the farmer. If variables do not meet the criteria for multiple regression, you can still analyse their interaction with the environment, but not in relation to other variables. For example, if the production environments comprised two soil types, you can apply a t-test to analyse treatment differences and EI values between the two soil categories. You may conclude that one soil type provides a better environment for a crop or pest than the other.
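As an illustration of the t-test mentioned above, the sketch below computes Student's two-sample t statistic with a pooled variance for EI values grouped by soil type. The soil grouping and EI values are hypothetical. The statistic is compared with a critical value taken from a t-table (2.447 for 6 degrees of freedom at the 5% level, two-sided); that constant applies only to this sample size.

```python
import math
from statistics import mean, variance

def two_sample_t(x, y):
    """Student's t statistic for two independent samples, assuming equal variances."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# Hypothetical EI values (kg/ha) of sites on two soil types
ei_clay = [1800, 2100, 1950, 2300]
ei_sand = [1200, 1500, 1350, 1100]
t, df = two_sample_t(ei_clay, ei_sand)
T_CRITICAL_5PCT_DF6 = 2.447  # two-sided 5% value from a t-table, valid for df = 6 only
print(f"t = {t:.2f}, df = {df}, significant at 5%: {abs(t) > T_CRITICAL_5PCT_DF6}")
```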
Exercise: Calculate the EI of the following treatment yields. Plot the individual treatments as a function of EI in one figure. Interpret them in relation to each other.
Site     Treatment   Yield (kg/ha)   Calculated EI
Site 1   1           2500
         2           1600
         3           1500
Site 2   1           3000
         2           2100
         3           4000
Site 3   1           2500
         2           1600
         3           2500
Site 4   1           3500
         2           2600
         3           5500
Exercise: Interpret the following treatment-environment interactions
[Figure: four panels plotting treatment yield against the Environmental Index for treatments A, B and C]
(9) Translate your findings into flexible recommendations for farmers.
A flexible recommendation takes into account the diverse production environments as well as the differences in farmers' objectives. Flexible recommendations are formulated as 'if you want …, then …' or 'if you have …, then …'. They give farmers the choice to select the treatment that best suits their objectives and conditions. Your role as a researcher is to provide the information a farmer needs to make that choice. You can disseminate your flexible recommendations through posters and leaflets, provided your farmers are literate, or through farmer training.
Table 14.6 Flexible recommendations for cassava varieties in the medium and low rainfall zones of Bukoba District
High yield
To intercrop
If you want ………….. Plant ……
Rushura or Mulundi
Mulundi or Nigeria
But do not plant ……
Msitu zanzibar or Aipin valenca
An early harvest
Big roots
To consume leaves
To consume fresh roots
To consume cooked roots
To consume ugali
To sell your roots
Msitu zanzibar or Aipin valenca Mulundi or Rushura
Nigeria or Mulundi Aipin valenca or Msitu Zanzibar
Mulundi Nigeria or Msitu zanzibar
Mulundi Nigeria
Aipin valenca or Msitu zanzibar Mulundi
Mulundi
Mulundi
Nigeria or Msitu zanzibar
Msitu zanzibar
Background
Maize cultivation in Karagwe District, Kagera Region, is a recent development. The current maize gene pool is extremely heterogeneous. Most varieties have a long growing cycle of more than 130 days and are susceptible to streak virus, which limits maize production to one season. Farmers plant maize in the homegarden or in annual crop fields with beans and groundnuts. Maize planting densities are very low. After a diagnostic survey in Karagwe District, the Farming Systems Research team concluded that:
Maize is increasing in importance as a food crop, especially for low-resource households, as a result of a serious decline in banana production.
Maize is a potential cash crop in remote areas.
Local maize varieties have low yield potential and long growing cycles.
Farmers' practices and knowledge of maize growing do not favour high production levels.
A maize improvement research programme was started in three villages that represent different agro-ecological zones. In each village, close collaboration was established with groups of farmers interested in research. These farmer research groups (FRG) provide trial farmers and serve as platforms for discussion and feedback. Trials were established to test new maize varieties, fertiliser application levels and some crop husbandry practices.
Materials and methods
The maize variety trial was conducted with seven farmers in each village. These farmers represented different social strata, and their fields were selected so as to include all relevant maize production conditions. The varieties tested were Kito (short cycle), TMV-1 (short cycle), STAHA (medium cycle), Kilima (long cycle) and UCA (long cycle). No typical farmer variety could be identified because of the high genetic diversity; it was assumed that UCA shares most characteristics of the local varieties. Planting date and population density were standardised per agro-ecological zone. At four weeks, 25 kg N/ha was applied in the form of CAN, and when farmers observed stalk borers, Thiodan dust was used to control them.
Rainfall amount and distribution were recorded for each village. Field data collected included soil type, soil depth, soil colour, stoniness and slope. Field management data included the age of the field, previous crops, the number of trees and ant hills in and around the field, and dates of weeding.
Crop data included days to flowering, days to maturity, yield and yield components, plant height, grain moisture, and damage due to birds, vermin and diseases. Household and gender characteristics were obtained through informal interviews. Farmer assessment of the trial was carried out by visiting all sites with three male and three female farmers. Varieties were compared using pair-wise ranking, and farmers indicated the reasons for their preferences.
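Pair-wise ranking can be tallied as sketched below: every pair of varieties is presented once, the farmer names the preferred one, and each variety scores one point per pair it wins. The choices are hypothetical (they are not the results of the Karagwe assessment), and scoring by number of wins is an assumption about how the ranking is summarised.

```python
from collections import Counter
from itertools import combinations

def pairwise_ranking(items, choose):
    """Compare every pair once; an item's score is the number of comparisons it wins."""
    wins = Counter({item: 0 for item in items})
    for a, b in combinations(items, 2):
        wins[choose(a, b)] += 1
    # Most wins first; sorted() is stable, so ties keep the listing order
    ranking = sorted(items, key=lambda item: -wins[item])
    return ranking, dict(wins)

varieties = ["Kito", "TMV-1", "STAHA", "Kilima", "UCA"]
# Hypothetical choices of one farmer for each of the 10 pairs (not trial data)
choices = {
    frozenset({"Kito", "TMV-1"}): "TMV-1",
    frozenset({"Kito", "STAHA"}): "Kito",
    frozenset({"Kito", "Kilima"}): "Kilima",
    frozenset({"Kito", "UCA"}): "Kito",
    frozenset({"TMV-1", "STAHA"}): "TMV-1",
    frozenset({"TMV-1", "Kilima"}): "TMV-1",
    frozenset({"TMV-1", "UCA"}): "TMV-1",
    frozenset({"STAHA", "Kilima"}): "Kilima",
    frozenset({"STAHA", "UCA"}): "UCA",
    frozenset({"Kilima", "UCA"}): "Kilima",
}
ranking, wins = pairwise_ranking(varieties, lambda a, b: choices[frozenset({a, b})])
print(ranking)
print(wins)
```

Recording the farmer's reason next to each choice is as important as the score, because the reasons feed directly into flexible recommendations.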
Results
Statistical analysis showed that the average maize yield was 3490 kg/ha. The varieties Kito, TMV-1 and STAHA did not differ in average yield, and neither did the varieties Kilima and UCA. The two groups of varieties differed significantly when a one-way analysis of variance was applied (Table 14.7). Kilima and UCA also obtained higher minimum and maximum yields than Kito, TMV-1 and STAHA.
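The F test behind a one-way analysis of variance can be sketched as follows. The yields are hypothetical values for two groups of varieties, each observed on four plots, and the critical value 5.99 is the 5% F-table value for (1, 6) degrees of freedom. The B-Tukey multiple-comparison test reported in Table 14.7 is not reproduced here.

```python
from statistics import mean

def one_way_anova(groups):
    """F statistic of a one-way analysis of variance, with its degrees of freedom."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical yields (kg/ha) of two groups of varieties, four plots each
short_cycle = [2500, 3100, 2800, 3000]
long_cycle = [3900, 4300, 4100, 4500]
f, df1, df2 = one_way_anova([short_cycle, long_cycle])
F_CRITICAL_5PCT = 5.99  # F-table value for (1, 6) degrees of freedom at the 5% level
print(f"F = {f:.1f} with ({df1}, {df2}) df; significant at 5%: {f > F_CRITICAL_5PCT}")
```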
Table 14.7 Results of one-way analysis of variance with B-Tukey test
Treatment   Average yield (kg/ha)   CV (%)   Minimum yield (kg/ha)   Maximum yield (kg/ha)
Kito        2690                    51       1090                    4980
TMV-1       3170                    32       1310                    5770
STAHA       3170                    32       1330                    4960
Kilima      4170                    27       1950                    6670
UCA         4080                    29       2130                    6690
Source: Maize variety trial in Karagwe District, 1996
All varieties showed high coefficients of variation (CV) of above 25% suggesting that the response of the varieties to their production environments was not uniform. To investigate this interaction the average yield per site was calculated (Table 14.8). This showed that there was a very pronounced variation in average yields between the sites, ranging from 2510 to 5830 kg/ha.
Table 14.8 Average maize yield per site
Site number   Average yield (kg/ha)
Site 1        2940
Site 2        2600
Site 3        2510
Site 4        3030
Site 5        3890
Site 6        3790
Site 7        4480
Site 8        2220
Site 9        3240
Site 10       4930
Site 11       5830
Site 12       3140
Site 13       2770
Site 14       3320
Site 15       3960
Source: Maize variety trial in Karagwe District, 1996
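The coefficient of variation used above expresses the standard deviation as a percentage of the mean, so varieties with different average yields can be compared. A minimal sketch, assuming the sample standard deviation (n − 1 denominator) and two hypothetical sets of site yields, one variable and one stable:

```python
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation: sample standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)

# Hypothetical yields (kg/ha) of two varieties across four sites
variable_variety = [1500, 2500, 3500, 4500]
stable_variety = [2800, 3000, 3200, 3000]
for name, ys in [("variable", variable_variety), ("stable", stable_variety)]:
    print(f"{name}: mean = {mean(ys):.0f} kg/ha, CV = {cv_percent(ys):.1f}%")
```

Both varieties have the same mean; only the CV reveals that the first responds strongly to the site.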
The average yield per site indicates the suitability of each site for maize production. It can also be called the Environmental Index (EI). Low EI values indicate environments which are less favourable for maize cultivation, while high EI values reflect more favourable environments.
Varieties do not have to react in a uniform way to the EI. Some varieties may respond better than others in a favourable or unfavourable environment. To investigate this interaction, linear regressions were performed of each variety on the EI. Table 14.9 shows that the resulting yield equations differ considerably between the varieties. The equations are plotted in Figure 14.3.
Table 14.x9 Results of linear regression of varieties on EI
Variety   Yield equation         R² (%)
Kito      Y = 0.911*EI - 406     48
TMV-1     Y = 0.927*EI - 22      56
STAHA     Y = 0.951*EI - 208     64
Kilima    Y = 0.896*EI + 1079    43
UCA       Y = 1.251*EI - 229     75
Source: Maize variety trial in Karagwe District, 1996
Figure 14.3 Interaction between maize varieties and maize environmental index (yield in kg/ha against Environmental Index in kg/ha; lines for UCA, Kilima, TMV-1, STAHA and Kito)
Source: Maize variety trial in Karagwe District, 1996
Figure 14.3 clearly shows the differences between the varieties in their response to various production environments. Kito, TMV-1 and STAHA hardly differed in their response, and their regression lines stay close to each other along the whole trajectory. The lines of Kilima and UCA are situated higher in the graph. Kilima responded better than any other variety to unfavourable environments, while UCA had the highest yield potential.
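The same reading of Figure 14.3 can be made numerically from the yield equations of Table 14.x9: evaluate every equation at a given EI, pick the highest, and find the EI where the UCA line overtakes the Kilima line. This is a sketch of that arithmetic with hypothetical function names; the Kilima slope is read as 0.896. Predictions are only meaningful within the range of site averages observed (Table 14.8), and the R² of the equations (43 to 75%) shows that they describe trends, not exact yields.

```python
# Yield equations Y = a*EI + b (kg/ha) from Table 14.x9, Karagwe maize trial, 1996
EQUATIONS = {
    "Kito": (0.911, -406),
    "TMV-1": (0.927, -22),
    "STAHA": (0.951, -208),
    "Kilima": (0.896, 1079),  # slope read as 0.896
    "UCA": (1.251, -229),
}

def predicted_yield(variety, ei):
    a, b = EQUATIONS[variety]
    return a * ei + b

def best_variety(ei):
    """Variety with the highest predicted yield at a given Environmental Index."""
    return max(EQUATIONS, key=lambda v: predicted_yield(v, ei))

# EI above which the UCA line lies above the Kilima line
(a_kil, b_kil), (a_uca, b_uca) = EQUATIONS["Kilima"], EQUATIONS["UCA"]
ei_switch = (b_kil - b_uca) / (a_uca - a_kil)

for ei in (2500, 3500, 5000):
    print(f"EI {ei} kg/ha: highest predicted yield from {best_variety(ei)}")
print(f"UCA overtakes Kilima at an EI of about {ei_switch:.0f} kg/ha")
```

The switch point of roughly 3685 kg/ha lies inside the observed range of site averages, which is why both Kilima and UCA earn a place in the recommendations.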
To explain the variation in EI a multiple regression was done with all non-nominal independent variables measured during the experiment. The results (Table 14.9) show that six variables explained almost 60% of the variation in EI. These variables belong to the (a) abiotic environment (soil texture, stoniness, soil depth), (b) biotic environment (pest incidence) and (c) farmer practices (years after fallow, timing of first weeding).
Table 14.9 Results of multiple regression on EI
Variables                      Cumulative R² (%)
Soil texture                   18.6
Pest incidence                 37.7
Stoniness                      48.7
Number of years after fallow   51.5
Timing of first weeding        53.5
Soil depth                     58.6
Source: Maize variety trial in Karagwe District, 1996
Farmer assessment was carried out at harvest time. Farmers evaluated not only yield performance but also other characteristics, such as length of cycle, suitability for roasting, taste and suitability for sale. Table 14.10 presents the results of pooled ranking of the varieties per village. It shows that in all villages TMV-1 was the most appreciated of the varieties tested. Kilima, Kito and UCA came second in different villages. STAHA was least appreciated in two villages and UCA in one village.
Table 14.10 Farmer assessment results of pooled ranking of varieties by village
Variety
Kito
TMV-1
STAHA
Kilima
UCA
3
2
1
Village 1
4
5
1
3
4
Village 2
2
5
1
4
2
Village 3
3
5
5
9
7
Total score
9
15
Source: Maize variety trial in Karagwe District, 1996
Statistical analysis, farmer assessment and field observations made it possible to summarise the results of this experiment in flexible recommendations for maize varieties. Tables 14.11 and 14.12 give examples of such recommendations.
Table 14.11 Flexible maize variety recommendation: variety characteristics
Length of cycle (days)
Kito
90
Resistance to streak virus very low
Resistance to very low
Yield on poor soils
Yield on fertile soils
Suitability for roasting low medium very good
Suitability for ugali
Suitability for marketing medium not good
TMV-1
100 very good medium low medium good good medium
STAHA
110 good medium low medium medium very good medium
Kilima
120
UCA
140 low high high high not good low medium medium very high not good very good very good good good
These final results were promoted to farmers through posters and leaflets, made available at local village stockists and markets. In 1996 Bukoba farmers purchased 1000 kg of improved seed for the short rains season. The extension service of Bukoba District attributed this success to the diffusion of the flexible recommendations and the availability of seed of all tested varieties.
Now, village information centres called 'Educate Yourself' are being established and more posters are being distributed.
Table 14.12 Flexible maize variety recommendation: conditional use of varieties
If you …                                    Then sow …                   But don't sow …
Have a field with poor soil fertility       Kilima                       Kito, TMV-1 or STAHA
Have a field with good soil fertility       Kilima, STAHA or UCA         Kito or TMV-1
Have no time for early weeding              TMV-1, Kilima, UCA, STAHA    Kito
Need an early, average maize harvest        TMV-1                        Kilima or UCA
Want to produce early maize for roasting    Kito or TMV-1                Kilima or UCA
Want to produce maize for the market        Kilima or UCA                Kito
Want to intercrop in annual fields          Kilima, TMV-1, or STAHA      Kito
Want to plant maize in homegarden           Kilima or UCA                Kito
Source: Ndege and de Steenhuijsen Piters, 1995
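Flexible recommendations are essentially a lookup table, so they can also be encoded for a simple decision aid, for example on a village information centre computer. The sketch below transcribes Table 14.12 with hypothetical short labels for the conditions. How to combine several conditions is an assumption made for illustration (keep only varieties recommended under every condition and excluded under none); the table itself treats each condition separately.

```python
# Table 14.12 transcribed: condition -> (varieties to sow, varieties not to sow)
RECOMMENDATIONS = {
    "poor soil fertility": (["Kilima"], ["Kito", "TMV-1", "STAHA"]),
    "good soil fertility": (["Kilima", "STAHA", "UCA"], ["Kito", "TMV-1"]),
    "no time for early weeding": (["TMV-1", "Kilima", "UCA", "STAHA"], ["Kito"]),
    "early, average harvest": (["TMV-1"], ["Kilima", "UCA"]),
    "early maize for roasting": (["Kito", "TMV-1"], ["Kilima", "UCA"]),
    "maize for the market": (["Kilima", "UCA"], ["Kito"]),
    "intercrop in annual fields": (["Kilima", "TMV-1", "STAHA"], ["Kito"]),
    "maize in homegarden": (["Kilima", "UCA"], ["Kito"]),
}

def suitable_varieties(conditions):
    """Varieties recommended under every selected condition and excluded under none.

    The combination rule is an illustrative assumption, not part of Table 14.12.
    """
    candidates = None
    excluded = set()
    for condition in conditions:
        sow, avoid = RECOMMENDATIONS[condition]
        candidates = set(sow) if candidates is None else candidates & set(sow)
        excluded |= set(avoid)
    return sorted((candidates or set()) - excluded)

print(suitable_varieties(["poor soil fertility", "maize for the market"]))
print(suitable_varieties(["early maize for roasting", "no time for early weeding"]))
```

An empty result simply means that no single variety meets all the stated wishes, which is a prompt to discuss priorities with the farmer rather than an error.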
Discussion and conclusions
It is remarkable that researchers and farmers differed in their appreciation of maize varieties.
Researchers concluded that Kilima is a very suitable variety and that UCA has a high yield potential. Farmers unanimously selected TMV-1 as their preferred variety. The reason for this discrepancy is the lack of a market for maize in Karagwe District. Prices are so low that maize is cultivated almost exclusively for home consumption in times of banana scarcity. Obtaining maximum yield is thus not an issue for most farmers. More important is the length of the growing cycle in combination with a reasonable yield and good taste. TMV-1 is the best compromise among the varieties.
This preference for TMV-1 does not imply that no other varieties are needed. A minority of farmers have ways to sell their maize and appreciate Kilima or UCA. Some households will grow Kito to obtain some early maize when other crops are still maturing. Although there is a general preference for TMV-1, flexible recommendations are needed to satisfy the demands of all the various maize producers.
Farmer involvement proved to be crucial for obtaining a complete understanding of maize variety performance and of farmers' perspectives. In general, researchers tend to concentrate on few criteria of comparison and often have a bias towards yield. Farmers are more holistic in their evaluation and tend not to draw absolute conclusions. They see advantages in more than one variety or treatment and adopt them accordingly, because of the heterogeneity of their fields and their multiple production goals. This confirms the need for flexible recommendations that fit different production environments and different farmers' views and goals.
Production environments of maize proved to be very variable. High within-variety variation complicated the statistical analysis and 'blurred' the distinction between the varieties. Analysis of the Environmental Index, also called adaptability analysis, proved to be very effective in explaining variety yield performance within their heterogeneous environments.
The technique of analysis determines the conclusions that are drawn from a trial. This is clearly shown in Table 14.x. Analysis of variance would not have given satisfactory results because of the high yield variation of the individual varieties. A more complicated statistical analysis might have produced some results, but most agronomists would have repeated the trial. To avoid the same problem again, they would probably have increased their control over the trial by excluding the causes of variation.
Table 14.x Relationship between technique of analysis and type of conclusion
Technique of analysis    Conclusion
ANOVA                    Too much variation (high CVs of variety yields); repeat the trial and control the production conditions more
Stability analysis       Kilima is the most stable variety: it is recommended for diffusion
Farmer assessment        TMV-1 is the variety which combines a short cycle with acceptable yield. It meets the needs of most farmers
Adaptability analysis    All varieties have one or more characteristics which make them useful in specific environments
Source: Ndege and de Steenhuijsen Piters, in press
Stability analysis would have given good results, but the conclusion would have been that one variety, Kilima, is the most stable and should therefore be diffused to farmers. This would have conflicted with the selection made by the farmers, who prefer TMV-1. However, farmers tend to base their judgement on the status quo and often lack insight into future developments. Adaptability analysis describes the performance of different varieties in various environments, which makes planning for immediate and future distribution possible.
Adaptability analysis has, however, some limitations:
It has a bias towards yield and does not include other factors considered important by farmers and consumers
It is not a test, and one should be careful about attributing too much importance to small differences
Adaptability analysis is based on regression models, which estimate the relationship between an individual variety and its environment. These estimates need a large number of sites to obtain good accuracy.
Environments are characterised by calculating the average yield per site. The resulting Environmental Index is not independent of the individual variety yields, and this imposes some statistical limitations. However, Eberhard and Russell stated the following in 1966:
‘An index independent of the experimental varieties and obtained from environmental factors such as rainfall, temperature, and soil fertility would be desirable. Our present knowledge of the relationship of these factors and yield does not permit the computation of such an index. Until we can measure such factors in order to formulate a mathematical relation with yield, the average yield of the varieties in a particular environment must suffice’ (Eberhard and Russell, 1966).
At present, Eberhard and Russell’s statement is still valid. Agricultural research has not developed simple tools to characterise environmental factors, and average yield is still the best index available. However, progress has been made by diversifying statistical analysis and including farmers in research. This case study has shown that advances in the design and analysis of research have improved research efficiency. Fortunately, this did not make the methods of analysis more complicated. New approaches should be widely adoptable and should not become the domain of a few specialists. Impact on agricultural production will be achieved by adopting simple but creative tools which increase farmer participation and facilitate researcher understanding. Now that farmer involvement in research has been widely adopted, research should increase its efforts to develop methods which fit the new situation. Only by adopting new methods of design and analysis will researchers’ awareness of the need to involve farmers really improve research quality and effectiveness.
Eberhard, S.A. and W.A. Russell (1966). Stability parameters for comparing varieties. Crop Science 6: 36-40.
Fisher Box, J. (1978). R.A. Fisher: The Life of a Scientist. John Wiley & Sons, New York, Brisbane, Toronto.
Harris, J.A. (1920). Practical universality of field heterogeneity as a factor influencing plot yields. Journal of Agricultural Research 19 (7): 279-315.
Hildebrand, P.E. (1984). Modified Stability Analysis of Farmer Managed, On-farm Trials. Agronomy Journal 76: 271-274.
Hildebrand, P.E. and J.T. Russell (1994). Adaptability analysis for diverse environments. Paper presented at the American Society of Agronomy Meeting, Seattle, Washington.
Hildebrand, P.E. and J.T. Russell (1996). Adaptability Analysis: a method for the design, analysis and interpretation of on-farm research-extension. Iowa State University Press, Ames.
McBratney, A.B. (1992). On variation, uncertainty and informatics in environmental soil management. Australian Journal of Soil Research 30: 913-935.
Montgomery, E.G. (1913). Experiments in Wheat Breeding: experimental error in the nursery and variation in nitrogen and yield. U.S. Department of Agriculture, Bureau of Plant Industry Bulletin No. 269.
Ndege, L. and B. de Steenhuijsen Piters (1995). Farmers’ Choice: flexible maize recommendations in Kagera Region, Tanzania. Paper presented at the Fifth Regional Conference of the Southern African Association for Farming Systems Research-Extension, held 23-25 September 1996 at Arusha, Tanzania.
Ndege, L. and B. de Steenhuijsen Piters (in press). Environment-technology interaction: an approach to increase research efficiency. Paper submitted to African Crop Science Journal.
Paoletti, M.G. and D. Pimentel (eds) (1990). Biotic Diversity in Agroecosystems. Agriculture, Ecosystems and Environment, Special Issue. Papers from a Symposium on agroecology and conservation issues in tropical and temperate regions, 26-29 September 1990, Padova, Italy.
Salmon, S.C. and A.A. Hanson (1964). The principles and practices of agricultural research. Leonard Hill, London.
Smith, L.H. (1909). Plot arrangement for variety experiments with corn. Proceedings of the American Society of Agronomy 6: 84-89.
Steenhuijsen Piters, B. de (1995). Diversity of Fields and Farmers: explaining yield variations in Northern Cameroon. PhD thesis, Wageningen Agricultural University, Netherlands.
Student (1908). The Probable Error of a Mean. Biometrika 6 (1): 1-25.
Stroup, W.W., P.E. Hildebrand and C.A. Francis (1991). Farmers Participation for More Effective Research in Sustainable Agriculture. In: Staff Papers Series, Institute of Food and Agricultural Sciences, University of Florida.
Uven, M.J. van (1935). Mathematical Treatment of the Results of Agricultural and Other Experiments. Noordhoff N.V., Groningen-Batavia.
Vieira, S.R., J.L. Hatfield, D.R. Nielsen and J.W. Biggar (1982). Geostatistical theory and application to variability of some agronomical properties. Hilgardia 51 (3): 1-75.
Zimmerer, K.S. (1991). Managing diversity in potato and maize fields of the Peruvian Andes. Journal of Ethnobiology 11 (1): 23-49.