The problem of variation


National FSA Training Module 14: Statistical analysis

Module 14 Statistical analysis of on-farm trials

Objectives

The objective of the module is to enable trainees to understand:

Different approaches to statistical analysis of on-farm experiments

Environmental index and its interpretation for flexible recommendations

Simple calculation modules for different experimental designs.

Content

14.1 Background

14.2 Basic statistical analysis

14.3 Experimentation in diverse environments

14.4 Approaches to statistical analysis of on-farm experiments

14.5 An example: maize variety testing in Karagwe District

Key words

Diverse environment; environment-technology interaction; stability analysis; flexible recommendation; Environmental index; ANOVA


14.1 Background

In agricultural research, variation in the yield of crops and other agricultural products has long been observed, and researchers have been working on the problems associated with this variation since the early 1900s (Vieira et al., 1982). In the early days of experimental research at agricultural stations, the first of which was founded in France in 1834, followed by the Rothamsted Experiment Station in the United Kingdom in 1843 (Salmon and Hanson, 1964), undesired yield variation was not yet of any concern. Experimental plots were laid out systematically in the field, and treatment means were compared but not tested. Often one treatment performed so much better than the others that deviations from the mean seemed of minor importance and its superiority appeared obvious. As a consequence, the selected treatment could give unexpected results when the experiment was repeated or when it was applied under different environmental conditions. One of the first researchers to pay explicit attention to this problem of undesired variation in field experiments, L.H. Smith (1909), stated in reference to his variety experiments with corn that,

“The topic proposed embraces some vital questions, and it is one which should certainly be of great interest and importance to every agronomist who has to do with field experiments. The elimination or control of the variable factors, so essential in experimentation of any sort, becomes unusually difficult in field experiments where we have to deal with so many uncontrollable conditions”. The perception that these “uncontrollable conditions” were important was rather new. In this period, agronomists openly expressed their concern about the perturbations in their experiments, which caused numerous difficulties of interpretation. In 1913, after six years of experimentation with winter wheat, great variation was found in the check rows, even where conditions appeared quite uniform (Montgomery, 1913). Besides this observation, Montgomery also underlined the problem of how to handle the factors causing variation: “All things being equal, the yield of the 47 plots should have been the same. But all factors can never be equal, so in row-breeding work, owing to unequal environment, we must expect a wide degree of error. The only practical way so far suggested to overcome this error is to repeat the plots, according to some systematic method, enough times to equalise variations in soil or climatic effects” (Montgomery, 1913). Methods and techniques to deal with the undesired variation inherent in field experimentation were lacking, and the only known solution was repetition. The existence of factors causing variation in yield was known, but up to the early 1920s only a few writers on field experimentation sufficiently recognised their importance, and none adequately emphasised it (Harris, 1920).

The statistical solution

In mathematics, similar problems of probability, population distributions and errors of observation were the object of debate, especially in the fields of astronomy and evolutionary biology (Fisher Box, 1978). The journal Biometrika played an important role in this debate. In his article “The Probable Error of a Mean”, Student aimed to determine the significance of the mean of a series of experiments, especially when large numbers of repetitions were not feasible, as in agricultural experiments (Student, 1908). He underlined the importance of randomisation and introduced the assumption of normality for small samples, which he based on an empirical analysis of random samples drawn from a large population of finger measurements of 3000 criminals.

Perhaps Student's major contribution to the development of agricultural research was the introduction of the z-distribution (z being a function of the mean and the standard deviation). It was Fisher who provided the actual mathematical proof, leading to the acceptance of the z-distribution (later called the t-distribution) and of the normality assumption. Once the distribution had been tabulated and disseminated, testing the significance of means in field experimentation became possible. It still took some time before the use of tables, distributions and significance tests was generally adopted in agricultural research. Fisher's “Statistical Methods for Research Workers” may be regarded as the major breakthrough that led to the general acceptance of testing in experimentation (Fisher, 1925). Fisher called this method “analysis of variance (ANOVA)”, and Snedecor simplified and refined the distribution, calling it the F-distribution in honour of Fisher (Fisher Box, 1978). Analysis of variance and covariance were now ready for widespread use in experimental research.


The development of statistical methods provided agricultural researchers with immediate solutions to their problems of “uncontrollable conditions” and undesired variation. In the 1930s, testing based on distribution tables was widely adopted, giving an important stimulus to agricultural research. Undesired variation could now be eliminated from the results by following simple techniques. However, the uncritical way in which analysis of variance was applied in experimental fieldwork was questioned. “Many observers, in such a case, consult a manual offering them numerous prescriptions for calculation, often without a proof. They try to find out which case has the greatest affinity to their own, and then, fully relying on the error-doctor, meekly follow the given recipe. For such blindly submissive natures this book has not been written” (van Uven, 1935). Sceptics like van Uven could not prevent analysis of variance from being commonly adopted and applied in agricultural science, which standardised the treatment of any deviation from the mean. For the time being, variation was no longer of concern to agricultural scientists, and the debate on its treatment was closed.

On-farm experimentation and the need for unconventional statistics

Until the late 1970s, most agricultural research was carried out on-station, where conditions can be controlled and single treatments applied. With the adoption of FSA as an approach to improving the impact of agricultural research, on-farm experimentation gained in popularity.

Testing innovations under farmers' conditions and with farmers' participation proved to result in better adoption. However, one major constraint emerged: how do you analyse data from experiments in diverse environments? Unlike in on-station experiments, numerous factors affecting the experiment cannot be controlled. When analysis of variance is applied, the error term is very large and treatment effects rarely reach significance. Many scientists do not know how to deal with data from on-farm experiments and have become frustrated by the lack of significance. Their responses to the diverse variation in on-farm trials include:

- Increasing the number of repetitions (and thus the costs);

- Repeating the trial over several years (and thus delaying the dissemination of results);

- Controlling the farmers' conditions (and thus creating an artificial environment);

- Applying complicated ANOVA techniques (and thus alienating colleagues without advanced training in statistics);

- Seeking advice from a biometrician (and thus losing control and responsibility);

- Cooking up the anticipated results (and thus violating the ethics of science).

One conclusion drawn from the experience with on-farm trials was that new statistical techniques were needed to analyse data derived from this type of experimentation. Without specific techniques and procedures for on-farm trials, scientists' frustration and the lack of results would eventually lead to this type of research being abandoned.

Brainstorming exercise: Discuss with your fellow scientists how you were trained to analyse (on-farm) data and how you have dealt with variation in your work as an agricultural scientist.


14.2 Basic statistical analysis

Generally, data obtained from experiments or surveys consist of raw, unorganised sets of numbers. For these numbers to be meaningful, they must be summarised and analysed in such a way that pertinent information can be extracted and interpreted. Statistical analysis is the basis for doing this. Statistical analysis enables us to:

- Separate the effects of different treatments

- Determine the magnitude and significance of differences

- Provide information for subsequent statistical analysis

The form of statistical analysis normally depends on the objective of the study and on the type and design of the experiment (or survey). The basic aspects discussed here are:

1. Data scrutiny

2. Methods to summarise and present data

3. Calculation of descriptive measures

4. Overview of data from different experimental designs

This section assumes that trainees have taken a basic statistics course during their formal training; for further detail, readers are referred to standard textbooks and the relevant literature. Computers and statistical software have made statistical analysis simple, since the computer performs most of the tedious calculations. However, it is important for trainees to understand the process and to be able to interpret the results.

1. Data scrutiny

This is always the first step in processing data. It should be done as early as possible, while memories of the field situation and of the data collection procedures are still fresh. Inconsistencies in collected data can have a variety of causes, such as:

Incorrect measurement of data

Incorrect data transcription

Errors in trial implementation

Farm conditions

Treatment behaviour (which may differ from the researcher's expectations)

Table 14.1 Yield of sunflower varieties

Block   Variety A   Variety B   Variety C   Variety D
I       614         642         59          680
II      63          602         657         592
III     504         55          673         600
IV      628         616         645         583

Consider the hypothetical example in Table 14.1. Scrutiny of the data shows that:

- Yields are suspiciously low for variety C in Block I (59), variety A in Block II (63) and variety B in Block III (55).

- If these data are analysed as they stand, the analysis will show a very high CV.

- Checking the field record book showed that an error had been made in transcribing the data. The correct values were found to be 590, 630 and 550 respectively.

Thus it is very important to scrutinise the data before attempting any analysis.
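Scrutiny can be partly automated. The Python sketch below flags plots whose yield is far below the rest and shows how the coefficient of variation changes once the three transcription errors are corrected. The "less than half the median of all plots" rule and the function names are illustrative assumptions, not part of this module; a flag only marks a value for checking against the field record book, never for automatic deletion. The CV computed here is the simple CV of all 16 plots, not the CV from an ANOVA.

```python
from statistics import mean, median, stdev

# Table 14.1 as recorded, before checking the field record book.
# Keys are blocks; values map variety -> yield.
recorded = {
    "I":   {"A": 614, "B": 642, "C": 59,  "D": 680},
    "II":  {"A": 63,  "B": 602, "C": 657, "D": 592},
    "III": {"A": 504, "B": 55,  "C": 673, "D": 600},
    "IV":  {"A": 628, "B": 616, "C": 645, "D": 583},
}


def all_values(table):
    """Flatten a block -> variety -> yield table into one list."""
    return [y for row in table.values() for y in row.values()]


def cv_percent(values):
    """Coefficient of variation: sample standard deviation as % of the mean."""
    return stdev(values) / mean(values) * 100


def flag_suspicious(table, fraction=0.5):
    """Flag plots with a yield below `fraction` of the median of all plots.

    The 0.5 threshold is an illustrative assumption, not a rule from the module.
    """
    threshold = fraction * median(all_values(table))
    return [(block, variety, y)
            for block, row in table.items()
            for variety, y in row.items()
            if y < threshold]


flags = flag_suspicious(recorded)
print("Suspicious plots:", flags)
print(f"CV as recorded: {cv_percent(all_values(recorded)):.1f}%")

# Corrections confirmed in the field record book (a dropped trailing zero).
corrected = {block: dict(row) for block, row in recorded.items()}
corrected["I"]["C"] = 590
corrected["II"]["A"] = 630
corrected["III"]["B"] = 550
print(f"CV after correction: {cv_percent(all_values(corrected)):.1f}%")
```

Run as is, the three low values are flagged, and the CV falls from well over 40% to under 10% after correction.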


2. Methods to summarise and present data

Useful methods for presenting data are:

Frequency distribution

Cumulative frequency distribution

Histogram

Ogive

Bar charts

Pie diagrams

The numbers below are used to illustrate these techniques.

22 26 42 39 17 46 35 31 34 22 34 20 11 33

20 29 27 30 33 19 39 41 19 18 14 57 28 24

33 48 9 41 27 39 21 12 24 53 36 51 42 49

44 32 25 39 43 37 32 40 36 35 30 30 21

(i) Frequency distribution

This is a tabular presentation of data showing the number of observations (frequency) falling in each class interval. It is constructed as follows:

Determine the range (maximum and minimum values)

Divide the range into suitable intervals

Specify the class limits (lower and upper values of each class)

Select the class boundaries

Specify the class mark (mid-point of each class)

Count the number of observations in each class

Calculate the relative frequency (class frequency / total number of observations)

Table 14.2 Frequency distribution of weight gain of 55 pigs

Class limits   Class boundaries   Class mark   Class frequency   Relative frequency
6-10           5.5-10.5           8            1                 0.0182
11-15          10.5-15.5          13           3                 0.0545
16-20          15.5-20.5          18           6                 0.1091
21-25          20.5-25.5          23           7                 0.1273
26-30          25.5-30.5          28           8                 0.1455
31-35          30.5-35.5          33           10                0.1818
36-40          35.5-40.5          38           8                 0.1455
41-45          40.5-45.5          43           6                 0.1091
46-50          45.5-50.5          48           3                 0.0545
51-55          50.5-55.5          53           2                 0.0364
56-60          55.5-60.5          58           1                 0.0182

(ii) Cumulative frequency

A cumulative frequency distribution gives the number of observations whose value is less than a given value.

Example

Weight gain            <6   <11  <16  <21  <26  <31  <36  <41  <46  <51  <56  <61
Cumulative frequency   0    1    4    10   17   25   35   43   49   52   54   55
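The construction steps can be sketched in Python for the 55 values listed above. This minimal sketch assumes integer data and 11 classes of width 5 starting at 6, as in Table 14.2; the function name frequency_table is hypothetical. It computes class limits, boundaries, marks, frequencies, relative frequencies and cumulative frequencies in one pass, so the counts can be compared with the tables.

```python
# The 55 weight-gain values listed above.
weights = [
    22, 26, 42, 39, 17, 46, 35, 31, 34, 22, 34, 20, 11, 33,
    20, 29, 27, 30, 33, 19, 39, 41, 19, 18, 14, 57, 28, 24,
    33, 48, 9, 41, 27, 39, 21, 12, 24, 53, 36, 51, 42, 49,
    44, 32, 25, 39, 43, 37, 32, 40, 36, 35, 30, 30, 21,
]


def frequency_table(values, lower, width, n_classes):
    """Tabulate integer data into classes of equal width."""
    rows = []
    cumulative = 0
    n = len(values)
    for k in range(n_classes):
        lo = lower + k * width          # lower class limit
        hi = lo + width - 1             # upper class limit (integer data)
        freq = sum(lo <= v <= hi for v in values)
        cumulative += freq
        rows.append({
            "limits": (lo, hi),
            "boundaries": (lo - 0.5, hi + 0.5),
            "mark": (lo + hi) / 2,      # class mark (mid-point)
            "freq": freq,
            "rel_freq": freq / n,
            "cum_freq": cumulative,     # number of values below hi + 1
        })
    return rows


table = frequency_table(weights, lower=6, width=5, n_classes=11)
for r in table:
    print(f"{r['limits'][0]:>2}-{r['limits'][1]:<2}  "
          f"{r['boundaries'][0]}-{r['boundaries'][1]}  {r['mark']:>4}  "
          f"{r['freq']:>2}  {r['rel_freq']:.4f}  {r['cum_freq']:>2}")
```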


(iii) Histogram

A histogram is a graphic presentation of a frequency distribution, constructed by erecting rectangles on the class intervals.

[Figure: histogram of the frequency distribution; x-axis: class interval]

(iv) Ogive

An ogive is a graphic presentation of a cumulative frequency distribution.

[Figure: ogive of the cumulative frequency distribution; x-axis: class interval]

(v) Bar charts

A bar chart is a graphical presentation in which bars show the amount of each constituent, or its proportion of the total amount of the item being considered.

(vi) Pie diagram

A pie diagram is a circular presentation of statistical data showing the relative size of the component parts of the total.

[Figure: pie diagram with four segments of 43%, 30%, 16% and 11%]

3. Calculation of descriptive measures

Calculation of descriptive measures is the next step in transforming raw data into useful information. These measures include:

a. Measures of central tendency (mean, median, mode)

b. Measures of dispersion (range, variance, standard deviation)

With the computers and software now available, calculating these statistics is no longer a problem. What matters more for trainees is knowing how to apply these measures.

a. Measures of central tendency

Mean (simple arithmetic mean): the sum of the observed values divided by the number of observations:

$$\bar{X} = \frac{\sum X_i}{n}$$

Weighted mean: used when the values being averaged differ in importance, for example when classes contain different numbers of observations:

$$\text{Weighted mean} = \frac{\sum w_i X_i}{\sum w_i}$$

Table 14.3 Maize yield in different agro-ecological zones

Agro-ecological zone   Average maize yield (kg/ha)   Average land size (ha)
I                      4500                          4
II                     4000                          12
III                    3800                          15

$$\text{Weighted mean} = \frac{(4 \times 4500) + (12 \times 4000) + (15 \times 3800)}{4 + 12 + 15} = \frac{123000}{31} \approx 3968 \text{ kg/ha}$$


The unweighted mean would be:

$$\bar{X} = \frac{4500 + 4000 + 3800}{3} = 4100 \text{ kg/ha}$$

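This shows that the unweighted mean gives a biased estimate of the average maize yield: it overestimates the yield because it gives zone I, with the smallest land area, as much weight as zone III, with the largest.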
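The same calculation in Python, as a minimal sketch using the figures of Table 14.3 with land size as the weight (the helper name weighted_mean is illustrative):

```python
# Table 14.3: (zone, average maize yield in kg/ha, average land size in ha)
zones = [("I", 4500, 4), ("II", 4000, 12), ("III", 3800, 15)]


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


yields = [y for _, y, _ in zones]
land_sizes = [a for _, _, a in zones]

weighted = weighted_mean(yields, land_sizes)   # land size as the weight
unweighted = sum(yields) / len(yields)         # every zone counts equally
print(f"Weighted mean:   {weighted:.0f} kg/ha")
print(f"Unweighted mean: {unweighted:.0f} kg/ha")
```

This prints 3968 kg/ha for the weighted mean and 4100 kg/ha for the unweighted mean.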

The median is the middle value (for an odd number of observations) or the arithmetic mean of the two middle values (for an even number of observations). The median divides a histogram into two parts of equal area. When data are skewed, the median may be more representative than the mean and provide a better description.

The mode is the value that occurs with the greatest frequency.

b. Measures of dispersion

Measures of dispersion describe the spread of the observed values and thus the precision of the measurements. They include:

Range: the difference between the largest and smallest values. It is the least suitable measure of variation, because it depends only on the two extreme values.

Variance (S²): a measure of the deviation of the observations from the mean:

$$S^2 = \frac{\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}}{n - 1}$$

Standard deviation (S): the square root of the variance:

$$S = \sqrt{S^2}$$

The standard deviation of the mean, S/√n, is also called the standard error.

Coefficient of variation (CV): the magnitude of experimental error expressed as a percentage of the mean:

$$CV = \frac{S}{\bar{X}} \times 100\%$$

In a well-conducted on-station trial, the CV should not exceed 10-15%. In on-farm trials, however, the CV can reach 30%. In such situations, field notes explaining the causes of variation are important to support the interpretation of the results.
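As an illustration, the sketch below computes the descriptive measures discussed above for the 55 weight-gain values used for Table 14.2, with Python's standard statistics and math modules. The variance uses the computational formula with n - 1 in the denominator (sample variance), so it can be cross-checked against statistics.variance; the variable names are illustrative.

```python
import math
import statistics

# The 55 weight-gain values listed under "Methods to summarise and present data".
gains = [
    22, 26, 42, 39, 17, 46, 35, 31, 34, 22, 34, 20, 11, 33,
    20, 29, 27, 30, 33, 19, 39, 41, 19, 18, 14, 57, 28, 24,
    33, 48, 9, 41, 27, 39, 21, 12, 24, 53, 36, 51, 42, 49,
    44, 32, 25, 39, 43, 37, 32, 40, 36, 35, 30, 30, 21,
]

n = len(gains)
mean = sum(gains) / n
# Computational formula for the sample variance:
# S^2 = (sum of X^2 - (sum of X)^2 / n) / (n - 1)
variance = (sum(x * x for x in gains) - sum(gains) ** 2 / n) / (n - 1)
sd = math.sqrt(variance)          # standard deviation S
se = sd / math.sqrt(n)            # standard error of the mean
cv = sd / mean * 100              # coefficient of variation in %

summary = {
    "mean": mean,
    "median": statistics.median(gains),
    "mode": statistics.mode(gains),
    "range": max(gains) - min(gains),
    "variance": variance,
    "standard deviation": sd,
    "standard error": se,
    "CV (%)": cv,
}
for name, value in summary.items():
    print(f"{name:>18}: {value:8.2f}")
```

For these data the median is 32, the mode is 39 (it occurs four times) and the range is 57 - 9 = 48.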

Setting the hypothesis and the levels of significance

A hypothesis is a statement about the values of one or more parameters of a population. There are two types of hypothesis, illustrated here for the means of two treatments, i and ii:

Null hypothesis: $H_0: \bar{X}_i = \bar{X}_{ii}$

Alternative hypothesis: $H_a: \bar{X}_i \neq \bar{X}_{ii}$


Commonly used levels of significance are 1% and 5%. In on-farm trials, however, significance levels of up to 10% are acceptable. When interpreting the data it is important to consider the following:

- The distinction between technical (practical) significance and statistical significance

- Conclusions should be of practical importance, and practical importance should guide the interpretation of the data.

Other statistical analysis methods

Analysis of data from two-sample and paired-sample techniques

Analysis of single-factor experiments (ANOVA and mean comparisons) in:

   Completely Randomised Design (CRD)

   Randomised Complete Block Design (RCBD)

   Latin Square Design

Analysis of Factorial Experiments

Regression analysis

Correlation analysis

Covariance Analysis

Combined Analyses

Stability Analysis

Detailed analysis procedures for these methods can be found in any standard statistical analysis reference book or manual.


14.3 Experimentation in diverse environments

During the 1980s and 1990s, agricultural scientists became aware that variation in on-farm trials is not merely due to random factors, but results from systematic and sometimes deliberate interaction between the treatment and the environment (de Steenhuijsen Piters, 1995). In other sciences, such as ecology, pedology and sociology, a comparable awareness emerged, and concepts such as biodiversity, kriging, actor-in-context and gender analysis were adopted. In the agricultural sciences, many efforts were made to understand the causes of variation. Some scientists concentrated on analysing the diverse environments in which agricultural production takes place (Harlan, 1975; Zimmerer, 1991; McBratney, 1992; Paoletti & Pimentel, 1992; de Steenhuijsen Piters, 1995). Others contributed to the development of new statistical techniques. The most prominent of these is undoubtedly Hildebrand, one of the founding fathers of FSA and, more recently, the developer of adaptability analysis. What these authors have in common is that they regard variation as a source of information.

Understanding the production environment

Causes of variation can be classified as natural deterministic, deliberate and random. The first category reflects the given situation at a certain moment, the second reflects human response or action and the third reflects chance, non-systematic events (de Steenhuijsen Piters, 1995).

Natural deterministic causes include the biotic, abiotic and climatic environment of production (e.g. soils, weeds, rainfall). Deliberate causes include the crop genotype and the farmer's management (practices, techniques and inputs). Random causes include non-systematic damage to the crop or livestock (e.g. elephants walking through a field, a thief shooting a cow).

Exercise: Farmers in Mbulu District have a problem with drought in maize: their maize frequently does not reach maturity because of erratic rainfall. You propose to test three new varieties under farmers' conditions. Five farmers volunteer and, at the onset of the rains, they sow the three new varieties and one local variety. You standardise fertiliser application at a rate and timing that is common practice among farmers.

Question: Which major sources of variation do you expect in this trial? Which are deterministic, deliberate or random?

Trial design

Conventional trials are designed to exclude as many causes of variation as possible. However, when the objective is to test a potential innovation under real conditions, the trial must be designed to include as many relevant causes of variation as possible. If we excluded systematic causes of variation from our trial, how could we understand the response of our innovation when it is confronted with them? Nevertheless, one should not include more variation than necessary.

Random variation is not of interest because it is not predictable. Irrelevant environments should also be left out: well-drained sandy soils are of no interest where irrigated rice is concerned. So include only those production environments that are relevant for a specific crop or type of livestock. A short survey and interviews with farmers are helpful tools to identify these relevant production environments. Because production environments are often linked to the type of farmer, different types of farmers must be included in your trial.

Example: In Bukoba District, farmers grow maize in homegardens as well as in annual crop fields. Results from a short survey and interviews with farmers indicate that homegardens can be categorised into two types according to their soil fertility ('poor' and 'fertile'). Annual crop fields can be categorised into three types ('exhausted', 'after fallow' and 'new'). Farmers explain that resource-poor households have 'poor' homegardens and mainly grow their maize on fields after fallow. Resource-rich farmers grow maize on new annual crop fields. Female farmers grow maize in any type of homegarden or, if their husbands do not allow this, on annual crop fields. These results indicate that there is a systematic relationship between production environment and type of farmer. Six major production environments can be identified, and a maize trial should include all of them. Because there is some interaction with the sex of the producer, it is advisable to include both male and female farmers. The proposed trial will therefore include 12 trial sites.


If you plan to apply adaptability analysis (see 14.5), no repetitions are needed within a site (farmer); instead, the number of farmers in different environments can be increased. If you include all relevant production environments, you do not even have to repeat your trial (Hildebrand and Russell, 1994). It is advisable not to make plots too small, to limit losses due to random damage.

Recording observations

In order to explain variation in a trial, you must record its most prominent causes. This implies that conducting a trial involves more than making observations on the treatments: you should also observe, record and quantify the non-treatment factors that influence the treatment response. Some observations only have to be made once (e.g. soil type, slope of the field), while others need continuous recording (e.g. crop growth, disease incidence). This is why on-farm trials need frequent visits. However, many observations can be made, and even recorded, by the farmer. This requires good instruction and sometimes some training. Once farmers are capable of recording the events and phenomena taking place in the trial fields, you could not wish for a better field officer. After all, who else will look at the site ten times a day? To facilitate farmers' observations, treatments should always be labelled, and scientific names or codes of varieties should be avoided (e.g. 'Bahati' instead of 'FHIA 3'). All recordings need standardised recording forms, which should be entered into a spreadsheet as soon as possible; errors in data files often occur because of negligence and late processing. Make sure that your hard copies (recording forms) are legible to and accessible by fellow scientists. Keep a log-book with the meaning of codes and abbreviated variable names. Have the data entry checked, and checked again, by somebody who was not involved in the execution of the trial (e.g. your secretary). Keep copies of your data files on both hard disk and floppy disk.
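One common way to check data entry is double entry: two people enter the same recording forms independently, and the two files are compared. A minimal Python sketch of the comparison, assuming each entry is held as a dictionary keyed by (plot, variable); the function name, plot codes and variable names are hypothetical.

```python
def compare_entries(first, second):
    """Return (key, first value, second value) for every key whose entries differ.

    Keys present in only one entry are reported with None for the missing value.
    """
    keys = set(first) | set(second)
    return sorted(
        (key, first.get(key), second.get(key))
        for key in keys
        if first.get(key) != second.get(key)
    )


# Hypothetical double entry of three plots; plot codes and variable names are made up.
entry_1 = {("P01", "yield_kg"): 12.4, ("P02", "yield_kg"): 9.8, ("P03", "yield_kg"): 11.0}
entry_2 = {("P01", "yield_kg"): 12.4, ("P02", "yield_kg"): 98.0, ("P03", "yield_kg"): 11.0}

for key, v1, v2 in compare_entries(entry_1, entry_2):
    print(f"Check {key}: first entry {v1}, second entry {v2}")
```

Each reported discrepancy is then resolved by going back to the hard-copy recording form.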

Farmer assessment

Involving farmers in the evaluation of a trial provides a lot of relevant information. It not only increases your knowledge of the treatment effects, but also helps to predict future adoption of the tested innovation by farmers. When assessing trials in diverse environments, remember to (1) assess the performance of the tested innovation(s) in multiple production environments (where relevant) and (2) include all stakeholders in the assessment. It may be necessary to group stakeholders according to their gender, interests and socio-economic position. Results of the assessment may then reveal group-specific preferences, which are extremely important for future adoption of the tested innovations.

Example: A bean variety trial was performed in a village with two major production environments. Included in the trial were small- and large-seeded beans, as well as white beans. For the farmer assessment, resource-poor and resource-rich farmers, female farmers and traders were invited and grouped accordingly. Results revealed that resource-poor farmers preferred a different bean variety from resource-rich farmers, owing to differences in their production environments. Female farmers preferred white-seeded beans because of their palatability and rejected one variety selected by male farmers. Traders selected a variety they considered suitable for export to other regions and urban centres. The results of the farmer assessment emphasised the need for a flexible recommendation of new bean varieties.


14.4 Approaches to statistical analysis of on-farm experiments

In most cases, analysis of variance (ANOVA) is not the most appropriate technique for analysing the results of on-farm experiments. As Stroup et al. (1991) stated: "traditional methods simply cannot accommodate the complexity of on-farm trials". Confronted with data from experiments in diverse environments, we need a specific approach to analyse them statistically.

The environment-technology interaction

In reality, there is a strong environment-technology interaction. Once it is agreed that trials should include a wide range of conditions, the best design and analysis techniques have to be selected. Additional criteria for our approach include cost-effectiveness and rapid dissemination of results. Statistical techniques for on-farm trials must be analytic, because we need to understand the interaction of our technology with its environment and we often have no testable hypothesis about this interaction. We need intensive farmer participation and environment analysis to obtain continuous feedback from the field and to speed up adoption.

No existing approach completely satisfies these conditions. What we need is a hybrid approach which selects useful elements from different techniques. The approach presented here is inspired by Hildebrand (1984), Stroup et al. (1991) and Hildebrand and Russell (1994).

Adaptability Analysis is derived from Stability Analysis, which was developed during the 1960s and first published by Eberhart and Russell (1966).

Stability Analysis was applied by crop breeders to identify varieties producing a stable yield when grown in diverse environments. Although this was a great improvement, it still aimed at a blanket recommendation for all farmers. During the 1980s, Hildebrand (1984) adjusted the technique and called it Modified Stability Analysis. He and Russell improved the technique again and now call it Adaptability Analysis: "The procedure, as we use it, is not related to stability but rather to adaptability of technologies to different environments and Socio-economic conditions" (Hildebrand and Russell, 1994).

Guidelines for statistical analysis of data from experiments in diverse environments

The statistical procedure presented here was extensively tested and further improved in the Lake Zone of Tanzania.

(1) Evaluate your data visually.

Print the data in the spreadsheet and simply look at them. Do you observe any obvious discrepancies or recording mistakes? Are there interesting cases or sites with peculiar data? In contrast to conventional statistical practice, do not exclude outliers (cases with extreme values); correct only genuine errors (see Data scrutiny in 14.2).

(2) Calculate simple statistics per treatment, such as the average, maximum, minimum and coefficient of variation (CV). Low CV values indicate a stable treatment response across diverse environments, but they may also indicate that the environments were not so heterogeneous after all. High CV values indicate strong interaction of the treatment with the environments. Some important basic statistics:

Mean (average):

$$\text{Mean} = \frac{\sum \text{observations}}{\text{number of observations}}$$

Coefficient of variation (CV): the standard deviation expressed as a percentage of the (grand) mean:

$$CV = \frac{\text{standard deviation}}{\text{mean}} \times 100\%$$
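Step (2) can be sketched in Python as follows. The yields are hypothetical (four treatments A-D at five farmer sites) and serve only to illustrate the calculation; the function name treatment_summary is also illustrative.

```python
from statistics import mean, stdev

# Hypothetical yields (kg/ha) of treatments A-D at five farmer sites (sites 1-5).
yields = {
    "A": [1200, 2400, 600, 1800, 3000],
    "B": [1500, 2300, 900, 1900, 2800],
    "C": [900, 2600, 400, 1700, 3300],
    "D": [1100, 2700, 500, 2000, 3500],
}


def treatment_summary(values):
    """Average, minimum, maximum and CV (%) of one treatment across sites."""
    m = mean(values)
    return {"mean": m, "min": min(values), "max": max(values),
            "cv": stdev(values) / m * 100}


summaries = {t: treatment_summary(v) for t, v in yields.items()}
for t, s in summaries.items():
    print(f"{t}: mean {s['mean']:.0f}, min {s['min']}, max {s['max']}, CV {s['cv']:.1f}%")
```

With these made-up numbers, treatment B has the lowest CV (the most stable response across sites) and treatment C the highest.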


ANOVA (analysis of variance): the method and the layout of the ANOVA table may differ depending on the design used (e.g. RCBD, split-plot design, Latin square).

Example:

Source       Degrees of freedom   Sum of squares   Mean square   F value
Treatments   2                    24               12.0          0.83
Error        4                    58               14.5
Total        6                    82

Example: Construct an ANOVA table for the following data:

          I     II    III   Totals
A         22    17    22
B         17    23    15
C         21    26    20
Totals
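One way to work this exercise is sketched below in Python. It assumes, since the module does not say so, that columns I-III are blocks of a randomised complete block design (RCBD) with treatments A-C; the function name rcbd_anova is hypothetical. The sums of squares use the usual correction-factor formulas.

```python
# Exercise data: treatments A-C (rows) in columns I-III.
# Assumption: columns I-III are blocks of a randomised complete block design.
data = {
    "A": [22, 17, 22],
    "B": [17, 23, 15],
    "C": [21, 26, 20],
}


def rcbd_anova(data):
    """Return {source: (df, SS, MS, F)} for a one-factor RCBD."""
    t = len(data)                          # number of treatments
    r = len(next(iter(data.values())))     # number of blocks
    values = [y for row in data.values() for y in row]
    cf = sum(values) ** 2 / (t * r)        # correction factor
    ss_total = sum(y * y for y in values) - cf
    ss_treat = sum(sum(row) ** 2 for row in data.values()) / r - cf
    block_totals = [sum(col) for col in zip(*data.values())]
    ss_block = sum(b * b for b in block_totals) / t - cf
    ss_error = ss_total - ss_treat - ss_block
    df_treat, df_block = t - 1, r - 1
    df_error = df_treat * df_block
    ms_treat = ss_treat / df_treat
    ms_block = ss_block / df_block
    ms_error = ss_error / df_error
    return {
        "Treatments": (df_treat, ss_treat, ms_treat, ms_treat / ms_error),
        "Blocks": (df_block, ss_block, ms_block, ms_block / ms_error),
        "Error": (df_error, ss_error, ms_error, None),
        "Total": (t * r - 1, ss_total, None, None),
    }


table = rcbd_anova(data)
for source, (df, ss, ms, f) in table.items():
    ms_txt = f"{ms:.2f}" if ms is not None else ""
    f_txt = f"{f:.2f}" if f is not None else ""
    print(f"{source:<11}{df:>3}{ss:>8.1f}{ms_txt:>8}{f_txt:>7}")
```

Under this assumption the treatment and error lines come out at SS 24 (2 df) and SS 58 (4 df), the same values as in the example ANOVA table above; the block line takes 2 df and SS 14, giving a total of 8 df and SS 96.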

(3) Perform a one-way analysis of variance with Tukey's B (Tukey-b) multiple comparison procedure to test for group differences. This technique reveals whether there are groups of treatments that differ significantly from each other. For example, group A, composed of three treatments, may have significantly lower yields than group B, composed of two treatments.

(4) Calculate the site averages and CVs for the dependent variable (e.g. yield, pest incidence, damage). The CV of the site averages indicates how diverse your environments were. The following CV values and their interpretation can be used as a rule of thumb. Note, however, that the maximum acceptable CV differs among parameters and with the precision required.

Environment CV   Interpretation
0-10%            Uniform environment
10-30%           Moderately diverse environment
30-50%           Diverse environment
50-100%          Very diverse environment
100-200%         Extremely diverse environment
> 200%           Chaos
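Steps (4) and (5) can be sketched in Python: compute the Environmental Index (site average) for each site, the CV of the site averages, and its rule-of-thumb class from the table above. The yields are hypothetical. Because the classes in the table share their limits, assigning a CV that falls exactly on a limit to the lower class is an assumption of this sketch.

```python
from statistics import mean, stdev

# Hypothetical yields (kg/ha) of treatments A, B, C, D at each site.
site_yields = {
    "Site 1": [1200, 1500, 900, 1100],
    "Site 2": [2400, 2300, 2600, 2700],
    "Site 3": [600, 900, 400, 500],
    "Site 4": [1800, 1900, 1700, 2000],
    "Site 5": [3000, 2800, 3300, 3500],
}

# Environmental Index (EI): the average of all treatments at a site.
ei = {site: mean(values) for site, values in site_yields.items()}

# CV of the site averages: how diverse were the environments?
ei_values = list(ei.values())
cv_env = stdev(ei_values) / mean(ei_values) * 100

# Rule-of-thumb classes; a CV exactly on a class limit goes to the lower class.
CLASSES = [
    (10, "Uniform environment"),
    (30, "Moderately diverse environment"),
    (50, "Diverse environment"),
    (100, "Very diverse environment"),
    (200, "Extremely diverse environment"),
]


def classify_environment(cv):
    """Return the interpretation of an environment CV (in %)."""
    for upper, label in CLASSES:
        if cv <= upper:
            return label
    return "Chaos"


for site, value in ei.items():
    print(f"{site}: EI = {value:.0f} kg/ha")
print(f"CV of site averages = {cv_env:.1f}% -> {classify_environment(cv_env)}")
```

For these made-up data the EI ranges from 600 to 3150 kg/ha, the CV of the site averages is about 55%, and the environment is classed as very diverse.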

(5) Environmental Index (i.e. the site average).

This can be regarded as a measure of the suitability of a site for a specific crop, type of livestock, pest, etc. The site average is called the Environmental Index (EI). Low EI values indicate that the site (or that specific environment) was not suitable for that crop, type of livestock or pest; high EI values reflect favourable conditions. In your spreadsheet, create a new variable called EI containing the average value per site.

Table 14.4 Environmental Index (site averages) of bean foliage beetle (Ootheca spp.) incidence and its interpretation

Site     EI (beetles per 250 cm²)   Interpretation
Site 1   7                          The site is favourable for beetles
Site 2   1                          The site is unfavourable for beetles
Site 3   0                          The site is very unfavourable for beetles
Site 4   20                         The site is very favourable for beetles
Site 5   80                         The site is extremely favourable for beetles
Site 6   3                          The site is moderately favourable for beetles

(6) For each treatment, plot the treatment response (y) as a function of the Environmental Index (x) and perform a linear regression analysis (y = ax + b). Test the goodness of fit by calculating R² and its significance. Do this for each treatment. The slope (a) of each equation indicates the extent of interaction of the treatment with the environment (flat slope: little interaction; steep slope: strong interaction): the flatter the slope, the more stable the treatment across environments. The intercept (b) reflects the treatment's performance in unfavourable environments.

Figure 14.1 Plot of treatment A as a function of Environmental Index and results of linear regression analysis

Treatment | Equation | R² and significance
A | Yield A = 0.86 EI + 120 | 87%, p < 0.001
B | Yield B = 0.56 EI + 200 | 76%, p < 0.01
C | Yield C = 1.10 EI - 80 | 57%, p < 0.05
D | Yield D = 1.23 EI + 10 | 68%, p < 0.005
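A minimal sketch of the regression in step (6), in plain Python: `fit_line` (a hypothetical name) returns the slope a, intercept b, R² and the t statistic for the slope, with n - 2 degrees of freedom. The p-value can then be read from a t table; a statistics package such as SciPy (`scipy.stats.linregress`, which reports slope, intercept, r-value and p-value) does this in one call. The EI values and yields below are invented.

```python
from math import sqrt


def fit_line(ei, y):
    """Least-squares fit of y = a * EI + b.

    Returns (a, b, r_squared, t_slope); t_slope tests H0: a = 0 with n - 2 degrees of freedom.
    """
    n = len(ei)
    mean_x, mean_y = sum(ei) / n, sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in ei)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(ei, y))
    syy = sum((v - mean_y) ** 2 for v in y)
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_residual = sum((v - (a * x + b)) ** 2 for x, v in zip(ei, y))
    r_squared = 1 - ss_residual / syy
    se_slope = sqrt(ss_residual / (n - 2) / sxx)
    t_slope = a / se_slope if se_slope > 0 else float("inf")
    return a, b, r_squared, t_slope


# Hypothetical EI values and yields (kg/ha) of one treatment at five sites.
ei = [2000, 2500, 3000, 3500, 4000]
yield_a = [1700, 2300, 2800, 3400, 3900]
a, b, r2, t = fit_line(ei, yield_a)
print(f"Yield = {a:.2f} EI + {b:.0f}, R2 = {100 * r2:.0f}%, t = {t:.1f} (df = {len(ei) - 2})")
```

Running the function once per treatment gives one row of the equation table above for each treatment.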

(7) Plot all treatments as a function of Environmental Index in one diagram. Interpret the equations in relation to each other.

Figure 14.2 Plot of yield of three treatments (A, B and C) as a function of Environmental Index

In the figure above, treatment A has the strongest interaction with the production environment. In unfavourable environments the treatment does not perform well; however, when production conditions improve, its yields also improve. Treatment B has a more stable response. It performs rather well under unfavourable conditions, but it does not respond as well as treatment A to improvements.
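Where two treatment lines cross, the better choice changes with the environment. Setting a₁·EI + b₁ = a₂·EI + b₂ gives the crossover EI* = (b₂ - b₁)/(a₁ - a₂). The sketch below applies this to the equations of treatments A and B listed with Figure 14.1; `crossover_ei` is a hypothetical helper.

```python
def crossover_ei(line_1, line_2):
    """EI at which two lines y = a * EI + b intersect; None if they are parallel."""
    (a1, b1), (a2, b2) = line_1, line_2
    if a1 == a2:
        return None
    return (b2 - b1) / (a1 - a2)


# (slope, intercept) of treatments A and B as listed with Figure 14.1.
treatment_a = (0.86, 120)
treatment_b = (0.56, 200)
ei_star = crossover_ei(treatment_a, treatment_b)
print(f"A and B cross at EI = {ei_star:.0f}")
```

With these equations EI* is about 267: below it B is expected to yield more, above it A. A crossover that lies outside the observed range of EI means that one treatment is better across all the environments sampled.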

(8) Explain the variation in Environmental Index.

Because you recorded many observations on the trial sites and on events that occurred during the trial, it is now possible to explain why certain sites are more or less favourable than others. The most suitable technique is multiple regression analysis, in which EI is the dependent variable and measured (numeric) environmental and management variables may explain its variation (EI = ax + bz + … + e). Be careful with the independent variables: they should be measured on an ordinal or ratio scale. For example, soil types are nominal (‘sand’, ‘clay’) and cannot be included. However, if they are categorised according to their texture (1 = light, 10 = heavy), they are suitable as independent variables. The independent variables should also not be strongly correlated with each other. For example, you cannot include both organic matter content and available nitrogen in one multiple regression analysis.
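Table 14.5 reports a cumulative R² as each explanatory variable is added. A minimal sketch of that bookkeeping in plain Python follows; it solves the least-squares normal equations directly. The order of entry is taken as given (the text does not say how it was chosen), significance is not computed, and all names and data are hypothetical. A package such as statsmodels (`OLS` results expose `rsquared` and `pvalues`) would also report the tests.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the given columns plus an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * v for row, v in zip(design, y)) for p in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_residual = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1 - ss_residual / ss_total


def cumulative_r_squared(variables, y):
    """R^2 after entering the variables one at a time, in dictionary order."""
    names = list(variables)
    return [
        (name, r_squared([variables[v] for v in names[: i + 1]], y))
        for i, name in enumerate(names)
    ]


# Hypothetical site data: EI and two numeric explanatory variables.
variables = {
    "soil_texture": [1, 2, 3, 4, 5, 6],
    "years_after_fallow": [2, 1, 4, 3, 6, 5],
}
ei = [7, 8, 13, 14, 19, 20]
for name, r2 in cumulative_r_squared(variables, ei):
    print(f"{name:<20} cumulative R2 = {100 * r2:.1f}%")
```

Each row of the output corresponds to one row of a table like Table 14.5: the increase in R² from one row to the next is the extra variation explained by the newly entered variable.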

Table 14.5 Results of multiple regression on EI of beans

Independent variable | Cumulative R² (%) | Significance
Beetle incidence | 34 | < 0.0001
Soil moisture | 56 | < 0.0001
Date of sowing | 63 | < 0.0005
Years after fallow | 69 | < 0.005
Frequency of weeding | 81 | < 0.01

Interpretation: Five variables explain 81% of the variation in the suitability of the environment for beans. Some variables are related to the biotic environment (beetle incidence), others to the abiotic environment (soil properties) or to the intervention of the farmer. If variables do not meet the criteria for multiple regression, you can still analyse their interaction with the environment, but not in relation to other variables. For example, if the production environments comprised two soil types, you can apply a t-test to analyse treatment differences and EI values between the two soil categories. You may conclude that one soil type provides a better environment for a crop or pest than the other.
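The two-soil-type comparison can be sketched with a two-sample t statistic. The version below is Welch's t-test, which does not assume equal variances in the two groups; that choice, the helper name and the EI values are assumptions for illustration. SciPy offers the same test as `scipy.stats.ttest_ind(a, b, equal_var=False)`, which also returns the p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_1, group_2):
    """Welch's two-sample t statistic and its approximate degrees of freedom."""
    se2_1 = variance(group_1) / len(group_1)
    se2_2 = variance(group_2) / len(group_2)
    t = (mean(group_1) - mean(group_2)) / sqrt(se2_1 + se2_2)
    df = (se2_1 + se2_2) ** 2 / (
        se2_1 ** 2 / (len(group_1) - 1) + se2_2 ** 2 / (len(group_2) - 1)
    )
    return t, df


# Hypothetical EI values (kg/ha) of sites on two soil types.
ei_sandy = [1800, 2100, 2000, 1900]
ei_clay = [2900, 3100, 2700, 3300]
t, df = welch_t(ei_sandy, ei_clay)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large absolute t (here about -7.3 with roughly 4.4 degrees of freedom) would suggest that clay sites provided the more favourable environment in this invented example.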

Exercise: Calculate the EI of the following treatment yields. Plot the individual treatments as a function of EI in one figure. Interpret them in relation to each other.

Yield (kg/ha)

Site | Treatment 1 | Treatment 2 | Treatment 3 | Calculated EI
Site 1 | 2500 | 1600 | 1500 |
Site 2 | 3000 | 2100 | 4000 |
Site 3 | 2500 | 1600 | 2500 |
Site 4 | 3500 | 2600 | 5500 |


Exercise: Interpret the following treatment-environment interactions

(Figure: four panels, each plotting the response of treatments A, B and C against the Environmental Index.)

(9) Translate your findings into flexible recommendations for farmers.

A flexible recommendation takes into account the diverse production environments as well as differences in farmers’ objectives. Flexible recommendations are formulated as ‘if you want …, then …’ or ‘if you have …, then …’. They give farmers the choice to select the treatment that best suits their objectives and conditions. Your input as a researcher is to provide the information a farmer needs to make that choice. You can disseminate your flexible recommendations through posters and leaflets, provided your farmers are literate, or through farmer training.

Table 14.6 Flexible recommendations for cassava varieties in the medium and low rainfall zones of Bukoba District

If you want … | Plant … | But do not plant …
High yield | Rushura or Mulundi | Msitu zanzibar or Aipin valenca
To intercrop | Mulundi or Nigeria |
An early harvest | Msitu zanzibar or Aipin valenca | Mulundi or Rushura
Big roots | Nigeria or Mulundi | Aipin valenca or Msitu zanzibar
To consume leaves | Mulundi | Nigeria or Msitu zanzibar
To consume fresh roots | Mulundi | Nigeria
To consume cooked roots | Aipin valenca or Msitu zanzibar | Mulundi
To consume ugali | Mulundi | Nigeria or Msitu zanzibar
To sell your roots | Mulundi | Msitu zanzibar


14.5 An example: maize variety testing in Karagwe District

Background

Maize cultivation in Karagwe District, Kagera Region, has a recent history. The current maize gene pool is extremely heterogeneous. Most varieties have a long growing cycle of more than 130 days and are susceptible to streak virus, which limits maize production to one season.

Farmers plant maize in the homegarden or in annual crop fields with beans and groundnuts.

Maize planting densities are very low. After a diagnostic survey in Karagwe District the Farming Systems Research team concluded that:

• Maize is increasing in importance as a food crop, especially for low-resource households, as a result of the serious decline in banana production.

• Maize is a potential cash crop in remote areas.

• Local maize varieties have low yield potential and long growing cycles.

• Farmers' practices and knowledge of maize growing do not favour high production levels.

A maize improvement research programme was started in three villages representing different agro-ecological zones. In each village close collaboration was established with groups of farmers interested in research. These farmer research groups (FRGs) provide trial farmers and form platforms for discussion and feedback. Trials were established to test new maize varieties, fertiliser application levels and some crop husbandry practices.

Materials and methods

The maize variety trial was conducted with seven farmers in each village. These farmers represented different social strata, and their fields were selected so as to include all relevant maize production conditions. The varieties tested were Kito (short cycle), TMV-1 (short cycle), Staha (medium cycle), Kilima (long cycle) and UCA (long cycle). No typical farmer variety could be identified because of the high genetic diversity; it was assumed that UCA shares most characteristics of the local varieties. Planting date and population density were standardised per agro-ecological zone. At four weeks, 25 kg N/ha was applied in the form of CAN, and when farmers observed stalk borers, Thiodan dust was used to control them.

Rainfall amount and distribution were recorded for each village. Field data collected included soil type, soil depth, soil colour, stoniness and slope. Field management data included the age of the field, previous crops, the number of trees and ant hills in and around the field, and dates of weeding.

Crop data included days to flowering, days to maturity, yield and yield components, plant height, grain moisture, and damage due to birds, vermin and diseases. Household and gender characteristics were obtained through informal interviews. Farmer assessment of the trial was performed by visiting all sites with three male and three female farmers. Varieties were compared using pair-wise ranking, and farmers indicated the reasons for their preferences.

Results

Statistical analysis showed that the average maize yield was 3490 kg/ha. The varieties Kito, TMV-1 and STAHA did not differ in average yield, and neither did the varieties Kilima and UCA. The two groups of varieties differed significantly when a one-way analysis of variance was applied (Table 14.7). Kilima and UCA also obtained higher minimum and maximum yields than Kito, TMV-1 and STAHA.
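For readers who want to see what the one-way analysis of variance computes, here is a minimal sketch of the F statistic in plain Python. The grouping and yields are invented, and the B-Tukey comparison of means used for Table 14.7 is not reproduced. SciPy provides the test as `scipy.stats.f_oneway`, and recent versions also offer `scipy.stats.tukey_hsd`, a related but different post-hoc procedure.

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA on a dict {group name: list of observations}.

    Returns (F, df_between, df_within).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values())
    ss_within = sum(sum((v - mean(vals)) ** 2 for v in vals) for vals in groups.values())
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical yields (kg/ha) of two groups of varieties at three sites each.
groups = {
    "short cycle": [2000, 3000, 4000],
    "long cycle": [6000, 7000, 8000],
}
f, df_b, df_w = one_way_anova(groups)
print(f"F({df_b}, {df_w}) = {f:.1f}")
```

Note that large within-group variation (the denominator) shrinks F; this is why the high CVs discussed below weaken an ANOVA-based comparison.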


Table 14.7 Results of one-way analysis of variance with B-Tukey test

Treatment | Average yield (kg/ha) | CV (%) | Minimum yield (kg/ha) | Maximum yield (kg/ha)
Kito | 2690 | 51 | 1090 | 4980
TMV-1 | 3170 | 32 | 1310 | 5770
STAHA | 3170 | 32 | 1330 | 4960
Kilima | 4170 | 27 | 1950 | 6670
UCA | 4080 | 29 | 2130 | 6690

Source: Maize variety trial in Karagwe District, 1996

All varieties showed high coefficients of variation (CV) above 25%, suggesting that the response of the varieties to their production environments was not uniform. To investigate this interaction, the average yield per site was calculated (Table 14.8). This showed a very pronounced variation in average yields between the sites, ranging from 2510 to 5830 kg/ha.

Table 14.8 Average maize yield per site

Site | Average yield (kg/ha)
Site 1 | 2940
Site 2 | 2600
Site 3 | 2510
Site 4 | 3030
Site 5 | 3890
Site 6 | 3790
Site 7 | 4480
Site 8 | 2220
Site 9 | 3240
Site 10 | 4930
Site 11 | 5830
Site 12 | 3140
Site 13 | 2770
Site 14 | 3320
Site 15 | 3960

Source: Maize variety trial in Karagwe District, 1996

The average yield per site indicates the suitability of each site for maize production. It can also be called the Environmental Index (EI). Low EI values indicate environments which are less favourable for maize cultivation, while high EI values reflect more favourable environments.
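Applying the step (4) rule of thumb to the fifteen site averages of Table 14.8 indicates how diverse the Karagwe environments were. The worked calculation below assumes the sample standard deviation (divisor n - 1) and uses the rounded site averages, whose mean (3510 kg/ha) differs slightly from the 3490 kg/ha reported above.

```latex
\begin{aligned}
\overline{EI} &= \frac{1}{15}\sum_{i=1}^{15} EI_i = \frac{52\,650}{15} = 3510~\text{kg/ha} \\
s &= \sqrt{\frac{\sum_{i=1}^{15}\left(EI_i - \overline{EI}\right)^2}{15 - 1}}
   = \sqrt{\frac{13\,606\,000}{14}} \approx 986~\text{kg/ha} \\
CV &= 100 \cdot \frac{s}{\overline{EI}} \approx 100 \cdot \frac{986}{3510} \approx 28\%
\end{aligned}
```

A CV of about 28% falls in the 10 - 30% class of the rule of thumb, a moderately diverse environment.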

Varieties do not necessarily react uniformly to the EI: some varieties may perform better than others in a favourable or an unfavourable environment. To investigate this interaction, the yield of each variety was regressed linearly on the EI. Table 14.9 shows that the resulting yield equations differ considerably between the varieties. The equations are plotted in Figure 14.3.

Table 14.9 Results of linear regression of varieties on EI

Variety | Yield equation | R² (%)
Kito | Y = 0.911*EI - 406 | 48
TMV-1 | Y = 0.927*EI - 22 | 56
STAHA | Y = 0.951*EI - 208 | 64
Kilima | Y = 0.896*EI + 1079 | 43
UCA | Y = 1.251*EI - 229 | 75

Source: Maize variety trial in Karagwe District, 1996


Figure 14.3 Interaction between maize varieties and maize Environmental Index (yield of UCA, Kilima, TMV-1, STAHA and Kito in kg/ha plotted against Environmental Index in kg/ha)

Source: Maize variety trial in Karagwe District, 1996

Figure 14.3 clearly shows the differences in response to various production environments between the varieties. Kito, TMV-1 and STAHA hardly differed in their response and show regression lines which stay close to each other along the trajectory. Lines of Kilima and UCA are situated higher in the graph. Kilima responded better than any other variety to unfavourable environments, while UCA had the highest yield potential.
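The regression equations in Table 14.9 also show roughly where the advantage shifts from Kilima to UCA. Setting the two equations equal gives the EI at which their lines cross:

```latex
0.896\,EI + 1079 = 1.251\,EI - 229
\;\Longrightarrow\;
EI^{*} = \frac{1079 + 229}{1.251 - 0.896} = \frac{1308}{0.355} \approx 3685~\text{kg/ha}
```

Below an EI of roughly 3685 kg/ha the equations predict a higher yield for Kilima, above it for UCA; six of the fifteen site averages in Table 14.8 lie above this value. Because these are fitted lines with R² values of 43% and 75%, the crossover should be read as an estimate, not a sharp threshold.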

To explain the variation in EI a multiple regression was done with all non-nominal independent variables measured during the experiment. The results (Table 14.9) show that six variables explained almost 60% of the variation in EI. These variables belong to the (a) abiotic environment (soil texture, stoniness, soil depth), (b) biotic environment (pest incidence) and (c) farmer practices (years after fallow, timing of first weeding).

Table 14.9 Results of multiple regression on EI

Variable | Cumulative R² (%)
Soil texture | 18.6
Pest incidence | 37.7
Stoniness | 48.7
Number of years after fallow | 51.5
Timing of first weeding | 53.5
Soil depth | 58.6

Source: Maize variety trial in Karagwe District, 1996

Farmer assessment was performed at harvest time. Farmers evaluated not only yield performance but also other characteristics, such as length of cycle, suitability for roasting, taste and suitability for sale. Table 14.10 presents the results of pooled ranking of the varieties per village. It shows that in all villages TMV-1 was the most appreciated of the varieties tested. Kilima, Kito and UCA were ranked second in different villages. STAHA was least appreciated in two villages and UCA in one village.
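The pair-wise ranking mentioned under Materials and methods, followed by pooling across villages, can be sketched as below. The conventions are assumptions, since the text does not spell them out: within a village each variety is ranked by the number of pair-wise comparisons it wins (rank 1 = most preferred, ties share the better rank), and pooling sums the village ranks, so a lower total means a more preferred variety. The variety labels P, Q and R and their preference orders are invented.

```python
from itertools import combinations


def pairwise_ranks(varieties, prefer):
    """Rank varieties (1 = most preferred) by the number of pair-wise comparisons won.

    `prefer(a, b)` returns the variety chosen from the pair (a, b).
    """
    wins = {v: 0 for v in varieties}
    for a, b in combinations(varieties, 2):
        wins[prefer(a, b)] += 1
    # Rank = 1 + number of varieties with more wins (ties share the better rank).
    return {v: 1 + sum(1 for w in wins.values() if w > wins[v]) for v in varieties}


def pooled_ranking(varieties, preferences_by_village):
    """Sum the village ranks of each variety; a lower total means more preferred."""
    totals = {v: 0 for v in varieties}
    for prefer in preferences_by_village:
        for variety, rank in pairwise_ranks(varieties, prefer).items():
            totals[variety] += rank
    return totals


def prefer_by_order(order):
    """Hypothetical farmer choice: the variety listed earlier in `order` wins."""
    return lambda a, b: a if order.index(a) < order.index(b) else b


varieties = ["P", "Q", "R"]
villages = [prefer_by_order(o) for o in (["P", "Q", "R"], ["Q", "P", "R"], ["P", "R", "Q"])]
totals = pooled_ranking(varieties, villages)
print(totals)
```

In practice `prefer` would be replaced by the choices recorded during the field visits, one comparison per pair of varieties.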


Table 14.10 Farmer assessment results of pooled ranking of varieties by village

Variety

Kito

TMV-1

STAHA

Kilima

UCA

3

2

1

Village 1

4

5

1

3

4

Village 2

2

5

1

4

2

Village 3

3

5

5

9

7

Total score

9

15

Source: Maize variety trial in Karagwe District, 1996

Statistical analysis, farmer assessment and field observations made it possible to summarise the results of this experiment in flexible recommendations for maize varieties. Tables 14.11 and 14.12 give examples of such recommendations.

Table 14.11 Flexible maize variety recommendation: variety characteristics

Length of cycle (days)

Kito

90

Resistance to streak virus very low

Resistance to very low

Yield on poor soils

Yield on fertile soils

Suitability for roasting low medium very good

Suitability for ugali

Suitability for marketing medium not good

TMV-1

100 very good medium low medium good good medium

STAHA

110 good medium low medium medium very good medium

Kilima

120

UCA

140 low high high high not good low medium medium very high not good very good very good good good

These final results were promoted to farmers through posters and leaflets made available at local village stockists and markets. In 1996 Bukoba farmers purchased 1000 kg of improved seed for the short rains season. The extension service of Bukoba District attributed this success to the diffusion of the flexible recommendations and the availability of seed of all tested varieties.

Village information centres called ‘Educate Yourself’ are now being established, and more posters are being distributed.

Table 14.12 Flexible maize variety recommendation: conditional use of varieties

If you … | Then sow … | But don't sow …
Have a field with poor soil fertility | Kilima | Kito, TMV-1 or STAHA
Have a field with good soil fertility | Kilima, STAHA or UCA | Kito or TMV-1
Have no time for early weeding | TMV-1, Kilima, UCA or STAHA | Kito
Need an early, average maize harvest | TMV-1 | Kilima or UCA
Want to produce early maize for roasting | Kito or TMV-1 | Kilima or UCA
Want to produce maize for the market | Kilima or UCA | Kito
Want to intercrop in annual fields | Kilima, TMV-1 or STAHA | Kito
Want to plant maize in the homegarden | Kilima or UCA | Kito

Source: Ndege and de Steenhuijsen Piters, 1995

Discussion and conclusions

It is remarkable that researchers and farmers differed in their appreciation of the maize varieties. Researchers concluded that Kilima is a very suitable variety and that UCA has a high yield potential. Farmers unanimously selected TMV-1 as the most preferred variety. The reason for this discrepancy is the lack of a market for maize in Karagwe District. Prices are so low that maize is almost exclusively cultivated for home consumption in times of banana scarcity.

Obtaining maximum yield is thus not an issue for most farmers. More important is the length of the growing cycle in combination with a reasonable yield and good taste. TMV-1 is the best compromise among the varieties.

This preference for TMV-1 does not imply that no other varieties are needed. A minority of farmers have ways to sell their maize and appreciate Kilima or UCA. Some households will grow Kito to obtain some early maize when other crops are still maturing. Although there is a general preference for TMV-1, flexible recommendations are needed to satisfy the demands of all the various maize producers.

Farmers’ involvement proved crucial to obtaining a complete understanding of maize variety performance and of farmers’ perspectives. In general, researchers tend to concentrate on a few criteria of comparison and often have a bias towards yield. Farmers are more holistic in their evaluation and tend not to draw absolute conclusions. They see advantages in more than one variety or treatment and will adopt them accordingly, because of the heterogeneity of their fields and their multiple production goals. This confirms the need for flexible recommendations that fit different production environments and different farmers’ views and goals.

Production environments of maize proved to be very variable. High within-variety variation complicated the statistical analysis and ‘blurred’ the distinction between the varieties. Analysis of the Environmental Index, also called Adaptability Analysis, proved very effective in explaining variety yield performance within the heterogeneous environments.

The technique of analysis determines the conclusions that are derived from a trial. This is clearly shown in Table 14.x. Analysis of variance would not have given satisfactory results because of the high yield variation of the individual varieties. More complicated statistical analysis might have produced some results, but most agronomists would have repeated the trial. To avoid falling into the same trap again, they would probably have increased their control over the trial by excluding the causes of variation.


Table 14.x Relationship between technique of analysis and type of conclusion

Technique of analysis | Conclusion
ANOVA | Too much variation (high CVs of variety yields): repeat the trial and control the production conditions more
Stability analysis | Kilima is the most stable variety: it is recommended for diffusion
Farmer assessment | TMV-1 is the variety which combines a short cycle with acceptable yield. It meets the needs of most farmers
Adaptability Analysis | All varieties have one or more characteristics which make them useful in specific environments

Source: Ndege and de Steenhuijsen Piters, in press

Stability analysis would have given good results, but the conclusion would have been that one variety, Kilima, is the most stable and should therefore be diffused to farmers. This would have conflicted with the selection by the farmers, who prefer TMV-1. However, farmers tend to base their judgement on the status quo and often lack insight into future developments. Adaptability Analysis describes the performance of different varieties in various environments, which makes planning for immediate and future distribution possible.

Adaptability analysis has, however, some limitations:

• It has a bias towards yield and does not include other factors considered important by farmers and consumers.

• It is not a statistical test, and one should be careful not to attribute too much importance to small differences.

• Adaptability analysis is based on regression models, which estimate the relationship between an individual variety and its environment. These estimates need a large number of sites to obtain good accuracy.

• Environments are characterised by calculating the average yield per site. The resulting Environmental Index is not independent of the individual variety yields, and this imposes some statistical limitations. However, Eberhard and Russell stated the following in 1966:

‘An index independent of the experimental varieties and obtained from environmental factors such as rainfall, temperature, and soil fertility would be desirable. Our present knowledge of the relationship of these factors and yield does not permit the computation of such an index. Until we can measure such factors in order to formulate a mathematical relation with yield, the average yield of the varieties in a particular environment must suffice’ (Eberhard and Russell, 1966).

At present Eberhard and Russell’s statement is still valid. Agricultural research has not developed simple tools to characterise environmental factors, and average yield is still the best index available. However, progress has been made by diversifying statistical analysis and including farmers in research. This case study has shown that achievements in the design and analysis of research have improved research efficiency. Fortunately, this did not lead to more complicated methods of analysis. New approaches should be widely adoptable and should not become the domain of a few specialists. Impact on agricultural production will be achieved by adopting simple but creative tools which increase farmer participation and facilitate researcher understanding. Now that farmer involvement in research has been widely adopted, research should increase its efforts to develop methods which fit the new situation. Only by adopting new methods of design and analysis will researchers’ awareness of the need to involve farmers really improve research quality and effectiveness.


References and selected reading

Eberhard, S.A. and W.A. Russell (1966). Stability parameters for comparing varieties. Crop Science 6: 36-40.

Fisher Box, J. (1978). R.A. Fisher: The Life of a Scientist. John Wiley & Sons, New York, Brisbane, Toronto.

Harris, J.A. (1920). Practical universality of field heterogeneity as a factor influencing plot yields. Journal of Agricultural Research 19 (7): 279-315.

Hildebrand, P.E. (1984). Modified stability analysis of farmer managed, on-farm trials. Agronomy Journal 76: 271-274.

Hildebrand, P.E. and J.T. Russell (1994). Adaptability analysis for diverse environments. Paper presented at the American Society of Agronomy Meeting, Seattle, Washington.

Hildebrand, P.E. and J.T. Russell (1996). Adaptability Analysis: a method for the design, analysis and interpretation of on-farm research-extension. Iowa State University Press, Ames.

McBratney, A.B. (1992). On variation, uncertainty and informatics in environmental soil management. Australian Journal of Soil Research 30: 913-935.

Montgomery, E.G. (1913). Experiments in wheat breeding: experimental error in the nursery and variation in nitrogen and yield. U.S. Department of Agriculture, Bureau of Plant Industry Bulletin No. 269.

Ndege, L. and B. de Steenhuijsen Piters (1995). Farmers’ Choice: flexible maize recommendations in Kagera Region, Tanzania. Paper presented at the Fifth Regional Conference of the Southern African Association for Farming Systems Research-Extension, held 23-25 September 1996 at Arusha, Tanzania.

Ndege, L. and B. de Steenhuijsen Piters (in press). Environment-technology interaction: an approach to increase research efficiency. Paper submitted to African Crop Science Journal.

Paoletti, M.G. and D. Pimentel (eds) (1990). Biotic Diversity in Agroecosystems. Agriculture, Ecosystems and Environment, Special Issue. Papers from a Symposium on agroecology and conservation issues in tropical and temperate regions, 26-29 September 1990, Padova, Italy.

Salmon, S.C. and A.A. Hanson (1964). The principles and practices of agricultural research. Leonard Hill, London.

Smith, L.H. (1909). Plot arrangement for variety experiments with corn. Proceedings of the American Society of Agronomy 6: 84-89.

Steenhuijsen Piters, B. de (1995). Diversity of Fields and Farmers: explaining yield variations in Northern Cameroon. PhD thesis, Wageningen Agricultural University, Netherlands.

Student (1908). The probable error of a mean. Biometrika 6 (1): 1-25.

Stroup, W.W., P.E. Hildebrand and C.A. Francis (1991). Farmers Participation for More Effective Research in Sustainable Agriculture. Staff Papers Series, Institute of Food and Agricultural Sciences, University of Florida.


Uven, M.J. van (1935). Mathematical Treatment of the Results of Agricultural and Other Experiments. Noordhoff N.V., Groningen-Batavia.

Vieira, S.R., J.L. Hatfield, D.R. Nielsen and J.W. Biggar (1982). Geostatistical theory and application to variability of some agronomical properties. Hilgardia 51 (3): 1-75.

Zimmerer, K.S. (1991). Managing diversity in potato and maize fields of the Peruvian Andes. Journal of Ethnobiology 11 (1): 23-49.
