APPLICATION OF STATISTICAL AND NEURAL NETWORK MODEL FOR
OIL PALM YIELD STUDY
AZME BIN KHAMIS
Faculty of Science
Universiti Teknologi Malaysia
DECEMBER 2005
PSZ 19:16(Pind.1/97)
UNIVERSITI TEKNOLOGI MALAYSIA
THESIS STATUS CONFIRMATION FORMυ
TITLE: APPLICATION OF STATISTICAL AND NEURAL NETWORK MODEL FOR OIL PALM YIELD STUDY
ACADEMIC SESSION: 2005/2006
I, AZME BIN KHAMIS (IN CAPITAL LETTERS), agree that this (PSM/Master's/Doctor of Philosophy)* thesis be kept in the Library of Universiti Teknologi Malaysia under the following conditions of use:
1. The thesis is the property of Universiti Teknologi Malaysia.
2. The Library of Universiti Teknologi Malaysia is permitted to make copies for study purposes only.
3. The Library is permitted to make copies of this thesis as exchange material between institutions of higher learning.
4. **Please tick (√)
√
CONFIDENTIAL (SULIT) (Contains information of security classification or of importance to Malaysia as stipulated in the OFFICIAL SECRETS ACT 1972)
RESTRICTED (TERHAD) (Contains RESTRICTED information as determined by the organisation/body where the research was conducted)
NOT RESTRICTED (TIDAK TERHAD)
Certified by
___________________________
___________________________
(AUTHOR'S SIGNATURE)
(SUPERVISOR'S SIGNATURE)
Permanent address:
NO. 11 JALAN MANIS 7,
TAMAN MANIS 2,
86400 PARIT RAJA, BATU PAHAT, JOHOR.
Date: _____________________________
ASSOC. PROF. DR. ZUHAIMY HJ ISMAIL
Name of Supervisor
Date: ______________________
NOTES:
* Delete whichever is not applicable.
** If this thesis is CONFIDENTIAL or RESTRICTED, please attach a letter from the relevant authority/organisation stating the reason for, and the period of, its classification as CONFIDENTIAL or RESTRICTED.
υ "Thesis" refers to a thesis for the degree of Doctor of Philosophy or Master's by research, a dissertation for study by coursework and research, or a Bachelor's Degree Project Report (PSM).
“I/We* hereby declare that I/we* have read this thesis and in my/our*
opinion this thesis is sufficient in terms of scope and quality for the
award of the degree of Doctor of Philosophy”
Signature: ……………………………………………….
Name of Supervisor I: Associate Professor Dr. Zuhaimy Bin Hj Ismail
Date: ……………………………………………….
Signature: ……………………………………………….
Name of Supervisor II: Dr. Khalid Bin Haron
Date: ……………………………………………….
Signature: ………………………………………………
Name of Supervisor III: ………………………………………………
Date: ………………………………………………
* Delete as necessary
PART A – Confirmation of Collaboration*
It is certified that the research project for this thesis was carried out in collaboration between …………………………………... and …………………………………………..
Certified by:
Signature: ……………………………………………....
Name: ………………………………………………
Position: ………………………………………………
Date: ………………..
(Official stamp)
* If the thesis/project research involves collaboration.
PART B – For Use by the School of Graduate Studies Office
This thesis has been examined and certified by:
Name and Address of External Examiner: Prof. Dr. Zainudin Bin Hj. Jubok
Director
Research and Conference Management Centre (Pusat Pengurusan Penyelidikan & Persidangan)
Universiti Malaysia Sabah
Kampus Teluk Sepanggar
(Locked Bag 2073)
88999 Kota Kinabalu, Sabah
Name and Address of Internal Examiner I: Assoc. Prof. Dr. Jamalludin Bin Talib
Faculty of Science
UTM, Skudai
Name of Other Supervisor (if any): ..…………………………………………………...
Certified by the Assistant Registrar at SPS:
Signature: ……………………………………………
Name: …………………………………….……...
Date: …………….
APPLICATION OF STATISTICAL AND NEURAL NETWORK MODEL FOR
OIL PALM YIELD STUDY
AZME BIN KHAMIS
A thesis submitted in fulfilment of the
requirements for the award of the degree of
Doctor of Philosophy
Faculty of Science
Universiti Teknologi Malaysia
DECEMBER 2005
I declare that this thesis entitled “Application of Statistical and Neural Network
Model for Oil Palm Yield Study” is the result of my own research except as cited in
the references. The thesis has not been accepted for any degree and is not
concurrently submitted in candidature of any other degree.
Signature: ………………………………...
Name: AZME BIN KHAMIS
Date: …………………………………
ACKNOWLEDGEMENTS
بسم الله الرحمن الرحيم
In the name of Allah, the most Beneficent and the most Merciful.
I would like to express my gratitude to my supervisor, Associate Professor Dr. Zuhaimy Hj Ismail, for his encouragement, patience, constant guidance, continuous support and assistance throughout this study. I deeply value his many invaluable comments and suggestions. His dedication to work and his perfectionism will always be remembered, and learnt from, as basic necessities of a successful scholar. I am also very grateful to my co-supervisor, Dr. Khalid Haron from the Malaysian Palm Oil Board (MPOB), Kluang Station, for his comments, suggestions and sincere support during this endeavour. I would also like to thank Haji Ahmad Tarmizi Mohammed from MPOB Bangi for his motivation, fruitful discussions and valuable comments.
I am especially grateful to my beloved wife, Hairani Razali, for her patience, encouragement and constant support. She is my ‘co-pilot’, and this study would not have been possible without her. To my two lovely sons, Amirul Fikri and Amirul Farhan, and my lovely daughter, Amirah Afiqah: you are daddy’s source of inspiration. Many thanks go to my beloved parents, who constantly gave me encouragement and advice from afar.
I am grateful to Kolej Universiti Teknologi Tun Hussein Onn and the Malaysian Government for the sponsorship given. Lastly, many thanks go to all those who have contributed directly and indirectly to the completion of my work at Universiti Teknologi Malaysia.
ABSTRACT
This thesis presents an exploratory study on modelling oil palm (OP) yield using statistical and artificial neural network approaches. Even though Malaysia is one of the largest producers of palm oil, research on modelling OP yield is still in its infancy. The study began by exploring statistical models commonly used for plant growth, such as nonlinear growth models, multiple linear regression models and the robust M-regression model. The data were OP yield growth data, foliar composition data and fertiliser treatment data. They were collected from seven stations in the inland area and seven stations in the coastal area and were provided by the Malaysian Palm Oil Board (MPOB). Twelve nonlinear growth models were used. The initial study showed that the logistic growth model gave the best fit for modelling OP yield. The study then explored the causal relationship between OP yield and foliar composition, and the effect of the nutrient balance ratio on OP yield. To improve the model, the study explored the use of neural networks. It also examined how the following factors affect the network's performance: the combination of activation functions, the learning rate, the number of hidden nodes, the momentum term, the number of runs and outliers in the data. Comparative studies between the various models were carried out. Response surface analysis was used to determine the optimum combination of fertilisers for maximising OP yield. Saddle points occurred in the analysis, so the ridge analysis technique was used to overcome the saddle point problem, with several alternative combinations of fertiliser levels considered. Finally, profit analysis was performed to select and identify the fertiliser combination that may generate maximum profit.
ABSTRAK
This thesis presents an exploratory study on modelling oil palm yield through statistical and artificial neural network approaches. Malaysia is the largest producer of palm oil, yet research on modelling oil palm yield is still at an early stage. The study begins by exploring statistical models popular for plant growth, such as nonlinear growth models, multiple linear regression analysis and robust M-regression analysis. The Malaysian Palm Oil Board (MPOB) provided oil palm yield data, foliar nutrient content data and fertiliser trial data. These were collected from seven stations in the inland area and seven stations in the coastal alluvial area. Twelve nonlinear growth models were considered. The initial study shows that the logistic nonlinear growth model is the best for modelling oil palm yield growth. The study continues by exploring the relationship of oil palm yield with foliar nutrient content and with the nutrient balance ratio. To improve the capability of the model, the study explores the use of neural networks. It also examines how neural network design affects network performance, including the combination of activation functions, the learning rate, the number of hidden nodes, the momentum rate, the number of runs and outlier data. A comparative study among the models was carried out. Response surface analysis was used to determine the optimum fertiliser ratio for maximum oil palm yield. A saddle point problem arose in the analysis, and ridge analysis was used to overcome it by providing several fertiliser combinations that can be considered. Finally, profit analysis was carried out to select and identify the fertiliser combination that can generate maximum profit.
TABLE OF CONTENTS
TITLE
DECLARATION
ACKNOWLEDGEMENTS
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF SYMBOLS
LIST OF APPENDICES
1 INTRODUCTION
1.1 Introduction
1.2 Research Background
1.3 Brief History of Oil Palm Industry in Malaysia
1.4 Problem Descriptions
1.5 Research Objectives
1.6 Scope of the Study
1.6.1 Data Scope
1.6.2 Model Scope
1.6.3 Statistical Testing Scope
1.7 Data Gathering
1.8 Leaf Analysis
1.9 Research Importance
1.10 Research Contribution
1.11 Thesis Organisation
2 REVIEW OF THE LITERATURE
2.1 Introduction
2.2 Oil Palm Yield Modelling
2.3 Nonlinear Growth Model
2.4 Application of Neural Network Modelling
2.4.1 Neural Network in Science and Technology
2.4.2 Neural Network in Economy
2.4.3 Neural Network in Environmental and Health
2.4.4 Neural Network in Agriculture
2.5 Response Surface Analysis
2.6 Summary
3 RESEARCH METHODOLOGY
3.1 Introduction
3.2 Data Analysis
3.3 Modelling
3.3.1 Nonlinear Growth Models
3.3.1.1 Nonlinear Methodology
3.3.2 Regression Analysis
3.3.2.1 Least Squares Method
3.3.3 Robust M-Regression
3.3.4 Neural Networks Model
3.3.4.1 Introduction to Neural Network
3.3.4.2 Fundamentals of Neural Network
3.3.4.3 Processing Unit
3.3.4.4 Combination Function
3.3.4.5 Activation Function
3.3.4.6 Network Topologies
3.3.4.7 Network Learning
3.3.4.8 Objective Function
3.3.4.9 Basic Architecture of Feed-Forward Neural Network
3.3.5 Response Surface Analysis
3.3.5.1 Introduction
3.3.5.2 Response Surface: First Order
3.3.5.3 Response Surface: Second Order
3.3.5.4 Stationary Point
3.3.5.5 Ridge Analysis
3.3.5.6 Estimate the Standard Error of Predicted Response
3.4 Summary
4 MODELLING OIL PALM YIELD GROWTH USING NONLINEAR GROWTH MODEL
4.1 Introduction
4.2 The Nonlinear Model
4.3 The Method of Estimation
4.4 Partial Derivatives for the Nonlinear Models
4.5 Results and Discussion
4.6 Conclusion
5 MODELLING OIL PALM YIELD USING MULTIPLE LINEAR REGRESSION AND ROBUST M-REGRESSION
5.1 Introduction
5.2 Model Development
5.3 Results and Discussion
5.3.1 Multiple Linear Regression
5.3.2 Residual Analysis for MLR
5.3.3 Robust M-Regression
5.3.4 Residual Analysis for RMR
5.4 Conclusion
6 NEURAL NETWORK MODEL FOR OIL PALM YIELD
6.1 Introduction
6.2 Neural Network Procedure
6.2.1 Data Preparation
6.2.2 Calculating Degree of Freedom
6.3 Computer Application
6.4 Experimental Design for Neural Network
6.4.1 Experiment 1
6.4.2 Experiment 2
6.4.3 Experiment 3
6.5 Results and Discussion
6.5.1 Statistical Analysis
6.5.2 Neural Network Performance
6.5.3 Residual Analysis
6.5.4 Results of Experiment 1
6.5.5 Results of Experiment 2
6.5.6 Results of Experiment 3
6.6 Comparative Study on Oil Palm Yield Modelling
6.7 Conclusion
7 THE APPLICATION OF RESPONSE SURFACE ANALYSIS IN MODELLING OIL PALM YIELD
7.1 Introduction
7.2 Response Surface Analysis
7.3 Data Analysis
7.4 Numerical Analysis
7.4.1 Canonical Analysis for Fertilizer Treatments
7.4.2 Ridge Analysis for Fertilizer Treatments
7.5 Economic Analysis
7.5.1 Profit Analysis
7.6 Conclusion
8 SUMMARY AND CONCLUSION
8.1 Introduction
8.2 Results and Discussion
8.2.1 Initial Exploratory Study
8.2.2 Modelling Using Neural Network
8.2.3 Modelling Using Response Surface Analysis
8.3 Conclusion
8.4 Areas for Further Research
REFERENCES
Appendices A - U
LIST OF TABLES
TABLE NO. TITLE
1.1 The optimum value of nutrient balance ratio (NBR) for foliar analysis
2.1 The summary of the literature reviews in this study
3.1 Nonlinear mathematical models considered in the study
3.2 Summary of the data set types and research approaches considered in this study
4.1 Partial derivatives of the logistic, Gompertz and von Bertalanffy growth models
4.2 Partial derivatives of the negative exponential, monomolecular, log-logistic and Richard’s growth models
4.3 Partial derivatives of the Weibull, Schnute and Morgan-Mercer-Flodin growth models
4.4 Partial derivatives of the Chapman-Richard and Stannard growth models
4.5 Parameter estimates of the logistic, Gompertz, negative exponential, monomolecular, log-logistic, Richard’s and Weibull growth models for the yield-age relationship
4.6 Parameter estimates of the MMF, von Bertalanffy, Chapman-Richard and Stannard growth models for the yield-age relationship
4.7 Asymptotic correlation for each nonlinear growth model fitted
4.8 The actual and predicted values of FFB yield, the associated measurement error and the correlation coefficient between the actual and predicted values for the logistic, Gompertz, von Bertalanffy, negative exponential, monomolecular and log-logistic growth models
4.9 The actual and predicted values of FFB yield, the associated measurement error and the correlation coefficient between the actual and predicted values for the Richard’s, Weibull, MMF, Chapman-Richard, Chapman-Richard* (with initial point) and Stannard growth models
4.10 The parameter estimates and asymptotic correlation for the von Bertalanffy and Chapman-Richard models when an initial growth response data point is added
4.11 The number of iterations and the root mean square error for the nonlinear growth models considered in this study
5.1 The regression equations and R² values for the inland and coastal areas
5.2 The regression equations for the inland and coastal stations using MNC and NBR as independent variables
5.3 Regression equations using robust M-regression for the inland and coastal areas
6.1 The F statistic values of the ANOVA for the different activation functions used for the inland area
6.2 The F statistic values of the ANOVA for the different activation functions used for the coastal area
6.3 The Chi-Square values of the testing MSE for the inland and coastal areas
6.4 Duncan test of the average MSE for homogeneous subsets for the inland and coastal areas
6.5 Mean squares error for training, validation and testing, and their average, for the neural network models in the inland area
6.6 Mean squares error for training, validation and testing, and their average, for the neural network models in the coastal area
6.7 The correlation coefficients of the neural network models
6.8 The MAPE values of the neural network models
6.9 The t-statistic values for the training data
6.10 The t-statistic values for the test data
6.11 The MSE, RMSE, MAE and MAPE of the MLR, MMR and neural network models for the inland area
6.12 The MSE, RMSE, MAE and MAPE of the MLR, MMR and neural network models for the coastal area
6.13 The correlation changes from the MLR and MMR models to the neural network model
6.14 The changes in MAPE performance from the MLR and MMR models to the neural network model
7.1 The average FFB yield, MSE, RMSE and R² values for the inland area
7.2 The average FFB yield, MSE, RMSE and R² values for the coastal area
7.3 The eigenvalues and predicted FFB yield at the stationary point for each critical fertiliser level in the inland area
7.4 The eigenvalues, the predicted FFB yield at the stationary points and the critical fertiliser levels for stations CLD1 and CLD2
7.5 The eigenvalues, the predicted FFB yield at the stationary points and the critical fertiliser levels for stations CLD3, CLD4, CLD5, CLD6 and CLD7
7.6 The estimated FFB yield and fertiliser level at certain radii for stations ILD3 and ILD4 in the inland area
7.7 The estimated FFB yield and fertiliser level at certain radii for stations ILD5 and ILD6 in the inland area
7.8 The estimated FFB yield and fertiliser level at certain radii for station ILD7
7.9 The estimated FFB yield and fertiliser level at certain radii for stations CLD1 and CLD2 in the coastal area
7.10 The estimated FFB yield and fertiliser level at certain radii for stations CLD4 and CLD5 in the coastal area
7.11 The estimated FFB yield and fertiliser level at certain radii for stations CLD5 and CLD6 in the coastal area
7.12 The estimated FFB yield and fertiliser level at certain radii for station CLD7 in the coastal area
7.13 The fertiliser level, average estimated FFB yield and total profit for the inland and coastal areas
7.14 The estimated FFB yield and foliar nutrient composition levels (%) for the inland area
7.15 The estimated FFB yield and foliar nutrient composition levels (%) for the coastal area
8.1 The adequacy-of-fit measures used for the nonlinear growth models
8.2 The RMSE, MAPE and R² values of the MLR and MMR models for the inland and coastal areas
8.3 The RMSE, MAPE and R² values of the MLR and MMR models for the coastal area
8.4 The F values of the analysis of variance for the different activation functions for the inland and coastal areas
8.5 The MAPE values and correlations of the neural network models for the inland and coastal areas
8.6 The F values of the analysis of variance for Experiments 1, 2 and 3
8.7 Comparison of the MAPE and correlation values among the MLR, MMR and NN models for the inland and coastal areas
8.8 The accuracy of the MLR, MMR and NN models and the accuracy changes for the inland area
8.9 The accuracy of the MLR, MMR and NN models and the accuracy changes for the coastal area
8.10 The fertiliser level, average estimated FFB yield and total profit for the inland area
8.11 The fertiliser level, average estimated FFB yield and total profit for the coastal area
8.12 The average estimated FFB yield and foliar nutrient composition levels for the inland and coastal areas
LIST OF FIGURES
FIGURE NO. TITLE
1.1 Oil palm planted area: 1975 – 2003 (hectares) including Peninsular Malaysia, Sabah and Sarawak
1.2 Annual production of crude palm oil (1975-2003) including Peninsular Malaysia, Sabah and Sarawak
1.3 Annual export of palm oil: 1975 – 2003 (in tonnes)
1.4 World major producers of palm oil (‘000 tonnes)
1.5 World major exporters of palm oil, including re-exporting countries (*)
1.6 Summary of the research framework and research methodology used in this study
3.1 Data analysis procedure used in this study
3.2 FFB yield growth versus time (year of harvest)
3.3 Processing unit
3.4 Identity function
3.5 Binary step function
3.6 Sigmoid function
3.7 Bipolar sigmoid function
3.8 Feed-forward neural network
3.9 Recurrent neural network
3.10 Supervised learning model
3.11 Backward propagation
3.12 The descent vs. learning rate and momentum
4.1 Residual plots for the logistic, Gompertz, von Bertalanffy, negative exponential, monomolecular and log-logistic growth models
4.2 Residual plots for the Richard’s, Weibull, Morgan-Mercer-Flodin, Chapman-Richard, Chapman-Richard* and Stannard growth models
5.1 The error distribution plots of the MLR model for the coastal stations
5.2 The error distribution plots of the RMR model for the coastal stations
5.3 The error distribution plots of the MLR model for the inland stations
5.4 The error distribution plots of the RMR model for the inland stations
5.5 The R² value for each model proposed for the inland area
5.6 The R² value for each model proposed for the coastal area
6.1 Three-layer fully connected neural network with five input nodes and one output node
6.2 The early stopping procedure for the feed-forward neural network
6.3 The mean squares error for training, validation and testing
6.4 The correlation coefficient between the actual and predicted values
6.5 The three-layer fully connected neural network with nine input nodes and one output node
6.6 The actual and predicted FFB yield for stations ILD1, ILD2 and ILD3 using the NN model
6.7 The actual and predicted FFB yield for stations ILD4, ILD5, ILD6 and ILD7 using the NN model
6.8 The actual and predicted FFB yield for stations ILDT, CLD1, CLD2 and CLD3 using the NN model
6.9 The actual and predicted FFB yield for stations CLD4, CLD5, CLD6 and CLD7 using the NN model
6.10 The actual and predicted FFB yield for CLDT using the NN model
6.11 The error distribution plots of the neural network model for the inland stations
6.12 The error distribution plots of the neural network model for the coastal stations
6.13 The MSE values for different levels of percentage outliers in the training data
6.14 The MSE values for different levels of magnitude outliers in the training data
6.15 The MSE values for different levels of percentage outliers in the test data
6.16 The MSE values for different levels of magnitude outliers in the test data
6.17 The correlation coefficients of the MLR, MMR and NN models for the inland area
6.18 The correlation coefficients of the MLR, MMR and NN models for the coastal area
6.19 Comparison of the MAPE values of the MLR, MMR and NN models for the inland area
6.20 Comparison of the MAPE values of the MLR, MMR and NN models for the coastal area
6.21 Comparison of the accuracy of the models for the inland area
6.22 Comparison of the accuracy of the models for the coastal area
6.23 The percentage changes in model accuracy for the inland area
6.24 The percentage changes in model accuracy for the coastal area
7.1 The response surface plots for fertiliser treatments at stations ILD1 and ILD2 in the inland area and stations CLD2 and CLD7 in the coastal area
7.2 Data analysis procedure for obtaining the optimum fertiliser level and foliar nutrient composition
7.3 The fertiliser levels for each station in the inland area
7.4 The fertiliser levels for each station in the coastal area
7.5 The foliar nutrient composition levels for each station in the inland area
7.6 The foliar nutrient composition levels for each station in the coastal area
7.7 Comparison between the N and K fertiliser levels needed by oil palm in the coastal and inland areas
8.1 The factors that contribute to oil palm yield production
LIST OF SYMBOLS
FFB - Fresh Fruit Bunches
FELDA - Federal Land Development Authority
RISDA - Rubber Industry Smallholders Development Authority
SADC - State Agriculture Development Corporations
FELCRA - Federal Land Consolidation and Rehabilitation Authority
LSU - Leaf Sampling Unit
NN - Neural Network
MLR - Multiple Linear Regression
RMR - Robust M-Regression
RSA - Response Surface Analysis
MSE - Mean Square Error
RMSE - Root Mean Square Error
MAPE - Mean Absolute Percentage Error
N - Nitrogen
P - Phosphorus
K - Potassium
Ca - Calcium
Mg - Magnesium
TLB - Total Leaf Basis
NBR - Nutrient Balance Ratio
CLP - Critical Leaf Phosphorus Concentration
MNC - Major Nutrient Component
AS - Ammonium Sulphate
CIRP - Christmas Island Rock Phosphate
KIES - Kieserite
LIST OF APPENDICES
APPENDIX TITLE
A The list of oil palm experimental stations
B The rate and actual value of fertiliser (kg/palm/year)
C Summary of macronutrients needed by plants
D The list of papers published from 2001 until now
E The ridge analysis
F Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plots for the nonlinear growth models
G The parameter estimates from multiple linear regression with MNC as independent variables for the inland area
H The parameter estimates from multiple linear regression with MNC as independent variables for the coastal area
I Normal probability plot of multiple linear regression for the inland area
J Normal probability plot of multiple linear regression for the coastal area
K The parameter estimates from multiple linear regression with MNC and NBR as independent variables for the coastal area
L The parameter estimates from multiple linear regression with MNC and NBR as independent variables for the inland area
M The Q-Q plot for inland stations
N The Q-Q plot for coastal stations
O Example of the Matlab programming for the neural network application
P Graphical illustration of the best regression line fitting for inland stations
Q Graphical illustration of the best regression line fitting for coastal stations
R The MSE, RMSE, MAE and MAPE values for each neural network model in the inland area
S The MSE, RMSE, MAE and MAPE values for each neural network model in the coastal area
T The calculation of total profit (RM) for the inland stations according to each radius
U The calculation of total profit (RM) for the coastal stations according to each radius
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
This chapter introduces the thesis. It begins by describing the overall research background, followed by a brief history of the oil palm industry in Malaysia. The research objectives, the scope of the study, the research framework and a discussion of the research contribution are then given. Finally, each chapter is briefly outlined.
1.2 RESEARCH BACKGROUND
In the oil palm industry, modelling plays an important role in understanding various issues. It is used in decision making, and advances in computer technology have created new opportunities for modelling studies. Modelling can be categorised into statistical and heuristic modelling. Statistical modelling is the analysis of relationships among multiple measurements made on groups of subjects or objects. The model usually contains systematic elements and random effects. Mathematically, a statistical model can be defined as a set of probability distributions on the sample space. Modelling involves the appropriate application of statistical analysis techniques, under certain assumptions, to hypothesis testing, data interpretation and the drawing of applicable conclusions.
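The sketch below makes the split between systematic elements and random effects concrete. It simulates a response made of a straight-line systematic part plus a random error term, then recovers the line by ordinary least squares. This is a minimal illustration only. The intercept, slope, noise level and sample size are hypothetical values, not oil palm data, and the helper name `least_squares_line` is ours.

```python
import random

# Hypothetical illustration: response = systematic part (straight line in x)
# + random error (Gaussian noise). The numbers are simulated, not real data.
random.seed(1)
TRUE_INTERCEPT, TRUE_SLOPE, NOISE_SD = 5.0, 2.0, 1.0

xs = [float(i) for i in range(50)]
ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + random.gauss(0.0, NOISE_SD) for x in xs]


def least_squares_line(x, y):
    """Ordinary least squares estimates (intercept, slope) for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


a_hat, b_hat = least_squares_line(xs, ys)
# Residuals estimate the random part; with an intercept they sum to ~0.
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]
print(f"intercept={a_hat:.2f}, slope={b_hat:.3f}")
```

The fitted line estimates the systematic part. The residuals are what remains of the random part, and they are the input to residual analysis.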
Statistical analysis requires careful selection of analytical techniques, verification of assumptions and verification of the data. A statistical analysis normally begins with descriptive statistics, graphs and relationship plots of the data. These are used to evaluate the legitimacy of the data, identify possible outliers and assumption violations, and form preliminary ideas about variable relationships for modelling.
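The sketch below shows this descriptive first step on a small hypothetical set of FFB yields (t/ha/year). It screens for possible outliers with Tukey's 1.5 × IQR rule. That rule is our choice for illustration, not necessarily the screening procedure used later in this thesis.

```python
import statistics

# Hypothetical FFB yields (t/ha/year); 48.0 is planted as a suspicious value.
yields = [22.1, 24.3, 23.8, 25.0, 21.7, 24.9, 23.2, 48.0, 22.8, 24.1]

mean = statistics.mean(yields)
median = statistics.median(yields)
sd = statistics.stdev(yields)

# Quartiles (Python's default 'exclusive' method) and Tukey's fences.
q1, _, q3 = statistics.quantiles(yields, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [y for y in yields if y < lower or y > upper]

print(f"mean={mean:.2f}, median={median:.2f}, sd={sd:.2f}")
print(f"fences=({lower:.3f}, {upper:.3f}), possible outliers={outliers}")
```

The single suspicious value pulls the mean above the median, while the quartile-based fences barely move. This is why such checks come before any model is fitted.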
The heuristic approach relies on general knowledge gained through experimentation, on evaluating possible answers or solutions, or on trial and error. In other words, it solves problems by experience rather than by theory. A heuristic is also a problem-solving procedure in which a hypothetical answer is conceived at the outset of an inquiry to give the inquiry guidance or direction. One heuristic approach is the neural network model, which is based on rules of thumb and is widely used in various fields. A very important feature of neural networks is their adaptive nature, in which ‘learning by example’ replaces ‘programming’ in solving problems. This feature makes these computational models very appealing in application domains where the problem is poorly or incompletely understood but training data or examples are available.
Neural networks are viable and very important computational models for a wide variety of problems, including pattern classification, function approximation, image processing, clustering, forecasting and prediction. Trial and error is commonly used to find a suitable neural network architecture for a given problem. A number of neural networks have been used successfully and reported in the literature (Zuhaimy and Azme, 2001; Zuhaimy and Azme, 2002). Neural networks have also been applied in various fields, such as the environment (Corne et al., 1998; Hsieh and Tang, 1998; Navone and Ceccatto, 1994), economics and management (Boussabaine and Kaka, 1998; Franses and Homelen, 1998; Garcia and Gency, 2000; Indro et al., 1999; Klein and Rossin, 1999b; Tkacz and Hu, 1999; Yao et al., 2000) and agronomy (Shearer et al., 1994; Drummond et al., 1995; Liu et al., 2001; Kominakis et al., 2002; Shrestha and Steward, 2002).
There are different types of networks, such as the perceptron network, the multilayer perceptron, the radial basis function network, the Kohonen network and the Hopfield network. However, the multilayer perceptron is the most widely reported and used neural network in applications. Within the class of multilayer perceptrons, the most popular architecture is the feedforward neural network.
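The feedforward idea can be sketched as a single forward pass. Signals move from the input nodes through one hidden layer to the output node, with no feedback connections. This is a minimal illustration under stated assumptions. The weights are hypothetical; in practice they are learned from training examples, for instance by back-propagation. The sigmoid hidden layer with a linear output node is one common choice, not the specific configuration used in this thesis.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))


def feedforward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass through an input-hidden-output (three-layer) network.

    x        : list of input values
    w_hidden : one weight list per hidden node (len(x) weights each)
    b_hidden : one bias per hidden node
    w_out    : one weight per hidden node for the single output node
    b_out    : output bias
    Hidden nodes use the sigmoid; the output node is linear (identity).
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical weights for a 2-input, 2-hidden-node, 1-output network.
W_HIDDEN = [[0.5, -0.5], [1.0, 1.0]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [1.0, -1.0]
B_OUT = 0.0

y = feedforward([1.0, 1.0], W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
print(round(y, 4))  # -0.3808
```

Changing the number of hidden nodes or the activation function only changes `w_hidden` and `sigmoid`. Architectural choices of exactly this kind are explored by trial and error.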
The development of models for agriculture is normally divided into three steps. The first step is to develop a preliminary model. This model is inadequate and does not have to be good, but it acts as a basis. It leads to further research aimed at a comprehensive model incorporating all the processes that appear to be important. Such a model is valuable for research but far too complex for everyday use. To overcome this, a set of summary models is produced, each containing enough detail to answer a limited set of questions. For example, one summary model might predict the response to fertilisers on different soil types, and another might predict cyclic variation in yield. Modelling helps to make predictions more accurate. It will undoubtedly remain important in oil palm research as the problems become more complex and difficult. This study proposes the development of statistical and neural network models for modelling oil palm yield.
1.3 BRIEF HISTORY OF OIL PALM INDUSTRY IN MALAYSIA
Oil palm (Elaeis guineensis Jacq.) is a plant of African origin and is grown commercially in Africa. The British brought the oil palm into this country in the late 19th century. It was first planted in Bogor, Indonesia, in 1848 and in Malaysia in 1870, around the time rubber seeds were brought in (Hartley, 1977). Because oil palm was less profitable than rubber, the oil palm industry developed rather slowly. The first commercial planting of oil palm in Malaysia took place in 1917, six years after its systematic cultivation began in Sumatera. The early planting was undertaken by European plantations, including Tennamaram Estate in Selangor and Oil Palm Malaya Limited. The 1960s and 1970s were marked by extensive development of oil palm, undertaken largely by private plantations and the Federal Land Development Authority (FELDA). In addition, a number of State Agriculture Development Corporations (SADC) became involved in oil palm cultivation after recognising its good prospects. The Rubber Industry Smallholders’ Development Authority (RISDA) and the Federal Land Consolidation and Rehabilitation Authority (FELCRA) also planted abandoned and idle rubber and paddy areas with oil palm (Teoh, 2000).
Between 1975 and 2000, the worldwide area planted with oil palm (Elaeis guineensis Jacq.) increased by more than 150 percent. Most of this increase took place in Southeast Asia, with a spectacular expansion in Malaysia and Indonesia (Teoh, 2000); Figure 1.1 shows the growth of the planted area in Malaysia. Malaysian production of crude palm oil (CPO) increased markedly in 2003, by 12.1 percent or 1.4 million tonnes, to 13.35 million tonnes from 11.91 million tonnes in 2002 (Figure 1.2).
[Chart: oil palm planted area (hectare) against year, 1975-2003]
Figure 1.1: Oil palm planted area: 1975 – 2003 (hectares) including Peninsular
Malaysia, Sabah and Sarawak
(Source: Department of Statistics, Malaysia: 1975-1989; MPOB: 1990-2003)
The production of crude palm kernel also rose substantially, by 11.6 percent, to 1.6 million tonnes in 2003 from 1.47 million tonnes in 2002. The increase was mainly attributed to the expansion of the mature area (Figure 1.1), favourable weather conditions and rainfall distribution, and constant sunshine throughout the year. Exports of palm oil increased by 12.5 percent, or 1.36 million tonnes, to 12.25 million tonnes from 10.89 million tonnes in 2002 (Figure 1.3) (MPOB, 2003).
[Chart: crude palm oil production (tonnes) against year, 1975-2003]
Figure 1.2: Annual production of crude palm oil (1975-2003) including Peninsular Malaysia, Sabah and Sarawak. (Source: Department of Statistics, Malaysia: 1975-1989; MPOB: 1990-2003)
[Chart: palm oil exports (tonnes) against year, 1975-2003]
Figure 1.3: Annual export of palm oil: 1975 – 2003 in tonnes. (Source: MPOB)
Malaysia is the major producer and exporter of palm oil in the world (Teoh, 2000). Figure 1.4 compares Malaysian production of palm oil with that of Indonesia and other countries from 1999 to 2003; both Malaysia and Indonesia recorded an increase in production every year. Figure 1.5 presents the world's major exporters of palm oil over the same period and shows that Malaysia and Indonesia also recorded the highest export volumes. In 2003, Malaysian palm oil exports increased by around 12.5 percent, to 12,248 thousand tonnes from 10,886 thousand tonnes in the previous year, whereas Indonesia recorded only a 7.07 percent increase over the same period. The oil palm industry is growing rapidly and requires a great deal of research, and this study aims to contribute to the knowledge needed for its development.
[Chart: palm oil production ('000 tonnes) for Malaysia, Indonesia, Nigeria, Colombia, Côte d'Ivoire, PNG, Thailand, other countries and the world, 1999-2003]
Figure 1.4: World major producers of palm oil (‘000 tonnes)
Source: Oil World (December 12, 2003), Oil World Annual (1999-2003)
[Chart: palm oil exports ('000 tonnes) for Malaysia, Indonesia, PNG, Côte d'Ivoire, Colombia, Singapore*, Hong Kong*, other countries and the world, 1999-2003]
Figure 1.5: World major exporter of palm oil, including re-exporting country (*)
Source: Oil World (December 12, 2003), Oil World Annual (1999-2003)
1.4 PROBLEM DESCRIPTION
The difficulty in modelling oil palm yield growth is that yield does not follow a linear model; it normally follows a nonlinear growth curve. In modelling a nonlinear curve, the complexity of the problem increases with the number of independent variables. A growth curve typically has a sigmoid form: ideally it starts at the origin (0,0), has a point of inflection early in the adolescent stage, and then either approaches a maximum value (an asymptote) or peaks and falls in the senescent stage (Philip, 1994). Oil palm can normally be harvested three years after planting. Yield then increases vigorously until the tenth year after planting and continues to increase, at a lower rate, until the twenty-fifth year. Our exploratory study of modelling practices found that little work has been reported on modelling oil palm yield growth (Corley and Gray, 1976).
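To make the sigmoid shape concrete, the sketch below evaluates a logistic curve, which is only one common example of such a growth curve (Chapter 4 compares twelve nonlinear growth models). The parameter values are hypothetical, chosen so that yield levels off near 30 t/ha; they are not estimates from the oil palm data. For the logistic form the inflection point lies at t* = ln(b)/c, where the curve reaches half its asymptote.

```python
import math


def logistic_yield(t, a, b, c):
    """Logistic growth curve y(t) = a / (1 + b * exp(-c * t)).

    a: upper asymptote (maximum yield); b > 0 and c > 0 control the
    position and steepness of the S-shape.
    """
    return a / (1.0 + b * math.exp(-c * t))


def logistic_inflection(a, b, c):
    """Inflection point (t*, y*) of the logistic curve: t* = ln(b)/c, y* = a/2."""
    return math.log(b) / c, a / 2.0


# Hypothetical parameters, chosen only to show the sigmoid shape:
# an asymptote of 30 t FFB/ha approached over roughly 25 years.
A, B, C = 30.0, 20.0, 0.6

for year in (1, 3, 5, 10, 15, 25):
    print(f"year {year:2d}: {logistic_yield(year, A, B, C):6.2f} t/ha")

t_star, y_star = logistic_inflection(A, B, C)
print(f"inflection at year {t_star:.2f}, yield {y_star:.1f} t/ha")
```

In practice the parameters a, b and c would be estimated from yield growth data by nonlinear least squares rather than fixed by hand.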
In most cases, researchers have focused on the effect of environmental factors, such as evapotranspiration, moisture and rainfall, on oil palm growth. Chan et al. (2003) studied the effect of climate change on fresh fruit bunch (FFB) yield and found that climate change significantly affected oil palm yield. The most popular method used in the oil palm industry is multiple linear regression, which is used to investigate the effect of the independent variables on the dependent variable. The literature shows that foliar nutrient composition can be used as an indicator to estimate oil palm yield. Nevertheless, foliar nutrient composition itself depends on several factors, such as climate, soil nutrients, fertilisers, and pests and diseases, and little has been done on modelling these factors. This study explores ways of improving the model, in particular its accuracy: the proposed model should give smaller error values than the previous model (multiple linear regression, MLR).
Response surface analysis is the technique used to model the relationship between the response variable (fresh fruit bunch yield, FFB) and the treatment factors (fertilisers). The factor variables are sometimes called independent variables and are subject to the experimenter's control. Response surface analysis also emphasises finding the treatment combination that gives the maximum or minimum response. For example, in the oil palm industry there is a relationship between the response variable (oil palm yield) and four fertiliser treatments, namely nitrogen (N), phosphorus (P), potassium (K) and magnesium (Mg). The expected yield can be described as a continuous function of the fertiliser application levels, and a second-degree function (including the quadratic terms N², P², K² and Mg²) is often a sufficient description of the expected yield over the range of factor levels applied (Verdooren, 2003). Application rates above or below the optimum rate may reduce yields, and fertiliser applied beyond the optimum rate is wasted. An advantage of this technique is that the effects of treatment combinations not included in the experiment can still be estimated.
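For a single fertiliser, the second-degree response y = b0 + b1x + b2x² with b2 < 0 has its maximum where dy/dx = 0, that is at x = -b1/(2b2). The sketch below computes this biological optimum and, for comparison, the profit-maximising rate obtained by maximising p·y(x) - c·x, which gives x = (c/p - b1)/(2b2). The coefficients and the cost-to-price ratio c/p are hypothetical; the study itself fits N, P, K and Mg jointly rather than one factor at a time.

```python
def quadratic_yield(x, b0, b1, b2):
    """Second-degree response: expected yield at fertiliser rate x."""
    return b0 + b1 * x + b2 * x * x


def biological_optimum(b1, b2):
    """Rate maximising yield: dy/dx = b1 + 2*b2*x = 0, valid only when b2 < 0."""
    if b2 >= 0:
        raise ValueError("no interior maximum unless b2 < 0")
    return -b1 / (2.0 * b2)


def economic_optimum(b1, b2, price_ratio):
    """Rate maximising profit p*y(x) - c*x, where price_ratio = c / p.

    d/dx [p*y(x) - c*x] = p*(b1 + 2*b2*x) - c = 0  =>  x = (c/p - b1) / (2*b2).
    """
    if b2 >= 0:
        raise ValueError("no interior maximum unless b2 < 0")
    return (price_ratio - b1) / (2.0 * b2)


# Hypothetical fitted coefficients (yield in t FFB/ha, x in fertiliser units).
b0, b1, b2 = 20.0, 4.0, -1.0
x_bio = biological_optimum(b1, b2)
x_eco = economic_optimum(b1, b2, price_ratio=1.0)   # hypothetical c/p
print(x_bio, quadratic_yield(x_bio, b0, b1, b2))    # 2.0 24.0
print(x_eco, quadratic_yield(x_eco, b0, b1, b2))    # 1.5 23.75
```

Because fertiliser has a cost, the economic optimum always lies below the biological optimum when c/p > 0, which is why applying fertiliser up to the yield maximum wastes money.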
Response surface analysis is needed to obtain the optimum levels of fertiliser. In response surface analysis, the eigenvalues of the matrix of second-order coefficients determine whether the stationary point of the fitted surface is a maximum, a minimum or a saddle point. Our exploratory study of the use of response surface analysis found that no optimum solution is obtained when the stationary point is a saddle point. This study therefore proposes ridge analysis as an alternative solution to the saddle point problem.
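The sketch below illustrates both steps for a fitted surface y = b0 + x'b + x'Bx in two coded fertiliser factors (two rather than four, for brevity). The signs of the eigenvalues of B classify the stationary point x_s = -½B⁻¹b. When it is a saddle, ridge analysis in its standard constrained form traces, for each μ larger than the largest eigenvalue of B, the point solving (B - μI)x = -b/2, which is the maximum predicted response on a circle of radius ‖x‖ about the design centre. The surface coefficients are hypothetical, and whether Chapter 7 uses exactly this formulation is an assumption.

```python
import math


def sym2_eigenvalues(B):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix [[p, q], [q, s]]."""
    (p, q), (_, s) = B
    mid = (p + s) / 2.0
    half_gap = math.hypot((p - s) / 2.0, q)
    return mid - half_gap, mid + half_gap


def classify_stationary_point(B):
    lo, hi = sym2_eigenvalues(B)
    if hi < 0:
        return "maximum"      # all eigenvalues negative
    if lo > 0:
        return "minimum"      # all eigenvalues positive
    if lo < 0 < hi:
        return "saddle"       # eigenvalues of mixed sign
    return "degenerate"       # a zero eigenvalue (stationary ridge)


def solve2(M, v):
    """Solve the 2x2 linear system M x = v by Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def stationary_point(b, B):
    """x_s = -1/2 * B^{-1} b for the surface y = b0 + x'b + x'Bx."""
    return solve2(B, [-0.5 * b[0], -0.5 * b[1]])


def ridge_point(b, B, mu):
    """Maximum-response point on a circle: solves (B - mu*I) x = -b/2, mu > lambda_max."""
    M = [[B[0][0] - mu, B[0][1]], [B[1][0], B[1][1] - mu]]
    return solve2(M, [-0.5 * b[0], -0.5 * b[1]])


def predict(b0, b, B, x):
    quad = sum(x[i] * B[i][j] * x[j] for i in range(2) for j in range(2))
    return b0 + b[0] * x[0] + b[1] * x[1] + quad


# Hypothetical fitted surface in coded units for two fertilisers x1, x2:
# y = 20 + 1.0 x1 + 0.5 x2 - 1.0 x1^2 + 0.5 x2^2 + 0.2 x1 x2
# (the 0.2 interaction coefficient is split as 0.1 in each off-diagonal cell of B).
b0, b, B = 20.0, [1.0, 0.5], [[-1.0, 0.1], [0.1, 0.5]]
print(classify_stationary_point(B), stationary_point(b, B))
for mu in (0.6, 1.0, 2.0, 5.0):          # all exceed the largest eigenvalue (about 0.51)
    x = ridge_point(b, B, mu)
    print(mu, round(math.hypot(*x), 3), round(predict(b0, b, B, x), 3))
```

Each row of the ridge output gives a radius and the best achievable predicted yield at that distance from the centre, so a practical recommendation can be chosen within the explored region even though the surface has no maximum.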
1.5 RESEARCH OBJECTIVES
Even though Malaysia is the largest producer of palm oil in the world, studies on modelling yield have been very limited. The modelling of Malaysian oil palm yield is a relatively recent development, and the research reported in this field has been confined to simple models. The oil palm industry is currently undergoing structural change and is becoming more complex owing to technological advances, agricultural management, product demand and planting areas (Teoh, 2000).
This research is an attempt to present a proper methodology for modelling oil
palm yield. The model may then be used for estimating and managing the oil palm
industry.
We further refine the objectives as follows:
• To study current modelling and estimating practices in the oil palm industry.
• To explore and propose the best model for oil palm yield growth.
• To explore the use of neural networks to model oil palm yield.
• To optimise the fertiliser levels that generate optimum yield.
These objectives will be achieved by following the research framework as presented
in Figure 1.6.
1.6 SCOPE OF THE STUDY
This section is divided into three subsections. The first discusses the scope of the data, the second the scope of the models and the third the statistical testing used in this study.
1.6.1 Data Scope
The oil palm yield growth data used in this study are secondary data taken from research by Foong (1991; 1999). The research was conducted at Serting Hilir in Negeri Sembilan, which has a relatively wet climate. The annual rainfall in this area is between 1600 mm and 1800 mm, with two distinct dry periods, from January to March and from June to August. The data are the average fresh fruit bunch (FFB) yields (tonnes/hectare) from 1979 to 1997.
The Malaysian Palm Oil Board (MPOB) provided us with a data set taken from several estates in Malaysia. The factors included in the data set were foliar composition, fertiliser treatments and FFB yield. The foliar composition variables are the percentage concentrations of nitrogen (N), phosphorus (P), potassium (K), calcium (Ca) and magnesium (Mg). The fertiliser treatments comprised N, P, K and Mg fertilisers, measured in kg per palm per year; for example, a rate of 3.7 kg of N fertiliser per palm per year. The foliar composition data were recorded as measured values, whereas the fertiliser data were recorded as ordinal levels from zero to three.
[Flowchart: research design review; data gathering (secondary data, MPOB); data mining; data analysis; modelling of oil palm yield growth data, foliar composition and fertiliser data using a nonlinear growth curve, MLR, RMR, neural network and response surface analysis; goodness-of-fit testing and comparative study, each with No/Yes decision branches; final oil palm yield model]
Figure 1.6: Summary of research framework and research methodology used in this
study
1.6.2 Model Scope
This study confines its scope to five models, namely the nonlinear growth model (NLGM), multiple linear regression (MLR), robust M-regression (RMR), response surface analysis (RSA) and neural network (NN) models. The nonlinear growth models are used to model the oil palm yield growth data. Using foliar analysis data, we employ multiple linear regression and robust M-regression to estimate oil palm yield. In the MLR model, the independent variables are the N, P, K, Ca and Mg concentrations (which we call the major nutrient components, MNC) and the dependent variable is fresh fruit bunch (FFB) yield. In the second part of the MLR analysis, in addition to the MNC concentrations, we also introduce the nutrient balance ratio (NBR), critical leaf phosphorus concentration (CLP), total leaf bases (TLB), deficiency of K (defK) and deficiency of Mg as independent variables. In robust M-regression, only the N, P, K, Ca and Mg concentrations are considered as independent variables, with FFB yield as the dependent variable.
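As an illustration of the first MLR model (FFB yield regressed on the N, P, K, Ca and Mg concentrations), the sketch below fits ordinary least squares with an intercept by solving the normal equations (Z'Z)β = Z'y. The foliar data and "true" coefficients are synthetic and hypothetical, not MPOB results, and the normal equations are used only because they are short to write; a statistical package using a QR decomposition is numerically safer.

```python
import random


def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)


def predict(beta, row):
    return beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], row))


# Synthetic, hypothetical foliar data (% N, P, K, Ca, Mg) and FFB yield (t/ha).
rng = random.Random(42)
X = [[rng.uniform(2.4, 2.9), rng.uniform(0.14, 0.18), rng.uniform(0.8, 1.1),
      rng.uniform(0.5, 0.9), rng.uniform(0.2, 0.35)] for _ in range(60)]
true_beta = [-10.0, 8.0, 20.0, 5.0, -2.0, 6.0]
y = [predict(true_beta, row) + rng.gauss(0.0, 1.0) for row in X]

beta = fit_mlr(X, y)
residuals = [yi - predict(beta, row) for row, yi in zip(X, y)]
print("coefficients (intercept, N, P, K, Ca, Mg):", [round(b, 2) for b in beta])
```

The second MLR model would simply append further columns (for example NBR values or the K and Mg percentages of TLB) to each row of X.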
We propose the use of a neural network to model oil palm yield. The selection of the neural network architecture and the related statistical analysis are discussed in Chapter 6. Chapter 7 describes the use of response surface analysis to obtain the optimum fertiliser rates for optimum FFB yield, followed by a simple economic analysis to select the combination of fertiliser inputs that generates the maximum profit.
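As a rough illustration of the kind of network proposed here, the sketch below implements a multilayer perceptron with one hidden layer of sigmoid units and a linear output, trained by full-batch backpropagation on the mean squared error with a learning rate and a momentum term (two of the factors examined in Chapter 6). The architecture, activation, loss and all settings are assumptions for illustration, not the configuration used in this study, and the training data are synthetic.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """Feedforward network: n_in inputs -> n_hidden sigmoid units -> 1 linear output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        # Momentum "velocities", one per weight.
        self.vW1 = [[0.0] * n_in for _ in range(n_hidden)]
        self.vb1 = [0.0] * n_hidden
        self.vW2 = [0.0] * n_hidden
        self.vb2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.W2, h)) + self.b2

    def mse(self, X, y):
        return sum((self.forward(x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)

    def train_epoch(self, X, y, lr, momentum):
        """One full-batch backpropagation step on the MSE, with momentum."""
        n_h, n_in = len(self.W1), len(self.W1[0])
        gW1 = [[0.0] * n_in for _ in range(n_h)]
        gb1, gW2, gb2 = [0.0] * n_h, [0.0] * n_h, 0.0
        for x, t in zip(X, y):
            h, out = self.forward(x)
            d_out = 2.0 * (out - t) / len(X)                  # d MSE / d output
            gb2 += d_out
            for j in range(n_h):
                gW2[j] += d_out * h[j]
                d_h = d_out * self.W2[j] * h[j] * (1.0 - h[j])  # through the sigmoid
                gb1[j] += d_h
                for i in range(n_in):
                    gW1[j][i] += d_h * x[i]
        # Momentum update: v <- momentum * v - lr * grad, then w <- w + v.
        for j in range(n_h):
            self.vW2[j] = momentum * self.vW2[j] - lr * gW2[j]
            self.W2[j] += self.vW2[j]
            self.vb1[j] = momentum * self.vb1[j] - lr * gb1[j]
            self.b1[j] += self.vb1[j]
            for i in range(n_in):
                self.vW1[j][i] = momentum * self.vW1[j][i] - lr * gW1[j][i]
                self.W1[j][i] += self.vW1[j][i]
        self.vb2 = momentum * self.vb2 - lr * gb2
        self.b2 += self.vb2


# Synthetic two-input data standing in for scaled foliar variables (hypothetical).
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(40)]
y = [x1 * x2 + 0.5 * x1 for x1, x2 in X]

net = MLP(n_in=2, n_hidden=4, seed=0)
initial = net.mse(X, y)
for _ in range(300):
    net.train_epoch(X, y, lr=0.1, momentum=0.8)
final = net.mse(X, y)
print(f"training MSE: {initial:.4f} -> {final:.4f}")
```

Varying n_hidden, lr and momentum in this sketch mimics, on a small scale, the kind of experimental design used to study which settings affect NN performance.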
1.6.3 Statistical Testing Scope
In this study we consider several statistical measures and tests: the model error, sum of squares error (SSE), root mean squares error (RMSE), coefficient of determination (R²), coefficient of correlation (r), t-test, F-test and chi-square test. The discrepancy between the value predicted by the fitted model, $\hat{y}_i$, and the actual value $y_i$ is used to measure the model's goodness of fit. This difference is known as the model error and can be written as
$$e_i = y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n$$
where $e_i$ is the model error for observation $i$, $y_i$ is the actual value of observation $i$ and $\hat{y}_i$ is the estimated value for observation $i$. If the model performs well, the model errors will be relatively small.
For the purpose of measuring the accuracy of model fitting, we consider the following measures, which are commonly used in research on model fitting:
(i) Sum of squares error,
$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
(ii) Mean squares error,
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
(iii) Root mean squares error,
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
(iv) Coefficient of determination,
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
and
(v) Coefficient of correlation,
$$r = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\operatorname{Var}(x)\,\operatorname{Var}(y)}}$$
where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, $n$ the number of observations, $\bar{x}$ and $\bar{y}$ the means of the $x_i$ and $y_i$ observations respectively, and $\operatorname{Var}(x)$ and $\operatorname{Var}(y)$ the variances of $x$ and $y$ (computed with divisor $n$). SSE, MSE and RMSE measure model accuracy. The R² value measures how well the explanatory variables explain the response variable, and the correlation coefficient measures the strength of the linear relationship between two variables.
When there are more than two samples, one-way analysis of variance (ANOVA) can be used to test for differences between the group means using the F-test. The ANOVA F statistic is calculated by dividing an estimate of the variability between the groups by the variability within the groups:
$$F = \frac{\text{variance between groups}}{\text{variance within groups}}$$
A high value of F is therefore evidence against the null hypothesis that all population means are equal. If the test shows the difference between group means to be statistically significant, Duncan's multiple range test can be used to examine which groups differ from each other (Montgomery, 1991). Another alternative to one-way analysis of variance is the chi-square test, a nonparametric test that can be used when the assumption of normality cannot be made.
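The F ratio above can be computed directly: the between-group mean square is the between-group sum of squares divided by k - 1, and the within-group mean square is the within-group sum of squares divided by n - k, for k groups and n observations in total. The sketch below uses hypothetical FFB yields under three fertiliser levels; the resulting F would be compared with the tabulated F(k - 1, n - k) critical value (about 5.14 for F(2, 6) at the 5% level).

```python
def one_way_anova(groups):
    """One-way ANOVA: returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical FFB yields (t/ha) under three fertiliser levels.
f, df1, df2 = one_way_anova([[20, 21, 22], [23, 24, 25], [26, 27, 28]])
print(f, df1, df2)  # 27.0 2 6
```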
Model performance is measured using the sum of squares error, mean squares error, mean absolute error, root mean squares error, mean absolute percentage error, coefficient of determination and coefficient of correlation.
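The seven performance measures listed above can be computed together from the observed and predicted values, as in the sketch below. SSE, MSE, RMSE, R² and r follow the formulas given earlier in this section, with r computed between observed and predicted values and variances taken with divisor n; the mean absolute error (MAE) and mean absolute percentage error (MAPE) are not defined in the text, so their standard definitions are assumed here. The four data points are hypothetical.

```python
import math


def performance_measures(y, y_hat):
    """Goodness-of-fit measures for observed values y and predictions y_hat."""
    n = len(y)
    e = [yi - fi for yi, fi in zip(y, y_hat)]           # model errors e_i
    sse = sum(ei * ei for ei in e)
    mse = sse / n
    y_bar = sum(y) / n
    f_bar = sum(y_hat) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    var_y = sst / n                                      # variances with divisor n
    var_f = sum((fi - f_bar) ** 2 for fi in y_hat) / n
    cov = sum((yi - y_bar) * (fi - f_bar) for yi, fi in zip(y, y_hat)) / n
    return {
        "SSE": sse,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(ei) for ei in e) / n,
        "MAPE": 100.0 * sum(abs(ei / yi) for ei, yi in zip(e, y)) / n,  # needs y_i != 0
        "R2": 1.0 - sse / sst,
        "r": cov / math.sqrt(var_y * var_f),
    }


m = performance_measures([20.0, 22.0, 25.0, 27.0], [21.0, 22.0, 24.0, 28.0])
print({k: round(v, 4) for k, v in m.items()})
```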
1.7 DATA GATHERING
The Malaysian Palm Oil Board (MPOB) provided data from its database of oil palm fertiliser treatments carried out on fourteen oil palm estates. All the data from each estate were collected, recorded and compiled by MPOB researchers in the Research Database Center. All treatments were based on a factorial design with at least three levels of N, P and K fertiliser rates. Although different types of fertiliser were used in the treatments, the rates quoted in the final analysis are expressed as equivalent amounts of ammonium sulphate (AS), muriate of potash (KCl), Christmas Island Rock Phosphate (CIRP) and kieserite (Kies). Cumulative yields obtained over a period of two to five years in each trial were analysed. The data are experimental, and the period of collection differs between experiments. We study fourteen experimental stations in Peninsular Malaysia and East Malaysia, seven in inland areas and seven in coastal areas. Appendix A presents the background of the experimental stations, including the age of the oil palms, the soil type and the location of each station.
Fresh fruit bunch (FFB) yield data used in this study were measured in tonnes per hectare per year, i.e. the average FFB yield over one year. Foliar analysis was done only once a year, with samples taken in either March or July; for example, if foliar sampling was conducted in July one year, the next sample was also taken in July of the following year. The FFB yield and foliar analysis data are continuous, whereas the fertiliser inputs are in coded form (0, 1, 2 and 3). Where required, the coded values are converted to the actual rates (Appendix B). The leaf analysis procedure is described in detail in section 1.8.
1.8 LEAF ANALYSIS
Leaf analysis is the best method of determining the kind and amount of fertiliser to apply to fruit trees. It effectively measures macro- and micronutrients and indicates the need for changes in fertiliser programmes (Cline, 1997). Leaf analysis integrates all the factors that might influence nutrient availability and uptake. The essential macronutrients for oil palm are listed in Appendix C. However, leaf analysis indicates the nutritional status of the crop only at the time of sampling (Pushparajah, 1994). It also shows the balance between nutrients; for example, magnesium (Mg) deficiency may result from a lack of Mg in the soil, from an antagonistic effect of excessive K levels, or from both. Leaf analysis also reveals hidden or incipient deficiencies: adding N when K is low, for example, may result in K deficiency because the increased growth requires more K (Fairhurst and Mutert, 1999).
Leaf analysis is conducted to determine the nutritional status of leaflets from frond 9 on immature palms and frond 17 on mature palms (Corley, 1976), in order to assist the preparation of annual fertiliser programmes. In each sampling round, the appropriate frond is sampled for each leaf sampling unit (LSU). Frond 17 is sampled from the labelled reference palms in some or all fields of an LSU and prepared for analysis. Cleanliness is essential at all stages to prevent sample contamination, and sampling is carried out between 6.30 am and 12.00 noon.
Frond 17 is identified by counting from the first fully opened frond in the centre of the crown (frond 1) and moving down the same spiral of fronds (frond 1, frond 9, frond 17); it is then removed with a sickle. The frond is cut into three approximately equal sections (to obtain the average nutrient concentration). The top and base sections are discarded and placed in the frond stack. Twelve leaflets are selected and removed from each frond, six from each side at the mid-point of the frond section (Corley, 1976); the 12 leaflets should comprise three from the upper rank and three from the lower rank on each side of the rachis. The leaflet samples from each field (or a smaller area if required) are put together in a large labelled plastic bag. About 500 leaflets are collected from each 30-hectare field.
The samples are then sent to the estate laboratory or sample preparation room for further preparation. The leaflets are bundled and trimmed to retain the 20-30 cm mid-section; it is not necessary to wash the leaves. The mid-rib of each leaflet section is removed and discarded, and the remaining parts of the leaflets (lamina) are cut into pieces 2 cm long and placed on aluminium trays to be dried. The leaflets are dried in a fan-assisted oven for 48 hours at 65°C or 24 hours at 105°C; the leaf N concentration will be reduced if the temperature exceeds 105°C.
After drying, the leaflets are placed in a labelled plastic bag. Half of the sample is retained as a backup for future reference (stored in a cool, dry place) while the other half is submitted for analysis. The LSU sample results from the laboratory are then formatted as a spreadsheet and the variability is calculated. Leaf samples are analysed for N, P, K, Ca and Mg; other nutrients may be included for palms planted on particular soil types.
Leaf sampling is carried out once each year, although additional sampling may be conducted to examine specific areas or to investigate particular nutritional problems. Leaf sampling should be done at the same time each year, not during wet or very dry periods, and the sampling procedure should be completed in the shortest possible time.
Because of the synergism between nitrogen (N) and phosphorus (P) uptake, leaf P concentration must be assessed relative to leaf N concentration (Ollagnier and Ochs, 1981). This is due to the constant ratio between N and P in the protein compounds found in plant tissue (Fairhurst and Mutert, 1999). A critical curve has been developed in which the critical leaf P concentration (CLCP) is defined as
$$\mathrm{CLCP} = 0.0487 \times \text{leaf N concentration} + 0.039$$
A different approach is used to determine whether potassium (K) and magnesium (Mg) are deficient, taking into account the relative concentrations of the leaf cations K, Mg and calcium (Ca). First, the total amount of bases in the leaf (TLB) is calculated, and K and Mg are then assessed as percentages of TLB (Foster, 1999). TLB is given by
$$\mathrm{TLB}\ (\text{cmol/kg}) = \left(\frac{\%\ \text{leaf K}}{39.1} + \frac{\%\ \text{leaf Mg}}{12.14} + \frac{\%\ \text{leaf Ca}}{20.04}\right) \times 1000$$
K and Mg deficiency can then be assessed individually from each nutrient's percentage of TLB,
$$\frac{X}{\mathrm{TLB}} \times 100$$
where X is the contribution of K or Mg to TLB (the corresponding term in the equation above). The K and Mg status can be rated in three categories: a value below 25 is rated deficient, a value between 25 and 30 is rated low, and a value above 30 is considered sufficient. The nutrient balance ratio (NBR) is defined as the ratio of the leaf concentration of one nutrient to that of another; for example, the N/K NBR is the ratio of the leaf N concentration to the leaf K concentration. The optimum ranges of NBR values for oil palm are presented in Table 1.1.
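The leaf indices described in this section can be computed from a single leaf analysis, as in the sketch below: the critical leaf P concentration, the K and Mg percentages of TLB with the three-class rating, and a check of each nutrient balance ratio against the optimum ranges in Table 1.1. The leaf values are hypothetical. Treating a value of exactly 30 as "low" is an assumption, and the N/C row of Table 1.1 is omitted because its denominator is not defined in the text.

```python
def clcp(leaf_n):
    """Critical leaf P concentration (%) from leaf N (%): 0.0487 * N + 0.039."""
    return 0.0487 * leaf_n + 0.039


def tlb_terms(k, mg, ca):
    """Contributions of K, Mg and Ca (leaf %) to total leaf bases, in cmol/kg."""
    return k / 39.1 * 1000, mg / 12.14 * 1000, ca / 20.04 * 1000


def cation_status(k, mg, ca):
    """Return TLB and (percentage of TLB, rating) for K and for Mg."""
    k_term, mg_term, ca_term = tlb_terms(k, mg, ca)
    tlb = k_term + mg_term + ca_term

    def rate(pct):
        # Class boundaries from the text; exactly 30 counted as "low" (assumption).
        if pct < 25:
            return "deficient"
        return "low" if pct <= 30 else "sufficient"

    k_pct, mg_pct = 100 * k_term / tlb, 100 * mg_term / tlb
    return tlb, (k_pct, rate(k_pct)), (mg_pct, rate(mg_pct))


# Optimum NBR ranges from Table 1.1 (N/C row omitted).
NBR_RANGES = {"N/K": (2.50, 3.00), "N/Mg": (14.0, 18.0), "N/P": (11.0, 17.0),
              "K/Mg": (4.0, 10.0), "K/Ca": (2.0, 5.0), "Mg/Ca": (0.25, 0.55)}


def nbr_status(leaf):
    """Compare each nutrient balance ratio with its optimum range."""
    out = {}
    for ratio, (lo, hi) in NBR_RANGES.items():
        num, den = ratio.split("/")
        value = leaf[num] / leaf[den]
        status = "below" if value < lo else "above" if value > hi else "within"
        out[ratio] = (round(value, 2), status)
    return out


# Hypothetical leaf analysis (% of dry matter) for one leaf sampling unit.
leaf = {"N": 2.6, "P": 0.16, "K": 1.0, "Mg": 0.25, "Ca": 0.6}
print("P below critical level:", leaf["P"] < clcp(leaf["N"]))
print(cation_status(leaf["K"], leaf["Mg"], leaf["Ca"]))
print(nbr_status(leaf))
```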
Table 1.1: The optimum value of nutrient balance ratio (NBR) for foliar analysis
Nutrient ratio    NBR
N/K               2.50 – 3.00
N/Mg              14.00 – 18.00
N/P               11.00 – 17.00
N/C               4.00 – 9.00
K/Mg              4.00 – 10.00
K/Ca              2.00 – 5.00
Mg/Ca             0.25 – 0.55
1.9 RESEARCH IMPORTANCE
Nonlinear growth models are used to model nonlinear phenomena. Since nonlinear growth models have not yet been explored in the oil palm industry (Foong, 1999; Ahmad Tarmizi et al., 2004), we propose their use in the study of oil palm yield growth. We provide a mathematical basis for parameter estimation in modelling oil palm yield growth; the results and analysis can then be used to study the biological process of oil palm yield growth.
Multiple linear regression can be used to find the relationship between a dependent variable and one or more independent variables. Because additional relevant independent variables can be included in the model, multiple linear regression is rather flexible. Our study emphasises the introduction of new independent variables into the model, an area yet to be explored by researchers. In practice, relationships are rarely linear all the time, and data sometimes include outliers or unusual observations. We propose the use of robust multiple regression to reduce the negative impact of outliers on model development.
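To show how robust regression limits the influence of outliers, the sketch below fits an M-estimator by iteratively reweighted least squares (IRLS). Huber's weight function with tuning constant c = 1.345 and a MAD-based scale estimate are common textbook choices assumed here; the settings used for RMR in this study may differ. For brevity the sketch has one predictor rather than five, and the data, including the single gross outlier, are hypothetical.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ym - b * xm, b


def huber_m_regression(x, y, c=1.345, iterations=50):
    """Huber M-regression fitted by IRLS, starting from ordinary least squares."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        a, b = weighted_line(x, y, w)
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute deviation of the residuals, rescaled for normality.
        med = statistics.median(r)
        s = statistics.median([abs(ri - med) for ri in r]) / 0.6745
        s = max(s, 1e-12)                      # guard against a zero scale
        # Huber weights: 1 for small residuals, c*s/|r| for large ones.
        w = [1.0 if abs(ri) <= c * s else c * s / abs(ri) for ri in r]
    return weighted_line(x, y, w)


# Hypothetical data on the line y = 2 + 3x with small noise and one gross outlier.
x = list(range(10))
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05, 0.2, -0.15, 0.0]
y = [2.0 + 3.0 * xi + e for xi, e in zip(x, noise)]
y[9] += 50.0

ols = weighted_line(x, y, [1.0] * len(x))
robust = huber_m_regression(x, y)
print("OLS slope:", round(ols[1], 3), " Huber slope:", round(robust[1], 3))
```

The single outlier pulls the least-squares slope well away from 3, whereas the Huber fit gives that observation a very small weight and stays close to the underlying line.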
To improve the models, various new heuristic methods have been suggested in the literature. We explore the flexibility of neural networks to improve estimation performance and model accuracy. Previous studies on oil palm stopped the analysis when the stationary point was a saddle point (Ahmad Tarmizi, 1986). This led to incomplete inferences from the model and hence to inefficient decisions, and it made improvements difficult to implement in practice. This study proposes the use of ridge analysis when the stationary point is a saddle point, to improve the data analysis.
1.10 RESEARCH CONTRIBUTION
This study makes several contributions to an area of high importance for the sustainability of the oil palm industry. They can be categorised as follows:
• Identifying several nonlinear growth models for oil palm yield growth.
• Investigating the relationship between foliar nutrient composition and yield using MLR and RMR; a practical model and procedure were developed for this purpose.
• Developing a neural network model to predict oil palm yield; the NN results were more reliable than those of the MLR and RMR models.
• Proposing statistical tests to evaluate the factors that influence NN performance. The findings indicate that the combination of activation functions and the number of hidden nodes have a significant effect on NN performance, whereas the learning rate, momentum term and number of runs have no significant effect.
• Investigating the effects of outliers on NN performance. The findings show that both the percentage of outliers and the magnitude of outliers significantly affect NN performance.
• Combining response surface analysis with ridge analysis to obtain the optimum levels of foliar nutrient composition and fertiliser input for optimum oil palm yield.
Several of the contributions above have been published in various forms, as listed in Appendix D.
1.11 THESIS ORGANISATION
This thesis contains eight chapters. Chapter 1, the introduction, describes the problem, research objectives, research scope, research importance and research data, and briefly explains how the data are used in this research.
Chapter 2 is the literature review. This chapter discusses current and past research on oil palm yield and presents applications of neural network modelling in several fields, such as economics, management and agronomy. A summary is included at the end of the chapter.
Chapter 3 explains the four main models used in the thesis: the statistical methods of nonlinear growth models, multiple linear regression and response surface analysis, and the neural network model. This chapter also proposes the research framework.
Chapter 4 considers the use of nonlinear growth curves to model oil palm yield growth. Twelve nonlinear growth models are presented, the partial derivatives for each model are provided, and the models are compared at the end of the chapter.
Chapter 5 discusses the development of multiple linear regression and robust M-regression models to investigate the relationship between fresh fruit bunch yield and foliar nutrient composition. The use of the nutrient balance ratio, deficiency of magnesium, deficiency of potassium and critical leaf phosphorus as independent variables is proposed in this chapter. The numerical results from both methods are presented and compared in terms of modelling performance.
Chapter 6 presents the development of neural networks for oil palm yield modelling. An experimental design is used to investigate the effects of the number of hidden nodes, the number of runs, the momentum term, the learning rate and outliers on NN performance, and the results are used for model selection. The results from multiple regression analysis and the neural network model are compared in terms of goodness of fit and model accuracy.
Chapter 7 reports the numerical results of response surface analysis of the foliar nutrient composition and fertiliser treatments. The use of ridge analysis to overcome the saddle point problem at the stationary point is discussed. The chapter ends with a simple economic analysis to determine the fertiliser levels that maximise profit.
Chapter 8 summarises the relevant and important findings of this research and presents recommendations on areas related to the findings and possible directions for future research.
CHAPTER 2
REVIEW OF THE LITERATURE
2.1 INTRODUCTION
The purpose of this literature review is to study modelling and prediction practices in the oil palm industry. This chapter reviews the literature on modelling that is relevant to oil palm yield in Malaysia and to the application of neural networks. The first section describes the methods used in oil palm yield modelling, which we hope will give a clearer picture of how oil palm yield is being modelled. The second section identifies nonlinear growth models that may be applied to oil palm growth data. A broad discussion of the applications of neural networks follows, presenting their advantages and their usage in various fields. The last section discusses the literature on response surface analysis. Finally, the essential points of the review are summarised as a guide for this research.
2.2 OIL PALM YIELD MODELLING
Traditionally, oil palm yield is estimated from the black bunch count. Counting the number of black bunches is the simplest method for short-term prediction of oil palm yield and is widely used as a rough estimate. Several stages should be undertaken in order to count the black bunches in the field. The black bunch count is usually conducted on a per-hectare basis, with each hectare containing between 140 and 148 palms, and can be used to forecast bunches per hectare 3 to 6 months ahead (Chan, 1999). A typical black bunch count is as follows: assume the average black bunch count is 3 per palm and the average density is 145 palms per hectare. The bunch forecast is therefore 435 bunches per hectare (145 x 3). The oil palm yield per hectare is then obtained by multiplying the average weight of one bunch by the number of bunches per hectare.
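The black bunch calculation above is simple enough to express in a few lines. The sketch reproduces the worked example (an average of 3 black bunches per palm at 145 palms per hectare gives 435 bunches per hectare) and converts bunches to tonnes using an average bunch weight of 20 kg, which is a hypothetical value since the text does not give one; the individual palm counts are also hypothetical.

```python
def black_bunch_forecast(counts_per_palm, palms_per_ha, mean_bunch_weight_kg):
    """Short-term FFB forecast from a black bunch census.

    counts_per_palm: black bunches counted on each sampled palm.
    Returns (bunches per hectare, forecast FFB in tonnes per hectare).
    """
    mean_count = sum(counts_per_palm) / len(counts_per_palm)
    bunches_per_ha = mean_count * palms_per_ha
    return bunches_per_ha, bunches_per_ha * mean_bunch_weight_kg / 1000.0


# Worked example from the text: average of 3 black bunches per palm, 145 palms/ha.
# The palm counts and the 20 kg average bunch weight are hypothetical.
bunches, ffb_t = black_bunch_forecast([2, 4, 3, 3, 3], 145, 20.0)
print(bunches, ffb_t)  # 435.0 8.7
```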
Multiple linear regression has also been used to model potential oil palm yield. The variables considered in this model are potential yield/production, total area under planting, replanting area, new planting, total mature area and average yield per hectare. The functional relationship can be expressed as
$$M_t = M_{t-1} - R_t + \alpha N_{t-3} + \beta N_{t-4}$$
where $M_t$ is the mature area in year $t$, $R_t$ is the replanting area in year $t$, and $N_{t-k}$ is the total new planting and replanting in year $t-k$, i.e. $k$ years before year $t$. Chow (1984) suggested that $\alpha + \beta = 1$, to ensure that the estimated total planted area equals the total mature area. The total potential production area can then be estimated as
$$P_t = M_t - \gamma\left[\alpha N_{t-3} + \beta N_{t-4}\right], \quad 0 < \gamma < 1$$
where $P_t$ is the total potential production area at time $t$, and $M_t$ and $N_{t-k}$ are as defined above. The total potential oil palm production, TPP, is calculated by multiplying the total potential production area by the average oil palm yield per hectare:
$$\mathrm{TPP}_t = P_t \times \text{average FFB per hectare}$$
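The three equations above can be applied year by year, as the sketch below shows: each year the mature area loses the replanted area and gains the plantings of three and four years earlier (weighted by α and β), and the potential production area subtracts a fraction γ of that newly matured area. All planting areas, the parameter values and the 20 t/ha average FFB yield are hypothetical, and treating years missing from the inputs as zero is an assumption of this sketch.

```python
def project_area(m_start, start_year, end_year, R, N, alpha, beta, gamma):
    """Project mature area M_t and potential production area P_t year by year.

    M_t = M_{t-1} - R_t + alpha*N_{t-3} + beta*N_{t-4}
    P_t = M_t - gamma*(alpha*N_{t-3} + beta*N_{t-4})
    R and N map year -> hectares; missing years count as zero (assumption).
    """
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha + beta must equal 1")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    m, rows = m_start, []
    for t in range(start_year + 1, end_year + 1):
        maturing = alpha * N.get(t - 3, 0.0) + beta * N.get(t - 4, 0.0)
        m = m - R.get(t, 0.0) + maturing
        rows.append((t, m, m - gamma * maturing))
    return rows


# Hypothetical inputs (hectares) for illustration only.
N = {1997: 100.0, 1998: 80.0, 1999: 60.0, 2000: 50.0}
R = {2001: 30.0, 2002: 20.0}
avg_ffb = 20.0  # assumed average FFB, t/ha/yr
for t, m, p in project_area(1000.0, 2000, 2002, R, N, 0.6, 0.4, 0.5):
    print(t, round(m, 1), round(p, 1), round(p * avg_ffb, 1))  # year, M_t, P_t, TPP_t
```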
Green (1976) performed an experimental design analysis to find the impact of fertiliser level on yield. In his experiment, he considered three types of fertiliser: nitrogen (N), phosphorus (P) and potassium (K). The method used in his study was multiple linear regression with quadratic and second-order interaction terms. Green also used linear regression to investigate the relationship between foliar composition and yield, and found that yield and the leaf levels of the four major elements N, P, K and Mg are highly correlated. In addition, Ahmad Tarmizi et al. (1986) carried out intensive research on fertiliser trials at several estates in Peninsular Malaysia and found that the yield response to fertiliser varied according to soil nutrient content.
Ahmad Tarmizi et al. (1999) conducted two trials. In the first trial, they found significant quadratic effects of the N and K treatments. The second trial showed significant quadratic and linear effects of the N fertiliser treatment; the response to P fertiliser was also significant, but the response to K fertiliser was not. The size of the response to fertiliser was also found to depend on two other factors, viz. the yield level and the status of other nutrients. Correcting for these factors is therefore essential before comparing the efficiency of fertiliser recovery in different situations. The correction was made by fitting response equations, using linear regression, to the fertiliser trial data together with agronomic factors, site characteristics, and soil and climate data (Foster et al., 1985; Foster et al., 1987).
According to Ahmad Tarmizi et al. (1991), the efficiency of urea in each trial was
too complex to be predicted by a general model (linear regression) using common
factors, owing to the interaction of many factors in the field. The factors identified
as contributing significantly to the variation in urea efficiency were soil pH, annual
drainage, drought, rainfall, relative humidity, organic matter in the soil and ground
cover characteristics. Ahmad Tarmizi and Omar (2002) and Ahmad Tarmizi et al.
(2004) suggested that efficient fertiliser practices could increase oil palm yield from
14 tonnes per hectare to 30 tonnes per hectare. Foong (2000) found that climate,
nutrient composition and agronomic practices significantly affected potential oil
palm yield.
Foong (1991; 1999) found that oil palm production in Malaysia is strongly
influenced by the frequent droughts lasting two or three months in most parts of the
country, especially severe droughts, which result in inflorescence abortion and
unfavourable sex differentiation. Moisture limitation restricts oil palm height
increment, although it is less restrictive in older trees, probably due to their greater
affinity for sunlight resulting from heavy mutual shading of the canopies. He also
found that irrigation did not change the pattern of oil palm yield. Kee and Chew
(1991) studied oil palm yield responses to nitrogen fertiliser and drip irrigation. They
found that moisture and nutrients were important factors limiting yields in the
northern interior of the East Coast of Peninsular Malaysia. Significant responses to N
were obtained in all trials, and a quadratic model was used to describe the yield
response to N. They also found no interaction between N fertiliser and irrigation. In
the absence of irrigation, substantially higher fertiliser rates were required to achieve
a yield similar to that of the irrigated treatment.
Chow (1988) showed that seasonality in oil palm yield, which is highly
significant at the Peninsular level, could be quite independent of rainfall, although
rainfall would be expected to interact with and modify the seasons to some extent.
There is a positive correlation between production and rainfall at lags of 20-24
months and 7-11 months before harvesting. Significant negative lagged effects on
yield are observed at lags of 12-13, 24-25 and 36 months, which probably account
for the yield fluctuations that appear independent of seasonal and rainfall effects.
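Lagged correlations of the kind reported by Chow (1988) pair production in month $t$ with rainfall in month $t - k$. The sketch below is a minimal illustration, assuming two equally long monthly series; the function name and the synthetic data (production constructed to follow rainfall three months earlier, a lag chosen purely for illustration) are hypothetical.

```python
import numpy as np


def lagged_correlation(production, rainfall, max_lag):
    """Pearson correlation between production at month t and rainfall at month t - lag."""
    production = np.asarray(production, dtype=float)
    rainfall = np.asarray(rainfall, dtype=float)
    if production.shape != rainfall.shape:
        raise ValueError("series must have the same length")
    result = {}
    for lag in range(max_lag + 1):
        y = production[lag:]                  # production from month `lag` onwards
        x = rainfall[: len(rainfall) - lag]   # rainfall `lag` months earlier
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result


rng = np.random.default_rng(0)
rain = rng.gamma(shape=2.0, scale=100.0, size=120)   # synthetic monthly rainfall (mm)
prod = np.full(120, 5.0)
prod[3:] += 0.01 * rain[:-3]                         # production responds to rain 3 months earlier
r = lagged_correlation(prod, rain, max_lag=12)
best = max(r, key=r.get)
print(best, round(r[best], 3))
```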
Chan (1999) suggested integrating six factors (economic evaluation, fertiliser
input, R & D findings, nutrient management, site characteristics, and procurement
and distribution) in a systematic approach to efficient fertiliser management in order
to achieve maximum yield.
Harun (2000) noted that oil palm yield depends on several factors such as
planting materials, agronomic inputs, photosynthetic activity and seasonal climatic
conditions. Furthermore, the bunch yields of oil palm are known to fluctuate
seasonally. This leads to a corresponding variation in palm oil and palm kernel
production, and hence in supply to markets (Henson and Harun, 2004). Research on
the effects of weather on yield components was carried out by Oboh and Fakorede
(1999) in Nigeria. They concluded that minimum relative humidity and sunshine
hours 18-24 months prior to the year of harvest could serve as indicators of the yield
pattern, and that using path coefficient analysis along with correlation and regression
analysis would give a better understanding of the relationship between yield and
weather factors in oil palm. They also found a negative correlation between the
number of bunches and the mean bunch weight, because the climatic factors that
increase mean bunch weight are detrimental to the formation of additional bunches.
Henson (2000) found that haze affected oil palm productivity.
The relationship between leaf analysis and plant productivity is generally
evident for most crops, and an assessment of fertiliser needs can be based on such
analysis. However, for a cost-effective approach, leaf analysis has to be integrated
with soil analysis (Pushparajah, 1994). Foster (2003) used regression analysis to
predict yield responses from the results of leaf nutrient analysis. He fitted linear and
quadratic regression lines and found that the quadratic model gave a better fit than
the linear model. As expected, the response to N correlated most highly with leaf N
using a single linear regression line.
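Foster's (2003) comparison of linear and quadratic fits can be outlined with `numpy.polyfit`. This is a sketch under assumptions: the leaf N values and responses below are invented, and adjusted R² is used here as one reasonable way to compare the two fits, not necessarily the criterion Foster applied.

```python
import numpy as np


def compare_linear_quadratic(x, y):
    """Fit degree-1 and degree-2 polynomials and report SSE and adjusted R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    sst = float(np.sum((y - y.mean()) ** 2))
    results = {}
    for degree, name in [(1, "linear"), (2, "quadratic")]:
        coef = np.polyfit(x, y, degree)   # highest power first
        sse = float(np.sum((y - np.polyval(coef, x)) ** 2))
        # Adjusted R^2 penalises the extra quadratic term (degree = number of predictors)
        adj_r2 = 1.0 - (sse / (n - degree - 1)) / (sst / (n - 1))
        results[name] = {"coef": coef, "sse": sse, "adj_r2": adj_r2}
    return results


leaf_n = np.linspace(2.0, 3.0, 11)                    # hypothetical leaf N (% dry matter)
response = -30.0 + 28.0 * leaf_n - 5.0 * leaf_n ** 2  # hypothetical curved yield response
res = compare_linear_quadratic(leaf_n, response)
for name, r in res.items():
    print(name, round(r["sse"], 4), round(r["adj_r2"], 4))
```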
Soon and Hong (2001) studied oil palm responses to N, P, K and Mg fertilisers
on two major soil types in Sabah, the Paliu and Lumisir Family soils, and found that
nitrogen was the most important nutrient affecting FFB yield. There was no yield
response to K fertiliser on Paliu Family soil, but on Lumisir Family soil K fertiliser
increased FFB yield. P fertiliser markedly improved FFB yield on Paliu Family soil,
but this was not observed on Lumisir Family soil. There was no significant response
of FFB yield to Mg fertiliser on either soil type. Leaf N, leaf P and leaf K increased
systematically with the application of N, P and K fertilisers respectively on both soil
types, while K fertiliser had an antagonistic effect on leaf Mg. Fertiliser experiments
are used to determine oil palm response to mineral fertiliser under particular
agro-ecological conditions, and the results help to guide the preparation of routine
fertiliser programmes for commercial plantations. Their study found that the
optimum yield was achieved at 0.69 to 0.75 kg N per palm per year. Large amounts
of costly mineral fertiliser are usually required to achieve optimum economic yields
(Foster, 2003).
Makowski et al. (2001) showed how statistical models describing wheat yield,
grain protein content and residual mineral N responses to applied N can be used to
predict those responses and to calculate optimal N rates. Teng and Timmer (1996)
demonstrated that simultaneous prescriptions of nitrogen (N) and phosphorus (P)
fertiliser for tree seedling culture can be facilitated by response surface models
generated from a multi-level factorial experiment designed to quantify growth,
nutritional response and N x P interactions in white spruce (Picea glauca Voss). They
also studied the interaction between nitrogen and phosphorus using a response
surface model and estimated the optimum levels of nitrogen and phosphorus in a
forestry application.
Bélanger et al. (2000) evaluated quadratic, exponential and square root
models describing the yield response of potato (Solanum tuberosum L.) to six rates
of N fertiliser (0 to 250 kg N per hectare), with and without supplemental irrigation,
at four sites in Canada. They found that the quadratic model was best suited to
describe the yield response of potato to N fertiliser and to predict the economically
optimum N rate for areas where the ratio of N fertiliser cost to potato price is similar
to that in Atlantic Canada.
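For a quadratic response $Y = a + bN + cN^2$ with $c < 0$, the economically optimum N rate is where the marginal yield gain equals the ratio of fertiliser cost to crop price, $b + 2cN = p_N / p_Y$, giving $N^* = (p_N/p_Y - b)/(2c)$; with $p_N = 0$ this reduces to the agronomic optimum $-b/(2c)$. The sketch below implements this standard derivation, not the specific calculation of Bélanger et al. (2000); the coefficients, prices and the cap at the highest tested rate are hypothetical.

```python
def economic_optimum_n(b, c, price_n, price_crop, n_max=None):
    """Economically optimum N rate for Y = a + b*N + c*N**2 with c < 0.

    price_n is the cost per kg of N and price_crop the price per tonne of crop,
    so price_n / price_crop is in tonnes per kg, matching the units of b.
    """
    if c >= 0:
        raise ValueError("a maximum requires a negative quadratic coefficient")
    n_star = (price_n / price_crop - b) / (2.0 * c)   # solves b + 2cN = price ratio
    n_star = max(n_star, 0.0)                          # no negative application rates
    if n_max is not None:
        n_star = min(n_star, n_max)                    # do not extrapolate beyond tested rates
    return n_star


# Hypothetical response: b = 0.12 t per kg N, c = -0.0003 t per (kg N)^2
print(economic_optimum_n(0.12, -0.0003, price_n=0.0, price_crop=1.0))   # agronomic optimum
print(economic_optimum_n(0.12, -0.0003, price_n=0.03, price_crop=1.0, n_max=250.0))
```

Capping at `n_max` reflects the usual caution against extrapolating a fitted quadratic beyond the range of rates actually tested.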
Wendroth et al. (2003) showed how an autoregressive state-space model can be
used to evaluate the quality of barley yield predictions. Two data sets were used: the
first comprised soil information (texture and organic carbon content) and the yield
from the previous year, and the second comprised crop information (vegetation
index, nitrogen status and land surface elevation). They found that both sets of
variables gave predictions of similar quality. Drummond et al. (1995; 2002)
conducted studies to understand the complex relationship between crop yield and
site and soil characteristics in soybean production. They used statistical and neural
network methods and found that the neural network methods were consistently
more accurate in the individual site-year analyses, particularly when compared with
the linear statistical techniques.
Crop response to fertiliser nutrients such as phosphorus (P) and potassium
(K) is commonly predicted using soil test information, and fertiliser
recommendations from soil tests are usually based on calibration curves (Kastens et
al., 2000). Kastens et al. used data from a specific farm in northwest Kansas to
produce a model that estimated dryland wheat yield for that farm, and found that P
fertiliser had little effect on wheat yield. In a second study, Kastens et al. (2000)
estimated a yield model that incorporated detailed site-specific field information (soil
test and fertiliser N and P, soil organic matter content, soil texture and soil pH). They
found that wheat yield responded principally to soil test P, meaning that P should be
treated as a capital investment and that optimal P fertilisation depended on the
length of land tenure. An analytical framework was developed in which fertiliser P
remaining after crop removal builds up soil test P in future years, thereby increasing
wheat yields.
Kastens et al. (2003) proposed using a statistical technique based on maximum
entropy to predict wheat yield, with fertiliser N and P, soil test N and soil test P, and
other causal factors as independent variables. In addition to the information from the
farm, they also considered information from soil laboratory recommendations. They
found that these information-blending techniques resulted in models that predicted
yield out of sample as well as, or better than, a model estimated with farm data only.
2.3 NONLINEAR GROWTH MODEL
The analysis of growth data has become increasingly important in many fields
of study. Biologists are interested in describing biological growth and in
understanding its underlying biological processes. In agriculture there are obvious
economic and management advantages in knowing how large things grow, how fast
they grow, and how these factors respond to environmental conditions or treatments
over time. Social scientists are interested, for example, in the growth of populations,
new political parties, the food supply and energy demand. The same types of model
occur when the explanatory variable x is no longer time but the increasing intensity
of some other factor, such as the growth of smog with increasing solar radiation,
weight gain with increased nutrients in the diet, and changes in crop yield with
increasing planting density. Tsoularis and Wallace (2002) gave a comprehensive
discussion of the characteristics of logistic growth models.
The application of nonlinear growth models has been studied by a number of
researchers. For example, Amer and William (1957) applied the Gompertz curve to
Pelargonium leaf growth, and Rawson and Hackett (1974) fitted the Gompertz curve
to leaf growth. Causton and Venus (1981) found that the Gompertz curve is the model
most frequently used to fit leaf growth data. Hunt (1982) pointed out that organism
growth is the most frequent type of research data used with nonlinear growth
models. Fekedulegn et al. (1999) discussed the application of nonlinear growth
models in forestry. Md Yunus (1999) fitted the Gompertz curve to cocoa tree growth.
Meanwhile, Zuhaimy et al. (2003) fitted the Gompertz curve to tobacco growth data,
and Azme and Zuhaimy (2004) went further by fitting several nonlinear models to
tobacco leaf growth data, finding that nonlinear growth curves fitted the tobacco
growth data very well.
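As an illustration of fitting such a curve, the sketch below fits a three-parameter Gompertz form, $y = a\exp(-b\exp(-kt))$, with `scipy.optimize.curve_fit`. This parameterisation is one common choice among several in the literature; the palm ages, the noise-free "yield" values and the starting values `p0` are hypothetical, and with real data the starting values matter more.

```python
import numpy as np
from scipy.optimize import curve_fit


def gompertz(t, a, b, k):
    """Gompertz curve: asymptote a, displacement b, rate k."""
    return a * np.exp(-b * np.exp(-k * t))


age = np.arange(1.0, 21.0)                    # hypothetical palm age in years
yield_obs = gompertz(age, 25.0, 3.0, 0.4)     # synthetic, noise-free "yield" (t FFB/ha)
params, cov = curve_fit(gompertz, age, yield_obs, p0=[20.0, 2.0, 0.3])
a_hat, b_hat, k_hat = params
print(f"a={a_hat:.3f}, b={b_hat:.3f}, k={k_hat:.3f}")
# In this parameterisation the inflection point is at t = ln(b)/k, where y = a/e
print("inflection age:", np.log(b_hat) / k_hat)
```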
Many authors have written about the possibility of using growth models to
model yield growth. Penman (1956) related growth (increase in dry weight) to
accumulated potential transpiration. Nelder et al. (1960) used growth models in their
study of the relationship between crop yield weight and time, and found that the
effects of years were frequently greater than those of the treatments. Holliday (1960)
studied the relationship between grain yield and plant population, using yield per
unit as the dependent variable and plant population as the independent variable. He
also studied the relationship between dry matter yield per unit area and plant
population.
Kruse (1999) studied yield growth for corn in China, Argentina and the
European Union (EU), for wheat in China and for soybean in Argentina. The
dependent variable chosen was soybean yield in metric tons per hectare and the
independent variable was time (year). Naylor et al. (1997) provided statistical
support for yield variability using a yield growth model.
Garcia (1983; 1988; 1989; 1989) gave detailed discussions of the application of
growth and yield models for forecasting in the forest industry. The Bertalanffy-Richards
growth model was used by Lei and Zhang (2004) in their study of forest growth and
yield data, with growth volume per hectare as the dependent variable. This shows that
growth models have been commonly used for modelling forest yield.
Harris and Kennedy (1999) studied the use of logistic and exponential yield
growth models for cereals and maize in developed countries, wheat in the United
States and paddy rice in Japan. Their study showed that the logistic and exponential
yield growth models fit essentially equally well. They also found that the logistic
yield growth model was better than the exponential yield growth model for Japanese
paddy rice yield.
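A comparison in the spirit of Harris and Kennedy (1999) can be sketched by fitting both curves to the same series and comparing residual sums of squares. Assumptions: the series is synthetic and generated from a logistic curve, the logistic form $y = a/(1 + b\exp(-kt))$ is fitted with `scipy.optimize.curve_fit`, and the exponential $y = c\exp(rt)$ is fitted by linear regression of $\ln y$ on $t$, a common linearisation that minimises error on the log scale rather than the original scale.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, a, b, k):
    """Logistic curve with upper asymptote a."""
    return a / (1.0 + b * np.exp(-k * t))


def sse(y, fitted):
    return float(np.sum((y - fitted) ** 2))


t = np.arange(0.0, 31.0)                  # hypothetical years since the start of the series
y = logistic(t, 8.0, 3.0, 0.3)            # synthetic yield series (t/ha) levelling off near 8

# Logistic fit by nonlinear least squares
log_params, _ = curve_fit(logistic, t, y, p0=[y.max(), 2.0, 0.2])
sse_logistic = sse(y, logistic(t, *log_params))

# Exponential fit via linear regression of ln(y) on t
r, ln_c = np.polyfit(t, np.log(y), 1)
sse_exponential = sse(y, np.exp(ln_c) * np.exp(r * t))

print(f"logistic SSE = {sse_logistic:.2e}, exponential SSE = {sse_exponential:.2e}")
```

On a series that clearly levels off, the logistic curve wins easily; on series still in their early, near-exponential phase the two fits can be almost indistinguishable, which is consistent with the "essentially equally good" finding above.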
These studies demonstrate that it is not unusual to model yield using growth
models since, by definition, growth is a measure of change in some characteristic
(weight, basal area, volume, etc.) over a specified amount of time, while yield is the
amount of some characteristic that can be harvested per period. The studies reviewed
above have directed this work towards adopting a similar approach of modelling oil
palm yield using growth models.
2.4 APPLICATION OF NEURAL NETWORK MODELLING
The neural network technique has attracted a lot of attention in the last
decade. In one respect, neural networks can be regarded as multivariate nonlinear
analytical tools, and they are known to be very good at recognizing patterns in noisy,
complex data and estimating their nonlinear relationships. Neural network (NN)
technology offers significant support in terms of organizing, classifying and
summarizing data.
The literature points to several limitations of multiple regression that are
overcome by neural networks. These limitations include the following:
(i) The complexity of multiple regression models is limited to linear
combinations of decision variables (Hill and Remus, 1994).
(ii) Neural network models are not subject to model misspecification in the
way multiple regression is, especially in short-term modelling (Hill and
Remus, 1994; Gorr, 1994).
(iii) Neural networks can partially transform the input data automatically,
whereas this is a rather laborious task with multiple regression (Hill and
Remus, 1994; Connor, 1988; Donaldson et al., 1993).
(iv) With their nonlinear threshold functions, neural network models are
capable of handling almost any kind of nonlinearity, whereas multiple
regression models offer no such performance guarantee (Gorr, 1994; Hill
and Remus, 1994).
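To make the contrast with multiple regression concrete, the sketch below trains a feedforward network with one hidden layer of tanh units by backpropagation (full-batch gradient descent on mean squared error) and compares it with a straight-line fit. It is a minimal NumPy illustration, not the architecture of any study cited here; the hidden-layer size, learning rate, number of epochs and the sine-shaped target are all arbitrary assumptions.

```python
import numpy as np


def train_mlp(X, y, hidden=8, lr=0.05, epochs=3000, seed=0):
    """One-hidden-layer network (tanh hidden units, linear output) trained by
    backpropagation with full-batch gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, size=(hidden, 1))
    b2 = np.zeros(1)
    target = y.reshape(-1, 1)
    history = []
    for _ in range(epochs):
        # Forward pass
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - target
        history.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error
        g_out = 2.0 * err / n
        gW2, gb2 = H.T @ g_out, g_out.sum(axis=0)
        g_hid = (g_out @ W2.T) * (1.0 - H ** 2)   # derivative of tanh
        gW1, gb1 = X.T @ g_hid, g_hid.sum(axis=0)
        # Gradient-descent update
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), history


def predict(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()


X = np.linspace(-2.0, 2.0, 41).reshape(-1, 1)
y = np.sin(1.5 * X).ravel()                     # hypothetical nonlinear input-output relation
params, history = train_mlp(X, y)
slope, intercept = np.polyfit(X.ravel(), y, 1)  # straight-line (linear regression) baseline
linear_mse = float(np.mean((y - (slope * X.ravel() + intercept)) ** 2))
nn_mse = float(np.mean((y - predict(params, X)) ** 2))
print(f"initial NN MSE {history[0]:.4f}, final NN MSE {nn_mse:.4f}, linear MSE {linear_mse:.4f}")
```

The straight line can only capture the linear part of the relationship, which is limitation (i) above in miniature; the network's hidden tanh units let it bend the fitted curve.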
However, the neural network approach has not been used as an alternative in
oil palm modelling, even though it has proved appealing to many social scientists.
Zuhaimy and Azme (2001; 2003) gave a brief review of neural networks and their
application in forecasting. The review of neural network applications below is
organised into four categories: neural networks in science and technology, in
economics, in environment and health, and in agriculture and agronomy.
2.4.1 Neural Network in Science and Technology
Neural networks in meteorology and oceanography were studied intensively
by Tangang et al. (1997; 1998). They compared statistical methods with neural
network models and found that the NN method is a versatile and powerful technique
capable of augmenting traditional linear statistical methods in data analysis and
forecasting. They also concluded that the NN model is a type of variational (adjoint)
data assimilation, which allows it to be readily linked to dynamical models under
adjoint data assimilation, resulting in a new class of hybrid neural dynamical models.
Michaelsen (1987) and Mihalakakou et al. (2000) described a feedforward
backpropagation neural network approach for modelling and making short-term
predictions of total solar radiation time series. They found that the neural network
approach led to better predictions than the autoregressive model. Navoane and
Ceccatto (1994) predicted summer monsoon rainfall over India using a neural
network model and concluded that neural networks can be advantageously used in
this context, showing comparable or better performance than conventional methods.
Jiang et al. (1994) studied NNs in remote sensing; their results showed that small
networks performed better than larger ones. The application of neural network
models to load forecasting in Taiwan was carried out by Hsu and Chen (2003). Their
results suggest that the neural network model yields more accurate forecasts than the
regression-based model using the same exogenous variables.
2.4.2 Neural Network in Economy
Evans (1997) applied an NN model to currency forecasting. For the purpose of
his study, the network was trained on data derived from Forex rates for the four
principal international currencies (US$, German DM, Japanese Yen and UK £),
processed in various ways, with the UK/DM cross-rate used as the target for
forecasting. He concluded that, with the NN model, the exchange rate could be
estimated up to ten days ahead with reliable results.
Yao and Poh (1995) used neural networks to predict the KLSE index. Beyond
the choice between technical and fundamental data, using neural networks to
forecast stock prices involves four challenges: choosing the input and output
variables, the type of neural network and the network architecture, and evaluating the
quality of the trained network. They found that the NN model could be used as a
forecasting tool as well as traditional forecasting methods such as Box-Jenkins. Yao
et al. (2000) applied a backpropagation neural network to forecast the prices of
options on Nikkei 225 index futures. The results suggest that, for volatile markets, a
neural network option model outperforms the traditional Black-Scholes model, and
that NNs have the ability to model nonlinear patterns and learn from historical data.
Gan and Ng (1995) applied artificial neural networks to multivariate FOREX
forecasting and found that they performed better than the statistical approach. Yao
and Tan (2000) used NNs to study the exchange rates between the US dollar and five
other major currencies: the Japanese yen, Deutsche mark, British pound, Swiss franc
and Australian dollar. They found that the NN results were better than those of the
traditional methods. Thiesing and Vornberger (1997) compared the performance of
neural networks with two prediction techniques used in supermarkets, naïve
prediction and statistical prediction, for forecasting sales. The results showed that
neural networks outperform the two conventional techniques in prediction quality.
Smith and Gupta (2000) clarified the potential of multilayered feedforward neural
networks, Hopfield neural networks and self-organizing neural networks. They found
neural network approaches to solving business problems to be very similar to
statistical methods, with some relaxation of assumptions and more flexibility.
Boussabaine and Kaka (1998) investigated the feasibility of using neural
networks to predict the cost flow of construction projects and found the results very
encouraging, because the differences between actual and predicted values were very
small. Moshiri and Cameron (2000) compared the performance of backpropagation
neural networks with traditional econometric approaches in forecasting the inflation
rate. The results showed that the backpropagation neural network models were able
to forecast as well as all the traditional econometric methods, and to outperform
them in some cases.
Uysal and El Roubi (1999) studied the usefulness of artificial neural networks
as an alternative to multiple regression in tourism demand studies. Canadian tourism
expenditure in the United States was used as a measure of demand to demonstrate
the application. The results revealed that the use of neural networks in tourism
demand studies may give better estimates in terms of prediction bias and accuracy.
Law and Au (1999) and Law (2000) showed that neural network models outperform
multiple regression, naïve regression, moving average and exponential smoothing in
terms of forecasting accuracy.
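Most of the comparisons above are stated in terms of forecast accuracy. The measures most often used for this purpose can be computed as follows; this is a generic sketch with a hypothetical function name and made-up numbers, and MAPE assumes that no actual value is zero.

```python
import numpy as np


def forecast_accuracy(actual, predicted):
    """Common forecast error measures; MAPE is undefined if any actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(100.0 * np.mean(np.abs(err / actual))),   # in percent
    }


print(forecast_accuracy([100.0, 200.0, 400.0], [110.0, 190.0, 400.0]))
```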
Motiwalla and Wahab (2000) studied the application of neural networks to
stock market problems. The evidence suggests that neural network models are more
successful than regression in providing a good fit to the data-generating process of
stock returns and in issuing profitable trading signals. For financial and monetary
problems, Tkacz and Hu (1999) used neural network models to improve forecasting
performance. Their main findings were that, at the one-quarter forecasting horizon,
neural networks yield no significant forecast improvement, whereas at the
four-quarter horizon the improvement in forecast accuracy is statistically significant.
They concluded that the improved forecasts may capture more fundamental
nonlinearities between financial variables and real output growth at the longer
horizon. Angstenberger (1996) studied the application of neural network models to
predicting the S&P 500 index. Wong and Selvi (1998) and Wong et al. (2000)
discussed in detail the application of neural network modelling in business and
finance.
Zhang and Hu (1998) examined the effects of the numbers of input and hidden
nodes on neural network performance in forecasting the British pound/US dollar
exchange rate. They concluded that neural networks outperform linear models, that
the number of input nodes has a greater impact on performance than the number of
hidden nodes, and that a larger number of observations reduces forecast error. Zhang
et al. (2001) carried out an experimental evaluation of neural networks for nonlinear
forecasting. The results showed that neural networks are valuable tools for modelling
and forecasting nonlinear time series, while traditional linear methods are not as
competent; moreover, the number of input nodes is much more important than the
number of hidden nodes. Nasr et al. (2003) investigated the application of neural
networks to gas consumption and found that, in general, the multivariate model
showed better forecasting performance than the univariate model. Zuhaimy and Azme
(2003) studied the application of NNs to predicting crude palm oil prices and found
that NN performance was better than that of multiple linear regression.
Limsombunchai et al. (2004) compared the hedonic price model and a neural network
model in predicting house prices, and found that the neural network approach was
better than the hedonic price model.
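In univariate neural network forecasting, the input nodes examined by Zhang and Hu (1998) and Zhang et al. (2001) typically correspond to lagged values of the series. A minimal sketch of building such training pairs is shown below; the function name is hypothetical, and changing `n_inputs` changes the number of input nodes the network would receive.

```python
import numpy as np


def make_lagged_inputs(series, n_inputs, horizon=1):
    """Return (X, y): each row of X holds n_inputs consecutive past values and
    the matching y is the value `horizon` steps after the last of them."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_inputs - horizon + 1
    if n_rows <= 0:
        raise ValueError("series too short for the requested window")
    X = np.array([series[i:i + n_inputs] for i in range(n_rows)])
    start = n_inputs + horizon - 1
    y = series[start:start + n_rows]
    return X, y


X, y = make_lagged_inputs(np.arange(10.0), n_inputs=3)
print(X[0], y[0])   # [0. 1. 2.] 3.0
```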
2.4.3 Neural Network in Environmental and Health
Corne et al. (1998) evaluated the performance of artificial neural networks
for flood forecasting in comparison with conventional hydrodynamic models, and
found the NN performance sufficiently good and robust to provide a basis for
practical short-term flood forecasting. Gaudart et al. (2003) applied neural network
models to epidemiological data. The performance of the multilayer perceptron and
that of linear regression were compared with regard to the quality of prediction and
estimation and robustness to deviations from the underlying assumptions of
normality, homoscedasticity and independence of errors. The results showed that
both models had comparable performance and robustness.
2.4.4 Neural Network in Agriculture
Shearer et al. (1994) predicted corn yield using a neural network classifier
trained on soil landscape features and soil fertility data. They found that neural
network classifiers can be used to predict spatial yield variability that described the
actively growing corn crop.
Drummond et al. (1995) compared several methods for predicting crop yield
from soil properties. They noted that understanding yield variability is made
extremely difficult by the number of factors that affect yield. They used multiple
linear regression (MLR), stepwise multiple linear regression (SMLR), partial least
squares (PLS), projection pursuit regression (PPR) and a back-propagation neural
network, and found that the neural network gave better results than MLR, SMLR
and PLS, and results similar to PPR. They also concluded that less-complex
statistical methods, such as standard correlation, did not seem to be particularly
useful in understanding yield variability. The correlation matrices described each
factor's linear relationship to yield; however, when complex nonlinear relationships
between factors exist, correlation may provide inaccurate and even misleading
information about these relationships.
Prediction capabilities were highest for the nonlinear and non-parametric
methods. One method Drummond et al. (1995) used was a feed-forward,
backpropagation neural network for corn and soybean yield prediction. They
included soil properties such as phosphorus, potassium, soil pH, organic matter
content, soil depth and magnesium saturation as inputs, and compared the results
with other statistical models. The ANN showed promise as an aid in understanding
yield variability, although their network model needed further improvement to
increase accuracy, and weather information and other factors were not included in
their ANN.
Many applications of ANNs have been reported for the interpretation of
images in the agri-food industry. Studies showed that artificial neural networks can
be as accurate as procedural models for the interpretation of images (Deck et al.,
1995; Timmermans and Hulzebosh, 1996). Yang (1997) studied the use of a neural
network model to estimate soil temperature. Wang et al. (1999) applied an NN model
to the classification of wheat kernel colour. Yang et al. (2000) discussed the
application of artificial neural networks to image recognition and the classification
of crops and weeds. The objective of their study was to develop a back-propagation
artificial neural network model that could distinguish young corn plants from weeds.
They found that an ANN-based weed recognition system could potentially be used
for the precision spraying of herbicides in agricultural fields. NNs have become more
popular in agriculture because they can interpret images quickly and effectively
without complicated algorithms.
Neural network models have also been used to predict forest characteristics
in southeast Alaska (Corne et al., 2000). The authors used a neural network model,
specifically learning vector quantization, to generate models based on simple
inventory parameters such as geographical location, elevation, slope and aspect,
together with complementary satellite image data. Predictive maps were generated by
obtaining input data from digital elevation models. The predictions were compared
both with the data recorded in the database and with published classification maps
for part of the study area produced by standard satellite image interpretation methods.
Liu et al. (2001) used an artificial neural network to build a corn yield
prediction model for precision farming applications. A feedforward, completely
connected, back-propagation artificial neural network was designed to approximate
the nonlinear yield function relating corn yield to the factors influencing yield. After
the network was developed and trained, three aspects of the input factors were
investigated: (i) yield trends with four input factors, (ii) the interaction between
nitrogen application rate and late July rainfall, and (iii) optimization of the 15 input
factors with a genetic algorithm to determine maximum yield. They concluded that
the back-propagation, feed-forward neural network predicted corn yields with 80
percent accuracy; when an example with abnormally low yield was discarded,
accuracy rose to 83.5 percent. They also found that the network was able to capture
the expected interaction between rainfall and the amount of applied nitrogen
fertiliser.
Shrestha and Steward (2002) used a neural network model in modelling the
early growth stage of corn plants and found that the neural network approach was
effective in adjusting the segmentation decision surface based on general measures
of lighting changes. Kominakis et al. (2002) applied a neural network model to the
prediction of milk yield in dairy sheep and used Pearson and rank correlations
between predicted and observed yields to test the goodness of fit. The results showed
that the average difference between observed and predicted yields was generally not
statistically significant.
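Goodness-of-fit checks of the kind used by Kominakis et al. (2002) can be sketched with NumPy and SciPy: Pearson correlation on the raw values, rank (Spearman) correlation computed as the Pearson correlation of ranks, and a paired t-test on observed minus predicted values as one way of testing the average difference. The data are hypothetical, and the paired t-test is an assumption about how such a difference might be tested, not a description of their procedure.

```python
import numpy as np
from scipy.stats import rankdata, ttest_rel


def fit_diagnostics(observed, predicted):
    """Pearson and rank correlations, and a paired t-test on the differences."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pearson = float(np.corrcoef(observed, predicted)[0, 1])
    # Spearman's rho: Pearson correlation of the (average) ranks
    spearman = float(np.corrcoef(rankdata(observed), rankdata(predicted))[0, 1])
    # Paired t-test of H0: mean(observed - predicted) = 0
    p_value = float(ttest_rel(observed, predicted).pvalue)
    return {"pearson": pearson, "spearman": spearman, "paired_t_p": p_value}


obs = [1.8, 2.1, 2.6, 3.0, 3.4, 3.9]    # hypothetical daily milk yields (kg)
pred = [1.9, 2.0, 2.7, 3.1, 3.3, 4.4]   # hypothetical network predictions
print(fit_diagnostics(obs, pred))
```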
2.5 RESPONSE SURFACE ANALYSIS
This section reviews the application of response surface analysis as a tool
for obtaining the optimum levels of factors that influence yield, for example
fertiliser application in the oil palm industry.
Ahmad Tarmizi et al. (1986) also analysed fertiliser trials carried out over a
range of environments in Peninsular Malaysia, and their yield response functions for
specific soil series have been used to formulate fertiliser recommendations. Yield
response equations, which take into account the curvilinear response to each
fertiliser treatment and the two- and three-factor interactions between these
treatments, were fitted to the plot data, and analysis of variance indicated the
significance of the individual variables in these equations. Chan et al. (1991) studied
fertiliser efficiency in oil palm at different locations in Peninsular Malaysia. Ahmad
Tarmizi et al. (1991) investigated the yield response and the effects of environmental
factors on fertiliser application, and found that environmental factors contributed
negatively to the efficiency of urea.
Verdooren (2003) and Goh et al. (2003) conducted experiments to determine the
levels of fertiliser input that give optimum yields. The statistical techniques involved
were regression analysis and analysis of variance. It was concluded that fertiliser
experiments with at least three quantitative levels can be used to derive an estimate
of the agronomic and economic optimum rate, but that it is much better to include
five quantitative levels, based on a central composite design, to obtain a reliable
estimate of the optimum with a small standard error.
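The five quantitative levels associated with a central composite design arise from its structure: with $k$ factors, the design combines the $2^k$ factorial points at ±1, $2k$ axial points at ±α and replicated centre points, so each factor takes the levels -α, -1, 0, 1 and α. The sketch below works in coded units; the rotatable choice $\alpha = (2^k)^{1/4}$, the function names and the N and K rates used for decoding are illustrative assumptions.

```python
import itertools
import numpy as np


def central_composite(k, n_center=1, alpha=None):
    """Coded central composite design: factorial, axial and centre points."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25                      # rotatable axial distance
    factorial = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))
    axial = np.zeros((2 * k, k))
    for i in range(k):
        axial[2 * i, i] = -alpha
        axial[2 * i + 1, i] = alpha
    center = np.zeros((n_center, k))
    return np.vstack([factorial, axial, center])


def decode(coded, centre, step):
    """Convert coded levels to natural units: natural = centre + coded * step."""
    return np.asarray(centre) + coded * np.asarray(step)


design = central_composite(2, n_center=3)
# Hypothetical fertiliser rates (kg per palm per year): N centred at 1.0, K at 2.0
rates = decode(design, centre=[1.0, 2.0], step=[0.4, 0.8])
print(np.round(rates, 2))
```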
2.6 SUMMARY
The review has provided an overview of the research area and the methods
used, as summarised in Table 2.1. It shows that MLR is one of the most common
methods for studying the relationship between oil palm yield, as the dependent
variable, and other variables such as climate, rainfall and sunshine. Neural networks
are a relatively new approach that imitates the memory processes of the human brain
and has been applied in many areas. Its application to oil palm yield is still in its
infancy but has great potential. Nonlinear growth models are not as popular as MLR
and have not yet been explored in the oil palm industry. This study will therefore be
the first to apply nonlinear growth models and neural network models to oil palm
yield modelling in Malaysia. We also present the mathematical aspects of parameter
estimation. Nonlinear growth models will be used to understand the biological
process of oil palm yield growth.
The term response surface analysis was rarely used in earlier research on fertiliser trials; the terms used were analysis of variance and quadratic response analysis. When the stationary point was a saddle point, those studies could not reach a decision on the optimum. In this study, we explore the use of ridge analysis to overcome the saddle point problem.
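The saddle point problem can be illustrated for two coded factors. For a fitted second-order surface ŷ = b0 + x′b + x′Bx, the stationary point is x_s = −½B⁻¹b, and its nature follows from the eigenvalues of the symmetric matrix B: all negative gives a maximum, all positive a minimum, and mixed signs a saddle. Ridge analysis instead follows the path x(μ) = −½(B − μI)⁻¹b; for μ larger than the largest eigenvalue this gives the point of maximum predicted response on a sphere of radius ‖x(μ)‖. The sketch below handles two factors only, and the coefficients in the example are hypothetical.

```python
import math


def _solve_2x2(B, rhs, shift=0.0):
    """Solve (B - shift*I) x = rhs for a symmetric 2x2 matrix B."""
    (a, c), (_, d) = B
    a, d = a - shift, d - shift
    det = a * d - c * c
    return ((d * rhs[0] - c * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det)


def stationary_point(b, B):
    """x_s = -1/2 B^{-1} b for the surface y = b0 + x'b + x'Bx."""
    return _solve_2x2(B, (-0.5 * b[0], -0.5 * b[1]))


def eigenvalues(B):
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    (a, c), (_, d) = B
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, c)
    return mid - rad, mid + rad


def classify(B):
    """Nature of the stationary point from the signs of the eigenvalues of B."""
    lo, hi = eigenvalues(B)
    if hi < 0:
        return "maximum"
    if lo > 0:
        return "minimum"
    return "saddle"


def ridge_point(b, B, mu):
    """Point on the ridge path solving (B - mu*I) x = -1/2 b; take mu > largest eigenvalue."""
    return _solve_2x2(B, (-0.5 * b[0], -0.5 * b[1]), shift=mu)


# Hypothetical surface in coded units; off-diagonal = half the cross-product coefficient.
b = (2.0, 1.0)
B = ((-1.0, 0.5), (0.5, 1.0))
print(classify(B))                  # saddle
x = ridge_point(b, B, mu=2.0)
print(x, math.hypot(*x))            # ridge point and its distance from the design centre
```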
Table 2.1: Summary of the literature reviewed in this study
Author/year | Research area | Method used
Green (1976) | Study the impact of fertiliser level or input on oil palm yield and investigate the relationship between foliar composition and oil palm yield. | Multiple linear regression
Chow (1984) | Forecasting oil palm yield. Variables: matured area, replanting area, and total new planting and replanting area. | Multiple linear regression
Foster (1985; 1987) | Study the effect of agronomic factors, site characteristics, soil and climate on oil palm yield. | Multiple linear regression
Ahmad Tarmizi et al. (1986) | Study the formulation of fertiliser recommendations. | Regression and analysis of variance
Chow (1988) | Study the seasonality and rainfall effects on oil palm yield. | Time series analysis and correlation
Kee and Chew (1991) | Investigate oil palm yield response to nutrient content and moisture on the East Coast of Peninsular Malaysia. | Regression method
Chan et al. (1991) | Study the interaction of fertiliser with oil palm yield. | Analysis of variance
Ahmad Tarmizi et al. (1991) | Determine the circumstances under which urea could be used efficiently instead of conventional inorganic nitrogen. | Regression and partial correlation
Ahmad Tarmizi et al. (1999) | The effects of N, P and K fertilisers on oil palm yield. | Linear regression
Oboh and Fakorede (1999) | Investigate the effects of weather on yield components (number of bunches, fresh fruit bunches and mean bunch weight) in Nigeria. | Regression and correlation
Foong (1999) | The impact of moisture and potential evapotranspiration on growth and yield of oil palm. | Correlation analysis
Leng et al. (2000) | Study the effects of fertiliser withdrawal on yield and yield components (bunch weight). | Analysis of variance
Henson and Chang (2000) | Study the physiology of oil palm and its effect on yield. Factors: solar energy incident on the crop canopy, the fraction of solar energy, efficiency of conversion and fraction of dry matter. | Plant growth analysis
Belanger et al. (2000) | Study the effect of nitrogen fertiliser on potato. | Regression
Soon and Hong (2001) | Study the fertiliser (N, P and K) response of oil palm yield in Sabah. | Regression
Chin (2002) | Investigate the relationship of climate factors, soil and planting density to yield. | Correlation analysis
Foster (2003) | Predicting yield response from leaf analysis. | Regression and correlation
Wendroth et al. (2003) | Predicting barley yield with some soil and nitrogen information. | Regression
Verdooren (2003) | Experimental design to obtain the optimum level of fertiliser inputs. | Regression and analysis of variance
Chan et al. (2003) | Study climate change and its effects on oil palm yield. | Discussion
Ahmad Tarmizi et al. (2004) | Study the fertiliser programme towards higher yield. | Regression and correlation
Henson and Mohd Hanif (2004) | Study the effects of seasonal variation in oil palm fruit bunch production. | Time series analysis and correlation
Azme et al. (2003) | Modelling the interaction between palm oil prices and the prices of other vegetable oils. | Multiple linear regression
Azme and Zuhaimy (2003) | Study the effect of collinearity on model performance. | Linear regression and principal component regression
Garcia (1983; 1988; 1989; 1989) | Nonlinear growth models in forestry. | Nonlinear growth model
Kruse (1999) | Study yield growth in various countries. | Nonlinear growth model
Harris and Kennedy (1999) | Modelling cereal and maize yield. | Nonlinear growth model
Mohd Yunus (1999) | Fitting the Gompertz curve to cocoa growth. | Nonlinear growth model
Zuhaimy et al. (2003) | Fitting the Gompertz curve to tobacco growth data. | Nonlinear growth model
Azme et al. (2004) | Comparative study of nonlinear growth models on tobacco leaf growth data. | Nonlinear growth models
Lei and Zhang (2004) | Study forest growth and yield. | Nonlinear growth model
Drummond et al. (1995) | Compare several methods for predicting crop yield based on soil properties. | Neural network
Corne et al. (2000) | Predict forest characteristics in southeast Alaska. | Neural network
Liu et al. (2001) | Build a corn yield prediction model for precision farming applications. | Neural network
Shrestha and Steward (2002) | Modelling early growth stage corn plants. | Neural network
Kominakis et al. (2002) | Prediction of milk yield in dairy sheep. | Neural network
Zuhaimy and Azme (2003) | Forecasting crude palm oil prices. | Neural network
This summary shows that multiple linear regression is the most common method used in oil palm yield modelling, whereas nonlinear growth models and neural networks have not been explored in this area. The neural network model is therefore expected to be the major contribution of this study to the development of oil palm yield modelling. The lack of nonlinear growth modelling is partly attributable to the fact that the statistical methodology for fitting nonlinear models to oil palm yield growth data is closely tied to the mathematics of the models and has not yet been explored. This study shows that the neural network model is superior to the models previously proposed by other researchers.
CHAPTER 3
RESEARCH METHODOLOGY
3.1
INTRODUCTION
This chapter describes the research methodology developed for this study. It covers several approaches to modelling oil palm yield: the nonlinear growth model, multiple linear regression, neural network modelling and response surface analysis. Each model is described in detail to show how the methodology was adapted to the modelling of oil palm yield.
3.2
DATA ANALYSIS
One of the most important components in the success of any modelling approach is the data. The quality, availability, reliability, relevance and repeatability of the data used to develop and run the system are critical to its success. Even a primitive model can perform well if the input data have been processed in such a way that they clearly reveal the important information. On the other hand, even the best model is worth very little if the necessary input information is presented in a complex and confusing way.
Data processing starts with the collection and analysis of the data, followed by pre-processing, after which the data are fed to the model (for example, the neural network). Finally, post-processing is applied, if necessary, to fit the model outputs to the required outputs (Figure 3.1). This data analysis procedure was applied to all modelling approaches in this study.
The data sets used in this study were analysed using statistical packages such as the Statistical Package for the Social Sciences (SPSSx), the Statistical Analysis System (SAS), S-Plus and Matlab. The SPSSx package was used to model oil palm yield using multiple linear regression analysis, while robust M-regression modelling was carried out using the S-Plus package. The nonlinear growth model and response surface analysis were performed with the SAS package, and Matlab was used to develop the neural network model.
[Flowchart: Data Collection → Data Pre-Processing → Data Analysis and Modelling (nonlinear growth model; multiple linear regression; robust M-regression; neural network; response surface analysis) → Data Post-processing (applied only if necessary; Yes/No decision) → Output]
Figure 3.1: Data analysis procedure used in this study
3.3
MODELLING
This is an exploratory research project whose purpose is to select the best model to describe the data. The data were analysed using the nonlinear growth model, multiple linear regression, robust M-regression, the neural network model and response surface analysis. This study began by reviewing previous work on nonlinear growth models applied to various types of data (refer to Figure 3.1). These models were studied separately and the results are discussed; a comparative study of the models is also given.
The nonlinear growth model is used to analyse the oil palm growth data, which are naturally nonlinear, and helps us to understand the biological process of yield growth. In analysing the foliar data, multiple linear regression and robust M-regression are used to investigate the relationship between the foliar data and oil palm yield. The neural network model is also applied to the foliar data in order to improve model performance. The multiple linear regression and neural network models use the same data because both are able to model the causal relationship between the dependent and independent variables. A linear relationship can be modelled by both multiple linear regression and the neural network model, but the neural network model can also adapt to nonlinear relationships, depending on the activation function used in its architecture.
Finally, response surface analysis is introduced to obtain the optimum fertiliser rate that generates the maximum profit. The goodness of fit of all models used in this study is measured using the MSE, RMSE, MAPE, the correlation coefficient and the coefficient of determination. The details of these measurements are given in Chapter 1. The theory and concepts of these models are given below.
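Because the same criteria are used to compare every model, a minimal sketch of their usual definitions may help. The exact formulas adopted in Chapter 1 are not reproduced in this section, so the definitions below (in particular R² = 1 − SSE/SST and MAPE expressed in percent) are assumptions based on common usage.

```python
import math


def goodness_of_fit(observed, predicted):
    """MSE, RMSE, MAPE (%), correlation r and R^2 between observed and predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # MAPE assumes no observed value is zero.
        "MAPE": 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n,
        "r": s_op / math.sqrt(s_oo * s_pp),
        "R2": 1.0 - sse / s_oo,   # assumed definition: 1 - SSE/SST
    }


print(goodness_of_fit([10.0, 20.0, 30.0], [12.0, 18.0, 30.0]))
```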
3.3.1
The Nonlinear Growth Model
Suppose that Y is the dependent variable and X is the independent variable. The relationship between Y and X is nonlinear if a given change in X does not produce a consistent change in Y across the range of the independent variable. A regression model is also called nonlinear when it is not linear in its parameters.
A simple nonlinear regression model is as follows:
$$Y = f(\beta, X) + \varepsilon \qquad (3.1)$$
where Y is the dependent variable, β = (β1, β2, β3, …, βp) is a vector of unknown parameters, X is the independent (explanatory) variable and ε is the error term. The model is nonlinear when the expectation function f is not linear in β. Draper and Smith (1981) point out that whether a model is linear or nonlinear depends on how the parameters enter the model, not on the form of the independent variables: a model is nonlinear if at least one of the derivatives of f with respect to the parameters depends on at least one of the parameters. For example, in the early stages of oil palm yield growth (Figure 3.2), yield increases rapidly until year five and then more slowly until year ten, after which FFB yield settles at a consistent level with normal fluctuations.
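A short derivation makes this criterion concrete. For the logistic curve of Table 3.1 (error term omitted), the derivative with respect to α still depends on β and κ, so the model is nonlinear in its parameters. A quadratic trend in time, by contrast, is curved in T but every parameter derivative is free of the parameters, so it is a linear model:

$$f(T;\alpha,\beta,\kappa) = \frac{\alpha}{1+\beta e^{-\kappa T}}, \qquad \frac{\partial f}{\partial \alpha} = \frac{1}{1+\beta e^{-\kappa T}}$$

$$f(T;\beta_0,\beta_1,\beta_2) = \beta_0 + \beta_1 T + \beta_2 T^2, \qquad \frac{\partial f}{\partial \beta_0} = 1,\quad \frac{\partial f}{\partial \beta_1} = T,\quad \frac{\partial f}{\partial \beta_2} = T^2$$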
[Plot: FFB yield (t/ha/year, 0 to 45) against year of harvest (1 to 19)]
Figure 3.2: FFB yield growth versus time (year of harvest)
Draper and Smith (1981) and Ratkowsky (1983) discussed the family of growth models in detail. Ratkowsky (1983) also discussed five important points to consider when developing a nonlinear regression model:
(i) Parsimony: the model should contain as few parameters as possible;
(ii) Parameterisation: the parameterisation with the best estimation properties should be used;
(iii) Range of applicability: the data must cover the entire range described by the model;
(iv) Stochastic specification: the error structure should also be modelled;
(v) Interpretability: parameters with a physical meaning are preferred.
Generally the growth rate does not decline steadily, but rather increases to a maximum value before declining to zero. Such a growth curve is known as an S-shaped or sigmoidal model, and its growth rate is given by
$$\frac{df(t)}{dt} \propto g(f)\,\{h(\alpha) - h(f)\} \qquad (3.2)$$
where g and h are increasing functions with g(0) = h(0) = 0.
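As a check on (3.2), take g(f) = f and h(f) = f, which are increasing with g(0) = h(0) = 0, and write the proportionality constant as κ/α. The result is the logistic differential equation, and direct differentiation confirms that the logistic curve of Table 3.1 (error term omitted) satisfies it:

$$\frac{df}{dt} = \frac{\kappa}{\alpha}\, f\,(\alpha - f), \qquad f(t) = \frac{\alpha}{1+\beta e^{-\kappa t}}$$

$$\frac{df}{dt} = \frac{\alpha\beta\kappa\, e^{-\kappa t}}{(1+\beta e^{-\kappa t})^{2}} = \kappa f\left(1 - \frac{f}{\alpha}\right) = \frac{\kappa}{\alpha}\, f\,(\alpha - f)$$

The rate f(α − f) is largest at f = α/2, which reproduces the rise-then-decline pattern of the growth rate described above.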
3.3.1.1 Nonlinear Methodology
For the system of equations represented by the nonlinear model
$$Y = F(X_1, X_2, \ldots, X_n, \beta_0, \beta_1, \ldots, \beta_p) + \varepsilon$$
or, for the t-th observation,
$$Y_t = F(X_t, \beta) + \varepsilon_t \qquad (3.3)$$
where X is a matrix of the independent variables, β is a vector of unknown parameters, ε is the random error vector, F is a function of the independent variables and the parameters, and Y_t is the observed value in the t-th experiment (t = 1, 2, …, n).
The least squares estimate of β, denoted by $\hat{\beta}$, minimises the error sum of squares
$$S(\beta) = \sum_{t=1}^{n} \left[ Y_t - F(X_t, \beta) \right]^2 \qquad (3.4)$$
It should be noted that nonlinear least squares may have several relative (local) minima in addition to the absolute minimum $\hat{\beta}$. The value of $\hat{\beta}$ can be approximated by linearising the model. Let $\beta^*$ denote the true value of β, and consider β in a small neighbourhood of $\beta^*$. Writing $f_t(\beta) = F(X_t, \beta)$, the linear Taylor expansion gives
$$f_t(\beta) \approx f_t(\beta^*) + \sum_{r=1}^{p} \left. \frac{\partial f_t}{\partial \beta_r} \right|_{\beta^*} (\beta_r - \beta_r^*) \qquad (3.5)$$
Equation (3.5) can be rewritten as $f(\beta) \approx f(\beta^*) + F_{\cdot}(\beta - \beta^*)$, where $F_{\cdot}$ is the $n \times p$ matrix with elements $\partial f_t(\beta)/\partial \beta_r$ evaluated at $\beta^*$. Hence $S(\beta) \approx \lVert z - F_{\cdot}\theta \rVert^2$, where $z = Y - f(\beta^*)$ and $\theta = \beta - \beta^*$. By the properties of the linear model, this is minimised when $\hat{\theta} = (F_{\cdot}^{T}F_{\cdot})^{-1}F_{\cdot}^{T}z$. When n is large enough, $\hat{\beta}$ is almost certain to be within a small neighbourhood of $\beta^*$, so $\hat{\beta} - \beta^* \approx \hat{\theta} = (F_{\cdot}^{T}F_{\cdot})^{-1}F_{\cdot}^{T}z$.
In the nonlinear situation, both the derivative matrix X = ∂F/∂β and F(β) are functions of β, and generally a closed-form solution does not exist. The NLIN (nonlinear regression) procedure of the SAS package therefore uses an iterative process. A starting value for β is chosen and continually improved until the error sum of squares, εᵀε (SSE), is minimised. At each iteration, the matrix X and the residual vector e = Y − F(β) are evaluated at the current value of β and used to improve it. The iterative process begins at a starting (initial) value β0; X and Y are then used to compute a step δ such that SSE(β0 + kδ) < SSE(β0) for some step length k.
Most nonlinear models cannot be solved analytically, so an iterative method is required. Let $\beta^{(k)}$ be an approximation to the least squares estimate $\hat{\beta}$ of a nonlinear model. For values of β close to $\beta^{(k)}$, the linear Taylor expansion gives $f(\beta) \approx f(\beta^{(k)}) + F_{\cdot}^{(k)}(\beta - \beta^{(k)})$, where $F_{\cdot}^{(k)}$ is the derivative matrix evaluated at $\beta^{(k)}$. If $r(\beta) = Y - f(\beta)$ is the residual vector, then $r(\beta) \approx r(\beta^{(k)}) - F_{\cdot}^{(k)}(\beta - \beta^{(k)})$. Substituting into $S(\beta) = r^{T}(\beta)\,r(\beta)$ leads to
$$S(\beta) \approx r^{T}(\beta^{(k)})\,r(\beta^{(k)}) - 2\,r^{T}(\beta^{(k)})\,F_{\cdot}^{(k)}(\beta - \beta^{(k)}) + (\beta - \beta^{(k)})^{T}F_{\cdot}^{(k)T}F_{\cdot}^{(k)}(\beta - \beta^{(k)})$$
The right-hand side is minimised with respect to β when
$$\beta - \beta^{(k)} = \left(F_{\cdot}^{(k)T}F_{\cdot}^{(k)}\right)^{-1}F_{\cdot}^{(k)T}\,r(\beta^{(k)}) = \delta^{(k)}$$
Starting from an initial value $\beta^{(1)}$, the next approximation is $\beta^{(k+1)} = \beta^{(k)} + \delta^{(k)}$. This procedure provides an iterative scheme for obtaining $\hat{\beta}$. There are four different methods for determining how δ is computed to change the vector of parameters (SAS, 1992):
(i) Gradient: δ = Xᵀe
(ii) Gauss-Newton: δ = (XᵀX)⁻¹Xᵀe
(iii) Newton: δ = G⁻Xᵀe, where G is one-half of the Hessian matrix of the error sum of squares and G⁻ is a generalised (e.g. Moore-Penrose) inverse of G
(iv) Marquardt: δ = (XᵀX + λ diag(XᵀX))⁻¹Xᵀe
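The Gauss-Newton update δ = (XᵀX)⁻¹Xᵀe and its Marquardt modification can be sketched in pure Python for the logistic model of Table 3.1. This is an illustrative sketch, not the algorithm implemented in the SAS NLIN procedure: step halving is used as a simple way to enforce SSE(β + kδ) < SSE(β), λ is held fixed rather than adapted, and the data are hypothetical noise-free values generated from α = 40, β = 20, κ = 0.8.

```python
import math


def logistic(t, a, b, k):
    """Logistic curve of Table 3.1 without the error term."""
    return a / (1.0 + b * math.exp(-k * t))


def jacobian_row(t, a, b, k):
    """Derivatives of the logistic curve with respect to (a, b, k)."""
    e = math.exp(-k * t)
    d = 1.0 + b * e
    return [1.0 / d, -a * e / d ** 2, a * b * t * e / d ** 2]


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def sse(beta, ts, ys):
    return sum((y - logistic(t, *beta)) ** 2 for t, y in zip(ts, ys))


def fit_logistic(ts, ys, start, lam=0.0, max_iter=100, tol=1e-12):
    """Gauss-Newton steps (lam = 0) or Marquardt steps (lam > 0), with step halving."""
    beta = list(start)
    for _ in range(max_iter):
        X = [jacobian_row(t, *beta) for t in ts]              # X = dF/dbeta
        e = [y - logistic(t, *beta) for t, y in zip(ts, ys)]   # e = Y - F(beta)
        XtX = [[sum(row[r] * row[c] for row in X) for c in range(3)] for r in range(3)]
        Xte = [sum(row[r] * ei for row, ei in zip(X, e)) for r in range(3)]
        for r in range(3):
            XtX[r][r] *= 1.0 + lam                             # X'X + lam * diag(X'X)
        delta = solve(XtX, Xte)
        old = sse(beta, ts, ys)
        if old == 0.0:
            return beta
        step = 1.0
        while step > 1e-10:                                    # find k with SSE(beta + k*delta) < SSE(beta)
            trial = [bj + step * dj for bj, dj in zip(beta, delta)]
            if sse(trial, ts, ys) < old:
                break
            step /= 2.0
        else:
            return beta                                        # no improving step found
        beta = trial
        if (old - sse(beta, ts, ys)) / old < tol:              # relative change in S(beta)
            return beta
    return beta


ts = list(range(1, 16))
ys = [logistic(t, 40.0, 20.0, 0.8) for t in ts]                # hypothetical yields
print(fit_logistic(ts, ys, start=(36.0, 16.0, 0.7)))
```

With a fixed λ the Marquardt variant is more cautious than Gauss-Newton and can converge slowly when XᵀX is ill-conditioned; production software usually adapts λ between iterations.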
Many authors who discuss nonlinear least squares, such as Seber and Wild (1989), Ralston and Jennrich (1978), Gallant (1987) and Ratkowsky (1983), recommend relative change convergence criteria based on the changes in S(β) and in the parameters in going from the i-th to the (i+1)-th iteration. That is, if the relative change in the sum of squares at the i-th iteration,
$$\frac{S(\beta^{(i)}) - S(\beta^{(i+1)})}{S(\beta^{(i)})} \qquad (3.6)$$
falls in the interval 0 to ξ_s, where ξ_s is a pre-selected tolerance level such as 10⁻⁴, then the reduction in the error sum of squares is considered insufficient to warrant further computation. This is usually accompanied by a relative parameter change criterion such as
$$\frac{\left|\beta_j^{(i+1)} - \beta_j^{(i)}\right|}{\left|\beta_j^{(i)}\right|} < \xi_p, \qquad j = 1, 2, \ldots, p. \qquad (3.7)$$
When every relative parameter change at the i-th iteration is less than ξ_p, the parameter increments are too small to warrant further computation and the program terminates.
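Criteria (3.6) and (3.7) translate directly into a stopping test. The sketch assumes that both criteria must hold before iteration stops, and the default tolerances are illustrative: ξs = 10⁻⁴ as suggested above, while ξp = 10⁻⁵ is an arbitrary choice.

```python
def converged(s_old, s_new, beta_old, beta_new, xi_s=1e-4, xi_p=1e-5):
    """True when both relative-change criteria (3.6) and (3.7) are satisfied."""
    # (3.6): relative reduction in the error sum of squares lies in [0, xi_s].
    rel_ss = (s_old - s_new) / s_old
    ss_small = 0.0 <= rel_ss <= xi_s
    # (3.7): every relative parameter change is below xi_p. A zero parameter makes
    # the ratio undefined, so this sketch falls back to the absolute change there.
    params_small = all(
        abs(bn - bo) / (abs(bo) if bo != 0 else 1.0) < xi_p
        for bo, bn in zip(beta_old, beta_new)
    )
    return ss_small and params_small


print(converged(100.0, 99.995, [40.0, 20.0], [40.0001, 20.0001]))   # True
print(converged(100.0, 90.0, [40.0, 20.0], [40.0001, 20.0001]))     # False
```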
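Gallant (1987) and Seber and Wild (1989) showed that an approximate confidence interval for β_i is given by
$$\hat{\beta}_i \pm t_{\alpha/2}\sqrt{s^2 \hat{c}_{ii}} \qquad (3.8)$$
where $s^2 = S(\hat{\beta})/(n - p)$, $t_{\alpha/2}$ is taken from the t distribution with n − p degrees of freedom, and $\hat{c}_{ii}$ is the i-th diagonal element of $\hat{C} = \left[F_{\cdot}^{T}(\hat{\beta})\,F_{\cdot}(\hat{\beta})\right]^{-1}$.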
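Interval (3.8) only needs the derivative matrix and the residuals at the solution. The sketch below assumes these are already available (for example from a Gauss-Newton fit) and takes the t quantile as an argument, because the Python standard library has no t-distribution quantile function. The numbers in the example are hypothetical; a two-parameter straight line is used so that (F.ᵀF.)⁻¹ can be checked by hand.

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def wald_intervals(beta_hat, jacobian, residuals, t_crit):
    """Approximate intervals beta_i +/- t * sqrt(s^2 * c_ii), following (3.8)."""
    n, p = len(jacobian), len(beta_hat)
    s2 = sum(e * e for e in residuals) / (n - p)          # s^2 = S(beta_hat) / (n - p)
    FtF = [[sum(row[r] * row[c] for row in jacobian) for c in range(p)] for r in range(p)]
    intervals = []
    for i in range(p):
        unit = [1.0 if j == i else 0.0 for j in range(p)]
        c_ii = solve(FtF, unit)[i]                        # i-th diagonal of (F'F)^{-1}
        half = t_crit * math.sqrt(s2 * c_ii)
        intervals.append((beta_hat[i] - half, beta_hat[i] + half))
    return intervals


# Hypothetical: derivative rows [1, t], residuals at the solution, t(0.025, 2) = 4.303.
F = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
print(wald_intervals([1.0, 2.0], F, [0.5, -0.5, -0.5, 0.5], t_crit=4.303))
```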
The starting value β(1), which is the initial guess at the minimising value β̂, can sometimes be suggested by prior information. Occasionally, there will be a starting value that tends to work well for a whole class of problems. For example, Fisher's scoring algorithm for generalised linear models, an iteratively re-weighted least squares method, suggests a uniform starting mechanism for the whole class of models (McCullagh and Nelder, 1983; Bates and Watts, 1988).
However, it is quite difficult to say anything general about how to produce good starting values. Methods that are sometimes suggested include a grid search or a random search over a defined rectangular region of the parameter space. If no sensible bounds can be suggested for a parameter β_r, a transformed parameter can be used. For example, ϕ = e^β/(1 + e^β) and ϕ = 1/2 + arctan(β)/π both satisfy 0 < ϕ < 1.
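A grid search for starting values and the two bounded transformations can be sketched as follows. The logistic model, the grid and the data are hypothetical choices for illustration; in practice the grid would be guided by the plotted yield growth curve, and the grid point found would then be passed to an iterative method such as Gauss-Newton.

```python
import itertools
import math


def logistic(t, a, b, k):
    """Logistic curve of Table 3.1 without the error term."""
    return a / (1.0 + b * math.exp(-k * t))


def grid_start(ts, ys, a_grid, b_grid, k_grid):
    """Return the grid point (a, b, k) with the smallest error sum of squares."""
    best_s, best = math.inf, None
    for a, b, k in itertools.product(a_grid, b_grid, k_grid):
        s = sum((y - logistic(t, a, b, k)) ** 2 for t, y in zip(ts, ys))
        if s < best_s:
            best_s, best = s, (a, b, k)
    return best


def to_unit(beta):
    """phi = e^beta / (1 + e^beta), in the equivalent form 1 / (1 + e^-beta)."""
    return 1.0 / (1.0 + math.exp(-beta))


def to_unit_arctan(beta):
    """phi = 1/2 + arctan(beta) / pi, which also lies strictly between 0 and 1."""
    return 0.5 + math.atan(beta) / math.pi


ts = [float(t) for t in range(1, 16)]
ys = [logistic(t, 40.0, 20.0, 0.8) for t in ts]   # hypothetical yields
print(grid_start(ts, ys, [30.0, 35.0, 40.0, 45.0], [10.0, 20.0, 30.0], [0.4, 0.6, 0.8, 1.0]))
```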
Draper and Smith (1981) and Ratkowsky (1983) give detailed discussions of starting values for nonlinear models. Table 3.1 lists the nonlinear growth models considered in this study.
Table 3.1: Nonlinear mathematical models considered in the study
Model | Equation | Source
Logistic | $\phi(T) = \alpha/(1 + \beta e^{-\kappa T}) + \varepsilon$ | Draper and Smith (1981)
Gompertz | $\phi(T) = \alpha \exp(-\beta e^{-\kappa T}) + \varepsilon$ | Draper and Smith (1981)
Von Bertalanffy | $\phi(T) = [\alpha^{1-\delta} - \beta e^{-\kappa T}]^{1/(1-\delta)} + \varepsilon$ | Ratkowsky (1983)
Negative exponential | $\phi(T) = \alpha(1 - e^{-\kappa T}) + \varepsilon$ | Philip (1994)
Monomolecular | $\phi(T) = \alpha(1 - \beta e^{-\kappa T}) + \varepsilon$ | Draper and Smith (1981)
Log-logistic | $\phi(T) = \alpha/(1 + \beta \exp(-\kappa \ln T)) + \varepsilon$ | Tsoularis and Wallace (2002)
Richards | $\phi(T) = \alpha/[1 + \beta e^{-\kappa T}]^{1/\delta} + \varepsilon$ | Ratkowsky (1983)
Weibull | $\phi(T) = \alpha - \beta \exp(-\kappa T^{\delta}) + \varepsilon$ | Ratkowsky (1983)
Schnute | $\phi(T) = (\alpha + \beta e^{\kappa T})^{\delta} + \varepsilon$ | Schnute (1981), Yuancai et al. (1997)
Morgan-Mercer-Flodin | $\phi(T) = (\beta\gamma + \alpha T^{\delta})/(\gamma + T^{\delta}) + \varepsilon$ | Morgan et al. (1975), Ratkowsky (1983)
Chapman-Richards | $\phi(T) = \alpha(1 - \beta e^{-\kappa T})^{1/(1-\delta)} + \varepsilon$ | Draper and Smith (1981)
Stannard | $\phi(T) = \alpha[1 + \exp(-(\beta + \kappa T)/\delta)]^{-\delta} + \varepsilon$ | Tsoularis and Wallace (2002)
where α, β, κ and δ are unknown parameters, ε is random error and T is time.
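For computation, the expectation functions of several models in Table 3.1 can be written as plain Python functions. This is a sketch: argument names follow the table (γ in the Morgan-Mercer-Flodin model is passed as g), Richards' model uses the form α/[1 + βe^{−κT}]^{1/δ}, and the parameter values in the example are hypothetical. Each of these curves approaches the upper asymptote α as T grows, which gives a quick sanity check on the coded formulas.

```python
import math

# Expectation functions from Table 3.1 (error term omitted); T is time.
MODELS = {
    "Logistic": lambda T, a, b, k: a / (1.0 + b * math.exp(-k * T)),
    "Gompertz": lambda T, a, b, k: a * math.exp(-b * math.exp(-k * T)),
    "Negative exponential": lambda T, a, k: a * (1.0 - math.exp(-k * T)),
    "Monomolecular": lambda T, a, b, k: a * (1.0 - b * math.exp(-k * T)),
    "Richards": lambda T, a, b, k, d: a / (1.0 + b * math.exp(-k * T)) ** (1.0 / d),
    "Weibull": lambda T, a, b, k, d: a - b * math.exp(-k * T ** d),
    "Morgan-Mercer-Flodin": lambda T, a, b, g, d: (b * g + a * T ** d) / (g + T ** d),
    "Chapman-Richards": lambda T, a, b, k, d: a * (1.0 - b * math.exp(-k * T)) ** (1.0 / (1.0 - d)),
}

# Hypothetical parameter values, each with upper asymptote a = 40.
EXAMPLE_PARAMS = {
    "Logistic": (40.0, 5.0, 0.5),
    "Gompertz": (40.0, 5.0, 0.5),
    "Negative exponential": (40.0, 0.5),
    "Monomolecular": (40.0, 0.9, 0.5),
    "Richards": (40.0, 5.0, 0.5, 2.0),
    "Weibull": (40.0, 30.0, 0.1, 1.5),
    "Morgan-Mercer-Flodin": (40.0, 1.0, 10.0, 2.0),
    "Chapman-Richards": (40.0, 0.9, 0.5, 0.5),
}

for name, params in EXAMPLE_PARAMS.items():
    print(name, MODELS[name](5.0, *params), MODELS[name](200.0, *params))
```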
3.3.2 Regression Analysis
Regression analysis has been widely used in modelling oil palm yield. In this study, the technique is used to discover the relationship between foliar nutrient composition and oil palm yield. Although regression analysis is common practice in the modelling of oil palm yield, this study focuses on data from different experimental locations. We also introduce robust regression analysis to overcome the problem of outliers in the data set.
One of the most popular techniques for finding the relationship between two variables is regression analysis, which fits an equation to observed variables with a certain degree of accuracy. Since theoretical models can never exactly describe a real-world process, modelling oil palm yield from leaf nutrient composition can never be perfect. Regression analysis has been widely used by agronomists in the oil palm industry to aid policy making and decision making. The models for multiple regression are similar to the simple linear regression model, except that they contain more terms and can represent relationships more complex than a straight line.
3.3.2.1 Least Squares Method
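Consider the linear model
$$Y = X\beta + \varepsilon \qquad (3.9)$$
Equation (3.9) can be written in matrix form as follows:
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} \qquad (3.10)$$
In general, y is an (n × 1) vector of observations, X is an n × (p + 1) matrix of the levels of the independent variables (including a column of ones for the intercept), β is a (p + 1) × 1 vector of regression coefficients, and ε is an (n × 1) vector of random errors.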
We wish to find the vector of least squares estimators, b, that minimises
$$Q = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) \qquad (3.11)$$
Q may be expressed as
$$Q = y'y - 2\beta'X'y + \beta'X'X\beta \qquad (3.12)$$
since β′X′y is a (1 × 1) matrix, that is a scalar, and therefore equals its transpose (β′X′y)′ = y′Xβ. The least squares estimators must satisfy
$$\left. \frac{\partial Q}{\partial \beta} \right|_{b} = -2X'y + 2X'Xb = 0 \qquad (3.13)$$
which implies
$$X'Xb = X'y \qquad (3.14)$$
Equation (3.14) is the set of least squares normal equations in matrix form. To solve the normal equations, multiply both sides of equation (3.14) by the inverse of X′X. Thus, the least squares estimator of β is
$$b = (X'X)^{-1}X'y \qquad (3.15)$$
and the corresponding variance-covariance matrix of the estimator, denoted $\hat{\theta} = b$ in what follows, is
$$\mathrm{Var}(\hat{\theta}) = \sigma^2 (X'X)^{-1} \qquad (3.16)$$
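Equations (3.14) and (3.15) can be illustrated by forming X′X and X′y and solving the normal equations directly. This pure-Python sketch is for exposition only: the data are hypothetical, generated exactly from y = 1 + 2x1 + 3x2, and statistical packages such as SPSS typically use numerically more stable decompositions rather than solving the normal equations as written.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def least_squares(X, y):
    """b = (X'X)^{-1} X'y, obtained by solving the normal equations X'X b = X'y (3.14)."""
    q = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(q)] for i in range(q)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(q)]
    return solve(XtX, Xty)


def residual_variance(X, y, b):
    """s^2 = SSE / (n - p - 1), the estimate of sigma^2 used with (3.16)."""
    resid = [yi - sum(bj * xj for bj, xj in zip(b, row)) for row, yi in zip(X, y)]
    return sum(e * e for e in resid) / (len(X) - len(b))


# Hypothetical data from y = 1 + 2*x1 + 3*x2; the first column is the intercept.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
print(least_squares(X, y))   # approximately [1.0, 2.0, 3.0]
```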
Suppose we are interested in the mean response of y at a fixed value x_r. We then need the mean and variance of the estimated mean response. Since $\hat{y}_r = x_r^{T}\hat{\theta}$ is an unbiased estimator of the mean response, its expectation is
$$E(\hat{y}_r) = E(x_r^{T}\hat{\theta}) = x_r^{T}\theta \qquad (3.17)$$
and its variance is
$$\mathrm{Var}(\hat{y}_r) = \sigma^2 x_r^{T}(X^{T}X)^{-1}x_r \qquad (3.18)$$
Substituting s² as the estimate of σ², a 100(1 − α)% confidence interval for the mean response is
$$\hat{y}_r \pm t_{\alpha/2,\,(n-p-1)}\; s\sqrt{x_r^{T}(X^{T}X)^{-1}x_r} \qquad (3.19)$$
Multiple linear regression can also be used to predict a future value of the dependent variable for a given input vector x_p (Chatterjee and Price, 1991; Birkes and Dodge, 1993). Because $\hat{\theta}$ is estimated from the sample, different samples give different estimates $\hat{\theta}$ and hence different predictions. Let y_p represent the value to be predicted at a new point x_p, written as
$$y_p = x_p^{T}\hat{\theta} + e_p$$
where e_p is the random error of the new observation, independent of $\hat{\theta}$. The mean and variance of this prediction are therefore
$$E(y_p) = E(x_p^{T}\hat{\theta} + e_p) = x_p^{T}\theta$$
$$\mathrm{Var}(y_p) = \mathrm{Var}(x_p^{T}\hat{\theta} + e_p) = \sigma^2 + x_p^{T}\,\mathrm{Var}(\hat{\theta})\,x_p = \sigma^2 + \sigma^2 x_p^{T}(X^{T}X)^{-1}x_p = \sigma^2\left(1 + x_p^{T}(X^{T}X)^{-1}x_p\right)$$
A 100(1 − α)% prediction interval for a new observation at x_p is therefore
$$\hat{y}_p \pm t_{\alpha/2,\,(n-p-1)}\; s\sqrt{1 + x_p^{T}(X^{T}X)^{-1}x_p} \qquad (3.20)$$
where $\hat{y}_p = x_p^{T}\hat{\theta}$.
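The only difference between (3.19) and (3.20) is the extra 1 under the square root, which accounts for the error of the new observation. The sketch below computes both half-widths at a point x0 under the stated assumptions: X includes a leading column of ones, the t quantile is supplied by the caller (the standard library has no t quantile), and the data are hypothetical.

```python
import math


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def regression_intervals(X, y, x0, t_crit):
    """Fitted mean at x0, half-width of the mean-response interval (3.19) and of
    the prediction interval (3.20). t_crit = t_{alpha/2, n-p-1}."""
    n, q = len(X), len(X[0])                                   # q = p + 1 columns
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(q)] for i in range(q)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(q)]
    b = solve(XtX, Xty)
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - q)                   # s^2 on n - p - 1 df
    h = sum(a * c for a, c in zip(x0, solve(XtX, x0)))         # x0'(X'X)^{-1} x0
    fit = sum(bj * xj for bj, xj in zip(b, x0))
    return fit, t_crit * math.sqrt(s2 * h), t_crit * math.sqrt(s2 * (1.0 + h))


# Hypothetical straight-line data; t(0.025, 2) = 4.303.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 2.0, 4.0]
print(regression_intervals(X, y, [1.0, 1.5], t_crit=4.303))
```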
3.3.3 Robust M-regression
A statistical procedure is regarded as robust if it performs reasonably well even when the assumptions of the statistical model are not true. If the data satisfy the assumptions of the standard linear regression model, least squares estimates and tests perform quite well, but they are not robust to the presence of outlying observations in the data set (Rousseeuw and Leroy, 1987). In this study we propose robust M-regression to model the yield data, since the Q-Q plot detected the presence of outliers in the data set. Peter Huber introduced the idea of M-regression in 1964 (Huber, 1981).
In least squares estimation, the values of $\hat{\beta}$ are chosen so that $\sum_{i=1}^{n} \hat{\eta}_i^2$ is as small as possible, where $\hat{\eta}_i$ (i = 1, 2, …, n) are the residuals. In least absolute deviation estimation, $\sum_{i=1}^{n} |\hat{\eta}_i|$ is made as small as possible. In robust M-regression, this idea is generalised and the values of $\hat{\beta}$ are chosen so that $\sum_{i=1}^{n} \rho(\hat{\eta}_i)$ is minimised, where ρ(η) is some function of η. Therefore, least squares and least absolute deviation estimation can be regarded as special cases of M-estimation with ρ(η) = η² and ρ(η) = |η| respectively.
Huber (1981) defined the function ρ(η) as follows:
$$\rho(\eta) = \begin{cases} \eta^2 & \text{if } -k \le \eta \le k \\ 2k|\eta| - k^2 & \text{if } \eta < -k \text{ or } k < \eta \end{cases} \qquad (3.21)$$
Following Huber, let k = 1.5σ̂, where σ̂ is an estimate of the standard deviation σ of the population of random errors. To make ρ(η) a smooth function, 2k|η| − k² is used instead of |η|, so that ρ and its first derivative are continuous at η = ±k. The scale is estimated by σ̂ = 1.483(MAD), where MAD is the median absolute deviation, that is, the median of the |η̂_i|. The multiplier 1.483 was chosen to ensure that σ̂ would be a good estimate of σ if the distribution of the random errors were normal.
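Definition (3.21) and the robust scale estimate can be written as two small functions. The multiplier 1.483 is approximately 1/Φ⁻¹(0.75) ≈ 1.4826, which makes the MAD a consistent estimate of σ under normality. As in the text, the MAD here is the median of the absolute residuals (not centred on their median), and k = 1.5σ̂; the residuals in the example are hypothetical.

```python
import statistics


def huber_rho(eta, k):
    """Huber's rho (3.21): eta^2 inside [-k, k], 2k|eta| - k^2 outside."""
    return eta * eta if abs(eta) <= k else 2.0 * k * abs(eta) - k * k


def sigma_hat(residuals):
    """Robust scale 1.483 * MAD, with MAD the median of the absolute residuals."""
    return 1.483 * statistics.median(abs(e) for e in residuals)


def huber_objective(residuals):
    """Sum of rho over the residuals, with k = 1.5 * sigma_hat following Huber."""
    k = 1.5 * sigma_hat(residuals)
    return sum(huber_rho(e, k) for e in residuals)


residuals = [0.2, -0.5, 0.1, 0.4, -0.3, 6.0]   # hypothetical, one gross outlier
print(sigma_hat(residuals), huber_objective(residuals))
```

The outlier contributes to the objective linearly rather than quadratically, which is what limits its influence on the fitted coefficients.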
The Huber M-estimates $\hat{\beta}$ are the values of b that minimise
$$\sum_{i=1}^{n} \rho\left(y_i - (b_0 + b_1 x_{i1} + \cdots + b_p x_{ip})\right) \qquad (3.22)$$
where ρ(η) is the function defined in (3.21). It is convenient to use vector notation: the vector $\hat{\beta}$ of Huber M-estimates is defined as the vector b that minimises
$$\sum_{i=1}^{n} \rho\left(y_i - b'x_i\right).$$
The vector of regression coefficients β is first estimated by the vector of least squares estimates. This initial estimate is used to calculate the deviations and an estimate of σ, which in turn give an improved estimate of β, as described below. The algorithm iterates in this way until a step is reached at which the improved estimate of β is the same (or at least approximately the same) as the previous estimate.
For example, suppose that at some step $b^0$ is the current estimate of β. Calculate the deviations $y_i - (b^0)'x_i$ and from these calculate σ̂ = 1.483(MAD). Now adjust the y values to remove any large deviations. The deviation of $y_i$ from the current estimated regression line is $\eta_i^0 = y_i - (b^0)'x_i$, so that $y_i = (b^0)'x_i + \eta_i^0$. Then define $y_i^* = (b^0)'x_i + \eta_i^*$, where $\eta_i^*$ is the adjusted deviation obtained by truncating $\eta_i^0$ so that no deviation is larger than 1.5σ̂ in absolute value. The improved estimate of β is the least squares estimate obtained from the adjusted data $y_1^*, y_2^*, \ldots, y_n^*$. Huber M-estimates are obtainable from the S-Plus package (Becker et al., 1988).
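The truncation scheme described above can be sketched for a single predictor. This is an illustration of the iterative adjustment in the text, not the routine used by S-Plus (which may use a different algorithm, such as iteratively reweighted least squares); the restriction to one predictor, the tolerance, the iteration limit and the data with one gross outlier are all hypothetical choices.

```python
import statistics


def ols_line(xs, ys):
    """Least squares intercept and slope for y = b0 + b1*x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b1 * mx, b1


def huber_line(xs, ys, c=1.5, tol=1e-8, max_iter=200):
    """Huber fit by repeatedly truncating deviations at +/- c * sigma_hat and refitting."""
    b0, b1 = ols_line(xs, ys)                                  # start from least squares
    for _ in range(max_iter):
        eta = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]      # current deviations
        sigma = 1.483 * statistics.median(abs(e) for e in eta)  # 1.483 * MAD
        cut = c * sigma
        # y* = fitted value + deviation truncated to [-cut, cut]
        y_star = [b0 + b1 * x + max(-cut, min(cut, e)) for x, e in zip(xs, eta)]
        n0, n1 = ols_line(xs, y_star)                          # least squares on adjusted data
        if abs(n0 - b0) < tol and abs(n1 - b1) < tol:
            return n0, n1
        b0, b1 = n0, n1
    return b0, b1


xs = [float(i) for i in range(1, 11)]
ys = [2.0 + 0.5 * x + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs, start=1)]
ys[-1] = 30.0                                                  # hypothetical gross outlier
print("least squares:", ols_line(xs, ys))
print("Huber:        ", huber_line(xs, ys))
```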
3.3.4
Neural Network Model
Neural network models have influenced research activities in numerous fields. The method has been applied to estimate the yield of short-term crops (Shearer et al., 1994; Drummond et al., 1995; Belanger et al., 2000; Liu et al., 2001; Welch et al., 2003), but the literature shows that neural network modelling of oil palm yield has not yet been explored.
3.3.4.1 Introduction to the Neural Network
Neural networks, popularly known as artificial neural networks (ANNs), are computational models that consist of a number of simple processing units which communicate by sending signals to each other over a large number of weighted connections. The original neural network design was inspired by the human brain.
In human brains, a biological neuron collects signals from other neurons through a
host of fine structures called dendrites. The neuron sends out spikes of electrical
activity through a long, thin strand known as an axon, which splits into thousands of
branches. At the end of each branch, a structure called a synapse converts the
activity from the axon into electrical effects that inhibit or excite activity in the
connected neurons. When a neuron receives excitatory input that is sufficiently large
in comparison to its inhibitory input, it sends a spike of electrical activity down its
axon. Learning occurs by changing the effectiveness of the synapses so that the
influence of one neuron on another is changed.
Like human brains, neural networks also consist of processing units (artificial
neurons) and connections (weights) between those units. The processing units
transport incoming information on their outgoing connections to other units. The
"electrical" information is simulated with specific values stored in those weights that
give these networks the capacity to learn, memorize, and create relationships
between data.
A very important feature of these networks is their adaptive nature, where "learning by example" replaces "programming" in solving problems. This feature makes these computational models very appealing in applications where there is little or incomplete understanding of the problem to be solved, but where training data are available.
There are many different types of neural networks, and they are used in many fields; researchers continue to devise new uses for them. Some of the most traditional applications include (Master, 1993; Welstead, 1994; Bishop, 1995; Patterson, 1996):
• Classification – To determine military operations from satellite photographs; to distinguish among different types of radar returns (weather, birds, or aircraft); to identify diseases of the heart from electrocardiograms.
• Noise reduction – To recognise a number of patterns (voice, images, etc.) corrupted by noise.
• Prediction – To predict the value of a variable, given historic values. Examples include forecasting of various types of loads, market and stock forecasting, and weather forecasting.
The model built in this study falls into the prediction category: an ANN is used to predict oil palm yield.
3.3.4.2 Fundamentals of Neural Networks
Neural networks, sometimes referred to as connectionist models, are parallel-distributed models that have several distinguishing features (Patrick and Smagt, 1996; Patterson, 1996; Lai, 1998; Hykin, 1999):
a) A set of processing units;
b) An activation state for each unit, which is equivalent to the output of the unit;
c) Connections between the units. Generally each connection is defined by a
weight wjk that determines the effect that the signal of unit j has on unit k;
d) A propagation rule, which determines the effective input of the unit from its
external inputs;
e) An activation function, which determines the new level of activation based on
the effective input and the current activation;
f) An external input (bias, offset) for each unit;
g) A method for information gathering (learning rule);
h) An environment within which the system can operate, providing input signals and, if necessary, error signals.
3.3.4.3 Processing Unit
A processing unit (Figure 3.3), also called a neuron or node, performs a
relatively simple job; it receives inputs from neighbours or external sources and uses
them to compute an output signal that is propagated to other units.
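[Diagram: inputs x0, x1, …, xn with weights wj0, wj1, …, wjn and the bias θj feed a summation node Σ that forms aj = Σ wji xi + θj (i = 1, …, n); the activation g(aj) then gives the output zj = g(aj)]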
Figure 3.3: Processing unit
Within the neural systems there are three types of units:
(i)
Input units, which receive data from outside the network;
(ii)
Output units, which send data out of the network;
(iii)
Hidden units, whose input and output signals remain within the network.
Each unit j can have one or more inputs x0, x1, x2, …, xn, but only one output zj.
An input to a unit is either data from outside the network, the output of another unit,
or its own output.
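The computation performed by a single processing unit, as shown in Figure 3.3, fits in a few lines. In this sketch the sigmoid is an assumed default activation (the choice of activation function is discussed in Section 3.3.4.5), and the inputs and weights in the example are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def processing_unit(x, w, theta, g=sigmoid):
    """Output z_j = g(a_j) of unit j, where a_j = sum_i w_ji * x_i + theta_j."""
    if len(x) != len(w):
        raise ValueError("need one weight per input")
    a = sum(wi * xi for wi, xi in zip(w, x)) + theta
    return g(a)


# Hypothetical inputs, weights and bias: a_j = 0.2 - 0.3 - 0.2 + 0.2 = -0.1.
print(processing_unit([0.5, -1.0, 2.0], [0.4, 0.3, -0.1], theta=0.2))   # about 0.475
```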
3.3.4.4 Combination Function
Each non-input unit in a neural network combines values that are fed into it
via synaptic connections from other units, producing a single value called net input.
The function that combines those values is known as the combination function,
which is defined by a certain propagation rule. In most neural networks it is assumed
that each unit provides an additive contribution to the input of the units with which it
is connected. The total input to unit j is simply the weighted sum of the separate
outputs from the connected units plus a threshold or bias term θj :
$$a_j = \sum_{i=1}^{n} w_{ji} x_i + \theta_j \qquad (3.23)$$
The contribution for positive w_ji is considered an excitation and the contribution for negative w_ji an inhibition. Units with the above propagation rule are called sigma units. In some cases more complex rules for combining inputs are used. One such rule, known as sigma-pi, has the form
$$a_j = \sum_{i=1}^{n} w_{ji} \prod_{k=1}^{m} x_{ik} + \theta_j \qquad (3.24)$$
Many combination functions use a "bias" or "threshold" term in computing the
net input to the unit. For a linear output unit, a bias term is equivalent to an intercept
in a regression model. It is required in much the same way as the constant
polynomial ‘1’ is required for approximation by polynomials.
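The two propagation rules (3.23) and (3.24) differ only in whether each weight multiplies a single input or a product of inputs. The sketch below also checks the remark about the bias: θj behaves like an ordinary weight on a constant input of 1, just as an intercept does in regression. Representing each sigma-pi term as a tuple of the inputs it multiplies is an assumed data layout, and the numbers are hypothetical.

```python
import math


def sigma_net_input(w, x, theta):
    """Sigma unit (3.23): a_j = sum_i w_ji * x_i + theta_j."""
    return sum(wi * xi for wi, xi in zip(w, x)) + theta


def sigma_pi_net_input(w, product_terms, theta):
    """Sigma-pi unit (3.24): a_j = sum_i w_ji * prod_k x_ik + theta_j.

    product_terms[i] holds the inputs x_i1, ..., x_im multiplied together in term i.
    """
    return sum(wi * math.prod(term) for wi, term in zip(w, product_terms)) + theta


w, x, theta = [0.2, -0.5, 1.0], [1.0, 2.0, 3.0], 0.1
print(sigma_net_input(w, x, theta))                         # about 2.3
print(sigma_net_input(w + [theta], x + [1.0], 0.0))         # same: bias as weight on input 1
print(sigma_pi_net_input([0.5, 2.0], [(1.0, 2.0), (3.0, 4.0)], 1.0))   # 26.0
```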
3.3.4.5 Activation Function
Most units in a neural network transform their net inputs by using a scalar-to-scalar function called an activation function, yielding a value known as the unit's activation. With the possible exception of output units, the activation value is fed to one or more other units. Activation functions with a bounded range are often called squashing functions. Some of the most commonly used activation functions are (Fausett, 1994):
i) Identity function (Figure 3.4)
$$g(x) = x \tag{3.25}$$
Input units effectively use the identity function, since they pass their input on unchanged. Sometimes the net input is multiplied by a constant to form a linear function.
[Figure 3.4: Identity function, g(x) plotted against x]
ii) Binary step function (Figure 3.5)
Also known as the threshold function or Heaviside function. The output of this function is limited to one of two values:
$$g(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases} \tag{3.26}$$
This kind of function is often used in single layer networks.
[Figure 3.5: Binary step function, g(x) plotted against x]
iii) Sigmoid function (Figure 3.6)
$$g(x) = \frac{1}{1 + e^{-x}} \tag{3.27}$$
This function is especially advantageous in neural networks trained by back-propagation because it is easy to differentiate: its derivative can be computed from its own value as g′(x) = g(x)[1 − g(x)], which reduces the computational burden of training. It suits applications whose desired output values lie between 0 and 1.
[Figure 3.6: Sigmoid function, g(x) plotted against x for −6 ≤ x ≤ 6]
iv) Bipolar sigmoid function (Figure 3.7)
$$g(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \tag{3.28}$$
This function has properties similar to those of the sigmoid function, and its derivative is likewise cheap to compute, g′(x) = ½[1 + g(x)][1 − g(x)]. It works well for applications whose output values lie in the range [−1, 1].
[Figure 3.7: Bipolar sigmoid function, g(x) plotted against x for −6 ≤ x ≤ 6]
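The four activation functions (3.25) to (3.28) can be written directly in NumPy. The sketch below is illustrative only: the function names are our own, and the derivative helpers use the standard identities g′(x) = g(x)[1 − g(x)] for the sigmoid and g′(x) = ½[1 + g(x)][1 − g(x)] for the bipolar sigmoid.

```python
import numpy as np

def identity(x):
    return x                                        # equation (3.25)

def binary_step(x, theta=0.0):
    return np.where(x >= theta, 1.0, 0.0)           # equation (3.26)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                 # equation (3.27)

def bipolar_sigmoid(x):
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))  # equation (3.28)

# Derivatives expressed through the function value, which is what makes
# the sigmoids cheap to use in back-propagation.
def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def bipolar_sigmoid_prime(x):
    g = bipolar_sigmoid(x)
    return 0.5 * (1.0 + g) * (1.0 - g)

x = np.linspace(-6.0, 6.0, 7)
print(binary_step(x))
print(np.round(sigmoid(x), 3))
print(np.round(bipolar_sigmoid(x), 3))
```

For large negative inputs np.exp(-x) overflows; a production version would use a numerically stable form (for example, the bipolar sigmoid equals tanh(x/2)).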
Activation functions for the hidden units are needed to introduce non-linearity into the network. Without them, a multi-layer network would be no more powerful than a single layer, because a composite of linear functions is also a linear function. It is the non-linearity (i.e., the capability to represent nonlinear functions) that makes multi-layer networks so powerful. Almost any nonlinear function does the job, although for back-propagation learning it must be differentiable, and it helps if the function is bounded. The sigmoid functions are the most common choices (Sarles, 1997).
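The argument that a composite of linear functions is linear can be checked numerically. In this sketch, with random and purely hypothetical weights, two layers with identity activations collapse into a single weight matrix and are homogeneous, f(2x) = 2f(x), whereas replacing the hidden activation with the sigmoid breaks that property.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # hypothetical hidden-layer weights: 4 units, 3 inputs
W2 = rng.normal(size=(2, 4))   # hypothetical output-layer weights: 2 outputs
x = rng.normal(size=3)

def linear_net(x):
    return W2 @ (W1 @ x)           # identity activation in the hidden layer

def sigmoid_net(x):
    return W2 @ sigmoid(W1 @ x)    # sigmoid activation in the hidden layer

# Two linear layers equal one layer whose weight matrix is W2 W1 ...
print(np.allclose(linear_net(x), (W2 @ W1) @ x))            # True
# ... and scale linearly with their input; the sigmoid network does not.
print(np.allclose(linear_net(2 * x), 2 * linear_net(x)))    # True
print(np.allclose(sigmoid_net(2 * x), 2 * sigmoid_net(x)))
```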
For the output units, activation functions should be chosen to suit the distribution of the target values. As noted above, the sigmoid function is an excellent choice for binary (0,1) outputs. For continuous-valued targets with a bounded range, the sigmoid functions are again useful, provided that either the outputs or the targets can be scaled to the range of the output activation function. However, if the target values have no known bounded range, it is better to use an unbounded activation function. The identity function (which amounts to no activation function) is often used for this purpose. If the target values are positive but have no known upper bound, an exponential output activation function can be used (Sarles, 1998).
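One common way to meet the scaling condition is a linear (min-max) map of the targets onto the output range, inverted after prediction. The sketch assumes this min-max scheme and the [0.1, 0.9] sub-interval, which some practitioners use because the sigmoid reaches 0 and 1 only asymptotically; neither choice is prescribed by the text, and the target values are made up.

```python
import numpy as np

def scale_targets(t, lo_out=0.0, hi_out=1.0):
    """Linearly map targets onto [lo_out, hi_out], e.g. the range of a sigmoid output unit."""
    t = np.asarray(t, dtype=float)
    t_min, t_max = t.min(), t.max()
    scaled = lo_out + (t - t_min) * (hi_out - lo_out) / (t_max - t_min)
    return scaled, (t_min, t_max, lo_out, hi_out)

def unscale_outputs(y, params):
    """Invert scale_targets so network outputs are reported in the original units."""
    t_min, t_max, lo_out, hi_out = params
    return t_min + (np.asarray(y, dtype=float) - lo_out) * (t_max - t_min) / (hi_out - lo_out)

targets = np.array([2.0, 5.0, 8.0, 11.0])          # made-up values, for illustration only
scaled, params = scale_targets(targets, 0.1, 0.9)   # stay away from the sigmoid's asymptotes
print(scaled)
print(unscale_outputs(scaled, params))
```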
3.3.4.6 Network topologies
The number of layers, the number of units per layer, and the interconnection
patterns between layers defines the topology of a network. They are generally
divided into two categories based on the pattern of connections:
1) Feed-forward networks (Figure 3.8), where the data flow from input units to
output units is strictly feed-forward. The data processing can extend over
multiple layers of units, but no feedback connections are present. That is,
connections extending from outputs of units to inputs of units in the same layer
or previous layers are not permitted. Feed-forward networks are the main focus
of this study (Patterson, 1996; Hykin, 1999).
[Figure 3.8: Feed-forward neural network with an input layer (bias x0, x1, x2, …, xl), a hidden layer (bias h0, h1, h2, …, hm) and an output layer (y1, y2, …, yn); first-layer weights w(1)ji and second-layer weights w(2)kj]
2) Recurrent networks (Figure 3.9), which contain feedback connections. In contrast to feed-forward networks, the dynamical properties of the network are important. In some cases, the activation values of the units undergo a relaxation process so that the network evolves to a stable state in which the activations no longer change. In other applications, in which the dynamical behavior constitutes the output of the network, the changes in the activation values of the output units are significant (Patterson, 1996; Hykin, 1999).
[Figure 3.9: Recurrent neural network with input, hidden and output layers and feedback connections]
Alternatively, neural networks can be divided into two classes: supervised and unsupervised neural networks. Supervised neural networks, such as the perceptron, use a supervised learning algorithm, which means that both input and output data are required during the training phase. The most common training algorithm is the back-propagation algorithm. Unsupervised neural networks, such as the Kohonen network, require only input data for training; they organise the input data themselves according to a similarity metric.
A recurrent neural network is a neural network in which the connections between the units form a directed cycle. Recurrent neural networks must be approached differently from feed-forward neural networks, both when analysing their behavior and when training them, because their dynamics can be complex and even chaotic. Dynamical systems theory is therefore usually used to model and analyse them.
3.3.4.7 Network Learning
The functionality of a neural network is determined by the combination of its topology (number of layers, number of units per layer, and the interconnection pattern between the layers) and the weights of the connections within the network. The topology is usually held fixed, and a training algorithm determines the weights. The process of adjusting the weights so that the network learns the relationship between the inputs and the targets is called learning, or training. Many learning algorithms have been devised to find an optimum set of weights for a given problem. They can roughly be divided into two main groups:
(i) Supervised Learning - The network is trained by providing it with inputs and desired outputs (target values). These input-output pairs are provided by an external teacher, or by the system containing the network. The algorithm uses the difference between the actual outputs and the desired outputs to adapt the weights in the network (Figure 3.10). Supervised learning is often posed as a function approximation problem: given training data consisting of pairs of input patterns x and corresponding targets t, the goal is to find a function f(x) that matches the desired response for each training input.
[Figure 3.10: Supervised learning model for oil palm data. The oil palm data supply the input and the desired output; the output of the feed-forward neural network is compared with the desired output, the error enters the objective function, and the training algorithm (back-propagation; training using Levenberg-Marquardt) produces the weight changes.]
(ii) Unsupervised Learning - With unsupervised learning, there is no feedback from the environment to indicate whether the outputs of the network are correct. The network must discover features, regularities, correlations or categories in the input data automatically. In fact, for most varieties of unsupervised learning, the targets are the same as the inputs. In other words, unsupervised learning usually performs the same task as an auto-associative network, compressing the information from the inputs.
3.3.4.8 Objective Function
The objective function is used to determine how well the network performs. It must be defined so as to provide an unambiguous numerical rating of system performance. The selection of an objective function is very important because the function represents the design goals and influences which training algorithm is appropriate. Although it is difficult to develop an objective function that measures performance exactly, a few basic functions may be used, such as the sum-of-squares error function,
$$E = \frac{1}{NP}\sum_{p=1}^{P}\sum_{i=1}^{N}\left(t_{pi} - y_{pi}\right)^2 \tag{3.29}$$
where p indexes the P patterns in the training set, i indexes the N output nodes, and t_pi and y_pi are, respectively, the target and the actual network output for the ith output unit on the pth pattern. In real-world applications, it may be necessary to complicate the function with additional terms to control the complexity of the model.
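Equation (3.29) translates directly into array code. In this sketch the targets and outputs for P patterns and N output units are stored as P × N arrays; the function name and the small example arrays are illustrative assumptions.

```python
import numpy as np

def objective_3_29(T, Y):
    """E = (1 / (N P)) * sum_p sum_i (t_pi - y_pi)^2, equation (3.29).

    T, Y : array-like of shape (P, N) holding targets t_pi and outputs y_pi.
    """
    T = np.asarray(T, dtype=float)
    Y = np.asarray(Y, dtype=float)
    P, N = T.shape
    return float(np.sum((T - Y) ** 2) / (N * P))

T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 patterns, 2 output units (toy values)
Y = [[0.8, 0.1], [0.2, 0.7], [0.9, 0.6]]
E = objective_3_29(T, Y)                   # squared errors sum to 0.35, so E = 0.35 / 6
print(E)
```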
3.3.4.9 Basic Architecture of a Feed-Forward Neural Network
A layered feed-forward network consists of a certain number of layers, each containing a certain number of units. There is an input layer, an output layer, and one or more hidden layers between them. Each unit receives its inputs directly from the previous layer (except for the input units) and sends its output directly to units in the next layer (except for the output units). Unlike the recurrent network, which contains feedback connections, there are no connections from any unit to the inputs of previous layers, to other units in the same layer, or to units more than one layer ahead. Every unit acts only as an input to the immediately following layer. This class of networks is therefore easier to analyse theoretically than more general topologies, because its outputs can be represented as explicit functions of the inputs and the weights.
An example of a layered network with one hidden layer is shown in Figure 3.8.
In this network there are l inputs, m hidden units, and n output units. The output of the
jth hidden unit is obtained by first forming a weighted linear combination of the l input
values, then adding a bias,
$$a_j = \sum_{i=1}^{l} w_{ji}^{(1)} x_i + w_{j0}^{(1)} \tag{3.30}$$
where $w_{ji}^{(1)}$ is the weight from input i to hidden unit j in the first layer and $w_{j0}^{(1)}$ is the bias for hidden unit j. If we treat the bias terms as weights from an extra input $x_0 = 1$, equation (3.30) can be rewritten in the form
$$a_j = \sum_{i=0}^{l} w_{ji}^{(1)} x_i \tag{3.31}$$
The activation of hidden unit j is then obtained by transforming the linear sum with an activation function g(x):
$$h_j = g(a_j) \tag{3.32}$$
The outputs of the network are obtained by transforming the activations of the hidden units using a second layer of processing units. For each output unit k, we first form the linear combination of the outputs of the hidden units,
$$a_k = \sum_{j=1}^{m} w_{kj}^{(2)} h_j + w_{k0}^{(2)} \tag{3.33}$$
Again, we can absorb the bias (by introducing the constant hidden unit $h_0 = 1$) and rewrite the above equation as
$$a_k = \sum_{j=0}^{m} w_{kj}^{(2)} h_j \tag{3.34}$$
Next, applying the output activation function $g_2(x)$ to equation (3.34) gives the kth output
$$y_k = g_2(a_k) \tag{3.35}$$
Combining equations (3.31), (3.32), (3.34) and (3.35), we obtain the complete representation of the network as
$$y_k = g_2\left[\sum_{j=0}^{m} w_{kj}^{(2)}\, g\left(\sum_{i=0}^{l} w_{ji}^{(1)} x_i\right)\right] \tag{3.36}$$
The network of Figure 3.8 has one hidden layer. It can easily be extended to two or more hidden layers by repeating the above transformation for each additional layer. Note that the input units are special: they are hypothetical units that simply output their inputs and do no processing.
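The sketch below implements the forward pass of equations (3.30) to (3.36) for one pattern, with the biases absorbed as weights from the constant units x_0 = 1 and h_0 = 1. The sigmoid hidden activation, the identity output activation g_2, the weight layout (bias in column 0) and the random weights are assumptions for illustration, not the configuration used later in this study.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, W2, g=sigmoid, g2=lambda a: a):
    """Forward pass of the one-hidden-layer network, equations (3.30)-(3.36).

    x  : shape (l,)       inputs x_1..x_l
    W1 : shape (m, l + 1) first-layer weights; column 0 holds the biases w_j0^(1)
    W2 : shape (n, m + 1) second-layer weights; column 0 holds the biases w_k0^(2)
    g, g2 : hidden and output activation functions (sigmoid and identity assumed here)
    """
    x_aug = np.concatenate(([1.0], x))   # x_0 = 1 absorbs the bias, eq. (3.31)
    h = g(W1 @ x_aug)                    # a_j and h_j = g(a_j), eqs. (3.31)-(3.32)
    h_aug = np.concatenate(([1.0], h))   # h_0 = 1 absorbs the output bias, eq. (3.34)
    return g2(W2 @ h_aug)                # y_k = g2(a_k), eqs. (3.34)-(3.35)

# Hypothetical sizes and random weights, for illustration only.
l, m, n = 3, 4, 1
rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(m, l + 1))
W2 = rng.normal(scale=0.5, size=(n, m + 1))
y = forward(np.array([0.2, -0.1, 0.4]), W1, W2)
print(y)
```

Storing each bias in column 0 is just one convenient layout; keeping separate bias vectors, as in (3.30) and (3.33), gives the same outputs.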
(a) Backpropagation
Back-propagation is the most commonly used method for training multi-layer
feed-forward networks. It can be applied to any feed-forward network with
differentiable activation functions. This technique was popularized by Rumelhart,
Hinton and Williams (Rumelhart et al., 1986).
For most networks, the learning process is based on a suitable error function, which is minimised with respect to the weights and biases. If a network has differentiable activation functions, then the activations of the output units are differentiable functions of the input variables, the weights and the biases. If, in addition, the error is a differentiable function of the network outputs, as the sum-of-squares error function is, then the error is itself a differentiable function of the weights. We can therefore evaluate the derivatives of the error with respect to the weights and then use these derivatives, with the popular gradient descent method or other optimisation methods, to find weights that minimise the error function. The algorithm for evaluating the derivatives of the error function is known as back-propagation, because it propagates the errors backwards through the network.
(b) Error Function Derivative Calculation
We consider a general feed-forward network with arbitrary differentiable nonlinear activation functions and a differentiable error function. From Section 3.3.4.9, we know that each unit j first forms a weighted sum of its inputs of the form
$$a_j = \sum_i w_{ji} z_i \tag{3.37}$$
where z_i is the activation of a unit, or an input, that sends a connection to unit j. We then apply the activation function
$$z_j = g(a_j) \tag{3.38}$$
Note that one or more of the variables z_i in equation (3.37) could be an input, in which case we denote it by x_i. Similarly, the unit j in equation (3.38) could be an output unit, in which case we denote its output by y_k.
The error function is written as a sum, over all the patterns in the training set, of an error defined for each pattern separately,
$$E = \sum_p E_p, \qquad E_p = E(\mathbf{Y}; \mathbf{W}) \tag{3.39}$$
where p indexes the patterns, Y is the vector of outputs, and W is the vector of all weights. E_p can be expressed as a differentiable function of the output variables y_k.
The goal is to find a way to evaluate the derivatives of the error function E with respect to the weights and biases. By equation (3.39), each derivative of E is the sum over the training patterns of the corresponding derivative of E_p, so we can consider one pattern at a time.
For each pattern (with all its inputs) we can obtain the activations of all hidden and output units in the network by successive application of equations (3.37) and (3.38). This process is called forward propagation or the forward pass. Once we have the activations of all the outputs, together with the target values, we can calculate the full expression of the error function E_p.
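Now consider the evaluation of the derivative of E_p with respect to some weight w_ji. Application of the chain rule gives
$$\frac{\partial E_p}{\partial w_{ji}} = \frac{\partial E_p}{\partial a_j}\,\frac{\partial a_j}{\partial w_{ji}} = \delta_j\,\frac{\partial a_j}{\partial w_{ji}} = \delta_j z_i \tag{3.40}$$
where we define
$$\delta_j = \frac{\partial E_p}{\partial a_j} \tag{3.41}$$
and the last step of (3.40) follows from equation (3.37), since $\partial a_j/\partial w_{ji} = z_i$.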
From equation (3.40) it is easy to see that the derivative can be obtained by
multiplying the value of δ (for the unit at the output end of the weight) by the value of
z (for the unit at the input end). The task is now to find the value of δj for each of the
hidden and output units in the network.
For the output units, δ_k is straightforward:
$$\delta_k = \frac{\partial E_p}{\partial a_k} = g'(a_k)\,\frac{\partial E_p}{\partial y_k} \tag{3.42}$$
For a hidden unit, δ_j is obtained indirectly. Hidden units can influence the error only through their effects on the units k to which they send output connections, so δ_j is obtained using the chain rule:
$$\delta_j = \frac{\partial E_p}{\partial a_j} = \sum_k \frac{\partial E_p}{\partial a_k}\,\frac{\partial a_k}{\partial a_j} \tag{3.43}$$
The first factor is simply the δ_k of unit k, so we can write the equation as
$$\delta_j = \frac{\partial E_p}{\partial a_j} = \sum_k \delta_k\,\frac{\partial a_k}{\partial a_j} \tag{3.44}$$
For the second factor, if unit j connects directly to unit k then $\partial a_k/\partial a_j = g'(a_j)\,w_{kj}$; otherwise it is zero. In this way we obtain the back-propagation formula
$$\delta_j = g'(a_j)\sum_k w_{kj}\,\delta_k \tag{3.45}$$
which means that the value of δ for a particular hidden unit can be obtained by propagating the δ's backwards from units later in the network, as shown in Figure 3.11. By recursively applying this equation we obtain the values of δ for all the hidden units in a feed-forward network, no matter how many layers it has.
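[Figure 3.11: Backward propagation. The δ_k of the units k are propagated back through the weights w_kj to hidden unit j, giving δ_j = g′(a_j) Σ_k w_kj δ_k; unit j receives its input from unit i through the weight w_ji.]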
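The δ recursion of equations (3.40) to (3.45) can be sketched for the one-hidden-layer network of Figure 3.8. The sketch assumes sigmoid hidden units, identity output units and the per-pattern error E_p = ½ Σ_k (y_k − t_k)², for which ∂E_p/∂y_k = y_k − t_k; the weights and data are random placeholders. It then checks the back-propagated gradients against central finite differences, a standard sanity check rather than part of the algorithm.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_single_pattern(x, t, W1, W2):
    """Return dE_p/dW1, dE_p/dW2 and E_p for one pattern.

    Assumed set-up: sigmoid hidden units, identity output units,
    E_p = 0.5 * sum_k (y_k - t_k)^2, biases stored in column 0 of W1 and W2.
    """
    # Forward pass, equations (3.37)-(3.38)
    x_aug = np.concatenate(([1.0], x))
    h = sigmoid(W1 @ x_aug)
    h_aug = np.concatenate(([1.0], h))
    y = W2 @ h_aug

    # Output units, equation (3.42): g'(a_k) = 1 and dE_p/dy_k = y_k - t_k
    delta_out = y - t
    # Hidden units, equation (3.45). Column 0 of W2 is skipped: the constant
    # bias unit h_0 = 1 has no incoming weights, so it needs no delta.
    delta_hidden = h * (1.0 - h) * (W2[:, 1:].T @ delta_out)

    # Equation (3.40): dE_p/dw_ji = delta_j * z_i
    grad_W1 = np.outer(delta_hidden, x_aug)
    grad_W2 = np.outer(delta_out, h_aug)
    E_p = 0.5 * np.sum((y - t) ** 2)
    return grad_W1, grad_W2, E_p

def numerical_gradient(f, W, eps=1e-6):
    """Central-difference estimate of df/dW, used only to check the gradients."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (f(W_plus) - f(W_minus)) / (2 * eps)
    return grad

rng = np.random.default_rng(2)
W1 = rng.normal(size=(4, 4))   # 4 hidden units; 3 inputs plus bias (hypothetical)
W2 = rng.normal(size=(2, 5))   # 2 outputs; 4 hidden units plus bias
x, t = rng.normal(size=3), rng.normal(size=2)

gW1, gW2, Ep = backprop_single_pattern(x, t, W1, W2)
num_W1 = numerical_gradient(lambda W: backprop_single_pattern(x, t, W, W2)[2], W1)
num_W2 = numerical_gradient(lambda W: backprop_single_pattern(x, t, W1, W)[2], W2)
print(np.abs(gW1 - num_W1).max(), np.abs(gW2 - num_W2).max())
```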
(c) Weight Adjustment with the Gradient Descent Method
Once we have obtained the derivatives of the error function with respect to the weights, we can use them to update the weights so as to decrease the error. There are many gradient-based optimisation algorithms built on these derivatives. One of the simplest is called gradient descent or steepest descent. With this algorithm, the weights are updated in the direction in which the error E decreases most rapidly (along the negative gradient). The updating process begins with an initial guess for the weights (which may be chosen randomly) and then generates a sequence of weights using the following formula,
$$\Delta w_{ji}^{(\tau+1)} = -\eta\,\frac{\partial E}{\partial w_{ji}} \tag{3.46}$$
where η is a small positive number called the learning rate, which sets the size of each step. Gradient descent only tells us the direction in which to move; the step size, or learning rate, must be determined as well. Setting the learning rate too low results in slow training of the network, while too high a learning rate leads to oscillation. One way to avoid oscillation for large values of η is to make the weight change depend on the past weight change by adding a momentum term,
$$\Delta w_{ji}^{(\tau+1)} = -\eta\,\frac{\partial E}{\partial w_{ji}} + \alpha\,\Delta w_{ji}^{(\tau)} \tag{3.47}$$
That is, the weight change is a combination of a step down the negative gradient plus a fraction α of the previous weight change, where 0 ≤ α < 1 and typically 0 ≤ α < 0.9 (Reed and Mark, 1999).
The roles of the learning rate and the momentum term are shown in Figure 3.12 (Patrick and Smagt, 1996). When no momentum term is used, a low learning rate typically results in a long wait before the minimum is reached (a), whereas with a large learning rate the minimum may never be reached because of oscillation (b). When a momentum term is added, the minimum is reached faster (c).
Figure 3.12: The descent vs. learning rate and momentum
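The behaviour illustrated in Figure 3.12 can be reproduced on a toy error surface. The sketch applies the updates (3.46) and (3.47) to the assumed quadratic E(w) = ½ w′Aw, whose minimum is at w = 0 and whose curvature differs by a factor of 20 between the two directions; the matrix A, the learning rates and the step count are illustrative choices, not values used in this study.

```python
import numpy as np

def gradient_descent(grad, w0, eta, alpha=0.0, n_steps=100):
    """Iterate equation (3.46) (alpha = 0) or equation (3.47) (momentum alpha > 0)."""
    w = np.array(w0, dtype=float)
    delta_w = np.zeros_like(w)                # previous weight change, Delta w^(tau)
    for _ in range(n_steps):
        delta_w = -eta * grad(w) + alpha * delta_w
        w = w + delta_w
    return w

# Assumed toy error surface E(w) = 0.5 w'Aw: minimum at w = 0, one shallow and one steep axis.
A = np.diag([1.0, 20.0])

def grad_E(w):
    return A @ w

w0 = [1.0, 1.0]
w_slow = gradient_descent(grad_E, w0, eta=0.01)            # small eta: slow along the shallow axis
w_osc = gradient_descent(grad_E, w0, eta=0.11)             # eta too large: growing oscillation on the steep axis
w_mom = gradient_descent(grad_E, w0, eta=0.01, alpha=0.9)  # momentum: much closer to the minimum
print(w_slow, w_osc, w_mom, sep="\n")
```

Along each axis plain gradient descent multiplies the weight by (1 − ηλ) per step, so with λ = 20 any η above 0.1 makes that factor smaller than −1 and the iterates oscillate with growing amplitude.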
There are two basic weight-update variations: batch learning and incremental
learning. With batch learning, the weights are updated using all of the training data.
The following loop is repeated: a) Process all the training data; b) Update the weights.
Each loop through the training set is called an epoch. For incremental learning, the
weights are updated separately for each sample. The following loop is repeated: a)
Process one sample from the training data; b) Update the weights.
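The two update schedules differ only in where the weight update sits in the loop. To keep the sketch short it trains a single linear unit (identity activation, no hidden layer) on noise-free synthetic data with the assumed error ½(y − t)²; the data, learning rates and epoch count are hypothetical, and the same loop structure would apply to the multi-layer network with back-propagated gradients.

```python
import numpy as np

def batch_epoch(w, X, t, eta):
    """Batch learning: process all samples, then make one weight update."""
    y = X @ w
    grad = X.T @ (y - t) / len(t)        # gradient of the mean of 0.5 * (y - t)^2
    return w - eta * grad

def incremental_epoch(w, X, t, eta, rng):
    """Incremental learning: update the weights after every sample."""
    for p in rng.permutation(len(t)):    # present the samples in random order
        y_p = X[p] @ w
        w = w - eta * (y_p - t[p]) * X[p]
    return w

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # bias input x_0 = 1 plus 2 inputs
w_true = np.array([0.5, 2.0, -1.0])                            # hypothetical weights
t = X @ w_true                                                 # noise-free synthetic targets

w_batch = np.zeros(3)
w_inc = np.zeros(3)
for _ in range(200):                     # each pass through the training set is one epoch
    w_batch = batch_epoch(w_batch, X, t, eta=0.2)
    w_inc = incremental_epoch(w_inc, X, t, eta=0.05, rng=rng)
print(w_batch, w_inc)
```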
3.3.5 RESPONSE SURFACE ANALYSIS
Response surface analysis has been widely used in oil palm research. The purpose of this technique is to determine the optimum levels of fertiliser usage in order to optimise oil palm yield. Although response surface analysis is common practice in modelling oil palm yield, this study focuses on the experimental location. No conclusion about the optimum can be drawn if the stationary point is a saddle point. Hence, this study proposes the use of ridge analysis as an alternative solution to this problem.
3.3.5.1 Introduction
Response surface analysis (RSA), also known as response surface methodology (RSM), is a collection of statistical and mathematical techniques useful for developing, improving and optimising processes (Myers and Montgomery, 1995). It also has important applications in the design, development and formulation of new products, as well as in the improvement of existing product designs. Typically, RSA applications involve several input variables that potentially influence some performance measure, such as the output of a product or process or, in agronomy, oil palm yield. The input variables are sometimes called independent variables, and they are subject to the control of the researcher or scientist (at least for the purposes of an experiment).
A common aim of experimental design analysis is to find the influence of the various factors on the variables analysed. Response surface analysis retains that aim, but its emphasis also includes finding the particular treatment combination that gives the maximum or minimum response in yield or in the variables analysed (Anderson and McClean, 1974). In general, the response surface will be a nonlinear function, which may be sufficiently approximated by a quadratic polynomial in a small region near the optimum operating condition.
It should be remembered that just because a higher order effect is statistically
significant, it does not follow that the effect is of practical importance. It may be
possible to ignore statistically significant higher order effects if they have little
practical effect on the estimated response surface (Christensen, 2001).
3.3.5.2 Response Surface: First Order
Suppose that the scientist or experimenter is concerned with maximising oil palm yield, a response y that depends on the controllable variables ζ1, ζ2, …, ζp. The relationship can be written as
$$y = f(\zeta_1, \zeta_2, \ldots, \zeta_p) + \varepsilon \tag{3.48}$$
where the form of the true response function f is unknown and may be very complicated, and ε is a term that represents other sources of variability not accounted for in f. Therefore, ε includes effects such as the measurement error on the response and the effects of other variables. We usually treat ε as a statistical error with a normal distribution, mean zero and variance σ². If so, the expectation of y is
$$E(y) = E[f(\zeta_1, \zeta_2, \ldots, \zeta_p)] + E(\varepsilon) = f(\zeta_1, \zeta_2, \ldots, \zeta_p) \tag{3.49}$$
The variables ζ1, ζ2, …, ζp are usually called natural variables (Myers and Montgomery, 1995), because they are expressed in the original units of measurement. In RSA work it is convenient to transform the natural variables to coded variables x1, x2, …, xp, which are usually defined to be dimensionless with zero mean and the same standard deviation. In terms of the coded variables, the true response function in equation (3.49) is written as
$$\varphi(\mathbf{x}) = \varphi(x_1, x_2, \ldots, x_p) \tag{3.50}$$
Because the form of the true function is unknown, we must approximate it. Usually, a low order polynomial in some relatively small region of the independent variable space is appropriate. If there is no interaction among the independent variables, the model is known as the main effects model, because it includes only the main effects of the variables. If the interaction among the independent variables is significant, interaction terms can easily be added to the model of equation (3.50):
$$\varphi(\mathbf{x}) = \varphi(x_1, x_2, \ldots, x_p, x_1x_2, \ldots, x_{p-1}x_p) \tag{3.51}$$
Equations (3.50) and (3.51) are also known as first order models, because they include only main effects and interactions in which each variable appears only to the first power.
The first order Taylor approximation about the center vector (x = 0) of the data is written as
$$\varphi(\mathbf{x}) = \varphi(\mathbf{0}) + \sum_{i=1}^{p}\left[\frac{\partial \varphi(\mathbf{x})}{\partial x_i}\right]_{\mathbf{x}=\mathbf{0}} x_i = \varphi(\mathbf{0}) + \mathbf{x}'\,d\varphi(\mathbf{0}) \tag{3.52}$$
where dφ(0) is the vector of partial derivatives ∂φ(x)/∂x_i evaluated at the vector x = 0.
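We do not know φ(x), so we do not know the partial derivatives; they are simply unknown values. We therefore define
$$\beta_0 = \varphi(\mathbf{0}), \quad \beta_1 = \left.\frac{\partial \varphi(\mathbf{x})}{\partial x_1}\right|_{\mathbf{x}=\mathbf{0}}, \quad \ldots, \quad \beta_p = \left.\frac{\partial \varphi(\mathbf{x})}{\partial x_p}\right|_{\mathbf{x}=\mathbf{0}} \tag{3.53}$$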
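then we can write the equation as
$$\varphi(\mathbf{x}) = \beta_0 + \sum_{j=1}^{p}\beta_j x_j = \beta_0 + \mathbf{x}'\boldsymbol{\beta}, \qquad \boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)' \tag{3.54}$$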
Applying equation (3.48) to each observation i = 1, …, n gives
$$y_i = \beta_0 + \sum_{j=1}^{p}\beta_j x_{ij} + \varepsilon_i \tag{3.55}$$
In real cases the curvature of the true response surface is often strong enough that the first order model (even with interaction terms included) is inadequate, and a second order model is then required. A second order model can be expressed as
$$\varphi(\mathbf{x}) = \varphi(x_1, x_2, \ldots, x_p, x_1x_2, \ldots, x_{p-1}x_p, x_1^2, x_2^2, \ldots, x_p^2) \tag{3.56}$$
Myers and Montgomery (1995) discuss first and second order models at length. They note that the second order model is widely used in RSA research for three reasons:
(i) The second order model is very flexible; it can take on a wide variety of functional forms.
(ii) It is easy to estimate the parameters of the second order model (for example, by the method of least squares).
(iii) There is considerable practical experience indicating that second order models work well in solving real response surface problems.
3.3.5.3 Response Surface: Second Order
In response surface analysis work it is assumed that the true functional relationship
$$y = f(\mathbf{x}, \boldsymbol{\beta}) \tag{3.57}$$
is, in fact, unknown. Here (x1, x2, …, xp) are in centered and scaled design units. The genesis of the first order approximating model, or the model that contains first order terms and low order interaction terms, is the notion of the Taylor series approximation of equation (3.52). In general, second order response surface models can be written as
$$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p + \beta_{11}x_1^2 + \cdots + \beta_{pp}x_p^2 + \beta_{12}x_1x_2 + \beta_{13}x_1x_3 + \cdots + \beta_{p-1,p}x_{p-1}x_p + \varepsilon \tag{3.58}$$
where the β's are unknown parameters and ε is a random error.
The second order Taylor approximation about the center vector x = 0 of the data is written as
$$\varphi(\mathbf{x}) = \varphi(\mathbf{0}) + \mathbf{x}'\,d\varphi(\mathbf{0}) + \frac{\mathbf{x}'[d^2\varphi(\mathbf{0})]\mathbf{x}}{2} \tag{3.59}$$
where dφ(0) was defined previously, and d²φ(0) is the p × p matrix of second partial derivatives evaluated at x = 0; its element in the ith row and jth column is ∂²φ(x)/∂x_i∂x_j evaluated at x = 0.
Again, we do not know φ(x), so we do not know the derivatives, and we must write
$$\varphi(\mathbf{x}) = \beta_0 + \mathbf{x}'\boldsymbol{\beta} + \mathbf{x}'\mathbf{B}\mathbf{x} \tag{3.60}$$
where β = (β1, …, βp)′ = dφ(0) as before.
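We can now define B as
$$\mathbf{B} = \begin{bmatrix} \beta_{11} & \beta_{12}/2 & \cdots & \beta_{1p}/2 \\ & \beta_{22} & \cdots & \beta_{2p}/2 \\ & & \ddots & \vdots \\ \text{sym.} & & & \beta_{pp} \end{bmatrix} = \frac{1}{2}\,d^2\varphi(\mathbf{0}) \tag{3.61}$$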
With this definition of B, the approximation becomes
$$\varphi(\mathbf{x}) = \beta_0 + \sum_{j=1}^{p}\beta_j x_j + \sum_{j=1}^{p}\sum_{k \ge j}^{p}\beta_{jk}x_jx_k \tag{3.62}$$
Applying equation (3.48) to each observation gives
$$y_i = \beta_0 + \sum_{j=1}^{p}\beta_j x_{ij} + \sum_{j=1}^{p}\sum_{k \ge j}^{p}\beta_{jk}x_{ij}x_{ik} + \varepsilon_i \tag{3.63}$$
This approximation is a multiple linear regression model, which can be fitted by least squares.
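Because (3.63) is linear in the β's, it can be fitted with ordinary least squares once the squared and cross-product columns are built. The sketch below assumes coded variables from a 3² factorial design and a noise-free, made-up quadratic response, so that recovery of the coefficients can be checked; it also assembles the estimated matrix B̂ in the form used in Section 3.3.5.4, with b_jj on the diagonal and b_jk/2 off the diagonal.

```python
import itertools
import numpy as np

def second_order_columns(X):
    """Model matrix for (3.63): 1, x_j, then x_j * x_k for k >= j."""
    n, p = X.shape
    pairs = [(j, k) for j in range(p) for k in range(j, p)]
    cols = [np.ones(n)] + [X[:, j] for j in range(p)] + [X[:, j] * X[:, k] for j, k in pairs]
    return np.column_stack(cols), pairs

def fit_second_order(X, y):
    """Least-squares estimates b0, b and the symmetric matrix B-hat."""
    M, pairs = second_order_columns(X)
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    p = X.shape[1]
    b0, b = coef[0], coef[1:p + 1]
    B_hat = np.zeros((p, p))
    for c, (j, k) in zip(coef[p + 1:], pairs):
        if j == k:
            B_hat[j, j] = c                       # b_jj on the diagonal
        else:
            B_hat[j, k] = B_hat[k, j] = c / 2.0   # b_jk / 2 off the diagonal
    return b0, b, B_hat

# Hypothetical 3^2 factorial in coded units and a made-up, noise-free quadratic response.
X = np.array(list(itertools.product([-1.0, 0.0, 1.0], repeat=2)))
x1, x2 = X[:, 0], X[:, 1]
y = 10 + 2 * x1 + x2 - 3 * x1**2 - 2 * x2**2 + x1 * x2
b0, b, B_hat = fit_second_order(X, y)
print(b0, b, B_hat, sep="\n")
```

With real, noisy yield data the same code returns least-squares estimates rather than exact coefficients.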
3.3.5.4 Stationary Point
Consider the fitted second order polynomial
$$\hat{y} = b_0 + \mathbf{x}'\mathbf{b} + \mathbf{x}'\hat{\mathbf{B}}\mathbf{x} \tag{3.64}$$
where b0, b and B̂ contain estimates of the intercept, linear and second order coefficients, respectively. Here x′ = [x1, x2, …, xp], b′ = [b1, b2, …, bp] and B̂ is the p × p symmetric matrix
$$\hat{\mathbf{B}} = \begin{bmatrix} b_{11} & b_{12}/2 & \cdots & b_{1p}/2 \\ & b_{22} & \cdots & b_{2p}/2 \\ & & \ddots & \vdots \\ \text{sym.} & & & b_{pp} \end{bmatrix} \tag{3.65}$$
Differentiating ŷ in equation (3.64) with respect to x gives
$$\frac{\partial \hat{y}}{\partial \mathbf{x}} = \mathbf{b} + 2\hat{\mathbf{B}}\mathbf{x} \tag{3.66}$$
where b is the estimate of the linear coefficients and B̂ is the estimate of the second order coefficients.
Setting the derivative to 0, we can solve for the stationary point x_s of the system:
$$\mathbf{x}_s = -\frac{\hat{\mathbf{B}}^{-1}\mathbf{b}}{2} \tag{3.67}$$
The nature of the stationary point is determined from the signs of the eigenvalues of the matrix B̂, and the relative magnitudes of these eigenvalues can be helpful in the overall interpretation. To see this, let G be the p × p matrix whose columns are the normalised eigenvectors associated with the eigenvalues of B̂. Then G′B̂G = Λ, where Λ is a diagonal matrix containing the eigenvalues of B̂ as its main diagonal elements. If we translate the model of equation (3.64) to a new centre, namely the stationary point, and rotate to axes corresponding to the principal axes of the contour system, we have
$$\mathbf{v} = \mathbf{x} - \mathbf{x}_s \quad \text{and} \quad \mathbf{w} = \mathbf{G}'\mathbf{v} \tag{3.68}$$
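The translation gives
$$\hat{y} = b_0 + (\mathbf{v} + \mathbf{x}_s)'\mathbf{b} + (\mathbf{v} + \mathbf{x}_s)'\hat{\mathbf{B}}(\mathbf{v} + \mathbf{x}_s) = \hat{y}_s + \mathbf{v}'\hat{\mathbf{B}}\mathbf{v} \tag{3.69}$$
and the rotation gives
$$\hat{y} = \hat{y}_s + \mathbf{w}'\mathbf{G}'\hat{\mathbf{B}}\mathbf{G}\mathbf{w} = \hat{y}_s + \mathbf{w}'\boldsymbol{\Lambda}\mathbf{w} \tag{3.70}$$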
The w-axes are the principal axes of the contour system. Equation (3.70) can be written as
$$\hat{y} = \hat{y}_s + \sum_{i=1}^{p}\lambda_i w_i^2 \tag{3.71}$$
where ŷ_s is the estimated response at the stationary point, and λ1, λ2, …, λp are the eigenvalues of B̂. The variables w1, w2, …, wp are known as canonical variables.
The signs of the λ values determine the nature of the stationary point, and the relative magnitudes of the eigenvalues help us gain a better understanding of the response system. If all the λ values are negative, the stationary point is a point of maximum response; if all are positive, it is a point of minimum response; and if they have mixed signs, the stationary point is a saddle point (Myers and Montgomery, 1995; Christensen, 2001).
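The canonical analysis of equations (3.67) to (3.71) reduces to a linear solve and a symmetric eigendecomposition. The coefficients b0, b and B̂ below are hypothetical values, not estimates from the oil palm data, and the sketch assumes a non-singular B̂; with a zero eigenvalue the stationary point is not unique and needs separate treatment.

```python
import numpy as np

def canonical_analysis(b0, b, B_hat):
    """Stationary point (3.67), its predicted response and the eigen-structure of (3.71).

    Assumes B_hat is symmetric and non-singular.
    """
    x_s = -0.5 * np.linalg.solve(B_hat, b)     # x_s = -B^{-1} b / 2
    y_s = b0 + x_s @ b + x_s @ B_hat @ x_s     # estimated response at x_s
    lam, G = np.linalg.eigh(B_hat)             # G' B_hat G = diag(lam), G orthonormal
    if np.all(lam < 0):
        nature = "maximum"
    elif np.all(lam > 0):
        nature = "minimum"
    else:
        nature = "saddle point"
    return x_s, y_s, lam, G, nature

# Hypothetical fitted coefficients for two coded variables.
b0 = 10.0
b = np.array([2.0, 1.0])
B_hat = np.array([[-3.0, 0.5],
                  [0.5, -2.0]])
x_s, y_s, lam, G, nature = canonical_analysis(b0, b, B_hat)
print(x_s, y_s, lam, nature)   # both eigenvalues are negative, so nature == "maximum"
```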
3.3.5.5 Ridge Analysis
The main purpose of ridge analysis is to locate the optimum response inside the experimental region, which is needed when the stationary point lies outside that region or is a saddle point. The output of the analysis is the set of coordinates of the maxima (or minima), along with the predicted response ŷ, at each computed point on the path. This analysis provides useful information about the roles of the design variables inside the experimental region. Ridge analysis may also provide guidelines on where future experiments should be made in order to achieve more desirable conditions. However, ridge analysis is generally used when the practitioner feels that the point is near the region of the optimum.
Consider equation (3.64), and suppose we wish to maximise ŷ subject to the constraint x′x = H², where x′ = [x1, x2, …, xp] and the center of the design region is taken to be x1 = x2 = … = xp = 0. Using Lagrange multipliers, we differentiate
$$J = b_0 + \mathbf{x}'\mathbf{b} + \mathbf{x}'\hat{\mathbf{B}}\mathbf{x} - \kappa(\mathbf{x}'\mathbf{x} - H^2)$$
with respect to the vector x. The derivative is
$$\frac{\partial J}{\partial \mathbf{x}} = \mathbf{b} + 2\hat{\mathbf{B}}\mathbf{x} - 2\kappa\mathbf{x}$$
and the constrained stationary point is determined by setting ∂J/∂x = 0. This gives
$$(\hat{\mathbf{B}} - \kappa\mathbf{I})\mathbf{x} = -\tfrac{1}{2}\mathbf{b} \tag{3.72}$$
As a result, for any fixed value of κ, a solution x of equation (3.72) is a stationary point on the sphere of radius H = (x′x)^{1/2}. However, the appropriate solution x is the one that gives a maximum ŷ on H or a minimum ŷ on H, depending on which is desired. The appropriate choice of κ depends on the eigenvalues of the B̂ matrix. Myers and Montgomery (1995) give important rules for selecting the value of κ:
(i) If κ exceeds the largest eigenvalue of B̂, the solution x of equation (3.72) gives an absolute maximum for ŷ on H = (x′x)^{1/2}.
(ii) If κ is smaller than the smallest eigenvalue of B̂, the solution x of equation (3.72) gives an absolute minimum for ŷ on H = (x′x)^{1/2}.
Appendix E provides some mathematical insight into (i) and (ii) above. We also examine the relationship between H and κ. The analyst wishes to observe results on a locus of points, so the radius of the solution of equation (3.72) should fall in the interval [0, H_b], where H_b is a radius approximately representing the boundary of the experimental region. The value of H is controlled through the choice of κ. In the working region of κ, namely κ > λ_m or κ < λ_1 (where λ_1 is the smallest and λ_m the largest eigenvalue of B̂), H is a monotonic function of κ.
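Rule (i) can be illustrated by solving (3.72) for a decreasing sequence of κ values above the largest eigenvalue of B̂. The coefficients are hypothetical and chosen to give a saddle (one positive and one negative eigenvalue), the case in which this study turns to ridge analysis; the κ grid is likewise an arbitrary choice. Consistent with the monotonic relation between H and κ, the radius H grows as κ decreases towards the largest eigenvalue.

```python
import numpy as np

def ridge_path(b, B_hat, kappas):
    """Solve (B_hat - kappa I) x = -b / 2, equation (3.72), for each kappa.

    Returns (kappa, H, x) triples, where H = sqrt(x'x) is the radius reached.
    """
    p = len(b)
    path = []
    for kappa in kappas:
        x = np.linalg.solve(B_hat - kappa * np.eye(p), -0.5 * b)
        path.append((kappa, float(np.sqrt(x @ x)), x))
    return path

# Hypothetical saddle-shaped fit: one positive and one negative eigenvalue.
b0 = 10.0
b = np.array([2.0, 1.0])
B_hat = np.array([[-3.0, 0.5],
                  [0.5, 1.0]])
lam_max = np.linalg.eigvalsh(B_hat).max()

# Rule (i): kappa above the largest eigenvalue gives the maximum of y-hat on each radius H.
kappas = lam_max + np.array([8.0, 4.0, 2.0, 1.0, 0.5])   # decreasing towards lam_max
path = ridge_path(b, B_hat, kappas)
for kappa, H, x in path:
    y_hat = b0 + x @ b + x @ B_hat @ x
    print(f"kappa={kappa:6.3f}  H={H:5.3f}  x={np.round(x, 3)}  y_hat={y_hat:6.3f}")
```

In practice one would keep only the points with H ≤ H_b, the radius of the experimental region.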
3.3.5.6 Estimate of the Standard Error of the Predicted Response
We now consider the equation
$$\hat{y}(\mathbf{x}) = \mathbf{x}^{(\Omega)\prime}\mathbf{b} \tag{3.73}$$
where b = (X′X)⁻¹X′y, x^(Ω) is a function of the location at which one is predicting the response, and the Ω notation indicates 'model space'; that is, the vector reflects the form of the model in the same way as the rows of X do.
From the results of multiple linear regression, under the assumption of constant error variance σ², we have
$$\operatorname{Var}\,\hat{y}(\mathbf{x}) = \mathbf{x}^{(\Omega)\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^{(\Omega)}\,\sigma^2 \tag{3.74}$$
An estimated standard error of ŷ(x) is given by
$$s_{\hat{y}} = s\sqrt{\mathbf{x}^{(\Omega)\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^{(\Omega)}} \tag{3.75}$$
where s is the root mean square error of the fitted response surface.
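Equation (3.75) is then used to determine confidence limits around a predicted response. The 100(1 − α)% confidence interval on the mean response E(y|x) is given by
$$\hat{y}(\mathbf{x}) \pm t_{\alpha/2,\,n-p}\; s\sqrt{\mathbf{x}^{(\Omega)\prime}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}^{(\Omega)}} \tag{3.76}$$
where n is the number of observations and p is the number of parameters in the fitted model.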
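Equations (3.73) to (3.76) can be computed directly from the model matrix. The sketch below uses a hypothetical straight-line data set so that the result can be checked against the familiar simple-regression formula. The t critical value is passed in (for example from scipy.stats.t.ppf(1 - alpha/2, n - p)) to keep the code NumPy-only, and x_omega is simply our name for x^(Ω).

```python
import numpy as np

def prediction_se_and_ci(X, y, x_omega, t_crit):
    """Predicted mean response, its standard error (3.75) and the interval (3.76).

    X       : n x p model matrix (every model term, including the intercept column)
    y       : n observed responses
    x_omega : prediction point expanded into the same model terms as a row of X
    t_crit  : t_{alpha/2, n-p}, e.g. scipy.stats.t.ppf(1 - alpha / 2, n - p)
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                           # b = (X'X)^{-1} X'y
    resid = y - X @ b
    s = np.sqrt(resid @ resid / (n - p))            # root mean square error
    y_hat = x_omega @ b                             # equation (3.73)
    se = s * np.sqrt(x_omega @ XtX_inv @ x_omega)   # equation (3.75)
    return y_hat, se, (y_hat - t_crit * se, y_hat + t_crit * se)

# Hypothetical straight-line example: model terms 1 and x, n = 5, p = 2.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([1.9, 2.6, 3.1, 3.4, 4.1])
X = np.column_stack([np.ones_like(x), x])
x_omega = np.array([1.0, 0.25])                     # predict at x = 0.25
y_hat, se, ci = prediction_se_and_ci(X, y, x_omega, t_crit=3.182)  # t_{0.025, 3} is about 3.182
print(y_hat, se, ci)
```

For a second order response surface, X would hold the columns of model (3.63) and x_omega the same terms evaluated at the prediction point.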
3.4. SUMMARY
The methodology of this research can be categorised as exploratory. It explores the potential of various nonlinear growth models and of heuristic methods for modelling three types of data, each analysed using different approaches. A summary of the data analysed is given in Table 3.2. This chapter also describes the functions used to measure the goodness of fit of each model tested.
Table 3.2: Summary of the data set types and research approaches considered in this study
Type of data | Research approach
1. Oil palm growth data | 1. Nonlinear growth model
2. Foliar composition | 1. Multiple linear regression; 2. Robust M-regression; 3. Neural network model
3. Fertiliser treatment data | 1. Response surface analysis
CHAPTER 4
MODELLING OIL PALM YIELD USING NONLINEAR GROWTH MODEL
4.1 INTRODUCTION
Predicting the future growth and yield of oil palm is an essential part of the planning process for the oil palm industry. Growth is measured as the change in some characteristic (weight, basal area, volume, etc.) over a specified period of time, and yield is the amount of some characteristic that can be harvested per period. Growth may also be defined as an increase in size, number, frequency and so on, as in the growth of trade, the growth of power or the growth of yield. Growth can also mean something which has grown or is growing: anything produced, a product, a consequence, an effect or a result. It is also defined as the rate of increase in size per unit time (Alder, 1980; Seber and Wild, 1988; Garcia, 1988; 1989; Meade and Islam, 1995; Lei and Zhang, 2004). Generally, the maximum amount of output that oil palm trees can yield at any time is the growth that has accumulated up to that time. Growth and yield can be measured in physical units or in value.
A growth model predicts future values of certain outputs (Garcia, 1993), for example timber volume, crop yield or population growth. Both the inputs and the outputs are functions of time. Alder (1980) stated that growth models attempt to predict directly the course over time of the quantities of interest (volumes, weights, mean diameter, etc.).
Seber and Wild (1988) defined an empirical growth curve as a scatterplot of some measure of the size of an object or individual against time, T.
Growth curves, or growth models, have been widely used in modelling yield, and many studies have investigated yield growth patterns for crops. A detailed review of the application of growth models to crop yield prediction is given in Chapter 2. Zuhaimy et al. (2003) and Azme and Zuhaimy (2004) have also studied tobacco yield in Malaysia. Growth curves have been used to model and forecast development in many other fields, including:
- economic growth (Bass, 1969; Oliver, 1970; Heeler and Hustad, 1980);
- marketing and sales (Chadda and Chitgopekar, 1971; Meade, 1984; Rao, 1985; Meade, 1985; Amstrong et al., 1987; Bewley and Fiebig, 1988; Lee et al., 1992; Meade and Islam, 1995);
- human populations (Meade, 1988);
- transportation (Tanner, 1978).
In this chapter, we present our work on employing growth models to predict oil palm yield at any time T. Growth models and oil palm yield share similar characteristics. Both have an inflection point (where the growth rate is maximum) and a stable or saturated level, and both follow a sigmoid or concave curve shape. It is therefore reasonable to use growth models to model oil palm yield.
The growth model methodology described in Chapter 3 has been widely used to model plant growth. Since the growth of living things is normally nonlinear (Richards, 1969), it is reasonable to explore nonlinear growth models for oil palm yield. In this chapter a nonlinear growth model is developed, and the use of the partial derivatives of twelve nonlinear growth models is proposed. Growth studies in many branches of science have shown that more complex nonlinear functions are justified, and indeed required, when the range of the independent variable covers the juvenile, adolescent, mature and senescent stages of growth (Philip, 1994). Only a few theoretical models have been formulated specifically for oil palm industry applications. The application of growth models in other disciplines has revealed considerable potential for modelling fresh fruit bunch (FFB) growth and oil palm yield. However, the statistical methodology for fitting nonlinear models to oil palm yield growth data, which is closely tied to the mathematics of the models, has not yet been explored.
4.2 THE NONLINEAR MODEL
Consider the nonlinear regression model
$$\varphi_i = f(T_i, \boldsymbol{\beta}) + \varepsilon_i, \qquad i = 1, 2, \ldots, 19 \qquad (4.1)$$
where:
- φ is the response variable;
- T is the independent variable (time);
- β = (β1, β2, …, βk)′ is the vector of parameters to be estimated;
- k is the number of unknown parameters;
- εi is a random error term.

The twelve nonlinear models explored in this study are as follows:
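1. Logistic
$\varphi(T) = \alpha/(1 + \beta e^{-\kappa T}) + \varepsilon$
2. Gompertz
$\varphi(T) = \alpha \exp(-\beta e^{-\kappa T}) + \varepsilon$
3. Von Bertalanffy
$\varphi(T) = [\alpha^{1-\delta} - \beta e^{-\kappa T}]^{1/(1-\delta)} + \varepsilon$
4. Negative exponential
$\varphi(T) = \alpha(1 - e^{-\kappa T}) + \varepsilon$
5. Monomolecular
$\varphi(T) = \alpha(1 - \beta e^{-\kappa T}) + \varepsilon$
6. Log-logistic
$\varphi(T) = \alpha/(1 + \beta \exp(-\kappa \ln T)) + \varepsilon$
7. Richard's
$\varphi(T) = \alpha/(1 + \beta e^{-\kappa T})^{1/\delta} + \varepsilon$
8. Weibull
$\varphi(T) = \alpha - \beta \exp(-\kappa T^{\delta}) + \varepsilon$
9. Schnute
$\varphi(T) = (\alpha + \beta e^{\kappa T})^{\delta} + \varepsilon$
10. Morgan-Mercer-Flodin
$\varphi(T) = \alpha - (\alpha - \beta)/(1 + (\kappa T)^{\delta}) + \varepsilon$, equivalently $(\beta\gamma + \alpha T^{\delta})/(\gamma + T^{\delta}) + \varepsilon$ with $\gamma = \kappa^{-\delta}$
11. Chapman-Richards
$\varphi(T) = \alpha(1 - \beta e^{-\kappa T})^{1/(1-\delta)} + \varepsilon$
12. Stannard
$\varphi(T) = \alpha[1 + \exp(-(\beta + \kappa T)/\delta)]^{-\delta} + \varepsilon$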
Here α, β, κ and δ are unknown parameters, ε is the random error and T is time. Nonlinear models are more difficult to specify and estimate than linear models because the solutions must be found by an iterative procedure (Ratkowsky, 1983; Draper and Smith, 1981), as discussed in Chapter 3.
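The twelve forms above can be written as plain Python functions so that the curves can be evaluated directly. This is a sketch: the function names are our own, and each function returns only the deterministic part of φ(T), without ε. Evaluated at T = 1 with the estimates reported in Tables 4.5 and 4.6, the functions should reproduce the fitted values of Tables 4.8 and 4.9 to about two decimal places.

```python
import math

# Deterministic part of each growth curve, in the (alpha, beta, kappa, delta) notation.


def logistic(T, a, b, k):
    return a / (1 + b * math.exp(-k * T))


def gompertz(T, a, b, k):
    return a * math.exp(-b * math.exp(-k * T))


def von_bertalanffy(T, a, b, k, d):
    return (a ** (1 - d) - b * math.exp(-k * T)) ** (1 / (1 - d))


def negative_exponential(T, a, k):
    return a * (1 - math.exp(-k * T))


def monomolecular(T, a, b, k):
    return a * (1 - b * math.exp(-k * T))


def log_logistic(T, a, b, k):          # requires T > 0 because of ln(T)
    return a / (1 + b * math.exp(-k * math.log(T)))


def richards(T, a, b, k, d):
    return a / (1 + b * math.exp(-k * T)) ** (1 / d)


def weibull(T, a, b, k, d):
    return a - b * math.exp(-k * T ** d)


def schnute(T, a, b, k, d):
    return (a + b * math.exp(k * T)) ** d


def morgan_mercer_flodin(T, a, b, k, d):
    return a - (a - b) / (1 + (k * T) ** d)


def chapman_richards(T, a, b, k, d):
    return a * (1 - b * math.exp(-k * T)) ** (1 / (1 - d))


def stannard(T, a, b, k, d):
    return a * (1 + math.exp(-(b + k * T) / d)) ** (-d)


# Year-1 predictions with the logistic and Gompertz estimates of Table 4.5
print(round(logistic(1, 37.0806, 4.8149, 0.7817), 2),
      round(gompertz(1, 37.1788, 2.2683, 0.6132), 2))
```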
This study explores the use of the partial derivatives of twelve nonlinear growth models and demonstrates the method of parameter estimation on experimental oil palm yield growth data. The nonlinear regression procedure (NLIN) of the Statistical Analysis System (SAS) is used to estimate the models' parameters (SAS, 1992).
4.3 THE METHOD OF ESTIMATION
The estimators of the βj are found by minimizing the error sum of squares (SSerr) function (equation 3.4). For the present data this can be written as
$$SS_{err} = \sum_{i=1}^{19}\left[\varphi_i - f(T_i, \boldsymbol{\beta})\right]^2 \qquad (4.2)$$
under the assumption that the εi are independent and normally distributed with mean zero and common variance σ². Since φi and Ti are fixed observations, SSerr is a function of β only. The least squares estimates of β are the values which, when substituted into equation (4.2), make SSerr a minimum. They are found by differentiating equation (4.2) with respect to each parameter and setting the results to zero. This gives the k normal equations that must be solved for β̂. These normal equations take the form
$$\sum_{i=1}^{19}\left\{\varphi_i - f(T_i, \boldsymbol{\beta})\right\}\left[\frac{\partial f(T_i, \boldsymbol{\beta})}{\partial \beta_j}\right] = 0 \qquad (4.3)$$
for j = 1, 2, ..., k. When the model is nonlinear in the parameters, the normal
equations cannot be solved directly. Consequently, for the nonlinear models
considered in this study, it is impossible to obtain a closed form solution to the least
squares estimate of the parameters by solving the k normal equations described in
equation (4.3). Hence an iterative method must be employed to minimize the SSerr
(Draper and Smith, 1981; Ratkowsky, 1983; Gallant, 1989).
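As a worked instance of equation (4.3), take the logistic model with parameters (α, β, κ) and write $d_i = 1 + \beta e^{-\kappa T_i}$ and $r_i = \varphi_i - \alpha/d_i$. Using the logistic partial derivatives of Table 4.1, the three normal equations are
$$\sum_{i=1}^{19}\frac{r_i}{d_i} = 0, \qquad -\alpha\sum_{i=1}^{19}\frac{r_i\, e^{-\kappa T_i}}{d_i^{2}} = 0, \qquad \alpha\beta\sum_{i=1}^{19}\frac{r_i\, T_i\, e^{-\kappa T_i}}{d_i^{2}} = 0.$$
For fixed β and κ, the first equation is linear in α. It gives $\hat{\alpha}(\beta,\kappa) = \sum_i \varphi_i/d_i \big/ \sum_i d_i^{-2}$. However, β and κ enter the other two equations through $d_i$ and $e^{-\kappa T_i}$, so the system as a whole has no closed-form solution.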
The Marquardt iterative method is a compromise between the Gauss-Newton method and the steepest descent method. It combines the best features of both while avoiding their most serious limitations (Seber and Wild, 1983), and for this reason we chose it. The method requires three things: the names and starting values of the parameters to be estimated, a single dependent variable, and the partial derivatives of the model with respect to each parameter (SAS, 1992). The usual statistical tests that are appropriate for a general linear model are, in general, not appropriate when the model is nonlinear. In particular, one cannot simply use the F statistic to draw conclusions at any stated level of significance (Draper and Smith, 1981; Ratkowsky, 1983).
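The following is a minimal sketch of a Marquardt-type (Levenberg-Marquardt) iteration for the logistic model, written in plain Python rather than SAS. It uses:
- the FFB yields of Table 4.8;
- the analytic partial derivatives of Table 4.1;
- the starting values 27.7, 0.55 and 1.25 quoted in Section 4.4.

The damping schedule (multiply or divide λ by 10), the tolerances and the function names are our assumptions. They do not reproduce PROC NLIN's internal rules, so the iteration count need not match the 20 iterations reported. The estimates should nevertheless end up close to those of Table 4.5.

```python
import math

# FFB yield for years 1-19, transcribed from Table 4.8
T = list(range(1, 20))
Y = [11.78, 18.43, 25.21, 30.78, 33.03, 35.66, 36.96, 37.97, 38.04, 39.20,
     36.50, 37.21, 39.97, 38.45, 33.65, 34.71, 37.75, 32.81, 37.99]


def logistic(t, a, b, k):
    return a / (1 + b * math.exp(-k * t))


def partials(t, a, b, k):
    """Partial derivatives w.r.t. (alpha0, alpha1, alpha2), as in Table 4.1."""
    e = math.exp(-k * t)
    d = 1 + b * e
    return [1 / d, -a * e / d ** 2, a * b * t * e / d ** 2]


def sse(p):
    """Residual sum of squares, equation (4.2); infinite if a trial point overflows."""
    try:
        return sum((y - logistic(t, *p)) ** 2 for t, y in zip(T, Y))
    except (OverflowError, ZeroDivisionError):
        return math.inf


def solve(A, g):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(g)
    M = [list(row) + [gi] for row, gi in zip(A, g)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def marquardt(p, lam=1e-2, max_iter=200, gtol=1e-6):
    """Minimise SSerr from starting values p; returns (estimates, SSerr, iterations)."""
    p = list(p)
    cur = sse(p)
    for it in range(1, max_iter + 1):
        J = [partials(t, *p) for t in T]
        r = [y - logistic(t, *p) for t, y in zip(T, Y)]
        JtJ = [[sum(row[i] * row[j] for row in J) for j in range(3)] for i in range(3)]
        g = [sum(row[i] * ri for row, ri in zip(J, r)) for i in range(3)]
        if max(abs(gi) for gi in g) < gtol:      # normal equations (4.3) satisfied
            return p, cur, it - 1
        while True:
            # Marquardt step: inflate the diagonal of J'J by the factor (1 + lam)
            A = [[JtJ[i][j] * (1 + lam) if i == j else JtJ[i][j] for j in range(3)]
                 for i in range(3)]
            trial = [pi + si for pi, si in zip(p, solve(A, g))]
            new = sse(trial)
            if new < cur:                        # accept: move towards Gauss-Newton
                lam = max(lam / 10, 1e-12)
                break
            lam *= 10                            # reject: move towards steepest descent
            if lam > 1e12:                       # no further decrease is possible
                return p, cur, it
        p, cur = trial, new
    return p, cur, max_iter


estimates, ss, n_iter = marquardt([27.7, 0.55, 1.25])
print(estimates, ss, n_iter)
```

A large λ gives short steps along the (scaled) gradient. A small λ gives the Gauss-Newton step. This is the compromise described above.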
This study uses several procedures to assess the goodness of fit of the nonlinear models: the confidence intervals of the parameter estimates, the asymptotic correlation matrix, and residual analysis including a normal probability plot. We also use four measures common in model-fitting research, as described in Section 1.6.3:
- the mean squared error (MSE);
- the mean absolute error (MAE);
- the correlation coefficient r between the actual and estimated values;
- the mean absolute percentage error (MAPE).
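The four measures can be computed as below. This sketch divides the MSE by n and reports the MAPE as a fraction rather than a percentage. That is what the values in Table 4.8 suggest: for the logistic column it gives MSE ≈ 2.96, MAE ≈ 1.22 and MAPE ≈ 0.03. The data are transcribed from Table 4.8, and the function name is our own.

```python
import math

actual = [11.78, 18.43, 25.21, 30.78, 33.03, 35.66, 36.96, 37.97, 38.04, 39.20,
          36.50, 37.21, 39.97, 38.45, 33.65, 34.71, 37.75, 32.81, 37.99]
logistic_fit = [11.58, 18.46, 25.37, 30.62, 33.81, 35.51, 36.35, 36.74, 36.92, 37.01,
                37.05, 37.07, 37.07, 37.08, 37.08, 37.08, 37.08, 37.08, 37.08]


def goodness_of_fit(y, f):
    """Return (MSE, MAE, MAPE, r); MSE divides by n and MAPE is a fraction."""
    n = len(y)
    e = [yi - fi for yi, fi in zip(y, f)]
    mse = sum(ei * ei for ei in e) / n
    mae = sum(abs(ei) for ei in e) / n
    mape = sum(abs(ei) / yi for ei, yi in zip(e, y)) / n
    my, mf = sum(y) / n, sum(f) / n
    sxy = sum((yi - my) * (fi - mf) for yi, fi in zip(y, f))
    syy = sum((yi - my) ** 2 for yi in y)
    sff = sum((fi - mf) ** 2 for fi in f)
    r = sxy / math.sqrt(syy * sff)               # Pearson correlation, actual vs fitted
    return mse, mae, mape, r


print(goodness_of_fit(actual, logistic_fit))
```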
4.4 PARTIAL DERIVATIVES FOR THE NONLINEAR MODEL
Let the parameters α, β, κ and δ of the nonlinear models be replaced by the new symbols α0, α1, α2 and α3 respectively. For all the models considered here, the parameters are defined as follows:
- α0 is the asymptote, or the potential maximum of the response variable;
- α1 is the biological constant;
- α2 governs the rate at which the response variable approaches its potential maximum;
- α3 is the allometric constant.

The partial derivatives of the models with respect to each parameter (∂φ/∂αj) are given in Tables 4.1 to 4.4. The NLIN procedure in SAS (1992) requires the integral form(s) and the partial derivatives of the nonlinear models to be entered in the program using valid SAS syntax.
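NLIN uses the derivatives exactly as typed, so a transcription error in Tables 4.1 to 4.4 can slow down or derail the iteration. A simple safeguard is to compare the analytic partial derivatives with central finite differences, sketched below in Python (the names are our own choosing). The check covers the von Bertalanffy, Morgan-Mercer-Flodin and Stannard models, whose derivatives are the most involved, at arbitrary illustrative parameter values.

```python
import math


def numeric_grad(f, t, params, h=1e-6):
    """Central-difference approximation to the gradient of f(t, *params)."""
    grad = []
    for j in range(len(params)):
        up, dn = list(params), list(params)
        up[j] += h
        dn[j] -= h
        grad.append((f(t, *up) - f(t, *dn)) / (2 * h))
    return grad


def von_bertalanffy(t, a0, a1, a2, a3):
    return (a0 ** (1 - a3) - a1 * math.exp(-a2 * t)) ** (1 / (1 - a3))


def von_bertalanffy_grad(t, a0, a1, a2, a3):
    e = math.exp(-a2 * t)
    v = a0 ** (1 - a3) - a1 * e
    m = 1 / (1 - a3)
    phi = v ** m
    return [a0 ** (-a3) * v ** (m - 1),
            -e * m * v ** (m - 1),
            a1 * t * e * m * v ** (m - 1),
            phi * m * (m * math.log(v) - a0 ** (1 - a3) * math.log(a0) / v)]


def mmf(t, a0, a1, a2, a3):
    return a0 - (a0 - a1) / (1 + (a2 * t) ** a3)


def mmf_grad(t, a0, a1, a2, a3):
    q = (a2 * t) ** a3
    return [q / (1 + q),
            1 / (1 + q),
            a3 * (a0 - a1) * q / (a2 * (1 + q) ** 2),
            (a0 - a1) * q * math.log(a2 * t) / (1 + q) ** 2]


def stannard(t, a0, a1, a2, a3):
    return a0 * (1 + math.exp(-(a1 + a2 * t) / a3)) ** (-a3)


def stannard_grad(t, a0, a1, a2, a3):
    z = (a1 + a2 * t) / a3
    e = math.exp(-z)
    s = 1 + e
    return [s ** (-a3),
            a0 * e * s ** (-a3 - 1),
            a0 * t * e * s ** (-a3 - 1),
            -a0 * s ** (-a3) * (math.log(s) + z * e / s)]


def check(f, grad, t, params, tol=1e-5):
    """True when every analytic derivative agrees with the numerical one."""
    return all(abs(a - n) <= tol * max(1.0, abs(a))
               for a, n in zip(grad(t, *params), numeric_grad(f, t, params)))


print(check(von_bertalanffy, von_bertalanffy_grad, 3.0, (37.0, 0.5, 0.8, 0.6)),
      check(mmf, mmf_grad, 3.0, (37.2, 11.5, 0.35, 3.4)),
      check(stannard, stannard_grad, 3.0, (37.0, -1.6, 0.57, 0.66)))
```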
Table 4.1: Partial derivatives of the Logistic, Gompertz and von Bertalanffy growth models
Logistic: $\varphi(t) = \alpha_0/(1 + \alpha_1 e^{-\alpha_2 t}) + \varepsilon$
$\partial\varphi/\partial\alpha_0 = 1/(1 + \alpha_1 e^{-\alpha_2 t})$
$\partial\varphi/\partial\alpha_1 = -\alpha_0 e^{-\alpha_2 t}/(1 + \alpha_1 e^{-\alpha_2 t})^2$
$\partial\varphi/\partial\alpha_2 = \alpha_0\alpha_1 t\, e^{-\alpha_2 t}/(1 + \alpha_1 e^{-\alpha_2 t})^2$
Gompertz: $\varphi(t) = \alpha_0 \exp(-\alpha_1 e^{-\alpha_2 t}) + \varepsilon$
$\partial\varphi/\partial\alpha_0 = \exp(-\alpha_1 e^{-\alpha_2 t})$
$\partial\varphi/\partial\alpha_1 = -\alpha_0 \exp(-\alpha_1 e^{-\alpha_2 t})\, e^{-\alpha_2 t}$
$\partial\varphi/\partial\alpha_2 = \alpha_0\alpha_1 t \exp(-\alpha_1 e^{-\alpha_2 t})\, e^{-\alpha_2 t}$
Von Bertalanffy: $\varphi(t) = v^{1/(1-\alpha_3)} + \varepsilon$, where $v = \alpha_0^{1-\alpha_3} - \alpha_1 e^{-\alpha_2 t}$
$\partial\varphi/\partial\alpha_0 = \alpha_0^{-\alpha_3}\, v^{1/(1-\alpha_3) - 1}$
$\partial\varphi/\partial\alpha_1 = -\dfrac{e^{-\alpha_2 t}}{1-\alpha_3}\, v^{1/(1-\alpha_3) - 1}$
$\partial\varphi/\partial\alpha_2 = \dfrac{\alpha_1 t\, e^{-\alpha_2 t}}{1-\alpha_3}\, v^{1/(1-\alpha_3) - 1}$
$\partial\varphi/\partial\alpha_3 = \dfrac{v^{1/(1-\alpha_3)}}{1-\alpha_3}\left\{\dfrac{\ln v}{1-\alpha_3} - \dfrac{\alpha_0^{1-\alpha_3}\ln\alpha_0}{v}\right\}$
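Table 4.2: Partial derivatives of the Negative exponential, Monomolecular, Log-logistic and Richard's growth models
Negative exponential: $\varphi(t) = \alpha_0(1 - e^{-\alpha_2 t}) + \varepsilon$
$\partial\varphi/\partial\alpha_0 = 1 - e^{-\alpha_2 t}$
$\partial\varphi/\partial\alpha_1$: does not exist (the model has no α1)
$\partial\varphi/\partial\alpha_2 = \alpha_0 t\, e^{-\alpha_2 t}$
Monomolecular: $\varphi(t) = \alpha_0(1 - \alpha_1 e^{-\alpha_2 t}) + \varepsilon$
$\partial\varphi/\partial\alpha_0 = 1 - \alpha_1 e^{-\alpha_2 t}$
$\partial\varphi/\partial\alpha_1 = -\alpha_0 e^{-\alpha_2 t}$
$\partial\varphi/\partial\alpha_2 = \alpha_0\alpha_1 t\, e^{-\alpha_2 t}$
Log-logistic: $\varphi(t) = \alpha_0/(1 + \alpha_1 e^{-\alpha_2 \ln t}) + \varepsilon$
$\partial\varphi/\partial\alpha_0 = 1/(1 + \alpha_1 e^{-\alpha_2 \ln t})$
$\partial\varphi/\partial\alpha_1 = -\alpha_0 e^{-\alpha_2 \ln t}/(1 + \alpha_1 e^{-\alpha_2 \ln t})^2$
$\partial\varphi/\partial\alpha_2 = \alpha_0\alpha_1 \ln(t)\, e^{-\alpha_2 \ln t}/(1 + \alpha_1 e^{-\alpha_2 \ln t})^2$
Richard's: $\varphi(t) = \alpha_0/(1 + \alpha_1 e^{-\alpha_2 t})^{1/\alpha_3} + \varepsilon$
$\partial\varphi/\partial\alpha_0 = (1 + \alpha_1 e^{-\alpha_2 t})^{-1/\alpha_3}$
$\partial\varphi/\partial\alpha_1 = -(\alpha_0/\alpha_3)(1 + \alpha_1 e^{-\alpha_2 t})^{-1/\alpha_3 - 1}\, e^{-\alpha_2 t}$
$\partial\varphi/\partial\alpha_2 = (\alpha_0\alpha_1 t/\alpha_3)(1 + \alpha_1 e^{-\alpha_2 t})^{-1/\alpha_3 - 1}\, e^{-\alpha_2 t}$
$\partial\varphi/\partial\alpha_3 = \alpha_0 (1 + \alpha_1 e^{-\alpha_2 t})^{-1/\alpha_3} \ln(1 + \alpha_1 e^{-\alpha_2 t})\, \alpha_3^{-2}$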
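Table 4.3: Partial derivatives of the Weibull, Schnute and Morgan-Mercer-Flodin growth models
Weibull: $\varphi(t) = \alpha_0 - \alpha_1 \exp(-\alpha_2 t^{\alpha_3}) + \varepsilon$
$\partial\varphi/\partial\alpha_0 = 1$
$\partial\varphi/\partial\alpha_1 = -\exp(-\alpha_2 t^{\alpha_3})$
$\partial\varphi/\partial\alpha_2 = \alpha_1 t^{\alpha_3} \exp(-\alpha_2 t^{\alpha_3})$
$\partial\varphi/\partial\alpha_3 = \alpha_1\alpha_2 t^{\alpha_3} \ln(t) \exp(-\alpha_2 t^{\alpha_3})$
Schnute: $\varphi(t) = (\alpha_0 + \alpha_1 e^{\alpha_2 t})^{\alpha_3} + \varepsilon$
$\partial\varphi/\partial\alpha_0 = \alpha_3 (\alpha_0 + \alpha_1 e^{\alpha_2 t})^{\alpha_3 - 1}$
$\partial\varphi/\partial\alpha_1 = \alpha_3 e^{\alpha_2 t} (\alpha_0 + \alpha_1 e^{\alpha_2 t})^{\alpha_3 - 1}$
$\partial\varphi/\partial\alpha_2 = \alpha_1\alpha_3 t\, e^{\alpha_2 t} (\alpha_0 + \alpha_1 e^{\alpha_2 t})^{\alpha_3 - 1}$
$\partial\varphi/\partial\alpha_3 = (\alpha_0 + \alpha_1 e^{\alpha_2 t})^{\alpha_3} \ln(\alpha_0 + \alpha_1 e^{\alpha_2 t})$
Morgan-Mercer-Flodin: $\varphi(t) = \alpha_0 - (\alpha_0 - \alpha_1)/(1 + (\alpha_2 t)^{\alpha_3}) + \varepsilon$
$\partial\varphi/\partial\alpha_0 = 1 - (1 + (\alpha_2 t)^{\alpha_3})^{-1}$
$\partial\varphi/\partial\alpha_1 = (1 + (\alpha_2 t)^{\alpha_3})^{-1}$
$\partial\varphi/\partial\alpha_2 = \alpha_3(\alpha_0 - \alpha_1)(\alpha_2 t)^{\alpha_3} / [\alpha_2 (1 + (\alpha_2 t)^{\alpha_3})^2]$
$\partial\varphi/\partial\alpha_3 = (\alpha_0 - \alpha_1) \ln(\alpha_2 t)\,(\alpha_2 t)^{\alpha_3} / (1 + (\alpha_2 t)^{\alpha_3})^2$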
Table 4.4: Partial derivatives of the Chapman-Richard and Stannard growth models
Chapman-Richard: $\varphi(t) = \alpha_0(1 - \alpha_1 e^{-\alpha_2 t})^{1/(1-\alpha_3)} + \varepsilon$
$\partial\varphi/\partial\alpha_0 = (1 - \alpha_1 e^{-\alpha_2 t})^{1/(1-\alpha_3)}$
$\partial\varphi/\partial\alpha_1 = -\dfrac{\alpha_0}{1-\alpha_3}(1 - \alpha_1 e^{-\alpha_2 t})^{1/(1-\alpha_3) - 1}\, e^{-\alpha_2 t}$
$\partial\varphi/\partial\alpha_2 = \dfrac{\alpha_0\alpha_1 t}{1-\alpha_3}(1 - \alpha_1 e^{-\alpha_2 t})^{1/(1-\alpha_3) - 1}\, e^{-\alpha_2 t}$
$\partial\varphi/\partial\alpha_3 = \dfrac{\alpha_0}{(1-\alpha_3)^2}(1 - \alpha_1 e^{-\alpha_2 t})^{1/(1-\alpha_3)} \ln(1 - \alpha_1 e^{-\alpha_2 t})$
Stannard: $\varphi(t) = \alpha_0 s^{-\alpha_3} + \varepsilon$, where $z = (\alpha_1 + \alpha_2 t)/\alpha_3$ and $s = 1 + e^{-z}$
$\partial\varphi/\partial\alpha_0 = s^{-\alpha_3}$
$\partial\varphi/\partial\alpha_1 = \alpha_0 e^{-z} s^{-\alpha_3 - 1}$
$\partial\varphi/\partial\alpha_2 = \alpha_0 t\, e^{-z} s^{-\alpha_3 - 1}$
$\partial\varphi/\partial\alpha_3 = -\alpha_0 s^{-\alpha_3}\left[\ln s + z e^{-z}/s\right]$
The Marquardt algorithm requires a starting value for each parameter to be estimated. Specifying starting values is one of the most difficult problems in estimating the parameters of nonlinear models (Draper and Smith, 1981). Inappropriate starting values lead to more iterations, longer execution time, non-convergence, and possibly convergence to an unwanted local minimum of the residual sum of squares. The simplest parameter to specify is α0, because its definition is clear: α0 is the maximum possible value of the dependent variable, determined by the productive capacity of the experimental site.
In our case, therefore, α0 was specified as the maximum value of the response variable in the data. The α2 parameter is the constant rate at which the response variable approaches its maximum possible value α0. For a biological growth variable, the allometric constant α3 lies between zero and one for the Chapman-Richards growth model. It is also positive for the von Bertalanffy, Richard's, Weibull and Morgan-Mercer-Flodin growth models. Finally, α1 can be specified by evaluating the models at the start of growth, when the predictor variable is zero.
The computation of the initial parameter estimates for the logistic growth model is given below. Consider the logistic growth model
$$\varphi = \alpha/(1 + \beta e^{-\kappa T}) + \varepsilon \qquad (4.4)$$
Take natural logarithms of the deterministic part of equation (4.4) and set η = ln φ and τ = ln α. We obtain
$$\eta = \tau - \ln(1 + \beta e^{-\kappa T}) \qquad (4.5)$$
Nonlinear estimation procedures require initial parameter estimates, and the better these estimates are, the faster the iteration converges to the fitted values. Experience with growth models shows that poor initial estimates can easily lead to convergence to the wrong final values. There is no general method for obtaining initial estimates; one uses whatever information is available. For example, for the logistic model (equation (4.4)) we can argue as follows.
(i) When T = ∞, η = τ. So take τ0 = ymax.
(ii) For any two other observations, say the ith and the jth, set
$$y_i = \tau_0 - \ln(1 + \beta_0 e^{-\kappa_0 T_i}) \quad\text{and}\quad y_j = \tau_0 - \ln(1 + \beta_0 e^{-\kappa_0 T_j}),$$
acting as though equation (4.5) held without error for these observations. Rearranging gives
$$\exp(\tau_0 - y_i) - 1 = \beta_0 \exp(-\kappa_0 T_i) \quad\text{and}\quad \exp(\tau_0 - y_j) - 1 = \beta_0 \exp(-\kappa_0 T_j).$$
Dividing one by the other, taking natural logarithms and rearranging, we obtain
$$\kappa_0 = \frac{1}{T_j - T_i}\ln\left[\frac{\exp(\tau_0 - y_i) - 1}{\exp(\tau_0 - y_j) - 1}\right].$$
In general, i and j should be widely spaced rather than close together, to give stable estimates.
(iii) From the ith equation above we can evaluate
$$\beta_0 = \exp(\kappa_0 T_i)\left[\exp(\tau_0 - y_i) - 1\right].$$
(iv) Substituting τ0 = ymax in the two foregoing equations gives values for κ0 and β0.
Initial estimates for fitting the model in equation (4.4) on the original scale can be obtained in the same way, in the order α0, κ0, β0, from the equations below:
$$\alpha_0 = w_{\max}, \qquad \kappa_0 = \frac{1}{T_i - T_j}\ln\left[\frac{(\alpha_0 - w_j)/w_j}{(\alpha_0 - w_i)/w_i}\right], \qquad \beta_0 = \left[\frac{\alpha_0 - w_i}{w_i}\right]\exp(\kappa_0 T_i),$$
where yi = ln(wi) and the wi are the growth observations. The assumption that the wi have constant variance is usually a sensible one for growing yield.

In fitting the logistic growth model to the oil palm yield growth data, we set α0 = 27.7, β0 = 0.55 and κ0 = 1.25 as the initial values. The convergence criterion was met at the 20th iteration, giving the estimates α̂ = 37.0806, β̂ = 4.8148 and κ̂ = 0.7817. The same steps, (i) to (iv), were applied to the other growth models to determine their initial values.
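Steps (i) to (iv) translate directly into code. In the Python sketch below (names are our own), the quantity exp(τ0 − yi) − 1 is computed on the original scale as α0/wi − 1, which is the same number. Years 1 and 5 are an arbitrary illustrative pair, not the pair used in the study, so the resulting starting values differ from the 27.7, 0.55 and 1.25 quoted above. By construction, the starting curve passes exactly through the two chosen observations.

```python
import math

# FFB yield for years 1-19, transcribed from Table 4.8
T = list(range(1, 20))
W = [11.78, 18.43, 25.21, 30.78, 33.03, 35.66, 36.96, 37.97, 38.04, 39.20,
     36.50, 37.21, 39.97, 38.45, 33.65, 34.71, 37.75, 32.81, 37.99]


def logistic(t, a, b, k):
    return a / (1 + b * math.exp(-k * t))


def logistic_start(T, w, i, j):
    """Starting values (alpha0, beta0, kappa0) from steps (i)-(iv).

    i and j are 0-based indices of two widely spaced observations; neither may
    be the maximum, otherwise a logarithm of zero arises.
    """
    a0 = max(w)                               # step (i): alpha0 = w_max
    ri = a0 / w[i] - 1                        # exp(tau0 - y_i) - 1
    rj = a0 / w[j] - 1                        # exp(tau0 - y_j) - 1
    k0 = math.log(ri / rj) / (T[j] - T[i])    # step (ii)
    b0 = ri * math.exp(k0 * T[i])             # step (iii)
    return a0, b0, k0


a0, b0, k0 = logistic_start(T, W, 0, 4)       # years 1 and 5 (illustrative choice)
print(a0, round(b0, 3), round(k0, 3))
```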
4.5 RESULTS AND DISCUSSION
The statistical significance of the parameters of the nonlinear models was determined from the 95% asymptotic confidence intervals of the estimated parameters. The null hypothesis H0: αj = 0 was rejected when the 95% asymptotic confidence interval of αj did not include zero. The 95% asymptotic confidence intervals for each growth model are given in the last two columns of Table 4.5 and Table 4.6.
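The significance rule amounts to the check below. The sketch assumes each interval is estimate ± t(0.025, n − p) × asymptotic standard error, with n = 19. On that assumption it reproduces the intervals in Table 4.5 to about three decimal places. The critical t values are hard-coded from standard tables, and the function names are our own.

```python
# Two-sided 95% critical values t_{0.025, n-p} for n = 19 (standard t tables)
T_CRIT = {15: 2.1314, 16: 2.1199, 17: 2.1098}


def asymptotic_ci(estimate, se, n_params, n_obs=19):
    """95% asymptotic confidence interval for one parameter."""
    t = T_CRIT[n_obs - n_params]
    return estimate - t * se, estimate + t * se


def is_significant(estimate, se, n_params, n_obs=19):
    """Reject H0: alpha_j = 0 when the 95% interval excludes zero."""
    lower, upper = asymptotic_ci(estimate, se, n_params, n_obs)
    return not (lower <= 0.0 <= upper)


print(asymptotic_ci(37.0806, 0.5327, 3))      # logistic alpha0
print(is_significant(11.0433, 33.6452, 4))    # Richard's alpha1
```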
Table 4.5: Parameter estimates of the logistic, Gompertz, negative exponential, monomolecular, log-logistic, Richard's and Weibull growth models for the yield-age relationship
Model                 Parameter   Estimate    Asymptotic    95% asymptotic CI
                                              std. error    Lower       Upper
Logistic              α0          37.0806     0.5327        35.9514     38.2098
                      α1           4.8149     1.3115         2.0345      7.5952
                      α2           0.7817     0.1087         0.5511      1.0122
Gompertz              α0          37.1788     0.5701        35.9703     38.3874
                      α1           2.2683     0.4265         1.3642      3.1724
                      α2           0.6132     0.0854         0.4321      0.7943
Negative exponential  α0          37.5017     0.6643        36.1001     38.9033
                      α2           0.4046     0.0362         0.3282      0.4811
Monomolecular         α0          37.3235     0.6565        35.9317     38.7151
                      α1           1.1408     0.1367         0.8511      1.4305
                      α2           0.4592     0.0689         0.3130      0.6055
Log-logistic          α0          38.1172     1.0667        35.8559     40.3785
                      α1           3.1947     0.8678         1.3549      5.0344
                      α2           1.8874     0.3245         1.1995      2.5754
Richard's             α0          37.0418     0.5698        37.0418     38.2564
                      α1          11.0433    33.6452       -60.6695     82.7561
                      α2           0.8729     0.4059         0.0076      1.7383
                      α3           1.5205     2.0391        -2.8257      5.8667
Weibull               α0          37.3234     0.8887        35.4291     39.2178
                      α1          -5.2452     8.9982       -24.4245     13.9339
                      α2           0.3415     0.0906         0.1483      0.5347
                      α3           1.3442     0.0014         1.3411      1.3472
The least squares estimates of the parameters of the nonlinear models for the oil palm yield-age relationship are given in Table 4.5 and Table 4.6. The parameter estimates of the logistic, Gompertz, negative exponential, monomolecular, log-logistic and Morgan-Mercer-Flodin growth functions are all statistically significant at the 5% level. The estimates of α1 and α3 for the von Bertalanffy, Richard's and Chapman-Richard growth models are not statistically significant at the 5% level, because their confidence intervals include zero.
For the Weibull growth model, all parameter estimates except α1 are statistically significant at the 5% level. For the Stannard growth model, the confidence intervals of α2 and α3 include zero (Table 4.6). The convergence criterion was met with the Marquardt iteration procedure for every growth model, but the number of iterations varied. The minimum was 8 iterations, for the negative exponential growth model. The maximum was 43 iterations, for the Chapman-Richard growth model with the initial stage (Table 4.11; Appendix F).
Table 4.6: Parameter estimates of the MMF, von Bertalanffy, Chapman-Richard and Stannard growth models for the yield-age relationship
Model                 Parameter   Estimate    Asymptotic    95% asymptotic CI
                                              std. error    Lower       Upper
Morgan-Mercer-Flodin  α0          37.2032     0.6724        35.7700     38.6365
                      α1          11.5236     2.5198         6.1525     16.8943
                      α2           0.3534     0.0355         0.2776      0.4292
                      α3           3.4347     0.8877         1.5425      5.3270
Von Bertalanffy       α0          37.0416     0.5698        35.8270     38.2562
                      α1          -0.0455     0.1979        -0.4673      0.3763
                      α2           0.8731     0.4058         0.0080      1.7382
                      α3           2.5203     2.0388        -1.8254      6.8661
Chapman-Richard       α0          35.8502     0.8162        34.1106     37.5898
(without initial      α1           0.4927     2.3322        -4.4783      5.4637
stage)                α2           0.4488     0.1449         0.1397      0.7579
                      α3           0.6155     2.8893        -5.5430      6.7740
Stannard              α0          37.0415     0.56598       35.8269     38.2561
                      α1          -1.5799     0.2544        -2.1222     -1.0376
                      α2           0.5743     0.5236        -0.5417      1.6904
                      α3           0.6577     0.8825        -1.2232      2.5388
Table 4.7: Asymptotic correlation for each nonlinear growth model fitted
Model
Asymptotic correlation
Logistic
(α0, α1) = -0.1743; (α0, α2) = -0.3631; (α1, α2) = 0.8863
Gompertz
(α0, α1) = -0.2324; (α0, α2) = -0.4398; (α1, α2) = 0.8675
Von Bertalanffy
(α0, α1) = -0.2911; (α0, α2) = -0.3891; (α0, α3) = -0.3073;
(α1, α2) = 0.9248; (α1, α3) = 0.9970; (α2, α3) = 0.9496.
Negative exponential
(α0, α2) = -0.5911
Monomolecular
(α0, α1) = -0.3532; (α0, α2) = -0.5552; (α1, α2) = 0.8536
Log-logistic
(α0, α1) = -0.3162; (α0, α2) = -0.7245; (α1, α2) = 0.7799
Richard’s
(α0, α1) = -0.3212; (α0, α2) = -0.3892; (α0, α3) = -0.3072;
(α1, α2) = 0.9752; (α1, α3) = 0.9937; (α2, α3) = 0.9495.
Weibull
(α0, α1) = -0.3766; (α0, α2) = 0.2781; (α0, α3) = -0.9999;
(α1, α2) = -0.9475; (α1, α3) = 0.3763; (α2, α3) = -0.2778.
Morgan-Mercer-Flodin
(α0, α1) = -0.2426; (α0, α2) = -0.0015; (α0, α3) = -0.5212;
(α1, α2) = -0.7558; (α1, α3) = 0.6085; (α2, α3) = -0.5218.
Chapman-Richard
(α0, α1) = -0.7459; (α0, α2) = 0.4445; (α0, α3) = -0.6289;
(α1, α2) = -0.6471; (α1, α3) = 0.9104; (α2, α3) = -0.2844.
Stannard
(α0, α1) = -0.0364; (α0, α2) = 0.2542; (α0, α3) = 0.3077;
(α1, α2) = -0.6204; (α1, α3) = -0.5031; (α2, α3) = 0.9871.
Asymptotic correlations measure the correlation among the parameter estimates. In a well-specified growth model the parameter estimates should not be highly correlated (Draper and Smith, 1981; Ratkowsky, 1983). If an asymptotic correlation is very high (close to +1 or −1), the model is not suitable for modelling the growth data. The asymptotic correlations were obtained after the iteration had converged, and Table 4.7 presents them. All asymptotic correlation coefficients are relatively small, except for the following models:
- von Bertalanffy: {(α1, α2) = 0.9248; (α1, α3) = 0.9970; (α2, α3) = 0.9496};
- Richard's: {(α1, α2) = 0.9752; (α1, α3) = 0.9937; (α2, α3) = 0.9495};
- Weibull: {(α0, α3) = −0.9999; (α1, α2) = −0.9475};
- Stannard: {(α2, α3) = 0.9871}.
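The asymptotic covariance matrix of the estimates is approximately s²(J′J)⁻¹. Here J is the matrix of partial derivatives (Table 4.1) evaluated at the converged estimates, and s² = SSerr/(n − p). The asymptotic correlations come from scaling this matrix to a unit diagonal. The Python sketch below applies this to the logistic fit, using the FFB data of Table 4.8 and the estimates of Table 4.5. Because the published estimates are rounded, its results should be close to, though not necessarily identical with, the standard errors in Table 4.5 and the correlations in Table 4.7. The helper names are our own.

```python
import math

# FFB yield for years 1-19 (Table 4.8) and the logistic estimates (Table 4.5)
T = list(range(1, 20))
Y = [11.78, 18.43, 25.21, 30.78, 33.03, 35.66, 36.96, 37.97, 38.04, 39.20,
     36.50, 37.21, 39.97, 38.45, 33.65, 34.71, 37.75, 32.81, 37.99]
a, b, k = 37.0806, 4.8149, 0.7817


def logistic(t):
    return a / (1 + b * math.exp(-k * t))


def partials(t):
    e = math.exp(-k * t)
    d = 1 + b * e
    return [1 / d, -a * e / d ** 2, a * b * t * e / d ** 2]


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


p = 3
J = [partials(t) for t in T]
JtJ = [[sum(row[i] * row[j] for row in J) for j in range(p)] for i in range(p)]
# Columns of (J'J)^-1; the matrix is symmetric, so inv[i][j] is its (i, j) element
inv = [solve(JtJ, [1.0 if r == c else 0.0 for r in range(p)]) for c in range(p)]
s2 = sum((y - logistic(t)) ** 2 for t, y in zip(T, Y)) / (len(T) - p)
se = [math.sqrt(s2 * inv[i][i]) for i in range(p)]
corr = [[inv[i][j] / math.sqrt(inv[i][i] * inv[j][j]) for j in range(p)]
        for i in range(p)]
print("standard errors:", [round(v, 4) for v in se])
for row in corr:
    print([round(v, 4) for v in row])
```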
When nonlinear models are fitted to a biological growth data set, a lack of statistical significance in the estimated parameters might imply one of the following:
(i) One or more parameters in the model may not be useful; more precisely, a reparameterised model with fewer parameters might be more appropriate;
(ii) The biological growth data used to fit the model are not adequate for estimating all the parameters; or
(iii) The model assumptions do not conform to the modelled biological system.
Table 4.8: The actual and predicted values of FFB yield, the associated measurement error and the correlation coefficient between the actual and predicted values for the Logistic, Gompertz, von Bertalanffy, negative exponential, monomolecular and log-logistic growth models
Year  FFB    Logistic  Gompertz  Von          Negative     Mono-      Log-
      yield                      Bertalanffy  exponential  molecular  logistic
1     11.78  11.58     10.88     11.91        12.48        10.42      9.09
2     18.43  18.46     19.11     18.28        20.81        20.33      20.46
3     25.21  25.37     25.93     25.12        26.36        26.59      27.19
4     30.78  30.62     30.59     30.61        30.07        30.54      30.90
5     33.03  33.81     33.45     33.98        32.54        33.04      33.05
6     35.66  35.51     35.11     35.68        34.19        34.62      34.38
7     36.96  36.35     36.04     36.46        35.29        35.61      35.26
8     37.97  36.74     36.56     36.79        36.03        36.24      35.86
9     38.04  36.92     36.84     36.94        36.52        36.64      36.28
10    39.20  37.01     37.00     37.00        36.85        36.89      36.60
11    36.50  37.05     37.08     37.02        37.06        37.05      36.84
12    37.21  37.07     37.13     37.03        37.21        37.15      37.03
13    39.97  37.07     37.15     37.04        37.31        37.21      37.18
14    38.45  37.08     37.16     37.04        37.37        37.25      37.30
15    33.65  37.08     37.17     37.04        37.42        37.28      37.40
16    34.71  37.08     37.17     37.04        37.44        37.30      37.48
17    37.75  37.08     37.18     37.04        37.46        37.31      37.55
18    32.81  37.08     37.18     37.04        37.48        37.31      37.60
19    37.99  37.08     37.18     37.04        37.48        37.32      37.65
MSE          2.96      3.15      2.94         4.06         3.72       4.64
MAE          1.22      1.35      1.22         1.61         1.53       1.72
MAPE         0.03      0.04      0.03         0.05         0.05       0.06
r            0.97      0.97      0.97         0.96         0.96       0.96
Table 4.9: The actual and predicted values of FFB yield, the associated measurement error and the correlation coefficient between the actual and predicted values for the Richard's, Weibull, MMF, Chapman-Richard, Chapman-Richard* (with the initial data point) and Stannard growth models
Year  FFB    Richard's  Weibull  MMF     Chapman-  Chapman-   Stannard
      yield                              Richard   Richard*
0     0      -          -        -       -         0.92       -
1     11.78  11.91      10.43    12.23   13.43     10.05      11.91
2     18.43  18.28      20.33    17.51   20.01     19.51      18.28
3     25.21  25.12      26.59    25.65   25.09     26.33      25.12
4     30.78  30.61      30.54    31.21   28.71     30.71      30.61
5     33.03  33.98      33.04    34.02   31.18     33.38      33.98
6     35.66  35.68      34.61    35.40   32.82     34.97      35.68
7     36.96  36.46      35.61    36.11   33.90     35.91      36.46
8     37.97  36.79      36.24    36.50   34.60     36.45      36.79
9     38.04  36.94      36.64    36.73   35.05     36.77      36.94
10    39.20  37.00      36.89    36.87   35.34     36.95      37.00
11    36.50  37.02      37.05    36.96   35.52     37.06      37.02
12    37.21  37.03      37.15    37.02   35.64     37.12      37.03
13    39.97  37.04      37.21    37.07   35.72     37.16      37.04
14    38.45  37.04      37.25    37.10   35.76     37.18      37.04
15    33.65  37.04      37.28    37.12   35.80     37.19      37.04
16    34.71  37.04      37.30    37.14   35.82     37.19      37.04
17    37.75  37.04      37.31    37.15   35.83     37.20      37.04
18    32.81  37.04      37.31    37.16   35.84     37.20      37.04
19    37.99  37.04      37.32    37.17   35.84     37.20      37.04
MSE          2.94       3.72     3.20    6.19      3.41       2.94
MAE          1.22       1.53     1.37    2.27      1.45       1.22
MAPE         0.03       0.05     0.04    0.07      0.05       0.03
r            0.97       0.96     0.97    0.96      0.97       0.97
Figure 4.1: Residual plots (residual against year) for the Logistic, Gompertz, von Bertalanffy, Negative exponential, Monomolecular and Log-logistic growth models
Figure 4.2: Residual plots (residual against year) for the Richard's, Weibull, Morgan-Mercer-Flodin, Chapman-Richard, Chapman-Richard* and Stannard growth models
The argument in (ii) also applies to the von Bertalanffy and Chapman-Richard growth models. Investigation of the differential forms and second derivatives of these two models indicated that they are suitable for modelling a system that covers the entire life cycle of the biological response variable, i.e. the juvenile, adolescent, mature and senescent stages.
However, the FFB yield growth measurements considered in this study lack data on the juvenile stages of growth. As a result, two of the parameters in each of these two models were rendered insignificant. To support this argument, we added an initial data point (age = 0, FFB = 0) to the data and refitted the von Bertalanffy and Chapman-Richard models. Table 4.10 shows the parameter estimates, asymptotic standard errors and asymptotic 95% confidence intervals for each parameter of these two models. The predicted FFB value at the initial age from the Chapman-Richard model (with the initial data point) is 0.92 tonnes per hectare per year.
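At age zero the Chapman-Richards curve reduces to α0(1 − α1)^{1/(1−α3)}, so an observation at T = 0 bears directly on α1 and α3. The short Python sketch below (the function name is our own) evaluates the model with the Table 4.10 estimates. It recovers the 0.92 quoted above and the first values of the Chapman-Richard* column in Table 4.9.

```python
import math


def chapman_richards(t, a0, a1, a2, a3):
    """phi(t) = a0 * (1 - a1 exp(-a2 t))^(1/(1 - a3)), as in Table 4.4."""
    return a0 * (1 - a1 * math.exp(-a2 * t)) ** (1 / (1 - a3))


with_initial = (37.2036, 0.8530, 0.5498, 0.4822)   # Table 4.10, Chapman-Richard
for year in range(4):
    print(year, round(chapman_richards(year, *with_initial), 2))
```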
Two of the parameters (α1 and α3) are statistically insignificant, judged by the asymptotic confidence intervals of the estimates, when the initial data point is not included (Table 4.6). When the initial data point is included, only the Chapman-Richard growth model gives statistically significant estimates of all four parameters (Table 4.10). The von Bertalanffy growth model shows no improvement. Adding further data points from the early stage of growth would therefore be expected to improve the estimates of the Chapman-Richard parameters further.
Table 4.9 also shows that, with the initial data point, the MAPE was reduced from 0.07 to 0.05. This illustrates that the significance of the parameters of the Chapman-Richard growth model depends on the range of the growth data.
Table 4.10: The parameter estimates and asymptotic correlations for the von Bertalanffy and Chapman-Richard models when an initial growth response data point is added
Model              Parameter  Estimate   Asymptotic   95% asymptotic CI
                                         std. error   Lower       Upper
Von Bertalanffy    α0         37.2017    0.6140       39.9001     38.5033
                   α1          5.5385    9.5941      -14.7999     25.8769
                   α2          0.5498    0.1238        0.2873      0.8124
                   α3          0.4826    0.3959       -0.3566      1.3218
Chapman-Richard    α0         37.2036    0.6256       35.8773     38.5298
                   α1          0.8530    0.3581        0.0937      1.6122
                   α2          0.5498    0.1265        0.2816      0.8181
                   α3          0.4822    0.1018        0.3695      1.3340
Model              Asymptotic correlation
Von Bertalanffy    (α0, α1) = 0.1127; (α0, α2) = -0.4229; (α0, α3) = -0.2532;
                   (α1, α2) = -0.8174; (α1, α3) = -0.9887; (α2, α3) = 0.8428.
Chapman-Richard    (α0, α1) = 0.2264; (α0, α2) = -0.4879; (α0, α3) = -0.2988;
                   (α1, α2) = -0.6940; (α1, α3) = -0.9571; (α2, α3) = 0.8516.
This study sets out the statistical requirements for estimating the parameters of nonlinear growth models, the statistical tests for the parameter estimates, and the interpretation of the relevant statistical output from the perspective of oil palm. The NLIN procedure in SAS does not guarantee that the iteration converges to the global minimum of the residual sum of squares (SAS, 1992). One way to avoid non-convergence, or convergence to an unwanted local minimum, is to specify several values for each parameter. The NLIN procedure then evaluates the residual sum of squares at each combination of values and uses the best combination as the starting point for the iteration. Initial values may be intelligent guesses or preliminary estimates based on the available information, for example values suggested by fitting a similar equation at a different site. Based on meaningful biological definitions of the model parameters, we developed expressions for the initial values of the asymptote and the biological constant. These expressions proved useful for specifying initial parameter values when modelling the sample FFB yield data used in this study.
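The grid-search idea can be sketched as follows for the logistic model:
1. Evaluate the residual sum of squares at every combination of candidate values.
2. Start the iteration from the best combination.

The candidate values below are hypothetical, chosen only for illustration. The data are the FFB yields of Table 4.8, and the function names are our own.

```python
import itertools
import math

T = list(range(1, 20))
Y = [11.78, 18.43, 25.21, 30.78, 33.03, 35.66, 36.96, 37.97, 38.04, 39.20,
     36.50, 37.21, 39.97, 38.45, 33.65, 34.71, 37.75, 32.81, 37.99]


def logistic(t, a, b, k):
    return a / (1 + b * math.exp(-k * t))


def sse(a, b, k):
    """Residual sum of squares of the logistic model at (alpha, beta, kappa)."""
    return sum((y - logistic(t, a, b, k)) ** 2 for t, y in zip(T, Y))


# Hypothetical candidate values for alpha, beta and kappa
grid = list(itertools.product([30.0, 35.0, 40.0], [1.0, 3.0, 5.0], [0.5, 0.75, 1.0]))
best = min(grid, key=lambda p: sse(*p))
print("best starting values:", best, "SSerr:", round(sse(*best), 2))
```

The best grid point would then be passed to the iterative fit, for example as the starting values of a Marquardt iteration.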
4.6 CONCLUSION
Some of the models, such as the negative exponential and monomolecular models, have no point of inflection: the second derivative does not change sign for any T, and the curve climbs steadily at a decreasing rate. These models are therefore not sigmoid in shape (Draper and Smith, 1981). Hence they are not appropriate for modelling the entire life-cycle range of biological response variables, such as oil palm yield growth, that follow a sigmoid pattern over time (reason (iii) in the previous section). This study found that the Gompertz, logistic, log-logistic, Morgan-Mercer-Flodin and Chapman-Richard growth models are suitable for quantifying a growth phenomenon that follows a sigmoid pattern over time (Draper and Smith, 1981; Ratkowsky, 1983).
Based on the statistical tests and the root mean squared error (Table 4.11), the models rank as follows:
1. logistic;
2. Gompertz;
3. Morgan-Mercer-Flodin;
4. Chapman-Richard (with initial stage);
5. log-logistic.

The von Bertalanffy, Richard's, Weibull and Stannard growth models, however, did not give statistically significant estimates for all their parameters when fitted to the oil palm yield growth data.
Table 4.11: The number of iterations and the root mean squared error for the nonlinear growth models considered in this study
Model                               Number of iterations   Root mean squared error
Logistic                            20                     1.7204
Gompertz                            22                     1.7748
Von Bertalanffy                     36                     1.7146
Negative exponential                 8                     2.0149
Monomolecular                       26                     1.9287
Log-logistic                        22                     2.1541
Richard's                           26                     1.7146
Weibull                             18                     1.9287
Morgan-Mercer-Flodin                21                     1.7888
Chapman-Richards                    34                     2.4879
Chapman-Richards (initial stage)    43                     1.8466
Stannard                            42                     1.7146
CHAPTER 5
MODELLING OIL PALM YIELD USING MULTIPLE LINEAR
REGRESSION AND ROBUST M-REGRESSION
5.1 INTRODUCTION
Multiple linear regression is widely used in research to determine linear relationships between factors. Earlier researchers believed that foliar nutrient composition is significantly correlated with oil palm yield, but the data have not been analysed in detail using the proposed method. We use multiple linear regression as one way to understand the relationship between foliar nutrient composition and oil palm yield. In this study, we also propose the nutrient balance ratio (NBR) of the foliar nutrient composition in the leaves as an independent variable. This modelling approach serves as a preliminary study that will enhance understanding of the issues and motivate further research on modelling oil palm yield.
5.2 MODEL DEVELOPMENT
Suppose that the data consist of n observations on a dependent (endogenous)
variable y and five independent (exogenous) variables N, P, K, Ca and Mg. The
observations are usually represented as follows:
Obs. No.   y     N     P     K     Ca     Mg
1          y1    N11   P21   K31   Ca41   Mg51
2          y2    N12   P22   K32   Ca42   Mg52
3          y3    N13   P23   K33   Ca43   Mg53
:          :     :     :     :     :      :
n          yn    N1n   P2n   K3n   Ca4n   Mg5n
The relationship between the dependent and independent variables is
formulated as the linear model

$$ y_i = \theta_0 + \theta_1 N_{1i} + \theta_2 P_{2i} + \theta_3 K_{3i} + \theta_4 Ca_{4i} + \theta_5 Mg_{5i} + \varepsilon_i \qquad (5.1) $$
where θ0, θ1, θ2, θ3, θ4, θ5 are regression coefficients and εi is the random
disturbance. It is assumed that, for any set of fixed values of the independent
variables within the range of the data, the linear equation (5.1) provides an
acceptable approximation of the true relationship between the dependent and
independent variables. The θ's are estimated by minimizing the sum of squared errors
as in equations (3.11) and (3.12), and the estimates are then obtained using equation
(3.13). The standard regression diagnostics described in Chapter 3, namely confidence
intervals for the parameter estimates, normal probability plots and error distribution
plots, were then carried out.
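As a minimal sketch of the least-squares step, assuming that equation (3.13) is the usual normal-equation solution θ̂ = (XᵀX)⁻¹Xᵀy, the fit of equation (5.1) can be written in Python with NumPy. The function name fit_ols and the synthetic data are illustrative only; this is not the software used in the study.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimates for y = theta0 + theta1*x1 + ... + thetap*xp + error.

    X : (n, p) array of predictors, e.g. columns N, P, K, Ca, Mg.
    y : (n,) array of responses, e.g. FFB yield.
    Returns the estimates (theta0, ..., thetap) and the residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that theta0 acts as the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    # lstsq minimises the sum of squared errors; for a full-rank design it gives
    # the same answer as (X'X)^{-1} X'y without forming the inverse explicitly.
    theta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ theta_hat
    return theta_hat, residuals

# Illustrative usage on synthetic (not thesis) data with known coefficients.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.1, 3.0, size=(50, 5))
theta_true = np.array([1.0, 2.0, -3.0, 0.5, 4.0, -1.0])
y_demo = theta_true[0] + X_demo @ theta_true[1:]
theta_demo, res_demo = fit_ols(X_demo, y_demo)
```

Because the synthetic responses contain no noise, the estimates reproduce theta_true and the residuals are numerically zero.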
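In Chapter 1, we discussed the importance of the nutrient balance ratio (NBR)
in the foliar composition for FFB production. We therefore propose to use the NBR,
the critical leaf phosphorus concentration (CLP), K deficiency, Mg deficiency and the
total leaf bases (TLB) as additional independent variables for estimating FFB
production by multiple regression analysis.
We then consider the regression equation

$$ y_i = \theta_0 + \theta_1 N_{1i} + \theta_2 P_{2i} + \theta_3 K_{3i} + \theta_4 Ca_{4i} + \theta_5 Mg_{5i} + \theta_6 (N\text{-}P)_{6i} + \theta_7 (N\text{-}K)_{7i} + \theta_8 (N\text{-}Ca)_{8i} + \theta_9 (N\text{-}Mg)_{9i} + \theta_{10} (P\text{-}K)_{10i} + \theta_{11} (P\text{-}Ca)_{11i} + \theta_{12} (P\text{-}Mg)_{12i} + \theta_{13} (K\text{-}Ca)_{13i} + \theta_{14} (K\text{-}Mg)_{14i} + \theta_{15} (Ca\text{-}Mg)_{15i} + \theta_{16}\, defK_{16i} + \theta_{17}\, defMg_{17i} + \theta_{18}\, CLP_{18i} + \theta_{19}\, TLB_{19i} + \varepsilon_i \qquad (5.2) $$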
for i = 1, 2, …, n, where θ0, θ1, …, θ19 are regression coefficients and εi is the
random disturbance; N-P is the ratio between N and P, N-K is the ratio between N
and K, and so on; defK is the deficiency of K in the leaf, defMg is the deficiency of
Mg in the leaf, CLP is the critical leaf P concentration and TLB is the total amount of
bases in the leaf. We then apply the stepwise regression procedure to select the
significant independent variables from the nineteen independent variables considered
in equation (5.2).
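The ten ratio terms in equation (5.2) can be generated mechanically from the five concentrations. The sketch below assumes that "N-P" means N divided by P (the text only says "the ratio between N and P"); the function name and the leaf values are hypothetical.

```python
from itertools import combinations

NUTRIENTS = ["N", "P", "K", "Ca", "Mg"]

def nutrient_ratios(leaf):
    """Return the ten pairwise ratios N-P, N-K, ..., Ca-Mg for one leaf sample.

    leaf: dict mapping nutrient symbol -> foliar concentration.
    Each ratio is assumed to be first nutrient / second nutrient; the pairs come
    out in the same order as the ratio terms of equation (5.2).
    """
    return {f"{a}-{b}": leaf[a] / leaf[b] for a, b in combinations(NUTRIENTS, 2)}

# Hypothetical leaf concentrations, for illustration only.
ratios = nutrient_ratios({"N": 2.5, "P": 0.16, "K": 1.0, "Ca": 0.5, "Mg": 0.25})
```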
In the next stage we analyse the same data set using robust M-regression,
again based on equation (5.1). The analysis was performed in S-Plus 2000 using the
robust MM procedure. The details of robust M-regression are given in
Chapter 3.
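The study used the robust MM procedure in S-Plus 2000. The sketch below is not that procedure: it is a simpler Huber M-estimator fitted by iteratively reweighted least squares (IRLS), shown only to illustrate how an M-estimator down-weights large residuals. The tuning constant c = 1.345, the MAD-based scale estimate and the synthetic data are assumptions.

```python
import numpy as np

def huber_m_regression(X, y, c=1.345, max_iter=100, tol=1e-8):
    """Huber M-estimate of the coefficients of y = theta0 + X @ theta[1:] + error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)  # start from ordinary least squares
    for _ in range(max_iter):
        r = y - A @ theta
        # Robust residual scale: median absolute deviation rescaled for normal errors.
        s = np.median(np.abs(r - np.median(r))) / 0.6745
        if s == 0.0:
            break  # (near-)perfect fit; nothing left to reweight
        u = np.abs(r) / s
        # Huber weights: 1 for u <= c and c/u beyond, so large residuals count less.
        w = c / np.maximum(u, c)
        sw = np.sqrt(w)
        new_theta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta

# Illustrative usage: a straight line with five gross outliers added.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 60)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, size=60)
y[[5, 15, 25, 35, 45]] += 30.0
theta_robust = huber_m_regression(x[:, None], y)
```

Ordinary least squares on these data is pulled upward by the five outliers, whereas the Huber weights shrink their influence so the estimates stay close to the intercept 2 and slope 3 used to generate the clean points.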
5.3 RESULTS AND DISCUSSION
The results are presented in two sections: the first discusses the results of the
MLR analysis, and the second discusses the results of the robust M-regression (RMR).
Residual analyses are also reported, and finally we compare the performance of MLR
and RMR.
5.3.1 Multiple linear regression (MLR)
As mentioned earlier, the first stage uses the major foliar nutrient
compositions, N, P, K, Ca and Mg, in the model. The stepwise procedure in MLR was
applied to select the significant variables. Table 5.1 lists the fitted regression
equation for each station in the inland and coastal
areas.
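The stepwise selection can be sketched through its forward-selection part: at each step the candidate with the largest partial F statistic enters, provided that F exceeds an F-to-enter threshold. A full stepwise routine would also re-test variables already in the model for removal, which this sketch omits; the threshold f_in = 4.0, the variable names x1 to x4 and the synthetic data are assumptions for illustration.

```python
import numpy as np

def _rss(A, y):
    """Residual sum of squares of the least-squares fit of y on the columns of A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def forward_select(X, y, names, f_in=4.0):
    """Forward selection by partial F-to-enter; returns the names in entry order."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.ones((n, 1))                 # start from the intercept-only model
    rss_current = _rss(A, y)
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining:
        best_f, best_j, best_rss = -np.inf, None, None
        for j in remaining:
            A_try = np.column_stack([A, X[:, j]])
            df_resid = n - A_try.shape[1]
            if df_resid <= 0:
                continue
            rss_new = _rss(A_try, y)
            # Partial F for adding one variable: drop in RSS / residual mean square.
            f = np.inf if rss_new == 0.0 else (rss_current - rss_new) / (rss_new / df_resid)
            if f > best_f:
                best_f, best_j, best_rss = f, j, rss_new
        if best_j is None or best_f < f_in:
            break
        selected.append(names[best_j])
        A = np.column_stack([A, X[:, best_j]])
        rss_current = best_rss
        remaining.remove(best_j)
    return selected

# Illustrative usage on synthetic data where only x1 and x3 matter.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(80, 4))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 2] + rng.normal(0.0, 0.5, size=80)
chosen = forward_select(X_demo, y_demo, ["x1", "x2", "x3", "x4"])
```

Here x3 enters first because it explains the most variance, followed by x1; as at the stations in Table 5.1, which variables survive depends on the data.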
The coefficient of determination, R², measures how well the explanatory
variables explain the response variable (Birkes and Dodge, 1993). Taking R² as the
indicator of model fit, all stations recorded fairly low values. For the inland area,
the highest R² was 0.422, at ILD2, and the lowest was 0.118, at ILD7. This indicates
that 42.2% of the variability of FFB yield about its mean is explained by the linear
relationship between FFB yield and foliar nutrient composition at station ILD2. The
regression equation for station ILD2 is
FFB = 5.707 - 53.642*Mg + 201.609*P - 6.298*K + 3.039*N
There is a positive relationship between FFB yield and the N and P
concentrations, and a negative relationship between FFB yield and the K and Mg
concentrations. The coefficient of P concentration has the largest value, which
suggests that P concentration has a strong influence on FFB yield, although the size
of a coefficient also reflects the measurement scale of each nutrient.
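The R² values quoted here can be reproduced from observed and fitted yields as 1 - SSE/SST, where SST is the total sum of squares about the mean of the observed yields. The sketch below also evaluates the ILD2 equation above; the leaf concentrations in the usage line are hypothetical, and their units are assumed to match those used when the equation was fitted.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE/SST, with SST about the mean of y."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

def ffb_ild2(N, P, K, Mg):
    """Fitted MLR equation for station ILD2 (Table 5.1)."""
    return 5.707 - 53.642 * Mg + 201.609 * P - 6.298 * K + 3.039 * N

# Hypothetical leaf sample, for illustration only.
predicted = ffb_ild2(N=2.5, P=0.16, K=1.0, Mg=0.25)
```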
Meanwhile, for the coastal area, the highest R² value, 0.687, was recorded at
station CLD3 and the lowest, 0.043, at station CLD6. This indicates that 68.7% of the
variability of FFB yield about its mean is explained by the linear relationship
between FFB yield and foliar nutrient composition at station CLD3. The regression
equation for station CLD3 is
FFB = -9.901 + 17.664*N - 18.550*Mg.
There is a positive relationship between FFB yield and N concentration, but a
negative relationship with Mg concentration. Similar interpretations can be made for
the other stations. Detailed MLR results are given in Appendix G and Appendix H.
Table 5.1: The regression equations and R² values for the inland and coastal areas

Station   R²      Regression equation
Inland stations
ILD1      0.392   -20.311 + 342.077*P - 14.697*Ca
ILD2      0.422   5.707 - 53.642*Mg + 201.609*P - 6.298*K + 3.039*N
ILD3      0.404   -30.657 + 21.679*N + 7.863*Ca - 7.738*K + 15.727*Mg
ILD4      0.185   -5.806 + 8.207*N + 11.621*Ca + 41.336*P - 4.932*K - 7.476*Mg
ILD5      0.400   -15.529 + 12.466*N + 4.321*K
ILD6      0.317   26.527 - 14.099*Ca + 286.344*P - 14.127*N
ILD7      0.118   7.008 + 7.131*N - 37.198*P + 5.650*Ca
ILDT      0.148   -1.007 + 6.782*N + 48.554*P - 11.788*Mg + 4.130*Ca
Coastal stations
CLD1      0.380   -44.721 + 30.526*K + 71.737*Mg + 198.541*P - 11.440*Ca
CLD2      0.171   0.189 + 22.750*Mg + 10.443*N - 7.509*Ca
CLD3      0.687   -9.901 + 17.664*N - 18.550*Mg
CLD4      0.050   18.262 + 12.794*Ca
CLD5      0.111   16.528 + 11.367*Ca - 10.294*N + 185.701*P
CLD6      0.043   40.988 - 12.548*Ca - 3.330*N
CLD7      0.231   6.810 + 66.648*P + 8.788*K - 25.001*Mg + 4.804*N
CLDT      0.044   8.998 + 128.949*P - 6.424*Mg
For station ILD1, the P and Ca concentrations were found to have a significant
effect on FFB yield. In the inland area, only two stations included all the foliar
nutrient concentrations in the model, whereas no station in the coastal area was
significantly affected by the concentrations of all the nutrients. In general, FFB
yield responds more strongly to foliar nutrient composition in the inland area than
in the coastal area, which suggests that foliar nutrient composition has a greater
influence on yield in the inland area. No variable appeared consistently across the
station models; for example, a variable that is statistically significant at station
ILD1 is not necessarily significant at the other stations. The reason why different
foliar nutrients enter the model at different stations is very difficult to ascertain,
but we believe it is most probably due to soil factors.
5.3.2 Residual Analysis for MLR
Least squares tests and estimates are optimal if the errors can be assumed to
follow a normal distribution. If the normality assumption is not satisfied, least
squares procedures are still valid, but they may be far from optimal. Residual
analysis was therefore performed to investigate the distribution of the model errors.
We found that the residuals at all stations were scattered randomly about zero.
Normal probability plots were also produced; the results are shown in Appendix I for
the inland area and Appendix J for the coastal area. A number of plots and tests based
on residuals have been developed for checking the normality of the errors, but here
we use only the normal probability plot. The standardised residuals are sorted in
increasing order and plotted against the values they would be expected to take if they
were a sample of n independent standard normal random variables. The plot should
look nearly linear if the assumption of normality is valid. When a normal probability
plot is strongly nonlinear, the data can sometimes be transformed so that normality is
more closely approximated.
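The construction of a normal probability plot described above can be sketched numerically. This version standardises the residuals by their sample mean and standard deviation (a simplification of the usual standardised residuals) and uses Blom's plotting positions (i - 3/8)/(n + 1/4) to approximate the expected standard-normal order statistics; both are assumptions, not necessarily the choices made in the study. The correlation between the two coordinates summarises how close the plot is to a straight line.

```python
import numpy as np
from statistics import NormalDist

def normal_probability_points(residuals):
    """Return (expected normal scores, sorted standardised residuals, correlation)."""
    r = np.asarray(residuals, dtype=float)
    z = (r - r.mean()) / r.std(ddof=1)
    z_sorted = np.sort(z)
    n = len(z_sorted)
    # Blom plotting positions approximate E[i-th smallest of n standard normals].
    p = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    expected = np.array([NormalDist().inv_cdf(pi) for pi in p])
    # A correlation near 1 corresponds to a nearly linear plot.
    line_r = float(np.corrcoef(expected, z_sorted)[0, 1])
    return expected, z_sorted, line_r

# Illustrative usage: residuals equal to normal scores give line_r close to 1.
n = 30
p_demo = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
demo_resid = np.array([NormalDist().inv_cdf(q) for q in p_demo])[::-1]
expected, z_sorted, line_r = normal_probability_points(demo_resid)
```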
Figures 5.1 and 5.2 display the error scatter plots for the inland and coastal
stations. At ILD1, the errors are scattered randomly about zero, and the normal
probability plot is approximately linear (Appendix I). These checks were performed to
verify that the normality assumption for the errors holds. Applying the same
procedure to the other stations, we found that all the plots are approximately
straight lines, which indicates that the errors are approximately normally
distributed.
[Figure 5.3: The error distribution plots of the RMR model for the inland stations. Panels ILD1, ILD2, ILD3, ILD4, ILD5, ILD6, ILD7 and ILDT plot Error against Observation.]
[Figure 5.4: The error distribution plots of the RMR model for the coastal stations. Panels CLD1, CLD2, CLD3, CLD4, CLD5, CLD6, CLD7 and CLDT plot Error against Observation.]
The second stage in this study applies MLR with both the major nutrient
compositions (MNC) and the nutrient balance ratios (NBR) as independent variables.
The details of the F test for the model and the t tests for the individual coefficients
are given in Appendix K and Appendix L. The regression equations and R² values are
displayed in Table 5.2. For stations ILD1 and CLD4, the variables in the models were
unchanged, but the R² values decreased from 0.392 to 0.379 and from 0.050 to 0.035,
respectively. At station ILD2, the variables included in the model changed from N, P,
K and Mg to P, K, the N-P ratio and the N-Mg ratio. The R² recorded at station ILD3
is 0.478, indicating that 47.8% of the variability of FFB yield is explained by the
linear relationship between FFB yield and N, K, the N-P ratio, the N-K ratio and the
K-Mg ratio.
The deficiencies of K and Mg in the foliar nutrient composition entered the
regression model significantly at station ILD4, which recorded an R² value of 0.215.
This suggests that K and Mg deficiencies are reflected in FFB yield. TLB had a
significant effect on FFB yield at only two stations, CLD3 and CLD6. Critical leaf
phosphorus (CLP) displayed a positive relationship with FFB yield except at station
ILD7; in terms of contribution, TLB has the largest weight compared with the other
variables.
Generally, introducing the NBR into the regression model achieves only a
slight improvement in model fit. In other words, the NBR does not add much
information for improving the model, and, given the more complex interpretation of
the NBR, the MNC information is sufficient for this purpose.
Table 5.2: The regression equations for the inland and coastal stations using MNC and
NBR as independent variables

Station   R²      Regression equation
Inland stations
ILD1      0.379   -20.311 + 342.077*P - 14.697*Ca
ILD2      0.433   4.150 + 130.174*P - 4.285*K - 0.952*N-P + 1.990*N-Mg
ILD3      0.478   19.393 + 35.888*N - 41.774*K + 0.712*N-P - 20.692*N-K - 0.484*K-Mg
ILD4      0.215   -23.297 + 234.930*CLP - 14.829*K-Ca - 0.706*N-P + 1.019*Def-K + 0.347*Def-Mg
ILD5      0.474   262.286 - 2621.417*P - 27.160*N-P + 3724.885*CLP + 5.735*Mg-Ca
ILD6      0.325   64.016 - 3.054*N-P + 3.013*N-Ca
ILD7      0.132   10.776 - 43.857*P - 33.313*Mg + 188.723*CLP - 0.551*N-Ca - 0.236*N-Mg
ILDT      0.168   7.622 + 8.214*N + 42.607*P - 24.641*Mg - 0.254*N-Mg - 0.591*N-Ca
Coastal stations
CLD1      0.388   -75.969 + 185.968*P + 47.134*K + 1.358*Def-Mg - 19.911*Mg-Ca
CLD2      0.181   -18.975 + 257.755*CLP + 0.338*Def-Mg - 1.589*N-K
CLD3      0.676   -18.173 + 356.146*CLP - 0.152*TLB
CLD4      0.035   18.262 + 12.794*Ca
CLD5      0.108   46.292 + 11.528*Ca - 1.658*N-P
CLD6      0.031   36.039 - 0.137*TLB
CLD7      0.315   -0.132 + 3.587*Def-K + 529.187*P + 4.341*N-P - 32.683*K-Ca - 15.947*K-Mg - 2.769*Def-Mg + 84.381*Mg-Ca 530.956*CLP
CLDT      0.094   -64.453 - 24.419*N + 701.608*P - 63.647*Mg + 5.148*N-P 2.371*N-Mg
5.3.3 Robust M-Regression (RMR)
The purpose of introducing robust M-regression in this study is to improve
modelling accuracy. The quantile-quantile plots in Appendix M and Appendix N indicate
the presence of outlying observations in the data set. Barnett and Lewis (1996)
defined an outlier in a set of data to be an observation (or set of observations) which
appears to be inconsistent with the remainder of that set of data. Table 5.3 shows the
results of the M-regression for all stations. Outliers in the data set influence the
model fit and therefore the overall performance. Using RMR, we found that the inland
stations gave a fairly high overall performance compared with the coastal stations,
which is similar to the MLR result. The highest R² value was recorded at station ILD2
(0.598), followed by station ILD1 (0.571), and the lowest R² value was 0.127, at
station ILD7. The regression equation for ILD1 can be written as
FFB = -16.790 + 331.546*P - 5.466*K - 19.296*Ca
The regression coefficients of the P, K and Ca concentrations are 331.546, -5.466 and
-19.296, respectively, so P has the largest coefficient, suggesting that P concentration
has a strong influence on FFB yield. The R² value corresponds to the proportion of
variance explained by the independent variables in the model. For example, at station
ILD2 the concentrations of P, Ca and Mg explain about 59.8% of the variance in FFB
yield, and the remainder is attributable to variables not included in the model and to
random error. The regression equation for ILD2 is
FFB = -5.5279 + 329.027*P - 6.7802*Ca - 31.283*Mg
As at station ILD1, P has the largest regression coefficient, compared with Ca and
Mg. This suggests that P concentration is the most important of the foliar nutrients
for FFB yield.
5.3.4 Residual Analysis for RMR
As for MLR, we also investigated the distribution of the errors. Figure 5.3
displays the error scatter plots for the inland area; in general, the errors appear
normally distributed. The corresponding plots for the coastal area are shown in
Figure 5.4. Applying the same procedure as in Section 5.3.2, the errors were again
found to be approximately normally distributed. The Q-Q plots for the inland and
coastal stations, which were used to detect outliers in the data set, are presented in
Appendix M and Appendix N. The plot should look nearly linear if the assumption of
normality is valid. All stations showed nearly linear plots except station CLD4, which
warrants further investigation. These plots are not much different from those for MLR,
so we conclude that the fitted models are valid.
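As a numerical complement to the Q-Q plots, residuals can be screened against a robust scale estimate. The sketch below flags residuals lying more than 2.5 robust standard deviations from the median, using the median absolute deviation scaled by 1.4826. The cutoff of 2.5 is an assumed, commonly used value, and this is not the outlier procedure reported in the study.

```python
import numpy as np

def flag_outliers(residuals, cutoff=2.5):
    """Indices of residuals far from the median relative to a MAD-based scale."""
    r = np.asarray(residuals, dtype=float)
    med = np.median(r)
    # 1.4826 * MAD estimates the standard deviation when errors are normal.
    scale = 1.4826 * np.median(np.abs(r - med))
    if scale == 0.0:
        return np.flatnonzero(r != med)
    return np.flatnonzero(np.abs(r - med) / scale > cutoff)

# Illustrative residuals with one gross outlier in the last position.
flagged = flag_outliers([0.1, -0.2, 0.05, 0.0, -0.1, 0.15, 8.0])
```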
Table 5.3: Regression equations using robust M-regression for the inland and coastal
areas

Station   R²      Regression equation
Inland stations
ILD1      0.571   -16.790 + 331.546*P - 5.466*K - 19.296*Ca
ILD2      0.598   -5.5279 + 329.027*P - 6.7802*Ca - 31.283*Mg
ILD3      0.381   -29.839 + 25.6112*N - 92.915*P - 865*K + 7.2804*Ca + 22.893*Mg
ILD4      0.199   -11.733 + 8.285*N + 95.255*P - 6.513*K + 10.069*Ca - 8.955*Mg
ILD5      0.323   -24.345 + 11.6264*N
ILD6      0.313   22.876 - 13.974*N + 300.489*P - 12.764*Ca
ILD7      0.127   5.947 + 7.985*N - 47.678*P + 7.487*Ca
ILDT      0.243   -7.850 + 6.115*N + 128.369*P + 7.274*Ca - 26.476*Mg
Coastal stations
CLD1      0.389   -88.643 + 47.558*K + 113.074*Mg
CLD2      0.307   -4.914 + 15.754*N + 38.759*Mg
CLD3      0.616   -0.529 + 21.546*N
CLD4      0.225   98.730 - 21.838*N + 280.158*P - 26.190*K - 7.804*Ca - 56.851*Mg
CLD5      0.115   23.527 - 12.813*N + 194.641*P + 13.364*Ca
CLD6      0.049   37.175 - 3.613*N - 12.729*Ca
CLD7      0.151   14.811 + 5.248*N + 58.013*P - 8.066*Ca
CLDT      0.140   -7.368 + 7.535*N + 113.067*P
[Figure 5.3: The error distribution plots of the RMR model for the inland stations. Panels ILD1, ILD2, ILD3, ILD4, ILD5, ILD6, ILD7 and ILDT plot Error against Observation.]
[Figure 5.4: The error distribution plots of the RMR model for the coastal stations. Panels CLD1, CLD2, CLD3, CLD4, CLD5, CLD6, CLD7 and CLDT plot Error against Observation.]
5.4 CONCLUSION
In this section, the discussion focuses on the performance of the MLR(MNC) and
MLR(NBR) models, to determine whether including the NBR, TLB and the K and Mg
deficiencies as independent variables improves the model R². The performance of the
MLR and RMR models is then compared.
[Figure 5.5: The R² value for each model proposed for the inland area. Bar chart of R² (0 to 0.6) for stations ILD1 to ILDT, comparing MLR(MNC), MLR(NBR) and RMR.]
In Figure 5.5, the bar chart gives the R² value at each station for the MLR(MNC),
MLR(NBR) and RMR models. With the MLR(NBR) model, five of the seven inland
stations recorded higher R² values than with MLR(MNC). Including the nutrient
balance ratios as independent variables therefore increased the R² values, so the
nutrient balance ratio can also be used to explain the behaviour of oil palm yield.
The third bar in the figure represents the R² values for the RMR method. Using RMR,
the R² values increased from 0.392 to 0.571 at station ILD1 and from 0.422 to 0.598 at
station ILD2. An increase was also recorded for the combined data set, ILDT, from
0.148 to 0.243. We can therefore deduce that the RMR method improves the accuracy
of oil palm yield estimation.
[Figure 5.6: The R² value for each model proposed for the coastal area. Bar chart of R² (0 to 0.7) for stations CLD1 to CLDT, comparing MLR(MNC), MLR(NBR) and RMR.]
For the coastal area (Figure 5.6), the MLR(MNC) and MLR(NBR) models recorded
approximately equal R² values except at station CLD7, where R² increased from
0.231 to 0.315. We conclude that entering the nutrient balance ratio into the model
does not materially affect model accuracy. With the RMR method, R² values were
higher than with MLR at five of the seven coastal stations. For example, at station
CLD2 the value increased from 0.171 to 0.307, and at station CLD4 from 0.050 to
0.225. This suggests that the RMR method can improve oil palm yield estimation.
This study found that with statistical approaches such as regression models, the
accuracy of estimation is around 80 to 85% and the estimation error is about 15 to
20%. This accuracy is fairly low and the estimation error is still high. Even though
the proposed method is able to increase the accuracy, there is still room to explore
for a better model. Finding the best model, with higher accuracy in estimating oil palm
yield, is the major goal of this study. We therefore propose to explore a currently
popular heuristic approach, namely the neural network model, because neural networks
offer useful properties and capabilities such as (i) non-linearity: an artificial
neuron can be linear or nonlinear, and a neural network made up of interconnected
nonlinear neurons is itself nonlinear; and (ii) input-output mapping: the network is
presented with an example picked at random from the training set, and the synaptic
weights (free parameters) of the network are modified to minimize the difference
between the desired response and the actual response produced by the input signal,
according to an appropriate statistical criterion. Training is repeated for many
examples in the set until the network reaches a steady state in which there are no
significant changes in the synaptic weights (Haykin, 1999).
CHAPTER 6
MODELLING OIL PALM YIELD USING NEURAL NETWORK MODEL
6.1 INTRODUCTION
The neural network model is a causal model which is widely used to solve
complex problems. This chapter discusses the implementation of the neural network
approach to modelling oil palm yield. Implementing this model in Matlab, a
mathematical software package, requires data preparation and calculation of the
degrees of freedom, which constrain the neural network architecture. This chapter
also considers various combinations of activation functions from the input layer to
the hidden layer and from the hidden layer to the output layer, and uses analysis of
variance and multiple comparisons with Duncan's test to analyse the neural network's
performance.
Three experiments were conducted to investigate the effects of the number of
runs, number of hidden nodes, learning rate, momentum term and outliers on the
neural network's performance. The results show that the neural network approach
produces good outcomes. A comparative study was also conducted between the
multiple linear regression, robust M-regression and neural network models with
regard to their performance in oil palm yield modelling.
6.2 NEURAL NETWORK PROCEDURE
Developing the NN model requires two stages, data preparation and calculation of
the degrees of freedom, which are important to ensure that the results and outcomes
are valid and relevant.
6.2.1 Data preparation
Once the raw data have been collected, they may need to be converted into a
more suitable format, depending on the software requirements. This stage involves
two steps: data validity checks and data partitioning.
Data validity checks: Data validity checks reveal unacceptable data that, if
retained, would produce poor results. A simple data range check is an example of
validity checking. For example, if fresh fruit bunch data are collected in tonnes per
hectare per year, we would expect values greater than zero and not exceeding 50
tonnes/ha/year; a value of -5 or 100, for instance, is clearly wrong. If the faulty data
follow a pattern, the cause of that pattern should be diagnosed. Depending on the
nature of the fault, we may need to discard the data altogether (Bansal et al., 1993).
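A range check of this kind takes only a few lines of code. The sketch below uses the bounds from the text (greater than 0 and at most 50 t/ha/year) and returns the positions of plausible and suspect values rather than deleting anything, so that any pattern in the faulty records can still be diagnosed. The function name and the sample values are illustrative.

```python
def check_ffb_range(values, lower=0.0, upper=50.0):
    """Split FFB yields (t/ha/year) into plausible and suspect record positions.

    A value is plausible when lower < value <= upper. Missing values encoded as
    float('nan') fail both comparisons and are therefore reported as suspect.
    """
    plausible, suspect = [], []
    for i, v in enumerate(values):
        if lower < v <= upper:
            plausible.append(i)
        else:
            suspect.append(i)
    return plausible, suspect

ok_idx, bad_idx = check_ffb_range([22.4, -5.0, 100.0, 31.0, float("nan")])
```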
Data partitioning: Partitioning is the process of dividing the data into training,
validation and testing sets. By definition, training sets are used to update the weights
of a network, validation sets are used to decide the architecture of the network, and
testing sets are used to examine the final performance of the network. The primary
concerns should be to ensure that (i) the training set contains enough data, with a
suitable distribution, to demonstrate adequately the properties we wish the network to
learn; and (ii) there is no unwarranted similarity between data in different sets
(Hornik et al., 1989).
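A random partition of the records can be sketched as follows. The 70%, 15% and 15% proportions are those reported later in this chapter (Zhang et al., 2001); the seed, the rounding rule and the function name are assumptions. Shuffling before splitting is one simple way to reduce unwarranted similarity between the sets when records are stored in site or time order.

```python
import numpy as np

def partition_indices(n, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split record indices 0..n-1 into training, validation and test sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    # Whatever remains after rounding goes to the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = partition_indices(200)
```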
6.2.2 Calculating the degrees of freedom
A parametric analysis would be impossible without a discussion of the
degrees of freedom (df) of the network. Because networks have historically been
limited to nonparametric modelling, this topic has been conspicuously absent from
the literature (Ripley, 1996). In any parametric analysis, the number of degrees of
freedom is defined as the number of observations minus the number of parameters
that are free to vary (Gujarati, 1988; Adam, 1999). If N represents the number of
observations and k the number of estimated parameters (here, the connection weights
from the input layer to the hidden layer and from the hidden layer to the output
layer, together with the biases), then the degrees of freedom are

$$ df = N - k \qquad (6.1) $$

This approach to the degrees of freedom can be applied directly to a feed-forward
neural network with a single output. In this case, k counts not only the connection
weights that feed into the output and the output's intercept (bias) parameter, but
also the connection weights that interconnect the layers and the bias weights of each
hidden-layer transformation node. Hence, the number of parameters estimated in a
feed-forward neural network with one hidden layer is

$$ k = n_h (n_i + 2) + 1 \qquad (6.2) $$

where n_h is the number of hidden nodes and n_i is the number of input nodes. A
necessary condition in any parametric model is that the number of available degrees
of freedom must be positive. This constraint imposes an upper limit on the size of the
network: with N observations, the maximum number of hidden nodes is

$$ n_h(\max) = \frac{N - 1}{n_i + 2} \qquad (6.3) $$
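Equations (6.1) to (6.3) translate directly into code. One detail is worth making explicit: if N - 1 is an exact multiple of n_i + 2, taking n_h equal to the bound in (6.3) gives df = 0, so the sketch below returns the largest integer n_h for which df is strictly positive. The function names and the station size are illustrative.

```python
def n_parameters(n_inputs, n_hidden):
    """Equation (6.2): weights plus biases of a one-hidden-layer, single-output network."""
    return n_hidden * (n_inputs + 2) + 1

def degrees_of_freedom(n_obs, n_inputs, n_hidden):
    """Equation (6.1): df = N - k."""
    return n_obs - n_parameters(n_inputs, n_hidden)

def max_hidden_nodes(n_obs, n_inputs):
    """Largest integer n_h with df > 0, i.e. n_h * (n_i + 2) + 1 < N."""
    return (n_obs - 2) // (n_inputs + 2)

# Five foliar inputs (N, P, K, Ca, Mg) and a hypothetical station with 100 records.
print(max_hidden_nodes(100, 5), degrees_of_freedom(100, 5, 14))  # 14 1
```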
As shown in Figure 6.1, the input nodes are nitrogen concentration N,
phosphorus concentration P, potassium concentration K, calcium concentration Ca
and magnesium concentration Mg and the output node is FFB yield which is
measured in tonnes per hectare per year.
The actual data must be transformed into the range (0, 1] before the activation
function is applied. We therefore transform the input and output values using
(Hsu, 1993)

$$ Z_i = \frac{(x_i - x_{\min}) + 0.01}{(x_{\max} - x_{\min}) + 0.01} $$

The constant 0.01 added in the formula ensures that no transformed value is zero;
the largest value, x_max, maps to exactly 1. The actual value can be recovered by
rearranging the above equation:

$$ x_i = x_{\min} + Z_i \left( (x_{\max} - x_{\min}) + 0.01 \right) - 0.01 $$

In this study, we used a randomized procedure to avoid site plot effects or bias
(Hsu, 1993).
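The transformation and its inverse can be checked with a short sketch. Here x_min and x_max are taken from the values being scaled; in practice they might instead come from the training data, in which case values in the other sets may fall slightly outside (0, 1]. That choice is an assumption, not something the text specifies, and the yields are hypothetical.

```python
import numpy as np

def to_unit(x, x_min, x_max):
    """Z_i = ((x_i - x_min) + 0.01) / ((x_max - x_min) + 0.01)."""
    return ((np.asarray(x, dtype=float) - x_min) + 0.01) / ((x_max - x_min) + 0.01)

def from_unit(z, x_min, x_max):
    """Inverse transformation: x_i = x_min + Z_i((x_max - x_min) + 0.01) - 0.01."""
    return x_min + np.asarray(z, dtype=float) * ((x_max - x_min) + 0.01) - 0.01

ffb = np.array([12.5, 20.0, 35.5])   # hypothetical FFB yields, t/ha/year
z = to_unit(ffb, ffb.min(), ffb.max())
```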
6.3 COMPUTER APPLICATION
In this study we ran the neural networks using the Neural Network Toolbox in
the Matlab package 6.15. Matlab's built-in procedures make neural network
programming very simple: the user only has to write a short program that calls the
built-in procedures, each of which has its own specific name. In the first part we
considered only the N, P, K, Ca and Mg concentrations from foliar analysis as input
nodes, and FFB yield as the output node (Figure 6.1). The number of hidden nodes
varies from one station to another because the stations have different numbers of
observations. The maximum number of hidden nodes is obtained from equation (6.3)
to ensure that the degrees of freedom of the model are always positive.
[Figure 6.1: Three layers of fully connected neural network with five input nodes and one output node. Input layer: foliar composition (N, P, K, Ca, Mg); hidden layer; output layer: FFB yield.]
In our case we consider a fully connected feed-forward neural network trained
by supervised learning, because both input and target data sets are available, as
shown in Figure 6.1. We also assume that all the inputs have a significant influence
on oil palm yield. We start the network with a small number of hidden nodes and add
nodes one by one until the maximum number of hidden nodes, defined by equation
(6.3), is reached.
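To make the architecture of Figure 6.1 concrete, the sketch below writes out the forward pass of a fully connected 5-n_h-1 network and confirms that its parameter count equals equation (6.2). The logistic hidden layer and linear output layer are assumptions (the transfer functions actually used are those discussed in Chapter 3), and the uniform initialisation in [-1, 1) only mimics the random initialisation described in the text; it is not the Matlab toolbox code.

```python
import numpy as np

def init_params(n_inputs, n_hidden, seed=0):
    """Random weights and biases drawn uniformly from [-1, 1)."""
    rng = np.random.default_rng(seed)
    W1 = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))  # input -> hidden weights
    b1 = rng.uniform(-1.0, 1.0, size=n_hidden)               # hidden biases
    w2 = rng.uniform(-1.0, 1.0, size=n_hidden)               # hidden -> output weights
    b2 = rng.uniform(-1.0, 1.0)                              # output bias
    return W1, b1, w2, b2

def forward(X, W1, b1, w2, b2):
    """Logistic hidden layer followed by a single linear output node."""
    hidden = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return hidden @ w2 + b2

params = init_params(n_inputs=5, n_hidden=4)
n_params = sum(np.size(p) for p in params)   # 4 * (5 + 2) + 1 = 29
```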
The first step in training a feed-forward network is to create the network
object. The function newff creates a trainable feed-forward network (Hagan et al.,
1996). The user should specify the transfer functions of the first and second layers,
as presented in Chapter 3. Once the transfer functions have been chosen and the
newff command has been called, the network is ready for training. Before training a
feed-forward network, the weights and biases must be initialized. The initial weights
and biases are created with the command init, which takes a network object as input
and returns a network object with all weights and biases initialized. For feed-forward
networks, the weight initialization is usually set to random (rands), which sets the
weights to random values between -1 and 1.
Once the network weights and biases have been initialized, the network is
ready for training. The network can be trained for function approximation, pattern
association or pattern classification; this study trains the networks for function
approximation. The training process requires the network inputs Input and the target
outputs Target. During training, the network's weights and biases are iteratively
adjusted to minimize the network performance function, the mean squared error (mse);
this is repeated for each number of hidden nodes up to the maximum obtained from
equation (6.3). The training algorithm used in our study is Levenberg-Marquardt
(trainlm), because this algorithm appears to be the fastest method for training
moderate-sized feed-forward neural networks, and it is also very efficient (Hagan et
al., 1996). An example of the algorithm is given in Appendix O.
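Outside Matlab, a comparable Levenberg-Marquardt fit can be sketched with SciPy's least_squares(method="lm"), which wraps the MINPACK implementation. This is an analogue of trainlm, not a reproduction of it: there is no early stopping here, and the logistic/linear transfer functions, the initialisation, the network size and the synthetic data are assumptions. The MINPACK method requires at least as many residuals as parameters, which is the df >= 0 condition of equation (6.1).

```python
import numpy as np
from scipy.optimize import least_squares

def unpack(theta, n_in, n_h):
    """Split a flat parameter vector into W1, b1, w2, b2."""
    i = n_in * n_h
    W1 = theta[:i].reshape(n_in, n_h)
    b1 = theta[i:i + n_h]
    w2 = theta[i + n_h:i + 2 * n_h]
    b2 = theta[i + 2 * n_h]
    return W1, b1, w2, b2

def predict(theta, X, n_h):
    """Forward pass: logistic hidden layer, linear output node."""
    W1, b1, w2, b2 = unpack(theta, X.shape[1], n_h)
    hidden = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return hidden @ w2 + b2

def train_lm(X, y, n_h, seed=0):
    """Fit a one-hidden-layer network by Levenberg-Marquardt least squares."""
    k = n_h * (X.shape[1] + 2) + 1          # equation (6.2)
    if len(y) < k:
        raise ValueError("need at least as many observations as parameters")
    theta0 = np.random.default_rng(seed).uniform(-1.0, 1.0, size=k)
    result = least_squares(lambda t: predict(t, X, n_h) - y, theta0, method="lm")
    return result.x

# Synthetic demonstration data (not oil palm data).
rng = np.random.default_rng(3)
X_demo = rng.uniform(0.0, 1.0, size=(60, 2))
y_demo = np.sin(3.0 * X_demo[:, 0]) + X_demo[:, 1] ** 2
theta_fit = train_lm(X_demo, y_demo, n_h=4)
mse = float(np.mean((predict(theta_fit, X_demo, 4) - y_demo) ** 2))
```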
An early stopping technique was used to avoid overtraining the neural networks
(continuing to train after the mse has levelled off) and to improve their
generalization. This technique requires the data to be divided into three sets. The
first set is the training set, which is used to compute the gradient and update the
network weights and biases. The second set is the validation set, whose error
is monitored during the training process. The validation error
normally decreases during the initial phase of training. However, when the
network begins to overfit the data, the error on the validation set typically begins
to rise. When the validation error increases for a specified number of iterations, the
training is stopped, and the weights and biases at the minimum of the validation error
are returned, as shown in Figure 6.2. Figure 6.3 presents the MSE value for each
phase. The MSE values decrease sharply during the first five epochs and then
remain stable until epoch fifteen, when training stops. We
divided the data into three sets, the training, validation and testing sets, in the
ratio 70%, 15% and 15% respectively (Zhang et al., 2001).
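A minimal sketch of the 70/15/15 split and the stopping rule described above is given below, assuming NumPy. The names split_indices and early_stopping and the patience value max_fail=6 are hypothetical choices for illustration, and the validation-error sequence in the usage lines is made up rather than taken from this study.

```python
import numpy as np


def split_indices(n, ratios=(0.70, 0.15, 0.15), seed=0):
    """Randomly split range(n) into training, validation and testing indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(round(ratios[0] * n))
    n_val = int(round(ratios[1] * n))
    # Whatever remains after training and validation becomes the testing set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def early_stopping(val_errors, max_fail=6):
    """Return (best_epoch, stop_epoch) for a sequence of validation errors.

    Training stops once the validation error has failed to improve on its
    minimum for max_fail consecutive epochs; the weights saved at best_epoch
    are the ones that would be returned.
    """
    best_epoch, best_err, fails = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, fails = epoch, err, 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_errors) - 1


train_idx, val_idx, test_idx = split_indices(243)   # 170, 36 and 37 indices
best, stop = early_stopping([0.90, 0.50, 0.30, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.31])
print(best, stop)   # 3 9
```

Returning the weights from best_epoch rather than stop_epoch is what distinguishes early stopping from simply truncating training.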
The correlation coefficient was used to measure the correlation between the actual
and predicted values. A correlation value approaching one shows that the actual and
predicted values are close and that the model fits the data well. Let $X_i$ and
$\hat{X}_i$ ($i = 1, \dots, n$) be the actual and predicted values from the specified
model, $\bar{X}$ and $\bar{\hat{X}}$ the means of the actual and predicted observations,
and $\sigma_X^2$ and $\sigma_{\hat{X}}^2$ their variances (computed with divisor $n$).
The correlation coefficient between the actual and predicted values is then defined as

$$r = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(\hat{X}_i - \bar{\hat{X}}\right)}{\sqrt{\sigma_X^2\,\sigma_{\hat{X}}^2}} \qquad (6.4)$$

MATLAB also plots the best-fitting line between the actual and target values, as
shown in Figure 6.4.
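Equation (6.4) can be checked numerically with the short sketch below (NumPy assumed; the function name and the sample values are illustrative). Both variances use the divisor n, so the result coincides with the usual Pearson correlation returned by np.corrcoef, and squaring r gives the R² values quoted later in this chapter.

```python
import numpy as np


def correlation(actual, predicted):
    """Correlation coefficient r between actual and predicted values, as in (6.4)."""
    x = np.asarray(actual, dtype=float)
    x_hat = np.asarray(predicted, dtype=float)
    # Numerator: (1/n) * sum of cross-products of deviations from the means.
    cov = np.mean((x - x.mean()) * (x_hat - x_hat.mean()))
    # np.var uses the divisor n by default (ddof=0), matching the 1/n above.
    return cov / np.sqrt(x.var() * x_hat.var())


r = correlation([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
print(r, r ** 2)   # r and the corresponding R^2
```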
Figure 6.2: The early stopping procedure for feedforward neural network
training (blue = training, green = validation and red = testing)
Figure 6.3: The mean squares error for training, validation and testing
Figure 6.4: The correlation coefficient between the actual (A) and predicted (T)
values
6.4 EXPERIMENTAL DESIGN FOR NEURAL NETWORK
The objective of this section is to investigate the effect of the number of
hidden nodes, the number of runs, the momentum term and the learning rate on NN
performance. We also performed an experiment to investigate the effect of outliers
on neural network performance.
We first describe how the experiment was designed. In the first stage, we
considered three factors which, according to the literature, should have a significant
effect on NN performance: the number of hidden nodes, HN, the number of runs, NR,
and the momentum term, MT. The number of hidden nodes has eight levels, from 3 to
10; the number of runs has six levels (3, 5, 7, 10, 15 and 20); and the momentum
term has four levels (0.25, 0.5, 0.75 and 0.95).
We combined the information from the fertilizer trials (nitrogen (N), phosphorus
(P), potassium (K) and magnesium (Mg) fertilizers) with information from foliar
analysis (nitrogen, phosphorus, potassium, calcium and magnesium concentrations).
This neural network architecture therefore has nine inputs and one output, as shown
in Figure 6.5. Two experiments, Experiment 1 and Experiment 2, were conducted on the
network factors, and Experiment 3 was conducted specifically to investigate the
effect of outliers on neural network performance.
[Figure 6.5 diagram: input layer with foliar composition inputs (N, P, K, Ca, Mg) and fertilizer trial inputs (N, P, K, Mg); hidden layer; output layer with a single node, FFB yield]
Figure 6.5: Three layers of fully connected neural network with nine input nodes and
one output node
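To make the 9-input, one-output architecture and the activation-function combinations of Section 6.5.1 concrete, the following NumPy sketch re-implements MATLAB's logsig, tansig and purelin transfer functions and runs a forward pass. The weight initialisation, the hidden-layer size and the helper names are assumptions for illustration only. The first letter of a combination code is the hidden-layer function and the second the output-layer function, following the convention of Table 6.7.

```python
import numpy as np


# Re-implementations of MATLAB's transfer functions (not calls into MATLAB).
def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))


def tansig(n):
    return np.tanh(n)   # equal to 2 / (1 + exp(-2n)) - 1


def purelin(n):
    return n


ACTIVATIONS = {"L": logsig, "T": tansig, "P": purelin}
COMBOS = ["LL", "LP", "LT", "TP", "TL", "TT"]   # hidden-layer letter, then output-layer letter


def init_network(n_inputs=9, n_hidden=5, seed=0):
    """Small random weights for a 9-input, one-output network (illustrative only)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=0.5, size=(n_hidden, n_inputs)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.5, size=(1, n_hidden)),
        "b2": np.zeros(1),
    }


def forward(net, X, combo="LP"):
    """X has shape (n_samples, 9): 5 foliar concentrations and 4 fertilizer inputs."""
    hidden_fn, output_fn = ACTIVATIONS[combo[0]], ACTIVATIONS[combo[1]]
    H = hidden_fn(X @ net["W1"].T + net["b1"])
    return output_fn(H @ net["W2"].T + net["b2"]).ravel()


X = np.random.default_rng(1).random((3, 9))   # hypothetical scaled inputs
net = init_network()
for combo in COMBOS:
    print(combo, forward(net, X, combo))
```

Because logsig and tansig bound the output to (0, 1) and (−1, 1), combinations ending in L or T only make sense when the yield target has been scaled into that range.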
6.4.1 Experiment 1
This experiment considers three factors, namely the number of hidden nodes, the
number of runs and the momentum term. The networks were run by varying the level
of one factor while holding the other two fixed. Each factor was then moved from one
level to the next, the estimation error for each phase (training, validation and
testing) was recorded, and the average error was calculated. The correlation
coefficient was used to measure the model's performance. For example, suppose that
the number of hidden nodes is set to three, the number of runs is also three, and the
momentum term is 0.25. This setting can be written as [3, 3, 0.25], where the first
value represents the number of hidden nodes (HN), the second the number of runs (NR)
and the last the momentum term (MT). In general, the experiment can be written as
[HN, NR, MT]. The momentum term level is then increased, and the process is repeated
for each factor until the maximum setting [10, 20, 0.95] is reached.
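The full set of [HN, NR, MT] settings visited in Experiment 1 can be enumerated as in the sketch below. The evaluate callback is hypothetical: it stands in for training the network and returning the training, validation and testing errors, and averaging over the three phases mirrors the description above.

```python
from itertools import product

# Factor levels for Experiment 1, as listed in the text.
HIDDEN_NODES = list(range(3, 11))     # 3 to 10: eight levels
RUNS = [3, 5, 7, 10, 15, 20]          # six levels
MOMENTUM = [0.25, 0.5, 0.75, 0.95]    # four levels

# product() varies the last factor fastest, so the momentum term is stepped first.
design = list(product(HIDDEN_NODES, RUNS, MOMENTUM))


def run_design(design, evaluate):
    """Average the three phase errors returned by a (hypothetical) evaluate callback.

    evaluate(hn, nr, mt) is assumed to train the network and return
    (training_error, validation_error, testing_error).
    """
    results = {}
    for hn, nr, mt in design:
        errors = evaluate(hn, nr, mt)
        results[(hn, nr, mt)] = sum(errors) / len(errors)
    return results


print(len(design), design[0], design[-1])   # 192 (3, 3, 0.25) (10, 20, 0.95)
```

Experiment 2 follows the same pattern, with its seven hidden-node levels, seven run levels and six learning-rate levels in place of the lists above.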
6.4.2 Experiment 2
In the second experiment, the momentum term was replaced by the learning rate as a
factor, and the momentum term was set at random. The number of hidden nodes and the
number of runs were retained as factors. The experiment therefore included the number
of hidden nodes, HN, the number of runs, NR, and the learning rate, LR, and can be
written as [HN, NR, LR]. The number of hidden nodes had seven levels (2, 4, 6, 8, 10,
12 and 14), the number of runs had seven levels (3, 5, 7, 9, 11, 15 and 20), and the
learning rate had six levels (0.05, 0.15, 0.25, 0.45, 0.65 and 0.95). The process was
repeated as in the first experiment until the maximum level of each factor was reached.
6.4.3 Experiment 3
Outliers are observations that may influence the overall analysis, as they can
decrease correlation values and increase the variance. It is therefore essential to
analyse the presence of outliers in this study. Outliers in a data set influence
the modelling accuracy as well as the estimated parameters, especially in statistical
analysis, as discussed by Hampel (1974), Andrew (1974), Rousseeuw and Leroy
(1987), Birkes and Dodge (1993), Mokhtar (1994) and Azme and Mokhtar (2004).
Barnett and Lewis (1995) and Rousseeuw and Leroy (1987) defined an outlier in a
data set as an observation, or subset of observations, which appears to be inconsistent
with the remainder of that data set. Our review suggests that little research has been
conducted on the influence of outliers in neural network modelling. Klein and
Rossin (1999a and 1999b) investigated the effects of data errors in neural network
modelling and found that neural network performance is influenced by errors in the
data set. Huber (1981) suggested that an observation be defined as an outlier if its
value lies outside the range µ ± 1.5σ̂, where µ is the mean of the data and σ̂ is
the standard deviation estimated from the data set. This study examines the effects
of outliers on the application of neural network models to oil palm yield data.
This study considers two factors, namely the percentage-outliers (P-O) and the
magnitude-outliers (M-O). The percentage-outliers factor is the percentage of the data
in the relevant part of the data set that is perturbed. The magnitude-outliers factor
is the degree to which the perturbed data deviate from the estimated mean. In this
study, we considered five input variables and one output variable, with 243
observations for each variable, giving 1458 observations in total. Six levels of the
percentage-outliers factor were considered: 5%, 10%, 15%, 20%, 25% and 30%. At the
5% level the data set contains 72 outliers, at the 10% level 144, at the 15% level 216,
at the 20% level 288, at the 25% level 360 and at the 30% level 432. Five levels of
magnitude-outliers were used, namely µ ± 2.0σ̂, µ ± 2.5σ̂, µ ± 3.0σ̂, µ ± 3.5σ̂ and
µ ± 4.0σ̂. The observations to be perturbed were selected randomly and replaced
uniformly with outliers. For each combination of percentage-outliers and
magnitude-outliers levels, the number of hidden nodes was increased from five to
thirty and the MSE values were recorded.
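The perturbation scheme can be sketched as follows, assuming NumPy. The text does not specify how the sign of each outlier or the exact replacement rule was chosen, so the random ± sign, the use of the sample standard deviation as σ̂ and the function names are assumptions; the flag_outliers helper applies the µ ± 1.5σ̂ rule quoted above.

```python
import numpy as np


def inject_outliers(values, n_outliers, k, seed=0):
    """Replace n_outliers randomly chosen values with mu + s*k*sigma_hat, s = +1 or -1.

    mu and sigma_hat are estimated from the clean data before any replacement.
    Returns the perturbed copy and the indices that were replaced.
    """
    x = np.asarray(values, dtype=float).copy()
    mu, sigma = x.mean(), x.std(ddof=1)
    rng = np.random.default_rng(seed)
    idx = rng.choice(x.size, size=n_outliers, replace=False)
    signs = rng.choice([-1.0, 1.0], size=n_outliers)   # assumed: random direction
    x[idx] = mu + signs * k * sigma
    return x, idx


def flag_outliers(values, c=1.5):
    """Flag values lying outside mu +/- c*sigma_hat."""
    x = np.asarray(values, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    return np.abs(x - mu) > c * sigma


clean = np.linspace(10.0, 30.0, 243)   # hypothetical yield-like values
perturbed, idx = inject_outliers(clean, n_outliers=12, k=3.0)
print(flag_outliers(perturbed).sum())
```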
6.5 RESULTS AND DISCUSSION
The results of the NN modelling are discussed in the following subsections. Section
6.5.1 presents a statistical analysis, in which analysis of variance (ANOVA) is used to
compare model performance across the different activation functions used in the
model. Section 6.5.2 discusses neural network performance and the selection of the
best NN architecture for each station in the inland and coastal areas. Section 6.5.3
presents the residual analysis. Sections 6.5.4 to 6.5.6 report the results of the
experimental design, covering the effects of the number of runs, the number of
hidden nodes, the momentum term and the learning rate on NN performance, as well
as the effect of outlier data.
6.5.1 Statistical Analysis
In this section, we discuss the effect of the combination of activation functions in
the hidden layer and output layer on NN performance. Six combinations were
considered, namely logsigmoid and logsigmoid (LL), logsigmoid and purelin (LP),
logsigmoid and tansigmoid (LT), tansigmoid and purelin (TP), tansigmoid and
logsigmoid (TL) and tansigmoid and tansigmoid (TT). We ran all the combinations and
recorded the mean squares error for each number of hidden nodes. As mentioned
before, the data were divided into three phases: training, validation and testing. The
average MSE over the three phases was used to study the overall performance of
each NN architecture. For the purpose of this study, MSE was the test variable for
each phase. We also tested the correlation between the predicted and observed values.
We tested whether all combinations of activation functions produce equal MSE
values. Accordingly, two hypotheses were tested: the null hypothesis, H0: the mean
MSE is equal for all combinations of activation functions, and the alternative
hypothesis, Ha: the mean MSE of at least one combination differs. In this case, the
dependent variable was the MSE value and the independent variable was the
combination of activation functions. Station ILD1 is used as an example. For each
combination of activation functions, the NN was run with 2 to 30 hidden nodes, and
the MSE values and correlation coefficient for each phase were recorded. The number
of hidden nodes depended on the maximum number of hidden nodes obtained from
equation (6.3). The MSE values were then rearranged and an analysis of variance was
performed, using the F statistic to test the null hypothesis. The results for the inland
area are presented in Table 6.1. At ILD1, the F values for training, validation, testing
and the average are 3.368, 17.997, 12.055 and 10.729 respectively, all statistically
significant at the 5% level with (5, 198) degrees of freedom. The other stations were
analysed using the same procedure. Almost all the stations showed p-values below
5% for their corresponding degrees of freedom, which signified rejection of the null
hypothesis, except at stations ILD6, ILD7 and ILDT. At station ILD6, the average MSE
did not differ significantly between the combinations of activation functions;
similarly, at station ILD7 the training MSE and the correlation did not differ
significantly.
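The one-way F statistic used above can be computed directly from group sums of squares. The sketch below is a minimal NumPy version in which each group would hold the MSE values of one combination of activation functions; the function name and the small example groups are illustrative. With six combinations of 34 values each it would give (5, 198) degrees of freedom, as reported for ILD1, and the p-value can be obtained from the F distribution, for example with scipy.stats.f.sf(F, df1, df2).

```python
import numpy as np


def one_way_anova(groups):
    """One-way ANOVA F statistic and its (between, within) degrees of freedom."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    k, n = len(groups), pooled.size
    # Between-group and within-group sums of squares.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, (df_between, df_within)


# Hypothetical MSE values for three combinations.
f_stat, df = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, df)   # 3.0 (2, 6)
```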
Table 6.1: The F statistics values for different combination activation functions used
at inland station
                              The F-value
Station   Training   Validate   Testing   Average   Correlation   Degrees of freedom
ILD1      3.368*     17.997*    12.055*   10.729*   3.062**       (5, 198)
ILD2      7.850*     15.516*    14.949*   10.091*   1.601         (5, 264)
ILD3      2.736**    12.899*    7.951*    4.431*    1.924         (5, 240)
ILD4      2.291**    15.055*    10.452*   13.700*   3.058**       (5, 265)
ILD5      1.859      2.306**    42.523*   16.715*   2.519**       (5, 168)
ILD6      3.132**    22.047*    4.755*    0.927     4.606*        (5, 132)
ILD7      1.766      13.455*    7.028*    5.812*    0.916         (5, 264)
ILDT      0.853      11.736*    12.742*   3.732*    0.613         (5, 264)
Note: * significant at 1% level; ** significant at 5% level
Table 6.2: The F statistics values for different combination activation functions used
at coastal station
                              The F-value
Station   Training   Validate   Testing   Average   Correlation   Degrees of freedom
CLD1      0.847      11.091*    18.495*   15.103*   6.454*        (5, 180)
CLD2      3.664*     7.724*     9.166*    10.197*   2.403*        (5, 192)
CLD3      3.265**    3.762*     1.918     2.905**   4.017*        (5, 54)
CLD4      1.295      12.413*    6.006*    8.957*    5.272*        (5, 222)
CLD5      1.524      10.218*    7.112*    1.615     2.139         (5, 180)
CLD6      3.232*     6.523*     10.145*   7.518*    6.146*        (5, 264)
CLD7      1.366      5.092*     3.354*    2.865**   5.911*        (5, 264)
CLDT      2.794**    8.083*     35.022*   13.037*   2.047         (5, 264)
Note: * significant at 1% level; ** significant at 5% level
We also tested the MSE values using a nonparametric test, which does not require
the normality assumption; its test statistic follows a χ² distribution (Lehman, 1998).
The results are displayed in Table 6.3. Almost all the chi-square tests were statistically
significant at the 1% or 5% level with 5 degrees of freedom in both the inland and
coastal areas, and the parametric and nonparametric tests led to similar conclusions.
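The text does not name the nonparametric test. One common choice whose statistic is referred to a χ² distribution with k − 1 degrees of freedom (here 6 − 1 = 5, matching Table 6.3) is the Kruskal-Wallis H test, sketched below in plain Python as an assumption rather than as the procedure actually used. The tie correction factor is omitted for brevity, and the function names and example groups are illustrative.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic (no tie correction) and its chi-square df."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    return h, len(groups) - 1


h, df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h, df)
```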
The Duncan test was performed using the average MSE. Table 6.4 gives details of the
similarities and differences between the combinations of activation functions. The
test gives a different result for each station, which means that NN performance
depends on the characteristics of the data at each site. At station CLD2, the LL
combination gives the smallest average MSE, compared with LP and the others.
However, the differences between the combinations are rather small, and the MSE
fluctuates as hidden nodes are added. Station CLD7 presents an interesting situation,
where five of the combinations performed comparably to one another, while the TP
combination stood apart. The same pattern was observed at station ILD3 in the inland
area. Based on these findings, no generalization can be made about the selection of
the combination of activation functions, and we suggest that a 'trial and error'
approach is a good alternative for selecting the best combination.
Table 6.3: The Chi-Square values of MSE testing for the inland and coastal stations
                         The Chi-Square value
Station   Training   Validation   Testing    Correlation   Average
Inland stations
ILD1      18.873*    89.812*      50.565*    62.292*       17.268*
ILD2      38.581*    80.377*      123.511*   77.975*       9.615
ILD3      13.002**   75.554*      16.942*    18.634*       20.476*
ILD4      28.548*    84.575*      62.190*    68.517*       19.074*
ILD5      15.013**   18.960*      116.457*   86.067*       14.342**
ILD6      14.432**   69.147*      30.352*    14.223**      19.550*
ILD7      6.888      59.872*      45.036*    23.096*       5.646
ILDT      7.949      94.111*      57.421*    40.484*       11.315**
Coastal stations
CLD1      5.245      49.758*      65.279*    48.697*       28.536*
CLD2      19.341*    44.742*      47.265*    52.928*       9.819
CLD3      16.727*    14.322**     7.375      11.229**      20.495*
CLD4      6.369      51.979*      41.195*    55.883*       29.942*
CLD5      6.150      36.313*      69.254*    30.774*       11.038*
CLD6      16.458*    27.999*      43.931*    23.837*       31.612*
CLD7      6.902      63.029*      13.042**   18.063**      9.119
CLDT      16.602*    39.407*      104.941*   64.055*       23.134*
Note: * significant at 1% level; ** significant at 5% level; df = 5
Table 6.4: Duncan test for the average of MSE for homogeneous subsets for the
inland and coastal areas
Station   Combination activation function
Inland stations
ILD1      TL   TT   LP   TP   LL   LT
ILD2      LT   LL   TL   LP   TT   TP
ILD3      TL   LT   TP   TT   LL   LP
ILD4      TL   LT   LL   TT   TP   LP
ILD5      LL   TT   TP   TL   LT   LP
ILD6      Not significant
ILD7      LL   TT   LT   TL   LP   TP
ILDT      TL   LT   TP   LL   TT   LP
Coastal stations
CLD1      TL   LT   TP   LL   LP   TT
CLD2      LL   LT   TL   TT   TP   LP
CLD3      TL   LP   LL   TP   LT   TT
CLD4      TT   LL   TL   LT   TP   LP
CLD5      Not significant
CLD6      LL   TL   TT   LT   LP   TP
CLD7      LL   TL   LT   TT   LP   TP
CLDT      TL   LT   TT   LL   TP   LP
6.5.2 Neural Network Performance
The best models, in terms of the number of hidden nodes, were selected based on the
minimum average MSE and are presented in Table 6.5 for the inland area and Table
6.6 for the coastal area. The values in brackets [ ] represent the number of hidden
nodes; the optimum number of hidden nodes varies from station to station. The mean
squares error for each model and each phase (training, validation and testing) is
shown in Tables 6.5 and 6.6. For the individual inland stations, the average MSE
ranges from 0.0200 to 0.0372 (Table 6.5), and the combined station ILDT recorded
the lowest MSE value (0.0049). The MSE values in the training phase are slightly
lower than those in the validation and testing phases. For the individual coastal
stations, the average MSE ranges from 0.0220 to 0.0477, and CLDT recorded the
highest value (0.0520) (Table 6.6).
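Selecting the number of hidden nodes by the minimum average MSE amounts to the sketch below. It assumes that the average is the simple mean of the training, validation and testing MSE, and the dictionary layout, function name and example values are hypothetical.

```python
def select_hidden_nodes(mse_by_hidden_nodes):
    """Pick the hidden-node count with the smallest mean of (train, val, test) MSE.

    mse_by_hidden_nodes maps a hidden-node count to a (train, val, test) tuple.
    Returns the chosen count and its average MSE.
    """
    averages = {hn: sum(m) / len(m) for hn, m in mse_by_hidden_nodes.items()}
    best = min(averages, key=averages.get)
    return best, averages[best]


# Hypothetical MSE values for three candidate architectures.
candidates = {5: (0.02, 0.03, 0.04), 6: (0.01, 0.02, 0.03), 7: (0.01, 0.03, 0.05)}
print(select_hidden_nodes(candidates))
```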
Table 6.5: Mean squares error for training, validation, testing and average of the
neural network model in the inland area
                       Mean squares error estimate
Station/Model    Training   Validation   Testing   Average
ILD1: LL [24]    0.0176     0.0326       0.0372    0.0292
ILD2: LL [27]    0.0173     0.0207       0.0234    0.0200
ILD3: TP [24]    0.0177     0.0351       0.0444    0.0304
ILD4: TP [20]    0.0286     0.0313       0.0259    0.0286
ILD5: LL [13]    0.0136     0.0225       0.0247    0.0202
ILD6: TT [16]    0.0179     0.0312       0.0366    0.0285
ILD7: LL [38]    0.0354     0.0391       0.0373    0.0372
ILDT: LP [23]    0.0042     0.0057       0.0048    0.0049
Table 6.6: Mean squares error for training, validation, testing and average of the
neural network model in the coastal area
                       Mean squares error estimate
Station/Model    Training   Validation   Testing   Average
CLD1: TL [16]    0.0312     0.0542       0.0436    0.0430
CLD2: LP [23]    0.0135     0.0369       0.0928    0.0477
CLD3: LP [06]    0.0094     0.0138       0.0429    0.0220
CLD4: LT [07]    0.0236     0.0538       0.0255    0.0343
CLD5: TP [06]    0.0364     0.0525       0.0545    0.0436
CLD6: LT [39]    0.0265     0.0436       0.0456    0.0394
CLD7: TP [25]    0.0202     0.0347       0.0285    0.0333
CLDT: LP [38]    0.0384     0.0493       0.0474    0.0520
Table 6.7 shows the correlation coefficient between the observed values and the
values predicted by the NN model. A higher correlation coefficient indicates that the
predicted values are closer to the actual values. At station ILD1, the highest
correlation was obtained with the LL combination (r = 0.7984, R² = 0.6374), and the LL
combination also produced the highest value at station ILD2 (r = 0.7840, R² = 0.6146).
The best linear fit plots for all stations are presented in Appendix P and Appendix Q
for the inland and coastal areas respectively; they show how close the predicted and
actual values are. For the inland area, the highest value of r was 0.8361 (ILDT, LP
combination), followed by station ILD5 with r = 0.8114, station ILD1 with r = 0.7984,
station ILD6 with r = 0.7900, station ILD3 with r = 0.7853 and station ILD2 with
r = 0.7840.
For the coastal area, the highest value of r was recorded at station CLD3
(0.8703), followed by station CLD4 with r = 0.8313. In general, the r values for the
inland area were greater than those for the coastal area. This suggests that the effect
of foliar nutrient composition on oil palm yield is more evident in the inland areas
than in the coastal areas.
Table 6.7: The correlation coefficient of the neural network model
          The activation function combination (input to hidden layer and hidden
          layer to output layer)
Station   LL       LP       LT       TP       TL       TT
Inland stations
ILD1      0.7984   0.7461   0.7094   0.7575   0.7336   0.7274
ILD2      0.7840   0.7471   0.7574   0.7433   0.7530   0.7586
ILD3      0.7754   0.7662   0.7634   0.7853   0.7712   0.7558
ILD4      0.5687   0.5988   0.5858   0.5900   0.5559   0.5758
ILD5      0.8114   0.7529   0.7635   0.7927   0.7870   0.8051
ILD6      0.7306   0.7210   0.7114   0.7297   0.7136   0.7900
ILD7      0.4498   0.4961   0.4647   0.4912   0.4961   0.4245
ILDT      0.8288   0.8361   0.8360   0.8291   0.8331   0.8335
Coastal stations
CLD1      0.7395   0.7756   0.7836   0.7328   0.7404   0.6718
CLD2      0.5880   0.5948   0.5625   0.6425   0.6094   0.5826
CLD3      0.8633   0.8603   0.8640   0.8657   0.8703   0.8460
CLD4      0.8313   0.7974   0.7922   0.7974   0.8135   0.7936
CLD5      0.5359   0.5155   0.5180   0.5397   0.5412   0.5161
CLD6      0.4031   0.4334   0.4560   0.4625   0.3863   0.3965
CLD7      0.6663   0.6590   0.6556   0.7054   0.6763   0.6572
CLDT      0.5241   0.5489   0.5111   0.5095   0.4916   0.5105
After the NN architecture was selected, the MSE, RMSE, MAE and MAPE were
calculated for each station and each combination, to measure the accuracy of the
model predictions against the actual values. Table 6.8 shows the MAPE value for each
station and each combination of activation functions (the other measures are shown
in Appendix R and Appendix S). Normally, the model that produces the minimum error
is chosen, and the shaded MAPE value marks the model selected as the best.
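The four accuracy measures can be computed as in the following sketch (NumPy assumed; the function name and sample values are illustrative). MAPE is returned as a fraction, which appears to be how it is reported in Table 6.8; multiply by 100 to express it as a percentage.

```python
import numpy as np


def error_measures(actual, predicted):
    """MSE, RMSE, MAE and MAPE of predictions against actual values."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    e = a - p
    mse = np.mean(e ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(e)),
        "MAPE": np.mean(np.abs(e / a)),   # fraction; requires no zero actual values
    }


print(error_measures([10.0, 20.0, 40.0], [12.0, 18.0, 40.0]))
```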
Table 6.8: The MAPE values of the neural network model
          The activation function combination (input layer to hidden layer and
          hidden layer to output layer)
Station   LL       LP       LT       TP       TL       TT
Inland area
ILD1      0.1159   0.1318   0.1460   0.1443   0.1429   0.1456
ILD2      0.1162   0.1257   0.1244   0.1306   0.1263   0.1223
ILD3      0.1195   0.1190   0.1126   0.1092   0.1141   0.1166
ILD4      0.1356   0.1276   0.1304   0.1288   0.1350   0.1316
ILD5      0.0944   0.1133   0.1067   0.0964   0.0956   0.0968
ILD6      0.0987   0.1010   0.0992   0.1018   0.1052   0.0719
ILD7      0.1674   0.1580   0.1691   0.1642   0.1614   0.1747
ILDT      0.0576   0.0560   0.0564   0.0578   0.0570   0.0573
Coastal area
CLD1      0.1456   0.1337   0.1398   0.1437   0.1220   0.1461
CLD2      0.0725   0.0638   0.0717   0.0665   0.0779   0.0743
CLD3      0.0756   0.0657   0.0750   0.0688   0.0844   0.0789
CLD4      0.1126   0.1298   0.1118   0.1282   0.1163   0.1125
CLD5      0.1488   0.1498   0.1578   0.1431   0.1530   0.1514
CLD6      0.1279   0.1237   0.1232   0.1274   0.1335   0.1302
CLD7      0.1066   0.1086   0.1088   0.1003   0.1056   0.1103
CLDT      0.1499   0.1456   0.1516   0.1572   0.1531   0.1518
Figures 6.6 to 6.10 show the actual and predicted FFB yield values from the NN
model for the best models discussed earlier. The pink line represents the predicted
values and the blue line the actual values. Figures 6.6 and 6.7 show the plots for the
inland stations. The predicted and actual lines can hardly be distinguished, except at
stations ILD4 and ILD7, where the predicted FFB yield values are noticeably smaller
than the actual values. This is not surprising, because the correlation coefficients for
these two models are low compared with the other stations, and their MAPE values
are among the highest. The combined plot for all the inland stations (ILDT, Figure 6.8)
shows that the predicted and actual values do not differ much. Figures 6.8 to 6.10
show the actual and predicted FFB yield values for the coastal stations. As for the
inland area, the neural network model represents the general behaviour of the data
for all stations.
[Figure 6.6 panels ILD1, ILD2 and ILD3: yield (ton/hec/yr) against observation number; series Actual and Pred.]
* Pred. = predicted
Figure 6.6: The actual and predicted FFB yield for ILD1, ILD2 and ILD3 stations
using the NN model
[Figure 6.7 panels ILD4, ILD5, ILD6 and ILD7: yield (ton/hec/yr) against observation number; series Actual and Pred.]
* Pred. = predicted
Figure 6.7: The actual and predicted FFB yield for ILD4, ILD5, ILD6 and ILD7
stations using the NN model
[Figure 6.8 panels ILDT, CLD1, CLD2 and CLD3: yield (ton/hec/yr) against observation number; series Actual and Pred.]
* Pred. = predicted
Figure 6.8: The actual and predicted FFB yield for ILDT, CLD1, CLD2 and CLD3
stations using the NN model
[Figure 6.9 panels CLD4, CLD5, CLD6 and CLD7: yield (ton/hec/yr) against observation number; series Actual and Pred.]
* Pred. = predicted
Figure 6.9: The actual and predicted FFB yield for CLD4, CLD5, CLD6 and CLD7
using the NN model
[Figure 6.10 panel CLDT: yield (ton/hec/yr) against observation number; series Actual and Pred.]
* Pred. = predicted
Figure 6.10: The actual and predicted FFB yield for CLDT using the NN model
6.5.3 Residual Analysis
Residual analysis was also performed to examine the distribution of the modelling
errors. Figures 6.11 and 6.12 present the error distributions after the NN models are
fitted for the inland and coastal areas respectively. Figure 6.11 (ILD1 to ILD7) shows
the error distribution for the inland stations, and Figure 6.11 (ILDT) shows it for the
combination of all inland stations. The errors at all stations appear to scatter
randomly around the zero line. Figure 6.12 (CLD1 to CLD7) presents the error
distribution for the coastal stations, and Figure 6.12 (CLDT) shows the error
distribution for the combination of all coastal stations. We also found that all the
errors were normally distributed, which indicates that the selected neural network
models are adequate for fitting the oil palm yield data.
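The text does not state which normality test was applied to the residuals. As one possible check, the sketch below computes the Jarque-Bera statistic from the sample skewness and kurtosis of the residuals (actual minus predicted). Under normality it approximately follows a χ² distribution with 2 degrees of freedom, so values above about 5.99 would reject normality at the 5% level. The function name and the example residuals are illustrative.

```python
import numpy as np


def jarque_bera(residuals):
    """Jarque-Bera statistic, sample skewness and (non-excess) kurtosis."""
    e = np.asarray(residuals, dtype=float)
    d = e - e.mean()
    m2, m3, m4 = np.mean(d ** 2), np.mean(d ** 3), np.mean(d ** 4)
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2          # equals 3 for a normal distribution
    jb = e.size / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, skew, kurt


actual = np.array([20.0, 22.5, 19.0, 25.0, 21.0])      # hypothetical yields
predicted = np.array([20.4, 22.0, 19.3, 24.1, 21.6])
print(jarque_bera(actual - predicted))
```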
[Figure 6.11 panels ILD1 to ILD7 and ILDT: error against observation number]
Figure 6.11: The error distribution plots of the neural network model for the inland
stations
[Figure 6.12 panels CLD1 to CLD7 and CLDT: error against observation number]
Figure 6.12: The error distribution plots of the neural network model for the coastal
stations
6.5.4 Results of Experiment 1
After completing the experiment, we rearranged the data for analysis of variance
and response surface analysis. The F value for hidden nodes, 8.7759 (p < 0.0001,
df = (7, 1912)), indicated that the performance of the neural network model was
significantly affected by the number of hidden nodes. The F values for the number of
runs, 1.6950 (p = 0.1330, df = (5, 1914)), and for the momentum term, 1.3300
(p = 0.2630, df = (3, 1916)), show that neither factor significantly influenced the
overall performance of the neural networks. We conclude that only the number of
hidden nodes has a significant influence on NN performance, and that no significant
effect of the number of runs or the momentum term on the neural network's
performance was detected.
6.5.5 Results of Experiment 2
After running the analysis of variance, we found that the F value for hidden nodes is
8.0480 (p < 0.0001, df = (6, 2932)) and the F value for the number of runs is 2.8840
(p = 0.0080, df = (6, 2932)), indicating that both factors affect the neural network's
performance. However, the F value for the learning rate is 1.6090 (p = 0.1540,
df = (5, 2933)), which means that the null hypothesis cannot be rejected; we conclude
that the learning rate does not significantly influence the neural network's
performance.
6.5.6 Results of Experiment 3
This section discusses the results of the analysis of variance (ANOVA) tests and
independent sample t-tests (Norušis 1998) conducted to test the effects of
percentage-outliers and magnitude-outliers on MSE. Tests were also performed to
determine which combinations of percentage-outliers and magnitude-outliers differ
significantly from the base-case scenario with no outliers, and these findings are
reported. For both experiments (outliers in the training data and outliers in the test
data), actual and predicted values were compared using the mean squares error
(MSE) as a measure of modelling accuracy.
(i) Experiment for outliers in the training data
The estimated MSE results, using simulated inaccuracies for magnitude-outliers and
percentage-outliers in oil palm yield modelling, are given in Figure 6.13. The results
show that as the percentage-outliers increases from 5% to 30%, the MSE values also
increase, indicating a decrease in modelling accuracy. As the magnitude-outliers
increases from 2σ̂ to 4σ̂, the MSE values also increase, again indicating a decrease
in modelling accuracy in the training data (Figure 6.14).
[Figure 6.13 plot: MSE (0 to 0.09) against percentage-outlier (0 to 30), one line for each magnitude-outlier level (2, 2.5, 3, 3.5 and 4)]
Figure 6.13: The MSE values for different levels of the percentage-outliers in the
training data
[Figure 6.14 plot: MSE (0 to 0.09) against magnitude-outlier (0 to 4), one line for each percentage-outlier level (5, 10, 15, 20, 25 and 30)]
Figure 6.14: The MSE values for different levels of the magnitude-outliers in the
training data
A one-factor ANOVA test was conducted to investigate the individual effects of
percentage-outliers and magnitude-outliers on the neural network's performance. The
independent variables are the percentage-outliers (5%, 10%, 15%, 20%, 25% and 30%)
and the magnitude-outliers (µ ± 2.0σ̂, µ ± 2.5σ̂, µ ± 3.0σ̂, µ ± 3.5σ̂ and µ ± 4.0σ̂).
The F values were 18.481 (p < 0.001, df = (5, 179)) and 3.988 (p = 0.002,
df = (4, 179)) for the percentage-outliers and magnitude-outliers respectively,
indicating that both factors had a statistically significant effect on modelling accuracy.
Following this, a two-factor ANOVA test was conducted to examine the effects of both
independent variables on MSE simultaneously. Significant main effects were found for
the percentage-outliers (F = 28.246, df = (5, 154)) and the magnitude-outliers
(F = 3.332, df = (4, 154)), as well as for their interaction (F = 2.507, df = (20, 154)),
with p-values less than 0.05. These results indicate that modelling accuracy in the
training data can be affected by both the percentage-outliers and the magnitude-outliers.
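For a balanced layout (the same number of runs in every percentage-outlier × magnitude-outlier cell), the two-factor ANOVA with interaction reduces to the sums of squares in the sketch below. The balanced-layout assumption, the array shape and the function name are choices made for illustration; unbalanced data would need a regression-based ANOVA instead.

```python
import numpy as np


def two_way_anova(y):
    """Balanced two-factor ANOVA with interaction.

    y has shape (a, b, r): a levels of factor A, b levels of factor B and
    r replicate runs per cell. Returns F statistics and degrees of freedom.
    """
    y = np.asarray(y, dtype=float)
    a, b, r = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))      # factor-A level means
    mean_b = y.mean(axis=(0, 2))      # factor-B level means
    mean_ab = y.mean(axis=2)          # cell means
    ss_a = b * r * np.sum((mean_a - grand) ** 2)
    ss_b = a * r * np.sum((mean_b - grand) ** 2)
    ss_ab = r * np.sum((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
    ss_e = np.sum((y - mean_ab[:, :, None]) ** 2)
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (r - 1)}
    ms_e = ss_e / df["E"]
    f = {
        "A": (ss_a / df["A"]) / ms_e,
        "B": (ss_b / df["B"]) / ms_e,
        "AB": (ss_ab / df["AB"]) / ms_e,
    }
    return f, df


# Hypothetical 2 x 3 layout with two runs per cell and no interaction.
y = np.array([[[i + j + 1.0, i + j - 1.0] for j in range(3)] for i in range(2)])
print(two_way_anova(y))
```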
When more than two levels of factor were conducted, the ANOVA results did
not indicate where significant differences occurred. For example, while the
180
percentage-outliers is a significant factor, this difference may be a result of the
percentage-outliers changing from 10% to 15%, or 15% to 20%, or 25% to 30%. It
could also have come from a larger jump, such as 5% to 25% or 10% to 30%.
Independent-sample t-tests with 9 degrees of freedom were therefore performed to compare the MSE values obtained with no outliers against those obtained under each combination of percentage-outliers and magnitude-outliers, in order to determine exactly where the significant differences occur. The results of the t-tests are presented in Table 6.9. For every magnitude-outlier level σ̂, significant differences (p < 0.05) were found between the data sets with no outliers and those with 15%, 20%, 25% and 30% percentage-outliers. This means that the neural network was first influenced by the outliers in the training data when the percentage-outliers reached 15%; when the percentage-outliers in the training data was below 15%, the outliers had no statistically significant effect on the network.
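The comparisons in Table 6.9 use independent-sample t-tests. A minimal pooled-variance version is sketched below; the function name is hypothetical, and the pooled (equal-variance) form is an assumption, since the text does not state which variant was used. Passing the no-outlier MSE values first gives a negative t when the outliers raise the MSE, matching the sign of most entries in the table. With 9 degrees of freedom the two-sided 5% critical value is about 2.262, which is consistent with the starred entries (for example, -2.080 is not starred while -2.722 is).

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 5% critical value of Student's t with 9 degrees of freedom
T_CRITICAL_DF9 = 2.262

def pooled_t_test(sample_a, sample_b):
    """Independent two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom) with df = n_a + n_b - 2.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # statistics.variance uses the (n - 1) sample denominator
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df
```

A result would be flagged as significant, as with the asterisks in the table, when `abs(t)` exceeds the critical value for the returned degrees of freedom.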
Table 6.9: The t-statistic values in the training data

                      Percentage-outliers
Magnitude-outliers    5%       10%      15%       20%       25%       30%
2.0σ̂                  0.410   -0.918   -2.902*   -3.797*   -3.374*   -2.722*
2.5σ̂                  0.208   -0.597   -2.857*   -3.266*   -3.687*   -3.517*
3.0σ̂                 -1.348   -2.080   -3.301*   -3.218*   -3.979*   -3.503*
3.5σ̂                 -0.897   -0.142   -3.048*   -3.178*   -4.805*   -6.867*
4.0σ̂                 -0.861   -1.991   -2.831*   -3.990*   -5.147*   -6.211*

* p-value < 0.05; degrees of freedom = 9.
(ii) Experiment for outliers in the test data
In this section, an experiment on outliers in the test data was conducted using the same ANOVA and independent-sample t-test procedures as for the training data. The estimated MSE values obtained with the simulated magnitude-outliers and percentage-outliers in the oil palm yield modelling are given in Figure 6.15. They show that as the percentage-outliers increases from 5% to 30%, the MSE also increases, indicating a decrease in estimation accuracy. Likewise, as the magnitude-outliers increases from 2σ̂ to 4σ̂, the MSE increases, indicating a decrease in modelling accuracy (Figure 6.16).
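The contamination scheme behind the percentage-outliers and magnitude-outliers is not spelled out in this section. One plausible reading, sketched below purely as an assumption, replaces a given percentage of randomly chosen observations with values at µ ± kσ̂, where k is the magnitude level (2.0 to 4.0) and the sign of each outlier is random. The function `inject_outliers` and its parameters are hypothetical and are not taken from the study.

```python
import random
from statistics import mean, stdev

def inject_outliers(values, percentage, magnitude, seed=None):
    """Return a copy of `values` in which `percentage` percent of the
    observations are replaced by mu +/- magnitude * sigma_hat.

    Assumed scheme (illustrative only): positions are drawn at random
    without replacement, and each outlier's sign is random.
    """
    rng = random.Random(seed)
    mu = mean(values)
    sigma_hat = stdev(values)  # sample standard deviation
    n_outliers = round(len(values) * percentage / 100)
    contaminated = list(values)
    for idx in rng.sample(range(len(values)), n_outliers):
        sign = rng.choice((-1, 1))
        contaminated[idx] = mu + sign * magnitude * sigma_hat
    return contaminated
```

Calling this for each percentage level (5% to 30%) and magnitude level (2.0 to 4.0) would produce the grid of contaminated data sets whose MSE values the ANOVA and t-tests compare.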
(Plot: MSE against percentage-outlier level, 0% to 30%, with one series for each magnitude-outlier level from 2.0σ̂ to 4.0σ̂.)
Figure 6.15: The MSE values for different levels of the percentage-outliers in the test data
(Plot: MSE against magnitude-outlier level, 2.0σ̂ to 4.0σ̂, with one series for each percentage-outlier level from 5% to 30%.)
Figure 6.16: The MSE values for different levels of the magnitude-outliers in the test data
A one-factor ANOVA test was conducted to investigate the individual effects of the percentage-outliers and the magnitude-outliers on the neural network's performance in the test data set. The independent variables are the percentage-outliers (6 levels) and the magnitude-outliers (5 levels). The F values were 12.171 (p < 0.001, df = (5, 179)) for the percentage-outliers and 3.570 (p = 0.004, df = (4, 179)) for the magnitude-outliers, indicating that both factors are statistically significant and therefore affect modelling accuracy.
Next, a two-factor ANOVA test was conducted to investigate the effects of both independent variables on MSE simultaneously. Significant main effects were found for the percentage-outliers (F = 11.709, df = (5, 154)) and the magnitude-outliers (F = 2.640, df = (4, 154)), as well as for their interaction (F = 2.273, df = (20, 154)), with all p-values less than 0.05. These results indicate that the percentage-outliers and the magnitude-outliers both affect modelling accuracy.
Independent-sample t-tests were also performed to compare the MSE values obtained with no outliers against those obtained under each combination of percentage-outliers and magnitude-outliers, in order to determine exactly where the significant differences occur. The results of the t-tests, with 9 degrees of freedom, are presented in Table 6.10. For every magnitude-outlier level σ̂, significant differences (p < 0.05) were found between the data sets with no outliers and those with 15%, 20%, 25% and 30% percentage-outliers. It can therefore be concluded that the neural network was first influenced by the outliers when the percentage-outliers reached 15%, and that it is resilient to outliers when the percentage-outliers in the test data is below 15%. This result is consistent with that obtained for the training data.
Table 6.10: The t-statistic values for the test data

                      Percentage-outliers
Magnitude-outliers    5%       10%      15%       20%       25%       30%
2.0σ̂                 -1.043   -1.196   -3.092*   -5.429*   -2.558*   -8.283*
2.5σ̂                 -1.365   -0.982   -2.814*   -4.304*   -3.073*   -6.072*
3.0σ̂                 -0.567   -1.442   -3.535*   -4.461*   -5.086*   -5.669*
3.5σ̂                 -0.090   -0.523   -2.999*   -3.619*   -5.902*   -6.768*
4.0σ̂                 -0.172   -0.346   -3.061*   -3.322*   -5.141*   -3.355*

* p-value < 0.05; degrees of freedom = 9.
6.6 COMPARATIVE STUDY ON YIELD MODELLING
The MLR and RMR models were discussed in Chapter 5. In this section, we examine the performance of each model and compare them with the NN model. The comparison is based on error measures, namely MSE, RMSE, MAE and MAPE, and on the correlation coefficient (Table 6.11). Each station is described below in terms of its MAPE and correlation values.
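The error measures used in Tables 6.11 and 6.12 can be computed as below, assuming the usual textbook definitions rather than the study's own code. MAPE is returned as a fraction, as in the tables (0.1159 corresponds to 11.59%), and assumes that no observed yield is zero; the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def error_measures(observed, predicted):
    """MSE, RMSE, MAE, MAPE (as a fraction) and correlation for one model."""
    errors = [o - p for o, p in zip(observed, predicted)]
    mse = mean(e ** 2 for e in errors)
    return {
        "MSE": mse,
        "RMSE": sqrt(mse),
        "MAE": mean(abs(e) for e in errors),
        # Relative error against the observed yield; assumes no observation is zero
        "MAPE": mean(abs(e) / abs(o) for e, o in zip(errors, observed)),
        "Correlation": pearson_r(observed, predicted),
    }
```

Applying `error_measures` to the observed FFB yields and the predictions of each model at a station would give one row of the tables.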
The MAPE value for the MLR model at station ILD1 is 0.1623, or as a
percentage, 16.23% error, whereas the RMR model decreased the error to 15.79%.
The NN model recorded the minimum MAPE value as 0.1159, or an 11.59% error.
The correlation coefficient for the NN model was also the highest at 0.7984,
compared to the MLR and the RMR models at 0.6260 and 0.7554 respectively.
For station ILD2, the MAPE values for the MLR and RMR models were 0.1483 and 0.1555 respectively, compared with 0.1162 for the NN model, which means that the overall error from the NN model was about 11.62%. The correlation values for the RMR and NN models are quite close to each other, at 0.7732 and 0.7810 respectively.
The MAPE values for station ILD3 using the MLR, RMR and NN models are 14.04%, 14.03% and 10.92% respectively, while the correlation coefficients are 0.6360, 0.6169 and 0.7853 respectively. The MAPE values recorded at station ILD4 are comparatively high, at 15.06%, 17.65% and 12.76% for the MLR, RMR and NN models respectively. This might be due to a weak response of FFB yield to the nutrient composition at this station, an interpretation supported by the low correlation coefficients recorded for all three models (0.4300, 0.4457 and 0.5988).
The MAPE recorded at station ILD5 was 14.97% for the MLR model, 14.83% for the RMR model and 9.44% for the NN model. Furthermore, the correlation coefficient for the NN model was comparatively high, at 0.8114 compared with 0.6330 and 0.5681 for the MLR and RMR models respectively. The MAPE values for the MLR and RMR models at station ILD6 are 0.1242 and 0.1778 respectively, whereas the NN model recorded a much lower value of 0.0719. At station ILD7, all models recorded low correlations, ranging from 0.3430 to 0.4961, similar to station ILD4. For ILDT, the percentage error is 18.31% for the MLR model, 19.01% for the RMR model and 5.60% for the NN model.
Table 6.11: The MSE, RMSE, MAE, MAPE and correlation for MLR, RMR and NN performance for the inland area

Inland stations
Station  Model   MSE       RMSE     MAE      MAPE     Correlation
ILD1     MLR     21.4611   4.6326   3.6667   0.1623   0.6260
         RMR     25.0822   5.0082   3.7602   0.1579   0.7554
         NN      13.1079   3.6204   2.6677   0.1159   0.7984
ILD2     MLR     16.8930   4.1101   3.2249   0.1483   0.6490
         RMR     18.4810   4.2989   3.3185   0.1555   0.7732
         NN      11.4734   3.3872   2.5910   0.1162   0.7810
ILD3     MLR     18.3084   4.2788   3.4338   0.1404   0.6360
         RMR     18.4817   4.2990   3.4290   0.1403   0.6169
         NN      11.9431   3.4558   2.7076   0.1092   0.7853
ILD4     MLR     17.8201   4.2214   3.4573   0.1506   0.4300
         RMR     26.0176   5.1001   4.0155   0.1765   0.4457
         NN      14.0759   3.7517   2.9545   0.1276   0.5988
ILD5     MLR     13.8848   3.7262   3.0457   0.1497   0.6330
         RMR     13.9845   3.7395   3.0222   0.1483   0.5681
         NN       5.9225   2.4336   1.8398   0.0944   0.8114
ILD6     MLR     14.0868   3.7532   3.0552   0.1242   0.5630
         RMR     27.8998   5.2820   4.4151   0.1778   0.5593
         NN       7.0829   2.6613   1.7861   0.0719   0.7900
ILD7     MLR     23.5147   4.8492   3.8944   0.1798   0.3430
         RMR     23.7693   4.8754   3.8651   0.1745   0.3568
         NN      20.4788   4.5253   3.5087   0.1580   0.4961
ILDT     MLR     24.8232   4.9822   4.0833   0.1831   0.3840
         RMR     29.0756   5.3922   4.2919   0.1901   0.4932
         NN       2.8263   1.6811   1.2689   0.0560   0.8361
Table 6.12 presents the summary of measurements for the three models considered in this study for the coastal area. For example, the percentage error at station CLD1 is 17.66% for MLR, 16.25% for RMR and 12.20% for NN. These figures indicate that the NN model fits the data more closely than the regression approaches. In terms of adequacy (the correlation coefficient), the MLR model (0.6160) is much better than the RMR model (0.3438) at this station. However, the NN model still shows some improvement, with the correlation rising from 0.6160 for the MLR model to 0.7404 for the NN model.
For stations CLD2 and CLD3, the error percentages are quite low, between about 6% and 8%, indicating that the models fit the data well. The correlation coefficients for station CLD3 do not differ much among the three fitted models. At station CLD4, in contrast, the correlation coefficient for the NN model is very different from those of the MLR and RMR models: the NN model recorded 0.8312, compared with 0.2240 and 0.4748 for the MLR and RMR models respectively.
Table 6.12: The MSE, RMSE, MAE, MAPE and correlation for MLR, RMR and neural network performance for the coastal area

Coastal area
Station  Model   MSE       RMSE     MAE      MAPE     Correlation
CLD1     MLR     24.4451   4.9442   4.0550   0.1766   0.6160
         RMR     28.4546   5.3343   4.0660   0.1625   0.3438
         NN      15.1412   3.8911   2.8715   0.1220   0.7404
CLD2     MLR      8.3597   2.8913   2.3787   0.0813   0.4130
         RMR      9.5160   3.0848   2.4246   0.0843   0.5538
         NN       7.5109   2.7406   1.9190   0.0638   0.6422
CLD3     MLR      7.5409   2.7461   2.1067   0.0839   0.8290
         RMR      7.5526   2.7482   2.1055   0.0838   0.7186
         NN       6.6892   2.5863   1.7651   0.0657   0.8703
CLD4     MLR     40.2150   6.3415   5.6664   0.2562   0.2240
         RMR     80.9058   8.9948   5.8929   0.3064   0.4748
         NN      13.2776   3.6438   2.6550   0.1118   0.8312
CLD5     MLR     23.7183   4.8701   3.9941   0.1729   0.3320
         RMR     23.7804   4.8765   3.9849   0.1725   0.3397
         NN      18.5634   4.3085   3.3054   0.1431   0.5397
CLD6     MLR     17.6260   4.1983   3.3901   0.1394   0.2080
         RMR     17.6583   4.2022   3.3828   0.1389   0.2231
         NN      15.0867   3.8841   3.0213   0.1232   0.4560
CLD7     MLR     17.5519   4.1895   3.3821   0.1301   0.4810
         RMR     23.8404   4.8827   3.6051   0.1505   0.3883
         NN      11.6043   3.4065   2.6659   0.1003   0.7054
CLDT     MLR     27.3915   5.2337   4.2791   0.1804   0.2100
         RMR     33.7806   5.8121   4.4196   0.2000   0.3742
         NN      20.0818   4.4812   3.5495   0.1456   0.5489
At stations CLD5, CLD6 and CLDT, the MLR and RMR models recorded comparatively high MAPE values, between about 13% and 20%. Furthermore, the correlation coefficients attained are low compared with the other stations. Even so, the NN model increased the correlation coefficient at all three stations. For example, at station CLD5 the correlation increased from 0.3320 for the MLR model to 0.5397 for the NN model, and at station CLD6 it increased from 0.2080 to 0.4560. CLDT also shows an increase, from 0.2100 to 0.5489.
Figures 6.17 and 6.18 display graphically the differences in the correlation coefficients of the three models for the inland and coastal areas. The first and second bars represent the correlation values from the MLR and RMR models respectively, while the third bar shows the correlation value from the NN model. For all inland stations except ILD2, the correlation values of the NN model were considerably higher than those of the regression approaches. At station ILD2 the correlation values of the RMR and NN models were quite close, although the NN model still gave the higher value. For the coastal area, all correlation values from the NN model were higher than those from the regression models.
(Bar chart: correlation coefficient by station, ILD1 to ILD7 and ILDT, with bars for MLR, RMR and NN.)
Figure 6.17: The correlation coefficients from the MLR, RMR and NN models for the inland area
(Bar chart: correlation coefficient by station, CLD1 to CLD7 and CLDT, with bars for MLR, RMR and NN.)
Figure 6.18: The correlation coefficient from the MLR, RMR and NN models for the coastal area
(Bar chart: MAPE by station, ILD1 to ILD7 and ILDT, with bars for MLR, RMR and NN.)
Figure 6.19: Comparison of the MAPE values between the MLR, RMR and NN models for the inland area
(Bar chart: MAPE by station, CLD1 to CLD7 and CLDT, with bars for MLR, RMR and NN.)
Figure 6.20: Comparison of the MAPE values between the MLR, RMR and NN models for the coastal area
Figure 6.19 shows the MAPE values of the three models for the inland area. The third bar represents the MAPE values of the NN model, and all of them are lower than those of the regression models, especially at ILDT. Figure 6.20 shows the MAPE values of the three models for the coastal area. As in the inland area, the NN model recorded lower MAPE values than the regression models.
We now discuss in more detail the changes in the correlation coefficient and the gains in accuracy among the MLR, RMR and NN models. Table 6.13 shows the change in the correlation coefficient from the MLR to the NN model and from the RMR to the NN model. For station ILD1 (inland area), the change from the MLR to the NN model is larger than the change from the RMR to the NN model: 27.54% compared with 5.69%. The same situation occurs at station ILD2, where the change from the MLR to the NN model is 20.34%, while the change from the RMR to the NN model is only 1.01%.
On the other hand, stations ILD3, ILD4, ILD6 and ILD7 recorded large changes in the correlation coefficient, but the changes from the two regression models (MLR and RMR) are quite similar. For ILDT, the percentage change from the MLR to the NN model is very high, at 117.73%, compared with 69.53% from the RMR to the NN model.
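The percentage changes in Table 6.13 appear to be relative changes in the correlation coefficient, (r_NN - r_reg) / r_reg x 100; for example, ILD1 gives (0.7984 - 0.6260) / 0.6260 x 100 ≈ 27.54. This formula is inferred from the tabulated values rather than stated in the text. The sketch below reproduces a few rows under that assumption; the function name is hypothetical, and the coefficients are copied from Table 6.13.

```python
def percentage_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100

# Correlation coefficients (MLR, RMR, NN) copied from Table 6.13
correlations = {
    "ILD1": (0.6260, 0.7554, 0.7984),
    "ILD2": (0.6490, 0.7732, 0.7810),
    "ILDT": (0.3840, 0.4932, 0.8361),
}

for station, (mlr, rmr, nn) in correlations.items():
    print(f"{station}: MLR to NN {percentage_change(mlr, nn):.2f}%, "
          f"RMR to NN {percentage_change(rmr, nn):.2f}%")
```

Because the change is relative to the regression value, stations with a low baseline correlation (such as ILDT or CLD4) show very large percentages even for moderate absolute gains.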
Table 6.13 also shows large changes in the correlation coefficient from the regression models to the NN model for the coastal area. The change from the MLR model to the NN model at station CLD1 is relatively small, 20.19%, compared with the 115.36% change from the RMR to the NN model. The largest changes from the MLR model to the NN model were recorded at stations CLD4, CLDT and CLD6, at 253.66%, 161.38% and 119.23% respectively. The percentage changes from the RMR to the NN model are also high, between about 47% and 115%, except at stations CLD2 and CLD3, where the changes are only 7.40% and 14.17% respectively.
Table 6.14 presents the accuracy of the proposed models, measured by MAPE. The last two columns give the percentage changes in accuracy from each regression model to the NN model. In the inland area, the changes in accuracy from the MLR to the NN model range from about 2.66% to 15.56%, whilst those from the RMR to the NN model range from about 2.00% to 16.56%. The largest changes were recorded at ILDT.
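The accuracy changes in Table 6.14 are consistent with defining accuracy as 1 - MAPE (the quantity plotted as (1-MAPE)*100 in Figures 6.21 and 6.22) and taking the relative change from the regression model to the NN model. For ILD1, for example, ((1 - 0.1159) - (1 - 0.1623)) / (1 - 0.1623) x 100 ≈ 5.539. This interpretation is inferred from the tabulated values; the function name is hypothetical and the MAPE values are copied from Table 6.14.

```python
def accuracy_change(mape_regression, mape_nn):
    """Percentage change in accuracy, taking accuracy = 1 - MAPE."""
    acc_regression = 1 - mape_regression
    acc_nn = 1 - mape_nn
    return (acc_nn - acc_regression) / acc_regression * 100

# MAPE values (MLR, RMR, NN) copied from Table 6.14
mape = {
    "ILD1": (0.1623, 0.1579, 0.1159),
    "ILDT": (0.1831, 0.1901, 0.0560),
    "CLD4": (0.2562, 0.3064, 0.1118),
}

for station, (mlr, rmr, nn) in mape.items():
    print(f"{station}: MLR to NN {accuracy_change(mlr, nn):.4f}%, "
          f"RMR to NN {accuracy_change(rmr, nn):.4f}%")
```

Measuring the change on accuracy rather than on MAPE itself explains why the percentages in Table 6.14 are much smaller than the relative reductions in MAPE.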
Table 6.13: The correlation changes from the MLR and RMR model to neural network model

                    Correlation coefficient       Percentage change
Station             MLR      RMR      NN          MLR to NN    RMR to NN
Inland stations
ILD1                0.6260   0.7554   0.7984       27.5399       5.6923
ILD2                0.6490   0.7732   0.7810       20.3389       1.0088
ILD3                0.6360   0.6169   0.7853       23.4748      27.2978
ILD4                0.4300   0.4457   0.5988       39.2558      34.3504
ILD5                0.6330   0.5681   0.8114       28.1832      42.8269
ILD6                0.5630   0.5593   0.7900       40.3197      41.2479
ILD7                0.3430   0.3568   0.4961       44.6356      39.0410
ILDT                0.3840   0.4932   0.8361      117.7343      69.5255
Coastal stations
CLD1                0.6160   0.3438   0.7404       20.1948     115.3577
CLD2                0.4130   0.5538   0.5948       44.0194       7.4034
CLD3                0.8290   0.7186   0.8603        3.7756      14.1700
CLD4                0.2240   0.4748   0.7922      253.6607      66.8492
CLD5                0.3320   0.3397   0.5397       62.5602      58.8754
CLD6                0.2080   0.2231   0.4560      119.2307     104.3926
CLD7                0.4810   0.3883   0.7054       46.6528      81.6637
CLDT                0.2100   0.3742   0.5489      161.3809      46.6863
Table 6.14: The performance changes of MAPE from the MLR and RMR models to the NN model

                    MAPE                          Percentage change of model accuracy
Station             MLR      RMR      NN          MLR to NN    RMR to NN
Inland stations
ILD1                0.1623   0.1579   0.1159       5.5390       4.9875
ILD2                0.1483   0.1555   0.1162       3.7689       4.6536
ILD3                0.1404   0.1403   0.1092       3.6296       3.6175
ILD4                0.1506   0.1765   0.1276       2.7078       5.9381
ILD5                0.1497   0.1483   0.0944       6.5036       6.3285
ILD6                0.1242   0.1778   0.0719       5.9717      12.8801
ILD7                0.1798   0.1745   0.1580       2.6579       1.9988
ILDT                0.1831   0.1901   0.0560      15.5588      16.5576
Coastal stations
CLD1                0.1766   0.1625   0.1220       6.6310       4.8358
CLD2                0.0813   0.0843   0.0638       1.9049       2.2387
CLD3                0.0839   0.0838   0.0657       1.9867       1.9756
CLD4                0.2562   0.3064   0.1118      19.4138      28.0565
CLD5                0.1729   0.1725   0.1431       3.6030       3.5529
CLD6                0.1394   0.1389   0.1232       1.8824       1.8232
CLD7                0.1301   0.1505   0.1003       3.4257       5.9094
CLDT                0.1804   0.2000   0.1456       4.2460       6.8000
Figure 6.21 shows clearly that the NN model consistently gives higher accuracy than the regression models for the inland area. In general, the accuracy changes in the coastal area are smaller than those in the inland area. For example, stations CLD2, CLD3 and CLD6 recorded the smallest changes, around 2%, whereas station CLD4 shows the largest changes: 19.41% from the MLR model and 28.06% from the RMR model. Figure 6.22 shows the accuracy of the three models for the coastal area, where the bars for the NN model are fairly similar to those for the regression models. The changes in accuracy for the inland area are clearly visible in Figure 6.23, whereas in Figure 6.24 only station CLD4 shows a sizeable difference between the regression models and the NN model.
(Bar chart: accuracy, (1-MAPE)*100, by station, ILD1 to ILD7 and ILDT, with bars for MLR, RMR and NN.)
Figure 6.21: Comparison of the accuracy of models for the inland area
(Bar chart: accuracy, (1-MAPE)*100, by station, CLD1 to CLD7 and CLDT, with bars for MLR, RMR and NN.)
Figure 6.22: Comparison of the accuracy of models for the coastal area
(Bar chart: % change of accuracy by station, ILD1 to ILD7 and ILDT, with bars for MLR to NN and RMR to NN.)
Figure 6.23: The percentage changes of the model accuracy for the inland area
(Bar chart: % change of accuracy by station, CLD1 to CLD7 and CLDT, with bars for MLR to NN and RMR to NN.)
Figure 6.24: The percentage changes of the models accuracy for the coastal area
6.7 CONCLUSION
For both experiments, the results demonstrated that the number of hidden nodes affects the overall performance of the NN architecture. The experiments also found that the momentum term and the learning rate, considered individually, do not have a significant influence on the NN's performance. The first experiment shows that the number of runs did not affect NN performance, but that the interaction between the momentum term and the learning rate has a significant effect on NN performance. The number of runs also influences NN performance.
For outliers in the training data, it has been demonstrated that modelling accuracy decreases as the percentage-outliers and magnitude-outliers increase. It has also been shown that the magnitude-outliers affect modelling accuracy, and that the relationship between the percentage-outliers and model accuracy is linear. When the percentage-outliers is lower than 15%, even at larger outlier magnitudes, the model accuracy does not differ significantly from that obtained with no outliers in the training data. From 15% percentage-outliers onwards, the difference in model accuracy relative to the outlier-free data is statistically significant at every magnitude-outlier level (all σ̂'s).
For outliers in the test data, it has been demonstrated that modelling accuracy decreases as the percentage-outliers and magnitude-outliers increase. The finding that modelling accuracy decreases as the percentage of outliers increases departs from the work of Bansal et al. (1993), who discussed a neural network application that was not affected by the error rate of the test data. Our findings are similar to those of Klein and Rossin (1999a and 1999b). One difference between this study and the work of Bansal et al. (1993) and Klein and Rossin (1999a and 1999b) is that the magnitude of the outliers in our research is defined using the estimated standard deviation (σ̂) of the data set and has five levels, whereas their studies were based on percentages, with only two levels considered. Therefore, this work shows that variations in the percentage and magnitude of outliers in the test data may affect modelling accuracy at these higher levels.
The comparative study of the MLR, RMR and NN models shows that the neural network approach outperforms both multiple linear regression and robust M-regression, giving the lowest MAPE and the highest correlation coefficient at every station.
CHAPTER 7
THE APPLICATION OF RESPONSE SURFACE ANALYSIS IN
MODELLING OIL PALM YIELD
7.1 INTRODUCTION
This chapter describes the application of response surface analysis in modelling oil palm yield. The response surface analysis was conducted using the Statistical Analysis System (SAS) version 6.12 package, and the results are presented in this chapter.
7.2 RESPONSE SURFACE ANALYSIS
The purpose of response surface analysis here is to determine the optimum level of oil palm yield using fertiliser information. Canonical analysis was used to investigate the shape of the predicted response surface. A brief mathematical explanation of response surface analysis is given below.
For each response variable, the model can be written as

$$y_i = \mathbf{x}_i'\mathbf{B}\mathbf{x}_i + \mathbf{b}'\mathbf{x}_i + \mathbf{c}'\mathbf{z}_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n \qquad (7.1)$$

where
$y_i$ is the $i$th observation of the response variable;
$\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ik})'$ are the $k$ factor variables for the $i$th observation;
$\mathbf{z}_i = (z_{i1}, z_{i2}, \ldots, z_{ip})'$ are the $p$ covariates, including the intercept term;
$\mathbf{B}$ is the $k \times k$ symmetric matrix of quadratic parameters, whose diagonal elements equal the coefficients of the pure quadratic terms in the model and whose off-diagonal elements equal half the coefficients of the corresponding cross-product terms;
$\mathbf{b}$ is the $k \times 1$ vector of linear parameters;
$\mathbf{c}$ is the $p \times 1$ vector of covariate parameters, one of which is the intercept;
$\varepsilon_i$ is the error associated with the $i$th observation. The tests performed assume that the errors are independently and normally distributed with mean zero and variance $\sigma^2$.
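The parameters in $\mathbf{B}$, $\mathbf{b}$ and $\mathbf{c}$ are estimated by the least squares method. To optimise $y$ with respect to $\mathbf{x}$, take the partial derivatives, set them to zero and solve equation (7.2):

$$\frac{\partial y}{\partial \mathbf{x}} = 2\mathbf{x}'\mathbf{B} + \mathbf{b}' = \mathbf{0}', \qquad \text{so that} \qquad \mathbf{x}_s = -\tfrac{1}{2}\mathbf{B}^{-1}\mathbf{b} \qquad (7.2)$$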
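Equation (7.2) can be evaluated numerically for a fitted model. The sketch below solves $\mathbf{B}\mathbf{x}_s = -\tfrac{1}{2}\mathbf{b}$ by Gaussian elimination and also returns the predicted response at the stationary point; substituting $\mathbf{x}_s$ into (7.1) gives $\hat{y}_s = \mathbf{c}'\mathbf{z} + \tfrac{1}{2}\mathbf{b}'\mathbf{x}_s$. It assumes the only covariate is the intercept, and the function names and example coefficients are hypothetical rather than taken from the fitted fertiliser models.

```python
def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("B is singular: no unique stationary point")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        x[row] = (m[row][n] - sum(m[row][j] * x[j] for j in range(row + 1, n))) / m[row][row]
    return x

def stationary_point(B, b, intercept=0.0):
    """x_s = -0.5 * B^{-1} b and predicted response y_s = intercept + 0.5 * b'x_s."""
    x_s = solve_linear(B, [-0.5 * bi for bi in b])
    y_s = intercept + 0.5 * sum(bi * xi for bi, xi in zip(b, x_s))
    return x_s, y_s
```

For example, `stationary_point([[-1, 0], [0, -2]], [2, 4], intercept=10)` places the stationary point at (1, 1) with a predicted response of 13.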
Canonical analysis is then used to determine whether the stationary point is a maximum, a minimum or a saddle point. This requires establishing whether $\mathbf{B}$ is negative definite, positive definite or indefinite, which is done by examining the eigenvalues of $\mathbf{B}$ (represented by λ in equation (3.71)). If the eigenvalues are all negative, the stationary point is a maximum; if they are all positive, it is a minimum; and if they have mixed signs, the stationary point is a saddle point. A detailed mathematical explanation of response surface analysis was given in Chapter 3.
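The classification rule above needs only the eigenvalues of $\mathbf{B}$. Because $\mathbf{B}$ is symmetric, they can be computed with the cyclic Jacobi rotation method, sketched here in standard-library Python; in practice a library routine such as NumPy's `numpy.linalg.eigvalsh` would normally be used instead. The function names are hypothetical, and a zero eigenvalue is reported separately because the three-way rule above does not cover it.

```python
import math

def symmetric_eigenvalues(matrix, tol=1e-18, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix by the cyclic Jacobi method."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation chosen so that the (p, q) element becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q (A <- A P)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q (A <- P' A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def classify_stationary_point(B, zero_tol=1e-10):
    """Return (eigenvalues, label) following the sign rule for B."""
    eigenvalues = symmetric_eigenvalues(B)
    if all(lam < -zero_tol for lam in eigenvalues):
        return eigenvalues, "maximum"
    if all(lam > zero_tol for lam in eigenvalues):
        return eigenvalues, "minimum"
    if any(abs(lam) <= zero_tol for lam in eigenvalues):
        return eigenvalues, "inconclusive (zero eigenvalue)"
    return eigenvalues, "saddle point"
```

Applied to the fitted $\mathbf{B}$ for each station, this rule would reproduce the "Concluding remarks" column of Tables 7.3 to 7.5 from the signs of the eigenvalues.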
If the stationary point of the estimated surface is a maximum or a minimum, the model fitting and canonical analysis may be sufficient (Myers and Montgomery, 1995; Christensen, 2001; Box and Draper, 1987; SAS, 1992). If the stationary point is a saddle point, ridge analysis is used to locate the optimum response within the experimental region. Ridge analysis traces the estimated ridge of optimum response as the radius increases from the centre of the original design, the coded radius being the distance from the ridge's origin. The result is a set of coordinates of the maximum or minimum point at each radius, along with the predicted response at each computed point on the path.
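Ridge analysis is classically formulated with a Lagrange multiplier μ: for μ greater than the largest eigenvalue of $\mathbf{B}$, the point $\mathbf{x}(\mu) = -\tfrac{1}{2}(\mathbf{B} - \mu\mathbf{I})^{-1}\mathbf{b}$ maximises the predicted response on the sphere of radius $r = \lVert\mathbf{x}(\mu)\rVert$ centred at the design centre (in coded units). The sketch below traces this maximum ridge for a supplied list of μ values. It illustrates the classical formulation under these assumptions and is not a reproduction of PROC RSREG's internal algorithm; the names and coefficients are hypothetical.

```python
import math

def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        x[row] = (m[row][n] - sum(m[row][j] * x[j] for j in range(row + 1, n))) / m[row][row]
    return x

def predicted_response(B, b, intercept, x):
    """Model (7.1) with only an intercept covariate: intercept + b'x + x'Bx."""
    k = len(x)
    quadratic = sum(x[i] * B[i][j] * x[j] for i in range(k) for j in range(k))
    return intercept + sum(bi * xi for bi, xi in zip(b, x)) + quadratic

def maximum_ridge(B, b, intercept, mus):
    """Return (radius, x, predicted y) for each mu, sorted by radius.

    Each mu must exceed the largest eigenvalue of B so that x(mu) is the
    maximum (not the minimum) of the predicted response on its sphere.
    """
    k = len(b)
    path = []
    for mu in mus:
        shifted = [[B[i][j] - (mu if i == j else 0.0) for j in range(k)] for i in range(k)]
        x = solve_linear(shifted, [-0.5 * bi for bi in b])
        radius = math.sqrt(sum(xi * xi for xi in x))
        path.append((radius, x, predicted_response(B, b, intercept, x)))
    return sorted(path, key=lambda point: point[0])
```

Larger values of μ give points closer to the design centre, so a decreasing sequence of μ values traces the ridge outwards, in the same spirit as the radius-by-radius output discussed in Section 7.4.2.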
An example of the response surface plots for the fertiliser treatments in the inland and coastal areas is presented in Figure 7.1.
(Response surface plots; panels: ILD1, CLD2, ILD5, CLD7.)
Figure 7.1: The response surface plots for the fertiliser treatments in ILD1 and ILD2 stations in the inland area and CLD2 and CLD7 stations in the coastal area
7.3 DATA ANALYSIS
The data analysis was conducted using fertiliser treatments. The detailed
procedure of response surface analysis is depicted in Figure 7.2.
Data (fertiliser treatments)
  → RSA [canonical analysis]
  → Stationary point: maximum, minimum or saddle point?
      if saddle point → RSA [ridge analysis] → profit analysis → conclusion
      if maximum or minimum → profit analysis → conclusion
Figure 7.2: Data analysis procedure for obtaining the optimum level of fertiliser
The SAS package provides an easy way to perform response surface analysis through the RSREG procedure (SAS, 1992). PROC RSREG accepts one of each of the following statements:
PROC RSREG <options>;
MODEL response = independents </ options>;
RIDGE <options>;
WEIGHT variable;
ID variables;
BY variables;
The PROC RSREG and MODEL statements are required. The MODEL statement lists the dependent variable (oil palm yield), followed by an equals sign and then the independent variables, namely the N, P, K and Mg fertilisers. The independent variables specified in the MODEL statement must be variables in the input data set.
A RIDGE statement requests computation of the ridge of optimum response. The ridge starts at a given point x0, and the point on the ridge at radius r from x0 is the collection of factor settings that optimises the predicted response at that radius. Ridge analysis can be used as a tool to help interpret an existing response surface or to indicate the direction in which further experimentation should be performed. A BY statement can be used with PROC RSREG to obtain separate analyses of the observations in the groups defined by the BY variables; when a BY statement is used, the procedure expects the input data set to be sorted in order of the BY variables. The ID statement names variables that are to be transferred to the output data set containing statistics for each observation. The WEIGHT statement names a numeric variable in the input data set whose values are used to weight the observations.
7.4 NUMERICAL EXAMPLE
Response surface analysis of the fertiliser treatments was conducted for each station. The discussion of the findings is divided into the canonical analysis and the ridge analysis of the fertiliser treatments. Each stationary point was identified as a maximum, a minimum or a saddle point, and ridge analysis is applied when the stationary point is a saddle point.
7.4.1 Canonical Analysis for Fertiliser Treatments
The summary of the response surface analysis, giving the MSE, RMSE and R² values, is presented in Table 7.1 for the inland area and Table 7.2 for the coastal area. The average FFB yield is the mean FFB yield in tonnes per hectare per year, and R² represents the proportion of variance explained by the explanatory variables (factors). The average FFB yield in the inland area ranged from 21.4925 to 29.2185 tonnes/hectare/year, while in the coastal area it ranged from 26.6933 to 31.4014 tonnes/hectare/year. The highest R² value was recorded at station CLD3 (0.7613), followed by station CLD4 (0.5972); both stations are located in the coastal area.
Table 7.1: The average of FFB yield, MSE, RMSE and R2 values for inland area

Station   Average FFB yield        MSE       RMSE     R2
          (tonnes/hectare/year)
ILD1      23.7382                  15.7966   3.9744   0.5802
ILD2      23.5865                  22.6885   4.7632   0.4322
ILD3      26.7956                  15.2171   3.9009   0.5330
ILD4      26.4915                   8.5504   2.9241   0.5660
ILD5      29.2185                   6.5911   2.5673   0.4532
ILD6      23.5196                  17.0863   4.1335   0.4638
ILD7      21.4925                  15.8995   3.9874   0.4529
Table 7.2: The average of FFB yield, MSE, RMSE and R2 values for coastal area

Station   Average FFB yield        MSE       RMSE     R2
          (tonnes/hectare/year)
CLD1      28.1731                   8.82712  2.9710   0.5696
CLD2      28.2383                  11.2728   3.3575   0.5858
CLD3      26.6933                   7.0654   2.6581   0.7613
CLD4      30.3716                   4.3348   2.0820   0.5972
CLD5      26.7660                  16.4810   4.0596   0.5354
CLD6      30.0687                   8.1174   2.8491   0.5214
CLD7      31.4014                  13.6416   3.6934   0.5316
All the important results, including the eigenvalues, critical values, predicted FFB yield at the stationary points and concluding remarks on the stationary points, are presented in Table 7.3. As shown in Table 7.3, the eigenvalues at station ILD1 are all negative (λ1 = -0.3439, λ2 = -0.9395, λ3 = -2.3165 and λ4 = -4.5217), so the stationary point is a maximum. These findings indicate that 5.1480 kg of N fertiliser, 2.7054 kg of P fertiliser, 3.2195 kg of K fertiliser and 2.8126 kg of Mg fertiliser are needed to achieve the maximum FFB yield of 30.1826 tonnes per hectare per year. On the other hand, station ILD3 recorded λ values for the N, P and K fertilisers of 0.8696, -0.6600 and -1.7946 respectively; because the λ values have mixed signs, the stationary point for ILD3 is a saddle point. As presented in Table 7.3, the stationary points for all stations were saddle points except for ILD1 and ILD2.
Table 7.3: The eigenvalues, predicted FFB yield at the stationary points, and critical values of fertiliser level in the inland area

Station  Fertiliser  Eigenvalue (λ)  Critical value  Predicted FFB yield at     Concluding
                                                     the stationary point       remarks
ILD1     N           -0.3439         5.1480          30.1826                    Maximum point
         P           -0.9395         2.7054
         K           -2.3165         3.2195
         Mg          -4.5217         2.8126
ILD2     N           -0.9069         6.4000          27.3469                    Maximum point
         K           -2.2656         6.3864
ILD3     N            0.8696         6.2525          28.0684                    Saddle point
         P           -0.6600         0.9165
         K           -1.7946         3.1341
ILD4     N            1.6003         5.1652          26.2051                    Saddle point
         P            0.8658         2.0342
         K           -0.2154         4.8463
         Mg          -1.0857         2.0679
ILD5     N            1.6615         1.1063          29.9558                    Saddle point
         P           -0.9745         0.9999
         K           -1.1204         2.9753
ILD6     N            0.7537         7.5114          24.6765                    Saddle point
         P            0.3066         3.0042
         K           -0.2191         2.4278
         Mg          -1.7060         0.6666
ILD7     N            1.3554         5.8538          30.7051                    Saddle point
         P           -0.1517         4.7872
         K           -0.8699         4.3905
         Mg          -4.2806         1.7877
The summary of the response surface analysis for the coastal area is shown in Tables 7.4 and 7.5. At station CLD6, for example, the eigenvalues are λ1 = 0.4646, λ2 = -0.0116 and λ3 = -5.0618. The signs of the eigenvalues are mixed, so the stationary point is a saddle point; this occurred at all the coastal stations in this study. The estimated FFB yield at the stationary point for station CLD6 was 29.2281 tonnes per hectare per year, corresponding to 3.9034 kg of N fertiliser, 0.1187 kg of P fertiliser and 6.5243 kg of K fertiliser.
The canonical analysis indicated that the predicted response surface at station CLD6 is saddle-shaped. The positive eigenvalue associated with N (0.4646) shows that the upward (valley) curvature of the saddle is much weaker than the downward (hill) curvature associated with K, whose eigenvalue is -5.0618. The negative eigenvalues for the P and K factors indicate directions of downward curvature. The eigenvalue with the largest absolute value belongs to the K factor, which means that the curvature of the response surface is most pronounced in the associated direction: the surface is more sensitive to changes in K than to changes in N and P. For detailed numerical results on the critical values and predicted values at the stationary points, refer to Tables 7.4 and 7.5.
Table 7.4: The eigenvalues, the predicted FFB yield at the stationary points, and critical values of fertiliser level for CLD1 and CLD2 stations

Station  Fertiliser  Eigenvalue (λ)  Critical value  Predicted FFB yield at     Concluding
                                                     the stationary point       remarks
CLD1     N            1.1058         0.2328          28.5310                    Saddle point
         P            1.0041         6.8132
         K           -0.1793         4.4443
         Mg          -0.4926         1.2246
CLD2     N            0.2903         1.9312          30.2809                    Saddle point
         P           -0.3234         3.4270
         K           -0.7052         0.5436
         Mg          -1.8375         2.3214
Table 7.5: The eigenvalues, the predicted FFB yield at the stationary points, and critical values of fertiliser level for CLD3, CLD4, CLD5, CLD6 and CLD7 stations

Station  Fertiliser  Eigenvalue (λ)  Critical value  Predicted FFB yield at     Concluding
                                                     the stationary point       remarks
CLD3     N            1.4988         6.3486          29.9374                    Saddle point
         P            0.0634         2.9778
         K           -0.6883         2.0066
         Mg          -3.1878         1.6929
CLD4     N            0.5961         2.3739          30.9816                    Saddle point
         P           -0.1260         2.0859
         K           -0.6599         4.5873
         Mg          -0.6916         2.1789
CLD5     N            4.9445         2.8598          25.9715                    Saddle point
         P            1.2246         3.2396
         K           -0.8439         6.2467
         Mg          -3.7785         5.4442
CLD6     N            0.4646         3.9034          29.2281                    Saddle point
         P           -0.0116         0.1187
         K           -5.0618         6.5243
CLD7     N            1.3496         5.3768          32.5936                    Saddle point
         P           -1.0850         0.6474
         K           -1.1862         4.5479
7.4.2 Ridge Analysis for Fertiliser Treatments
The estimated FFB yield responses at selected radii, and the corresponding fertiliser levels, for the inland stations are presented in Table 7.6 to Table 7.8. To illustrate the results of the ridge analysis, we considered only the stations that have saddle points at their stationary points. As mentioned earlier, ridge analysis is used to find the optimum FFB yield when the canonical analysis indicates that the stationary point is a saddle point. Stations ILD1 and ILD2, whose stationary points were identified as maxima by the canonical analysis, are therefore not discussed in Section 7.4.2.
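The ridge analysis step can be sketched as follows. This is a generic illustration under stated assumptions, not the computation used in the study. The fitted surface is written in coded units as y = b0 + b'x + x'Bx, and the coefficients in the example are hypothetical. Converting the coded point back to kg per palm per year, which depends on how each fertiliser was coded, is not shown. For each radius R, the sketch finds the point on the sphere ||x|| = R that maximises the predicted response. It uses the Lagrange-multiplier solution x(mu) = 0.5 (mu*I - B)^(-1) b, with mu larger than the largest eigenvalue of B.

```python
import numpy as np


def ridge_point(b, B, radius, tol=1e-12, max_iter=500):
    """Coded point x with ||x|| = radius that maximises y = b0 + b'x + x'Bx.

    Setting the gradient of the Lagrangian to zero gives
    x(mu) = 0.5 * (mu*I - B)^(-1) b. The constrained maximum requires
    mu > largest eigenvalue of B. On that range ||x(mu)|| decreases
    steadily as mu grows, so mu is found by bisection. Assumes b is not
    orthogonal to the leading eigenvector of B (the usual,
    non-degenerate case).
    """
    b = np.asarray(b, dtype=float)
    B = np.asarray(B, dtype=float)
    if radius == 0:
        return np.zeros_like(b)
    identity = np.eye(len(b))
    lam_max = np.linalg.eigvalsh(B).max()

    def x_of(mu):
        return 0.5 * np.linalg.solve(mu * identity - B, b)

    lo, hi = lam_max, lam_max + 1.0
    while np.linalg.norm(x_of(hi)) > radius:     # widen until the radius is bracketed
        hi = lam_max + 2.0 * (hi - lam_max)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_of(mid)) > radius:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return x_of(0.5 * (lo + hi))


# Hypothetical fitted surface in two coded factors (a saddle: eigenvalues 0.5 and -2.0)
b0 = 30.0
b = np.array([1.0, 0.5])
B = np.array([[0.5, 0.0], [0.0, -2.0]])
for r in (0.0, 0.5, 1.0):
    x = ridge_point(b, B, r)
    print(r, np.round(x, 4), round(float(b0 + b @ x + x @ B @ x), 4))
```

Repeating the call for R = 0.0, 0.1, ..., 1.0 traces a path of estimated responses and coded factor settings like the ones tabulated for each station. Bisection on mu is used instead of a closed form because the radius is a monotone function of mu, which makes the search simple and robust.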
The Mg fertiliser was not used in the experiments at stations ILD3 and ILD5. At station ILD3, the estimated FFB yield at radius 0.0 is 29.7351 tonnes per hectare per year, corresponding to 4.7800 kg of N fertiliser, 0.9000 kg of P fertiliser and 4.1000 kg of K fertiliser. When the radius was increased from 0.0 to 0.5, the estimated FFB yield increased to 30.3281 tonnes per hectare per year. The fertiliser inputs also increased, to 5.5958 kg of N fertiliser, 1.0560 kg of P fertiliser and 5.7555 kg of K fertiliser. At the maximum radius of 1.0, the estimated FFB yield was 31.1853 tonnes per hectare per year, corresponding to 5.6887 kg of N fertiliser, 1.1711 kg of P fertiliser and 7.8547 kg of K fertiliser.
For station ILD4 at radius 0.0, the results suggest the following. Given an annual application of 4.0000 kg of N fertiliser, 1.8750 kg of P fertiliser, 4.2000 kg of K fertiliser and 1.4400 kg of Mg fertiliser, the average estimated FFB yield is 25.8575 tonnes per hectare. Increasing the radius from 0.0 to 1.0 raised the average estimated FFB yield from 25.8575 to 27.7434 tonnes per hectare. Over the same range, the N, P and K fertiliser levels increased to 4.2611 kg, 2.1026 kg and 6.5259 kg respectively, whereas the Mg fertiliser level decreased from 1.4400 kg to 1.1745 kg. The estimated FFB yield and corresponding fertiliser levels at selected radii for the remaining inland stations (ILD5, ILD6 and ILD7) are presented in Tables 7.7 and 7.8.
Table 7.6: The estimated FFB yield and fertiliser level at certain radii for stations ILD3 and ILD4 in the inland area
Station  Radius  Fertiliser level (kg/palm/year)    Estimated FFB yield
                 N       P       K       Mg         (tonnes/hectare/year)
ILD3     0.0     4.7800  0.900   4.1000  -          29.7351
         0.1     5.081   0.9358  4.2057  -          29.8744
         0.2     5.3312  0.9707  4.4629  -          29.9887
         0.3     5.4781  1.0025  4.8707  -          30.0938
         0.4     5.5525  1.0305  5.3158  -          30.2047
         0.5     5.5958  1.0560  5.7555  -          30.3281
         0.6     5.6247  1.0800  6.1862  -          30.4667
         0.7     5.6461  1.1035  6.6097  -          30.6212
         0.8     5.6629  1.1264  7.0282  -          30.7923
         0.9     5.6768  1.1488  7.4428  -          30.9803
         1.0     5.6887  1.1711  7.8547  -          31.1853
ILD4     0.0     4.0000  1.8750  4.200   1.4400     25.8575
         0.1     4.2044  1.8790  4.3248  1.4542     25.9351
         0.2     4.3186  1.8998  4.5724  1.4448     26.0419
         0.3     4.3397  1.9274  4.8514  1.4133     26.1475
         0.4     4.3373  1.9539  5.1096  1.3790     26.2814
         0.5     4.3284  1.9795  5.3557  1.3446     26.4460
         0.6     4.3168  2.0046  5.5953  1.3104     26.6420
         0.7     4.3039  2.0294  5.8310  1.2763     26.8696
         0.8     4.2901  2.0539  6.0641  1.2423     27.1290
         0.9     4.2758  2.0783  6.2956  1.2083     27.4202
         1.0     4.2611  2.1026  6.5259  1.1745     27.7434
Table 7.7: The estimated FFB yield and fertiliser level at certain radii for stations ILD5 and ILD6 in the inland area
Station  Radius  Fertiliser level (kg/palm/year)    Estimated FFB yield
                 N       P       K       Mg         (tonnes/hectare/year)
ILD5     0.0     1.365   2.275   2.275   -          29.5075
         0.1     1.4014  2.1037  2.4119  -          29.6488
         0.2     1.4936  1.9752  2.5417  -          29.7814
         0.3     1.6307  1.9119  2.6462  -          29.9233
         0.4     1.7757  1.8845  2.7298  -          30.0891
         0.5     1.9188  1.8712  2.803   -          30.2842
         0.6     2.0595  1.8646  2.8706  -          30.5106
         0.7     2.1983  1.8615  2.9352  -          30.7691
         0.8     2.3358  1.8604  2.9977  -          31.0602
         0.9     2.4723  1.8606  3.0589  -          31.3840
         1.0     2.6081  1.8619  3.1192  -          31.7407
ILD6     0.0     4.67    1.875   3.975   2.08       24.4806
         0.1     4.9085  1.8581  4.0471  2.1299     24.7152
         0.2     5.1234  1.8453  4.1449  2.1918     24.9275
         0.3     5.3048  1.8384  4.2665  2.2679     25.1228
         0.4     5.4476  1.8388  4.4022  2.3587     25.3072
         0.5     5.5539  1.8479  4.539   2.4628     25.4866
         0.6     5.6308  1.8661  4.6654  2.5774     25.6659
         0.7     5.6859  1.8934  4.7744  2.6996     25.8489
         0.8     5.7256  1.9291  4.8636  2.8265     26.0384
         0.9     5.7546  1.9716  4.9333  2.9559     26.2365
         1.0     5.7761  2.0197  4.9856  3.0861     26.4447
Table 7.8: The estimated FFB yield and fertiliser level at certain radii for station ILD7
Station  Radius  Fertiliser level (kg/palm/year)    Estimated FFB yield
                 N       P       K       Mg         (tonnes/hectare/year)
ILD7     0.0     4.325   2.275   3.4750  1.7400     24.4803
         0.1     4.5706  2.3088  3.4393  1.7512     24.7287
         0.2     4.8201  2.3354  3.4304  1.7757     24.9778
         0.3     5.0655  2.3536  3.4351  1.8127     25.2337
         0.4     5.3021  2.3635  3.447   1.8623     25.5011
         0.5     5.5273  2.3664  3.4628  1.9225     25.7835
         0.6     5.7406  2.3636  3.4806  1.9909     26.0841
         0.7     5.9427  2.3564  3.4994  2.065      26.4050
         0.8     6.1351  2.3461  3.5186  2.1431     26.7479
         0.9     6.3192  2.3333  3.538   2.2241     27.1141
         1.0     6.4966  2.3188  3.5574  2.3068     27.5043
The results of the ridge analysis of the fertiliser treatments in the coastal area are presented in Tables 7.9 to 7.12. The estimated FFB yield at radius 0.1 for station CLD1 is 27.4295 tonnes per hectare per year. Achieving this level requires 1.8872 kg of N fertiliser, 1.8940 kg of P fertiliser, 2.9434 kg of K fertiliser and 1.7663 kg of Mg fertiliser. Increasing the radius from 0.1 to 0.5 raised the estimated FFB yield to 28.2428 tonnes per hectare per year, corresponding to 2.2825 kg of N fertiliser, 2.0209 kg of P fertiliser, 3.8199 kg of K fertiliser and 1.6061 kg of Mg fertiliser. The estimated FFB yield reached its maximum of 29.6934 tonnes per hectare per year at the maximum radius of 1.0. At this radius the N and K fertiliser levels increased to 2.8791 kg and 4.8655 kg respectively, while the P and Mg fertiliser levels were 2.0007 kg and 1.4579 kg. At station CLD2, however, the estimated average FFB yield increased only slightly, from 32.3348 tonnes per hectare per year at radius 0.0 to 32.5424 tonnes per hectare per year at the maximum radius of 1.0. The required fertiliser levels increased from 1.8200 kg to 2.4672 kg for N, from 1.8200 kg to 3.0549 kg for P, from 1.3600 kg to 2.0685 kg for K and from 1.8200 kg to 2.5049 kg for Mg. This shows that the requirements for the N, P, K and Mg fertilisers at station CLD2 are quite comparable. Mg fertiliser was not applied at stations CLD6 and CLD7.
Table 7.9: The estimated FFB yield and fertiliser level at certain radii for stations CLD1 and CLD2 in the coastal area
Station  Radius  Fertiliser level (kg/palm/year)    Estimated FFB yield
                 N       P       K       Mg         (tonnes/hectare/year)
CLD1     0.0     1.8200  1.8200  2.7300  1.8200     27.2633
         0.1     1.8872  1.8940  2.9434  1.7663     27.4295
         0.2     1.9709  1.9481  3.1626  1.7194     27.6082
         0.3     2.0670  1.9850  3.3833  1.6779     27.8022
         0.4     2.1718  2.0082  3.6028  1.6405     28.0132
         0.5     2.2825  2.0209  3.8199  1.6061     28.2428
         0.6     2.3975  2.026   4.0342  1.5739     28.4919
         0.7     2.5152  2.0253  4.2457  1.5433     28.7111
         0.8     2.6351  2.0202  4.4546  1.514      29.0508
         0.9     2.7565  2.0117  4.6611  1.4856     29.3615
         1.0     2.8791  2.0007  4.8655  1.4579     29.6934
CLD2     0.0     1.8200  1.8200  1.3600  1.8200     32.3348
         0.1     1.8978  1.9413  1.4383  1.7833     32.3944
         0.2     1.9726  2.0687  1.5164  1.7606     32.4438
         0.3     2.0458  2.2006  1.5942  1.7575     32.4832
         0.4     2.1176  2.3356  1.6716  1.7806     32.5132
         0.5     2.1871  2.4711  1.7476  1.8362     32.5345
         0.6     2.2531  2.6031  1.8208  1.9263     32.5480
         0.7     2.314   2.7284  1.8897  2.0464     32.5547
         0.8     2.3696  2.8452  1.9537  2.188      32.5555
         0.9     2.4204  2.9536  2.0131  2.3428     32.5512
         1.0     2.4672  3.0549  2.0685  2.5049     32.5424
Table 7.10: The estimated FFB yield and fertiliser level at certain radii for stations CLD3 and CLD4 in the coastal area
Station  Radius  Fertiliser level (kg/palm/year)    Estimated FFB yield
                 N       P       K       Mg         (tonnes/hectare/year)
CLD3     0.0     3.6400  1.8200  3.6400  1.6200     28.2359
         0.1     4.0036  1.8121  3.6367  1.8200     28.6748
         0.2     4.3673  1.8053  3.6425  1.8256     29.0506
         0.3     4.7298  1.8003  3.6652  1.8450     29.3638
         0.4     5.081   1.8008  3.7274  1.9124     29.6169
         0.5     5.3372  1.8192  3.8738  2.1270     29.8253
         0.6     5.4454  1.8477  4.0349  2.4012     30.0304
         0.7     5.5045  1.8741  4.1720  2.6447     30.2560
         0.8     5.5479  1.8986  4.2952  2.8671     30.5079
         0.9     5.5844  1.9218  4.4104  3.0766     30.7880
         1.0     5.6169  1.9442  4.5206  3.2781     31.0969
CLD4     0.0     3.6400  1.8200  3.6400  1.8200     30.9592
         0.1     3.8391  1.9564  3.6175  1.8868     31.0200
         0.2     4.1241  2.0637  3.5584  1.9331     31.0846
         0.3     4.4415  2.1512  3.4809  1.9663     31.1577
         0.4     4.7692  2.2279  3.3953  1.9927     31.2409
         0.5     5.1002  2.2984  3.3058  2.0153     31.3352
         0.6     5.4324  2.3654  3.2141  2.0356     31.4408
         0.7     5.7648  2.4299  3.1211  2.0544     31.5581
         0.8     6.0973  2.4929  3.0272  2.0722     31.6869
         0.9     6.4297  2.5547  3.9327  2.0893     31.8276
         1.0     6.7619  2.6157  2.8378  2.1059     31.9800
Table 7.11: The estimated FFB yield and fertiliser level at certain radii for stations CLD5 and CLD6 in the coastal area
Station  Radius  Fertiliser level (kg/palm/year)    Estimated FFB yield
                 N       P       K       Mg         (tonnes/hectare/year)
CLD5     0.0     2.7300  4.5500  9.1000  4.5500     26.0478
         0.1     2.8881  4.7353  8.9110  4.2429     26.2357
         0.2     3.0149  4.7345  8.8843  3.8037     26.4774
         0.3     3.1188  4.6571  8.8651  3.3592     26.8602
         0.4     3.2131  4.5551  8.8422  2.9228     27.2298
         0.5     3.3027  4.4429  8.8168  2.4925     27.7504
         0.6     3.3897  4.3256  8.7898  2.0661     28.369
         0.7     3.4752  4.2053  8.7616  1.6423     29.0858
         0.8     3.5596  4.0831  8.7328  1.2204     29.9012
         0.9     3.6433  3.9596  8.7034  0.7998     30.8152
         1.0     3.7266  3.8353  8.6736  0.3801     31.8279
CLD6     0.0     2.7250  1.3650  3.4100  -          32.6262
         0.1     2.9364  1.3988  3.2121  -          33.0906
         0.2     3.1074  1.4489  2.9718  -          33.5074
         0.3     3.2377  1.5115  2.7024  -          33.8947
         0.4     3.3357  1.5816  2.4183  -          34.2664
         0.5     3.4111  1.6556  2.1284  -          34.6314
         0.6     3.4712  1.7317  1.8368  -          34.9953
         0.7     3.5207  1.8088  1.5453  -          35.3614
         0.8     3.5628  1.8864  1.2545  -          35.7317
         0.9     3.5995  1.9642  0.9646  -          36.1077
         1.0     3.6321  2.0422  0.6756  -          36.4902
Table 7.12: The estimated FFB yield and fertiliser level at certain radii for station CLD7 in the coastal area
Station  Radius  Fertiliser level (kg/palm/year)    Estimated FFB yield
                 N       P       K       Mg         (tonnes/hectare/year)
CLD7     0.0     2.7250  1.8200  3.4100  -          31.4923
         0.1     2.9651  1.8948  3.4892  -          31.7818
         0.2     3.1851  1.9985  3.5571  -          32.0639
         0.3     3.3774  2.1309  3.611   -          32.3452
         0.4     3.5395  2.2864  3.6506  -          32.6327
         0.5     3.6743  2.4568  3.6781  -          32.9326
         0.6     3.7877  2.6356  3.6965  -          33.2493
         0.7     3.8852  2.8185  3.7084  -          33.5861
         0.8     3.9711  3.0035  3.7158  -          33.9447
         0.9     4.0482  3.1891  3.7199  -          34.3267
         1.0     4.1189  3.3749  3.7216  -          34.7329
7.5 ECONOMIC ANALYSIS
In addition to the statistical analysis, an economic analysis should be carried out to determine the point at which the total profit from oil palm production is highest (Nelson, 1997). The economic analysis focuses on finding the optimum fertiliser level. As discussed earlier, ridge analysis gives several candidate optima, each consisting of an estimated FFB yield and the corresponding N, P, K and Mg fertiliser levels at a given radius. An economic analysis is therefore required to identify which of these gives the maximum profit.
To obtain the maximum profit in oil palm production, four types of fertiliser are considered: nitrogen (N), phosphorus (P), potassium (K) and magnesium (Mg). These are the fertilisers most needed by oil palm.
Let the fertiliser cost (RM per hectare per year) at a given radius be $C_i$, given by
$$C_i = a_i N_p + b_i P_p + c_i K_p + d_i \mathrm{Mg}_p, \qquad i = 1, 2, \ldots, 11 \tag{7.3}$$
where $a_i$, $b_i$, $c_i$ and $d_i$ are the levels of the N, P, K and Mg fertilisers (in kg per palm per year) obtained from the ridge analysis at the $i$-th radius. $N_p$, $P_p$, $K_p$ and $\mathrm{Mg}_p$ are the prices (per tonne) of the N, P, K and Mg fertilisers respectively. The index $i$ runs over the eleven radii 0.0, 0.1, ..., 1.0 used in the ridge analysis. Since the FFB yield is measured in tonnes per hectare per year, the cost and the total profit ($TP$) were also converted into RM per hectare per year. The total income per hectare per year at a given radius, $H_i$, is then
$$H_i = E_{r_i} Y_p, \qquad i = 1, 2, \ldots, 11 \tag{7.4}$$
where $E_{r_i}$ is the expected FFB yield at radius $i$ and $Y_p$ is the FFB price. The total profit is therefore
$$TP_i = H_i - C_i, \qquad i = 1, 2, \ldots, 11 \tag{7.5}$$
The fertiliser combination at the radius giving the largest $TP_i$ is taken as the economic optimum.
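Equations (7.3) to (7.5) can be sketched in Python as follows. Two points here are assumptions, not statements from the text. First, converting kg per palm into tonnes per hectare needs a planting density, which this section does not give. A density of 140 palms per hectare is assumed; with it, the sketch reproduces the total profits tabulated for ILD3, ILD5 and CLD6 in Table 7.13 to within RM0.01. Second, the function names and the dictionary-based interface are illustrative only.

```python
# ASSUMPTION: planting density is not stated in this section; 140 palms per
# hectare reproduces several of the total profits reported in Table 7.13.
PALMS_PER_HECTARE = 140


def fertiliser_cost(rates_kg_per_palm, prices_rm_per_tonne, palms_per_ha=PALMS_PER_HECTARE):
    """Eq. (7.3): fertiliser cost C_i in RM per hectare per year.

    rates_kg_per_palm   -- fertiliser levels from the ridge analysis (kg/palm/year)
    prices_rm_per_tonne -- fertiliser prices (RM per tonne)
    """
    return sum(
        rate * palms_per_ha / 1000.0 * prices_rm_per_tonne[name]  # kg/palm -> tonnes/ha
        for name, rate in rates_kg_per_palm.items()
    )


def total_income(expected_yield_t_per_ha, ffb_price_rm_per_tonne):
    """Eq. (7.4): H_i = E_{r_i} * Y_p, in RM per hectare per year."""
    return expected_yield_t_per_ha * ffb_price_rm_per_tonne


def total_profit(rates_kg_per_palm, expected_yield_t_per_ha,
                 prices_rm_per_tonne, ffb_price_rm_per_tonne):
    """Eq. (7.5): TP_i = H_i - C_i."""
    return (total_income(expected_yield_t_per_ha, ffb_price_rm_per_tonne)
            - fertiliser_cost(rates_kg_per_palm, prices_rm_per_tonne))


# January 2005 prices quoted in Section 7.5.1 (RM per tonne)
prices = {"N": 720.0, "P": 440.0, "K": 1040.0, "Mg": 729.0}
# Station ILD5 at radius 1.0 (Table 7.7); no Mg was applied
tp = total_profit({"N": 2.6081, "P": 1.8619, "K": 3.1192}, 31.7407, prices, 288.0)
print(f"{tp:.2f}")  # 8309.58
```

Only the fertilisers present in the rates dictionary are costed, so stations without Mg need no special case.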
7.5.1 Profit Analysis
Based on the fertiliser prices in January 2005, ammonium sulphate (AS) cost RM720 per tonne, Christmas Island rock phosphate (CIRP) RM440 per tonne, muriate of potash (MOP) RM1040 per tonne and kieserite (Mg) RM729 per tonne. The average FFB price in January 2005 was about RM288.00 per tonne. Other costs, such as management costs, were assumed to be constant. A simple calculation was then carried out to find the maximum profit over the radius levels. The total profit for each station is summarised in Table 7.13, which can help policy makers identify the fertiliser combination that maximises profit from oil palm. Details of the total profit calculations are given in Appendix T for the inland area and Appendix U for the coastal area.
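The selection step can be illustrated with the ILD3 ridge path from Table 7.6. This sketch assumes 140 palms per hectare (a value not stated in the text), the January 2005 fertiliser prices above and an FFB price of RM288 per tonne. It evaluates the total profit at every radius and keeps the largest. The most profitable point turns out to be the centre of the design (radius 0.0), which agrees with the ILD3 entry in Table 7.13. The highest-yielding radius is not the most profitable one, because the extra yield at larger radii does not pay for the extra K fertiliser.

```python
# Ridge path for station ILD3 from Table 7.6:
# (radius, N, P, K in kg/palm/year, estimated FFB yield in t/ha/year); no Mg applied
ILD3_PATH = [
    (0.0, 4.7800, 0.9000, 4.1000, 29.7351),
    (0.1, 5.0810, 0.9358, 4.2057, 29.8744),
    (0.2, 5.3312, 0.9707, 4.4629, 29.9887),
    (0.3, 5.4781, 1.0025, 4.8707, 30.0938),
    (0.4, 5.5525, 1.0305, 5.3158, 30.2047),
    (0.5, 5.5958, 1.0560, 5.7555, 30.3281),
    (0.6, 5.6247, 1.0800, 6.1862, 30.4667),
    (0.7, 5.6461, 1.1035, 6.6097, 30.6212),
    (0.8, 5.6629, 1.1264, 7.0282, 30.7923),
    (0.9, 5.6768, 1.1488, 7.4428, 30.9803),
    (1.0, 5.6887, 1.1711, 7.8547, 31.1853),
]

PRICES = {"N": 720.0, "P": 440.0, "K": 1040.0}  # RM per tonne, January 2005
FFB_PRICE = 288.0                                # RM per tonne of FFB
PALMS_PER_HECTARE = 140                          # ASSUMPTION: not stated in the text


def profit(n, p, k, ffb_yield):
    """Total profit in RM/ha/year: income minus fertiliser cost (Eqs. 7.3 to 7.5)."""
    cost = PALMS_PER_HECTARE / 1000.0 * (n * PRICES["N"] + p * PRICES["P"] + k * PRICES["K"])
    return ffb_yield * FFB_PRICE - cost


profits = [(radius, profit(n, p, k, y)) for radius, n, p, k, y in ILD3_PATH]
best_radius, best_profit = max(profits, key=lambda item: item[1])
print(best_radius, round(best_profit, 2))  # 0.0 7429.48
```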
The results suggest that palms grown on the Bungor series soil (ILD1) are capable of producing an average FFB yield of 30.1826 tonnes per hectare and a total profit of RM7254.73. This requires an annual application of 5.148 kg of N fertiliser, 2.7054 kg of P fertiliser, 3.2196 kg of K fertiliser and 2.8126 kg of Mg fertiliser. The Munchong series soil (ILD3) needed a combination of 4.78 kg of N fertiliser, 0.9 kg of P fertiliser and 4.1000 kg of K fertiliser to produce an average FFB yield of 29.7351 tonnes per hectare. The highest total profit for the inland stations is RM8309.58, at station ILD5 (Pohoi series soil). This suggests that palms grown on the Pohoi series soil are better suited to producing the maximum FFB yield at the minimum fertiliser cost.
Given a combination of 2.8791 kg of N fertiliser, 2.0007 kg of P fertiliser, 4.8655 kg of K fertiliser and 1.4579 kg of Mg fertiliser, palms on the Carey series soil (station CLD1) could produce 29.6934 tonnes of FFB per hectare with a total profit of RM7282.87. At CLD3 (Briah series soil), the suggested combination of 5.6169 kg of N fertiliser, 1.9442 kg of P fertiliser, 4.5206 kg of K fertiliser and 3.2781 kg of Mg fertiliser gave an average FFB yield of 31.0969 tonnes per hectare per year. As shown in Table 7.13, the highest total profit for the coastal area, RM9918.89, was obtained at station CLD6 (Briah series soil).
Table 7.13: The fertiliser level, average estimated FFB yield and total profit for the inland and coastal areas
Station  Fertiliser level (kg/palm/year)    Estimated FFB yield     Total profit
         N       P       K       Mg         (tonnes/hectare/year)   (RM)
Inland stations
ILD3     4.7800  0.9000  4.1000  *          29.7351                 7429.48
ILD4     4.2611  2.1026  6.5259  1.1745     27.7434                 7250.29
ILD5     2.6081  1.8619  3.1192  *          31.7407                 8309.58
ILD6     5.7761  2.0197  4.9856  3.0861     26.4447                 5872.45
ILD7     6.4966  2.3188  3.5574  2.3068     27.5043                 6373.06
Coastal stations
CLD1     2.8791  2.0007  4.8655  1.4579     29.6934                 7282.87
CLD2     1.8200  1.8200  1.3600  1.8200     32.5558                 7939.66
CLD3     5.6169  1.9442  4.5206  3.2781     31.0969                 7281.32
CLD4     3.6400  1.8200  3.6400  1.8200     30.9592                 7723.78
CLD5     3.7266  3.8353  8.6736  0.3801     31.8279                 7253.34
CLD6     3.6321  2.0422  0.6756  *          36.4902                 9918.89
CLD7     4.1189  3.3749  3.7216  *          34.7329                 8838.13
* Note: Mg fertiliser was not used in these treatments.
Figure 7.3: The optimum fertiliser level for each station in the inland area (bar chart of the N, P, K and Mg fertiliser levels, in kg/palm/year, for stations ILD1 to ILD7)
After determining the optimum fertiliser levels and the maximum profit for each station, we compare the optimum fertiliser requirements across the stations. Figure 7.3 summarises the fertiliser requirements of the oil palms in the inland area. The N and K fertilisers are clearly the ones required in the largest amounts. Stations ILD1 (Bungor series soil), ILD3 (Munchong series soil), ILD6 (Batu Anam series soil) and ILD7 (Durian series soil) required more N fertiliser than K fertiliser. At stations ILD4 and ILD5, on the Batu Anam and Pohoi series soils respectively, the K fertiliser requirement was higher than the N requirement. The N and K requirements at station ILD2 (Renggam series soil) are almost the same. In general, the inland area needs less P and Mg fertiliser than N and K fertiliser.
Figure 7.4: The optimum fertiliser level for each station in the coastal area (bar chart of the N, P, K and Mg fertiliser levels, in kg/palm/year, for stations CLD1 to CLD7)
Figure 7.4 illustrates the fertiliser requirements of oil palm in the coastal area. As in the inland area, N and K are the dominant fertilisers required by oil palms. Stations CLD3 (Briah series soil), CLD4 (Sedu series soil), CLD6 (Briah series soil) and CLD7 (Briah series soil) required more N fertiliser than K fertiliser. In contrast, palms grown on the Carey series soil (CLD1 and CLD5) required more K fertiliser than N fertiliser. The N and K requirements are quite similar at station CLD2, where the soil series is Selangor.
The foliar nutrient composition levels and the average estimated FFB yield giving the maximum profit are shown in Table 7.14 and displayed graphically in Figure 7.5. At station ILD1 (Bungor series soil), a foliar nutrient composition of 2.5303% N, 0.1698% P, 1.0855% K, 0.5757% Ca and 0.3562% Mg is capable of producing an average FFB yield of 30.1826 tonnes per hectare. The results in Table 7.14 suggest that the Renggam series soil (ILD2) produced an estimated FFB yield of 27.3469 tonnes per hectare when the N, P, K, Ca and Mg compositions were 2.3398, 0.1677, 0.6858, 0.6957 and 0.2936% respectively.
Table 7.14: The estimated FFB yield and the foliar nutrient composition level (%) for the inland area
Station  Estimated FFB yield     Foliar nutrient composition (%)
         (tonnes/hectare/year)   N       P       K       Ca      Mg
ILD1     30.1826                 2.5303  0.1698  1.0855  0.5757  0.3562
ILD2     27.3469                 2.3398  0.1677  0.6858  0.6957  0.2936
ILD3     29.7351                 2.6841  0.1672  1.2257  0.8615  0.1941
ILD4     27.7434                 3.0223  0.1695  1.0470  0.6823  0.6548
ILD5     31.7406                 2.9331  0.1693  1.1392  0.7418  0.2681
ILD6     26.4447                 2.6869  0.1685  0.7642  0.7668  0.2744
ILD7     27.5043                 2.6264  0.1517  0.9626  0.5795  0.5365
Figure 7.5 shows that at all the inland stations the N concentration is the highest, generally followed by the K concentration. This is consistent with the finding that N and K are the fertilisers needed by the oil palm in the largest amounts per year. The P concentration was the lowest of the nutrients in the foliar analysis. In general, the foliar nutrient concentrations in ascending order are P, Mg, Ca, K and N (P < Mg < Ca < K < N).
Figure 7.5: The foliar nutrient composition levels for each station in the inland area (bar chart of the N, P, K, Ca and Mg foliar composition, in %, for stations ILD1 to ILD7)
The average estimated FFB yield and the foliar nutrient composition for the coastal area are presented in Table 7.15. For station CLD1 (Carey series soil), the average estimated FFB yield giving the maximum profit was 29.6934 tonnes per hectare, with a foliar nutrient composition of 2.6222% N, 0.162% P, 0.9306% K, 0.4821% Ca and 0.246% Mg. For the Sedu soil series (CLD4), the estimated FFB yield was 30.9592 tonnes per hectare, with N, P, K, Ca and Mg concentrations of 2.6418, 0.1569, 0.8810, 0.4948 and 0.7641% respectively. Stations on the same soil series can nevertheless differ in foliar nutrient composition and average estimated FFB yield, depending on local soil and climatic factors (Foster 1995). For example, stations CLD3, CLD6 and CLD7 are all on the Briah soil series, but their foliar nutrient compositions differ from each other.
The foliar nutrient composition for all the stations in the coastal area is shown graphically in Figure 7.6. The N concentration was the highest at all the coastal stations. As in the inland area, the nutrient concentrations vary from one station to the next even when the soil series is the same.
Table 7.15: The estimated FFB yield and the foliar nutrient composition levels (%) for the coastal area
Station  Estimated FFB yield     Foliar nutrient composition (%)
         (tonnes/hectare/year)   N       P       K       Ca      Mg
CLD1     29.6934                 2.6222  0.1620  0.9306  0.4821  0.2460
CLD2     32.5558                 2.5565  0.1536  0.8321  0.5036  0.3656
CLD3     31.0969                 2.5646  0.1618  0.6196  0.4863  0.4425
CLD4     30.9592                 2.6418  0.1569  0.8810  0.4948  0.7641
CLD5     31.8279                 2.6689  0.1482  0.9366  0.5857  0.3226
CLD6     36.4902                 2.3310  0.1494  0.7438  0.6728  0.4042
CLD7     34.7329                 2.5536  0.1495  0.8503  0.7208  0.2641
Figure 7.6: The foliar nutrient composition levels for each station in the coastal area (bar chart of the N, P, K, Ca and Mg foliar composition, in %, for stations CLD1 to CLD7)
Figure 7.7: Comparison between the N and K fertiliser level needs by oil palm for the coastal and inland areas (bar chart of the N and K fertiliser levels, in kg/palm/year, for stations CLD1 to CLD7 and ILD1 to ILD7)
Figure 7.7 compares the N and K fertiliser levels required at all the stations in the inland and coastal areas. The Carey series soil (CLD1 and CLD5), the Batu Anam series soil (ILD4) and the Pohoi series soil (ILD5) needed more K fertiliser than N fertiliser. Most of the remaining stations needed more N fertiliser than K fertiliser. At station ILD2 (Renggam series soil), where P and Mg fertilisers were absent, the N and K fertilisers were needed at about the same level.
7.6 CONCLUSION
The results discussed in the previous sections indicate that the R2 values for the fertiliser treatments are comparable to those reported by Ahmad Tarmizi et al. (1986; 1991) and Green (1976). The canonical analysis of the fertiliser levels identified the stationary points of stations ILD1 and ILD2 as maxima, whereas the other inland stations have saddle points. The ridge analysis identified the optimal levels of the estimated FFB yield. The inland areas are expected to produce around 26 to 31 tonnes of FFB per hectare per year. The coastal areas are expected to produce around 29 to 36 tonnes per hectare per year, which is higher than the inland areas. These findings are similar to those of Foster (1995) and Ahmad Tarmizi et al. (1999). In terms of profit, the maximum total profit is obtained at the optimal fertiliser levels. The foliar nutrient combinations found in this study are within the optimal ranges suggested by Foster and Chang (1977). The fertiliser levels needed by the oil palm differ among the experimental stations, even when the palms are grown on the same soil series. Soil nutrients and climate also appear to affect the production of FFB yield (Foster et al., 1987; Foster 1995; Soon and Hong 2001).
CHAPTER 8
SUMMARY AND CONCLUSION
8.1 INTRODUCTION
This chapter summarises the study of oil palm yield modelling and discusses its most important results and findings. We draw conclusions from these observations and examine some possible directions for future research into the modelling of oil palm yield.
8.2 RESULTS AND DISCUSSION
The discussion of the results starts with the initial exploratory study in Section 8.2.1, which covers the nonlinear growth models and the MLR models. The application of neural networks, and how this approach overcomes the weaknesses of MLR, is discussed in Section 8.2.2. That section also highlights a comparative study of neural networks and multiple linear regression before conclusions are drawn. Section 8.2.3 discusses the results of the response surface analysis.
8.2.1 Initial Exploratory Study
This study comprises four main modelling approaches. It is an exploratory study that required the author to examine current practices in modelling plant yield growth. In the first stage, the study explored the use of nonlinear growth models, followed by multiple linear regression and robust M-regression models. The multiple linear regression model is commonly used in modelling oil palm yield. Nonlinear growth modelling, on the other hand, has not been widely explored, owing to a lack of familiarity among practitioners. Inadequate modelling leads to poor management decisions, which hinder improvements in oil palm yield.
The nonlinear growth modelling demonstrated that nonlinear models are suitable for modelling the yield growth data. The models were compared on the basis of the MSE, RMSE, MAE, MAPE and correlation, as presented in Table 8.1. The Logistic, von Bertalanffy, Richard's and Stannard growth models produced the lowest MAPE values. However, the von Bertalanffy, Richard's and Stannard growth models were found to be statistically inadequate for the oil palm yield growth data, because the asymptotic confidence intervals of their parameter estimates include zero. Based on the MAPE values, the Logistic model was selected as the best model, followed by the Gompertz, Morgan-Mercer-Flodin, Chapman-Richard (with initial stage) and Log-logistic growth models respectively.
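The adequacy-of-fit measures in Table 8.1 can be computed as in the sketch below. The text does not give the exact formulas, so conventional definitions are assumed: MSE is divided by n rather than by the residual degrees of freedom, and the correlation is Pearson's correlation between observed and fitted values. MAPE is returned as a fraction, which matches how the tables report it (for example, 0.1242 in Table 8.2 corresponds to the 12.42 percent quoted in the text). The example data are made up.

```python
import math


def fit_measures(observed, fitted):
    """Adequacy-of-fit measures for a fitted model (conventional definitions assumed)."""
    n = len(observed)
    errors = [o - f for o, f in zip(observed, fitted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e / o) for e, o in zip(errors, observed)) / n  # fraction, not percent
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_f = sum((f - mean_f) ** 2 for f in fitted)
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": mae,
        "MAPE": mape,
        "Correlation": cov / math.sqrt(ss_o * ss_f),
    }


# Toy example (not thesis data)
print(fit_measures([10, 20, 30, 40], [12, 18, 33, 39]))
```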
Reviews of the literature on oil palm modelling provided guidelines for the direction of the study. They showed that multiple linear regression was the most popular technique for modelling oil palm yield. In oil palm research it is used to find the relationships between oil palm yield and the factors that influence it.
Table 8.1: The adequacy of fit measurements used for the nonlinear growth models
Model                  MSE     RMSE    MAE     MAPE    Correlation
Logistic               2.9552  1.7191  1.2241  0.0347  0.9714
Gompertz               3.1528  1.7756  1.3525  0.0417  0.9694
Von Bertalanffy        2.9366  1.7136  1.2175  0.0346  0.9715
Negative exponential   4.0574  2.0143  1.6125  0.0512  0.9617
Monomolecular          3.7202  1.9287  1.5316  0.0501  0.9638
Log-logistic           4.6442  2.1550  1.7164  0.0598  0.9552
Richard's              2.9365  1.7136  1.2176  0.0345  0.9715
Weibull                3.7202  1.9287  1.5316  0.0501  0.9638
Morgan-Mercer-Flodin   3.2045  1.7901  1.3696  0.0411  0.9689
Chapman-Richard        6.1903  2.4880  2.2749  0.0696  0.9630
Chapman-Richard*       3.4127  1.8473  1.4535  0.0478  0.9670
Stannard               2.9366  1.7136  1.2175  0.0345  0.9715
* with initial stage
Because oil palm yield depends on many factors, such as humidity, chemical content and soil type, we explored causal models such as the MLR model. The MLR model was also compared with the robust M-regression (RMR) model, to explore whether the model's accuracy could be improved. The relationship between oil palm yield and foliar nutrient composition was investigated using MLR and RMR. The modelling began with FFB yield as the dependent variable and the concentrations of N, P, K, Ca and Mg as the independent variables. The nutrient balance ratio, K deficiency, Mg deficiency and the critical level of phosphorus were later added to the model as independent variables.
The modelling errors (MAPE) of the MLR and MLR‡ models in the inland area ranged from 12.42 to 18.31 percent and from 12.42 to 18.19 percent respectively, as shown in Table 8.2. The R2 values of the two models were also similar. The corresponding modelling accuracy in the inland area ranged from 81.69 to 87.58 percent for the MLR model and from 81.81 to 87.58 percent for the MLR‡ model. In the coastal area (Table 8.3), the accuracy of the MLR model ranged from 74.38 to 91.87 percent, and that of the MLR‡ model from 75.01 to 91.92 percent. The lowest modelling error, 8.13 percent, was recorded at station CLD2, followed by station CLD3 with 8.39 percent. Since the MAPE values of the two models were approximately equal in both the inland and coastal areas, we conclude that the two approaches performed similarly.
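The accuracy figures quoted above follow directly from the MAPE values: accuracy is 100 minus the MAPE expressed as a percentage (for example, 100 - 18.31 = 81.69). The sketch below re-derives the MLR accuracy ranges from the MAPE columns of Tables 8.2 and 8.3 to make that relationship explicit. The dictionary layout and function name are illustrative choices.

```python
# MLR MAPE values (as fractions) from Tables 8.2 and 8.3
MLR_MAPE = {
    "inland": {"ILD1": 0.1623, "ILD2": 0.1483, "ILD3": 0.1404, "ILD4": 0.1506,
               "ILD5": 0.1497, "ILD6": 0.1242, "ILD7": 0.1798, "ILDT": 0.1831},
    "coastal": {"CLD1": 0.1766, "CLD2": 0.0813, "CLD3": 0.0839, "CLD4": 0.2562,
                "CLD5": 0.1729, "CLD6": 0.1394, "CLD7": 0.1301, "CLDT": 0.1804},
}


def accuracy_percent(mape):
    """Modelling accuracy as used in the text: 100 minus MAPE expressed in percent."""
    return 100.0 * (1.0 - mape)


for area, table in MLR_MAPE.items():
    accuracies = [accuracy_percent(m) for m in table.values()]
    print(area, f"{min(accuracies):.2f} to {max(accuracies):.2f} percent")
# inland 81.69 to 87.58 percent
# coastal 74.38 to 91.87 percent
```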
Table 8.2: The RMSE, MAPE and R2 values for the MLR and RMR modelling for the inland area
Station  RMSE                      MAPE                      R2
         MLR     MLR‡    RMR       MLR     MLR‡    RMR       MLR    MLR‡   RMR
Inland area
ILD1     4.6326  4.6831  5.0082    0.1623  0.1634  0.1579    0.392  0.379  0.571
ILD2     4.1101  4.0536  4.2989    0.1483  0.1457  0.1555    0.422  0.433  0.598
ILD3     4.2788  5.4819  4.2990    0.1404  0.1809  0.1403    0.404  0.478  0.381
ILD4     4.2214  4.1421  5.1001    0.1506  0.1455  0.1765    0.185  0.215  0.199
ILD5     3.7262  3.4895  3.7395    0.1497  0.1376  0.1483    0.400  0.474  0.323
ILD6     3.7532  3.7331  5.2820    0.1242  0.1242  0.1778    0.317  0.325  0.313
ILD7     4.8492  4.8107  4.8754    0.1798  0.1784  0.1745    0.118  0.132  0.127
ILDT     4.9822  4.9802  5.3922    0.1831  0.1819  0.1901    0.148  0.168  0.243
‡ Independent variables: N, P, K, Ca, Mg, nutrient balance ratio, critical leaf phosphorus, K deficiency, Mg deficiency and TLB.
Outlying observations can undermine the credibility of the model's performance. The Q-Q plots showed outliers in the data sets, so robust regression, specifically robust M-regression, was introduced in this study. The R2 values showed an improvement at six stations, no change at five stations and a decrease at five stations. The accuracy of the RMR model in the inland area ranged from 80.99 to 85.17 percent, and in the coastal area from 69.36 to 91.62 percent. The highest R2 value, 0.687, was recorded at station CLD3. This means that only 68.7 percent of the variance was explained by the independent variables, while the remaining 31.3 percent was attributable to unexplained factors. The low R2 values were due to the lack of information on other factors that contribute to oil palm yield, such as rainfall and soil moisture. Consequently, the modelling accuracy is still fairly low and requires improvement.
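The text does not specify the weight function or the scale estimate used for the robust M-regression. As a hedged illustration only, the sketch below assumes Huber's weight function with the commonly used tuning constant c = 1.345 and a scale based on the median absolute deviation (MAD). The model is fitted by iteratively reweighted least squares (IRLS), starting from the ordinary least-squares solution. On a toy data set with one gross outlier, the robust slope stays close to the true value while the least-squares slope is pulled away.

```python
import numpy as np


def huber_m_regression(X, y, c=1.345, max_iter=100, tol=1e-10):
    """Robust M-regression by IRLS with Huber weights (assumed weight function).

    X must already include a column of ones for the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # ordinary least-squares start
    for _ in range(max_iter):
        resid = y - X @ beta
        # Robust scale: normalised median absolute deviation of the residuals
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0.0:
            break                                    # most points already fitted exactly
        u = np.abs(resid) / scale
        w = c / np.maximum(u, c)                     # Huber: 1 if u <= c, else c / u
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Toy data (not thesis data): y = 2 + 3x plus small noise, with one gross outlier
x = np.arange(20, dtype=float)
y = 2.0 + 3.0 * x + 0.1 * np.sin(x)
y[19] += 50.0
X = np.column_stack([np.ones_like(x), x])
print("OLS  :", np.linalg.lstsq(X, y, rcond=None)[0])
print("Huber:", huber_m_regression(X, y))
```

Each IRLS step solves an ordinary weighted least-squares problem, so the outlier is not deleted. Instead, its weight shrinks as its scaled residual grows.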
Table 8.3: The RMSE, MAPE and the R2 values for the MLR and RMR models for the coastal area
Station  RMSE                      MAPE                      R2
         MLR     MLR‡    RMR       MLR     MLR‡    RMR       MLR    MLR‡   RMR
Coastal area
CLD1     4.9442  4.9085  5.3343    0.1766  0.1750  0.1625    0.380  0.388  0.118
CLD2     2.8913  2.8736  3.0848    0.0813  0.0808  0.0843    0.171  0.181  0.307
CLD3     2.7461  2.7944  2.7482    0.0839  0.0854  0.0838    0.687  0.676  0.516
CLD4     6.3415  6.3909  8.9948    0.2562  0.2499  0.3064    0.050  0.035  0.225
CLD5     4.8701  4.8758  4.8765    0.1729  0.1737  0.1725    0.111  0.108  0.115
CLD6     4.1983  4.2217  4.2022    0.1394  0.1399  0.1389    0.043  0.031  0.049
CLD7     4.1895  3.9527  4.8827    0.1301  0.1206  0.1505    0.232  0.315  0.151
CLDT     5.2337  5.0946  5.8121    0.1804  0.1746  0.2000    0.044  0.094  0.140
‡ Independent variables: N, P, K, Ca, Mg, nutrient balance ratio, critical leaf phosphorus, K deficiency, Mg deficiency and TLB.
8.2.2 Modelling Using Neural Networks
Artificial neural networks are computing systems containing many interconnected nonlinear neurons, capable of extracting linear and nonlinear regularities from a given data set. Like humans, artificial neural networks learn by example. Neural networks are potentially useful for studying the complex relationships between the inputs and outputs of a system. Reviews of the literature on neural networks for modelling showed that they gave more reliable results than the statistical approaches. Neural networks can also learn to perform tasks from the data given for training or from initial experience. They are data driven, rather than model driven as the statistical approaches are.
The neural network model is still new in oil palm research compared with
research on short-term crops such as barley, corn and potatoes. We used the neural
network model to further refine the modelling of oil palm yield. In the first stage,
we used the same input and output variables as in the MLR and RMR models. The
neural network model was run using different combinations of activation functions.
To ensure that the network did not overfit, the maximum number of hidden nodes
was calculated. The influence of the activation function combination on the neural
network's performance was examined using F statistics (Table 8.4).
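The text does not restate here how the maximum number of hidden nodes was calculated. One common rule of thumb, assumed below purely for illustration, keeps the number of trainable weights and biases in a one-hidden-layer network no larger than the number of training patterns N. With n inputs, h hidden nodes and m outputs the network has h(n + 1) + m(h + 1) parameters, so the condition becomes h ≤ (N − m)/(n + m + 1). The function names and the example sizes are hypothetical.

```python
def n_parameters(n_inputs, n_hidden, n_outputs=1):
    """Number of weights plus biases in a one-hidden-layer network."""
    return n_hidden * (n_inputs + 1) + n_outputs * (n_hidden + 1)


def max_hidden_nodes(n_train, n_inputs, n_outputs=1):
    """Largest h whose parameter count does not exceed n_train.

    Solves h*(n + 1) + m*(h + 1) <= N for h, i.e.
    h <= (N - m) / (n + m + 1). This rule of thumb is an assumption
    made for illustration, not necessarily the rule used in the study.
    """
    return max((n_train - n_outputs) // (n_inputs + n_outputs + 1), 0)


# Hypothetical case: 120 training patterns, 10 inputs, 1 output (FFB yield)
h = max_hidden_nodes(120, 10)
print(h, n_parameters(10, h), n_parameters(10, h + 1))  # 9 109 121
```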
The tests were based on the MSE values for the training, validation and
testing phases, the average of the MSE values and the correlation coefficient. In the
training phase, the F tests were statistically significant at nine of the sixteen stations,
meaning that the neural network produced different performances with different
combinations of activation functions in the training phase. The F tests were
statistically significant at all sixteen stations in the validation phase and at fourteen
stations in the testing phase (all except ILD3 and CLD3). For the average MSE, the
F tests were statistically significant at the 0.01 or 0.05 level at fourteen of the
sixteen stations. The F test for the correlation was significant at eleven stations. In
general, different combinations of activation functions result in different
performances by the neural network.
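Each F value in Table 8.4 comes from an analysis of variance in which the groups are the activation-function combinations; a numerator of 5 degrees of freedom implies six combinations per station. A minimal sketch of the one-way F statistic follows. The response values are hypothetical MSE values from repeated training runs, not results from this study.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic and its degrees of freedom.

    groups: list of lists, one list of responses (e.g. MSE values from
    repeated training runs) per activation-function combination.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical MSE values for six activation-function combinations
mse_by_combination = [
    [0.021, 0.024, 0.022, 0.023],
    [0.019, 0.020, 0.021, 0.018],
    [0.027, 0.029, 0.026, 0.028],
    [0.022, 0.021, 0.023, 0.024],
    [0.030, 0.031, 0.029, 0.032],
    [0.020, 0.022, 0.019, 0.021],
]
f, df1, df2 = one_way_anova_f(mse_by_combination)
print(f"F = {f:.3f} on ({df1}, {df2}) df")
```

If SciPy is available, `scipy.stats.f_oneway` returns the same F statistic together with its p-value.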
Table 8.4: The F values of the analysis of variance for different activation functions,
for the inland and coastal areas
                                    F values
Station   Training   Validation   Testing   Average   Correlation   df
Inland Stations
ILD1      3.368*     17.997*      12.055*   10.729*   3.062**       (5, 198)
ILD2      7.850*     15.516*      14.949*   10.091*   1.601         (5, 264)
ILD3      2.736**    12.899*      1.924     7.951*    4.431*        (5, 240)
ILD4      2.291**    15.055*      10.452*   13.700*   3.058**       (5, 265)
ILD5      1.859      2.306**      42.523*   16.715*   2.519**       (5, 168)
ILD6      3.132**    22.047*      4.755*    0.927     4.606*        (5, 132)
ILD7      1.766      13.455*      7.028*    5.812*    0.916         (5, 264)
ILDT      0.853      11.736*      12.742*   3.732*    0.613         (5, 264)
Coastal Stations
CLD1      0.847      11.091*      18.495*   15.103*   6.454*        (5, 180)
CLD2      3.664*     7.724*       9.166*    10.197*   2.403*        (5, 192)
CLD3      3.265**    3.762*       1.918     2.905**   4.017*        (5, 54)
CLD4      1.295      12.413*      6.006*    8.957*    5.272*        (5, 222)
CLD5      1.524      10.218*      7.112*    1.615     2.139         (5, 180)
CLD6      3.232*     6.523*       10.145*   7.518*    6.146*        (5, 264)
CLD7      1.366      5.092*       3.354*    2.865**   5.911*        (5, 264)
CLDT      2.794**    8.083*       35.022*   13.037*   2.047         (5, 264)
Note: * significant at the 1% level; ** significant at the 5% level
The correlation was used to measure the closeness between the target values
and the predicted values, while the MAPE values were used to measure the error.
The correlation and the MAPE values are presented in Table 8.5. In the inland area,
the highest correlation value, 0.8361, was recorded for ILDT, followed by 0.8114 at
station ILD5. The minimum MAPE value of 0.0560 was also recorded for ILDT,
followed by station ILD6 with a MAPE value of 0.0719. For the coastal area, the
highest value of r, 0.8703, was recorded at station CLD3, and the second highest at
station CLD4. The MAPE values for the coastal area ranged from 0.0638 to 0.1456.
At this stage, we concluded that the neural network model is able to estimate the
FFB yield.
Table 8.5: The MAPE values and the correlation of the neural network model for the
inland and coastal areas
Station   MAPE     r          Station   MAPE     r
ILD1      0.1159   0.7984     CLD1      0.1220   0.7404
ILD2      0.1162   0.7840     CLD2      0.0638   0.6425
ILD3      0.1092   0.7853     CLD3      0.0657   0.8703
ILD4      0.1276   0.5988     CLD4      0.1118   0.8313
ILD5      0.0944   0.8114     CLD5      0.1431   0.5397
ILD6      0.0719   0.7900     CLD6      0.1232   0.4560
ILD7      0.1580   0.4961     CLD7      0.1003   0.7054
ILDT      0.0560   0.8361     CLDT      0.1456   0.5489
In the second stage, we examined the effect of increasing the number of input
nodes on the neural network's performance. The fertiliser trial and foliar nutrient
composition information were used as input nodes to estimate the FFB yield. The
neural network's performance was comparable to its previous performance,
indicating that adding the fertiliser trial information as inputs did not significantly
improve the neural network's performance.
Designed experiments were conducted to determine the effects of the neural
network parameters (learning rate, momentum term, number of hidden nodes and
number of runs) on the neural network's performance. Analysis of variance showed
that the number of runs and the momentum term did not influence the neural
network's performance in the first experiment (Table 8.6). In the second
experiment, the learning rate did not have a statistically significant influence on the
neural network's performance, whereas the number of runs and the number of
hidden nodes influenced it significantly. In both experiments, the influence of the
number of hidden nodes on the neural network's performance was statistically
significant at the 0.05 level.
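The numerator degrees of freedom in Table 8.6 indicate how many levels each factor had (levels = df + 1): in Experiment 1, eight hidden-node levels, six run levels and four momentum levels. The sketch below enumerates such a full factorial design; the level values are hypothetical placeholders, since the actual settings are not restated here. Each treatment combination would then be trained and its performance recorded for the analysis of variance.

```python
from itertools import product

# Number of levels per factor follows from the numerator df in Table 8.6
# (levels = df + 1); the level values themselves are hypothetical.
experiment_1 = {
    "hidden_nodes": [2, 4, 6, 8, 10, 12, 14, 16],  # 8 levels (df 7)
    "runs": [1, 2, 3, 4, 5, 6],                    # 6 levels (df 5)
    "momentum": [0.1, 0.3, 0.5, 0.7],              # 4 levels (df 3)
}


def full_factorial(factors):
    """Every combination of factor levels, one dict per treatment."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


design = full_factorial(experiment_1)
print(len(design))   # 192
print(design[0])     # {'hidden_nodes': 2, 'runs': 1, 'momentum': 0.1}
```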
Table 8.6: The F values of analysis of variance for Experiment 1, 2 and 3
Experiment          Parameter   F statistic   p-value   df
Experiment 1        HN          8.7759        0.0000    (7, 1912)
                    NR          1.6950        0.1330    (5, 1914)
                    MT          1.3300        0.2630    (3, 1916)
Experiment 2        HN          8.0480        0.0000    (6, 2932)
                    NR          2.8840        0.0080    (6, 2932)
                    LR          1.6090        0.1540    (5, 2933)
Experiment 3
  Training data     P-O         18.481        0.000     (5, 179)
                    M-O         3.988         0.002     (4, 179)
  Test data         P-O         12.171        0.000     (5, 179)
                    M-O         3.570         0.004     (4, 179)
Note: HN = number of hidden nodes; NR = number of runs; MT = momentum term;
LR = learning rate; P-O = percentage of outliers; M-O = magnitude of outliers.
The third experiment investigated the effects of the percentage of outliers and the
magnitude of outliers on the neural network's performance on training and test data.
Modelling accuracy on both training and test data decreased as the percentage and
magnitude of the outliers increased (Table 8.6). The F-tests and t-tests showed that
both factors had a statistically significant influence on the neural network modelling.
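The text does not describe here how the outliers were generated. One simple contamination scheme, assumed purely for illustration, selects a given percentage of observations at random and shifts each by a given multiple of the sample standard deviation, so that "percentage" and "magnitude" correspond to the two factors P-O and M-O. The upward-only shift, the function name and the sample yields are all hypothetical choices.

```python
import random
import statistics


def contaminate(y, percent, magnitude, seed=None):
    """Return (y_contaminated, indices) with percent% of points made outliers.

    Each selected point is shifted upward by magnitude times the sample
    standard deviation of y. Both the upward shift and the use of the
    standard deviation are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(y)
    k = round(n * percent / 100)
    shift = magnitude * statistics.stdev(y)
    indices = sorted(rng.sample(range(n), k))
    out = list(y)
    for i in indices:
        out[i] += shift
    return out, indices


# Hypothetical yields; 10 percent outliers of magnitude 3 standard deviations
yields = [26.1, 27.4, 29.0, 30.2, 28.8, 27.9, 31.5, 25.7, 29.6, 28.3]
dirty, where = contaminate(yields, percent=10, magnitude=3, seed=42)
print(where, [round(v, 2) for v in dirty])
```

Fixing `seed` makes each contaminated data set reproducible, which matters when the same contamination level is reused across repeated network trainings.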
A comparative study was conducted to evaluate the performance of the three
models proposed in this study. The MAPE and correlation values are presented in
Table 8.7. The MAPE values of the NN models were lower than those of the MLR
and RMR models at every station in both the inland and coastal areas, showing that
the NN model can reduce modelling error and improve prediction accuracy. The
correlation values for the neural network model were also higher than those for the
MLR and RMR models.
Table 8.7: The comparison of the MAPE and correlation values between the MLR,
RMR and NN models, for the inland and coastal areas
          MAPE                          Correlation
Station   MLR      RMR      NN          MLR      RMR      NN
Inland Stations
ILD1      0.1623   0.1579   0.1159      0.6260   0.7554   0.7984
ILD2      0.1483   0.1555   0.1162      0.6490   0.7732   0.7840
ILD3      0.1404   0.1403   0.1092      0.6360   0.6169   0.7853
ILD4      0.1506   0.1765   0.1276      0.4300   0.4457   0.5988
ILD5      0.1497   0.1483   0.0944      0.6330   0.5681   0.8114
ILD6      0.1242   0.1778   0.0719      0.5630   0.5593   0.7900
ILD7      0.1798   0.1745   0.1580      0.3430   0.3568   0.4961
ILDT      0.1831   0.1901   0.0560      0.3840   0.4932   0.8361
Coastal Stations
CLD1      0.1766   0.1625   0.1220      0.6160   0.3438   0.7404
CLD2      0.0813   0.0843   0.0638      0.4130   0.5538   0.6425
CLD3      0.0839   0.0838   0.0657      0.8290   0.7186   0.8703
CLD4      0.2562   0.3064   0.1118      0.2240   0.4748   0.8313
CLD5      0.1729   0.1725   0.1431      0.3320   0.3397   0.5397
CLD6      0.1394   0.1389   0.1232      0.2080   0.2231   0.4560
CLD7      0.1301   0.1505   0.1003      0.4810   0.3883   0.7054
CLDT      0.1804   0.2000   0.1456      0.2100   0.3742   0.5489
Finally, the accuracy of fit of the three models was compared, as shown in
Table 8.8 for the inland area and Table 8.9 for the coastal area. All the changes in
accuracy are positive, meaning that the NN model improved the modelling accuracy
compared with both statistical approaches. The accuracy of fit of the neural network
model ranged from 84.20 to 94.40 percent for the inland area and from 85.44 to
93.62 percent for the coastal area. In the inland area, the lowest change in accuracy
from the MLR model to the NN model was 2.6579 percent, at station ILD7, and the
highest change was 15.5588 percent, for ILDT. The lowest change in accuracy from
the RMR model to the NN model was 1.9988 percent, at station ILD7, and the
highest change was 16.5576 percent, for ILDT. In the coastal area, station CLD4
recorded the highest accuracy changes from the MLR model to the NN model and
from the RMR model to the NN model, at 19.4138 percent and 28.0565 percent
respectively.
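The values in Tables 8.8 and 8.9 are consistent with accuracy defined as 100(1 − MAPE) and with the accuracy change defined as the relative change with respect to the statistical model. For station ILD1, for example, (88.41 − 83.77)/83.77 × 100 = 5.5390 percent. These definitions are inferred from the tables rather than quoted from the text; the sketch below reproduces the arithmetic using the MAPE values from Table 8.7.

```python
def accuracy(mape):
    """Accuracy of fit in percent, taken as 100 * (1 - MAPE)."""
    return 100.0 * (1.0 - mape)


def accuracy_change(reference, new):
    """Relative change in accuracy, in percent of the reference model."""
    return 100.0 * (new - reference) / reference


# Station ILD1, MAPE values from Table 8.7
mlr, rmr, nn = accuracy(0.1623), accuracy(0.1579), accuracy(0.1159)
print(f"{mlr:.2f} {rmr:.2f} {nn:.2f}")        # 83.77 84.21 88.41
print(f"{accuracy_change(mlr, nn):.4f}")      # 5.5390
print(f"{accuracy_change(rmr, nn):.4f}")      # 4.9875
```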
The results showed that the neural network model performs better in
modelling than the regression models. The comparative study showed that the neural
network model provided more reliable results than both the MLR and RMR models.
The results also suggest that the neural network's performance is more robust to
outlier observations than that of the linear regression model.
Table 8.8: The accuracy of the MLR, RMR and NN models, and the accuracy
changes for the inland area
          Accuracy (%)                    Accuracy change (%)
Station   MLR       RMR       NN          From MLR to NN   From RMR to NN
Inland Area
ILD1      83.7700   84.2100   88.4100     5.5390           4.9875
ILD2      85.1700   84.4500   88.3800     3.7689           4.6536
ILD3      85.9600   85.9700   89.0800     3.6296           3.6175
ILD4      84.9400   82.3500   87.2400     2.7078           5.9381
ILD5      85.0300   85.1700   90.5600     6.5036           6.3285
ILD6      87.5800   82.2200   92.8100     5.9717           12.8801
ILD7      82.0200   82.5500   84.2000     2.6579           1.9988
ILDT      81.6900   80.9900   94.4000     15.5588          16.5576
Table 8.9: The accuracy of the MLR, RMR and NN models, and the accuracy
changes, for the coastal area
          Accuracy (%)                    Accuracy change (%)
Station   MLR       RMR       NN          From MLR to NN   From RMR to NN
Coastal Area
CLD1      82.3400   83.7500   87.8000     6.6310           4.8358
CLD2      91.8700   91.5700   93.6200     1.9049           2.2387
CLD3      91.6100   91.6200   93.4300     1.9867           1.9756
CLD4      74.3800   69.3600   88.8200     19.4138          28.0565
CLD5      82.7100   82.7500   85.6900     3.6030           3.5529
CLD6      86.0600   86.1100   87.6800     1.8824           1.8232
CLD7      86.9900   84.9500   89.9700     3.4257           5.9094
CLDT      81.9600   80.0000   85.4400     4.2460           6.8000
8.2.3 Modelling using response surface analysis
Conventional analyses of variance indicate which application rates produce
larger or smaller yields than other rates, but do not estimate an optimum application
rate. In this study, response surface analysis was used to obtain the optimum
fertiliser application rates for maximum oil palm yield.
Ridge analysis was used in this study when the stationary point was a saddle
point. The fertiliser combinations, the average estimated FFB yield and the
estimated total profit are recorded in Table 8.10 and Table 8.11. In the inland area,
the Renggam (ILD2) and Durian (ILD7) soil series had the highest N fertiliser
requirements, at 6.4000 kg/palm/year and 6.4966 kg/palm/year respectively. The N
fertiliser requirements differed between the inland and coastal areas: in the inland
area, the N fertiliser needed ranged from 2.6081 to 6.4966 kg/palm/year, whereas in
the coastal area it ranged from 1.8200 to 5.6169 kg/palm/year. Thus, the amount of
N fertiliser required in the inland area was generally greater than in the coastal area.
The requirements for P and Mg fertiliser were quite similar for both areas. This
study found that oil palms need more N and K fertiliser than other types of
fertiliser.
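For a fitted second-order model ŷ = b0 + x′b + x′Bx in coded units, the standard canonical analysis (as described in response surface texts such as Box and Draper, 1987) locates the stationary point at x_s = −½B⁻¹b and classifies it by the signs of the eigenvalues of B. When the point is a saddle, ridge analysis instead traces the points of maximum estimated response on spheres of radius R: setting the gradient of the Lagrangian to zero gives (B − μI)x = −½b, with μ larger than the largest eigenvalue of B. The sketch below is a minimal NumPy version of these two steps; the coefficient values are hypothetical and are not the fitted surfaces from this study.

```python
import numpy as np


def canonical_analysis(b, B):
    """Stationary point and its nature for y = b0 + x'b + x'Bx.

    b: vector of first-order coefficients; B: symmetric matrix with the
    pure quadratic coefficients on the diagonal and half of each
    interaction coefficient off the diagonal.
    """
    x_s = -0.5 * np.linalg.solve(B, b)
    eigvals = np.linalg.eigvalsh(B)
    if np.all(eigvals < 0):
        kind = "maximum"
    elif np.all(eigvals > 0):
        kind = "minimum"
    else:
        kind = "saddle"
    return x_s, eigvals, kind


def ridge_path(b, B, mus):
    """Points of maximum estimated response on spheres of shrinking radius.

    For each mu above the largest eigenvalue of B, x(mu) solves
    (B - mu*I) x = -b/2, and the sphere radius is ||x(mu)||.
    """
    lam_max = np.linalg.eigvalsh(B).max()
    path = []
    for mu in mus:
        if mu <= lam_max:
            raise ValueError("mu must exceed the largest eigenvalue of B")
        x = np.linalg.solve(B - mu * np.eye(len(b)), -0.5 * b)
        path.append((mu, float(np.linalg.norm(x)), x))
    return path


# Hypothetical two-factor surface in coded units (not fitted to the trials)
b = np.array([1.0, 0.5])
B = np.array([[-1.0, 0.3], [0.3, 0.5]])
x_s, eigvals, kind = canonical_analysis(b, B)
print(kind, eigvals)
for mu, radius, x in ridge_path(b, B, [1.0, 2.0, 5.0]):
    print(f"mu={mu:.1f}  radius={radius:.3f}  x={np.round(x, 3)}")
```

In practice one would choose the radius that stays within the experimental region and convert the coded point back to fertiliser rates in kg/palm/year.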
The total profit was calculated for each fertiliser combination by considering
only the fertiliser cost and assuming that other cost factors were constant. The
highest total profit for the inland area was recorded at station ILD5, which produced
31.7407 tonnes/hectare/year of FFB with a total profit of RM8309.58. The total profit
in the coastal area ranged from RM7253.34/hectare/year to RM9918.89/hectare/year;
these figures are generally higher than those recorded in the inland area. The
fertiliser combinations varied from one station to another, even between stations
with the same soil series.
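The profit calculation described above can be sketched as FFB revenue minus fertiliser cost per hectare per year, with all other costs left out because they are assumed constant. The FFB price, fertiliser prices and planting density below are hypothetical placeholders, because the values used in the study are not restated here, so the printed figure will not reproduce the RM7306.09 reported for ILD1; only the rates and yield are taken from Table 8.10.

```python
def total_profit(ffb_t_per_ha, rates_kg_per_palm, fert_price_per_kg,
                 ffb_price_per_t, palms_per_ha):
    """Revenue from FFB minus fertiliser cost, in RM/hectare/year.

    All other costs are treated as constant and therefore left out,
    following the simplification described in the text.
    """
    revenue = ffb_t_per_ha * ffb_price_per_t
    fertiliser_cost = sum(rate * palms_per_ha * fert_price_per_kg[name]
                          for name, rate in rates_kg_per_palm.items())
    return revenue - fertiliser_cost


# ILD1 fertiliser rates and estimated yield from Table 8.10; the prices
# and planting density below are hypothetical placeholders.
rates = {"N": 5.1480, "P": 2.7054, "K": 3.2195, "Mg": 2.8126}
prices = {"N": 1.0, "P": 0.8, "K": 1.1, "Mg": 0.9}  # RM/kg, hypothetical
profit = total_profit(30.1826, rates, prices, ffb_price_per_t=300.0,
                      palms_per_ha=148)
print(f"RM {profit:.2f} per hectare per year")
```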
Table 8.10: The fertiliser level, average estimated FFB yield and total profit for the
inland area
                    Fertiliser level (kg/palm/year)     Estimated FFB yield     Total profit
Station/Soil type   N        P        K        Mg       (tonnes/hectare/year)   (RM)
Inland Stations
ILD1/Bungor         5.1480   2.7054   3.2195   2.8126   30.1826                 7306.09
ILD2/Renggam        6.4000   **       6.3864   *        27.3469                 6357.17
ILD3/Munchong       4.7800   0.9000   4.1000   *        29.7351                 7429.48
ILD4/Batu Anam      4.2611   2.1026   6.5259   1.1745   27.7434                 7250.29
ILD5/Pohoi          2.6081   1.8619   3.1192   *        31.7407                 8309.58
ILD6/Batu Anam      5.7761   2.0197   4.9856   3.0861   26.4447                 5872.45
ILD7/Durian         6.4966   2.3188   3.5574   2.3068   27.5043                 6373.06
Note: * Mg fertiliser was not applied in the trials; ** P fertiliser was not applied in
the trials
Table 8.11: The fertiliser level, average estimated FFB yield and total profit for the
coastal area
                    Fertiliser level (kg/palm/year)     Estimated FFB yield     Total profit
Station/Soil type   N        P        K        Mg       (tonnes/hectare/year)   (RM)
CLD1/Carey          2.8791   2.0007   4.8655   1.4579   29.6934                 7282.87
CLD2/Selangor       1.8200   1.8200   1.3600   1.8200   32.5558                 7939.66
CLD3/Briah          5.6169   1.9442   4.5206   3.2781   31.0969                 7281.32
CLD4/Sedu           3.6400   1.8200   3.6400   1.8200   30.9592                 7723.78
CLD5/Carey          3.7266   3.8353   8.6736   0.3801   31.8279                 7253.34
CLD6/Briah          3.6321   2.0422   0.6756   *        36.4902                 9918.89
CLD7/Briah          4.1189   3.3749   3.7216   *        34.7329                 8838.13
Note: * Mg fertiliser was not applied in the trials.
The foliar nutrient composition levels corresponding to the average estimated
FFB yield are presented in Table 8.12. For example, the results suggest that, given
an annual application of 5.1480 kg of N fertiliser, 2.7054 kg of P fertiliser, 3.2195 kg
of K fertiliser and 2.8126 kg of Mg fertiliser per palm, palms grown on the Bungor
soil at station ILD1 were capable of producing an average crop in the region of
30.1826 tonnes per hectare per year. This fertiliser combination had an estimated
foliar nutrient composition of 2.5303% N, 0.1698% P, 1.0855% K, 0.5757% Ca and
0.3562% Mg. The results for the other stations are given in Table 8.12.
Table 8.12: The average estimated FFB yield and the foliar nutrient composition
levels for the inland and coastal areas
                    Estimated FFB yield     Foliar nutrient composition (%)
Station/Soil type   (tonnes/hectare/year)   N        P        K        Ca       Mg
Inland Stations
ILD1/Bungor         30.1826                 2.5303   0.1698   1.0855   0.5757   0.3562
ILD2/Renggam        27.3469                 2.3398   0.1677   0.6858   0.6957   0.2936
ILD3/Munchong       29.7351                 2.6841   0.1672   1.2257   0.8615   0.1941
ILD4/Batu Anam      27.7434                 3.0223   0.1695   1.0470   0.6823   0.6548
ILD5/Pohoi          31.7406                 2.9331   0.1693   1.1392   0.7418   0.2681
ILD6/Batu Anam      26.4447                 2.6869   0.1685   0.7642   0.7668   0.2744
ILD7/Durian         27.5043                 2.6264   0.1517   0.9626   0.5795   0.5365
Coastal Stations
CLD1/Carey          29.6934                 2.6222   0.1620   0.9306   0.4821   0.2460
CLD2/Selangor       32.5558                 2.5565   0.1536   0.8321   0.5036   0.3656
CLD3/Briah          31.0969                 2.5646   0.1618   0.6196   0.4863   0.4425
CLD4/Sedu           30.9592                 2.6418   0.1569   0.8810   0.4948   0.7641
CLD5/Carey          31.8279                 2.6689   0.1482   0.9366   0.5857   0.3226
CLD6/Briah          36.4902                 2.3310   0.1494   0.7438   0.6728   0.4042
CLD7/Briah          34.7329                 2.5536   0.1495   0.8503   0.7208   0.2641
8.3 CONCLUSION
This study has introduced several statistical approaches to modelling oil palm
yield. These models provide important information that can be used by the oil palm
industry. The nonlinear growth model was applied to oil palm yield growth data; this
method had not previously been applied to oil palm data. The nonlinear growth
model was found to be a suitable approach for estimating oil palm yield at any age.
When the nutrient balance ratio was applied to the oil palm yield model, there
was no significant difference between using the nutrient balance ratio and using the
principal nutrient components. Robust M regression was also applied, but its
modelling accuracy did not differ much from that of the multiple linear regression
approach. However, the modelling accuracy of multiple linear regression remained
low, owing to the linear form of the model.
The neural network model, which had also not previously been used to model oil
palm yield, was introduced in this study. We found that the modelling accuracy of
the neural network is much better than that of the multiple linear regression
approach. The neural network model proved to be an efficient and reliable model.
Fertiliser usage differs between the inland and coastal areas. Palms grown in the
coastal area need less N, P and K fertiliser than those in the inland area. Combined
with the higher estimated FFB yields in the coastal area, this means that the coastal
area generates more income for planters.
8.4 AREAS FOR FURTHER RESEARCH
In further research, a comprehensive model could be built and run in operational
mode. Many factors other than fertiliser and foliar nutrient composition affect oil
palm yield, and experiments could be conducted to gather the necessary data. The
development of a future model must consider these other factors. The factors that
significantly affect oil palm yield are soil factors (including soil pH, nutrient content,
soil moisture, clay content and land slope), climate factors (including rainfall,
temperature, sunshine, humidity and water level), management (including the type
of fertiliser used, fertiliser dosage, pruning and labour cost), planting density
(including leaf area index and planting practice) and general factors (including palm
age, species, inflorescence rate, abortion rate and oil palm genetics). These factors
are summarised in Figure 8.1.
The effect of each factor on oil palm yield could be investigated and identified
using path diagrams or causal models. It is necessary to identify the individual
effects before integrating all the factors into a model. Too many factors will make
the model very complex and difficult to interpret. If necessary, the number of
factors should be reduced to a few component variables while retaining the valuable
and required information. Principal component analysis is designed for this purpose:
it reduces a set of factors to a few components that are uncorrelated with each other,
and it may be useful for reducing the dimensions of the complex oil palm yield
system. Other techniques that could be explored include using a neural network,
together with a sensitivity analysis, to select the best combination of the variables
that influence oil palm yield. It should be emphasised that even if all the common
site factors that normally influence fertiliser requirements are taken into account,
some unusual factors may still unpredictably alter the complex nutrient supply
chain. However, the relevant data are unavailable, which could prove to be a
problem.
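As a sketch of the dimension-reduction step suggested above, principal components can be obtained from the eigen-decomposition of the correlation matrix of the candidate factors. The data below are random placeholders standing in for site factors such as those in Figure 8.1; nothing here comes from the study, and the number of components retained is an arbitrary illustrative choice.

```python
import numpy as np


def principal_components(X, n_components):
    """PCA on the correlation matrix of X (rows = observations).

    Returns component scores, all eigenvalues in descending order and
    the proportion of variance explained by the retained components.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)        # ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :n_components]     # uncorrelated components
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, eigvals, explained


# Placeholder data: 100 observations of 4 hypothetical site factors,
# two of which are strongly correlated (e.g. rainfall and soil moisture)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[:, 1] = X[:, 0] + 0.2 * rng.normal(size=100)
scores, eigvals, explained = principal_components(X, 2)
print(np.round(eigvals, 3), np.round(explained, 3))
```

Standardising the factors first (that is, working with the correlation matrix) prevents variables measured on large scales, such as rainfall in millimetres, from dominating the components.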
[Figure 8.1: diagram linking six groups of factors to Oil Palm Yield: Soil factors (pH, soil type, nutrient content, soil moisture); Climate (rainfall, temperature, humidity, sunshine); Planting density (LAI, planting practice); Management (type of fertiliser, dosage, pruning, labour cost, fertiliser application); General factors (palm age, species, abortion rate, genetics); Fertiliser (nitrogen, phosphate, potassium, calcium, magnesium, etc.).]
Figure 8.1: The factors which may influence oil palm yield.
REFERENCES
Adam, J. B. (1999). Predicting pickle harvest using a parametric feedforward neural
network, Journal of Applied Science, 26(2): 165-176.
Ahmad Tarmizi Mohammed and Wahid Omar (2002). Pembajaan sawit yang
berkesan. Prosiding Persidangan Kebangsaan Pekebun Kecil Sawit 2002:
Strategi Ke Arah Pengukuhan dan Hala Tuju Sektor Pekebun Kecil Sawit.
Ahmad Tarmizi Mohammed, Foster, H. L., Zin Zawawi Zakaria and Chow, C. S.
(1986). Statistical and economic analysis of oil palm fertilizer trials in
Peninsular Malaysia between 1970-1981. PORIM Occasional Paper No. 22.
Ahmad Tarmizi Mohammed, Hamdan Abu Bakar, Mohd Tayeb Dolmat and Chan
K. W. (1999). Development and validation of PORIM fertilizer
recommendation system in Malaysian oil palm cultivation. Proceedings of
the 1999 PORIM International Palm Oil Congress (Agriculture). 203-217.
Ahmad Tarmizi Mohammed, Zin Zawawi Zakaria, Mohd Tayeb Dolmat and Ariffin
Darus (2004). Oil palm fertilizer programme: A proposal for higher yield.
Presented at Mesyuarat Plan Tindakan MPOB dan RISDA, 10 February 2004,
Prime City, Kluang.
Ahmad Tarmizi Mohammed, Zin Zawawi Zakaria, Mohd Tayeb Dolmat, Foster, H.
L., Hamdan Abu Bakar and Khalid Haron (1991). Relative efficiency of urea
to sulphate of ammonia in oil palm: Yield response and environmental
factors. Proceedings of the 1991 PORIM International Palm Oil Conference –
Agriculture. 340-348.
Alder, D. (1980). Forest volume estimation and yield prediction. Yield Prediction,
vol. 2, FAO Forestry Paper 22/2.
Amer, F. A. and Williams, W. T. (1957). Leaf area growth in Pelargonium zonale,
Ann. Bot. 21, 339.
Armstrong, J. S., Brodie, R. J. and McIntyre, S. H. (1987). Forecasting methods for
marketing-review of empirical research. International Journal of
Forecasting, 3: 355-376.
Anderson, V. L. and McLean, R. A. (1974). Design of Experiments: A Realistic
Approach. New York: Marcel Dekker, Inc.
Andrews, D. F. (1974). A robust method for multiple linear regression.
Technometrics, 16: 523-551.
Angstenberger, J. (1996). Prediction of the S and P 500 Index with Neural Networks,
43-152, Neural Networks and their Applications, edited by J. G. Taylor, John
Wiley and Sons, Inc.
Azme Khamis and Mokhtar Abdullah (2004). On robust environmental quality
indices. Pertanika Journal of Science and Technology, 12(1): 1-10.
Azme Khamis and Zuhaimy Ismail (2003). Perbandingan di antara regresi berganda
dan regresi komponen utama dalam menganggar harga minyak sawit mentah.
Prosiding Seminar Kebangsaan Sains Matematik Ke XI, 22-24 Disember
2003.
Azme Khamis and Zuhaimy Ismail. (2004). Comparative study on nonlinear growth
curve to tobacco leaf growth data. Journal of Agronomy, 3(2): 147-153.
Azme Khamis, Zuhaimy Ismail and Ani Shabri (2003). Pemodelan harga minyak
sayuran menggunakan analisis regresi linear berganda. Matematika, 19(1):
59-70.
Bansal, A., Kauffman, R. and Weitz, R. (1993). Comparing the modeling
performance of regression and neural networks as data quality varies: A
business value approach. Journal of Management Information Systems, 10: 11-32.
Barnett, V. and Lewis, T. (1995). Outliers in Statistical Data. England: John Wiley
& Sons.
Bass, F. M. (1960). A new product growth model for consumer durables.
Management Science, 15: 215-227.
Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and its
Applications, New York: John Wiley.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language.
Wadsworth, Pacific Grove, CA.
Belanger, G., Walsh, J. R., Richards, J. E., M. P. H. and Ziadi, N. (2000).
Comparison of three statistical models describing potato yield response to
nitrogen fertilizer. Agronomy Journal. 92: 902-908.
Bewley, R. and Fiebig, D. (1988). Flexible logistic growth model with application in
telecommunications. International Journal of Forecasting. 4: 177-192.
Birkes, D. and Dodge, Y. (1993). Alternative Methods of Regression. New York:
John Wiley and Sons, Inc.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition, Oxford University
Press.
Boussabaine, A. H. and Kaka, A. P. (1998). A neural networks approach for cost
flow forecasting. Construction Management and Economics. 16: 471-479.
Box, G. E. P. and Draper, N. R. (1987). Empirical model building and response
surfaces, New York: John Wiley & Sons.
Causton, D. R. and Venus, J. C. (1981). The Biometry of Plant Growth, London:
Edward Arnold.
Chaddha, R. L. and Chitgopekar, S. S. (1971). A generalization of the logistic curve
and long range forecast (1966-1981) of residence telephones. The Bell
Journal of Economics and Management Science, 2: 542-560.
Chan, K. W., Wahid, M. B., Ngan, M. A. and Basiron, Y. (2003). Climate change and
its effects on Yield of Oil Palm. Proceedings of International Palm Oil
Congress: Agriculture Conference: 237-260.
Chan, K. W., Lim, K. C. and Ahmad Alwi (1991). Fertilizer efficiency studies in oil
palm. Proceedings of the 1991 PORIM International Palm Oil Conference –
Agriculture. 302-311.
Chan, K. W. (1999). System approach to fertilizer management in oil palm.
Proceedings of the 1999 PORIM International Palm Oil Congress –
Agriculture: 171-187.
Chatterjee, S. and Price, B. (1991). Regression Analysis by Example. 2nd Edition.
New York: John Wiley and Sons, Inc.
Chin, S. A. (2002). Narrowing the yield gap in oil palm between potential and
realization. The Planters, 78(919), 541-544.
Chow, C. S. (1984). Forecast of Malaysian palm oil production up to year 2000.
Proceedings of Int. Seminar on Market Development for Palm Oil Products.
31-47.
Chow, C. S. (1987). The seasonal and rainfall effects on palm oil production in
Peninsular Malaysia. Proceedings of 1987 Oil Palm Conference –
Agriculture: 46-52.
Chow, C. S. (1988). The seasonal and rainfall effects on palm oil production in
Peninsular Malaysia. Proceedings of the 1987 International Oil Palm/Palm
Oil Conference: Progress and Prospects: Agriculture. 46-55.
Christensen, R. (2001). Advanced Linear Modeling. Multivariate, Time Series and
Spatial Data; Nonparametric Regression and Response Surface
Maximization. New York: Springer-Verlag.
Cline, R. A. (1997). Leaf analyses for fruit crop nutrition. Horticultural Research
Institute of Ontario.
Connor, D. (1988). Data transformation explains the basics of neural networks. EDN,
33(10): 138-144.
Corley, R. H. V. and Gray, B. S. (1976). Growth and morphology. In Corley, R. H.
V., Hardon, J. J. and Wood, B. J. (1976). Oil Palm Research: Developments in
Crop Science (1). Elsevier Scientific Publishing Company. 7-21.
Corley, R. H. V. (1976). Photosynthesis and productivity. In Corley, R. H. V.,
Hardon, J. J. and Wood, B. J. (1976). Oil Palm Research: Developments in Crop
Science (1). Elsevier Scientific Publishing Company. 55-76.
Corley, R. H. V. (1976). Physiological aspect of nutrition. In Corley, R. H. V., Hardon,
J. J. and Wood, B. J. (1976). Oil Palm Research: Developments in Crop
Science (1). Elsevier Scientific Publishing Company. 157-164.
Corley, R.H.V. and Mok, C. K. (1972). Effects of nitrogen, phosphorus, potassium
and magnesium on growth of oil palm. Expl. Agri. 8: 347-353.
Corne, S. A., Carver, S. J., Kunin, W. E., Lenon, J. J. and van Hees, W. W. S.
(2000). Using neural network methods to predict forest characteristics in
southeast Alaska. 4th International Conference on Integrating GIS and
Environmental Modeling (GIS/EM4): Problems, Prospects and Research
Needs.
Corne, S., Kneale, P., Openshaw, S. and See, L. (1998). The use and evaluation of
artificial neural networks in flood forecasting.
http://www.ccg.leeds.ac.uk/simon/maff98.htm.
Deck, S. H., Morrow, C. T., Heinemann, P. H. and Sommer, H. J. (1995).
Comparison of a neural network and traditional classifier for machine vision
inspection of potatoes. Applied Engineering in Agriculture, 11: 319-326.
Department of Statistics, Malaysia, 1975 – 1989. Kuala Lumpur, Malaysia.
Donaldson, R. G., Kamstra, M., and Kim, H. Y. (1993). Evaluating alternative
models for conditional stock volatility: Evidence from international Data.
Working Paper, University of British Columbia.
Draper, N. R. and Smith, H. (1981). Applied regression analysis. New York: John
Wiley and Sons.
Drummond, S. T., Sudduth, K. A. and Birrell, S. J. (1995). Analysis and correlation
methods for spatial data. ASAE Paper No. 95-1335. St. Joseph, Mich.: ASAE.
Drummond, S. T., Sudduth, K. A., Joshi, A., Birrell, S. J. and Kitchen, N. R. (2002).
Statistical and neural network methods for site-specific yield prediction.
http://www.nal.usda.gov/ttic/tektran/data/000013/14/0000131434.html
Epstein, E. (1972). Mineral nutrition of plants: principle and perspectives. New
York: Wiley.
Evan, O. V. D. (1997). Short-term currency forecasting using neural networks. ICL
System Journal. 11(2).
Fairhurst, T. H. and Mutert, E. (1999). Interpretation and management of oil palm
leaf analysis data. Better Crops International. 13(1): 48-51.
Farazdaghi, H. and Harris, P. M. (1968). Plant competition and crop yield. Nature, vol.
217: 289-290.
Fausett, L. (1994). Fundamentals of Neural Networks: Architectures, Algorithms, and
Applications. Prentice-Hall, Inc.
Fekedulegn, D., Mac Suirtain, M. P. and Colbert, J. J. (1999). Parameter estimation
of nonlinear growth models in forestry, Silva Fennica 33(4): 327-336.
Foong, F. S. (1991). Potential evapotranspiration, potential yield and leaching losses
of oil palm. Proceedings of the 1991 PORIM International Palm Oil
Congress (Agriculture): 105-119.
Foong, F. S. (1999). Impact of moisture on potential evapotranspiration, growth and
yield of palm oil. Proceedings of the 1999 PORIM International Palm Oil
Congress (Agriculture): 265-287.
Foong, S. F. (2000). Faktor yang menentukan pengeluaran hasil dan mempengaruhi
potensi hasil sawit. Modul Kursus Pengurusan Ladang untuk Pengurus
FELDA.
Foster, H. (2003). Assessment of oil palm fertilizer requirements. In Thomas
Fairhurst and Rolf Hardter, Oil Palm Management for Large and Sustainable
Yields. PPI, PPIC and IPI.
Foster, H. L. (1995). Experience with fertilizer recommendation system for oil palm.
Proceedings of 1993 PORIM International Palm Oil Congress – update and
vision (Agriculture): 313-328.
Foster, H. L. and Chang, K. C. (1977). The diagnosis of the nutrient status of oil
palms in West Malaysia. In Earp, D. A. and Newall, W. (eds.). International
Developments in Oil Palm. Malaysian International Agricultural Oil Palm
Conference, Kuala Lumpur. 14-17 June 1976. ISP, 290-312.
Foster, H. L., Chang, K. C., Mohd Tayeb Dolmat, Ahmad Tarmizi Mohammed
and Zin Zawawi Zakaria (1985). Oil palm yield responses to N and K
fertilizers in different environments in Peninsular Malaysia. PORIM
Occasional Paper No. 16. Palm Oil Research Institute of Malaysia, Kuala
Lumpur.
Foster, H. L., Ahmad Tarmizi Mohammed and Zin Zawawi Zakaria. (1987). Foliar
diagnosis of oil palm in Peninsular Malaysia. Proceedings of 1987
International Palm Oil Conference – Agriculture: 249-261.
Foster, H. L., Mohd Tayeb Dolmat and Gurmit Singh. (1987). The effect of
fertilizers on oil palm bunch components in Peninsular Malaysia.
Proceedings of 1987 International Palm Oil Conference – Agriculture: 294-
305.
Franses, P. H. and Homelen, P. V. (1998). On forecasting exchange rate using neural
networks. Applied Financial Economics. 8: 589-596.
Gallant, A. R. (1987). Nonlinear statistical models. New York: John Wiley and Sons.
Gan, W. S. and Ng, K. H. (1995). Multivariate FOREX forecasting using artificial
neural networks. IEEE International Conference of Neural Networks. 2: 1018-1022.
Garcia, O. (1983). The stochastic differential equation model for the height growth of
forest stands, Biometrics, 39, 1059-1072.
Garcia, O. (1988). Growth modeling - A review. New Zealand Forestry, 33(3): 14-17.
Garcia, O. (1989). Growth modeling – New development. In Nagumo, H., and
Konohira, Y (Eds.), Japan and New Zealand Symposium on Forestry
Management Planning, Japan Association for Forestry Statistics, 152-158.
Garcia, O. (1993). Stand growth models: Theory and practice. In Advancement in
Forest Inventory and Forest Management Sciences – Proceedings of IUFRO
Seoul Conference. Forest Research Institute of The Republic of Korea.
Garcia, R. and Gency, R. (2000). Pricing and hedging derivative securities with
neural networks and a homogeneity hint. Journal of Econometrics. 94: 93-115.
Gaudart, J., Giusiano, B. and Huiart, L. (2003). Comparison of the performance of
multi-layer perceptron and linear regression for epidemiological data.
Computational Statistics and Data analysis.
Glasbey, C. A. (1979). Correlated residual in non-linear regression applied to growth
data. App. Stat. 28: 251-259.
Goh, K. J., Hardter, R. and Fairhurst, T. (2003). Fertilizing for Maximum Return. In
Thomas Fairhurst and Rolf Hardter, Oil Palm Management for Large and
Sustainable Yields. PPI, PPIC and IPI. 279-306.
Gorr, W. L. (1994). Research prospective on neural network forecasting.
International Journal of Forecasting, 10: 1-4.
Green, A. H. (1976). Field Experiments as a Guide to Fertilizer Practice. In Corley,
R. H. V., Hardon, J. J. and Wood, B. J. 1976. Oil Palm Research:
Developments in Crop Science (1). Elsevier Scientific Publishing Company,
Netherlands.
Gujarati, D. N. (1988). Basic Econometrics. New York, McGraw-Hill.
Hagan, M. T., Demuth, H. B. and Beale, M. H. (1996). Neural Network Design.
Boston: PWS Publishing Company.
Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal
of the American Statistical Association, 69: 383-393.
Harris, J. M. and Kennedy, S. (1999). Carrying capacity in agriculture: global and
regional issues. Ecological Economics, 29: 443-461.
Hartley, C. W. S. (1988). The Oil Palm (Elaeis guineensis Jacq.). 3rd edition. New York:
Longman Scientific & Technical.
Heeler, R. M. and Hustad, T. P. (1980). Problems in predicting new product growth
for consumer durables. Management Science, 26:1007-1020.
Henson, I. E. (2000). Modeling the effects of ‘haze’ on oil palm productivity and
yield. Journal of Oil Palm Research. 12(1): 123-134.
Henson, I. E. and Chan, K. C. (2000). Oil palm productivity and its component
processes. Advances in Oil Palm Research, 1: 97-145.
Henson, I. E. and Mohd Harun, H. (2004). Seasonal variation in oil palm fruit bunch
production: Its origin and extent. The Planter, 80(937): 201-212.
Hill, T. and Remus, W. (1994). Neural network models for intelligent support of
managerial decision making. Decision Support Systems, 11: 449-459.
Holliday, R. (1960). Plant population and crop yield: Part I. Field Crop Abstracts,
13(3): 159-167.
Holliday, R. (1960). Plant population and crop yield: Part II. Field Crop Abstracts,
13(4): 247-254.
Hornik, K., Stinchcombe, M. and White, H. (1989). Multilayer feedforward networks
are universal approximators. Neural Networks, 2: 359-366.
Hsieh, W. W. and Tang, B. (1998). Applying neural network models to prediction
and data analysis in meteorology and oceanography. Bulletin of the American
Meteorological Society. 79(9): 1885-1887.
Hsu, C. C. and Chen, C. Y. (2003). Regional load forecasting in Taiwan –
applications of artificial neural networks. Energy Conversion and
Management, 44: 1941-1949.
Hsu, K., Gupta, H. V. and Sorooshian, S. (1993). Artificial neural network modeling
of the rainfall-runoff process. Water Resources Research, 29(4): 1185-1194.
Huber, P. J. (1981). Robust Statistics. New York: Wiley.
Hunt, R. (1982). Plant Growth Curves, London: Edward Arnold.
Haykin, S. (1999). Neural networks: A comprehensive foundation. 2nd edition. New
Jersey: Prentice Hall.
Indro, D. C., Jiang, C. X., Patuwo, B. E. and Zhang, G. P. (1999). Predicting mutual
fund performance using artificial neural networks. Omega, Int. J. Mgmt. Sci.
27: 569-582.
Jennrich, R. I. (1995). An introduction to computational statistics: Regression
analysis. Englewood Cliffs, New Jersey: Prentice Hall.
Jiang, X., Chen, M. S., Manry, M. T., Dawson, M. S. and Fung, A. K. (1994).
Analysis and optimization of neural networks for remote sensing. Remote
Sensing Reviews, 9: 97-114.
Kastens, T. L., Dhuyvetter, K. C., Schmidt, J. P. and Stewart, W. M. (2000). Wheat
yield modeling: How important is soil test phosphorus? Better Crops. 84(2):
8-10.
Kastens, T. L., Schmidt, J. P. and Dhuyvetter, K. C. (2000). Wheat yield modeling
with site-specific information: A Kansas farm case study. In A. J. Schlegel
(ed.) Proceedings of the 2000 Great Plains Soil Fertility Conference. 41-48.
Kastens, T. L., Schmidt, J. P. and Dhuyvetter, K. C. (2003). Yield models implied
by traditional fertilizer recommendations and a framework for including
nontraditional information. Soil Sci. Soc. Am. J. 67: 351-364.
Kee, K. K. and Chew, P. S. (1991). Oil palm responses to nitrogen and drip irrigation
in a wet monsoonal climate in Peninsular Malaysia. Proceedings of PORIM
International Palm Oil Conference. Module 1: Agriculture: 321-339.
Klein, B. D. and Rossin, D. F. (1999a). Data errors in neural network and linear
regression models: An experimental comparison. Data Quality Journal, 5(1):
1-19.
Klein, B. D. and Rossin, D. F. (1999b). Data quality in neural network models: effect of
error rate and magnitude of error on predictive accuracy. Omega,
International Journal of Management Science, 27: 569-582.
Kominakis, A. P., Abas, Z., Maltaris, I. and Rogdakis, E. (2002). A preliminary
study of the application of artificial neural networks to prediction of milk
yield in dairy sheep. Computers and Electronics in Agriculture. 35(1): 35-48.
Kruse, J. R. (1999). Trend yield analysis and yield growth assumptions. Technical
Report No. 06-99. Food and Agricultural Policy Research Institute.
Lai, L. L. (1998). Intelligent system applications in power engineering: Evolutionary
programming and neural networks. West Sussex, England: John Wiley and
Sons.
Law, R. (2000). Back-propagation learning in improving the accuracy of neural
network-based tourism demand forecasting. Tourism Management, 21: 331-340.
Law, R. and Au, N. (1999). A neural network model to forecast Japanese demand for
travel to Hong Kong. Tourism Management, 20: 89-97.
Lawrence, S., Giles, C. L. and Tsoi, A. C. (1996). What size neural network gives
optimal generalization? Convergence properties of backpropagation.
Technical Report UMIACS-TR-96-22 and CS-TR-3617, University of
Maryland, College Park.
Lee, J. C., Lu, K. W. and Horng, S. C. (1992). Technological forecasting with nonlinear models. Journal of Forecasting, 11: 195-20.
Lehmann, E. L. (1998). Nonparametrics: Statistical methods based on ranks. 1st
edition (revised). New Jersey: Prentice Hall.
Lei, Y. C. and Zhang, S. Y. (2004). Features and partial derivatives of Bertalanffy-Richards growth model in forestry. Nonlinear Analysis: Modelling and
Control, vol. 9(1): 65-73.
Leng, T., Pin, O. K. and Zainuriah, A. (2000). Effects of fertilizer withdrawal prior
to replanting on oil palm performance. Proceedings of the International
Planters Conference. 233-249.
Limsombunchai, V., Gan, C. and Lee, M. (2004). House price prediction: Hedonic
price model vs. artificial neural network. American Journal of Applied
Science, 1(3): 193-201.
Lin, F., Yu, X. H., Gregor, S. and Irons, R. (1995). Time series forecasting with
neural networks. http://www.csu.edu.au/ci/vo102/cmxhk/cmxhk.html
Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP
Magazine. 4-22.
Liu, J., Goering, C. E. and Tian, L. (2001). A neural network for setting target corn
yields. Transactions of the American Society of Agricultural Engineers. 44
(3): 705-713.
Ludlow, A. R. (1991). Modeling as a tool for oil palm research. PORIM
International Palm Oil Conference – Agriculture. 273-289.
Makowski, D., Wallach, D. and Meynard, J. M. (2001). Statistical methods for
predicting responses to applied nitrogen and calculating optimal nitrogen
rates. Agronomy Journal. 93: 531-539.
Malaysian Palm Oil Board Statistics (1990 – 2003). Malaysian Palm Oil Board,
Kuala Lumpur, Malaysia.
Masters, T. (1993). Practical Neural Network Recipes in C++. Academic Press, Inc.
McCullagh, P. and Nelder, J. A. (1983). Generalized Linear Models. London:
Chapman and Hall.
Md Yunus Jaafar. (1999). Pensuaian model taklinear Gompertz terhadap
pertumbesaran pokok koko [Fitting the nonlinear Gompertz model to cocoa tree
growth]. Matematika, 15(1): 1-20.
Meade, N. (1984). The use of growth curves in forecasting market development-a
review and appraisal. Journal of Forecasting, 3: 429-451.
Meade, N. (1985). Forecasting using growth curves-an adaptive approach. Journal of
Operational Research Society, 36: 1103-1115.
Meade, N. (1988). A modified logistic model applied to human populations. Journal
of Royal Statistical Society, Ser. A, 151: 491-498.
Meade, N. and Islam, T. (1995). Forecasting with growth curves: An empirical
comparison. International Journal of Forecasting, 11: 199-215.
Michaelsen, J. (1987). Cross-validation in statistical climate forecast models. J. Clim.
Appl. Meteor. 26: 1589-1600.
Mihalakakou, G., Santamouris, M. and Asimakopoulos, D. N. (2000). The total solar
radiation time series simulation in Athens using neural networks. Theoretical
and Applied Climatology, 66: 185-197.
Mohd Haniff Harun. (2000). Yield and yield components and their physiology.
Advances in Oil Palm Research, 1: 146-170.
Mokhtar Abdullah (1994). Analisis Regresi [Regression Analysis]. Kuala Lumpur:
Dewan Bahasa dan Pustaka.
Montgomery, D. C. (1984). Design of Experiments, second edition, New York: John
Wiley & Sons, Inc.
Morgan, P. H., Mercer, L. P. and Flodin, N. W. (1975). General model for
nutritional response of higher organisms. Proc. Nat. Acad. Sci. USA. 72:
4327-4331.
Moshiri, S. and Cameron, N. (2000). Neural network versus econometric models in
forecasting inflation. Journal of Forecasting. 19(3): 201-217.
Motiwalla, L. and Wahab, M. (2000). Predictable variation and profitable trading of
US equities: A trading simulation using neural networks. Computers and
Operations Research, 27: 1111-1129.
Murata, N., Yoshizawa, S. and Amari, S. (1994). Network information criterion-determining the number of hidden units for an artificial neural network
model. IEEE Transactions on Neural Networks. 5: 865-872.
Myers, R. H. and Montgomery, D. C. (1995). Response Surface Methodology:
Process and product optimization using designed experiments. John Wiley and
Sons, NY.
Nasr, G. E., Badr, E. A. and Joun, C. (2003). Backpropagation neural networks for
modeling gasoline consumption. Energy Conversion and Management, 44:
893-905.
Navone, H. D. and Ceccatto, H. A. (1994). Predicting Indian monsoon rainfall: a
neural networks approach. Climate Dynamics. 10: 305-312.
Naylor, R., Falcon, W. and Zavaleta, E. (1997). Variability and growth in grain yields,
1950-1994. Population Dev. Rev. 23(1): 41-58.
Nelder, J. A., Austin, R. B., Bleasdale, K. A. and Salter, P. J. (1960). An approach to
the study of yearly and other variation in crop yield. J. Hort. Sci., 35: 73-82.
Nelson, L. (1997). Statistics in Fertilizer Use Research (Modern Agriculture and
Fertilizers Lecture Series No. 1) Potash and Phosphate Institute of Canada,
India Programme, Dundahera, Gurgaon.
Norušis, M. J. (1998). SPSS® 8.0 Guide to Data Analysis. New Jersey: Prentice
Hall.
Oboh, B. O. and Fakorede, M. A. B. (1999). Effects of weather on yield components
of the oil palm in forest location in Nigeria. Journal of Oil Palm Research.
11(1): 79-89.
Oil World Annual (1999 – 2003). Oil World. ISTA Mielke GmbH. Hamburg.
Oil World (2003). Oil World. ISTA Mielke GmbH. Hamburg.
Oliver, F. R. (1970). Estimating the exponential growth function by direct least
squares. Applied Statistics, 19: 92-100.
Patrick, B. K. and Smagt, V. D. (1996). An Introduction to Neural Networks. The
University of Amsterdam.
Patterson, D. W. (1996). Artificial Neural Networks: Theory and Applications.
Singapore: Prentice Hall.
Penman, H. L. (1956). Weather and water in growth of grass. In Milthorpe, F. L. The
growth of leaves. London, Butterworths Scientific Publications.
Philip, M. S. (1994). Measuring trees and forests. 2nd edition. Wallingford, UK:
CAB International.
Pienaar, L. V. and Turnbull, K. J. (1973). The Chapman-Richards generalization of Von
Bertalanffy’s growth model for basal area growth and yield in even-aged
stands. For. Sci. 19: 2-22.
Pushparajah, E. (1994). Leaf analysis and soil testing for plantation tree crops.
Presented at International Workshop Leaf Diagnosis and Soil Testing as a
Guide to Crop Fertilization.
Ralston, M. L. and Jennrich, R. I. (1978). DUD, A Derivative-Free Algorithm for
Nonlinear Least Squares. Technometrics, 20: 7-14.
Rao, S. K. (1985). An empirical comparison of sales forecasting models. Journal of
Product Innovation Management, 4: 232-242.
Ratkowsky, D. A. (1983). Nonlinear regression modeling. New York: Marcel
Dekker.
Rawson, H. M. and Hackett, C. (1974). An exploration of the carbon economy of
the tobacco plant III. Gas exchange of leaves in relation to the position of the
stem, ontogeny and nitrogen content, Aust. J. Plant Physio., 1: 551.
Reed, R. D. and Marks, R. J. (1999). Neural Smoothing: Supervised Learning in
Feedforward Artificial Neural Networks. The MIT Press.
Richards, F. J. (1969). The quantitative analysis of growth. In Steward, F. C. (Ed.), Plant
Physiology, Volume VA: Analysis of Growth: Behaviour of Plants and their
Organs. New York: Academic Press.
Ripley, B. D. (1994). Neural networks and flexible regression and discrimination.
Advances in Applied Statistics. 39-57.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge
University Press.
Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier
Detection. New York: John Wiley and Sons, Inc.
Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning internal
representation by back-propagating errors. In: Rumelhart, D. E., McClelland,
J. L., and the PDP Group, editors. Parallel distributed processing: explorations
in the microstructure of cognition. Cambridge, MA: MIT Press.
Sarle, W. S. (1997). Neural Network FAQ, periodic posting to the Usenet
newsgroup comp.ai.neural-nets, URL: ftp://ftp.sas.com/pub/neural/FAQ.html.
Sarle, W. S. (1998). Neural networks and statistical models. Proceedings of the
Nineteenth Annual SAS Users Group International Conference.
SAS/STAT User’s Guide, (1992). Release 6.03 edition. SAS Institute, Cary, NC.
Schnute, J. (1981). A versatile growth model with statistically stable parameters,
Can. J. Fish. Aquat. Sci. 38: 1128.
Seber, G. A. F. and Wild, C. J. (1989). Nonlinear Regression. New York: John
Wiley and Sons.
Shearer, S. A., Burks, T. F., Fulton, J. P., Higgins, S. F., Thomasson, J. A., Mueller,
T. G. and Samson, S. (1994). Yield prediction using a neural network
classifier trained using soil landscape features and soil fertility data.
Agronomy Journal, 89: 54-59.
Shrestha, D. S. and Steward, B. L. (2002). Early growth stage corn plant population
measurement using neural network approach. Proceedings of the World
Congress of Computers in Agriculture and Natural Resources: 8-14.
Smith, K. A. and Gupta, J. N. D. (2000). Neural networks in business: techniques
and applications for the operations researcher. Computers and Operations
Research. 27: 1023-1044.
Soon, B. B. F. and Hong, H. W. (2001). Oil palm responses to N, P, K and Mg
fertilizers on two major soil types in Sabah. Proceedings of the 2001 PIPOC
International Palm Oil Congress (Agriculture): 318-334.
S-Plus 6 for Windows: Guide to Statistics, Volume. 1. (2001). Insightful
Corporation, Seattle, Washington.
Tangang, F. T., Hsieh, W. W. and Tang, B. (1997a). Forecasting the equatorial
Pacific sea surface temperatures by neural network models. Climate Dynamics,
13: 135-147.
Tangang, F. T., Tang, B., Monahan, A. H. and Hsieh, W. W. (1998). Forecasting
ENSO events: A neural network-extended EOF approach. Journal of Climate,
29-41.
Tanner, J. C. (1978). Long term forecasting of vehicle ownership and road traffic.
The Journal of the Royal Statistical Society, Ser. A, 141:14-63.
Teng, Y. and Timmer, V. R. (1996). Modeling nitrogen and phosphorus interactions
in intensively managed nursery soil-plant systems. Canadian Journal of Soil
Science. 76: 523-530.
Teoh, C. H. (2000). Land use and the palm oil industry in Malaysia. Report produced
under Project MY 0057 ‘Policy Assessment of Malaysian Conservation
Issues’, Kuala Lumpur.
Thiesing, F. M. and Vornberger, O. (1997). Sales forecasting using neural networks.
IEEE Proceedings ICNN, Houston, 4: 2125-2128.
Timmermans, A. J. M. and Hulzebosch, A. A. (1996). Computer vision system for
on-line sorting of pot plants using an artificial neural network classifier.
Computers and Electronics in Agriculture, 15: 41-55.
Tinker, P. B. (1976). Soil requirements of the oil palm. In Corley, R. H. V., Hardon,
J. J. and Wood, B. J. (1976). Oil Palm Research: Developments in Crop
Science (1). Elsevier Scientific Publishing Company. 165-179.
Tkacz, G. and Hu, S. (1999). Forecasting GDP growth using artificial neural
networks. Bank of Canada Working Paper 99-3.
Tsoularis, A. and Wallace, J. (2002). Analysis of logistic growth models.
Mathematical Biosciences. 179: 21-55.
Uysal, M. and El Roubi, M. S. (1999). Artificial neural networks versus multiple
regression in tourism demand analysis. Journal of Travel Research, 38(2):
111-118.
Vanclay, J. K. (1994). Modelling forest growth and yield. Wallingford, UK, CAB
International.
Verdooren, R. (2003). Design and analysis of fertilizer experiments. In Fairhurst, T. and
Hardter, R. (2003). Oil Palm: Management for Large and Sustainable Yields.
PPI, PPIC and IPI.
Wang, D., Dowell, F. E. and Lacey, R. E. (1999). Single wheat kernel color
classification using neural networks. Transactions of the ASAE, 42: 233-240.
Welch, S. M., Roe, J. L. and Dong, Z. (2003). A genetic neural network model of
flowering time control in Arabidopsis thaliana. Agron. J. 95: 71-81.
Welstead, S. T. (1994). Neural Network and Fuzzy Logic Applications in C++. New
York: John Wiley and Sons, Inc.
Wendroth, O., Reuter, H. I. and Kersebaum, K. C. (2003). Predicting yield of barley
across a landscape: a state-space modeling approach. Journal of Hydrology.
272: 250-263.
Wong, B. K. and Selvi, Y. (1998). Neural network application in finance: A review
and analysis of literature (1990-1996). Information and Management, 34:
129-139.
Wong, B. K., Lai, V. S. and Lam, J. (2000). A bibliography of neural networks
business applications research: 1994-1998. Computers and Operations
Research. 27: 1045-1076.
Yang, C. C., Prasher, S. O., Lacroix, R., Sreekanth, S., Madani, A. and Masse, L.
(1997). Artificial neural network model for subsurface-drained farmlands.
Journal of Irrigation and Drainage Engineering, 123: 285-292.
Yang, C. C., Prasher, S. O., Landry, J. A., Ramaswamy, H. S. and Ditommaso, A.
(2000). Application of artificial neural networks in image recognition and
classification of crop and weeds. Canadian Agricultural Engineering,
42(3): 147-152.
Yao, J., Li, Y. and Tan, C. L. (2000). Option prices forecasting using neural
networks. Omega, Int. J. Mgmt. Sci. 28: 455-466.
Yao, J. and Poh, H. L. (1995). Forecasting the KLSE Index using neural networks.
IEEE. 1013-1017.
Yao, J. and Tan, C. L. (2000). A case study on using neural networks to perform
technical forecasting of forex. Neurocomputing, 34: 79-98.
Yuancai, L., Marques, C. P. and Macedo, F. W. (1997). Comparison of Schnute's and
Bertalanffy-Richards' growth functions. Forest Ecology and Management. 96:
283-288.
Zhang, G. and Hu, M. Y. (1998). Neural network forecasting of the British
pound/US dollar exchange rate. Omega, Int. J. Mgmt. Sci. 26(4): 495-506.
Zhang, G. P., Patuwo, B. E. and Hu, M. Y. (2001). A simulation study of artificial
neural networks for nonlinear time-series forecasting. Computers and
Operations Research. 28: 381-396.
Zuhaimy Ismail and Azme Khamis (2001). A review on combining neural network
and genetic algorithm. Technical Report LT/M No. 5. Department of Mathematics, UTM.
Zuhaimy Ismail and Azme Khamis (2002). A review on neural network and its
application in forecasting. Technical Report LT/M No. 3. Department of Mathematics, UTM.
Zuhaimy Ismail and Azme Khamis (2003). Rangkaian neural dalam peramalan
harga minyak kelapa sawit [Neural networks in forecasting palm oil prices].
Jurnal Teknologi: C, 39: 17-28.
Zuhaimy Ismail, Azme Khamis and Md Yunus Jaafar. (2003). Fitting
nonlinear Gompertz curve to tobacco growth data. Pakistan Journal of
Agronomy, 2(4): 223-236.
Appendix A
The list of oil palm experimental stations

Code   Name of station (Estate)                           Soil series   Palm age

Coastal area
CLD1   West Carey Estate, Klang, Selangor                 Carey         16 - 17
CLD2   North Carey Estate, Klang, Selangor                Selangor      16 - 17
CLD3   Nordanal Estate, Muar, Johor                       Briah         19 - 20
CLD4   Dusun Durian, Selangor                             Sedu          12 - 14
CLD5   South Carey Estate, Carey Island, Selangor         Carey         12 - 15
CLD6   Athlone Estate, Kapar, Perak                       Briah         11 - 14
CLD7   Athlone Estate, Kapar, Perak                       Briah         14 - 17

Inland area
ILD1   Sungai Mahang Estate, Nilai, Negeri Sembilan       Bungor        12 - 14
ILD2   Diamond Jubilee Estate, Jasin, Melaka              Rengam         7 - 11
ILD3   Sendayan Estate, Port Dickson, Negeri Sembilan     Munchong       7 - 10
ILD4   Gomali Estate 3, Bt. Anam, Segamat, Johor          Batu Anam     10 - 13
ILD5   Lepan Kabu Estate, Kuala Krai, Kelantan            Pohoi         11 - 14
ILD6   Gomali Estate, Batu Anam, Segamat, Johor           Batu Anam      7 - 10
ILD7   Gomali Estate, Batu Anam, Segamat, Johor           Durian         9 - 12
Appendix B
The rate and actual value of fertiliser (kg/palm/year)

CLD1
Level           0       1       2
N (kg N26)      0.00    1.82    3.64
P (kg CIRP)     0.00    1.82    3.64
K (kg KCl)      0.00    2.37    5.46
Mg (kg Kies)    0.00    1.82    3.64
CLD2
Level           0       1       2
N (kg N26)      0.00    1.82    3.64
P (kg CIRP)     0.00    1.82    3.64
K (kg KCl)      0.00    2.72    5.44
Mg (kg Kies)    0.00    1.82    3.64
CLD3
Level
0
1
2
N (kg ASN)
0.00
3.64
7.28
P (kg CIRP)
0.00
1.82
3.64
CLD4
Level
0
1
2
N (kg ASN)
0.00
3.64
7.28
P (kg CIRP)
0.00
1.82
3.64
CLD5
Level
0
1
2
N (kg CAN)
0.00
2.73
5.46
P (kg CIRP)
0.00
4.55
9.10
K (kg KCl)
0.00
3.64
7.28
Mg (kg Kies)
0.00
1.82
3.64
K (kg KCl)
0.00
3.64
7.28
Mg (kg Kies)
0.00
1.82
3.64
K (kg ASH)
0.00
9.10
18.20
Mg (kg Kies)
0.00
4.55
9.10
CLD6
Level           0       1       2       3
N (kg N26)      0.00    1.82    3.64    5.45
P (kg CIRP)     0.00    2.73    ---
K (kg KCl)      0.00    2.27    4.55    6.82

CLD7
Level           0       1       2       3
N (kg N26)      0.00    1.82    3.64    5.45
P (kg CIRP)     0.00    2.73    ---
K (kg KCl)      0.00    2.27    4.55    6.82
Fertiliser Rate (kg/palm/year)

ILD1
Level           0       1       2
N (kg N26)      0.00    3.18    6.36
P (kg CIRP)     0.00    1.82    3.64
K (kg KCl)      0.00    2.50    5.00
Mg (kg Kies)    0.00    1.82    3.64
ILD2
Level           0       1       2       3
N (kg N26)      0.00    1.82    3.64    5.45
K (kg KCl)      0.00    2.73    5.45    8.18
ILD3
Level
1
2
3
4
N (kg N26)
1.34
3.64
5.92
8.20
P (kg CIRP)
0.90
1.80
---
K (kg KCl)
1.34
3.64
5.92
8.20
K (kg KCl)
1.68
3.36
6.72
Mg (kg Kies)
0.58
1.15
2.30
ILD4
Level
1
2
3
N (kg AS)
1.60
3.20
6.40
ILD5
Level
0
1
2
N (kg N26)
0.00
1.36
2.73
ILD6
Level           1       2       3
N (kg AS)       1.45    2.90    4.80
P (kg CIRP)     0.75    1.50    3.00
K (kg KCl)      1.65    3.30    6.60
Mg (kg Kies)    0.60    1.20    2.40
ILD7
Level           1       2       3
N (kg AS)       1.73    3.46    6.92
P (kg CIRP)     0.91    1.82    3.64
K (kg KCl)      1.39    2.78    5.56
Mg (kg Kies)    0.70    1.39    2.78
P (kg CIRP)
0.75
1.50
3.00
P (kg CIRP)
0.00
2.27
4.55
K (kg KCl)
0.00
2.27
4.55
Appendix C
Summary of macronutrients needed by plants
Nitrogen (N): Nitrogen is an essential constituent of proteins, nucleic acids and various
coenzymes. The majority of biochemical reactions in the plant are catalyzed by
enzymes, which are proteins, and thus nitrogen plays an essential part in virtually all
physiological processes. A general symptom of nitrogen deficiency is chlorosis of the
leaves, since chlorophyll synthesis is inhibited; this leads to reduced photosynthesis,
while reduced protein synthesis results in a general loss of ‘vigour’. Corley and
Mork (1972) showed that nitrogen fertiliser increased leaf area, leaf weight
and rate of leaf production, and raised the net assimilation rate.
Phosphorus (P): Phosphate is an essential constituent of nucleic acids and
phospholipids, while the processes of photosynthesis and respiration involve reactions
among sugar phosphates and the coenzymes adenosine diphosphate and triphosphate
(ADP and ATP) and nicotinamide adenine dinucleotide phosphate (NADP). ATP, in
particular, is an intermediate metabolite whose breakdown to ADP and phosphate
releases energy, which is used for almost all the energy-requiring processes in plant
metabolism. ATP is produced in both photosynthesis and respiration, and its synthesis
depends on a supply of ADP and phosphate. Hence phosphate deficiency can be
expected to cause considerable disruption of metabolism.
Potassium (K): The main function of potassium is as an activator of numerous
enzymes; that is, potassium ions must be present for the enzyme to be active,
but potassium is not a constituent of the enzyme molecule itself, nor of a co-factor. In
the oil palm, stomatal resistance increases considerably when potassium is
deficient (Corley 1976; Tinker 1976).
Magnesium (Mg): Magnesium has a general role as an enzyme activator, and is
required by even more enzymes than potassium. The requirement is not always very
specific, and other divalent cations, in particular manganese, can often substitute.
Among the many systems requiring magnesium is fatty acid synthesis.
Magnesium is also an essential component of the chlorophyll molecule, and magnesium
deficiency in maize was found to cause decreased chlorophyll content and hence
decreased photosynthesis (Corley 1976).
Calcium (Ca): Calcium is an essential component of the enzyme amylase, and is
required as an activator by some enzymes. It is also a major component of the middle
lamella of plant cell walls, and may therefore affect the mechanical strength
of tissues. Epstein (1972) points out that a major function of calcium may be in
maintaining cell organization.
Nutrient uptake is selective; chemically similar ions may be taken up at quite
different rates. For example, the rate of potassium uptake was found to be
independent of the presence of sodium in the solution, even when the
concentration of sodium was one hundred times that of potassium (Epstein 1972).
Interactions also occur in the uptake of pairs of chemically dissimilar ions; the negative
relationship between tissue levels of potassium and magnesium is well known, and
probably results from competition between the ions in uptake or translocation.
Uptake of cations is independent of anion uptake. For example, potassium uptake
occurs at a similar rate whether the anion in the solution is chloride or sulphate,
although sulphate is taken up much more slowly than chloride (Epstein 1972). Where
a cation is taken up faster than the anion, the anion deficit in the cell is compensated
for by production of organic anions such as malate.
Appendix D
List of papers published from 2001 to date
International Level
1. Zuhaimy Ismail, Azme Khamis and Md Yunus Jaafar, (2003). Fitting nonlinear
Gompertz curve to tobacco growth data. Pakistan Journal of Agronomy, 2:
223-236.
2. Azme Khamis and Zuhaimy Ismail. (2004). Comparative study on nonlinear
growth model to tobacco leaf growth data. Journal of Agronomy, 3(2): 147-153.
3. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed.
(2005). The effects of outliers data on neural network performance. Journal
of Applied Sciences. 5(8): 1394-1398.
4. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed.
(2005). Modelling oil palm yield using multiple linear regression and robust
M-regression. Journal of Agronomy 5(1): 32-36.
5. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed.
(2005). Nonlinear Growth Models for Modeling Oil Palm Yield Growth.
Journal of Mathematics and Statistics. 1(3): 225-233.
6. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed.
Neural Network Model for Oil Palm (Elaeis guineensis Jacq.) Yield
Modeling. Journal of Applied Science (In Press).
7. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed.
The Use of Response Surface Analysis in Obtaining Maximum Profit in Oil
Palm Industry. Songklanakarin Journal of Science and Technology Vol.
28(3), May-June 2006. (In Press).
National Level
1. Zuhaimy Ismail and Azme Khamis. (2001). A review on combining neural
networks and genetic algorithm. Technical Report LT/M No. 5/2001. Department of
Mathematics, Universiti Teknologi Malaysia.
2. Zuhaimy Ismail and Azme Khamis. (2002). A review on neural networks and its
application in forecasting. Technical Report LT/M No. 3/2002. Department of
Mathematics, Universiti Teknologi Malaysia.
3. Zuhaimy Ismail and Azme Khamis. (2003). Perbandingan di antara regresi
berganda dan regresi komponen utama untuk menganggar harga minyak
sawit mentah [A comparison between multiple regression and principal component
regression for estimating crude palm oil prices]. Proceedings of the 11th National
Symposium on Mathematical Sciences (Simposium Kebangsaan Sains Matematik
Ke-11), Kota Kinabalu, Sabah, 22-24 December 2003: 647-656.
4. Zuhaimy Ismail and Azme Khamis. (2003). Rangkaian neural dalam peramalan
harga minyak kelapa sawit [Neural networks in forecasting palm oil prices].
Jurnal Teknologi C: 17-28.
5. Azme Khamis, Zuhaimy Ismail and Ani Shabri. (2003). Pemodelan harga-harga
minyak sayuran menggunakan analisis regresi linear berganda [Modelling vegetable
oil prices using multiple linear regression analysis]. MATEMATIKA, 19: 59-70.
6. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed.
The effects of foliar nutrient compositions on oil palm yield (In progress).
7. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed.
Modelling In Oil Palm Industry: A Review (In Progress)
Seminar/Symposium
1. Malaysian Science and Technology Congress (MSTC) 2002, Symposium C: Life
Science, 12-14 December 2002, Kuching, Sarawak.
2. National Symposium on Mathematical Sciences X (Simposium Kebangsaan Sains
Matematik Ke-X), organized by Universiti Teknologi Malaysia and Persatuan Sains
Matematik (PERSAMA) at Johor Bahru, Johor, 23-24 December 2002.
3. National Symposium on Mathematical Sciences XI (Simposium Kebangsaan Sains
Matematik Ke-XI), organized by Universiti Malaysia Sabah and PERSAMA at Kota
Kinabalu, Sabah, 23-24 December 2003.
4. Department of Mathematics Seminar (Seminar Jabatan Matematik), Universiti
Teknologi Malaysia (UTM), 25 August 2004.
5. Annual Foundation of Science Seminar (AFSS) at Universiti Teknologi Malaysia
on 4-5 July 2005.
6. International Science Congress (ISC) 2005 (incorporating Malaysian Science and
Technology Convention (MASTEC) 2005 and MASO 2005) at Putra World Trade
Centre, Kuala Lumpur, 4-7 August 2005.
Appendix E
The ridge analysis
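Consider the matrix $B$ discussed in equation (3.65), and the orthogonal matrix $P$ that diagonalizes $B$. That is,

$$P'BP = \Lambda = \begin{bmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_m \end{bmatrix},$$

where the $\lambda_i$ are the eigenvalues of $B$. The solution $x$ that produces locations where $\partial L / \partial x = 0$ is given by

$$(B - \kappa I)x = -\tfrac{1}{2}b.$$

If we pre-multiply $(B - \kappa I)$ by $P'$ and post-multiply by $P$, we obtain

$$P'(B - \kappa I)P = \Lambda - \kappa I,$$

because $P'P = I_m$. If $(B - \kappa I)$ is negative definite, the resulting solution $x$ is at least a local maximum on the radius $H = (x'x)^{1/2}$. On the other hand, if $(B - \kappa I)$ is positive definite, the result is a local minimum. Because

$$P'(B - \kappa I)P = \Lambda - \kappa I = \begin{bmatrix} \lambda_1 - \kappa & & & 0 \\ & \lambda_2 - \kappa & & \\ & & \ddots & \\ 0 & & & \lambda_m - \kappa \end{bmatrix},$$

and an orthogonal transformation leaves the eigenvalues, and hence the definiteness, of $(B - \kappa I)$ unchanged, it follows that if $\kappa > \lambda_{\max}$, $(B - \kappa I)$ is negative definite, and if $\kappa < \lambda_{\min}$, $(B - \kappa I)$ is positive definite.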
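The argument above translates directly into a numerical ridge analysis: compute the eigenvalues of $B$, choose $\kappa$ above $\lambda_{\max}$, and solve $(B - \kappa I)x = -\tfrac{1}{2}b$ for the point of maximum fitted response on the sphere of radius $H$. The sketch below does this with NumPy. It assumes a fitted second-order surface $\hat{y} = b_0 + x'b + x'Bx$ in coded variables; the coefficients in the example are hypothetical, not the estimates behind equation (3.65), and the function names are illustrative.

```python
import numpy as np


def ridge_point(b0, b, B, kappa):
    """Stationary point of the Lagrangian for one value of kappa.

    Solves (B - kappa*I) x = -b/2 and returns x, its radius H = sqrt(x'x)
    and the fitted second-order response b0 + x'b + x'Bx at x.
    """
    m = B.shape[0]
    x = np.linalg.solve(B - kappa * np.eye(m), -0.5 * b)
    radius = float(np.sqrt(x @ x))
    y_hat = float(b0 + x @ b + x @ B @ x)
    return x, radius, y_hat


def classify(B, kappa):
    """Use the eigenvalues of the symmetric matrix B to label the point."""
    lam = np.linalg.eigvalsh(B)
    if kappa > lam.max():
        return "maximum"       # B - kappa*I is negative definite
    if kappa < lam.min():
        return "minimum"       # B - kappa*I is positive definite
    return "undetermined"      # neither condition holds


def ridge_path(b0, b, B, kappas):
    """Trace the ridge of maximum response for kappa values above lambda_max."""
    lam_max = np.linalg.eigvalsh(B).max()
    path = []
    for kappa in kappas:
        if kappa <= lam_max:
            continue  # skip values that do not guarantee a maximum
        x, radius, y_hat = ridge_point(b0, b, B, kappa)
        path.append((kappa, radius, y_hat, x))
    return path


if __name__ == "__main__":
    # Hypothetical coefficients for two coded factors (not the thesis estimates).
    b0 = 20.0
    b = np.array([1.0, 2.0])
    B = np.array([[-2.0, 0.5],
                  [0.5, -1.0]])
    for kappa, radius, y_hat, x in ridge_path(b0, b, B, [0.0, 0.5, 1.0, 2.0, 5.0]):
        print(f"kappa={kappa:4.1f}  H={radius:.3f}  y_hat={y_hat:.3f}  x={np.round(x, 3)}")
```

Increasing $\kappa$ above $\lambda_{\max}$ shrinks the radius $H$, so scanning $\kappa$ downwards towards $\lambda_{\max}$ traces the ridge outwards from the centre of the design.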
Appendix F
(i) Nonlinear least squares iterative phase, nonlinear least squares summary statistics
and normal probability plot for the logistic growth model

Iter   α           β          κ          Sum of Squares
 0     27.500000   0.550000   1.250000   1549.336981
 1     27.784101   0.437181   0.821661   1454.859996
 2     28.070708   0.443325   0.633481   1368.691939
 3     28.359738   0.398224   0.432456   1326.059200
 4     28.271777   0.434605   0.586234   1322.598926
 5     28.526438   0.450143   0.492724   1265.244775
 6     36.872926   2.065726   0.467708    120.790077
 7     36.914557   2.109364   0.515355    102.781174
 8     36.919986   2.119640   0.547810    100.262635
 9     36.824820   2.179654   0.542003     98.127250
10     36.332521   3.329466   0.663181     74.732572
11     36.895564   4.793287   0.820524     58.338084
12     36.814083   4.951146   0.829236     58.204258
13     36.806923   5.191885   0.837600     57.803919
14     37.089334   4.978397   0.784076     56.336902
15     37.087246   4.775178   0.778952     56.153913
16     37.090585   4.784499   0.779078     56.152935
17     37.086662   4.784170   0.778778     56.152772
18     37.079051   4.822734   0.782226     56.150410
19     37.080554   4.814801   0.781682     56.150223
20     37.080606   4.814883   0.781682     56.150223
Note: Convergence criterion met.

Source               DF   Sum of Squares   Mean Square
Regression            3     22234.038977   7411.346326
Residual             16        56.150223      3.509389
Uncorrected Total    19     22290.189200
(Corrected Total)    18       994.230779
Normal Probability Plot (residuals of the logistic growth model, vertical axis -4.5 to 2.5, against normal quantiles, horizontal axis -2 to +2)
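The iteration history above shows the estimates of α, β and κ moving from their starting values until the residual sum of squares stops decreasing. The output does not state which algorithm produced it, so the sketch below is a plain Gauss-Newton iteration with step halving, one standard way to run such an iterative phase. It assumes the logistic form y = α / (1 + β exp(-κt)); the model equation is given in the main text, not in this appendix. The function names, starting values and the noise-free synthetic data (t = 1, ..., 19, matching the 19 observations in the summary table) are illustrative assumptions, not the thesis data.

```python
import numpy as np


def logistic(theta, t):
    """Assumed logistic form: y = alpha / (1 + beta * exp(-kappa * t))."""
    alpha, beta, kappa = theta
    return alpha / (1.0 + beta * np.exp(-kappa * t))


def logistic_jacobian(theta, t):
    """Partial derivatives of the logistic curve w.r.t. alpha, beta, kappa."""
    alpha, beta, kappa = theta
    e = np.exp(-kappa * t)
    d = 1.0 + beta * e
    return np.column_stack([1.0 / d,
                            -alpha * e / d**2,
                            alpha * beta * t * e / d**2])


def sse(theta, t, y):
    """Residual sum of squares at parameter vector theta."""
    return float(np.sum((y - logistic(theta, t)) ** 2))


def gauss_newton(t, y, theta0, tol=1e-10, max_iter=100):
    """Gauss-Newton with step halving; returns estimates and an iteration log."""
    theta = np.asarray(theta0, dtype=float)
    current = sse(theta, t, y)
    history = [(0, *theta, current)]
    for it in range(1, max_iter + 1):
        residual = y - logistic(theta, t)
        # Linearise the model and solve J @ delta ~= residual by least squares.
        delta, *_ = np.linalg.lstsq(logistic_jacobian(theta, t), residual, rcond=None)
        step = 1.0
        candidate = theta + step * delta
        new = sse(candidate, t, y)
        # Halve the step until the sum of squares does not increase.
        while not (np.isfinite(new) and new <= current):
            step /= 2.0
            if step < 1e-10:
                return theta, history  # no further decrease is possible
            candidate = theta + step * delta
            new = sse(candidate, t, y)
        converged = current - new <= tol * (current + tol)
        theta, current = candidate, new
        history.append((it, *theta, current))
        if converged:
            break
    return theta, history


if __name__ == "__main__":
    # Synthetic, noise-free data for illustration only (not the thesis data).
    t = np.arange(1, 20, dtype=float)
    y = logistic((37.0, 4.8, 0.78), t)
    estimates, log = gauss_newton(t, y, (35.0, 4.0, 0.7))
    for it, a, b, k, s in log:
        print(f"{it:3d} {a:10.6f} {b:9.6f} {k:9.6f} {s:14.6f}")
```

Each row of `log` has the same layout as the table above (iteration, α, β, κ, sum of squares). Because a step is accepted only when the sum of squares does not increase, the last column never rises.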
(ii) Nonlinear least squares iterative phase, nonlinear least squares summary statistics
and normal probability plot for the Gompertz growth model

Iter   α           β          κ          Sum of Squares
 0     35.000000   0.605000   1.250000   746.480434
 1     35.049524   0.510863   0.940512   682.655909
 2     35.292655   0.796643   1.156999   647.769394
 3     35.483424   0.651081   0.504601   367.245225
 4     35.525819   0.641890   0.405256   321.924597
 5     35.558886   0.649027   0.365298   305.800586
 6     35.559849   0.647886   0.361218   305.673059
 7     36.494416   1.388648   0.373371   162.906709
 8     36.569993   1.472994   0.413736   123.313547
 9     37.076966   1.960631   0.530213    67.721805
10     37.074823   1.958641   0.530140    67.721114
11     37.161537   2.017509   0.538426    65.990714
12     36.843065   2.065772   0.590572    62.799193
13     37.159282   2.274528   0.621459    60.031550
14     37.186649   2.275576   0.620041    60.008103
15     37.192075   2.265236   0.618033    60.000902
16     37.187339   2.264565   0.611715    59.907512
17     37.186098   2.261244   0.611743    59.906776
18     37.185294   2.261130   0.611758    59.906715
19     37.180485   2.265362   0.612635    59.905620
20     37.178973   2.268331   0.613302    59.905488
21     37.178910   2.268318   0.613233    59.905481
22     37.178870   2.268295   0.613232    59.905481
Note: Convergence criterion met.

Source               DF   Sum of Squares   Mean Square
Regression            3     22230.283719   7410.094573
Residual             16        59.905481      3.744093
Uncorrected Total    19     22290.189200
(Corrected Total)    18       994.230779
Variable = YRESID
[Figure: normal probability plot of the residuals; vertical axis -4.5 to 2.5, horizontal axis normal quantiles -2 to +2]
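A normal probability plot pairs each ordered residual with the quantile a standard normal sample of the same size would be expected to have there; points close to a straight line suggest approximately normal residuals. The Python sketch below computes those pairs and their correlation. It assumes Blom's plotting position (i - 0.375)/(n + 0.25), which is one common convention; the software that drew the plots above may use a slightly different one.

```python
from statistics import NormalDist


def normal_scores(residuals):
    """Pair each sorted residual with an expected standard normal quantile.

    Plotting position (i - 0.375) / (n + 0.25) (Blom) is an assumed convention.
    """
    n = len(residuals)
    nd = NormalDist()
    ordered = sorted(residuals)
    return [(nd.inv_cdf((i - 0.375) / (n + 0.25)), r)
            for i, r in enumerate(ordered, start=1)]


def plot_correlation(pairs):
    """Pearson correlation of normal scores and residuals (close to 1 suggests normality)."""
    xs = [q for q, _ in pairs]
    ys = [r for _, r in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5
```

Plotting the first element of each pair on the horizontal axis and the residual on the vertical axis reproduces the layout of the plots in this appendix.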
(iii) Nonlinear least squares iterative phase, nonlinear least squares summary
statistics and normal probability plot for the von Bertalanffy growth model
Iter  α          β          κ         δ         Sum of Squares
0     33.000000  -0.050000  0.750000  2.500000  380.624938
1     35.023737  -0.014217  1.089793  3.055292  110.839035
2     35.259687  -0.038748  1.003824  2.630589  98.069723
3     35.460718  -0.049915  0.981359  2.482279  92.784953
4     37.072682  -0.046658  0.904712  2.506261  57.711529
5     37.075892  -0.041762  0.929619  2.585367  57.006317
6     37.079520  -0.035101  0.971731  2.716818  56.899189
7     37.078108  -0.033745  0.967617  2.720237  56.775826
8     37.065582  -0.061425  0.904977  2.446393  56.554304
9     37.068230  -0.062993  0.904702  2.425974  56.481544
10    37.068501  -0.062515  0.902104  2.421153  56.452830
11    37.065402  -0.062551  0.901063  2.430985  56.425526
12    37.066245  -0.061998  0.899972  2.426527  56.354069
13    37.087627  -0.060947  0.849564  2.382534  55.843272
14    37.076089  -0.062608  0.847887  2.369937  55.837479
15    37.070430  -0.059468  0.851623  2.393605  55.827506
16    37.066570  -0.057282  0.854553  2.410832  55.822992
17    37.068257  -0.058354  0.853254  2.403249  55.820656
18    37.064574  -0.056308  0.856166  2.420114  55.814715
19    37.065255  -0.056627  0.855948  2.418831  55.812588
20    37.064611  -0.056125  0.856523  2.422487  55.812014
21    37.063814  -0.055596  0.857331  2.426665  55.811648
22    37.060425  -0.053924  0.859822  2.440569  55.809136
23    37.057934  -0.052462  0.862092  2.453299  55.807034
24    37.046978  -0.048243  0.869122  2.491502  55.806373
25    37.046618  -0.048856  0.869373  2.490510  55.799233
26    37.046616  -0.048777  0.869316  2.490282  55.797878
27    37.046529  -0.048572  0.869590  2.491954  55.797666
28    37.043026  -0.046837  0.871616  2.507261  55.796739
29    37.041871  -0.045239  0.873640  2.522672  55.796239
30    37.042169  -0.045275  0.873495  2.522255  55.796210
31    37.042016  -0.045635  0.872874  2.519291  55.795827
32    37.041767  -0.045685  0.872721  2.518556  55.795804
33    37.041866  -0.045607  0.872944  2.519454  55.795800
34    37.041787  -0.045561  0.873010  2.519914  55.795799
35    37.041600  -0.045527  0.873113  2.520290  55.795798
36    37.041603  -0.045522  0.873121  2.520335  55.795798
Note: Convergence criterion met.

Source             DF  Sum of Squares  Mean Square
Regression          4  22234.393402    5558.598350
Residual           15  55.795798       3.719720
Uncorrected Total  19  22290.189200
(Corrected Total)  18  994.230779
[Figure: normal probability plot of the residuals; vertical axis -4.5 to 2.5, horizontal axis normal quantiles -2 to +2]
(iv) Nonlinear least squares iterative phase, nonlinear least squares summary
statistics and normal probability plot for the negative exponential growth model
Iter  α          β         Sum of Squares
0     33.000000  0.750000  378.689391
1     34.984151  0.382846  198.582665
2     37.703541  0.363557  86.365416
3     37.754348  0.408138  78.461783
4     37.453151  0.408425  77.140826
5     37.457453  0.408447  77.140560
6     37.505210  0.404297  77.092328
7     37.501321  0.404674  77.090988
8     37.501711  0.404673  77.090988
Note: Convergence criterion met.

Source             DF  Sum of Squares  Mean Square
Regression          2  22213.098212    11106.549106
Residual           17  77.090988       4.534764
Uncorrected Total  19  22290.189200
(Corrected Total)  18  994.230779
[Figure: normal probability plot of the residuals; vertical axis -4.5 to 2.5, horizontal axis normal quantiles -2 to +2]
(v) Nonlinear least squares iterative phase, nonlinear least squares summary statistics
and normal probability plot for the monomolecular growth model
Iter  α          β         κ         Sum of Squares
0     33.000000  0.050000  0.750000  952.737196
1     33.032719  0.055313  0.578546  934.320019
2     33.167509  0.087638  0.253167  855.574556
3     33.172952  0.088424  0.212198  855.374636
4     33.169883  0.088022  0.231795  854.973130
5     33.200838  0.095092  0.198418  846.660342
6     35.343956  0.639100  1.573268  825.135717
7     35.551615  0.681366  1.496942  801.224046
8     35.587558  0.629953  0.369259  234.290890
9     35.854233  0.692699  0.448329  224.648373
10    35.904523  0.699714  0.379435  188.420044
11    35.928524  0.703668  0.353969  182.218343
12    35.929749  0.703622  0.350383  182.196122
13    35.930117  0.703661  0.349565  182.185870
14    37.030801  0.985914  0.306075  177.849315
15    37.449930  1.116188  0.527952  97.505012
16    37.802734  1.214744  0.458968  77.006483
17    37.784063  1.210648  0.455475  76.910058
18    37.461509  1.172099  0.451685  72.445808
19    37.232354  1.164131  0.467682  70.908883
20    37.332588  1.138833  0.455252  70.731600
21    37.328771  1.139745  0.458701  70.685734
22    37.328512  1.139692  0.458658  70.685732
23    37.328118  1.139644  0.458671  70.685727
24    37.328278  1.140025  0.458940  70.685700
25    37.323417  1.140850  0.459255  70.685268
26    37.323469  1.140840  0.459251  70.685268
Note: Convergence criterion met.

Source             DF  Sum of Squares  Mean Square
Regression          3  22219.503932    7406.501311
Residual           16  70.685268       4.417829
Uncorrected Total  19  22290.189200
(Corrected Total)  18  994.230779
[Figure: normal probability plot of the residuals; vertical axis -4.5 to 2.5, horizontal axis normal quantiles -2 to +2]
(vii) Nonlinear least squares iterative phase, nonlinear least squares summary
statistics and normal probability plot for the log-logistic growth model
Iter  α          β         κ         Sum of Squares
0     33.000000  0.050000  0.750000  925.951285
1     36.873205  0.586054  1.994110  496.804068
2     37.765368  0.683157  0.973327  336.498308
3     40.797958  1.375772  1.490174  330.177803
4     42.764468  1.286080  0.673665  301.635265
5     42.364462  1.163568  0.721770  253.547673
6     42.443784  1.764739  1.282947  246.848668
7     42.438028  1.603621  1.162195  219.956210
8     42.510603  1.788032  1.231401  216.205677
9     42.519733  1.969985  1.290002  210.482475
10    36.176134  4.723675  2.087874  166.794624
11    37.831765  3.606682  1.931179  92.358634
12    38.010828  3.100183  1.869265  88.478175
13    38.108277  3.134929  1.872005  88.275154
14    38.105742  3.148018  1.873689  88.270873
15    38.105381  3.145318  1.873690  88.270792
16    38.111655  3.188866  1.887547  88.242252
17    38.114910  3.190565  1.887523  88.241946
18    38.118430  3.189399  1.886678  88.241941
19    38.117328  3.195449  1.887102  88.241687
20    38.117246  3.194337  1.887335  88.241584
21    38.117302  3.194701  1.887450  88.241581
22    38.117217  3.194701  1.887462  88.241581
Note: Convergence criterion met.

Source             DF  Sum of Squares  Mean Square
Regression          3  22201.947619    7400.649206
Residual           16  88.241581       5.515099
Uncorrected Total  19  22290.189200
(Corrected Total)  18  994.230779
[Figure: normal probability plot of the residuals; vertical axis -4.5 to 2.5, horizontal axis normal quantiles -2 to +2]
(viii) Nonlinear least squares iterative phase, nonlinear least squares summary
statistics and normal probability plot for the Richards growth model
Iter  α          β          κ         δ         Sum of Squares
0     38.500000  1.000000   0.750000  0.250000  101.120493
1     37.037098  3.791557   0.896609  0.855818  80.655550
2     37.070057  5.463712   0.835800  1.177625  67.886130
3     37.075208  5.206513   0.835216  1.136591  67.827042
4     37.071095  7.077390   0.860141  1.349998  66.974132
5     37.042877  5.077377   0.761043  0.973428  61.746233
6     37.046336  10.548642  0.890059  1.605857  61.377600
7     36.976194  14.796500  0.894784  1.616958  59.914409
8     36.996401  12.599241  0.894737  1.493960  58.277801
9     37.008334  11.950700  0.890894  1.524840  56.078416
10    37.006412  13.561781  0.905913  1.650332  55.838269
11    37.007115  13.522883  0.905719  1.646851  55.838119
12    37.058076  10.278632  0.862950  1.465099  55.817393
13    37.040608  10.476589  0.866185  1.486096  55.797584
14    37.040910  10.471770  0.866327  1.485281  55.797474
15    37.043505  10.466284  0.866290  1.484138  55.797358
16    37.045387  10.513839  0.866820  1.487882  55.797112
17    37.044652  10.622521  0.868164  1.494567  55.796621
18    37.040266  11.174733  0.875043  1.528595  55.796294
19    37.039364  11.193722  0.875488  1.527649  55.796149
20    37.039959  11.185724  0.875251  1.528173  55.796040
21    37.040101  11.196660  0.875299  1.528254  55.796031
22    37.041969  11.086868  0.873544  1.522826  55.795815
23    37.042311  11.014116  0.872708  1.518612  55.795807
24    37.042011  11.017303  0.872742  1.518904  55.795807
25    37.041910  11.045851  0.873021  1.520647  55.795804
26    37.041868  11.043308  0.872991  1.520482  55.795804
Note: Convergence criterion met.

Source             DF  Sum of Squares  Mean Square
Regression          4  22234.393396    5558.598349
Residual           15  55.795804       3.719720
Uncorrected Total  19  22290.189200
(Corrected Total)  18  994.230779
[Figure: normal probability plot of the residuals; vertical axis -4.5 to 2.5, horizontal axis normal quantiles -2 to +2]
(ix) Nonlinear least squares iterative phase, nonlinear least squares summary
statistics and normal probability plot for the Weibull growth model
Iter  α          β          κ         δ         Sum of Squares
0     38.500000  1.000000   0.250000  1.250000  117.719064
1     36.693634  -5.578793  0.389093  1.250000  75.335040
2     37.186435  -3.876103  0.351401  1.349939  74.152620
3     37.221938  -6.028528  0.399031  1.201141  71.303092
4     37.164563  -5.576102  0.398703  1.175434  70.975680
5     37.189684  -5.807271  0.392160  1.203346  70.957721
6     37.189151  -5.798332  0.359488  1.304608  70.906026
7     37.189417  -5.802804  0.349130  1.341709  70.904658
8     37.189421  -5.802874  0.348513  1.344404  70.904630
9     37.189421  -5.802874  0.348513  1.344404  70.904630
10    37.247139  -5.802853  0.348535  1.344311  70.801907
11    37.307196  -5.165007  0.340476  1.344214  70.697946
12    37.311748  -5.235857  0.341665  1.344206  70.687157
13    37.316802  -5.302401  0.342307  1.344198  70.686437
14    37.314270  -5.268956  0.342075  1.344202  70.686418
15    37.314591  -5.273087  0.342118  1.344202  70.686417
16    37.327455  -5.200161  0.341330  1.344181  70.686224
17    37.323384  -5.245264  0.341550  1.344188  70.685275
18    37.323416  -5.245274  0.341543  1.344188  70.685275
Note: Convergence criterion met.

Source             DF  Sum of Squares  Mean Square
Regression          4  22219.503925    5554.875981
Residual           15  70.685275       4.712352
Uncorrected Total  19  22290.189200
(Corrected Total)  18  994.230779
[Figure: normal probability plot of the residuals; vertical axis -4.5 to 2.5, horizontal axis normal quantiles -2 to +2]
(x) Nonlinear least squares iterative phase, nonlinear least squares summary statistics
and normal probability plot for the Morgan-Mercer-Flodin growth model
Iter  α          β          κ         δ         Sum of Squares
0     38.500000  1.000000   0.500000  2.000000  89.336065
1     38.249392  3.869795   0.449415  2.191960  81.251936
2     37.230205  12.119868  0.341647  2.869364  69.940937
3     37.326554  8.955990   0.379149  2.957411  64.622283
4     37.023578  10.832959  0.369296  3.546338  62.432376
5     36.871996  11.072769  0.363338  3.451139  62.327671
6     36.889486  11.362276  0.357945  3.454736  62.112501
7     37.132723  11.456073  0.351383  3.562961  61.132272
8     37.181558  11.492333  0.352844  3.487270  60.924105
9     37.206113  11.417663  0.353986  3.422928  60.896587
10    37.202889  11.529857  0.353840  3.428889  60.888174
11    37.203170  11.537770  0.353702  3.429581  60.888066
12    37.206314  11.534682  0.353656  3.429003  60.887743
13    37.205619  11.511673  0.353584  3.426838  60.886582
14    37.202367  11.523844  0.353436  3.436280  60.886174
15    37.202640  11.523035  0.353450  3.435540  60.886160
16    37.202705  11.526734  0.353388  3.435925  60.886156
17    37.203101  11.526568  0.353383  3.435800  60.886154
18    37.203064  11.526166  0.353379  3.435800  60.886153
19    37.203431  11.522900  0.353425  3.434246  60.886148
20    37.203295  11.523517  0.353417  3.434779  60.886146
21    37.203293  11.523466  0.353418  3.434784  60.886146
Note: Convergence criterion met.

Source             DF  Sum of Squares  Mean Square
Regression          4  22229.303054    5557.325763
Residual           15  60.886146       4.059076
Uncorrected Total  19  22290.189200
(Corrected Total)  18  994.230779
[Figure: normal probability plot of the residuals; vertical axis -4.5 to 2.5, horizontal axis normal quantiles -2 to +2]
(xi) Nonlinear least squares iterative phase, nonlinear least squares summary
statistics and normal probability plot for the Chapman-Richards growth model
(without initial stage)
Iter  α          β         κ         δ          Sum of Squares
0     38.500000  0.560000  0.045000  0.080000   2468.767845
1     33.993720  0.513276  0.062003  0.080000   2464.777917
2     35.792623  0.530343  0.053794  0.083203   2460.989879
3     49.643601  0.696943  0.072607  -0.071904  482.252977
4     50.538640  0.707388  0.074171  -0.080944  474.707577
5     47.031521  0.670853  0.087675  -0.080292  415.564276
6     42.550992  0.648050  0.120025  -0.158240  327.059886
7     42.997201  0.670309  0.128805  -0.214598  322.513467
8     42.652712  0.657826  0.122262  -0.181961  322.074734
9     43.028836  0.699233  0.130870  -0.282806  318.978586
10    40.845932  0.861524  0.145267  -0.710588  241.188647
11    38.961788  0.956035  0.166807  -0.983996  204.952110
12    38.581442  0.973301  0.170144  -1.037388  204.731139
13    38.306085  0.995402  0.173753  -1.083441  202.446113
14    38.026912  1.022873  0.176798  -1.151367  201.205523
15    37.062580  1.070834  0.206045  -1.110594  193.262002
16    36.413954  1.051037  0.264899  -0.541846  183.878853
17    36.408496  1.050035  0.265892  -0.530933  183.864537
18    35.893254  0.913519  0.314263  -0.186842  171.741554
19    35.908290  0.909064  0.312353  -0.219698  170.976538
20    35.887363  0.909787  0.315521  -0.186385  170.786205
21    35.851319  0.894635  0.330325  -0.059910  166.886532
22    35.810845  0.765624  0.387839  0.346693   156.097171
23    35.810683  0.768957  0.386393  0.338015   156.077072
24    35.831330  0.835459  0.358676  0.103711   151.906854
25    35.818542  0.785766  0.374956  0.229262   147.710313
26    35.811810  0.735750  0.390200  0.328394   142.992947
27    35.816218  0.739790  0.384300  0.272940   141.524912
28    35.800050  0.679127  0.401744  0.408820   139.606336
29    35.808517  0.674202  0.395098  0.327855   138.345518
30    35.804077  0.685928  0.394744  0.341492   138.220997
31    35.848106  0.484023  0.453282  0.653138   119.614321
32    35.846885  0.489853  0.451648  0.644287   119.574830
33    35.850257  0.492709  0.448850  0.615498   117.615892
34    35.850251  0.492708  0.448852  0.615520   117.615891
Note: Convergence criterion met.

Source             DF  Sum of Squares  Mean Square
Regression          4  22172.573309    5543.143327
Residual           15  117.615891      7.841059
Uncorrected Total  19  22290.189200
(Corrected Total)  18  994.230779
[Figure: normal probability plot of the residuals; vertical axis -3.5 to 4.5, horizontal axis normal quantiles -2 to +2]
(xii) Nonlinear least squares iterative phase, nonlinear least squares summary
statistics and normal probability plot for the Chapman-Richards growth model (with
initial stage)
Iter  α          β         κ         δ          Sum of Squares
0     38.500000  0.500000  0.045000  0.010000   2071.048903
1     37.217360  0.484961  0.050819  0.010000   2031.893903
2     36.907855  0.481578  0.052202  0.009990   2030.732887
3     56.656108  0.682619  0.078506  0.004653   1152.174660
4     35.289487  0.448210  0.161607  0.005024   898.906215
5     40.754837  0.521055  0.170883  0.002702   703.919386
6     40.048640  0.526815  0.156056  0.003065   660.327502
7     40.462229  0.552919  0.139103  0.002956   642.855433
8     38.210043  0.600941  0.177091  -0.001396  523.028032
9     32.950994  0.893468  0.534869  -0.031843  363.140209
10    33.655694  0.885063  0.540658  -0.031509  302.512504
11    34.742333  0.899866  0.553465  -0.032686  230.445532
12    36.103646  0.919999  0.567058  -0.034214  197.877670
13    36.043927  0.893442  0.535999  -0.032581  195.366325
14    36.166539  0.965428  0.549325  -0.038331  158.537844
15    36.190951  0.978607  0.552540  -0.040504  154.649146
16    36.333225  0.988828  0.533779  -0.026776  134.127691
17    36.457478  0.992876  0.520870  -0.003680  118.898898
18    37.347514  0.992236  0.423379  0.188456   78.089598
19    37.292582  0.997060  0.572447  0.379137   69.421072
20    37.286322  0.995257  0.518608  0.300896   66.842736
21    37.315466  0.992053  0.507655  0.300079   66.687613
22    37.257466  0.958418  0.507318  0.322802   66.361057
23    37.256833  0.958931  0.507703  0.322427   66.360482
24    37.189980  0.886867  0.526041  0.409870   66.136845
25    37.198506  0.908511  0.524380  0.391729   66.011263
26    37.145752  0.901760  0.537214  0.422911   65.838707
27    37.148035  0.902718  0.537531  0.424261   65.838427
28    37.149930  0.902755  0.537892  0.425508   65.836593
29    37.196462  0.877263  0.547903  0.459972   65.707827
30    37.191931  0.842678  0.551048  0.488779   65.701849
31    37.184449  0.839492  0.552431  0.493205   65.700912
32    37.188198  0.855167  0.551066  0.480301   65.687721
33    37.187170  0.855663  0.551220  0.480296   65.687651
34    37.190470  0.855694  0.550486  0.479558   65.685462
35    37.191925  0.855342  0.550912  0.480596   65.684933
36    37.207464  0.856268  0.548523  0.478032   65.681859
37    37.206252  0.855007  0.548815  0.479395   65.681716
38    37.206384  0.854916  0.548829  0.479470   65.681715
39    37.204178  0.854171  0.549263  0.480505   65.681553
40    37.204479  0.854009  0.549430  0.480889   65.681490
41    37.203555  0.853873  0.549784  0.481396   65.681482
42    37.203612  0.853018  0.549842  0.482228   65.681481
43    37.203615  0.853017  0.549842  0.482230   65.681481
Note: Convergence criterion met.

Source             DF  Sum of Squares  Mean Square
Regression          4  22224.507719    5556.126930
Residual           16  65.681481       4.105093
Uncorrected Total  20  22290.189200
(Corrected Total)  19  2059.028700
[Figure: normal probability plot of the residuals; vertical axis -4.5 to 2.5, horizontal axis normal quantiles -2 to +2]
(xiii) Nonlinear least squares iterative phase, nonlinear least squares summary
statistics and normal probability plot for the Stannard growth model
Iter  α          β          κ         δ         Sum of Squares
0     35.000000  -1.045000  0.500000  0.025000  519.027261
1     35.640588  -2.067772  1.061590  0.025000  512.087550
2     35.640588  -1.771936  0.800030  0.025000  354.520684
3     35.640588  -1.373180  0.343750  0.025000  104.516052
4     35.463236  -1.256043  0.268235  0.022929  102.709378
5     35.499276  -1.303724  0.295337  0.022982  95.856199
6     36.089690  -1.105630  0.230696  0.038723  85.575323
7     36.155718  -1.139667  0.248086  0.039568  83.109861
8     36.159860  -1.140183  0.247860  0.039633  82.957766
9     36.892215  -1.308567  0.290225  0.050499  75.117621
10    36.888110  -1.295091  0.287138  0.051679  75.067117
11    36.879269  -1.278124  0.283690  0.058741  74.687433
12    36.911693  -1.355531  0.315211  0.276153  64.390085
13    36.884808  -1.337089  0.306488  0.230527  62.107130
14    36.896398  -1.332579  0.304536  0.219595  61.857148
15    36.887265  -1.336947  0.306815  0.211809  61.564232
16    36.895198  -1.361796  0.317937  0.236663  61.089886
17    36.896634  -1.384481  0.326825  0.246988  60.930184
18    37.005026  -1.464346  0.382989  0.368672  59.712098
19    37.004930  -1.464490  0.383113  0.368919  59.712096
20    37.171606  -1.569225  0.519864  0.615387  58.473657
21    37.167611  -1.559235  0.633889  0.804502  56.638522
22    37.143872  -1.505202  0.621281  0.783638  56.454211
23    37.162064  -1.508168  0.606732  0.763620  56.429117
24    37.159311  -1.517878  0.588598  0.729979  56.315092
25    36.992354  -1.586426  0.522211  0.555381  55.998533
26    36.999329  -1.570217  0.532928  0.579347  55.872298
27    36.999731  -1.569415  0.533323  0.580369  55.871923
28    37.000228  -1.569793  0.532195  0.578453  55.870812
29    37.025016  -1.564521  0.552621  0.623102  55.820573
30    37.058818  -1.583146  0.577006  0.664950  55.810653
31    37.043097  -1.578987  0.579297  0.669078  55.799853
32    37.053282  -1.577412  0.576183  0.664059  55.798769
33    37.053698  -1.577679  0.576647  0.664729  55.798768
34    37.043488  -1.578802  0.574987  0.659680  55.796004
35    37.040573  -1.579539  0.571696  0.653109  55.795953
36    37.040666  -1.579107  0.573094  0.655796  55.795872
37    37.040225  -1.579493  0.573887  0.657077  55.795837
38    37.041198  -1.580190  0.574949  0.658573  55.795835
39    37.041105  -1.579913  0.574530  0.658080  55.795806
40    37.041753  -1.579855  0.574665  0.658451  55.795803
41    37.041517  -1.579922  0.574375  0.657871  55.795799
42    37.041535  -1.579912  0.574317  0.657769  55.795799
Note: Convergence criterion met.

Source             DF  Sum of Squares  Mean Square
Regression          4  22234.393401    5558.598350
Residual           15  55.795799       3.719720
Uncorrected Total  19  22290.189200
(Corrected Total)  18  994.230779
[Figure: normal probability plot of the residuals; vertical axis -4.5 to 2.5, horizontal axis normal quantiles -2 to +2]
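The summary tables above can be put side by side by dividing each residual sum of squares by its degrees of freedom (the residual mean square already printed) and by forming the descriptive ratio R² = 1 - SSE/SST, where SST = 994.230779 is the corrected total shared by all fits to the 19 observations. The Python sketch below does this with values transcribed from the tables. The Chapman-Richards fit with initial stage is left out because it uses 20 observations and a different corrected total (2059.028700). This is an illustrative ranking under these assumptions, not necessarily the model selection criterion applied in the study.

```python
SST = 994.230779  # corrected total SS (18 DF) common to the 19-observation fits

FITS = {  # model: (residual SS, residual DF), transcribed from the tables above
    "logistic": (56.150223, 16),
    "Gompertz": (59.905481, 16),
    "von Bertalanffy": (55.795798, 15),
    "negative exponential": (77.090988, 17),
    "monomolecular": (70.685268, 16),
    "log-logistic": (88.241581, 16),
    "Richards": (55.795804, 15),
    "Weibull": (70.685275, 15),
    "Morgan-Mercer-Flodin": (60.886146, 15),
    "Chapman-Richards (without initial stage)": (117.615891, 15),
    "Stannard": (55.795799, 15),
}


def summarize(fits, sst):
    """Return (model, residual mean square, 1 - SSE/SST) sorted by mean square."""
    rows = [(name, sse / df, 1.0 - sse / sst) for name, (sse, df) in fits.items()]
    return sorted(rows, key=lambda row: row[1])


for name, ms, r2 in summarize(FITS, SST):
    print(f"{name:<42} MS = {ms:8.4f}  R2 = {r2:.4f}")
```

Because the residual mean square penalizes extra parameters through the degrees of freedom, the three-parameter logistic model ranks ahead of the von Bertalanffy, Richards and Stannard fits even though those three reach a slightly smaller SSE (about 55.7958).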
Appendix G
Parameter estimates from multiple linear regression with MNC as independent variables for the inland area
Station  Variable    B         Std. Error  t value (df)  Sig.   R²     F value (df)
ILD1     (Constant)  -20.311   4.103       -4.950 (1)    0.000  0.392  73.190 (2, 240)
         P           342.077   28.326      12.076 (1)    0.000
         Ca          -14.697   2.573       -5.712 (1)    0.000
ILD2     (Constant)  5.707     4.120       1.385 (1)     0.167  0.422  93.332 (4, 522)
         Mg          -53.642   4.308       -12.453 (1)   0.000
         P           201.609   31.780      6.344 (1)     0.000
         K           -6.298    1.355       -4.647 (1)    0.000
         N           3.039     1.500       2.026 (1)     0.043
ILD3     (Constant)  -30.657   7.254       -4.226 (1)    0.000  0.404  46.235 (4, 282)
         N           21.679    1.924       11.268 (1)    0.000
         Ca          7.863     0.815       9.648 (1)     0.000
         K           -7.738    2.343       -3.303 (1)    0.001
         Mg          15.727    7.535       2.087 (1)     0.038
ILD4     (Constant)  -5.806    4.212       -1.379 (1)    0.169  0.185  28.322 (5, 625)
         N           8.207     1.124       7.303 (1)     0.000
         Ca          11.621    2.162       5.376 (1)     0.000
         P           41.336    10.595      3.902 (1)     0.000
         K           -4.932    1.491       -3.307 (1)    0.001
         Mg          -7.476    3.072       -2.433 (1)    0.015
ILD5     (Constant)  -15.529   3.329       -4.665 (1)    0.000  0.400  64.574 (2, 204)
         N           12.466    1.500       8.311 (1)     0.000
         K           4.321     1.896       2.279 (1)     0.024
ILD6     (Constant)  26.527    5.982       4.434 (1)     0.000  0.317  25.254 (3, 164)
         Ca          -14.099   3.447       -4.091 (1)    0.000
         P           286.344   40.512      7.068 (1)     0.000
         N           -14.127   2.077       -6.802 (1)    0.000
ILD7     (Constant)  7.008     2.651       2.643 (1)     0.008  0.118  22.542 (3, 536)
         N           7.131     0.891       8.005 (1)     0.000
         P           -37.198   12.875      -2.889 (1)    0.004
         Ca          5.650     2.337       2.417 (1)     0.016
ILDT     (Constant)  -1.007    1.346       -0.748 (1)    0.455  0.148  112.422 (4, 2598)
         N           6.782     0.465       14.575 (1)    0.000
         P           48.554    7.128       6.812 (1)     0.000
         Mg          -11.788   1.353       -8.713 (1)    0.000
         Ca          4.130     0.589       7.012 (1)     0.000
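Each t value in the table is the coefficient divided by its standard error, t = B/SE, and the significance level is the two-sided tail probability of that t. The Python sketch below reproduces this for the ILD2 rows. As an assumption it replaces the t distribution by the standard normal, which is a close approximation here because the residual degrees of freedom (522) are large; for small samples it would slightly understate the p values.

```python
from statistics import NormalDist

# ILD2 estimates transcribed from the table above: variable -> (B, standard error)
ILD2 = {
    "(Constant)": (5.707, 4.120),
    "Mg": (-53.642, 4.308),
    "P": (201.609, 31.780),
    "K": (-6.298, 1.355),
    "N": (3.039, 1.500),
}


def t_and_p(b, se):
    """t = B / SE and a two-sided p value from a large-sample normal approximation."""
    t = b / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


for name, (b, se) in ILD2.items():
    t, p = t_and_p(b, se)
    print(f"{name:<11} t = {t:8.3f}  p = {p:.3f}")
```

Small differences in the third decimal of t come from the rounding of B and SE in the printed table.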
Appendix H
Parameter estimates from multiple linear regression with MNC as independent variables for the coastal area
Station  Variable    B         Std. Error  t value (df)  Sig.   R²     F value (df)
CLD1     (Constant)  -44.721   9.684       -4.618 (1)    0.000  0.380  32.725 (4, 238)
         K           30.526    5.361       5.694 (1)     0.000
         Mg          71.737    10.890      6.588 (1)     0.000
         P           198.541   60.920      3.259 (1)     0.001
         Ca          -11.440   4.796       -2.385 (1)    0.018
CLD2     (Constant)  0.189     6.732       0.028 (1)     0.978  0.171  15.344 (3, 239)
         Mg          22.750    4.168       5.459 (1)     0.000
         N           10.443    2.458       4.248 (1)     0.000
         Ca          -7.509    2.779       -2.702 (1)    0.007
CLD3     (Constant)  -9.901    6.258       -1.582 (1)    0.118  0.687  81.083 (2, 78)
         N           17.664    1.925       9.178 (1)     0.000
         Mg          -18.550   6.314       -2.938 (1)    0.004
CLD4     (Constant)  18.262    2.472       7.387 (1)     0.000  0.050  9.762 (1, 322)
         Ca          12.794    4.095       3.124 (1)     0.002
CLD5     (Constant)  16.528    10.879      1.519 (1)     0.130  0.111  8.728 (3, 238)
         Ca          11.367    5.032       2.259 (1)     0.025
         N           -10.294   3.055       -3.370 (1)    0.001
         P           185.701   61.097      3.039 (1)     0.003
CLD6     (Constant)  40.988    4.703       8.715 (1)     0.000  0.043  9.137 (2, 509)
         Ca          -12.548   2.935       -4.275 (1)    0.000
         N           -3.330    1.477       -2.254 (1)    0.025
CLD7     (Constant)  6.810     4.335       1.571 (1)     0.117  0.231  36.897 (4, 667)
         P           66.648    17.194      3.876 (1)     0.000
         K           8.788     1.555       5.652 (1)     0.000
         Mg          -25.001   4.835       -5.171 (1)    0.000
         N           4.804     1.421       3.380 (1)     0.001
CLDT     (Constant)  8.998     2.188       4.113 (1)     0.000  0.044  44.460 (2, 2315)
         P           128.949   14.067      9.167 (1)     0.000
         Mg          -6.424    1.852       -3.469 (1)    0.001
Appendix I
Normal probability plots of the multiple linear regression residuals for the inland area
[Figure: normal P-P plots, Observed Cum Prob (horizontal axis, 0.00 to 1.00) against Expected Cum Prob (vertical axis, 0.00 to 1.00), one panel each for ILD1, ILD2, ILD3, ILD4, ILD5, ILD6, ILD7 and ILDT]
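Unlike the quantile-based plots earlier, the panels in this appendix compare cumulative probabilities: each ordered residual's observed cumulative proportion is plotted against the normal probability expected for its standardized value, and normal residuals fall along the 45-degree line. The Python sketch below computes such points. It assumes residuals are standardized with their sample mean and standard deviation and that the observed cumulative probability of the i-th smallest of n residuals is i/n; statistical packages may use slightly different conventions.

```python
from statistics import NormalDist, mean, stdev


def pp_points(residuals):
    """Return (observed, expected) cumulative probabilities for a normal P-P plot."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    nd = NormalDist()
    ordered = sorted(residuals)
    return [(i / n, nd.cdf((r - m) / s)) for i, r in enumerate(ordered, start=1)]


def max_deviation(points):
    """Largest vertical distance from the 45-degree reference line."""
    return max(abs(obs - exp) for obs, exp in points)
```

A small maximum deviation indicates that the plotted points stay close to the reference line.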
Appendix J
Normal probability plots of the multiple linear regression residuals for the coastal area
[Figure: normal P-P plots, Observed Cum Prob (horizontal axis, 0.00 to 1.00) against Expected Cum Prob (vertical axis, 0.00 to 1.00), one panel each for CLD1, CLD2, CLD3, CLD4, CLD5, CLD6, CLD7 and CLDT]
Appendix K
Parameter estimates from multiple linear regression with MNC and NBR as independent variables for the coastal area.
Station  Variable    B         Std. Error  t value (df)  Sig.   R²     F value (df)
CLD1     (Constant)  -75.969   11.265      -6.744 (1)    0.000  0.388  33.985 (4, 214)
         P           185.968   61.015      3.048 (1)     0.003
         K           47.134    7.420       6.352 (1)     0.000
         Def-Mg      1.358     0.329       4.130 (1)     0.000
         Mg-Ca       -19.911   10.238      -1.945 (1)    0.053
CLD2     (Constant)  -18.975   8.473       -2.239 (1)    0.026  0.181  16.802 (3, 228)
         CLP         257.755   52.184      4.939 (1)     0.000
         Def-Mg      0.338     0.057       5.905 (1)     0.000
         N-K         -1.589    0.633       -2.511 (1)    0.013
CLD3     (Constant)  -18.173   9.169       -1.982 (1)    0.051  0.676  81.451 (2, 78)
         CLP         356.146   40.470      8.800 (1)     0.000
         TB          -0.152    0.051       -2.983 (1)    0.004
CLD4     (Constant)  18.262    2.472       7.387 (1)     0.000  0.035  9.762 (1, 268)
         Ca          12.794    4.095       3.124 (1)     0.002
CLD5     (Constant)  46.292    7.452       6.212 (1)     0.000  0.108  13.076 (2, 215)
         Ca          11.528    4.972       2.318 (1)     0.021
         N-P         -1.658    0.402       -4.120 (1)    0.000
CLD6     (Constant)  36.039    2.544       14.166 (1)    0.000  0.033  16.814 (1, 500)
         TLB         -0.137    0.033       -4.101 (1)    0.000
CLD7     (Constant)  -0.132    17.175      -0.008 (1)    0.994  0.315  28.678 (8, 498)
         Def-K       3.587     0.550       6.521 (1)     0.000
         P           529.187   103.760     5.100 (1)     0.000
         N-P         4.341     0.898       4.834 (1)     0.000
         K-Ca        -32.683   7.026       -4.652 (1)    0.000
         K-Mg        -15.947   2.913       -5.475 (1)    0.000
         Def-Mg      -2.769    0.503       -5.505 (1)    0.000
         Mg-Ca       84.381    20.387      4.139 (1)     0.000
         CLP         -530.956  128.616     -4.128 (1)    0.000
CLDT     (Constant)  -64.453   16.704      -3.859 (1)    0.000  0.094  42.036 (5, 2023)
         N           -24.419   6.562       -3.721 (1)    0.000
         P           701.608   108.298     6.478 (1)     0.000
         Mg          -63.647   6.922       -9.195 (1)    0.000
         N-P         5.148     0.959       5.370 (1)     0.000
         N-Mg        -2.371    0.272       -8.717 (1)    0.000
Appendix L
Multiple linear regression using foliar analysis and nutrient balance ratio as
independent variables for the inland area.
Station  Variable    B         Std. Error  t value (df)  Sig.   R²     F value (df)
ILD1     (Constant)  -20.311   4.103       -4.950 (1)    0.000  0.379  73.190 (2, 240)
         P           342.077   28.326      12.076 (1)    0.000
         Ca          -14.697   2.573       -5.712 (1)    0.000
ILD2     (Constant)  4.150     5.718       0.726 (1)     0.468  0.438  101.508 (4, 522)
         P           130.174   24.067      5.409 (1)     0.000
         K           -4.285    1.185       -3.616 (1)    0.000
         N-P         -0.952    0.265       -3.597 (1)    0.000
         N-Mg        1.990     0.149       13.365 (1)    0.000
ILD3     (Constant)  19.393    22.051      0.879 (1)     0.380  0.478  38.949 (5, 213)
         N           35.888    7.864       4.564 (1)     0.000
         K           -41.774   16.246      -2.571 (1)    0.011
         N-P         0.712     0.304       2.343 (1)     0.020
         N-K         -20.692   9.556       -2.165 (1)    0.031
         K-Mg        -0.484    0.181       -2.667 (1)    0.008
ILD4     (Constant)  -23.297   9.688       -2.405 (1)    0.016  0.215  34.244 (5, 625)
         CLP         234.930   25.959      9.050 (1)     0.000
         K-Ca        -14.829   3.564       -4.161 (1)    0.000
         N-P         -0.706    0.150       -4.722 (1)    0.000
         Def-K       1.019     0.332       3.075 (1)     0.002
         Def-Mg      0.347     0.154       2.259 (1)     0.024
ILD5     (Constant)  262.286   53.064      4.943 (1)     0.000  0.474  45.515 (4, 202)
         P           2621.417  485.151     -5.403 (1)    0.000
         N-P         -27.160   4.813       -5.642 (1)    0.000
         CLP         3724.885  620.871     5.999 (1)     0.000
         Mg-Ca       5.735     2.901       1.977 (1)     0.049
ILD6     (Constant)  64.016    5.374       11.911 (1)    0.000  0.325  39.661 (2, 165)
         N-P         -3.054    0.353       -8.650 (1)    0.000
         N-Ca        3.013     0.619       4.867 (1)     0.000
ILD7     (Constant)  10.776    4.001       2.694 (1)     0.007  0.132  16.218 (5, 534)
         P           -43.857   12.981      -3.378 (1)    0.001
         Mg          -33.313   11.946      -2.789 (1)    0.005
         CLP         188.723   21.870      8.629 (1)     0.000
         N-Ca        -0.551    0.163       -3.383 (1)    0.001
         N-Mg        -0.236    0.114       -2.080 (1)    0.038
ILDT     (Constant)  7.622     1.517       5.026 (1)     0.000  0.168  102.046 (5, 2529)
         N           8.214     0.509       16.026 (1)    0.000
         P           42.607    7.710       5.526 (1)     0.000
         Mg          -24.641   2.353       -10.472 (1)   0.000
         N-Mg        -0.254    0.042       -6.078 (1)    0.000
         N-Ca        -0.591    0.092       -6.406 (1)    0.000
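For a least squares regression with an intercept, the overall F statistic and R² are linked by F = (R²/df1)/((1 - R²)/df2), so R² = F·df1/(F·df1 + df2). The short Python sketch below applies this identity to the F values in the table above; to three decimals it reproduces the reported R² for all eight inland fits, which makes it a quick consistency check when transcribing such tables.

```python
def r2_from_f(f, df1, df2):
    """R-squared implied by an overall F statistic with (df1, df2) degrees of freedom."""
    return f * df1 / (f * df1 + df2)


# station -> (F, df1, df2, reported R2), transcribed from the table above
REPORTED = {
    "ILD1": (73.190, 2, 240, 0.379),
    "ILD2": (101.508, 4, 522, 0.438),
    "ILD3": (38.949, 5, 213, 0.478),
    "ILD4": (34.244, 5, 625, 0.215),
    "ILD5": (45.515, 4, 202, 0.474),
    "ILD6": (39.661, 2, 165, 0.325),
    "ILD7": (16.218, 5, 534, 0.132),
    "ILDT": (102.046, 5, 2529, 0.168),
}

for station, (f, df1, df2, r2) in REPORTED.items():
    print(station, round(r2_from_f(f, df1, df2), 3), r2)
```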
Appendix M
Q-Q plots of the residuals for the inland stations.
[Figure: residuals against quantiles of the standard normal distribution (-3 to 3), one panel each for ILD1, ILD2, ILD3, ILD4, ILD5, ILD6, ILD7 and ILDT]
Appendix N
Q-Q plots of the residuals for the coastal stations.
[Figure: residuals against quantiles of the standard normal distribution (-3 to 3), one panel each for CLD1, CLD2, CLD3, CLD4, CLD5, CLD6, CLD7 and CLDT]
Appendix O
Example of Matlab code for the neural network application(s)
clear net;
clc;
load c11_coastal.txt
rawdata = c11_coastal;   % to shuffle rows first: rawdata = c11_coastal(randperm(243),1:6);
[R, Q] = size(rawdata);
P1 = rawdata(:,1)';      % input 1: pN
P2 = rawdata(:,2)';      % input 2: pP
P3 = rawdata(:,3)';      % input 3: pK
P4 = rawdata(:,4)';      % input 4: pCa
P5 = rawdata(:,5)';      % input 5: pMg
T1 = rawdata(:,6)';      % target: FFB yield
data = [P1; P2; P3; P4; P5; T1];   % 6-by-R matrix, one record per column
[N, M] = size(data);
% Scale each variable (row) to (0, 1]; dmin/dmax avoid shadowing the built-in min and max
rawdata1 = zeros(N, M);
for i = 1:N
    dmin = min(data(i,:));
    dmax = max(data(i,:));
    rawdata1(i,:) = ((data(i,:) - dmin) + 0.01) / ((dmax - dmin) + 0.01);
end
randdata = rawdata1(1:6, randperm(219));   % shuffle the first 219 records
P = randdata(1:5,:);
T = randdata(6,:);
trP = P(1:5,1:153);      % input for training
v.P = P(1:5,154:197);    % input for validation
t.P = P(1:5,198:219);    % input for testing
trT = T(1,1:153);        % output for training
v.T = T(1,154:197);      % output for validation
t.T = T(1,198:219);      % output for testing
% Neural network modelling
S1 = input('Please input the number of hidden nodes: ');
net = newff(minmax(P), [S1 1], {'logsig','tansig'}, 'trainlm');
net.performFcn = 'mse';
net.trainParam.epochs = 500;
net.trainParam.goal = 1e-6;
net.trainParam.max_fail = 5;
net.trainParam.show = 5;
net = init(net);
[net, tr] = train(net, trP, trT, [], [], v, t);   % early stopping on the validation set
figure
plot(tr.epoch, tr.perf, tr.epoch, tr.vperf, tr.epoch, tr.tperf);
legend('training', 'validation', 'testing', -1);
ylabel('mean squared error');
xlabel('epoch');
% Simulation
an1 = sim(net, trP);
e1 = trT - an1;
an2 = sim(net, v.P);
e2 = v.T - an2;
an3 = sim(net, t.P);
e3 = t.T - an3;
tramse = mse(e1)
valmse = mse(e2)
tesmse = mse(e3)
tramae = mae(e1)
valmae = mae(e2)
tesmae = mae(e3)
% Regression of the network outputs on the targets
an4 = sim(net, P);
figure
[m, b, r] = postreg(an4, T)
% Denormalization of the data is not performed in this example
% Plotting
lg = size(randdata, 2);
j = 1:lg;
j1 = 1:153;
j2 = 154:197;
j3 = 198:219;
figure
plot(j, T, 'k+-', j1, an1, 'b+--', j2, an2, 'g+--', j3, an3, 'r+--');
legend('actual', 'training', 'validation', 'testing', -1);
xlabel('data set'); ylabel('yield');
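The listing scales every variable with x' = (x - min + 0.01)/(max - min + 0.01) and then divides 219 shuffled records into 153 training, 44 validation and 22 test cases; it mentions denormalization but does not implement it. The Python sketch below mirrors those two steps and adds the inverse transform needed to report network outputs in the original units. The function names, the seed argument and the use of Python's random module are assumptions for illustration and are not part of the original Matlab program.

```python
import random


def scale_rows(data, offset=0.01):
    """Scale each row (variable) as in the listing:
    x' = (x - min + offset) / (max - min + offset), giving values in (0, 1]."""
    scaled, bounds = [], []
    for row in data:
        lo, hi = min(row), max(row)
        bounds.append((lo, hi))
        scaled.append([(x - lo + offset) / (hi - lo + offset) for x in row])
    return scaled, bounds


def unscale(values, lo, hi, offset=0.01):
    """Invert scale_rows for one variable, e.g. to express predicted yield in original units."""
    return [v * (hi - lo + offset) + lo - offset for v in values]


def split_indices(n, n_train, n_val, seed=None):
    """Shuffle record indices and cut them into training, validation and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

For example, `split_indices(219, 153, 44)` returns index lists of length 153, 44 and 22, matching the partition used in the listing.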
Appendix O
(i) Graphical illustration of the best regression line for the data of the ILD1 and ILD2 stations.
[Figure panels: ILD1, ILD2]
(ii) Graphical illustration of the best regression line for the data of the ILD3 and ILD4 stations.
[Figure panels: ILD3, ILD4]
(iii) Graphical illustration of the best regression line for the data of the ILD5 and ILD6 stations.
[Figure panels: ILD5, ILD6]
(iv) Graphical illustration of the best regression line for the data of the ILD7 and ILDT stations.
[Figure panels: ILD7, ILDT]
Appendix P
(i) Graphical illustration of the best regression line for the data of the CLD1 and CLD2 stations.
[Figure: panels CLD1 and CLD2]
(ii) Graphical illustration of the best regression line for the data of the CLD3 and CLD4 stations.
[Figure: panels CLD3 and CLD4]
(iii) Graphical illustration of the best regression line for the data of the CLD5 and CLD6 stations.
[Figure: panels CLD5 and CLD6]
(iv) Graphical illustration of the best regression line for the data of the CLD7 and CLDT stations.
[Figure: panels CLD7 and CLDT]
Appendix R
The MSE, RMSE, MAE and MAPE values for each neural network model in the
inland area.
Inland stations
Station  Model  MSE      RMSE    MAE     MAPE
ILD1     LL     13.1079  3.6204  2.6677  0.1159
         LP     16.1591  4.0198  3.0529  0.1318
         LT     18.3725  4.2863  3.3716  0.1460
         TP     17.3737  4.1681  3.3228  0.1443
         TL     16.4507  4.0559  3.2195  0.1429
         TT     16.9262  4.1141  3.2553  0.1456
ILD2     LL     11.4734  3.3872  2.5910  0.1162
         LP     13.0502  3.6125  2.8193  0.1257
         LT     12.5840  3.5473  2.7435  0.1244
         TP     13.1767  3.6299  2.8527  0.1306
         TL     12.7739  3.5740  2.7885  0.1263
         TT     12.6520  3.5569  2.7734  0.1223
ILD3     LL     12.5471  3.5421  2.8732  0.1195
         LP     12.8565  3.5855  2.8602  0.1190
         LT     12.9833  3.6032  2.8526  0.1126
         TP     11.9431  3.4558  2.7076  0.1092
         TL     12.6355  3.5546  2.7736  0.1141
         TT     13.2690  3.6426  2.8736  0.1166
ILD4     LL     14.9820  3.8706  3.1113  0.1356
         LP     14.0759  3.7517  2.9545  0.1276
         LT     14.4406  3.8000  3.0288  0.1304
         TP     14.3058  3.7823  2.9811  0.1288
         TL     15.1920  3.8976  3.1389  0.1350
         TT     14.7159  3.8361  3.0421  0.1316
ILD5     LL     5.9225   2.4336  1.8398  0.0944
         LP     8.3286   2.8859  2.1426  0.1133
         LT     7.1422   2.6724  2.0706  0.1067
         TP     6.4416   2.5380  1.9076  0.0964
         TL     6.6394   2.5767  1.8867  0.0956
         TT     6.1199   2.4730  1.8735  0.0968
ILD6     LL     9.6650   3.1088  2.4421  0.0987
         LP     10.0368  3.1681  2.5010  0.1010
         LT     10.6006  3.2558  2.4220  0.0992
         TP     9.6884   3.1126  2.4864  0.1018
         TL     10.1928  3.1926  2.5638  0.1052
         TT     7.0829   2.6613  1.7861  0.0719
ILD7     LL     21.4372  4.6300  3.6377  0.1674
         LP     20.4788  4.5253  3.5087  0.1580
         LT     21.0046  4.5830  3.6420  0.1691
         TP     20.8042  4.5611  3.5111  0.1642
         TL     20.1759  4.4917  3.5242  0.1614
         TT     22.1352  4.7048  3.7578  0.1747
ILDT     LL     2.9298   1.7116  1.3014  0.0576
         LP     2.8263   1.6811  1.2689  0.0560
         LT     2.8168   1.6783  1.2776  0.0564
         TP     2.9255   1.7104  1.3086  0.0578
         TL     2.8642   1.6923  1.2870  0.0570
         TT     2.8567   1.6901  1.2984  0.0573
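The four measures tabulated above can be computed from the residuals e = y - ŷ. The sketch below uses the standard definitions: MSE = mean(e²), RMSE = √MSE, MAE = mean(|e|) and MAPE = mean(|e/y|). The observed and predicted yields in it are made-up illustrative values, not thesis data. The sketch returns MAPE as a fraction; judging by their magnitude, the tabulated MAPE values appear to be on that scale too, although this is an inference. As a consistency check on the table, √13.1079 ≈ 3.620, which matches the RMSE reported for ILD1/LL.

```matlab
% Sketch: error measures of the kind reported in Appendices R and S.
% y and yhat are hypothetical example values, not thesis data.
y    = [25.0 30.0 28.0 32.0];     % observed FFB yield
yhat = [24.0 31.5 27.0 30.0];     % model predictions
e = y - yhat;                     % residuals

mse_val  = mean(e.^2);            % mean squared error
rmse_val = sqrt(mse_val);         % root mean squared error
mae_val  = mean(abs(e));          % mean absolute error
mape_val = mean(abs(e ./ y));     % mean absolute percentage error, as a fraction

fprintf('MSE %.4f  RMSE %.4f  MAE %.4f  MAPE %.4f\n', ...
        mse_val, rmse_val, mae_val, mape_val);
```

The variable names end in `_val` so they do not shadow the toolbox functions `mse` and `mae`, which the listing in the previous appendix calls.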
Appendix S
The MSE, RMSE, MAE and MAPE values for each neural network model in the
coastal area.
Coastal stations
Station  Model  MSE      RMSE    MAE     MAPE
CLD1     LL     18.6614  4.3243  3.3415  0.1456
         LP     15.9936  3.9991  3.0397  0.1337
         LT     17.2149  4.1490  3.1762  0.1398
         TP     18.4294  4.2929  3.2861  0.1437
         TL     15.1412  3.8911  2.8715  0.1220
         TT     19.6718  4.4353  3.4314  0.1461
CLD2     LL     6.8989   2.6265  2.1306  0.0725
         LP     7.5109   2.7406  1.9190  0.0638
         LT     6.8896   2.6248  2.1182  0.0717
         TP     6.0799   2.4657  1.9766  0.0665
         TL     7.7396   2.7820  2.3227  0.0779
         TT     7.2305   2.6889  2.1741  0.0743
CLD3     LL     6.0158   2.4527  1.9229  0.0756
         LP     6.6892   2.5863  1.7651  0.0657
         LT     6.0254   2.4546  1.8708  0.0750
         TP     6.0634   2.4623  1.7934  0.0688
         TL     6.7345   2.5950  2.0811  0.0844
         TT     6.6433   2.5774  1.9708  0.0789
CLD4     LL     12.7014  3.5639  2.6862  0.1126
         LP     14.8279  3.8507  2.9406  0.1298
         LT     13.2776  3.6438  2.6550  0.1118
         TP     15.6767  3.9593  3.0043  0.1282
         TL     14.8006  3.8471  2.7499  0.1163
         TT     12.6890  3.5621  2.6248  0.1125
CLD5     LL     18.8531  4.3420  3.3444  0.1488
         LP     19.0369  4.3631  3.4643  0.1498
         LT     19.9523  4.4667  3.4873  0.1578
         TP     18.5634  4.3085  3.3054  0.1431
         TL     19.5699  4.4237  3.5144  0.1530
         TT     19.7144  4.4400  3.4900  0.1514
CLD6     LL     15.8088  3.9760  3.1577  0.1279
         LP     15.2817  3.9091  3.0726  0.1237
         LT     15.0867  3.8841  3.0213  0.1232
         TP     15.2480  3.9048  3.0911  0.1274
         TL     16.4554  4.0565  3.2557  0.1335
         TT     16.8372  4.1033  3.1893  0.1302
CLD7     LL     12.8178  3.5801  2.8541  0.1066
         LP     13.0888  3.6178  2.8347  0.1086
         LT     13.2882  3.6452  2.8827  0.1088
         TP     11.6043  3.4065  2.6659  0.1003
         TL     12.5051  3.5362  2.8061  0.1056
         TT     13.1083  3.6205  2.8880  0.1103
CLDT     LL     20.8289  4.5638  3.6248  0.1499
         LP     20.0818  4.4812  3.5495  0.1456
         LT     21.2836  4.6134  3.6542  0.1516
         TP     22.1141  4.7025  3.7652  0.1572
         TL     21.7819  4.6671  3.7182  0.1531
         TT     21.2177  4.6062  3.6741  0.1518
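One simple way to pick a network for a station from these tables is to take the architecture with the smallest MSE. Selecting on MSE alone is an assumption made here for illustration. Ranking on RMSE always gives the same choice, because the square root is monotonic. For CLD1, MAE and MAPE also happen to agree. The sketch uses the CLD1 values copied from the table above.

```matlab
% Sketch: choose the architecture with the smallest MSE for one station.
% MSE values copied from the CLD1 rows of Appendix S.
models   = {'LL', 'LP', 'LT', 'TP', 'TL', 'TT'};
mse_cld1 = [18.6614 15.9936 17.2149 18.4294 15.1412 19.6718];

[best_mse, k] = min(mse_cld1);    % smallest MSE and its position
best_model = models{k};
fprintf('CLD1: best model %s (MSE %.4f)\n', best_model, best_mse);
```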
Appendix T
(i) The calculation of total profit (RM) for the ILD3 and ILD4 stations according to
each radius.
                 Fertiliser level (kg/palm/year) Estimated
Station  Radius  N       P       K       Mg      FFB yield   Total profit (RM)
ILD3     0.0     4.7800  0.9000  4.1000  -       29.7351     7429.485
         0.1     5.081   0.9358  4.2057  -       29.8744     7421.667
         0.2     5.3312  0.9707  4.4629  -       29.9887     7389.767
         0.3     5.4781  1.0025  4.8707  -       30.0938     7343.894
         0.4     5.5525  1.0305  5.3158  -       30.2047     7301.802
         0.5     5.5958  1.0560  5.7555  -       30.3281     7267.386
         0.6     5.6247  1.0800  6.1862  -       30.4667     7240.201
         0.7     5.6461  1.1035  6.6097  -       30.6212     7219.431
         0.8     5.6629  1.1264  7.0282  -       30.7923     7204.67
         0.9     5.6768  1.1488  7.4428  -       30.9803     7195.667
         1.0     5.6887  1.1711  7.8547  -       31.1853     7192.161
ILD4     0.0     4.0000  1.8750  4.2000  1.4400  25.8575     6999.028
         0.1     4.2044  1.8790  4.3248  1.4542  25.9351     6983.408
         0.2     4.3186  1.8998  4.5724  1.4448  26.0419     6969.688
         0.3     4.3397  1.9274  4.8514  1.4133  26.1475     6962.206
         0.4     4.3373  1.9539  5.1096  1.379   26.2814     6969.527
         0.5     4.3284  1.9795  5.3557  1.3446  26.446      6989.154
         0.6     4.3168  2.0046  5.5953  1.3104  26.642      7020.059
         0.7     4.3039  2.0294  5.831   1.2763  26.8696     7061.783
         0.8     4.2901  2.0539  6.0641  1.2423  27.129      7114.161
         0.9     4.2758  2.0783  6.2956  1.2083  27.4202     7177.004
         1.0     4.2611  2.1026  6.5259  1.1745  27.7434     7250.288
(ii) The calculation of total profit (RM) for the ILD5 and ILD6 stations according to
each radius
                 Fertiliser level (kg/palm/year) Estimated
Station  Radius  N       P       K       Mg      FFB yield   Total profit (RM)
ILD5     0.0     1.3650  2.2750  2.2750  -       29.5075     7889.188
         0.1     1.4014  2.1037  2.4119  -       29.6488     7916.833
         0.2     1.4936  1.9752  2.5417  -       29.7814     7934.744
         0.3     1.6307  1.9119  2.6462  -       29.9233     7950.476
         0.4     1.7757  1.8845  2.7298  -       30.0891     7973.126
         0.5     1.9188  1.8712  2.803   -       30.2842     8005.052
         0.6     2.0595  1.8646  2.8706  -       30.5106     8046.636
         0.7     2.1983  1.8615  2.9352  -       30.7691     8097.879
         0.8     2.3358  1.8604  2.9977  -       31.0602     8158.823
         0.9     2.4723  1.8606  3.0589  -       31.384      8229.395
         1.0     2.6081  1.8619  3.1192  -       31.7407     8309.577
ILD6     0.0     4.6700  1.8750  3.9750  2.0800  24.4806     5675.753
         0.1     4.9085  1.8581  4.0471  2.1299  24.7152     5704.79
         0.2     5.1234  1.8453  4.1449  2.1918  24.9275     5724.58
         0.3     5.3048  1.8384  4.2665  2.2679  25.1228     5737.59
         0.4     5.4476  1.8388  4.4022  2.3587  25.3072     5747.368
         0.5     5.5539  1.8479  4.539   2.4628  25.4866     5757.348
         0.6     5.6308  1.8661  4.6654  2.5774  25.6659     5770.159
         0.7     5.6859  1.8934  4.7744  2.6996  25.8489     5787.439
         0.8     5.7256  1.9291  4.8636  2.8265  26.0384     5810.035
         0.9     5.7546  1.9716  4.9333  2.9559  26.2365     5838.355
         1.0     5.7761  2.0197  4.9856  3.0861  26.4447     5872.447
(iii) The calculation of total profit (RM) for the ILD7 station according to each radius
                 Fertiliser level (kg/palm/year) Estimated
Station  Radius  N       P       K       Mg      FFB yield   Total profit (RM)
ILD7     0.0     4.3250  2.2750  3.4750  1.7400  24.4803     5792.874
         0.1     4.5706  2.3088  3.4393  1.7512  24.7287     5841.644
         0.2     4.8201  2.3354  3.4304  1.7757  24.9778     5885.423
         0.3     5.0655  2.3536  3.4351  1.8127  25.2337     5928.851
         0.4     5.3021  2.3635  3.447   1.8623  25.5011     5974.67
         0.5     5.5273  2.3664  3.4628  1.9225  25.7835     6024.754
         0.6     5.7406  2.3636  3.4806  1.9909  26.0841     6080.512
         0.7     5.9427  2.3564  3.4994  2.065   26.405      6142.797
         0.8     6.1351  2.3461  3.5186  2.1431  26.7479     6212.125
         0.9     6.3192  2.3333  3.538   2.2241  27.1141     6288.832
         1.0     6.4966  2.3188  3.5574  2.3068  27.5043     6373.06
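To find the radius that gives the largest total profit for a station, search the profit column for its maximum. The sketch below does this for the ILD3 and ILD5 columns, copied from the tables above. It only locates the maximum among the tabulated radii. It does not recompute profit, because the prices and costs behind the Total Profit column are not restated in this appendix.

```matlab
% Sketch: radius with the highest tabulated total profit (RM).
% Profit values copied from Appendix T for ILD3 and ILD5.
radius = 0:0.1:1;
profit_ild3 = [7429.485 7421.667 7389.767 7343.894 7301.802 7267.386 ...
               7240.201 7219.431 7204.67  7195.667 7192.161];
profit_ild5 = [7889.188 7916.833 7934.744 7950.476 7973.126 8005.052 ...
               8046.636 8097.879 8158.823 8229.395 8309.577];

[p3, i3] = max(profit_ild3);
[p5, i5] = max(profit_ild5);
fprintf('ILD3: radius %.1f, profit RM %.3f\n', radius(i3), p3);
fprintf('ILD5: radius %.1f, profit RM %.3f\n', radius(i5), p5);
```

For ILD3 the largest tabulated profit is at radius 0.0, whereas for ILD5 profit keeps rising up to radius 1.0.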
Appendix U
(i) The calculation of total profit (RM) for the CLD1 and CLD2 stations in the
coastal areas according to each radius
                 Fertiliser level (kg/palm/year) Estimated
Station  Radius  N       P       K       Mg      FFB yield   Total profit (RM)
CLD1     0.0     1.8200  1.8200  2.7300  1.8200  27.2633     6975.318
         0.1     1.8872  1.894   2.9434  1.7663  27.4295     6986.194
         0.2     1.9709  1.9481  3.1626  1.7194  27.6082     6998.702
         0.3     2.067   1.985   3.3833  1.6779  27.8022     7014.663
         0.4     2.1718  2.0082  3.6028  1.6405  28.0132     7035.249
         0.5     2.2825  2.0209  3.8199  1.6061  28.2428     7061.291
         0.6     2.3975  2.026   4.0342  1.5739  28.4919     7093.169
         0.7     2.5152  2.0253  4.2457  1.5433  28.7111     7116.768
         0.8     2.6351  2.0202  4.4546  1.514   29.0508     7175.367
         0.9     2.7565  2.0117  4.6611  1.4856  29.3615     7225.931
         1.0     2.8791  2.0007  4.8655  1.4579  29.6934     7282.87
CLD2     0.0     1.8200  1.8200  1.3600  1.8200  29.9191     7939.661
         0.1     1.8745  1.9835  1.9835  1.8645  30.0254     7859.443
         0.2     1.9414  2.1366  2.1366  1.913   30.1287     7845.839
         0.3     2.0145  2.2792  2.2792  1.9627  30.2305     7833.232
         0.4     2.0906  2.4108  2.4108  2.0117  30.3319     7822.558
         0.5     2.1675  2.5315  2.5315  2.0592  30.4344     7814.529
         0.6     2.244   2.6422  2.6422  2.1044  30.5389     7809.421
         0.7     2.3196  2.7439  2.7439  2.1474  30.6462     7807.296
         0.8     2.3937  2.8381  2.8381  2.1882  30.757      7808.106
         0.9     2.4665  2.9259  2.9259  2.2271  30.8718     7811.717
         1.0     2.5379  3.0086  3.0086  2.2644  30.9909     7817.925
(ii) The calculation of total profit (RM) for the CLD3 and CLD4 stations in the
coastal areas according to each radius
                 Fertiliser level (kg/palm/year) Estimated
Station  Radius  N       P       K       Mg      FFB yield   Total profit (RM)
CLD3     0.0     3.6400  1.8200  3.6400  1.6200  28.2359     6959.635
         0.1     4.0036  1.8121  3.6367  1.8200  28.6748     7030.195
         0.2     4.3673  1.8053  3.6425  1.8256  29.0506     7100.774
         0.3     4.7298  1.8003  3.6652  1.8450  29.3638     7149.483
         0.4     5.081   1.8008  3.7274  1.9124  29.6169     7171.094
         0.5     5.3372  1.8192  3.8738  2.127   29.8253     7161.207
         0.6     5.4454  1.8477  4.0349  2.4012  30.0304     7156.518
         0.7     5.5045  1.8741  4.172   2.6447  30.2560     7169.401
         0.8     5.5479  1.8986  4.2952  2.8671  30.5079     7195.708
         0.9     5.5844  1.9218  4.4104  3.0766  30.7880     7233.378
         1.0     5.6169  1.9442  4.5206  3.2781  31.0969     7281.329
CLD4     0.0     3.6400  1.8200  3.6400  1.8200  30.9592     7723.786
         0.1     3.8391  1.9564  3.6175  1.8868  31.0200     7709.367
         0.2     4.1241  2.0637  3.5584  1.9331  31.0846     7696.572
         0.3     4.4415  2.1512  3.4809  1.9663  31.1577     7688.178
         0.4     4.7692  2.2279  3.3953  1.9927  31.2409     7684.185
         0.5     5.1002  2.2984  3.3058  2.0153  31.3352     7684.389
         0.6     5.4324  2.3654  3.2141  2.0356  31.4408     7688.494
         0.7     5.7648  2.4299  3.1211  2.0544  31.5581     7696.443
         0.8     6.0973  2.4929  3.0272  2.0722  31.6869     7708.019
         0.9     6.4297  2.5547  3.9327  2.0893  31.8276     7577.663
         1.0     6.7619  2.6157  2.8378  2.1059  31.9800     7742.055
(iii) The calculation of total profit (RM) for the CLD5 and CLD6 stations in the
coastal areas according to each radius
                 Fertiliser level (kg/palm/year) Estimated
Station  Radius  N       P       K       Mg      FFB yield   Total profit (RM)
CLD5     0.0     2.7300  4.5500  9.1000  4.5500  26.0478     5162.702
         0.1     2.8881  4.7353  8.911   4.2429  26.2357     5247.941
         0.2     3.0149  4.7345  8.8843  3.8037  26.4774     5352.977
         0.3     3.1188  4.6571  8.8651  3.3592  26.8602     5505.119
         0.4     3.2131  4.5551  8.8422  2.9228  27.2298     5655.665
         0.5     3.3027  4.4429  8.8168  2.4925  27.7504     5850.55
         0.6     3.3897  4.3256  8.7898  2.0661  28.369      6074.076
         0.7     3.4752  4.2053  8.7616  1.6423  29.0858     6326.131
         0.8     3.5596  4.0831  8.7328  1.2204  29.9012     6606.707
         0.9     3.6433  3.9596  8.7034  0.7998  30.8152     6915.787
         1.0     3.7266  3.8353  8.6736  0.3801  31.8279     7253.349
CLD6     0.0     2.725   1.365   3.4100  -       32.6262     8541.086
         0.1     2.9364  1.3988  3.2121  -       33.0906     8680.256
         0.2     3.1074  1.4489  2.9718  -       33.5074     8814.959
         0.3     3.2377  1.5115  2.7024  -       33.8947     8948.736
         0.4     3.3357  1.5816  2.4183  -       34.2664     9082.954
         0.5     3.4111  1.6556  2.1284  -       34.6314     9218.124
         0.6     3.4712  1.7317  1.8368  -       34.9953     9354.639
         0.7     3.5207  1.8088  1.5453  -       35.3614     9492.779
         0.8     3.5628  1.8864  1.2545  -       35.7317     9632.742
         0.9     3.5995  1.9642  0.9646  -       36.1077     9774.748
         1.0     3.6321  2.0422  0.6756  -       36.4902     9918.895
(iv) The calculation of total profit (RM) for the CLD7 station in the coastal areas
according to each radius
                 Fertiliser level (kg/palm/year) Estimated
Station  Radius  N       P       K       Mg      FFB yield   Total profit (RM)
CLD7     0.0     2.7250  1.8200  3.4100  -       31.4923     8186.494
         0.1     2.9651  1.8948  3.4892  -       31.7818     8229.529
         0.2     3.1851  1.9985  3.5571  -       32.0639     8272.324
         0.3     3.3774  2.1309  3.611   -       32.3452     8317.951
         0.4     3.5395  2.2864  3.6506  -       32.6327     8369.066
         0.5     3.6743  2.4568  3.6781  -       32.9326     8427.349
         0.6     3.7877  2.6356  3.6965  -       33.2493     8493.435
         0.7     3.8852  2.8185  3.7084  -       33.5861     8567.606
         0.8     3.9711  3.0035  3.7158  -       33.9447     8649.751
         0.9     4.0482  3.1891  3.7199  -       34.3267     8739.965
         1.0     4.1189  3.3749  3.7216  -       34.7329     8838.131
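Profit does not always change monotonically with radius. In the CLD2 column of Appendix U, total profit first falls and then rises again. The sketch below reports the change relative to radius 0.0 and finds the radius at which profit is lowest, using the CLD2 values copied from that table. The variable names are illustrative only.

```matlab
% Sketch: change in total profit (RM) relative to radius 0.0 for CLD2.
% Profit values copied from Appendix U(i).
radius = 0:0.1:1;
profit_cld2 = [7939.661 7859.443 7845.839 7833.232 7822.558 7814.529 ...
               7809.421 7807.296 7808.106 7811.717 7817.925];

gain = profit_cld2 - profit_cld2(1);   % RM gained (negative = lost) vs radius 0.0
[pmin, imin] = min(profit_cld2);       % lowest profit among the tabulated radii
fprintf('Lowest profit RM %.3f at radius %.1f\n', pmin, radius(imin));
disp([radius' gain']);
```

For this station, every radius above 0.0 gives a lower tabulated profit than the starting point.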