APPLICATION OF STATISTICAL AND NEURAL NETWORK MODEL FOR OIL PALM YIELD STUDY

AZME BIN KHAMIS

Faculty of Science
Universiti Teknologi Malaysia

DECEMBER 2005


PSZ 19:16 (Pind. 1/97)

UNIVERSITI TEKNOLOGI MALAYSIA
THESIS STATUS DECLARATION FORM

TITLE: APPLICATION OF STATISTICAL AND NEURAL NETWORK MODEL FOR OIL PALM YIELD STUDY

ACADEMIC SESSION: 2005/2006

I, AZME BIN KHAMIS (CAPITAL LETTERS), agree that this thesis (PSM/Master's/Doctor of Philosophy)* be kept at the Library of Universiti Teknologi Malaysia under the following conditions of use:

1. The thesis is the property of Universiti Teknologi Malaysia.
2. The Library of Universiti Teknologi Malaysia may make copies for study purposes only.
3. The Library may make copies of this thesis as exchange material between institutions of higher learning.
4. **Please tick (√):

   (√) SULIT (contains information of national security or interest for Malaysia as provided in the OFFICIAL SECRETS ACT 1972)
   ( ) TERHAD (contains restricted information as determined by the organisation/body where the research was carried out)
   ( ) TIDAK TERHAD

Certified by:

___________________________ (AUTHOR'S SIGNATURE)
Permanent address: NO. 11 JALAN MANIS 7, TAMAN MANIS 2, 86400 PARIT RAJA, BATU PAHAT, JOHOR.
Date: _____________________________

___________________________ (SUPERVISOR'S SIGNATURE)
ASSOC. PROF. DR. ZUHAIMY HJ ISMAIL
Name of Supervisor
Date: ______________________

NOTES:
* Delete as appropriate.
** If the thesis is SULIT or TERHAD, attach a letter from the relevant authority/organisation stating the reasons and the period for which the thesis must be classified as SULIT or TERHAD.
υ "Thesis" means a thesis for the Doctor of Philosophy or Master's degree by research, a dissertation for programmes by coursework and research, or an Undergraduate Project Report (PSM).
"I/We* hereby declare that I/we* have read this thesis and in my/our* opinion this thesis is sufficient in terms of scope and quality for the award of the degree of Doctor of Philosophy."

Signature: ……………………………
Name of Supervisor I: Associate Professor Dr. Zuhaimy Bin Hj Ismail
Date: ……………………………

Signature: ……………………………
Name of Supervisor II: Dr. Khalid Bin Haron
Date: ……………………………

Signature: ……………………………
Name of Supervisor III: ……………………………
Date: ……………………………

* Delete as necessary


PART A - Certification of Cooperation*

It is certified that this thesis research project was carried out through cooperation between …………………………………... and …………………………………………..

Certified by:
Signature: ……………………………
Name: ……………………………
Position: …………………………… (Official stamp)
Date: ……………………………

* If the thesis/project research involved cooperation

PART B - For the Use of the Office of the School of Graduate Studies

This thesis has been examined and certified by:

Name and address of External Examiner: Prof. Dr. Zainudin Bin Hj. Jubok, Director, Research Management and Conference Centre, Universiti Malaysia Sabah, Kampus Teluk Sepanggor (Beg Berkunci 2073), 88999 Kota Kinabalu, Sabah

Name and address of Internal Examiner I: Assoc. Prof. Dr. Jamalludin Bin Talib, Fakulti Sains, UTM, Skudai

Name of other supervisors (if any): ……………………………

Certified by the Assistant Registrar at SPS:
Signature: ……………………………
Name: ……………………………
Date: ……………………………
APPLICATION OF STATISTICAL AND NEURAL NETWORK MODEL FOR OIL PALM YIELD STUDY

AZME BIN KHAMIS

A thesis submitted in fulfilment of the requirements for the award of the degree of Doctor of Philosophy

Faculty of Science
Universiti Teknologi Malaysia

DECEMBER 2005


I declare that this thesis entitled "Application of Statistical and Neural Network Model for Oil Palm Yield Study" is the result of my own research except as cited in the references. The thesis has not been accepted for any degree and is not concurrently submitted in candidature of any other degree.

Signature: ………………………………...
Name: AZME BIN KHAMIS
Date: …………………………………


ACKNOWLEDGEMENTS

ﺑﺴﻢ اﷲ اﻟﺮﺣﻤﻦ اﻟﺮﺣﻴﻢ

In the name of Allah, the most Beneficent and the most Merciful.

I would like to express my gratitude to my supervisor, Associate Professor Dr. Zuhaimy Hj Ismail, for his encouragement, patience, constant guidance, continuous support and assistance throughout this period. His invaluable comments and suggestions are deeply appreciated. His dedication to work and his perfectionism will always be remembered and taken as a basic necessity of a successful scholar. I am also very grateful to my co-supervisor, Dr. Khalid Haron of the Malaysian Palm Oil Board (MPOB), Kluang Station, for his comments, suggestions and sincere support during this endeavour. I also would like to thank Haji Ahmad Tarmizi Mohammed from MPOB Bangi for his motivation, fruitful discussions and valuable comments.

I am especially grateful to my beloved wife, Hairani Razali, for her patience, encouragement and the constant support she gives. She is my 'co-pilot', and this study would not have been possible without her. To my two lovely sons, Amirul Fikri and Amirul Farhan, and my lovely daughter Amirah Afiqah: you are daddy's source of inspiration. Many thanks go to my beloved parents, who constantly and remotely gave me encouragement and advice.
I am grateful to the Kolej Universiti Teknologi Tun Hussein Onn and the Malaysian Government for their sponsorship. Lastly, many thanks go to those who have contributed directly and indirectly to the completion of my work at the Universiti Teknologi Malaysia.


ABSTRACT

This thesis presents an exploratory study on the modelling of oil palm (OP) yield using statistical and artificial neural network approaches. Even though Malaysia is one of the largest producers of palm oil, research on modelling OP yield is still in its infancy. This study began by exploring statistical models commonly used for plant growth, such as nonlinear growth models, multiple linear regression models and the robust M-regression model. The data used were OP yield growth data, foliar composition data and fertiliser treatment data, collected from seven stations in the inland and coastal areas and provided by the Malaysian Palm Oil Board (MPOB). Twelve nonlinear growth models were used. The initial study shows that the logistic growth model gave the best fit for modelling OP yield. This study then explores the causal relationship between OP yield and foliar composition and the effect of the nutrient balance ratio on OP yield. To improve the model, this study explores the use of neural networks. The effects of the neural network architecture, such as the combination of activation functions, the learning rate, the number of hidden nodes, the momentum term, the number of runs and outlier data, on the network's performance were also studied. Comparative studies between the various models were carried out. Response surface analysis was used to determine the optimum combination of fertilisers for maximising OP yield. Saddle points occurred in the analysis, and the ridge analysis technique was used to overcome the saddle point problem, with several alternative combinations of fertiliser levels considered. Finally, a profit analysis was performed to select and identify the fertiliser combination that may generate maximum profit.
ABSTRAK

This thesis presents an exploratory study on the modelling of oil palm yield through statistical and artificial neural network approaches. Malaysia is the largest producer of palm oil; nevertheless, research on the modelling of oil palm yield is still at an early stage. The study began by exploring popular statistical models for plant growth, such as nonlinear growth models, multiple linear regression analysis and robust M-regression analysis. Oil palm yield data, foliar nutrient content data and fertiliser experiment data, collected from seven stations in the inland areas and seven stations in the coastal alluvial areas, were provided by the Malaysian Palm Oil Board (MPOB). Twelve nonlinear growth models were considered. The initial study showed that the logistic nonlinear growth model is the best for modelling oil palm yield growth. The study continued by exploring the relationship between oil palm yield and foliar nutrient content and the nutrient balance ratio. To enhance model capability, the study explored the use of neural networks. The study also examined the effects of the neural network design, such as the combination of activation functions, the learning rate, the number of hidden nodes, the momentum rate, the number of runs and outlier data, on neural network performance. A comparative study among the models considered was carried out. Response surface analysis was used to determine the optimum fertiliser ratio for producing maximum oil palm yield. A saddle point problem occurred in the analysis, and ridge analysis was used to overcome it, providing several alternative fertiliser combinations for consideration. Finally, a profit analysis was carried out to select and identify the fertiliser combination that can generate maximum profit.
TABLE OF CONTENTS

CHAPTER  TITLE  PAGE

TITLE  i
DECLARATION  ii
ACKNOWLEDGEMENTS  iii
ABSTRACT  iv
ABSTRAK  v
TABLE OF CONTENTS  vi
LIST OF FIGURES  xii
LIST OF TABLES  xvii
LIST OF SYMBOLS  xxi
LIST OF APPENDICES  xxv

1  INTRODUCTION
1.1  Introduction  1
1.2  Research Background  1
1.3  Brief History of Oil Palm Industry in Malaysia  3
1.4  Problem Descriptions  7
1.5  Research Objectives  8
1.6  Scope of The Study  9
1.6.1  Data Scope  9
1.6.2  Model Scope  11
1.6.3  Statistical Testing Scope  12
1.7  Data Gathering  13
1.8  Leaf Analysis  14
1.9  Research Importance  17
1.10  Research Contribution  18
1.11  Thesis Organisation  19

2  REVIEW OF THE LITERATURE
2.1  Introduction  21
2.2  Oil Palm Yield Modelling  21
2.3  Nonlinear Growth Model  27
2.4  Application of Neural Network Modelling  30
2.4.1  Neural Network in Science and Technology  31
2.4.2  Neural Network in Economy  32
2.4.3  Neural Network in Environmental and Health  34
2.4.4  Neural Network in Agriculture  35
2.5  Response Surface Analysis  37
2.6  Summary  38

3  RESEARCH METHODOLOGY
3.1  Introduction  43
3.2  Data Analysis  43
3.3  Modelling  45
3.3.1  Nonlinear Growth Models  45
3.3.1.1  Nonlinear Methodology  47
3.3.2  Regression Analysis  51
3.3.2.1  Least Squares Method  51
3.3.3  Robust M-Regression  53
3.3.4  Neural Networks Model  55
3.3.4.1  Introduction to Neural Network  56
3.3.4.2  Fundamentals of Neural Network  57
3.3.4.3  Processing Unit  58
3.3.4.4  Combination Function  58
3.3.4.5  Activation Function  59
3.3.4.6  Network Topologies  62
3.3.4.7  Network Learning  64
3.3.4.8  Objective Function  65
3.3.4.9  Basic Architecture of Feed-Forward Neural Network  66
3.3.5  Response Surface Analysis  72
3.3.5.1  Introduction  73
3.3.5.2  Response Surface: First Order  73
3.3.5.3  Response Surface: Second Order  76
3.3.5.4  Stationary Point  77
3.3.5.5  Ridge Analysis  79
3.3.5.6  Estimate the Standard Error of Predicted Response  80
3.4  Summary  81

4  MODELLING OIL PALM YIELD GROWTH USING NONLINEAR GROWTH MODEL
4.1  Introduction  82
4.2  The Nonlinear Model  84
4.3  The Method of Estimation  85
4.4  Partial Derivatives for The Nonlinear Models  87
4.5  Results and Discussion  93
4.6  Conclusion  104

5  MODELLING OIL PALM YIELD USING MULTIPLE LINEAR REGRESSION AND ROBUST M-REGRESSION
5.1  Introduction  105
5.2  Model Development  105
5.3  Results and Discussion  107
5.3.1  Multiple Linear Regression  107
5.3.2  Residual Analysis for MLR  110
5.3.3  Robust M-Regression  115
5.3.4  Residual Analysis for RMR  116
5.4  Conclusion  119

6  NEURAL NETWORK MODEL FOR OIL PALM YIELD
6.1  Introduction  122
6.2  Neural Network Procedure  123
6.2.1  Data Preparation  123
6.2.2  Calculating Degree of Freedom  124
6.3  Computer Application  125
6.4  Experimental Design for Neural Network  129
6.4.1  Experiment 1  131
6.4.2  Experiment 2  131
6.4.3  Experiment 3  132
6.5  Results and Discussion  133
6.5.1  Statistical Analysis  133
6.5.2  Neural Network Performance  138
6.5.3  Residual Analysis  146
6.5.4  Results of Experiment 1  149
6.5.5  Results of Experiment 2  149
6.5.6  Results of Experiment 3  149
6.6  Comparative Study on Oil Palm Yield Modelling  155
6.7  Conclusion  167

7  THE APPLICATION OF RESPONSE SURFACE ANALYSIS IN MODELLING OIL PALM YIELD
7.1  Introduction  169
7.2  Response Surface Analysis  169
7.3  Data Analysis  172
7.4  Numerical Analysis  173
7.4.1  Canonical Analysis for Fertilizer Treatments  174
7.4.2  Ridge Analysis for Fertilizer Treatments  179
7.5  Economic Analysis  186
7.5.1  Profit Analysis  187
7.6  Conclusion  195

8  SUMMARY AND CONCLUSION
8.1  Introduction  196
8.2  Results and Discussion  196
8.2.1  Initial Exploratory Study  197
8.2.2  Modelling Using Neural Network  201
8.2.3  Modelling Using Response Surface Analysis  208
8.3  Conclusion  211
8.4  Areas for Further Research  211

REFERENCES  214
Appendices A - U  231

LIST OF TABLES

TABLE NO.  TITLE  PAGE
1.1  The optimum value of nutrient balance ratio, NBR, for foliar analysis  17
2.1  The summary of the literature reviews in this study  39
3.1  Nonlinear mathematical models considered in the study  50
3.2  Summary of the data set types and research approaches considered in this study  81
4.1  Partial derivatives of the Logistic, Gompertz and von Bertalanffy growth models  87
4.2  Partial derivatives of the Negative exponential, Monomolecular, Log-logistic and Richard's growth models  88
4.3  Partial derivatives of the Weibull, Schnute and Morgan-Mercer-Flodin growth models  89
4.4  Partial derivatives of the Chapman-Richard and Stannard growth models  90
4.5  Parameter estimates of the logistic, Gompertz, negative exponential, monomolecular, log-logistic, Richard's and Weibull growth models for the yield-age relationship  94
4.6  Parameter estimates of the MMF, von Bertalanffy, Chapman-Richard and Stannard growth models for the yield-age relationship  95
4.7  Asymptotic correlation for each nonlinear growth model fitted  96
4.8  The actual and predicted values of FFB yield, the associated measurement error and the correlation coefficient between the actual and predicted values for the Logistic, Gompertz, von Bertalanffy, Negative exponential, Monomolecular and Log-logistic growth models  98
4.9  The actual and predicted values of FFB yield, the associated measurement error and the correlation coefficient between the actual and predicted values for the Richard's, Weibull, MMF, Chapman-Richard, Chapman-Richard* (with initial) and Stannard growth models  99
4.10  The parameter estimates and asymptotic correlation for von Bertalanffy and Chapman-Richard when an initial growth response data point is added  103
4.11  The number of iterations and the root mean squares error for the nonlinear growth models considered in this study  104
5.1  The regression equations and R2 values for the inland and coastal areas  114
5.2  The regression equations for the inland and coastal stations using MNC and NBR as independent variables  116
5.3  The regression equations using robust M-regression for the inland and coastal areas  119
6.1  The F statistics value for ANOVA for different activation functions used for the inland area  134
6.2  The F statistics value for ANOVA for different activation functions used for the coastal area  135
6.3  The Chi-square value of MSE testing for the inland and coastal areas  136
6.4  Duncan test for the average of MSE for homogeneous subsets for the inland and coastal areas  137
6.5  Mean squares error for training, validation, testing and average of the neural network models in the inland area  138
6.6  Mean squares error for training, validation, testing and average of the neural network models in the coastal area  139
6.7  The correlation coefficient of the neural network model  140
6.8  The MAPE values of the neural network model  141
6.9  The t-statistic values in the training data  152
6.10  The t-statistic values for the test data  155
6.11  The MSE, RMSE, MAE and MAPE for MLR, MMR and neural network performance for the inland area  157
6.12  The MSE, RMSE, MAE and MAPE for MLR, MMR and neural network performance for the coastal area  158
6.13  The correlation changes from the MLR and MMR models to the neural network model  163
6.14  The performance changes of the MAPE from the MLR and MMR to the neural network model  164
7.1  The average of FFB yield, MSE, RMSE and R2 values for the inland area  174
7.2  The average of FFB yield, MSE, RMSE and R2 values for the coastal area  175
7.3  The eigenvalues and predicted FFB yield at the stationary point for each critical fertiliser level in the inland area  176
7.4  The eigenvalues, the predicted FFB yield at the stationary points and critical values of fertiliser level for CLD1 and CLD2 stations  177
7.5  The eigenvalues, the predicted FFB yield at the stationary points and critical values of fertiliser level for CLD3, CLD4, CLD5, CLD6 and CLD7  178
7.6  The estimated FFB yield and fertiliser level at certain radii for stations ILD3 and ILD4 in the inland area  180
7.7  The estimated FFB yield and fertiliser level at certain radii for stations ILD5 and ILD6 in the inland area  181
7.8  The estimated FFB yield and fertiliser level at certain radii for station ILD7  182
7.9  The estimated FFB yield and fertiliser level at certain radii for stations CLD1 and CLD2 in the coastal area  183
7.10  The estimated FFB yield and fertiliser level at certain radii for stations CLD4 and CLD5 in the coastal area  184
7.11  The estimated FFB yield and fertiliser level at certain radii for stations CLD5 and CLD6 in the coastal area  185
7.12  The estimated FFB yield and fertiliser level at certain radii for station CLD7 in the coastal area  186
7.13  The fertiliser level, average estimated FFB yield and total profit for the inland and coastal areas  189
7.14  The estimated FFB yield and the foliar nutrient composition levels (%) for the inland area  192
7.15  The estimated FFB yield and the foliar nutrient composition levels (%) for the coastal area  193
8.1  The adequacy of fit measurement used for the nonlinear growth models  198
8.2  The RMSE, MAPE and R2 values for the MLR and MMR modelling for the inland and coastal areas  199
8.3  The RMSE, MAPE and R2 values for the MLR and MMR modelling for the coastal area  200
8.4  The F values of the analysis of variance for different activation functions for the inland and coastal areas  202
8.5  The MAPE values and the correlation of the neural network models for the inland and coastal areas  203
8.6  The F values of the analysis of variance for Experiments 1, 2 and 3  204
8.7  The comparison of the MAPE values and the correlation values among the MLR, MMR and NN models for the inland and coastal areas  205
8.8  The accuracy of the MLR, MMR and NN models and the accuracy changes for the inland area  207
8.9  The accuracy of the MLR, MMR and NN models and the accuracy changes for the coastal area  207
8.10  The fertiliser level, average estimated FFB yield and total profit for the inland area  209
8.11  The fertiliser level, average estimated FFB yield and total profit for the coastal area  209
8.12  The average estimated FFB yield and the foliar nutrient composition levels for the inland and coastal areas  210

LIST OF FIGURES

FIGURE NO.  TITLE  PAGE

1.1  Annual production of crude palm oil (1975-2003) including Peninsular Malaysia, Sabah and Sarawak  4
1.2  Oil palm planted area: 1975 - 2003 (hectares) including Peninsular Malaysia, Sabah and Sarawak  5
1.3  Annual export of palm oil: 1975 - 2003 (in tonnes)  5
1.4  World major producers of palm oil ('000 tonnes)  6
1.5  World major exporters of palm oil, including re-exporting countries (*)  6
1.6  Summary of research framework and research methodology used in this study  10
3.1  Data analysis procedure used in this study  44
3.2  FFB yield growth versus time (year of harvest)  46
3.3  Processing unit  58
3.4  Identity function  60
3.5  Binary step function  60
3.6  Sigmoid function  61
3.7  Bipolar sigmoid function  61
3.8  Feed-forward neural network  62
3.9  Recurrent neural network  63
3.10  Supervised learning model  65
3.11  Backward propagation  70
3.12  The descent vs. learning rate and momentum  72
4.1  Residual plots for the Logistic, Gompertz, von Bertalanffy, Negative exponential, Monomolecular and Log-logistic growth models  100
4.2  Residual plots for the Richard's, Weibull, Morgan-Mercer-Flodin, Chapman-Richard, Chapman-Richard* and Stannard growth models  101
5.1  The error distribution plots of the MLR model in the inland stations  111
5.2  The error distribution plots of the MLR model in the coastal stations  112
5.3  The error distribution plots of the RMR model in the inland stations  117
5.4  The error distribution plots of the RMR model in the coastal stations  118
5.5  The R2 value for each model proposed for the inland area  119
5.6  The R2 value for each model proposed for the coastal area  120
6.1  Three-layer fully connected neural network with five input nodes and one output node  125
6.2  The early stopping procedure for the feed-forward neural network  128
6.3  The mean squares error for training, validation and testing  128
6.4  The correlation coefficient between the actual and predicted values  129
6.5  The three-layer fully connected neural network with nine input nodes and one output node  130
6.6  The actual and predicted FFB yield for ILD1, ILD2 and ILD3 stations using the NN model  142
6.7  The actual and predicted FFB yield for ILD4, ILD5, ILD6 and ILD7 stations using the NN model  143
6.8  The actual and predicted FFB yield for ILDT, CLD1, CLD2 and CLD3 stations using the NN model  144
6.9  The actual and predicted FFB yield for CLD4, CLD5, CLD6 and CLD7 using the NN model  145
6.10  The actual and predicted FFB yield for CLDT using the NN model  146
6.11  The error distribution plots of the neural network model for the inland stations  147
6.12  The error distribution plots of the neural network model for the coastal stations  148
6.13  The MSE values for different levels of the percentage-outliers in the training data  150
6.14  The MSE values for different levels of the magnitude-outliers in the training data  151
6.15  The MSE values for different levels of the percentage-outliers in the test data  153
6.16  The MSE values for different levels of the magnitude-outliers in the test data  153
6.17  The correlation coefficients from the MLR, MMR and NN models for the inland area  159
6.18  The correlation coefficients from the MLR, MMR and NN models for the coastal area  160
6.19  Comparison of the MAPE values between MLR, MMR and NN for the inland area  160
6.20  Comparison of the MAPE values between MLR, MMR and NN for the coastal area  161
6.21  Comparison of the accuracy of models for the inland area  165
6.22  Comparison of the accuracy of models for the coastal area  165
6.23  The percentage changes of the model accuracy for the inland area  166
6.24  The percentage changes of the model accuracy for the coastal area  166
7.1  The response surface plots for fertiliser treatments in ILD1 and ILD2 stations in the inland area and CLD2 and CLD7 stations in the coastal area  171
7.2  Data analysis procedure in obtaining the optimum fertiliser level and foliar nutrient composition  172
7.3  The fertiliser levels for each station in the inland area  190
7.4  The fertiliser levels for each station in the coastal area  191
7.5  The foliar nutrient composition levels for each station in the inland area  192
7.6  The foliar nutrient composition levels for each station in the coastal area  194
7.7  Comparison between the N and K fertiliser levels needed by oil palm for the coastal and inland areas  194
8.1  The factors that contribute to oil palm yield production  213

LIST OF SYMBOLS

FFB - Fresh Fruit Bunches
FELDA - Federal Land Development Authority
RISDA - Rubber Industry Smallholders Development Authority
SADC - State Agriculture Development Corporations
FELCRA - Federal Land Consolidation and Rehabilitation Authority
LSU - Leaf Sampling Unit
NN - Neural Network
MLR - Multiple Linear Regression
RMR - Robust M-Regression
RSA - Response Surface Analysis
MSE - Mean Square Error
RMSE - Root Mean Square Error
MAPE - Mean Absolute Percentage Error
N - Nitrogen
P - Phosphorus
K - Potassium
Ca - Calcium
Mg - Magnesium
TLB - Total Leaf Basis
NBR - Nutrient Balance Ratio
CLP - Critical Leaf Phosphorus Concentration
MNC - Major Nutrient Component
AS - Ammonium Sulphate
CIRP - Christmas Island Rock Phosphate
KIES - Kieserite

LIST OF APPENDICES

APPENDIX  TITLE  PAGE

A  The list of oil palm experimental stations  231
B  The rate and actual value of fertiliser (kg/palm/year)  232
C  Summary of macro nutrients needed by plants  234
D  The list of papers published from 2001 until now  236
E  The ridge analysis  239
F  Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the nonlinear growth models  240
G  The parameter estimates using multiple linear regression with MNC as independent variables for the inland area  255
H  The parameter estimates using multiple linear regression with MNC as independent variables for the coastal area  256
I  Normal probability plot of multiple linear regression for the inland area  257
J  Normal probability plot of multiple linear regression for the coastal area  258
K  The parameter estimates using multiple linear regression with MNC and NBR as independent variables for the coastal area  259
L  The parameter estimates using multiple linear regression with MNC and NBR as independent variables for the inland area  260
M  The Q-Q plot for inland stations  261
N  The Q-Q plot for coastal stations  262
O  Example of the Matlab programming for neural network application  263
P  Graphical illustration of the best regression line fitting for inland stations  266
Q  Graphical illustration of the best regression line fitting for coastal stations  270
R  The MSE, RMSE, MAE and MAPE values for each neural network model in the inland area  274
S  The MSE, RMSE, MAE and MAPE values for each neural network model in the coastal area  276
T  The calculation of total profit (RM) for the inland stations according to each radius  278
U  The calculation of total profit (RM) for the coastal stations according to each radius  281

CHAPTER 1

INTRODUCTION
1.1 INTRODUCTION

This chapter presents the introduction to this thesis. It begins by describing the overall research background, followed by a brief history of the oil palm industry in Malaysia. The research objectives, the scope of the study, the research framework and a discussion of the research contribution are also given. Finally, a brief outline of each chapter is presented.

1.2 RESEARCH BACKGROUND

In the oil palm industry, modelling plays an important role in understanding various issues. It is used in decision making, and advances in computer technology have created new opportunities for the study of modelling. Modelling can be categorised into statistical and heuristic modelling. Statistical modelling is defined as the analysis of the relationships between multiple measurements made on groups of subjects or objects, and the model usually contains systematic elements and random effects. From a mathematical point of view, a statistical model can be defined as a set of probability distributions on the sample space. Modelling involves the appropriate application of statistical analysis techniques with certain assumptions on hypothesis testing, data interpretation and applicable conclusions. Statistical analysis requires careful selection of analytical techniques, verification of assumptions and verification of the data. In conducting statistical analysis, it is normal to begin with descriptive statistics, graphs and relationship plots of the data to evaluate the legitimacy of the data, identify possible outliers and assumption violations, and form preliminary ideas on variable relationships for modelling. The heuristic approach is defined as pertaining to the use of general knowledge based on experimentation, the evaluation of possible answers or solutions, or trial-and-error methods: solving problems by experience rather than theory.
A heuristic is also a problem-solving procedure that involves conceiving a hypothetical answer to a problem at the outset of an inquiry for the purpose of giving guidance or direction to the inquiry. One heuristic approach is the neural network model, which is based on rules of thumb and is widely used in various fields. A very important feature of neural networks is their adaptive nature, where 'learning by example' replaces 'programming' in solving problems. This feature makes these computational models very appealing in application domains where one has little or incomplete understanding of the problem to be solved, but where training data or examples are available. Neural networks are viable and very important computational models for a wide variety of problems. These include pattern classification, function approximation, image processing, clustering, forecasting and prediction. It is common practice to use the trial-and-error method to find a suitable neural network architecture for a given problem. A number of successful neural network applications are reported in the literature (Zuhaimy and Azme, 2001; Zuhaimy and Azme, 2002). Neural networks have also been applied in various fields, such as the environment (Corne et al., 1998; Hsieh and Tang, 1998; Navone and Ceccatto, 1994), economics and management (Boussabaine and Kaka, 1998; Franses and Homelen, 1998; Garcia and Gency, 2000; Indro et al., 1999; Klein and Rossin, 1999b; Tkacz and Hu, 1999; Yao et al., 2000) and agronomy (Shearer et al., 1994; Drummond et al., 1995; Liu et al., 2001; Kominakis et al., 2002; Shrestha and Steward, 2002). There are different types of networks: the perceptron network, the multiple layer perceptron, the radial basis function network, the Kohonen network, the Hopfield network, etc. However, the multiple layer perceptron is the most widely reported and used neural network in applications.
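As a concrete sketch of the multiple layer perceptron idea (illustrative only, not code from this thesis), a single forward pass through a small network with one sigmoid hidden layer and a linear output node can be written as follows; the input size, hidden size, weights and inputs here are arbitrary made-up values:

```python
# Minimal sketch of one forward pass through a feed-forward multiple
# layer perceptron: five input nodes, three sigmoid hidden nodes, one
# linear output node. All weights are random illustrative values.
import numpy as np

def sigmoid(z):
    # Logistic activation function, maps any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # Combination function: weighted sum; activation: sigmoid
    hidden = sigmoid(w_hidden @ x + b_hidden)
    # Linear output node, suitable for a regression target such as FFB yield
    return w_out @ hidden + b_out

rng = np.random.default_rng(0)
x = rng.normal(size=5)               # five inputs (e.g. foliar nutrient levels)
w_hidden = rng.normal(size=(3, 5))   # hidden-layer weights (3 hidden nodes)
b_hidden = rng.normal(size=3)        # hidden-layer biases
w_out = rng.normal(size=3)           # output-layer weights
b_out = 0.1                          # output bias
print(forward(x, w_hidden, b_hidden, w_out, b_out))
```

In a trained network these weights would be set by a learning rule such as back-propagation rather than drawn at random; the sketch only shows how inputs flow through the combination and activation functions.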
The most popular architecture in the class of multiple layer perceptrons is the feedforward neural network.

The development of models for agriculture is normally divided into three steps. The first step is to develop a preliminary model, which is inadequate. This preliminary model does not have to be a good model, but it acts as a basis. This leads to further research to develop a comprehensive model incorporating all the processes that appear to be important. Such a model is valuable for research, but far too complex for everyday use. To overcome this, a set of summary models is produced, each containing enough detail to answer limited questions. For example, there might be a summary model to predict the response to fertilisers on different soil types. Another model might be used to predict cyclic variation in yield. Modelling helps to make predictions more accurate. There is no doubt that modelling will maintain its importance in oil palm research as the problems become more complex and difficult. This study proposes the development of statistical models and neural networks for modelling oil palm yield.

1.3 BRIEF HISTORY OF OIL PALM INDUSTRY IN MALAYSIA

The oil palm (Elaeis guineensis Jacq.) is a plant of African origin and is grown commercially in Africa. In the 19th century the oil palm was brought into this country by the British. The oil palm was first planted in 1848 in Bogor, Indonesia, and in Malaysia in 1870, at the same time as rubber seeds were brought in (Hartley, 1977). Owing to the lower profitability of oil palm in comparison to rubber, the development of the oil palm industry was rather slow. The first commercial planting of oil palm in Malaysia took place in 1917, six years after its systematic cultivation in Sumatera. The early planting was undertaken by European plantations, including Tannamaran Estate in Selangor and Oil Palm Malaya Limited.
The 1960s and 1970s were marked by extensive development of oil palm undertaken largely by private 32 plantations and the Federal Land Development Authority (FELDA). In addition, a number of State Agriculture Development Corporations (SADC) became involved in oil palm cultivation after learning about its good prospects. The Rubber Industry Smallholders’ Development Authority (RISDA) and the Federal Land Consolidation and Rehabilitation Authority (FELCRA) were also involved in cultivating abandoned and idle rubber and paddy areas with oil palm (Teoh, 2000). From year 1975 to year 2000, the worldwide area planted with oil palm (Elaeis guineensis Jacq.) has increased by more than 150 percent. Most of this increase has taken place in Southeast Asia, with a spectacular production increase in Malaysia and Indonesia. The production of crude palm oil (CPO) in 2003 increased markedly, by 12.1 percent or 1.4 million tonnes to 13.35 million tonnes from 11.91 million tonnes in 2002 (Figure 1.1) (Teoh, 2000). 4000000 3500000 H e cta re 3000000 2500000 2000000 1500000 1000000 500000 03 20 01 20 97 99 19 19 95 19 93 19 91 19 89 19 87 19 85 19 81 83 19 19 79 19 77 19 19 75 0 Year Figure 1.1: Oil palm planted area: 1975 – 2003 (hectares) including Peninsular Malaysia, Sabah and Sarawak (Source: Department of Statistics, Malaysia: 1975-1989; MPOB: 1990-2003) 33 The production of crude palm kernel also rose substantially by 11.6 percent in to 1.6 million tonnes year 2003 from 1.47 million tonnes in year 2002. The increase was mainly attributed to the expansion in the matured area (Figure 1.2), favourable weather conditions and rainfall distribution as well as constant sunshine throughout the year. Exports of palm oil increased by 12.5 percent or 1.36 million tonnes to 12.25 million tonnes from 10.89 million tonnes in 2002 (Figure 1.3) (MPOB, 2003). 
Figure 1.2: Annual production of crude palm oil, 1975 – 2003, including Peninsular Malaysia, Sabah and Sarawak (Source: Department of Statistics, Malaysia: 1975-1989; MPOB: 1990-2003)

Figure 1.3: Annual export of palm oil, 1975 – 2003, in tonnes (Source: MPOB)

Malaysia is the major producer and exporter of palm oil in the world (Teoh, 2000). Figure 1.4 shows Malaysian production of palm oil compared with Indonesia and other countries from 1999 to 2003. It shows that Malaysia and Indonesia recorded an increase in production every year. Figure 1.5 presents the world's major exporters of palm oil from 1999 to 2003; it indicates that Malaysia and Indonesia also recorded the highest volumes. In 2003, Malaysian palm oil exports increased by around 12.5 percent to 12.25 million tonnes, from 10.89 million tonnes the previous year. Indonesia recorded only a 7.07 percent increase over the same period. The oil palm industry is growing at a fast rate and requires a great deal of research. This study takes up the challenge of contributing to the development of the oil palm industry.
Figure 1.4: World major producers of palm oil ('000 tonnes), 1999 – 2003 (Source: Oil World (December 12, 2003), Oil World Annual (1999-2003))

Figure 1.5: World major exporters of palm oil ('000 tonnes), 1999 – 2003, including re-exporting countries (Source: Oil World (December 12, 2003), Oil World Annual (1999-2003))

1.4 PROBLEM DESCRIPTIONS

The problem in modelling oil palm yield growth is that it does not follow a linear model; it normally follows a nonlinear growth curve. In modelling a nonlinear curve, the complexity of the problem increases with the number of independent variables. A growth curve has a sigmoid form: ideally its origin is at (0,0), a point of inflection occurs early in the adolescent stage, and the curve either approaches a maximum value (an asymptote) or peaks and falls in the senescent stage (Philip, 1994). Normally, oil palm can be harvested three years after planting. The oil palm yield increases vigorously until the tenth year after planting, and thereafter increases only slowly until the twenty-fifth year. From our exploratory study on modelling practices, little work has been reported on modelling oil palm yield growth (Corley and Gray, 1976). In most cases, researchers have focused on the effect of environmental factors, such as evapotranspiration, moisture and rainfall, on oil palm growth. Chan et al. (2003) conducted a study on the effect of climate change on fresh fruit bunch (FFB) yield, and found that climate change has significantly affected oil palm yield. The most popular method used in the oil palm industry is multiple linear regression.
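The sigmoid shape described above can be illustrated with a simple logistic growth curve. The sketch below is illustrative only: the logistic form is just one of many candidate nonlinear growth models, and the parameter values (asymptote, growth rate and inflection point) are hypothetical figures chosen for demonstration, not fitted estimates.

```python
import math

def logistic(t, A, k, t0):
    """Logistic growth curve: asymptote A, growth rate k, inflection at t0.
    The curve passes through A/2 at t = t0 and approaches A for large t."""
    return A / (1.0 + math.exp(-k * (t - t0)))

# Hypothetical parameters for illustration: ~26 t FFB/ha ceiling,
# inflection around year 6 after planting
A, k, t0 = 26.0, 0.6, 6.0
curve = {t: round(logistic(t, A, k, t0), 2) for t in range(3, 26)}
```

Under these illustrative parameters the yield rises steeply up to about the tenth year and then flattens toward the asymptote, matching the growth pattern described above.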
This model is used to investigate the causal effect of the independent variables on the dependent variable. The literature shows that foliar nutrient composition can be used as an indicator to estimate oil palm yield. Nevertheless, foliar nutrient composition itself depends on several factors, such as climate, soil nutrients, fertilisers, and pests and diseases, but little has been done on modelling these factors. This study explores the possibility of improving the model, in particular the level of accuracy it can produce. The proposed model should give smaller error values than the previous model (multiple linear regression, MLR). Response surface analysis is the technique used to model the relationship between the response variable (fresh fruit bunch yield, FFB) and the treatment factors (fertilisers). The factor variables are sometimes called independent variables and are subject to the control of the experimenter. In particular, response surface analysis also emphasises finding a particular treatment combination that causes the maximum or minimum response. For example, in the oil palm industry there is a relationship between the response variable (oil palm yield) and the four fertiliser treatments, namely nitrogen (N), phosphorus (P), potassium (K) and magnesium (Mg). The expected yield can be described as a continuous function of the level of fertiliser applied. A continuous second-degree function (with N2, P2, K2 or Mg2 terms) is often a sufficient description of the expected yield over the range of factor levels applied (Verdooren, 2003). If the fertiliser application rate is greater or smaller than the optimum rate, the result may be reduced yield; fertiliser is wasted if the amount applied exceeds the optimum rate. The advantage of this technique is that the effects of treatment combinations that were not carried out in the experiment may still be estimated.
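The analysis of such a fitted second-degree surface can be sketched as follows. Writing the fitted model as ŷ = b0 + b'x + x'Bx, with B the symmetric matrix of quadratic and interaction coefficients, the stationary point is xs = -0.5 B⁻¹b, and the signs of the eigenvalues of B classify it. The coefficients below are hypothetical, chosen only to illustrate the computation:

```python
import numpy as np

# Hypothetical fitted second-order coefficients for two coded factors
# (e.g. N and K fertiliser levels): y_hat = b0 + b'x + x'Bx
b0 = 24.0
b = np.array([1.2, 0.8])
B = np.array([[-0.30, 0.25],
              [0.25, -0.10]])  # symmetric; off-diagonals are half the interaction coefficient

x_s = -0.5 * np.linalg.solve(B, b)  # stationary point of the fitted surface
eigenvalues = np.linalg.eigvalsh(B)

# All eigenvalues negative -> maximum; all positive -> minimum;
# mixed signs -> saddle point, where ridge analysis becomes necessary
if eigenvalues.max() < 0:
    kind = "maximum"
elif eigenvalues.min() > 0:
    kind = "minimum"
else:
    kind = "saddle"
```

With these illustrative coefficients the eigenvalues have mixed signs, so the stationary point is a saddle — exactly the situation this study addresses with ridge analysis.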
Response surface analysis is used to obtain the optimum level of fertiliser requirements. In response surface analysis, the eigenvalues determine whether the solution gives a maximum, a minimum or a saddle point of the response surface. From our exploratory study on the use of response surface analysis, there is no solution when the stationary point is a saddle. This study proposes ridge analysis as an alternative solution to overcome the saddle point problem.

1.5 RESEARCH OBJECTIVES

Even though Malaysia is the largest producer of palm oil in the world, studies on modelling yields have been very limited. Modelling of Malaysian oil palm yield is a recent development, and the literature on research conducted in this field is confined to simple models. The oil palm industry is currently undergoing structural change and is becoming more complex owing to technological advances, agricultural management, product demand and planting areas (Teoh, 2000). This research attempts to present a proper methodology for modelling oil palm yield. The model may then be used for estimation and management in the oil palm industry. We further refine the objectives as follows:

• To study current modelling and estimating practices in the oil palm industry.
• To explore and propose the best model for oil palm yield growth.
• To explore the use of neural networks to model oil palm yield.
• To optimise the fertiliser levels that will generate optimum yield.

These objectives will be achieved by following the research framework presented in Figure 1.6.

1.6 SCOPE OF THE STUDY

This section is divided into three subsections. The first discusses the scope of the data, followed by a discussion of the model scope, and finally the statistical tests deployed in this study.

1.6.1 Data Scope

The oil palm yield growth data used in this study are secondary data taken from research by Foong (1991; 1999).
The research was conducted at Serting Hilir in Negeri Sembilan, an area with relatively wet weather. The annual rainfall in this area is between 1600 mm and 1800 mm, with two distinct drought periods, from January to March and from June to August. The data used here are the average fresh fruit bunch yields (tonnes/hectare) from 1979 to 1997. The Malaysian Palm Oil Board (MPOB) provided us with a data set taken from several estates in Malaysia. The factors included in the data set were foliar composition, fertiliser treatments and FFB yield. The foliar composition variables are the percentage concentrations of nitrogen (N), phosphorus (P), potassium (K), calcium (Ca) and magnesium (Mg). The fertiliser treatments comprised N, P, K and Mg fertilisers, measured in kg per palm per year; for example, 3.7 kg of N fertiliser applied to one palm in one year. The foliar composition data were recorded as measured values, while the fertiliser data were recorded as ordinal levels, from zero to three.

Figure 1.6: Summary of the research framework and research methodology used in this study (data gathering from MPOB; data mining and analysis; modelling of oil palm yield growth data, foliar composition and fertiliser data using nonlinear growth curves, MLR, RMR, neural networks and response surface analysis; goodness-of-fit testing and a comparative study leading to the oil palm yield model)

1.6.2 Model Scope

This study confines its scope to the following models: the nonlinear growth model (NLGM), multiple linear regression (MLR), robust M-regression (RMR), response surface analysis (RSA) and neural network (NN) models. The nonlinear growth model is used to model the oil palm yield growth data. Using the foliar analysis data, we employ multiple linear regression and robust M-regression to estimate oil palm yield.
In the MLR model, the independent variables are the N, P, K, Ca and Mg concentrations (which we call the major nutrient components, MNC) and the dependent variable is fresh fruit bunch (FFB) yield. Aside from the MNC concentrations, in the second part of the MLR work we also introduce the nutrient balance ratio (NBR), critical leaf phosphorus concentration (CLP), total leaf bases (TLB), deficiency of K (defK) and deficiency of Mg as independent variables. In the robust M-regression we consider only the N, P, K, Ca and Mg concentrations as independent variables and FFB yield as the dependent variable. We propose the use of the neural network to model oil palm yield. The discussion of the selection of the neural network architecture and some statistical analysis is given in Chapter 6. Chapter 7 describes the use of response surface analysis to obtain the optimum fertiliser rates to produce optimum FFB yield. This is followed by a simple economic analysis to select the combination of fertiliser inputs that generates the maximum profit.

1.6.3 Statistical Testing Scope

In this study we consider several statistical measures and tests: the model error, sum of squared errors (SSE), root mean squared error (RMSE), coefficient of determination (R2), coefficient of correlation (r), t-test, F-test and chi-square test. The discrepancy between the value predicted by the fitted model, ŷi, and the actual value, yi, is used to measure the model's goodness of fit. The difference between the actual and the estimated value is known as the model error, and can be written as

ei = yi − ŷi,  i = 1, 2, …, n

where ei is the model error for observation i, yi is the actual value of observation i, and ŷi is the estimated value for observation i. If the model performance is 'good', the model errors will be relatively small. For measuring the accuracy of model fitting, we consider four measurements commonly used in research on model fitting.
These are the sum of squared errors, the root mean squared error, the coefficient of determination R2 and the correlation coefficient. The formulas, for i = 1, 2, …, n, are given below:

(i) Sum of squared errors: SSE = Σ (yi − ŷi)²

(ii) Mean squared error: MSE = Σ (yi − ŷi)² / n

(iii) Root mean squared error: RMSE = √[Σ (yi − ŷi)² / n]

(iv) Coefficient of determination: R2 = 1 − Σ (yi − ŷi)² / Σ (yi − ȳ)²

(v) Coefficient of correlation: r = Σ (xi − x̄)(yi − ȳ) / [n √(Var(x) Var(y))]

where yi is the observed value, ŷi the predicted value, n the number of observations, x̄ and ȳ the means of the xi and yi observations respectively, and Var(x) and Var(y) the variances of X and Y (computed with divisor n). SSE, MSE and RMSE measure model accuracy. The R2 value measures how well the explanatory variables explain the response variable. The correlation coefficient identifies the strength of the relationship between any two variables. In the case of more than two samples, one-way analysis of variance (ANOVA) can be used to test the difference between the groups using the F-test. The ANOVA F statistic is calculated by dividing an estimate of the variability between the groups by the variability within the groups:

F = (variance between groups) / (variance within groups)

A high value of F is therefore evidence against the null hypothesis of equality of all population means. If the test shows the mean difference between groups to be statistically significant, Duncan's multiple range test can be used to examine which groups differ from each other (Montgomery, 1991). An alternative to one-way analysis of variance is the chi-square test, a nonparametric test which can be used when the assumption of normality cannot be made.
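The accuracy measures above translate directly into code. The sketch below implements SSE, MSE, RMSE, R2 and the correlation coefficient for a pair of actual and fitted series; the function names are ours, chosen for illustration:

```python
import math

def fit_metrics(y, y_hat):
    """Return SSE, MSE, RMSE and R^2 for actual y and fitted y_hat."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    mse = sse / n
    rmse = math.sqrt(mse)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    r2 = 1.0 - sse / sst
    return sse, mse, rmse, r2

def correlation(x, y):
    """Pearson correlation coefficient between samples x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)
```

For example, a fitted series equal to the actual series gives SSE = 0 and R2 = 1, while larger discrepancies inflate SSE, MSE and RMSE and push R2 below 1.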
The model performance will be measured using the sum of squared errors, mean squared error, mean absolute error, root mean squared error, mean absolute percentage error, coefficient of determination and coefficient of correlation.

1.7 DATA GATHERING

The Malaysian Palm Oil Board (MPOB) provided data from the MPOB database of oil palm fertiliser treatments carried out at fourteen oil palm estates. All the data from each estate were collected, recorded and compiled by MPOB researchers at the Research Database Center. All treatments were based on a factorial design with at least three levels of N, P and K fertiliser rates. Although different types of fertiliser were used in the treatments, the rates quoted in the final analysis are expressed as equivalent amounts of ammonium sulphate (AS), muriate of potash (KCl), Christmas Island rock phosphate (CIRP) and kieserite (Kies). Cumulative yields obtained over a period of two to five years in each trial were analysed. The data are experimental, collected over periods that differ between experiments. We study fourteen experimental stations (covering Peninsular Malaysia and East Malaysia), seven in inland areas and seven in coastal areas. Appendix A presents the background of the experimental stations, including the age of the oil palms, the soil type and the location of each station. The fresh fruit bunch (FFB) yield data used in this study were measured in tonnes per hectare per year, that is, the average FFB yield over one year. Foliar analysis was done only once a year, with samples taken in either March or July; for example, if this year's foliar analysis was conducted in July, the next year's sample was also taken in July, and so on. The FFB yield and foliar analysis data are continuous, while the fertiliser inputs are in coded form (0, 1, 2 and 3). Where needed, the coded values are recoded to the exact application rates (Appendix B).
The leaf analysis procedure is detailed in Section 1.8.

1.8 LEAF ANALYSIS

The best method of determining the kind and amount of fertiliser to apply to fruit trees is leaf analysis. It effectively measures macro- and micronutrients and indicates the need for changes in fertiliser programmes (Cline, 1997). Leaf analyses integrate all the factors that might influence nutrient availability and uptake. The essential macronutrients for the oil palm are listed in Appendix C. However, leaf analysis indicates the nutritional status of the crop only at the time of sampling (Pushparajah, 1994). It also shows the balance between nutrients: for example, magnesium (Mg) deficiency may be the result of a lack of Mg in the soil, of an antagonistic effect of excessive K levels, or of both. It also reveals hidden or incipient deficiencies. Adding N, for example, when K is low may result in a K deficiency, because the increased growth requires more K (Fairhurst and Mutert, 1999). Leaf analysis was conducted to determine the nutritional status of leaflets from frond 9 on immature palms and frond 17 on mature palms (Corley, 1976). This is done to assist the preparation of annual fertiliser programmes. At each nominated leaf sampling, the appropriate frond is correctly sampled for each leaf sampling unit (LSU). Frond 17 is sampled from the labelled reference LSU palms in some or all fields in an LSU and prepared for analysis. Cleanliness is essential at all stages to prevent sample contamination, and sampling is carried out between 6.30 am and 12.00 noon. Frond 17 is identified by counting from the first fully open frond in the centre of the crown (frond 1), moving three steps downward along the same spiral (fronds 1, 9, 17), and is removed with a sickle. The frond is cut into approximately three equal sections (to obtain the average nutrient concentration). The top and base sections are discarded and placed on the frond stack.
Twelve leaflets are selected and removed from each frond, six cut from each side at the mid-point of the frond section (Corley, 1976). Ensure that the 12 leaflets comprise three from the upper rank and three from the lower rank on each side of the rachis. The leaflet samples from each field (or smaller area if required) are put together in a large labelled plastic bag. About 500 leaflets are collected from each field of 30 hectares. The samples are then sent to the estate laboratory or sample preparation room for further preparation. The leaflets are bundled and trimmed to retain the 20-30 cm mid-section; it is not necessary to wash the leaves. The mid-rib of each leaflet section is removed and discarded. The remaining parts of the leaflets (the lamina) are then cut into pieces about 2 cm long and placed on aluminium trays to be dried. The leaflets are dried in a fan-assisted oven for 48 hours (65°C) or 24 hours (105°C); the leaf N concentration will be reduced if the temperature exceeds 105°C. After drying, the leaflets are placed in a labelled plastic bag. Half of the sample is retained as a backup for future reference (stored in a cool, dry place) while the other half is submitted for analysis. The LSU sample results from the laboratory are then formatted as a spreadsheet and the variability is calculated. Leaf samples are analysed for N, P, K, Ca and Mg; other nutrients may be included for palms planted on particular soil types. Leaf sampling is carried out once each year, though sampling may be conducted more frequently to examine specific areas or to investigate particular nutritional problems. Leaf sampling should be done at the same time each year and not during wet or very dry periods, and the sampling procedure should be completed in the shortest possible time. Because of the synergism between nitrogen (N) and phosphorus (P) uptake, leaf P concentration must be assessed in relation to leaf N concentration (Ollagnier and Ochs, 1981).
This is due to the constant ratio between N and P in protein compounds found in plant tissue (Fairhurst and Mutert, 1999). A critical curve has been developed, where the critical leaf P concentration (CLCp) is defined as

Critical leaf P concentration, CLCp = 0.0487 × leaf N concentration + 0.039

A different approach to determining whether potassium (K) and magnesium (Mg) are deficient takes into account the relative concentrations of the leaf cations K, Mg and calcium (Ca). First, the total leaf bases (TLB) is calculated, and K and Mg are assessed as percentages of TLB (Foster, 1999). TLB is derived from the equation

TLB (cmol/kg) = (% leaf K/39.1 + % leaf Mg/12.14 + % leaf Ca/20.04) × 1000

K and Mg deficiency can then be assessed individually, based on their percentage of TLB, obtained as (X/TLB) × 100, where X is the contribution of K or Mg to TLB. The K and Mg status can be rated in three categories: a value below 25 is rated deficient, between 25 and 30 is low, and above 30 is considered sufficient.

The nutrient balance ratio (NBR) is defined as the ratio between one foliar nutrient concentration and another. For example, the NBR between N and K in the foliage is the ratio of the N and K concentrations. The ranges of the NBR values for oil palm are presented in Table 1.1.

Table 1.1: The optimum values of nutrient balance ratios (NBR) for foliar analysis

Nutrient ratio    NBR
N/K               2.50 – 3.00
N/Mg              14.00 – 18.00
N/P               11.00 – 17.00
N/Ca              4.00 – 9.00
K/Mg              4.00 – 10.00
K/Ca              2.00 – 5.00
Mg/Ca             0.25 – 0.55

1.9 RESEARCH IMPORTANCE

Nonlinear growth models are used to model nonlinear phenomena. Since the nonlinear growth model has not yet been explored in the oil palm industry (Foong, 1999; Ahmad Tarmizi et al., 2004), we propose the use of the nonlinear growth model in the study of oil palm yield growth.
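The leaf-analysis indices above can be computed directly. The sketch below implements CLCp, TLB and the K/Mg rating as defined in the text; the function names are ours, and the example leaf concentrations are hypothetical:

```python
def critical_leaf_p(leaf_n):
    """Critical leaf P concentration from leaf N (both in % dry matter)."""
    return 0.0487 * leaf_n + 0.039

def total_leaf_bases(leaf_k, leaf_mg, leaf_ca):
    """Total leaf bases, TLB (cmol/kg), from % leaf K, Mg and Ca."""
    return (leaf_k / 39.1 + leaf_mg / 12.14 + leaf_ca / 20.04) * 1000

def cation_rating(x, tlb):
    """Rate a cation's contribution x (cmol/kg) as a share of TLB:
    below 25% deficient, 25-30% low, above 30% sufficient."""
    share = x / tlb * 100
    if share < 25:
        return "deficient"
    return "low" if share <= 30 else "sufficient"

# Hypothetical leaf concentrations (% dry matter): K 1.0, Mg 0.30, Ca 0.60
tlb = total_leaf_bases(1.0, 0.30, 0.60)
k_rating = cation_rating(1.0 / 39.1 * 1000, tlb)
```

A nutrient balance ratio is then simply the quotient of two such concentrations, e.g. leaf N divided by leaf K, compared against the ranges in Table 1.1.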
Here we provide some mathematical basis for parameter estimation in modelling oil palm yield growth. From the results and analysis we can then study the biological process of oil palm yield growth. Multiple linear regression can be used to find the relationship between the dependent variable and the independent variables. There can be more than one independent variable, which allows additional relevant variables to be brought into the model; in this sense, multiple linear regression is rather flexible. Our study emphasises the introduction of new independent variables into the model, an area yet to be explored by researchers. In real life, nothing seems to behave linearly all the time, and data sometimes include outliers or unusual observations. We propose the use of robust multiple regression to overcome the negative impact of outliers on model development. To improve the models, various new heuristic methods have been suggested in the literature. We explore the flexibility of the neural network to improve estimation performance and model accuracy. Previous studies in oil palm stopped when the stationary point was a saddle (Ahmad Tarmizi, 1986). This left the inference from the model incomplete, led to inefficient decisions, and caused difficulties in implementing improvements in practice. This study proposes the use of ridge analysis when the stationary point is a saddle, to improve the data analysis.

1.10 RESEARCH CONTRIBUTION

This study makes many contributions. Since it addresses an area of high importance for the sustainability of the oil palm industry, the contributions can be categorised as follows:

• Identifying several nonlinear growth models for oil palm yield growth.
• Investigating the relationship between foliar nutrient composition and yield using MLR and RMR; a practical model and procedure were developed for this purpose.
• Developing a neural network model to predict oil palm yield, with the NN results proving more reliable than those of the MLR and RMR models.
• Proposing statistical tests to evaluate the factors that influence NN performance. The findings indicate that the combination of activation function and number of hidden nodes has a significant effect on NN performance, whereas the learning rate, momentum term and number of runs have no significant effect.
• Investigating the effects of outliers on NN performance. The findings show that the percentage of outliers and the magnitude of outliers significantly affect NN performance.
• Combining response surface analysis with ridge analysis to obtain the optimum levels of foliar nutrient composition and fertiliser input that produce optimum oil palm yield.

Several of the contributions above have been published in various forms, as described in Appendix D.

1.11 THESIS ORGANISATION

This thesis contains eight chapters. Chapter 1 is the introduction. It introduces the problem description, research objectives, research scope, research importance and research data, with a brief description of how the data are used in this research. Chapter 2 is the literature review. It discusses current and past research on oil palm yield, and presents applications of neural network modelling in several fields, such as economics, management and agronomy. A summary is included at the end of the chapter. The four main models used in the thesis are explained in Chapter 3, which discusses the statistical methods, namely nonlinear growth models, multiple linear regression, response surface analysis and the neural network model. This chapter also proposes the research framework. In Chapter 4 the use of the nonlinear growth curve to model oil palm yield growth is considered.
Twelve nonlinear growth models are presented and the partial derivatives for each model are provided. Comparisons among the models are made and given at the end of the chapter. Chapter 5 discusses the development of multiple linear regression and robust M-regression to investigate the relationship between fresh fruit bunch yield and foliar nutrient composition. The use of the nutrient balance ratio, deficiency of magnesium, deficiency of potassium and critical leaf phosphorus as independent variables is proposed in this chapter. The numerical results from both methods are presented and compared in terms of modelling performance. Chapter 6 presents the development of the neural network for oil palm yield modelling. An experimental design is used to investigate the effects of the number of hidden nodes, the number of runs, the momentum term, the learning rate and outlier data on NN performance. The results and conclusions of the model selection are reported, and the results from the multiple regression analysis and the neural network model are compared in terms of goodness of fit and model accuracy. Numerical results for the foliar nutrient composition and fertiliser treatments obtained by response surface analysis are reported in Chapter 7. The use of ridge analysis is discussed to overcome the saddle point problem at the stationary point. The chapter ends with a simple economic analysis to determine the optimum fertiliser levels that maximise profit. Chapter 8 concludes with the relevant and important findings from this research, together with recommendations on areas related to the findings and possible directions for future research.

CHAPTER 2

REVIEW OF THE LITERATURE

2.1 INTRODUCTION

The purpose of this literature review is to study the modelling and predicting practices in the oil palm industry. This chapter presents a review of the literature on modelling relevant to oil palm yield in Malaysia and to the application of neural networks.
The first section describes the methods used in oil palm yield modelling; this will enable us to establish a clearer picture of how oil palm yield has been modelled. The second section attempts to identify nonlinear growth models that may be applied to oil palm growth data. A broad discussion of the applications of neural networks is then provided, presenting the advantages of neural networks and their usage in various fields. The last section discusses the literature on response surface analysis. Finally, the essential points of the review are summarised as a guide for this research.

2.2 OIL PALM YIELD MODELLING

Traditionally, oil palm yield is estimated from the black bunch count. The simplest method for short-term prediction of oil palm yield is to count the number of black bunches, and this is widely used as a rough estimate. Several stages must be undertaken to count the black bunches in the field. The black bunch count is usually conducted on a per-hectare basis, with each hectare containing between 140 and 148 palms. The method can be used to forecast bunches per hectare 3 to 6 months ahead (Chan, 1999). A typical black bunch count is as follows: assume the average black bunch count is 3 per palm and the average density is 145 palms per hectare; the bunch forecast is therefore 435 bunches per hectare (145 × 3). The oil palm yield per hectare is then obtained by multiplying the average weight of one bunch by the number of bunches per hectare. Multiple linear regression has also been used to model potential oil palm yield. Variables considered in this model are potential yield/production, total area under planting, replanting area, new planting, total matured area and average yield per hectare.
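The black bunch count calculation is simple arithmetic and can be sketched as follows. The function names are ours, and the figures reproduce the worked example in the text (3 black bunches per palm, 145 palms per hectare):

```python
def bunch_forecast(black_bunches_per_palm, palms_per_hectare):
    """Short-term (3-6 month) bunch forecast per hectare from a black bunch count."""
    return black_bunches_per_palm * palms_per_hectare

def ffb_yield(bunches_per_hectare, avg_bunch_weight_tonnes):
    """FFB yield (tonnes/ha) = bunches per hectare x average bunch weight."""
    return bunches_per_hectare * avg_bunch_weight_tonnes

bunches = bunch_forecast(3, 145)  # 435 bunches per hectare, as in the text
```

Multiplying the 435 forecast bunches by an observed average bunch weight then gives the FFB yield estimate in tonnes per hectare.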
The functional relationship can be expressed as a multiple linear regression:

Mt = Mt-1 − Rt + αNt-3 + βNt-4

where Mt is the matured area in year t, Rt is the replanting area in year t and Nt-k is the total new planting and replanting in year t−k. Chow (1984) suggested that α + β = 1, to ensure that the estimated total planted area equals the total matured area. The total potential production area can then be estimated from the equation below:

Pt = Mt − γ[αNt-3 + βNt-4],  0 < γ < 1

where Pt is the total potential production area at time t, Mt is the matured area in year t, and Nt-k is the total new planting and replanting in year t−k. The total potential oil palm production, TPP, is calculated by multiplying the total potential production area by the average oil palm yield per hectare:

TPPt = Pt × average FFB per hectare

Green (1976) performed an experimental design analysis to find the impact of fertiliser levels on yield. In his experiment he considered three types of fertiliser: nitrogen (N), phosphorus (P) and potassium (K). The method used in his study was multiple linear regression with quadratic and second-order interaction terms. Green used linear regression to investigate the relationship between foliar composition and yield, and found that yield and the leaf levels of the four major elements N, P, K and Mg are highly correlated. In addition, Ahmad Tarmizi et al. (1986) carried out intensive research on fertiliser trials at several estates in Peninsular Malaysia, and found that the fertiliser response of yield varied according to the soil nutrient content. Ahmad Tarmizi et al. (1999) conducted two trials. In the first trial, they found significant quadratic effects for the N and K treatments. The second trial showed significant quadratic and linear effects of the N fertiliser treatment; the response to P fertiliser was also significant, but the response to K fertiliser was not.
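Chow's area-based model above can be sketched directly. Note that α, β and γ are unknown coefficients (with α + β = 1 and 0 < γ < 1), so the values used below are purely illustrative, as are the area figures:

```python
def matured_area(m_prev, r_t, n_t3, n_t4, alpha, beta):
    """M_t = M_{t-1} - R_t + alpha*N_{t-3} + beta*N_{t-4}, with alpha + beta = 1."""
    return m_prev - r_t + alpha * n_t3 + beta * n_t4

def potential_area(m_t, n_t3, n_t4, gamma, alpha, beta):
    """P_t = M_t - gamma*(alpha*N_{t-3} + beta*N_{t-4}), with 0 < gamma < 1."""
    return m_t - gamma * (alpha * n_t3 + beta * n_t4)

# Illustrative figures ('000 ha), with hypothetical alpha=0.6, beta=0.4, gamma=0.5
m_t = matured_area(100.0, 5.0, 10.0, 10.0, 0.6, 0.4)
p_t = potential_area(m_t, 10.0, 10.0, 0.5, 0.6, 0.4)
tpp = p_t * 20.0  # TPP_t = P_t x average FFB/ha (20 t/ha is hypothetical)
```

In practice the coefficients would be estimated from planting records; the sketch only shows how the three equations chain together.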
It was found that the size of the response to fertiliser also depends on two other factors, viz. the yield level and the status of other nutrients. Correcting for these factors is therefore essential before comparing the efficiency of fertiliser recovery in different situations. The correction was made by fitting response equations, using linear regression, to the fertiliser trial data, viz. agronomic factors, site characteristics, soil and climate data (Foster et al., 1985; Foster et al., 1987). According to Ahmad Tarmizi et al. (1991), the efficiency of urea in each trial was too complicated to be predicted by a general (linear regression) model using common factors, owing to the interaction of many factors in the field. The factors contributing significantly to the variation in the efficiency of urea were identified as soil pH, annual drainage, drought, rainfall, relative humidity, soil organic matter and ground cover characteristics. Ahmad Tarmizi and Omar (2002) and Ahmad Tarmizi et al. (2004) suggested that efficient fertiliser practices could increase oil palm yield from 14 tonnes per hectare to 30 tonnes per hectare. Foong (2000) found that climate, nutrient composition and agronomic practices significantly affected potential oil palm yield. Foong (1991; 1999) found that oil palm production in Malaysia is strongly influenced by the frequent droughts lasting two or three months in most parts of the country, especially the severe droughts, which result in inflorescence abortion and unfavourable sex differentiation. Moisture limitation restricts oil palm height increment, although it is less restrictive in older trees, probably due to their greater affinity for sunlight because of the heavy mutual shading of the canopies. He also found that irrigation does not change the pattern of oil palm yield. Kee and Chew (1991) studied oil palm yield responses to nitrogen and drip irrigation.
They found that moisture and nutrients were important factors limiting yields in the northern interior of the East Coast of Peninsular Malaysia. Significant responses to N were obtained in all trials. They used the quadratic model to estimate the yield response to N, and found no interaction between N fertiliser and irrigation. In the absence of irrigation, substantially higher fertiliser rates were required to achieve a yield similar to that of the irrigated treatment. Chow (1988) showed that the seasonality in oil palm yield, although highly significant at the Peninsular level, could be quite independent of rainfall, even though rainfall should have interacted with and modified the seasons to some extent. There is a positive correlation between production and rainfall at lags of 20-24 months and 7-11 months before harvesting. Significant negative lagged effects on yield are observed at lags of 12-13, 24-25 and 36 months, which probably account for the yield fluctuations that seem independent of seasonal and rainfall effects. Chan (1999) suggested integrating six factors (economic evaluation, fertiliser input, R&D findings, nutrient management, site characteristics, and procurement and distribution) in a systematic approach to efficient fertiliser management to exploit the maximum yield. Harun (2000) noted that oil palm yield depends on several factors such as planting materials, agronomic inputs, photosynthetic activity and seasonal climatic conditions. Furthermore, the bunch yields of oil palm are known to fluctuate seasonally, which leads to a corresponding variation in palm oil and palm kernel production, and hence in supply to markets (Henson and Harun, 2004).
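The quadratic response model used in several of these trials can be sketched as follows. The trial data and the price ratio below are hypothetical, for illustration only; they are not from the cited studies.

```python
import numpy as np

# Hypothetical fertiliser trial data: N rate (kg per palm per year) vs FFB yield (t/ha).
n_rate  = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 1.25])
yield_t = np.array([20.1, 23.0, 25.2, 26.4, 26.8, 26.5])

# Fit the quadratic response y = b0 + b1*N + b2*N^2 by ordinary least squares.
b2, b1, b0 = np.polyfit(n_rate, yield_t, 2)

# Agronomic optimum: dy/dN = b1 + 2*b2*N = 0  =>  N* = -b1 / (2*b2).
n_agronomic = -b1 / (2.0 * b2)

# Economic optimum: marginal yield value equals marginal fertiliser cost,
# price_yield * (b1 + 2*b2*N) = cost_n  (both prices are assumptions).
price_yield, cost_n = 400.0, 600.0   # $/t FFB and $/kg N, hypothetical
n_economic = (cost_n / price_yield - b1) / (2.0 * b2)
```

With a concave fitted curve (b2 < 0), the economic optimum always lies below the agronomic optimum, which is why the trials distinguish the two.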
Research on the effects of weather on yield components was carried out by Oboh and Fakorede (1999) in Nigeria. They concluded that the minimum relative humidity and sunshine hours 18-24 months prior to the year of harvesting could serve as indicators of the yield pattern, and that the use of path coefficient analysis along with correlation and regression analysis would give a better understanding of the relationship between yield and weather factors in oil palm. They also found a negative correlation between the number of bunches and the mean bunch weight; this is because the climatic factors that increase mean bunch weight are detrimental to the formation of additional bunches. Henson (2000) found that haze affected oil palm productivity. The relationship between leaf analysis and plant productivity is generally evident for most crops, and an assessment of fertiliser needs can be based on such analysis; however, for a cost-effective approach, leaf analysis has to be integrated with soil analysis (Pushparajah, 1994). Foster (2003) used regression analysis to predict yield responses from the results of leaf nutrient analysis. He fitted linear and quadratic regression lines and found that the quadratic model gave a better fit than the linear model. As expected, the response to N correlated most highly with leaf N using a single linear regression line. Soon and Hong (2001) studied the yield response to N and K fertiliser on two major soil types in Sabah, the Paliu and Lumisir series, and found that nitrogen was the most important nutrient affecting FFB yield. There was no yield response to K fertiliser on Paliu Family soil, but on Lumisir Family soil the K fertiliser increased FFB yield. The P fertiliser markedly improved FFB yield on Paliu Family soil, but this was not observed on Lumisir Family soil.
There was no significant response of FFB yield to Mg fertiliser on either soil type. There was a systematic increase in leaf N, leaf P and leaf K with the application of N, P and K fertilisers respectively on both soil types, and K fertiliser had an antagonistic effect on the amount of leaf Mg. Fertiliser experiments are used to determine oil palm response to mineral fertiliser under particular agro-ecological conditions, and the results help to guide the preparation of routine fertiliser programmes for commercial plantations. From their study, it was found that the optimum yield was achieved at 0.69 to 0.75 kg N per palm per year. Large amounts of costly mineral fertiliser are usually required to achieve optimum economic yields (Foster, 2003). Makowski et al. (2001) showed how the statistical analysis used to describe the responses of wheat yield, grain protein content and residual mineral N to applied N can be used to predict responses to applied N and to calculate optimal N rates. Teng and Timmer (1996) demonstrated that simultaneous prescription of nitrogen (N) and phosphorus (P) fertiliser for tree seedling culture can be facilitated by response surface models generated from a multi-level factorial experiment designed to quantify growth, nutritional response and N x P interactions in white spruce (Picea glauca Voss). They also studied the interaction between nitrogen and phosphorus using a response surface model and estimated the optimum levels of nitrogen and phosphorus in a forestry application. Bélanger et al. (2000) evaluated quadratic, exponential and square root models describing the yield responses of potato (Solanum tuberosum L.) to six rates of N fertiliser (0 to 250 kg N per hectare), with and without supplemental irrigation, at four sites in Canada.
They found that the quadratic model was best suited to describing the yield response of potato to N fertiliser and to predicting the economically optimum N rates for areas with a ratio of N fertiliser cost to potato price similar to that of Atlantic Canada. Wendroth et al. (2003) showed how an autoregressive state-space model can be used to evaluate the quality of barley yield predictions. Two data sets were used: the first comprised soil information (texture, organic carbon content) and the previous year's yield, and the second comprised crop information (vegetation index, nitrogen status and land surface elevation). They found that both sets of variables gave a similar quality of prediction. Drummond et al. (1995; 2002) conducted a study to understand the complex relationship between crop yield and site and soil characteristics in the soybean industry. They used statistical and neural network methods and found that the neural network methods were consistently more accurate in the individual site-year analyses, particularly when compared with the linear statistical technique. Crop response to fertiliser nutrients such as phosphorus (P) and potassium (K) is commonly predicted using soil test information, and fertiliser recommendations from soil tests are usually based on calibration curves (Kastens et al., 2000). Kastens et al. used data from a specific farm in northwest Kansas to produce a model that estimated dryland wheat yield for the farm, and found that P fertiliser had little effect on wheat yield. Kastens et al. (2000) conducted a second study to estimate a yield model incorporating detailed site-specific field information (soil tests, N and P fertilisers, soil organic matter content, soil texture and soil pH). It was found that wheat yield responded principally to soil test P, meaning that P should be treated as a capital investment and that optimal P fertilisation depended on the length of land tenure.
An analytical framework was developed in which fertiliser P remaining after crop removal builds up the amount of soil test P in future years, increasing wheat yields. Kastens et al. (2003) suggested using a statistical technique with maximum entropy to predict wheat yield, with fertiliser N and P, soil test N and soil test P, and other causal factors as independent variables. In addition to information from the farm, they also considered soil laboratory recommendation information. They found that the information-blending techniques resulted in models that predicted yield out of sample as well as or better than a model estimated with farm data only. 2.3 NONLINEAR GROWTH MODEL The analysis of growth data has become increasingly important in many fields of study. Biologists are interested in describing biological growth and in trying to understand its underlying biological processes. In agriculture there are obvious economic and management advantages in knowing how large things grow, how fast they grow, and how these factors respond to environmental conditions or treatments over time. Social scientists are interested, for example, in the growth of populations, new political parties, the food supply and energy demand. The same types of model occur when the explanatory variable x is no longer time but the increasing intensity of some other factor, such as the growth of smog with increasing solar radiation, weight gains with increased nutrients in the diet, changes in crop yield with increasing planting density, and so on. Tsoularis and Wallace (2002) gave a comprehensive discussion of the characteristics of logistic growth models. The application of nonlinear growth models has been studied by a number of researchers; for example, Amer and William (1957) applied the Gompertz curve to Pelargonium leaf growth. Rawson and Hackett (1974) fitted the Gompertz curve to leaf growth, and Causton and Venus (1981) found that the Gompertz curve is the one most frequently used to fit leaf growth data.
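One simple way to fit the Gompertz curve, y(t) = A exp(-b exp(-kt)), is to linearise it when the upper asymptote A is taken as known: ln(ln(A/y)) = ln(b) - kt, which can be fitted by ordinary linear regression. The sketch below uses synthetic data generated from known parameters, not any of the data sets cited.

```python
import math

# Gompertz curve y(t) = A * exp(-b * exp(-k*t)).  With the asymptote A treated
# as known, the curve linearises: ln(ln(A / y)) = ln(b) - k*t.
A = 100.0                                 # assumed known upper asymptote
t = [1, 2, 3, 4, 5, 6, 7, 8]
y = [A * math.exp(-3.0 * math.exp(-0.5 * ti)) for ti in t]  # synthetic data (b=3, k=0.5)

# Transform and fit z = ln(ln(A/y)) against t by ordinary least squares.
z = [math.log(math.log(A / yi)) for yi in y]
n = len(t)
t_bar, z_bar = sum(t) / n, sum(z) / n
slope = sum((ti - t_bar) * (zi - z_bar) for ti, zi in zip(t, z)) / \
        sum((ti - t_bar) ** 2 for ti in t)
k_hat = -slope                             # estimated growth-rate parameter
b_hat = math.exp(z_bar + k_hat * t_bar)    # estimated displacement parameter
```

In practice A is unknown and all three parameters are estimated jointly by nonlinear least squares; the linearisation is only a convenient starting point.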
Hunt (1982) pointed out that organism growth is the most frequent type of research data used with nonlinear growth models. Fekedulegn et al. (1999) discussed the application of nonlinear growth models in forestry. Md Yunus (1999) fitted the Gompertz curve to cocoa tree growth. Meanwhile, Zuhaimy et al. (2003) fitted the Gompertz curve to tobacco growth data, and Azme and Zuhaimy (2004) went further, fitting several nonlinear models to tobacco leaf growth data and finding that nonlinear growth curves fitted the tobacco growth data very well. Many authors have written about the possibility of using growth models to model yield growth. Penman (1956) related growth yield (increase in dry weight) to accumulated potential transpiration. Nelder et al. (1960) used growth models to study the relationship between the weight of crop yield and time, and found that the effects of years are frequently greater than those of the treatments. Holliday (1960) studied the relationship between grain yield and plant population, using yield per unit as the dependent variable and plant population as the independent variable. He also conducted a study on the relationship between the yield of dry matter per unit area and plant population. Kruse (1999) studied yield growth for corn in China, Argentina and the European Union (EU), for wheat in China, and for soybean in Argentina. The dependent variable chosen was the soybean yield in metric tons per hectare and the independent variable was time (year). Naylor et al. (1997) provided statistical support for yield variability using a yield growth model. Garcia (1983; 1988; 1989; 1989) gave detailed discussions of the application of growth and yield models for forecasting purposes in the forest industry.
The Bertalanffy-Richards growth model was used by Lei and Zhang (2004) in their study of forest growth and yield data, with growth volume per hectare as the dependent variable. This shows that growth models have been commonly used for modelling forest yield. Harris and Kennedy (1999) studied the use of logistic and exponential yield growth models for cereals and maize in developed countries, wheat in the United States and paddy rice in Japan. Their study showed that the logistic and exponential yield growth models fit essentially equally well, although the logistic yield growth model was better than the exponential model for Japanese paddy rice yield. This literature demonstrates that it is not unusual to model yield using growth models: by definition, growth is a measure of change in some characteristic (weight, basal area, volume, etc.) over a specified amount of time, and yield is the amount of some characteristic that can be harvested per period. The reviews above have directed this work towards adopting a similar approach of modelling oil palm yield using growth models. 2.4 APPLICATION OF NEURAL NETWORK MODELLING The neural network technique has attracted much attention in the last decade. Neural networks can be regarded, in one respect, as multivariate nonlinear analytical tools, and are known to be very good at recognising patterns in noisy, complex data and estimating their nonlinear relationships. Neural network (NN) technology offers significant support in terms of organising, classifying and summarising data. The literature points to several limitations of multiple regression that are overcome by neural networks. The limitations include the following: (i) The complexity of multiple regression models is limited to the linear combination of decision variables (Hill and Remus, 1994).
(ii) Neural network models are not subject to model misspecification, as is the case with multiple regression, especially in short-term modelling (Hill and Remus, 1994; Gorr, 1994). (iii) Neural networks can partially transform the input data automatically, whereas this represents a rather exhaustive task with multiple regression (Hill and Remus, 1994; Connor, 1988; Donaldson et al., 1993). (iv) Neural network models, with their nonlinear threshold functions, are capable of handling almost any kind of nonlinearity, whereas no such performance guarantee exists for multiple regression models (Gorr, 1994; Hill and Remus, 1994). However, the neural network approach has not been used as an alternative approach in oil palm modelling, even though it has proved appealing to many social scientists. Zuhaimy and Azme (2001; 2003) gave a brief review of neural networks and their application in forecasting. The review of neural network applications below is organised into four categories: neural networks in science and technology, in the economy, in the environment and health, and in agriculture and agronomy. 2.4.1 Neural Network in Science and Technology Neural networks in meteorology and oceanography were studied intensively by Tangang et al. (1997; 1998). They compared statistical methods with neural network models and found the NN method to be a versatile and powerful technique capable of augmenting traditional linear statistical methods in data analysis and forecasting. They also concluded that the NN model is a type of variational (adjoint) data assimilation, which allows it to be readily linked to dynamical models under adjoint data assimilation, resulting in a new class of hybrid neural-dynamical models. Michaelsen (1987) and Mihalakakou et al.
(2000) described a feedforward backpropagation neural network approach for modelling and making short-term predictions of the total solar radiation time series. They found that the neural network approach leads to better predictions than the autoregressive model. Navoane and Ceccatto (1994) predicted summer monsoon rainfall over India using a neural network model; as a general outcome, they concluded that neural networks can be used advantageously in this context, showing comparable or better performance than conventional methods. Jiang et al. (1994) studied NNs in the area of remote sensing; their results showed that small networks performed better than larger ones. The application of neural network models to load forecasting in Taiwan was carried out by Hsu and Chen (2003), whose results suggest that the neural network model yields more accurate forecasts than the regression-based model using the same exogenous variables. 2.4.2 Neural Network in Economy Evans (1997) applied an NN model to currency forecasting. For the purpose of his study, the network was trained on data derived from Forex rates based on the four principal international currencies, US$, German DM, Japanese Yen and £UK, processed in various ways, with the UK/DM cross-rate used as the target currency for forecasting purposes. He concluded that with the NN model the currency could be estimated up to ten days ahead with reliable results. Yao and Poh (1995) used a neural network to predict the KLSE index. There are four challenges, beyond the choice of either technical or fundamental data, in using neural networks to forecast stock prices: the input and output variables, the type of neural network, the neural network architecture, and the evaluation of the quality of the trained network. They found that the NN model can be used as a forecasting tool that performs as well as traditional forecasting methods such as Box-Jenkins. Yao et al. (2000) applied the backpropagation neural network to forecast option prices of Nikkei 225 index futures.
The results suggest that for volatile markets a neural network option model outperforms the traditional Black-Scholes model, and that NNs have the ability to model nonlinear patterns and learn from historical data. Gan and Ng (1995) applied an artificial neural network to multivariate FOREX forecasting and found that it performed better than the statistical approach. Yao and Tan (2000) used NNs to study the exchange rates between the American dollar and five other major currencies, the Japanese Yen, Deutsche Mark, British Pound, Swiss Franc and Australian Dollar, and found that the NN results were better than those of traditional methods. Thiesing and Vornberger (1997) compared the performance of a neural network with two prediction techniques used in supermarkets to forecast sales, namely naïve prediction and statistical prediction. The results show that neural networks outperform the two conventional techniques with regard to prediction quality. Smith and Gupta (2000) clarified the potential of multilayered feedforward neural networks, Hopfield neural networks and self-organising neural networks. They found neural network approaches to solving business problems to be very similar to statistical methods, with some relaxation of assumptions and more flexibility. Boussabaine and Kaka (1998) investigated the feasibility of using neural networks for predicting the cost flow of construction projects, and found the results very encouraging because the differences between actual and predicted values were very small. Moshiri and Cameron (2000) compared the performance of backpropagation neural networks with traditional econometric approaches to forecasting the inflation rate. The results show that the backpropagation neural network model is able to forecast as well as all the traditional econometric methods, and to outperform them in some cases.
Uysal and El Roubi (1999) studied the usefulness of artificial neural networks as an alternative to multiple regression in tourism demand studies. Canadian tourism expenditure in the United States was used as a measure of demand to demonstrate the application. The results reveal that the use of neural networks in tourism demand studies may give better estimates in terms of prediction bias and accuracy. Law and Au (1999) and Law (2000) showed that neural network models outperform multiple regression, naïve regression, moving average and exponential smoothing in terms of forecasting accuracy. Motiwalla and Wahab (2000) studied the application of neural networks to stock market problems; the evidence suggests that neural network models are more successful than regression in providing a good fit to the data-generating process of stock returns and in issuing profitable trading signals. In financial and monetary problems, Tkacz and Hu (1999) used neural network models to improve forecasting performance. Their main findings were that at the one-quarter forecasting horizon neural networks yield no significant forecast improvement, but at the four-quarter horizon the improved forecast accuracy is statistically significant. They concluded that the improved forecasts may be capturing more fundamental nonlinearities between financial variables and real output growth at the longer horizon. Angstenberger (1996) studied the application of the neural network model to predicting the S&P 500 Index. Wong and Selvi (1998) and Wong et al. (2000) discussed in detail the application of neural network modelling in business and finance. Zhang and Hu (1998) examined the effects of the numbers of input and hidden nodes on neural network performance in forecasting the British pound/US dollar exchange rate.
They concluded that neural networks outperform the linear model; in addition, the number of input nodes has a greater impact on performance than the number of hidden nodes, while a larger number of observations does reduce forecast error. Zhang et al. (2001) conducted an experimental evaluation of neural networks for nonlinear forecasting. The results show that neural networks are valuable tools for modelling and forecasting nonlinear time series, while traditional linear methods are not as competent; again, the number of input nodes was much more important than the number of hidden nodes. Nasr et al. (2003) investigated the application of neural networks to gas consumption and found that, in general, the multivariate model showed better forecasting performance than the univariate model. Zuhaimy and Azme (2003) studied the application of NNs to predicting crude palm oil prices and found that the NN performance was better than that of multiple linear regression. Limsombunchai et al. (2004) studied the use of a hedonic price model and a neural network model in predicting house prices, and found that the neural network approach was better than the hedonic price model. 2.4.3 Neural Network in Environment and Health Corne et al. (1998) evaluated the performance of artificial neural networks for flood forecasting in comparison with a conventional hydrodynamic model, and found the NN performance sufficiently good and robust to provide a basis for practical short-term flood forecasting. Gaudart et al. (2003) applied a neural network model to epidemiological data. The performance of the multilayer perceptron was compared with that of linear regression with regard to the quality of prediction and estimation and the robustness to deviations from the underlying assumptions of normality, homoscedasticity and independence of errors. The results show that both models had comparable performance and robustness.
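The feedforward backpropagation networks compared with regression throughout these studies can be sketched minimally as follows. This is an illustrative one-hidden-layer network trained on the XOR problem, a nonlinearity a linear regression cannot represent; the architecture, learning rate and data are illustrative choices, not taken from any cited study.

```python
import numpy as np

# Minimal one-hidden-layer feedforward network trained by error backpropagation
# on XOR.  Sigmoid activations, squared-error loss, full-batch gradient descent.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros((1, 1))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
forward = lambda X: sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)

loss_start = float(np.mean((forward(X) - y) ** 2))
lr = 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)              # hidden-layer activations
    out = sigmoid(h @ W2 + b2)            # network output
    d_out = (out - y) * out * (1 - out)   # output-layer error signal
    d_h = (d_out @ W2.T) * h * (1 - h)    # error propagated to hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0, keepdims=True)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0, keepdims=True)
loss_end = float(np.mean((forward(X) - y) ** 2))
```

The nonlinear threshold (sigmoid) units in the hidden layer are what give the network the flexibility, noted in limitations (i) and (iv) above, that a linear combination of decision variables lacks.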
2.4.4 Neural Network in Agriculture Corn yield prediction using a neural network classifier trained on soil landscape features and soil fertility data was carried out by Shearer et al. (1994). They found that neural network classifiers can be used to predict the spatial yield variability of an actively growing corn crop. Drummond et al. (1995) compared several methods for predicting crop yield based on soil properties. They noted that the process of understanding yield variability is made extremely difficult by the number of factors that affect yield. They used several methods, namely multiple linear regression (MLR), stepwise multiple linear regression (SMLR), partial least squares (PLS), projection pursuit regression (PPR) and a back-propagation neural network. They found that the neural network gave better results than MLR, SMLR and PLS, and results quite similar to PPR. They also concluded that less-complex statistical methods, such as standard correlation, did not seem to be particularly useful in understanding yield variability. The correlation matrices described each factor's linear relationship to yield; however, when complex nonlinear relationships between factors exist, correlation may provide inaccurate and even misleading information about those relationships. Prediction capabilities were highest for nonlinear and non-parametric methods. One method Drummond et al. (1995) tried was a feed-forward, backpropagation neural network for corn and soybean yield prediction. They included soil properties such as phosphorus, potassium, soil pH, organic matter content, soil depth and magnesium saturation as inputs, and compared the results with other statistical models. The ANN showed promise as an aid to understanding yield variability, although their network model needed further improvement for greater accuracy, and they did not include weather information and other factors in their ANN.
There have been many reported applications of ANNs to the interpretation of images in the agri-food industry. Studies showed that artificial neural networks can be as accurate as procedural models for the interpretation of images (Deck et al., 1995; Timmermans and Hulzebosh, 1996). Yang (1997) conducted a study on the use of a neural network model to estimate soil temperature. Wang et al. (1999) applied an NN model to the classification of wheat kernel colour. Yang et al. (2000) discussed the application of artificial neural networks to image recognition and the classification of crops and weeds. The objective of their study was to develop a back-propagation artificial neural network model that could distinguish young corn plants from weeds. They found that an ANN-based weed recognition system can potentially be used in the precision spraying of herbicides in agricultural fields. NNs have become more popular in agriculture since they can replace complicated algorithms by interpreting images quickly and effectively. Neural network models have also been used to predict forest characteristics in southeast Alaska (Corne et al., 2000). They used a neural network model, specifically learning vector quantization, to generate models based upon simple inventory parameters such as geographical location, elevation, slope and aspect, with complementary satellite image data. Predictive maps were generated by obtaining input data from digital elevation models. The results of the predictions were compared with both the data recorded in the database and published classification maps for part of the study area produced by standard satellite image interpretation methods. Liu et al. (2001) used an artificial neural network to build a corn yield prediction model for precision farming applications.
A feedforward, completely connected, back-propagation artificial neural network was designed to approximate the nonlinear yield function relating corn yield to the factors influencing yield. After the network was developed and trained, three aspects of the input factors were investigated: (i) yield trends with four input factors; (ii) the interaction between nitrogen application rate and late July rainfall; and (iii) optimisation of the 15 input factors with a genetic algorithm to determine maximum yield. They concluded that the back-propagation, feed-forward neural network predicted corn yields with 80 percent accuracy; when an example with abnormally low yield was discarded, accuracy rose to 83.5 percent. They also found that the network was able to capture the expected interaction between rainfall and the amount of applied nitrogen fertiliser. Shrestha and Steward (2002) used a neural network model in modelling the early growth stage of corn plants and found that the neural network approach was effective in adjusting the segmentation decision surface based on general measures of lighting changes. Kominakis et al. (2002) applied a neural network model to the prediction of milk yield in dairy sheep, using Pearson and rank correlations between predicted and observed yields to test goodness of fit. The results showed that the average difference between observed and predicted yields was generally statistically non-significant. 2.5 RESPONSE SURFACE ANALYSIS This section reviews the application of response surface analysis as a tool to obtain the optimum levels of the factors that influence yield, for example fertiliser application in the oil palm industry. Ahmad Tarmizi et al. (1986) also analysed fertiliser trials carried out over a range of environments in Peninsular Malaysia, and their yield response functions for specific soil series have been used to formulate fertiliser recommendations.
Yield response equations, which take into account the curvilinear response to each fertiliser treatment and the two- and three-factor interactions between these treatments, were fitted to the plot data. Analysis of variance indicated the significance of the individual variables in these equations. Chan et al. (1991) studied fertiliser efficiency in oil palm at different locations in Peninsular Malaysia. The yield response and the environmental factors affecting fertiliser application were investigated by Ahmad Tarmizi et al. (1991), who found that the environmental factors contributed negatively to the efficiency of urea. Verdooren (2003) and Goh et al. (2003) conducted experiments to determine the fertiliser input levels that give optimum yields. The statistical techniques involved were regression analysis and analysis of variance. They concluded that fertiliser experiments with at least three quantitative levels can be used to derive an estimate of the agronomic and economic optimum rate, but that it is much better to include five quantitative levels based on a central composite design, so as to obtain a reliable estimate of the optimum with a small standard error. 2.6 SUMMARY This review has provided an overview of the area of research and the methods used, as summarised in Table 2.1. It shows that MLR is one of the most common methods used in studying the relationship between oil palm yield, as the dependent variable, and other variables such as climate, rainfall, sunshine, etc. Neural networks, a newer technique that imitates the memory processes of the human brain, have been applied in many areas; their application to oil palm yield is still in its infancy and has great potential. The nonlinear growth model is not as popular as MLR and has not yet been explored in the oil palm industry, so this study will be the first to apply the nonlinear growth model and the neural network model to oil palm yield modelling in Malaysia.
We will also present the mathematical aspects of estimating the parameters. The nonlinear growth model will be used to understand the biological process of oil palm yield growth. The term 'response surface analysis' is rarely used in the earlier research on fertiliser trials; the terms used are 'analysis of variance' and 'quadratic response analysis'. When the stationary point is a saddle point, no decision could be made from the results; in this study, we explore the use of ridge analysis to overcome the saddle-point problem.

Table 2.1: Summary of the literature reviewed in this study

Author/year | Research area | Method used
Green (1976) | Impact of fertiliser levels on oil palm yield and the relationship between foliar composition and oil palm yield | Multiple linear regression
Chow (1984) | Forecasting oil palm yield; variables: matured area, replanting area and total new planting and replanting area | Multiple linear regression
Foster (1985; 1987) | Effects of agronomic factors, site characteristics, soil and climate on oil palm yield | Multiple linear regression
Ahmad Tarmizi et al. (1986) | Formulation of fertiliser recommendations | Regression and analysis of variance
Chow (1988) | Seasonality and rainfall effects on oil palm yield | Time series analysis and correlation
Kee and Chew (1991) | Oil palm yield response to nutrient content and moisture in the East Coast of Peninsular Malaysia | Regression
Chan et al. (1991) | Interaction of fertiliser with oil palm yield | Analysis of variance
Ahmad Tarmizi et al. (1991) | Circumstances under which urea could be used efficiently instead of conventional inorganic nitrogen | Regression and partial correlation
Ahmad Tarmizi et al. (1999) | Effects of N, P and K fertiliser on oil palm yield | Linear regression
Oboh and Fakorede Investigate the effects of weather to yield Regression and (1999) component (number of bunch, fresh fruit correlation bunch and mean bunch weight) in Nigeria. Foong (1999) The impact of moisture and potential Correlation analysis evapotranspiration, growth and yield of palm oil. Leng et al. (2000) Study the effects of fertiliser withdrawal Analysis of on yield and yield components (bunch variance. weight). Henson and Chang Study the physiology of oil palm and it’s Plant growth (2000) effect to yield. Factors namely solar analysis. energy incident on the crop canopy, the fraction of solar, efficiency of conversion and fraction of dry matter. Belanger et al. Study the effect of nitrogen fertiliser to (2000) potato. Soon and Hong Study the fertiliser (N, P and K) response (2001) to oil palm yield in Sabah. Chin (2002) Investigate the relationship between Regression Regression. Correlation analysis climate factors, soil and planting density to yield. Foster (2003) Predicting yields response from leaf Regression and analysis. correlation 69 Author/year Research area Method used Wendroth et al. Prediction barley yield with some soil and Regression (2003) nitrogen information. Vendooren (2003) Experimental design to obtain optimum Regression and level of fertiliser inputs. analysis of variance. Chan et al. (2003) Study the climate change and its effects Discussion on yield of oil palm. Ahmad Tarmizi et al. Study the fertiliser programme towards Regression and (2004) higher yield. correlation. Henson and Mohd Study the effects of seasonality variation Time series Hanif (2004) in oil palm fruit bunch production. analysis and correlation. Azme et al. (2003) Modelling the interaction between the Multiple linear palm oil prices and others vegetable oils. regression Azme and Zuhaimy Study the collinearity effect to the model Linear regression (2003) performance and principle component regression. 
Garcia (1983; 1988; Nonlinear growth model in forestry 1989; 1989) Kruse (1999) Nonlinear growth model Study yield growth in various contry Nonlinear growth model Harris and Kennedy Modelling cereal and maize yield (1999) Mohd Yunus (1999) Nonlinear growth model Fitting Gompertz curve to cocoa growth. Nonlinear growth model Zuhaimy et al. (2003) Fitting Gompertz curve to tobacco growth data. Nonlinear growth model 70 Author/year Research area Method used Azme et al. (2004) Comparative study on nonlinear growth Nonlinear growth models to tobacco leaf growth data. models. Lei and Zhang (2004) Study the forest growth and yield Nonlinear growth model Drummond et al. Compare several methods for predicting (1995) crop yield based on soil properties. Corne et al. (2000) Predict forest characteristics in southeast Neural network Neural network Alaska Liu et al. (2001) Build a corn yield prediction model for Neural network precision farming application Shrestha and Steward Modelling early growth stage corn plant Neural network Prediction of milk yield in dairy sheep Neural network Forecasting crude oil palm prices. Neural network (2002) Kominakis et al. (2002) Zuhaimy and Azme (2003) From this summary, we found that multiple linear regression is the most common method used in oil palm modelling. But nonlinear growth model and neural network have not been explored in this area, so neural network model to be the major contribution into the development of the oil palm yield modelling. This is partly attributed to the fact that the statistical methodology used for fitting nonlinear models to oil palm yield growth data is closely related to the mathematics of the models and has not yet been explored. Study shows that the neural networks model more superior compared to any other model earlier proposed by other researchers. 71 CHAPTER 3 RESEARCH METHODOLOGY 3.1 INTRODUCTION This chapter discusses the research methodology development for this study. 
It describes several approaches to the modelling of oil palm yield. A detailed description is given of the modelling approaches currently used in the study of oil palm yield, namely the nonlinear growth model, multiple linear regression, neural network modelling and response surface analysis. An extensive coverage of these models is given to describe the methodology developed to suit the modelling of oil palm yield.

3.2 DATA ANALYSIS

One of the most important components in the success of any modelling approach is the data. The quality, availability, reliability, relevance and repeatability of the data used to develop and run the system are critical to its success. Even a primitive model can perform well if the input data has been processed in such a way that it clearly reveals the important information. On the other hand, even the best model is worth very little if the necessary input information is presented in a complex and confusing way.

Data processing starts with the collection and analysis of the data, followed by pre-processing, after which the data is fed to the model. Finally, post-processing is needed, if necessary, to fit the outputs of the network to the required outputs (Figure 3.1). This data analysis procedure was applied to all modelling approaches in this study. The data sets used in this study were analysed using statistical packages: the Statistical Package for Social Science (SPSSx), the Statistical Analysis System (SAS), S-Plus and Matlab. The SPSSx package was used to model the oil palm yield using multiple linear regression analysis, while the robust M-regression model was developed using the S-Plus package. The nonlinear growth model and response surface analysis were performed with the help of the SAS package. Matlab was used to develop the neural network model.
Figure 3.1: Data analysis procedure used in this study — data collection, data pre-processing, data analysis and modelling (nonlinear growth model, multiple linear regression, robust M-regression, neural network, response surface analysis), data post-processing, output.

3.3 MODELLING

This is an exploratory research project whose purpose is to select the best model to describe the data. The data were analysed using the nonlinear growth model, multiple linear regression, robust M-regression, the neural network model and response surface analysis. This study began by looking at previous work on nonlinear growth models applied to various types of data (refer to Figure 3.1). These models were studied separately, and the results are discussed. A comparative study of the models is also given.

The nonlinear growth model is used to analyse the oil palm growth data, which is naturally nonlinear. It helps us to understand the biological process of yield growth. In analysing the foliar data, multiple linear regression and robust M-regression are used to investigate the relationship between the foliar data and the oil palm yield. The neural network model is also used in the analysis of the foliar data, in order to improve the model performance. The multiple linear regression and neural network models use the same data because both are able to model the causal relationship between dependent and independent variables. Linear relationships can be modelled by both multiple linear regression and neural network models, but the neural network model can also adapt to nonlinear relationships, depending on the activation function used in the architecture. Finally, response surface analysis is introduced to obtain the optimum fertiliser rate that generates the maximum profit.

The goodness of fit of all models used in this study is measured using the MSE, RMSE, MAPE, the correlation coefficient and the coefficient of determination.
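As an illustration, the goodness-of-fit measures named above (MSE, RMSE, MAPE, the correlation coefficient and the coefficient of determination) can be sketched in a few lines of Python. This is a generic sketch, not the software used in the study, and the function name is our own.

```python
import numpy as np

def fit_metrics(y, y_hat):
    """Return common goodness-of-fit measures for observed y and fitted y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    e = y - y_hat                                   # residuals
    mse = np.mean(e ** 2)                           # mean squared error
    rmse = np.sqrt(mse)                             # root mean squared error
    mape = 100.0 * np.mean(np.abs(e / y))           # mean absolute percentage error
    r = np.corrcoef(y, y_hat)[0, 1]                 # correlation coefficient
    r2 = 1.0 - e.dot(e) / np.sum((y - y.mean())**2) # coefficient of determination
    return {"MSE": mse, "RMSE": rmse, "MAPE": mape, "r": r, "R2": r2}
```

A perfect fit gives MSE = RMSE = MAPE = 0 and R² = 1; comparing these values across candidate models is how the model selection in this chapter proceeds.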
The details of these measurements are given in Chapter 1. The theory and concepts of the models are given below.

3.3.1 The Nonlinear Growth Model

Suppose that Y is the dependent variable and X is the independent variable. The relationship between Y and X is nonlinear if changes in Y with X are not consistent over the range of the independent variable; a model is also nonlinear if it is nonlinear in its parameters. A simple nonlinear regression model is as follows:

Y = f(X, β) + ε  (3.1)

where Y is the dependent variable, β is the vector of unknown parameters (β1, β2, β3, ..., βp), X is the independent (explanatory) variable and ε is the error term. Draper and Smith (1981) noted that the nonlinearity of the relationship depends only on how the parameters enter the model through the independent variables, and not on the dependent variable. For example, in the early stages of oil palm yield growth (Figure 3.2), the yield increases vigorously until year five, then increases slowly until year ten, after which it settles at a consistent level of FFB yield that normally fluctuates.

Figure 3.2: FFB yield (ton/hectare/year) versus time (year of harvest)

Draper and Smith (1981) and Ratkowsky (1983) discussed the family of growth models in detail.
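To make the idea concrete, a growth curve such as the logistic model of Table 3.1 can be fitted by nonlinear least squares. The sketch below uses SciPy's `curve_fit` on synthetic yield-like data; the data, parameter values and starting values are illustrative, not the thesis's own.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, alpha, beta, kappa):
    """Logistic growth curve: phi(T) = alpha / (1 + beta * exp(-kappa * T))."""
    return alpha / (1.0 + beta * np.exp(-kappa * t))

t = np.arange(1, 20)                     # years of harvest
y = logistic(t, 30.0, 8.0, 0.6)          # synthetic FFB yield (ton/ha/year)
y = y + 0.2 * np.sin(t)                  # small deterministic wiggle as "noise"

# Starting values matter for nonlinear least squares (see Section 3.3.1.1)
p0 = [25.0, 5.0, 0.5]
params, cov = curve_fit(logistic, t, y, p0=p0)
```

Here `params` holds the estimates of (α, β, κ), and the diagonal of `cov` gives their approximate variances, from which confidence intervals like (3.8) can be formed.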
Ratkowsky (1983) also discussed five important points to consider when developing a nonlinear regression model:

(i) Parsimony: the model should contain as few parameters as possible;
(ii) Parameterisation: the parameterisation with the best estimation properties should be used;
(iii) Range of applicability: the data must cover the entire range described by the model;
(iv) Stochastic specification: the error structure should also be modelled;
(v) Interpretability: parameters with a physical meaning are preferred.

Generally the growth rate does not steadily decline, but rather increases to a maximum value before declining to zero. Such a growth curve is known as an S-shaped or sigmoidal model, and the growth rate is given by

df(t)/dt ∝ g(f){h(α) − h(f)}  (3.2)

where g and h are increasing functions with g(0) = h(0) = 0.

3.3.1.1 Nonlinear Methodology

Consider the system of equations represented by the nonlinear model Y = F(X1, X2, ..., Xn; β0, β1, ..., βp) + ε, written in observation form as

Yt = F(β, Xt) + εt  (3.3)

where X is the matrix of independent variables, β is the vector of unknown parameters, ε is the random error vector, F is a function of the independent variables and the parameters, and Yt is the observed value of the t-th experiment (t = 1, 2, ..., n). The least squares estimate of β, denoted by β̂, minimises the error sum of squares

S(β) = Σ_{t=1}^{n} [Yt − F(Xt, β)]²  (3.4)

It should be noted that in the nonlinear least squares situation S(β) may have several relative minima in addition to the absolute minimum at β̂. We can approximate S(β) using a linear approximation. Suppose that the true value of β lies in a small neighbourhood of β*. Using the linear Taylor expansion

fi(β) ≈ fi(β*) + Σ_{r=1}^{p} [∂fi/∂βr]_{β*} (βr − βr*)  (3.5)

we can write f(β) ≈ f(β*) + F.(β − β*), where F. is the matrix of derivatives [∂fi/∂βr] evaluated at β*. Hence S(β) ≈ ||z − F.θ||², where z = Y − f(β*) and θ = β − β*.
As in the linear model, this approximation is minimised when θ is given by θ̂ = (F.ᵀF.)⁻¹F.ᵀz. When n is large enough, β̂ is almost certain to lie within a small neighbourhood of β*, so β̂ − β* ≈ θ̂ = (F.ᵀF.)⁻¹F.ᵀz.

In the nonlinear situation both F. and f(β) are functions of β, and in general a closed-form solution does not exist. Thus the nonlinear procedure of the SAS package uses an iterative process: a starting value for β is chosen and continually improved until the error sum of squares, εᵀε (SSE), is minimised. At each iteration the current value of β is used to evaluate the residual ε = Y − F(β), and the residual in turn is used to improve β. The iterative process begins at a starting (initial) value β0; X and Y are then used to compute a step δ such that SSE(β0 + kδ) < SSE(β0) for some step length k.

Most nonlinear models cannot be solved analytically, so an iterative method is required. Let β(k) represent an approximation to the least squares estimate β̂ of a nonlinear model. For values of β close to β(k), we use the linear Taylor expansion f(β) ≈ f(β(k)) + F.(k)(β − β(k)). If r(β) = Y − f(β) is the residual vector, then r(β) ≈ r(β(k)) − F.(k)(β − β(k)). Substituting into S(β) = rᵀ(β)r(β) leads to

S(β) ≈ rᵀ(β(k))r(β(k)) − 2rᵀ(β(k))F.(k)(β − β(k)) + (β − β(k))ᵀF.(k)ᵀF.(k)(β − β(k)).

The right-hand side is minimised with respect to β when

β − β(k) = (F.(k)ᵀF.(k))⁻¹F.(k)ᵀr(β(k)) = δ(k).

If β(k) is the current value, the next approximation is β(k+1) = β(k) + δ(k). This procedure provides an iterative scheme for obtaining β̂. There are four different methods of computing the step δ that changes the vector of parameters (SAS, 1992), where X here denotes the Jacobian matrix of partial derivatives and e the residual vector at the current iterate:

(i) Gradient: δ = XᵀE;
(ii) Gauss-Newton: δ = (XᵀX)⁻¹Xᵀe;
(iii) Newton: δ = G⁻¹Xᵀe, where G⁻¹ is a generalised (Moore-Penrose) inverse of the Hessian G;
(iv) Marquardt: δ = (XᵀX + λ diag(XᵀX))⁻¹Xᵀe.
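The Gauss-Newton step δ(k) = (F.ᵀF.)⁻¹F.ᵀr(β(k)) described above can be sketched in a few lines. This is our own illustrative implementation with a numerically approximated Jacobian, not SAS code, fitted here to the negative exponential model of Table 3.1 on noise-free synthetic data.

```python
import numpy as np

def gauss_newton(f, beta0, t, y, n_iter=50, tol=1e-10):
    """Minimise sum((y - f(t, beta))**2) by Gauss-Newton iteration."""
    beta = np.asarray(beta0, float)
    for _ in range(n_iter):
        r = y - f(t, beta)                         # residual vector r(beta)
        # Forward-difference approximation of the Jacobian F. = d f / d beta
        J = np.empty((t.size, beta.size))
        h = 1e-6
        for j in range(beta.size):
            b = beta.copy()
            b[j] += h
            J[:, j] = (f(t, b) - f(t, beta)) / h
        delta = np.linalg.solve(J.T @ J, J.T @ r)  # Gauss-Newton step delta(k)
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:            # parameter-change stopping rule
            break
    return beta

# Negative exponential model from Table 3.1: phi(T) = alpha * (1 - exp(-kappa * T))
model = lambda t, b: b[0] * (1.0 - np.exp(-b[1] * t))
t = np.arange(1.0, 16.0)
y = model(t, np.array([28.0, 0.4]))               # noise-free synthetic data
beta_hat = gauss_newton(model, [20.0, 0.2], t, y)
```

The stopping rule here is the parameter relative-change idea formalised in (3.7); in practice a sum-of-squares criterion like (3.6) is used alongside it.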
Many authors who discussed nonlinear least squares, such as Seber and Wild (1989), Ralston and Jennrich (1978), Gallant (1987) and Ratkowsky (1983), recommended relative-change convergence criteria based on the changes in S(β) and in the parameters in going from the i-th to the (i+1)-th iteration. That is, if the relative change in the sum of squares at the i-th iteration,

(S(β(i)) − S(β(i+1))) / S(β(i))  (3.6)

falls in the interval 0 to ξs, where ξs is a pre-selected tolerance such as 10⁻⁴, then the reduction in the error sum of squares is considered insufficient to warrant further computation. This is usually accompanied by a parameter relative-change criterion such as

|βj(i+1) − βj(i)| / |βj(i)| < ξp,  j = 1, 2, ..., p.  (3.7)

When every relative parameter change at the i-th iteration is less than ξp, the parameter increments are too small to warrant further computation and the program terminates.

Gallant (1987) and Seber and Wild (1989) showed that the confidence interval for β̂i is given by

β̂i ± tα/2 s √ĉii  (3.8)

where ĉii is the i-th diagonal element of Ĉ = (Fᵀ(β̂)F(β̂))⁻¹.

The starting value β(1), which is the initial guess at the minimiser β̂, can sometimes be suggested by prior information. Occasionally there is a starting value that tends to work well for a class of problems; Fisher's scoring algorithm for generalised linear models, as an iteratively re-weighted least squares method, suggests a uniform starting mechanism for a whole class of models (McCullagh and Nelder, 1983; Bates and Watts, 1988). However, it is quite difficult to say anything general about the production of good starting values. Methods that are sometimes suggested include a grid search or a random search over a defined rectangular region of the parameter space. If no sensible bounds can be suggested for a parameter βr, a transformed parameter can be used; for example, φ = e^β / (1 + e^β) maps β onto 0 < φ < 1, and φ = arctan(β) maps β onto a bounded interval. Draper and Smith (1981) and Ratkowsky (1983) gave detailed discussions of starting values for nonlinear models. Table 3.1 lists the nonlinear growth models considered in this study.

Table 3.1: Nonlinear growth models considered in the study

Model | Equation | Source
Logistic | φ(T) = α / (1 + β exp(−κT)) + ε | Draper and Smith (1981)
Gompertz | φ(T) = α exp(−β exp(−κT)) + ε | Draper and Smith (1981)
Von Bertalanffy | φ(T) = [α^(1−δ) − β exp(−κT)]^(1/(1−δ)) + ε | Ratkowsky (1983)
Negative exponential | φ(T) = α(1 − exp(−κT)) + ε | Philip (1994)
Monomolecular | φ(T) = α(1 − β exp(−κT)) + ε | Draper and Smith (1981)
Log-logistic | φ(T) = α / (1 + β exp(−κ ln(T))) + ε | Tsoularis and Wallace (2002)
Richards | φ(T) = α / [1 + β exp(−κT)]^(1/δ) + ε | Ratkowsky (1983)
Weibull | φ(T) = α − β exp(−κT^δ) + ε | Ratkowsky (1983)
Schnute | φ(T) = (α + β exp(κT))^δ + ε | Schnute (1981), Yuancai et al. (1997)
Morgan-Mercer-Flodin | φ(T) = (βγ + αT^δ) / (γ + T^δ) + ε | Morgan et al. (1975), Ratkowsky (1983)
Chapman-Richards | φ(T) = α(1 − β exp(−κT))^(1/(1−δ)) + ε | Draper and Smith (1981)
Stannard | φ(T) = α[1 + exp(−(β + κT)/δ)]^(−δ) + ε | Tsoularis and Wallace (2002)

where α, β, γ, κ and δ are unknown parameters, ε is the random error and T is time.

3.3.2 Regression Analysis

Regression analysis has been widely used in modelling oil palm yield. This technique is used to discover the relationship between foliar nutrient composition and oil palm yield. Although it is common practice to use regression analysis in the modelling of oil palm yield, this study focuses on different experimental locations. We also introduce robust regression analysis to overcome the outlier problem in the data set. One of the most popular techniques for finding the relationship between two variables, regression analysis fits an equation to the observed variables to a certain degree of accuracy.
Since theoretical models can never exactly describe a real-world process, modelling oil palm yield from leaf nutrient composition can never be perfect. Regression analysis has been used widely in the oil palm industry by agronomists to aid policy making and decision making. The models for multiple regression are similar to the simple linear regression model, except that they contain more terms and can be used to propose relationships more complex than a straight line.

3.3.2.1 Least Squares Method

Consider the linear equation

Y = Xβ + ε  (3.9)

Equation (3.9) can be written observation by observation as

yi = β0 + β1 xi1 + ... + βp xip + εi,  i = 1, 2, ..., n  (3.10)

In general, y is an (n × 1) vector of observations, X is an (n × p) matrix of the levels of the independent variables, β is a (p × 1) vector of regression coefficients, and ε is an (n × 1) vector of random errors. We wish to find the vector of least squares estimators, b, that minimises

Q = Σ_{i=1}^{n} εi² = ε′ε = [y − Xβ]′[y − Xβ]  (3.11)

Q may be expanded as

Q = y′y − 2β′X′y + β′X′Xβ  (3.12)

since β′X′y is a (1 × 1) matrix whose transpose (β′X′y)′ = y′Xβ is the same scalar. The least squares estimators must satisfy

∂Q/∂β |_b = −2X′y + 2X′Xb = 0  (3.13)

which implies

X′Xb = X′y  (3.14)

Equation (3.14) is the set of least squares normal equations in matrix form. To solve the normal equations, multiply both sides of equation (3.14) by the inverse of X′X. Thus, the least squares estimator of β is

b = (X′X)⁻¹X′y  (3.15)

and the corresponding variance-covariance matrix is given by

Var(b) = σ²(X′X)⁻¹  (3.16)

Suppose we are interested in the mean response of y at a fixed value xr of the regressors; we then need the mean and variance of the mean response.
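Before turning to the mean response, here is a minimal numerical check of the estimator in (3.15), b = (X′X)⁻¹X′y, solved directly from the normal equations (3.14). The design matrix and coefficients are made up for illustration.

```python
import numpy as np

# Design matrix with an intercept column and two regressors
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 3.0, 0.0],
              [1.0, 5.0, 2.0],
              [1.0, 7.0, 1.0],
              [1.0, 9.0, 3.0]])
beta_true = np.array([4.0, 2.0, -1.0])
y = X @ beta_true                        # exact linear data (no noise)

# Least squares via the normal equations (3.14): X'X b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Residual variance estimate s^2 = e'e / (n - p), zero here since data is noise-free
e = y - X @ b
s2 = e @ e / (X.shape[0] - X.shape[1])
```

In practice `np.linalg.lstsq` (or a QR decomposition) is numerically preferable to forming X′X explicitly, but the normal-equations form mirrors the derivation above.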
If ŷr denotes the unbiased estimator of the mean response at xr, its expectation is

E(ŷr) = E(xr′β̂) = xr′β  (3.17)

and its variance is

Var(ŷr) = σ² xr′(X′X)⁻¹xr  (3.18)

Substituting s² as the estimate of σ², a 100(1−α)% confidence interval for the mean response is

ŷr ± tα/2,(n−p−1) s √(xr′(X′X)⁻¹xr)  (3.19)

Multiple linear regression can also be used to predict a future value of the dependent variable for a given input vector xp (Chatterjee and Price, 1991; Birkes and Dodge, 1993); different parameter estimates β̂ give different predictions. Let yp denote the value to be predicted at the new point xp, yp = xp′β + ep. The mean and variance of the prediction are therefore

E(ŷp) = E(xp′β̂ + ep) = xp′β

Var(ŷp) = Var(xp′β̂ + ep) = σ² + xp′Var(β̂)xp = σ² + σ² xp′(X′X)⁻¹xp = σ²(1 + xp′(X′X)⁻¹xp)

A 100(1−α)% prediction interval for a new observation is given as

ŷp ± tα/2,(n−p−1) s √(1 + xp′(X′X)⁻¹xp)  (3.20)

3.3.3 Robust M-regression

A statistical procedure is regarded as robust if it performs reasonably well even when the assumptions of the statistical model are not true. If our data follow the standard linear regression model, least squares estimates and tests perform quite well, but they are not robust in the presence of outlying observations in the data set (Rousseeuw and Leroy, 1987). In this study we propose robust M-regression to model the yield data, since the Q-Q plot detected the presence of outliers in the data set. Peter Huber introduced the idea of M-estimation in 1964 (Huber, 1981). In least squares estimation, the values of β̂ are chosen so that Σ η̂i² is as small as possible, for i = 1, 2, ..., n, where η is the error. In least absolute deviation estimation, Σ |η̂i| is minimised.
In robust M-regression this idea is generalised: the values of β̂ are chosen so that Σ ρ(η̂i) is minimised, where ρ(η) is some function of η. Least squares and least absolute deviation estimation can therefore be regarded as special cases of M-estimation with ρ(η) = η² and ρ(η) = |η| respectively. Huber (1981) defined the function ρ(η) as follows:

ρ(η) = η²          if −k ≤ η ≤ k
ρ(η) = 2k|η| − k²  if η < −k or k < η  (3.21)

Following Huber, let k = 1.5σ̂, where σ̂ is an estimate of the standard deviation σ of the population of random errors. In order to make ρ(η) a smooth function, 2k|η| − k² is used instead of |η|. The scale estimate is σ̂ = 1.483(MAD), where MAD is the median absolute deviation of the residuals |η̂i|. The multiplier 1.483 was chosen to ensure that σ̂ would be a good estimate of σ if the distribution of the random errors were normal. The Huber M-estimates of β̂ are the values of b that minimise

Σ ρ(yi − (b0 + b1 xi1 + ... + bp xip))  (3.22)

where ρ(η) is the function defined in (3.21).

It is convenient to use vector notation: the vector β̂ of Huber M-estimates is the vector b that minimises Σ ρ(yi − b′xi). The vector of regression coefficients β is first estimated by the vector of least squares estimates. This initial estimate of β is used to calculate the deviations and an estimate of σ. The algorithm iterates in this way until a step is reached at which the improved estimate of β is the same (or at least approximately the same) as the previous estimate. For example, at any step let b0 be the current estimate of β. Calculate the deviations ηi0 = yi − (b0)′xi and from them calculate σ̂ = 1.483(MAD). We now adjust the y values to remove any large deviations. The deviation of yi from the current estimated regression line is ηi0 = yi − (b0)′xi, so that yi = (b0)′xi + ηi0.
Then define yi* = (b0)′xi + ηi*, where ηi* is the adjusted deviation obtained by truncating ηi0 so that no deviation is larger than 1.5σ̂ in absolute value. The improved estimate of β is the least squares estimate obtained from the adjusted data y1*, y2*, ..., yn*. Huber M-estimates are obtainable from the S-Plus package (Becker et al., 1988).

3.3.4 Neural Network Model

The neural network model has influenced research activities in numerous fields. The method has been applied to estimate the yield of short-term crops (Shearer et al., 1994; Drummond et al., 1995; Belanger et al., 2000; Liu et al., 2001; Welch et al., 2003), but the literature shows that its application to oil palm yield modelling has not been explored.

3.3.4.1 Introduction to the Neural Network

Neural networks, popularly known as Artificial Neural Networks (ANN), are computational models that consist of a number of simple processing units which communicate by sending signals to each other over a large number of weighted connections. The original neural network design was inspired by the human brain. In the human brain, a biological neuron collects signals from other neurons through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.
Like the human brain, neural networks consist of processing units (artificial neurons) and connections (weights) between those units. The processing units transport incoming information on their outgoing connections to other units. The "electrical" information is simulated with specific values stored in the weights, which give these networks the capacity to learn, memorise and create relationships between data. A very important feature of these networks is their adaptive nature, where "learning by example" replaces "programming" in solving problems. This feature makes these computational models very appealing in applications where one has little, or an incomplete, understanding of the problem to be solved, but where training data is available.

There are many different types of neural networks, and they are used in many fields, with new uses being devised daily by researchers. Some of the most traditional applications include (Master, 1993; Welstead, 1994; Bishop, 1995; Patterson, 1996):

• Classification – to determine military operations from satellite photographs; to distinguish among different types of radar returns (weather, birds or aircraft); to identify diseases of the heart from electrocardiograms.
• Noise reduction – to recognise patterns (voice, images, etc.) corrupted by noise.
• Prediction – to predict the value of a variable given historic values. Examples include forecasting of various types of loads, market and stock forecasting, and weather forecasting.

The model built in this study falls into the category of prediction: it uses an ANN to predict oil palm yield.
3.3.4.2 Fundamentals of Neural Networks

Neural networks, sometimes referred to as connectionist models, are parallel distributed models that have several distinguishing features (Patrick and Smagt, 1996; Patterson, 1996; Lai, 1998; Haykin, 1999):

a) A set of processing units;
b) An activation state for each unit, which is equivalent to the output of the unit;
c) Connections between the units; generally each connection is defined by a weight wjk that determines the effect that the signal of unit j has on unit k;
d) A propagation rule, which determines the effective input of a unit from its external inputs;
e) An activation function, which determines the new level of activation based on the effective input and the current activation;
f) An external input (bias, offset) for each unit;
g) A method for information gathering (the learning rule);
h) An environment within which the system can operate, providing input signals and, if necessary, error signals.

3.3.4.3 Processing Unit

A processing unit (Figure 3.3), also called a neuron or node, performs a relatively simple job: it receives inputs from neighbours or external sources and uses them to compute an output signal that is propagated to other units.

Figure 3.3: Processing unit, with inputs x0, x1, ..., xn, weights wj0, wj1, ..., wjn, bias θj, net input aj = Σ wji xi + θj and output zj = g(aj)

Within neural systems there are three types of units:

(i) Input units, which receive data from outside the network;
(ii) Output units, which send data out of the network;
(iii) Hidden units, whose input and output signals remain within the network.

Each unit j can have one or more inputs x0, x1, x2, ..., xn, but only one output zj. An input to a unit is either data from outside the network, the output of another unit, or its own output.

3.3.4.4 Combination Function

Each non-input unit in a neural network combines the values that are fed into it via synaptic connections from other units, producing a single value called the net input.
The function that combines those values is known as the combination function, which is defined by a certain propagation rule. In most neural networks it is assumed that each unit provides an additive contribution to the input of the units to which it is connected. The total input to unit j is simply the weighted sum of the separate outputs from the connected units plus a threshold or bias term θj:

aj = Σ_{i=1}^{n} wji xi + θj  (3.23)

A contribution with positive wji is considered an excitation, and one with negative wji an inhibition. Units with the above propagation rule are called sigma units. In some cases more complex rules for combining inputs are used; one such propagation rule, known as sigma-pi, has the following form:

aj = Σ_{i=1}^{n} wji Π_{k=1}^{m} xik + θj  (3.24)

Many combination functions use a "bias" or "threshold" term in computing the net input to the unit. For a linear output unit, a bias term is equivalent to an intercept in a regression model; it is required in much the same way as the constant polynomial '1' is required for approximation by polynomials.

3.3.4.5 Activation Function

Most units in a neural network transform their net input using a scalar-to-scalar function called an activation function, yielding a value known as the unit's activation. With the possible exception of output units, the activation value is fed to one or more other units. Activation functions with a bounded range are often called squashing functions. Some of the most commonly used activation functions are (Fausett, 1994):

i) Identity function (Figure 3.4)

g(x) = x  (3.25)

The input units use the identity function. Sometimes a constant is multiplied by the net input to form a linear function.

Figure 3.4: Identity function

ii) Binary step function (Figure 3.5), also known as the threshold function or Heaviside function.
The output of this function is limited to one of two values:

g(x) = 1 if x ≥ θ;  g(x) = 0 if x < θ  (3.26)

This kind of function is often used in single-layer networks.

Figure 3.5: Binary step function

iii) Sigmoid function (Figure 3.6)

g(x) = 1 / (1 + e^(−x))  (3.27)

This function is especially advantageous in neural networks trained by back-propagation, because it is easy to differentiate and thus can dramatically reduce the computational burden of training. It suits applications whose desired output values are between 0 and 1.

Figure 3.6: Sigmoid function

iv) Bipolar sigmoid function (Figure 3.7)

g(x) = (1 − e^(−x)) / (1 + e^(−x))  (3.28)

This function has properties similar to those of the sigmoid function. It works well for applications that yield output values in the range [−1, 1].

Figure 3.7: Bipolar sigmoid function

Activation functions for the hidden units are needed to introduce nonlinearity into the network. The reason is that a composition of linear functions is again a linear function, and it is the nonlinearity (i.e., the capability to represent nonlinear functions) that makes multi-layer networks so powerful. Almost any nonlinear function does the job, although for back-propagation learning it must be differentiable, and it helps if the function is bounded. The sigmoid functions are the most common choices (Sarles, 1997).

For the output units, activation functions should be chosen to suit the distribution of the target values. As noted above, for binary (0, 1) outputs the sigmoid function is an excellent choice. For continuous-valued targets with a bounded range the sigmoid functions are again useful, provided that either the outputs or the targets are scaled to the range of the output activation function. However, if the target values have no known bounded range, it is better to use an unbounded activation function.
The identity function (which amounts to no activation function) is often used for this purpose. If the target values are positive but have no known upper bound, an exponential output activation function can be used (Sarles, 1998).

3.3.4.6 Network Topologies

The number of layers, the number of units per layer, and the interconnection patterns between layers define the topology of a network. Networks are generally divided into two categories based on the pattern of connections:

1) Feed-forward networks (Figure 3.8), where the data flow from input units to output units is strictly feed-forward. The data processing can extend over multiple layers of units, but no feedback connections are present. That is, connections extending from the outputs of units to the inputs of units in the same layer or previous layers are not permitted. Feed-forward networks are the main focus of this study (Patterson, 1996; Hykin, 1999).

Figure 3.8: Feed-forward neural network (input layer x_1, …, x_l; hidden layer h_1, …, h_m with weights w_{ji}^{(1)}; output layer y_1, …, y_n with weights w_{kj}^{(2)}; bias units x_0 and h_0)

2) Recurrent networks (Figure 3.9), which contain feedback connections. Contrary to feed-forward networks, the dynamical properties of the network are important. In some cases, the activation values of the units undergo a relaxation process to ensure that the network will evolve to a stable state in which the activations no longer change. In other applications, in which the dynamical behaviour constitutes the output of the network, the changes of the activation values of the output units are significant (Patterson, 1996; Hykin, 1999).

Figure 3.9: Recurrent neural network

Alternatively, neural networks can be divided into two classes: supervised and unsupervised neural networks.
Supervised neural networks, such as the perceptron, use a supervised learning algorithm, which means that both input and output data are required during the training phase. The most common training algorithm is the back-propagation algorithm. On the other hand, unsupervised neural networks, such as the Kohonen network, require only input data to be trained; they organise the input data themselves according to a similarity metric.

A recurrent neural network is a neural network in which the connections between the units form a directed cycle. Recurrent neural networks must be approached differently from feed-forward neural networks, both when analysing their behaviour and when training them, since they can exhibit chaotic behaviour. Usually, dynamical systems theory is used to model and analyse them.

3.3.4.7 Network Learning

The functionality of a neural network is determined by the combination of the topology (number of layers, number of units per layer, and the interconnection pattern between the layers) and the weights of the connections within the network. The topology is usually held fixed, and a training algorithm determines the weights. The process of adjusting the weights to make the network learn the relationship between the inputs and targets is called learning, or training. Many learning algorithms have been invented to help find an optimum set of weights that results in the solution of the problem. They can roughly be divided into two main groups:

(i) Supervised learning – The network is trained by providing it with inputs and desired outputs (target values). These input–output pairs are provided by an external teacher, or by the system containing the network. The difference between the actual outputs and the desired outputs is used by the algorithm to adapt the weights in the network (Figure 3.10). Supervised learning is often posed as a function approximation problem: we are given training data consisting of pairs of input patterns x and corresponding targets t.
The goal is to find a function f(x) that matches the desired response for each training input.

Figure 3.10: Supervised learning model for oil palm data (oil palm data are fed to a feed-forward neural network; the error between the network output and the desired output drives the objective function and the training algorithm, i.e. back-propagation or training using Levenberg-Marquardt)

(ii) Unsupervised learning – With unsupervised learning, there is no feedback from the environment to indicate whether the outputs of the network are correct. The network must discover features, regularities, correlations, or categories in the input data automatically. In fact, for most varieties of unsupervised learning, the targets are the same as the inputs. In other words, unsupervised learning usually performs the same task as an auto-associative network, compressing the information from the inputs.

3.3.4.8 Objective Function

The objective function is used to determine how well the network performs. The function must be defined to provide an unambiguous numerical rating of system performance. The selection of an objective function is very important, because the function represents the design goals and decides which training algorithm should be used. Even though it is difficult to develop an objective function that measures performance exactly, a few basic functions may be used, such as the sum-of-squares error function,

    E = (1 / NP) Σ_{p=1}^{P} Σ_{i=1}^{N} (t_pi − y_pi)²        (3.29)

where p indexes the patterns in the training set, i indexes the output nodes, and t_pi and y_pi are, respectively, the target and actual network output for the ith output unit on the pth pattern. In real-world applications, it may be necessary to complicate the function with additional terms to control the complexity of the model.

3.3.4.9 Basic Architecture of the Feed-Forward Neural Network

A layered feed-forward network consists of a certain number of layers, and each layer contains a certain number of units.
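As an illustration, the sum-of-squares objective of equation (3.29) can be computed as follows (a minimal Python/NumPy sketch, not taken from the thesis):

```python
import numpy as np

def sum_squares_error(targets, outputs):
    """Sum-of-squares error of equation (3.29):
    E = (1/(N*P)) * sum over patterns p and output units i of (t_pi - y_pi)^2."""
    t = np.asarray(targets, dtype=float)
    y = np.asarray(outputs, dtype=float)
    P, N = t.shape                      # P patterns, N output units
    return float(np.sum((t - y) ** 2) / (N * P))

t = [[1.0, 0.0],
     [0.0, 1.0]]                        # 2 patterns, 2 output units
y = [[0.8, 0.2],
     [0.1, 0.7]]
print(sum_squares_error(t, y))          # 0.045
```

The 1/(NP) scaling follows the equation above; additional penalty terms for model complexity, as mentioned in the text, would simply be added to this value.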
There is an input layer, an output layer, and one or more hidden layers between the input and the output layer. Each unit receives its inputs directly from the previous layer (except for the input units) and sends its output directly to units in the next layer (except for the output units). Unlike the recurrent network, which contains feedback connections, there are no connections from any of the units to the inputs of previous layers, nor to other units in the same layer, nor to units more than one layer ahead. Every unit acts only as an input to the next immediate layer. Obviously, this class of networks is easier to analyse theoretically than other general topologies, because their outputs can be represented as explicit functions of the inputs and the weights.

An example of a layered network with one hidden layer is shown in Figure 3.8. In this network there are l inputs, m hidden units, and n output units. The output of the jth hidden unit is obtained by first forming a weighted linear combination of the l input values, then adding a bias,

    a_j = Σ_{i=1}^{l} w_{ji}^{(1)} x_i + w_{j0}^{(1)}        (3.30)

where w_{ji}^{(1)} is the weight from input i to hidden unit j in the first layer and w_{j0}^{(1)} is the bias for hidden unit j. If we consider the bias terms as being weights from an extra input x_0 = 1, equation (3.30) can be rewritten in the form

    a_j = Σ_{i=0}^{l} w_{ji}^{(1)} x_i        (3.31)

The activation of hidden unit j can then be obtained by transforming the linear sum using an activation function g(x):

    h_j = g(a_j)        (3.32)

The outputs of the network can be obtained by transforming the activations of the hidden units using a second layer of processing units.
For each output unit k, we first form the linear combination of the outputs of the hidden units,

    a_k = Σ_{j=1}^{m} w_{kj}^{(2)} h_j + w_{k0}^{(2)}        (3.33)

Again, we can absorb the bias and rewrite the above equation as

    a_k = Σ_{j=0}^{m} w_{kj}^{(2)} h_j        (3.34)

Next, by applying the activation function g_2(x) to equation (3.34) we obtain the kth output

    y_k = g_2(a_k)        (3.35)

Combining equations (3.31), (3.32), (3.34) and (3.35) we obtain the complete representation of the network as

    y_k = g_2( Σ_{j=0}^{m} w_{kj}^{(2)} g( Σ_{i=0}^{l} w_{ji}^{(1)} x_i ) )        (3.36)

The network of Figure 3.8 is a network with one hidden layer. We can easily extend it to a network with two or more hidden layers, as long as we continue with the above transformation. One thing to note is that the input units are very special units: they are hypothetical units that produce outputs equal to their supposed inputs. These input units do no processing.

(a) Backpropagation

Back-propagation is the most commonly used method for training multi-layer feed-forward networks. It can be applied to any feed-forward network with differentiable activation functions. The technique was popularised by Rumelhart, Hinton and Williams (Rumelhart et al., 1986). For most networks, the learning process is based on a suitable error function, which is then minimised with respect to the weights and biases. If a network has differentiable activation functions, then the activations of the output units become differentiable functions of the input variables, the weights and the biases. If there is a differentiable error function of the network outputs, such as the sum-of-squares error function, then the error function itself is a differentiable function of the weights. Therefore, we can evaluate the derivative of the error with respect to the weights, and then, by using either the popular gradient descent or other optimisation methods, we can use these derivatives to find weights that minimise the error function.
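The forward computation of equation (3.36) can be sketched directly in code. The following is an illustrative Python/NumPy version with randomly chosen weights; using the sigmoid for both g and g_2 is an assumption for the example, since the text leaves g_2 general:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, W2, g=sigmoid, g2=sigmoid):
    """Forward pass of the one-hidden-layer network, equation (3.36).
    Biases are absorbed as weights from the extra units x0 = 1 and h0 = 1,
    as in equations (3.31) and (3.34)."""
    x = np.concatenate(([1.0], x))   # prepend x0 = 1
    h = g(W1 @ x)                    # equations (3.31) and (3.32)
    h = np.concatenate(([1.0], h))   # prepend h0 = 1
    return g2(W2 @ h)                # equations (3.34) and (3.35)

# Random illustrative weights: l = 2 inputs, m = 3 hidden units, n = 1 output
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # rows: hidden units, columns: bias + inputs
W2 = rng.normal(size=(1, 4))   # rows: outputs, columns: bias + hidden units
y = forward(np.array([0.5, -0.2]), W1, W2)
print(y)   # a single sigmoid output in (0, 1)
```

Writing the bias as an extra unit with fixed activation 1, as in equations (3.31) and (3.34), lets each layer be computed as a single matrix-vector product.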
The algorithm for evaluating the derivative of the error function is known as backpropagation, because it propagates the errors backwards through the network.

(b) Error Function Derivative Calculation

We consider a general feed-forward network with arbitrary differentiable nonlinear activation functions and a differentiable error function. From Section 3.3.4.9 (a), we know that the input to each unit j is obtained by first forming a weighted sum of its inputs of the form

    a_j = Σ_i w_{ji} z_i        (3.37)

where z_i is the activation of a unit or an input. We then apply the activation function

    z_j = g(a_j)        (3.38)

Note that one or more of the variables z_i in equation (3.37) could be an input, in which case we will denote it by x_i. Similarly, the unit j in equation (3.38) could be an output unit, which we will denote by y_k. The error function will be written as a sum, over all the patterns in the training set, of an error defined for each pattern separately,

    E = Σ_p E_p,   E_p = E(Y; W)        (3.39)

where p indexes the patterns, Y is the vector of outputs, and W is the vector of all weights. E_p can be expressed as a differentiable function of the output variables y_k.

The goal is to find a way to evaluate the derivatives of the error function E with respect to the weights and biases. Using equation (3.39) we can express these derivatives as sums, over the training set patterns, of the derivatives for each pattern separately. For each pattern (with all its inputs) we can obtain the activations of all hidden and output units in the network by successive application of equations (3.37) and (3.38). This process is called forward propagation or the forward pass. Once we have the activations of all the outputs, together with the target values, we can calculate the full expression of the error function E_p. Now consider the evaluation of the derivative of E_p with respect to some weight w_{ji}.
Application of the chain rule results in the partial derivative

    ∂E_p/∂w_{ji} = (∂E_p/∂a_j)(∂a_j/∂w_{ji}) = δ_j z_i        (3.40)

where we define

    δ_j = ∂E_p/∂a_j        (3.41)

From equation (3.40) it is easy to see that the derivative can be obtained by multiplying the value of δ (for the unit at the output end of the weight) by the value of z (for the unit at the input end). The task is now to find the value of δ_j for each of the hidden and output units in the network. For the output units, δ_k is very straightforward:

    δ_k = ∂E_p/∂a_k = g′(a_k) ∂E_p/∂y_k        (3.42)

For a hidden unit, δ_j is obtained indirectly. Hidden units can influence the error only through their effects on the units k to which they send output connections. δ_j is obtained using the equation below,

    δ_j = ∂E_p/∂a_j = Σ_k (∂E_p/∂a_k)(∂a_k/∂a_j)        (3.43)

The first factor is simply the δ_k of unit k, so we can write the equation as

    δ_j = Σ_k δ_k (∂a_k/∂a_j)        (3.44)

For the second factor, we know that if unit j connects directly to unit k then ∂a_k/∂a_j = g′(a_j) w_{kj}; otherwise it is zero. In this way, we can form the following backpropagation formula,

    δ_j = g′(a_j) Σ_k w_{kj} δ_k        (3.45)

which means that the value of δ for a particular hidden unit can be obtained by propagating the δ's backwards from units later in the network, as shown in Figure 3.11. By applying the equation recursively we obtain the values of δ for all of the hidden units in a feed-forward network, no matter how many layers it has.

Figure 3.11: Backward propagation of the δ's through the weights w_{ji} and w_{kj}

(c) Weight Adjustment with the Gradient Descent Method

Once we obtain the derivatives of the error function with respect to the weights, we can use them to update the weights so as to decrease the error. There are many varieties of gradient-based optimisation algorithms based on these derivatives. One of the simplest of such algorithms is called gradient descent or steepest descent.
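Equations (3.40)–(3.45) translate into code directly. The sketch below is an illustrative Python/NumPy version for a one-hidden-layer sigmoid network on a single pattern, assuming the error E_p = ½ Σ (y − t)² so that ∂E_p/∂y = y − t (the thesis leaves the error function general):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_gradients(x, t, W1, W2):
    """Gradients of the single-pattern error E_p = 0.5 * sum (y - t)^2 for a
    one-hidden-layer sigmoid network, via equations (3.40)-(3.45)."""
    # Forward pass (equations 3.31-3.35); biases absorbed as extra units
    x = np.concatenate(([1.0], x))
    a1 = W1 @ x
    z = sigmoid(a1)
    h = np.concatenate(([1.0], z))
    y = sigmoid(W2 @ h)
    # Output deltas, equation (3.42): delta_k = g'(a_k) * dE/dy_k,
    # with g'(a) = y(1 - y) for the sigmoid and dE/dy = (y - t)
    delta_out = y * (1.0 - y) * (y - t)
    # Hidden deltas, equation (3.45): delta_j = g'(a_j) * sum_k w_kj delta_k;
    # the bias column of W2 feeds no hidden unit and is dropped
    delta_hid = z * (1.0 - z) * (W2[:, 1:].T @ delta_out)
    # Equation (3.40): dE/dw_ji = delta_j * z_i
    return np.outer(delta_hid, x), np.outer(delta_out, h)

x = np.array([0.3]); t = np.array([1.0])
rng = np.random.default_rng(1)
W1 = rng.normal(size=(2, 2))   # 2 hidden units, 1 input (+bias)
W2 = rng.normal(size=(1, 3))   # 1 output, 2 hidden units (+bias)
gW1, gW2 = backprop_gradients(x, t, W1, W2)
print(gW1.shape, gW2.shape)    # (2, 2) (1, 3)
```

The gradients computed this way can be checked against finite differences of the error, which is a standard sanity test for a backpropagation implementation.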
With this algorithm, the weights are updated in the direction in which the error E decreases most rapidly (along the negative gradient). The process of weight updating begins with an initial guess for the weights (which may be chosen randomly), and then generates a sequence of weight changes using the following formula,

    Δw_{ji}^{(τ+1)} = −η ∂E/∂w_{ji}        (3.46)

where η is a small positive number called the learning rate, which determines the size of the next step. Gradient descent only tells us the direction in which to move; the step size, or learning rate, needs to be determined as well. Setting the learning rate too low results in slow development of the network, while too high a learning rate will lead to oscillation. One way to avoid oscillation for large values of η is to make the weight change dependent on the past weight change by adding a momentum term,

    Δw_{ji}^{(τ+1)} = −η ∂E/∂w_{ji} + α Δw_{ji}^{(τ)}        (3.47)

That is, the weight change is a combination of a step down the negative gradient plus a fraction α of the previous weight change, where 0 ≤ α < 1 and typically 0 ≤ α ≤ 0.9 (Reed and Mark, 1999). The roles of the learning rate and the momentum term are shown in Figure 3.12 (Patrick and Smagt, 1996). When no momentum term is used, a low learning rate typically results in a long wait before the minimum is reached (a), whereas for large learning rates the minimum may never be reached because of oscillation (b). When a momentum term is added, the minimum is reached faster (c).

Figure 3.12: The descent versus learning rate and momentum

There are two basic weight-update variations: batch learning and incremental learning. With batch learning, the weights are updated using all of the training data. The following loop is repeated: a) process all the training data; b) update the weights. Each loop through the training set is called an epoch. With incremental learning, the weights are updated separately for each sample.
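The momentum update of equation (3.47) can be sketched as follows. This is an illustrative Python/NumPy example applied to a toy quadratic error surface, not to the oil palm network; `grad_fn` stands in for whatever routine (e.g. backpropagation) supplies ∂E/∂w:

```python
import numpy as np

def gradient_descent(grad_fn, w0, eta=0.1, alpha=0.9, n_epochs=500):
    """Batch gradient descent with a momentum term (equation 3.47):
    each weight change is -eta * dE/dw plus alpha times the previous change."""
    w = np.asarray(w0, dtype=float)
    dw = np.zeros_like(w)
    for _ in range(n_epochs):
        dw = -eta * grad_fn(w) + alpha * dw   # equation (3.47)
        w = w + dw
    return w

# Toy error surface E(w) = w1^2 + w2^2, whose gradient is 2w and minimum is at 0
w_final = gradient_descent(lambda w: 2.0 * w, [1.0, -2.0])
print(w_final)   # both components approach 0
```

Setting `alpha=0.0` recovers plain gradient descent, equation (3.46); the batch/incremental distinction in the text is only a matter of whether `grad_fn` sums the gradient over all patterns or one pattern at a time.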
The following loop is repeated: a) process one sample from the training data; b) update the weights.

3.3.5 RESPONSE SURFACE ANALYSIS

Response surface analysis has been widely used in oil palm research. The purpose of applying this technique is to determine the optimum levels of fertiliser usage in order to optimise oil palm yield. Although it has been common practice to use response surface analysis in modelling oil palm yield, this study focuses on the experimental region: conclusions cannot be drawn if the stationary point is a saddle point. Hence, this study proposes the use of ridge analysis to offer an alternative solution to this problem.

3.3.5.1 Introduction

Response surface analysis (RSA), also known as response surface methodology (RSM), is a collection of statistical and mathematical techniques useful for developing, improving and optimising processes (Myers and Montgomery, 1995). It also has important applications in the design, development and formulation of new products, as well as in the improvement of existing product designs. Usually, RSA applications deal with several input variables that are assumed to potentially influence some performance measure or output of a product or process, such as oil palm yield in agronomy. The input variables are sometimes called independent variables, and they are subject to the control of the researcher or scientist (at least for the purposes of an experiment).

A common aim in experimental design analysis is to find the influence of the various factors on the variables analysed. Response surface analysis retains that aim, but the emphasis extends to finding the particular treatment combination that causes the maximum or minimum response in yield or in the variables analysed (Anderson and McClean, 1974).
In general, the response surface will be a nonlinear function, which may be sufficiently approximated by a quadratic polynomial in a small region near the optimum operating condition. It should be remembered that just because a higher order effect is statistically significant, it does not follow that the effect is of practical importance. It may be possible to ignore statistically significant higher order effects if they have little practical effect on the estimated response surface (Christensen, 2001).

3.3.5.2 Response Surface: First Order

Suppose that the scientist or experimenter is concerned with maximising oil palm yield, involving a response y that depends on the controllable variables ζ_1, ζ_2, …, ζ_p. The relationship can be defined as

    y = f(ζ_1, ζ_2, …, ζ_p) + ε        (3.48)

where the form of the true response function f is unknown and may be very complicated, and ε is a term that represents other sources of variability not accounted for in f. Therefore, ε includes effects such as measurement error on the response, the effects of other variables, and so on. We often treat ε as a statistical error having a normal distribution with mean zero and variance σ². If so, then the expectation of y is

    E(y) = E[f(ζ_1, ζ_2, …, ζ_p)] + E(ε) = f(ζ_1, ζ_2, …, ζ_p)        (3.49)

The variables ζ_1, ζ_2, …, ζ_p are usually called natural variables (Myers and Montgomery, 1995), because they are expressed in their original units. In RSA work it is convenient to transform the natural variables to coded variables x_1, x_2, …, x_p, where these coded variables are usually defined to be dimensionless with zero mean and the same standard deviation. Equation (3.49) can now be written as

    f(x) = f(x_1, x_2, …, x_p)        (3.50)

Because the form of the true function f is unknown, we must make some approximation. Usually, a low order polynomial in some relatively small region of the independent variable space is appropriate.
If there is no interaction among the independent variables, the model is known as the main effects model, because it includes only the main effects of the variables. If the interaction among the independent variables is significant, the interaction terms can easily be added to the model of equation (3.50) as

    f(x) = f(x_1, x_2, …, x_p, x_1x_2, …, x_{p−1}x_p)        (3.51)

Equations (3.50) and (3.51) are also known as first order models, because they include only the main effects and interactions of polynomial order one. The first order Taylor approximation about the centre vector (x = 0) of the data is written as

    f(x) = f(0) + Σ_{i=1}^{p} [∂f(x)/∂x_i]_{x=0} x_i = f(0) + x′df(0)        (3.52)

where df(0) is the vector of partial derivatives ∂f(x)/∂x_i evaluated at the vector x = 0. We do not know f(x), so we do not know the partial derivatives; they are simply unknown values. We first identify

    β_0 = f(0),  β_1 = [∂f(x)/∂x_1]_{x=0}, …,  β_p = [∂f(x)/∂x_p]_{x=0}        (3.53)

and then we can write the equation as

    f(x) = β_0 + Σ_{j=1}^{p} β_j x_j = β_0 + x′β,  where β = (β_1, …, β_p)′        (3.54)

Applying equation (3.48) to each observation i = 1, …, n gives

    y_i = β_0 + Σ_{j=1}^{p} β_j x_ij + ε_i        (3.55)

Often, in real cases, the curvature of the true response surface is strong enough that the first order model (even with the interaction terms included) is inadequate. A second order model will likely be required in these situations. A second order model can be expressed as

    f(x) = f(x_1, x_2, …, x_p, x_1x_2, x_1², x_2², …, x_{p−1}x_p)        (3.56)

Myers and Montgomery (1995) discussed the first and second order models at length. They noted that the second order model is widely used in RSA research for three reasons:

(i) The second order model is very flexible; it can take on a wide variety of functional forms.
(ii) It is easy to estimate the parameters of the second order model (for example, by using the least squares method).
(iii) There is considerable practical experience indicating that second order models work well in solving real response surface problems.

3.3.5.3 Response Surface: Second Order

In response surface analysis work it is assumed that the true functional relationship

    y = f(x, β)        (3.57)

is, in fact, unknown. Here the variables x_1, x_2, …, x_p are in centred and scaled design units. The genesis of the first order approximating model, or the model that contains first order terms and low order interaction terms, is the notion of the Taylor series approximation of equation (3.52). In general, second order response surface models can be written as

    y = β_0 + β_1x_1 + β_2x_2 + … + β_px_p + β_11x_1² + … + β_ppx_p² + β_12x_1x_2 + β_13x_1x_3 + … + β_{p−1,p}x_{p−1}x_p + ε        (3.58)

where the β's are unknown parameters and ε is a random error. The second order Taylor approximation about the centre vector x = 0 of the data is written as

    f(x) = f(0) + x′df(0) + (1/2) x′[d²f(0)]x        (3.59)

where df(0) was defined previously, and d²f(0) is the p × p matrix of second partial derivatives evaluated at the vector x = 0; the element of d²f(0) in the ith row and jth column is ∂²f(x)/∂x_i∂x_j evaluated at x = 0. Again, we do not know f(x), so we do not know the derivatives, and we must write

    f(x) = β_0 + x′β + x′Bx        (3.60)

where β = (β_1, …, β_p)′ = df(0) as before. We can now define B as

        ⎡ b_11  b_12/2  …  b_1p/2 ⎤
    B = ⎢       b_22    …  b_2p/2 ⎥ = (1/2) d²f(0)        (3.61)
        ⎣ sym.              b_pp  ⎦

With this definition of B, the approximation becomes

    f(x) = β_0 + Σ_{j=1}^{p} β_j x_j + Σ_{j=1}^{p} Σ_{k≥j} β_jk x_j x_k        (3.62)

Applying equation (3.48) to each observation gives

    y_i = β_0 + Σ_{j=1}^{p} β_j x_ij + Σ_{j=1}^{p} Σ_{k≥j} β_jk x_ij x_ik + ε_i        (3.63)

This approximation is a multiple linear regression, which we already know how to fit.
3.3.5.4 Stationary Point

Consider the fitted second order polynomial

    ŷ = b_0 + x′b + x′B̂x        (3.64)

where b_0, b and B̂ contain the estimates of the intercept, the linear coefficients and the second order coefficients, respectively. Here x′ = [x_1, x_2, …, x_p], b′ = [b_1, b_2, …, b_p], and B̂ is the p × p symmetric matrix

         ⎡ b_11  b_12/2  …  b_1p/2 ⎤
    B̂ =  ⎢       b_22    …  b_2p/2 ⎥        (3.65)
         ⎣ sym.              b_pp  ⎦

Differentiating ŷ in equation (3.64) with respect to x gives

    ∂ŷ/∂x = b + 2B̂x        (3.66)

where b is the estimate of the linear coefficients and B̂ is the estimate of the second order coefficients. Setting the derivative equal to 0, we can solve for the stationary point x_s of the system:

    x_s = −B̂⁻¹b / 2        (3.67)

The nature of the stationary point is determined from the signs of the eigenvalues of the matrix B̂, and it turns out that the relative magnitudes of these eigenvalues are helpful in the overall interpretation. For example, let the p × p matrix G be the matrix whose columns are the normalised eigenvectors associated with the eigenvalues of B̂. We know that G′B̂G = Λ, where Λ is a diagonal matrix containing the eigenvalues of B̂ as its main diagonal elements. If we translate the model of equation (3.64) to a new centre, namely the stationary point, and rotate the axes to correspond to the principal axes of the contour system, we have

    v = x − x_s  and  w = G′v        (3.68)

The translation gives

    ŷ = b_0 + (v + x_s)′b + (v + x_s)′B̂(v + x_s) = ŷ_s + v′B̂v        (3.69)

and the rotation gives

    ŷ = ŷ_s + w′G′B̂Gw = ŷ_s + w′Λw        (3.70)

The w-axes are the principal axes of the contour system. Equation (3.70) can be written as

    ŷ = ŷ_s + Σ_{i=1}^{p} λ_i w_i²        (3.71)

where ŷ_s is the estimated response at the stationary point, and λ_1, λ_2, λ_3, …, λ_p are the eigenvalues of B̂. The variables w_1, w_2, w_3, …, w_p are known as canonical variables.
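Equations (3.66)–(3.67) and the eigenvalue analysis can be sketched in a few lines. The coefficients below are hypothetical values for two coded variables, chosen for illustration only (this is not output from the study's fitted surface):

```python
import numpy as np

def stationary_point(b, B):
    """Stationary point x_s = -B^{-1} b / 2 of the fitted second order
    surface y = b0 + x'b + x'Bx (equation 3.67)."""
    return -0.5 * np.linalg.solve(B, b)

def nature_of(B):
    """Classify the stationary point from the eigenvalue signs of B."""
    eig = np.linalg.eigvalsh(B)          # B is symmetric
    if np.all(eig < 0):
        return "maximum"
    if np.all(eig > 0):
        return "minimum"
    return "saddle point"

# Hypothetical fitted coefficients for two coded variables (illustration only)
b = np.array([2.0, -1.0])
B = np.array([[-1.0, 0.25],
              [0.25, -2.0]])
xs = stationary_point(b, B)
print(xs, nature_of(B))   # the gradient b + 2Bx vanishes at xs
```

Solving the linear system with `np.linalg.solve` rather than forming B̂⁻¹ explicitly is the standard numerically stable way to evaluate equation (3.67).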
The signs of the λ values determine the nature of the stationary point, and the relative magnitudes of the eigenvalues help us to gain a better understanding of the response system. If all the λ values are negative, the stationary point is a point of maximum response. If all λ values are positive, the stationary point is a point of minimum response, and if the λ values have mixed signs, the stationary point is a saddle point (Myers and Montgomery, 1995; Christensen, 2001).

3.3.5.5 Ridge Analysis

The main purpose of ridge analysis is to ensure that the stationary point is inside the experimental region. The output of the analysis is the set of coordinates of the maxima (or minima), along with the predicted response ŷ at each computed point on the path. This analysis provides useful information regarding the roles of the design variables inside the experimental region. Ridge analysis may provide some guidelines regarding where future experiments should be conducted in order to achieve more desirable conditions. However, ridge analysis is generally used when the practitioner feels that the point is near the region of the optimum.

Consider maximising ŷ in equation (3.64) subject to the constraint x′x = H², where x′ = [x_1, x_2, …, x_p] and the centre of the design region is taken to be x_1 = x_2 = … = x_p = 0. Using Lagrange multipliers, we differentiate

    J = b_0 + x′b + x′B̂x − κ(x′x − H²)

with respect to the vector x. The derivative of J with respect to x is ∂J/∂x = b + 2B̂x − 2κx, and the constrained stationary point is determined by setting ∂J/∂x = 0. This gives the result

    (B̂ − κI)x = −(1/2)b        (3.72)

As a result, for any fixed value of κ, a solution x of equation (3.72) is a stationary point on the radius H = (x′x)^{1/2}. However, the appropriate solution x is the one that results in a maximum of ŷ on H or a minimum of ŷ on H, depending on which is desired. The appropriate choice of κ depends on the eigenvalues of the B̂ matrix.
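Tracing the ridge path of equation (3.72) amounts to solving one linear system per value of κ. The sketch below is illustrative Python/NumPy with hypothetical coefficients (not the study's fitted surface); it uses the standard rule that κ above the largest eigenvalue of B̂ yields the path of maximum predicted response:

```python
import numpy as np

def ridge_path(b, B, kappas):
    """For each kappa, solve (B - kappa*I) x = -b/2 (equation 3.72) and
    return (kappa, radius H = sqrt(x'x), x) along the ridge path."""
    p = len(b)
    path = []
    for kappa in kappas:
        x = np.linalg.solve(B - kappa * np.eye(p), -0.5 * np.asarray(b))
        path.append((kappa, float(np.sqrt(x @ x)), x))
    return path

# Hypothetical fitted second order coefficients (illustration only)
b = np.array([2.0, -1.0])
B = np.array([[-1.0, 0.25],
              [0.25, -2.0]])
lam_max = np.linalg.eigvalsh(B).max()

# kappa above the largest eigenvalue traces the maximum-response path;
# the radius H shrinks monotonically as kappa grows
path = ridge_path(b, B, lam_max + np.array([0.5, 1.0, 2.0]))
for kappa, H, x in path:
    print(round(kappa, 3), round(H, 3), np.round(x, 3))
```

In practice one would evaluate ŷ at each point on the path and report the coordinates and predicted response for radii up to the boundary of the experimental region.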
Myers and Montgomery (1995) provided important rules for selecting the value of κ:

(i) If κ exceeds the largest eigenvalue of B̂, the solution x of equation (3.72) will result in an absolute maximum of ŷ on H = (x′x)^{1/2}.
(ii) If κ is smaller than the smallest eigenvalue of B̂, the solution x of equation (3.72) will result in an absolute minimum of ŷ on H = (x′x)^{1/2}.

Appendix E provides some mathematical insight into (i) and (ii) above. We also examine the relationship between H and κ. The analyst desires to observe results on a locus of points, so the solution of equation (3.72) should fall in the interval [0, H_b], where H_b is a radius approximately representing the boundary of the experimental region. The value of H is actually controlled through the choice of κ. In the working region of κ, namely κ > λ_m or κ < λ_1 (where λ_1 is the smallest eigenvalue of B̂ and λ_m is the largest eigenvalue of B̂), H is a monotonic function of κ.

3.3.5.6 Estimate of the Standard Error of the Predicted Response

We now consider the equation

    ŷ(x) = x(Ω)′b        (3.73)

where b = (X′X)⁻¹X′y, x(Ω) is a function of the location at which one is predicting the response, and the Ω notation indicates 'model space'; that is, the vector reflects the form of the model as X does. From the results of multiple linear regression, with the assumption of constant error variance σ², we have

    Var ŷ(x) = x(Ω)′(X′X)⁻¹x(Ω) σ²        (3.74)

An estimated standard error of ŷ(x) is given by

    s_ŷ = s √( x(Ω)′(X′X)⁻¹x(Ω) )        (3.75)

where s is the root mean square error of the fitted response surface. Equation (3.75) is then used to determine the confidence limits around a predicted response. The 100(1−α)% confidence interval on the mean response E(y|x) is given by

    ŷ(x) ± t_{α/2, n−p} s √( x(Ω)′(X′X)⁻¹x(Ω) )        (3.76)

3.4 SUMMARY

The methodology of this research can be categorised as exploratory research.
It explores the potential of various nonlinear growth models and the use of heuristic methods to model three types of data. In this study, three types of data set are analysed using different approaches. A summary of the data analysed is given in Table 3.2. This chapter also describes the functions used to measure the goodness of fit for each model tested.

Table 3.2: Summary of the data set types and research approaches considered in this study

    Type of data                    Research approach
    1. Oil palm growth data         1. Nonlinear growth model
    2. Foliar composition           1. Multiple linear regression
                                    2. Robust M-regression
                                    3. Neural network model
    3. Fertiliser treatment data    1. Response surface analysis

CHAPTER 4

MODELLING OIL PALM YIELD USING NONLINEAR GROWTH MODEL

4.1 INTRODUCTION

Predicting the future growth and yield of oil palm is an essential part of the planning process for the oil palm industry. Growth is measured as the change in some characteristic (weight, basal area, volume, etc.) over some specified amount of time, while yield is the amount of some characteristic that can be harvested per period. Growth may also be defined as an increase in size, number, frequency and so on; as in the growth of trade, the growth of power, the growth of yield, etc. Growth can be something which has grown or is growing: anything produced, a product, consequence, effect or result. It is also defined as the rate of increase in size per unit time (Alder, 1980; Seber and Wild, 1988; Garcia, 1988; 1989; Meade and Islam, 1995; Lei and Zhang, 2004). Generally, the maximum amount of output that oil palm trees can yield at any time is the growth that has accumulated up to that time. Growth and yield can be measured in physical units or in value. A growth model predicts future values of certain outputs (Garcia, 1993), for example, timber volume, crop yield, population growth, etc. The inputs and outputs are functions of time.
Alder (1980) stated that growth models attempt to predict directly the course over time of the quantities of interest (volumes, weights, mean diameter, etc.). Seber and Wild (1988) defined an empirical growth curve as a scatterplot of some measure of the size of an object or individual against time, T. Growth curves or growth models have been widely used in modelling yield, and many studies have investigated the pattern of yield growth models for crops. In Chapter 2, we gave detailed reviews of the application of growth models for predicting crop yield. Meanwhile, Zuhaimy et al. (2003) and Azme and Zuhaimy (2004) studied tobacco yield in Malaysia. Growth curves have been used to model and forecast development in many different fields, such as economic growth (Bass, 1969; Oliver, 1970; Heeler and Hustad, 1980), marketing and sales development (Chadda and Chitgopekar, 1971; Meade, 1984; Rao, 1985; Meade, 1985; Amstrong et al., 1987; Bewley and Fiebig, 1988; Lee et al., 1992; Meade and Islam, 1995), human populations (Meade, 1988) and transportation growth (Tanner, 1978).

In this chapter, we present our work on employing growth models to predict oil palm yield at any time, T. There are similarities in characteristics between the growth model and oil palm yield: both have inflection points (where the growth rate is maximum) and a stable or saturated level, and both fit a sigmoid or concave curve shape. It is therefore possible to use growth models in modelling oil palm yield. The growth model methodology described in Chapter 3 has been widely used to model plant growth. Since the growth of living things is normally nonlinear (Richards, 1969), it is reasonable to explore the use of nonlinear growth models for oil palm yield. In this chapter a nonlinear growth model is developed, and the use of the partial derivatives of twelve nonlinear growth models is proposed.
Growth studies in many branches of science have demonstrated that more complex nonlinear functions are justified and required if the range of the independent variable encompasses the juvenile, adolescent, mature and senescent stages of growth (Philip, 1994). In the oil palm industry, there are only a few theoretical models formulated specifically for oil palm applications. The modelling of growth in other disciplines and applications has revealed considerable potential for the modelling of fresh fruit bunch growth and oil palm yield. This is partly attributed to the fact that the statistical methodology used for fitting nonlinear models to oil palm yield growth data is closely related to the mathematics of the models and has not yet been explored.

4.2 THE NONLINEAR MODEL

For a nonlinear regression model

φi = f(Ti, β) + εi,  i = 1, 2, …, 19, (4.1)

where φ is the response variable, T is the independent variable, β is the vector of parameters βj to be estimated (β1, β2, …, βk), εi is a random error term and k is the number of unknown parameters. Some of the nonlinear models explored in this study are as follows:

1. Logistic: φ(t) = α/(1 + β exp(−κT)) + ε
2. Gompertz: φ(t) = α exp(−β exp(−κT)) + ε
3. Von Bertalanffy: φ(t) = [α^(1−δ) − β exp(−κT)]^(1/(1−δ)) + ε
4. Negative exponential: φ(t) = α(1 − exp(−κT)) + ε
5. Monomolecular: φ(t) = α(1 − β exp(−κT)) + ε
6. Log-logistic: φ(t) = α/(1 + β exp(−κ ln(T))) + ε
7. Richard's: φ(t) = α/[1 + β exp(−κT)]^(1/δ) + ε
8. Weibull: φ(t) = α − β exp(−κT^δ) + ε
9. Schnute: φ(t) = (α + β exp(κT))^δ + ε
10. Morgan-Mercer-Flodin: φ(t) = (βγ + αT^δ)/(γ + T^δ) + ε
11. Chapman-Richards: φ(t) = α(1 − β exp(−κT))^(1/(1−δ)) + ε
12. Stannard: φ(t) = α[1 + exp(−(β + κT)/δ)]^(−δ) + ε

where α, β, κ and δ are unknown parameters, ε is the random error and T is time.
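As an illustration of how a model from the list above is fitted in practice, the sketch below fits the logistic model to a yield-age series with a Levenberg-Marquardt least-squares routine. This is a Python/SciPy sketch for illustration only; the thesis itself uses the SAS NLIN procedure. The data array reuses the actual FFB yield series tabulated later in Table 4.8.

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative sketch only: the thesis fits these models with SAS PROC NLIN;
# here a Levenberg-Marquardt least-squares routine plays the same role.
def logistic(T, alpha, beta, kappa):
    """Logistic growth model: phi(T) = alpha / (1 + beta*exp(-kappa*T))."""
    return alpha / (1.0 + beta * np.exp(-kappa * T))

T = np.arange(1.0, 20.0)  # palm age, years 1..19
y = np.array([11.78, 18.43, 25.21, 30.78, 33.03, 35.66, 36.96, 37.97, 38.04,
              39.20, 36.50, 37.21, 39.97, 38.45, 33.65, 34.71, 37.75, 32.81, 37.99])

# Starting values as in Section 4.4: alpha0 = 27.7, beta0 = 0.55, kappa0 = 1.25.
popt, pcov = curve_fit(logistic, T, y, p0=[27.7, 0.55, 1.25])
alpha, beta, kappa = popt  # the thesis reports 37.0806, 4.8148, 0.7817
```

The fitted values agree closely with the SAS estimates reported in Section 4.4, since both routines minimise the same residual sum of squares.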
Nonlinear models are more difficult to specify and to estimate than linear models, as the solutions are determined using an iterative procedure (Ratkowsky, 1983; Draper and Smith, 1981), as discussed in Chapter 3. This study explores the use of the partial derivatives of twelve nonlinear growth models and demonstrates the method of parameter estimation using experimental oil palm yield growth data. The nonlinear procedure (NLIN) in the Statistical Analysis System (SAS) will be employed to estimate the models' parameters (SAS, 1992).

4.3 THE METHOD OF ESTIMATION

The estimators of the βj's are found by minimising the sum of squares error (SSerr) function (equation 3.4), which can be rewritten as

SSerr = Σ_{i=1}^{19} [φi − f(Ti, β)]² (4.2)

under the assumption that the εi are normal and independent with mean zero and common variance σ². Since the φi and Ti are fixed observations, the sum of squares error is a function of β. Least squares estimates of β are the values which, when substituted into equation (4.2), make SSerr a minimum; they are found by differentiating equation (4.2) with respect to each parameter and setting the result to zero. This provides the k normal equations that must be solved for β̂. These normal equations take the form

Σ_{i=1}^{19} {φi − f(Ti, β)} [∂f(Ti, β)/∂βj] = 0 (4.3)

for j = 1, 2, ..., k. When the model is nonlinear in the parameters, the normal equations cannot be solved directly. Consequently, for the nonlinear models considered in this study, it is impossible to obtain a closed-form solution for the least squares estimates of the parameters by solving the k normal equations in equation (4.3). Hence an iterative method must be employed to minimise SSerr (Draper and Smith, 1981; Ratkowsky, 1983; Gallant, 1989). The Marquardt iterative method is an estimation method which represents a compromise between the Gauss-Newton method and the steepest descent method.
It combines the best features of both while avoiding their most serious limitations (Seber and Wild, 1983). Owing to this characteristic we decided to use the Marquardt method. The Marquardt iterative method requires specification of the names and starting values of the parameters to be estimated, a single dependent variable, and the partial derivatives of the model with respect to each parameter (SAS, 1992). The usual statistical tests appropriate for the general linear model are, in general, not appropriate when the model is nonlinear: one cannot simply use the F statistic to draw conclusions at any stated level of significance (Draper and Smith, 1981; Ratkowsky, 1983). This study therefore considers several procedures to test the goodness of fit of the nonlinear models, such as confidence intervals of the parameter estimates, the asymptotic correlation matrix and residual analysis. A normal probability plot was also produced. In addition, we consider four measurements commonly used in model fitting, namely the mean squares error (MSE), the mean absolute error (MAE), the correlation coefficient r between actual and estimated values, and the mean absolute percentage error (MAPE), as described in Section 1.6.3.

4.4 PARTIAL DERIVATIVES FOR THE NONLINEAR MODEL

Let the symbols of the parameters α, β, κ and δ in the nonlinear models be replaced by the new symbols α0, α1, α2 and α3 respectively. The parameters of all the models considered here are defined as follows: α0 is the asymptote or the potential maximum of the response variable; α1 is the biological constant; α2 is the parameter governing the rate at which the response variable approaches its potential maximum; and α3 is the allometric constant. The partial derivatives of the models with respect to each parameter (∂φ/∂αj) are given in Table 4.1 to Table 4.4.
The NLIN procedure in SAS (1992) requires that the integral form(s) and the partial derivatives of the nonlinear models be entered in the program using valid SAS syntax.

Table 4.1: Partial derivatives of the Logistic, Gompertz and von Bertalanffy growth models

Logistic: φ(t) = α0/(1 + α1 exp(−α2t)) + ε
∂φ/∂α0 = 1/(1 + α1 exp(−α2t))
∂φ/∂α1 = −α0 exp(−α2t)/(1 + α1 exp(−α2t))²
∂φ/∂α2 = α0 α1 t exp(−α2t)/(1 + α1 exp(−α2t))²

Gompertz: φ(t) = α0 exp(−α1 exp(−α2t)) + ε
∂φ/∂α0 = exp(−α1 exp(−α2t))
∂φ/∂α1 = −α0 exp(−α1 exp(−α2t)) exp(−α2t)
∂φ/∂α2 = α0 α1 t exp(−α1 exp(−α2t)) exp(−α2t)

Von Bertalanffy: φ(t) = [α0^(1−α3) − α1 exp(−α2t)]^(1/(1−α3)) + ε. Writing u = α0^(1−α3) − α1 exp(−α2t),
∂φ/∂α0 = α0^(−α3) u^(1/(1−α3) − 1)
∂φ/∂α1 = (−exp(−α2t)/(1 − α3)) u^(1/(1−α3) − 1)
∂φ/∂α2 = (α1 t exp(−α2t)/(1 − α3)) u^(1/(1−α3) − 1)
∂φ/∂α3 = u^(1/(1−α3)) [ln(u)/(1 − α3)² − α0^(1−α3) ln(α0)/((1 − α3) u)]

Table 4.2: Partial derivatives of the Negative exponential, Monomolecular, Log-logistic and Richard's growth models

Negative exponential: φ(t) = α0(1 − exp(−α2t)) + ε
∂φ/∂α0 = 1 − exp(−α2t)
∂φ/∂α1 — does not exist
∂φ/∂α2 = α0 t exp(−α2t)

Monomolecular: φ(t) = α0(1 − α1 exp(−α2t)) + ε
∂φ/∂α0 = 1 − α1 exp(−α2t)
∂φ/∂α1 = −α0 exp(−α2t)
∂φ/∂α2 = α0 α1 t exp(−α2t)

Log-logistic: φ(t) = α0/(1 + α1 exp(−α2 ln(t))) + ε
∂φ/∂α0 = 1/(1 + α1 exp(−α2 ln(t)))
∂φ/∂α1 = −α0 exp(−α2 ln(t))/(1 + α1 exp(−α2 ln(t)))²
∂φ/∂α2 = α0 α1 ln(t) exp(−α2 ln(t))/(1 + α1 exp(−α2 ln(t)))²

Richard's: φ(t) = α0/(1 + α1 exp(−α2t))^(1/α3) + ε
∂φ/∂α0 = (1 + α1 exp(−α2t))^(−1/α3)
∂φ/∂α1 = (−α0/α3)(1 + α1 exp(−α2t))^(−1/α3 − 1) exp(−α2t)
∂φ/∂α2 = (α0 α1 t/α3)(1 + α1 exp(−α2t))^(−1/α3 − 1) exp(−α2t)
∂φ/∂α3 = (α0/α3²)(1 + α1 exp(−α2t))^(−1/α3) ln(1 + α1 exp(−α2t))

Table 4.3: Partial derivatives of the Weibull, Schnute and Morgan-Mercer-Flodin growth models

Weibull: φ(t) = α0 − α1 exp(−α2 t^α3) + ε
∂φ/∂α0 = 1
∂φ/∂α1 = −exp(−α2 t^α3)
∂φ/∂α2 = α1 t^α3 exp(−α2 t^α3)
∂φ/∂α3 = α1 α2 t^α3 ln(t) exp(−α2 t^α3)

Schnute: φ(t) = (α0 + α1 exp(α2t))^α3 + ε
∂φ/∂α0 = α3 (α0 + α1 exp(α2t))^(α3 − 1)
∂φ/∂α1 = α3 exp(α2t)(α0 + α1 exp(α2t))^(α3 − 1)
∂φ/∂α2 = α1 α3 t exp(α2t)(α0 + α1 exp(α2t))^(α3 − 1)
∂φ/∂α3 = (α0 + α1 exp(α2t))^α3 ln(α0 + α1 exp(α2t))

Morgan-Mercer-Flodin: φ(t) = α0 − (α0 − α1)/(1 + (α2t)^α3) + ε
∂φ/∂α0 = 1 − 1/(1 + (α2t)^α3)
∂φ/∂α1 = 1/(1 + (α2t)^α3)
∂φ/∂α2 = α3(α0 − α1)(α2t)^α3/(α2(1 + (α2t)^α3)²)
∂φ/∂α3 = (α0 − α1) ln(α2t)(α2t)^α3/(1 + (α2t)^α3)²

Table 4.4: Partial derivatives of the Chapman-Richards and Stannard growth models

Chapman-Richards: φ(t) = α0(1 − α1 exp(−α2t))^(1/(1−α3)) + ε. Writing w = 1 − α1 exp(−α2t),
∂φ/∂α0 = w^(1/(1−α3))
∂φ/∂α1 = (−α0/(1 − α3)) w^(1/(1−α3) − 1) exp(−α2t)
∂φ/∂α2 = (α0 α1 t/(1 − α3)) w^(1/(1−α3) − 1) exp(−α2t)
∂φ/∂α3 = (α0/(1 − α3)²) w^(1/(1−α3)) ln(w)

Stannard: φ(t) = α0[1 + exp(−(α1 + α2t)/α3)]^(−α3) + ε. Writing s = exp(−(α1 + α2t)/α3) and v = 1 + s,
∂φ/∂α0 = v^(−α3)
∂φ/∂α1 = α0 s v^(−α3 − 1)
∂φ/∂α2 = α0 t s v^(−α3 − 1)
∂φ/∂α3 = −α0 v^(−α3) [ln(v) + (α1 + α2t) s/(α3 v)]

The Marquardt algorithm requires a starting value for each parameter to be estimated.
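Before supplying derivatives such as those in Table 4.1 to an iterative routine, it is prudent to verify them numerically. The following Python sketch (an illustrative aid, not part of the thesis workflow) checks the logistic partial derivatives against central finite differences at hypothetical parameter values close to the fitted ones:

```python
import numpy as np

# Sketch: verify the Table 4.1 logistic derivatives by central differences.
def logistic(T, a0, a1, a2):
    return a0 / (1.0 + a1 * np.exp(-a2 * T))

def logistic_grad(T, a0, a1, a2):
    """Partial derivatives of the logistic model (Table 4.1)."""
    e = np.exp(-a2 * T)
    den = 1.0 + a1 * e
    return np.array([1.0 / den,                    # d(phi)/d(alpha0)
                     -a0 * e / den ** 2,           # d(phi)/d(alpha1)
                     a0 * a1 * T * e / den ** 2])  # d(phi)/d(alpha2)

T, params, h = 5.0, np.array([37.08, 4.81, 0.78]), 1e-6
for j in range(3):
    step = np.zeros(3)
    step[j] = h
    numeric = (logistic(T, *(params + step)) - logistic(T, *(params - step))) / (2 * h)
    assert abs(numeric - logistic_grad(T, *params)[j]) < 1e-4
```

A mismatch between the analytic and numeric values at this stage usually signals a sign or exponent error in the coded derivative.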
Starting value specification is one of the most difficult problems encountered in estimating the parameters of nonlinear models (Draper and Smith, 1981). Inappropriate starting values result in more iterations, greater execution time, possible non-convergence of the iteration, and possible convergence to an unwanted local minimum of the residual sum of squares. The simplest parameter to specify is α0, owing to the clarity of its definition: α0 is the maximum possible value of the dependent variable, determined by the productive capacity of the experimental site. Therefore, in our case α0 was specified as the maximum value of the response variable in the data. The α2 parameter is defined as the constant rate at which the response variable approaches its maximum possible value α0. For modelling a biological growth variable, the allometric constant α3 lies between zero and one for the Chapman-Richards growth model and is also positive for the von Bertalanffy, Richard's, Weibull and Morgan-Mercer-Flodin growth models. Finally, the α1 parameter can be specified by evaluating the model at the start of growth, when the predictor variable is zero. The computation of the initial parameter estimates for the logistic model is given below. Consider the logistic growth model

φ = α/(1 + β exp(−κT)) + ε. (4.4)

Taking natural logarithms in equation (4.4), and setting η = ln φ and τ = ln α, we obtain

η = τ − ln(1 + β exp(−κT)). (4.5)

Nonlinear estimation procedures require initial parameter estimates, and the better these initial estimates are, the faster the convergence to the fitted values. In fact, experience with growth models shows that if the initial estimates are poor, convergence to the wrong final values can easily occur. There is no general method for obtaining initial estimates; one uses whatever information is available. For example, for the logistic model (equation (4.4)) we can argue in this manner:

(i) When T = ∞, η = τ.
So take τ0 = ymax.

(ii) For any two other observations, the ith and jth say, set

yi = τ0 − ln(1 + β0 exp(−κ0Ti)) and yj = τ0 − ln(1 + β0 exp(−κ0Tj)),

acting as though equation (4.5) were true without error for these observations. Then

exp(τ0 − yi) − 1 = β0 exp(−κ0Ti) and exp(τ0 − yj) − 1 = β0 exp(−κ0Tj),

whereupon by division, taking natural logarithms, and rearranging, we obtain

κ0 = [1/(Tj − Ti)] ln{[exp(τ0 − yi) − 1]/[exp(τ0 − yj) − 1]}.

In general i and j should be widely spaced rather than otherwise, to lead to stable estimates.

(iii) From the ith equation above we can evaluate

β0 = exp(κ0Ti)[exp(τ0 − yi) − 1].

(iv) Substitution of τ0 = ymax into the two foregoing equations provides us with values for κ0 and β0.

Initial estimates for fitting the model in equation (4.4) can be similarly obtained, in the order α0, κ0, β0, from the equations

α0 = ymax,
κ0 = [1/(Ti − Tj)] ln[(α0 − wj)/(α0 − wi)],
β0 = [(α0 − wi)/α0] exp(κ0Ti),

where yi = ln(wi) and the wi are the growth observations. The assumption that the yi have constant variance is usually a sensible one for the case of growing yield. In fitting the logistic growth model to the oil palm yield growth data, we set α0 = 27.7, β0 = 0.55 and κ0 = 1.25 as the initial values. The convergence criterion was met at the 20th iteration, which led to α0 = 37.0806, β0 = 4.8148 and κ0 = 0.7817. Steps (i) to (iv) above were also applied to the other growth models in determining their initial values.

4.5 RESULTS AND DISCUSSION

The statistical significance of the parameters of the nonlinear models was determined by evaluating the 95% asymptotic confidence intervals of the estimated parameters. The null hypothesis H0: αj = 0 was rejected when the 95% asymptotic confidence interval of αj did not include zero. The 95% asymptotic confidence intervals for each growth model are presented in the last columns of Table 4.5 and Table 4.6.
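The initial-value steps (i) to (iv) for the logistic model can be sketched in code. The helper below is hypothetical (the thesis carries out this computation around the SAS NLIN run) and implements the α0, κ0, β0 expressions exactly as given in Section 4.4, using the FFB yield series of Table 4.8 as example data:

```python
import numpy as np

# Sketch of steps (i)-(iv): initial logistic values from the maximum
# observation and two widely spaced observations (w_i, T_i), (w_j, T_j),
# following the expressions in the text.
def logistic_start(T, w, i, j):
    a0 = float(w.max())                                     # step (i): asymptote
    k0 = np.log((a0 - w[j]) / (a0 - w[i])) / (T[i] - T[j])  # step (ii): rate
    b0 = ((a0 - w[i]) / a0) * np.exp(k0 * T[i])             # steps (iii)-(iv)
    return a0, b0, k0

# FFB yield series (Table 4.8); i and j chosen widely spaced and away
# from the maximum so that a0 - w[i] and a0 - w[j] stay positive.
T = np.arange(1.0, 20.0)
w = np.array([11.78, 18.43, 25.21, 30.78, 33.03, 35.66, 36.96, 37.97, 38.04,
              39.20, 36.50, 37.21, 39.97, 38.45, 33.65, 34.71, 37.75, 32.81, 37.99])
a0, b0, k0 = logistic_start(T, w, 0, 9)
```

Such data-driven starting values are one option; the thesis's actual starting triple (27.7, 0.55, 1.25) was also sufficient for convergence.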
Table 4.5: Parameter estimates of the logistic, Gompertz, negative exponential, monomolecular, log-logistic, Richard's and Weibull growth models for the yield-age relationship

Model                  Parameter  Estimate   Asymptotic SE  95% CI lower  95% CI upper
Logistic               α0         37.0806    0.5327         35.9514       38.2098
                       α1         4.8149     1.3115         2.0345        7.5952
                       α2         0.7817     0.1087         0.5511        1.0122
Gompertz               α0         37.1788    0.5701         35.9703       38.3874
                       α1         2.2683     0.4265         1.3642        3.1724
                       α2         0.6132     0.0854         0.4321        0.7943
Negative exponential   α0         37.5017    0.6643         36.1001       38.9033
                       α2         0.4046     0.0362         0.3282        0.4811
Monomolecular          α0         37.3235    0.6565         35.9317       38.7151
                       α1         1.1408     0.1367         0.8511        1.4305
                       α2         0.4592     0.0689         0.3130        0.6055
Log-logistic           α0         38.1172    1.0667         35.8559       40.3785
                       α1         3.1947     0.8678         1.3549        5.0344
                       α2         1.8874     0.3245         1.1995        2.5754
Richard's              α0         37.0418    0.5698         37.0418       38.2564
                       α1         11.0433    33.6452        -60.6695      82.7561
                       α2         0.8729     0.4059         0.0076        1.7383
                       α3         1.5205     2.0391         -2.8257       5.8667
Weibull                α0         37.3234    0.8887         35.4291       39.2178
                       α1         -5.2452    8.9982         -24.4245      13.9339
                       α2         0.3415     0.0906         0.1483        0.5347
                       α3         1.3442     0.0014         1.3411        1.3472

The least squares estimates of the parameters of the nonlinear models for the oil palm yield-age relationship are given in Table 4.5 and Table 4.6. The parameter estimates for the logistic, Gompertz, negative exponential, monomolecular and Morgan-Mercer-Flodin growth functions are all statistically significant at the 5% level. The estimates of α1 and α3 for the von Bertalanffy, Richard's and Chapman-Richards growth models are not statistically significant at the 5% level, because zero is included in the confidence intervals of these parameter estimates. The parameter estimates of the Weibull and Stannard growth models, except for α1, are statistically significant at the 5% level. The convergence criterion was met for all growth models using the Marquardt iterative procedure, with the number of iterations varying across models.
The minimum was 8 iterations, for the negative exponential growth model, while the Chapman-Richards growth model recorded the highest, at 43 iterations (Appendix F).

Table 4.6: Parameter estimates of the MMF, von Bertalanffy, Chapman-Richards and Stannard growth models for the yield-age relationship

Model                     Parameter  Estimate   Asymptotic SE  95% CI lower  95% CI upper
Morgan-Mercer-Flodin      α0         37.2032    0.6724         35.7700       38.6365
                          α1         11.5236    2.5198         6.1525        16.8943
                          α2         0.3534     0.0355         0.2776        0.4292
                          α3         3.4347     0.8877         1.5425        5.3270
Von Bertalanffy           α0         37.0416    0.5698         35.8270       38.2562
                          α1         -0.0455    0.1979         -0.4673       0.3763
                          α2         0.8731     0.4058         0.0080        1.7382
                          α3         2.5203     2.0388         -1.8254       6.8661
Chapman-Richards          α0         35.8502    0.8162         34.1106       37.5898
(without initial stage)   α1         0.4927     2.3322         -4.4783       5.4637
                          α2         0.4488     0.1449         0.1397        0.7579
                          α3         0.6155     2.8893         -5.5430       6.7740
Stannard                  α0         37.0415    0.56598        35.8269       38.2561
                          α1         -1.5799    0.2544         -2.1222       -1.0376
                          α2         0.5743     0.5236         -0.5417       1.6904
                          α3         0.6577     0.8825         -1.2232       2.5388

Table 4.7: Asymptotic correlations for each nonlinear growth model fitted

Model — Asymptotic correlations
Logistic: (α0, α1) = -0.1743; (α0, α2) = -0.3631; (α1, α2) = 0.8863
Gompertz: (α0, α1) = -0.2324; (α0, α2) = -0.4398; (α1, α2) = 0.8675
Von Bertalanffy: (α0, α1) = -0.2911; (α0, α2) = -0.3891; (α0, α3) = -0.3073; (α1, α2) = 0.9248; (α1, α3) = 0.9970; (α2, α3) = 0.9496
Negative exponential: (α0, α2) = -0.5911
Monomolecular: (α0, α1) = -0.3532; (α0, α2) = -0.5552; (α1, α2) = 0.8536
Log-logistic: (α0, α1) = -0.3162; (α0, α2) = -0.7245; (α1, α2) = 0.7799
Richard's: (α0, α1) = -0.3212; (α0, α2) = -0.3892; (α0, α3) = -0.3072; (α1, α2) = 0.9752; (α1, α3) = 0.9937; (α2, α3) = 0.9495
Weibull: (α0, α1) = -0.3766; (α0, α2) = 0.2781; (α0, α3) = -0.9999; (α1, α2) = -0.9475; (α1, α3) = 0.3763; (α2, α3) = -0.2778
Morgan-Mercer-Flodin: (α0, α1) = -0.2426; (α0, α2) = -0.0015; (α0, α3) = -0.5212; (α1, α2) = -0.7558; (α1, α3) = 0.6085; (α2, α3) = -0.5218
Chapman-Richards: (α0, α1) = -0.7459; (α0, α2) = 0.4445; (α0, α3) = -0.6289; (α1, α2) = -0.6471; (α1, α3) = 0.9104; (α2, α3) = -0.2844
Stannard: (α0, α1) = -0.0364; (α0, α2) = 0.2542; (α0, α3) = 0.3077; (α1, α2) = -0.6204; (α1, α3) = -0.5031; (α2, α3) = 0.9871

Asymptotic correlations measure the correlation among the estimated parameters. In a growth model the estimated parameters are assumed to be uncorrelated (Draper and Smith, 1981; Ratkowsky, 1983); if the asymptotic correlations are very high (near plus or minus one), the model is not suitable for modelling the growth data. The asymptotic correlations of the parameters were obtained after the iteration converged. Table 4.7 presents the asymptotic correlation coefficients among the estimated parameters. All asymptotic correlation coefficients are relatively small, except for the von Bertalanffy growth model {(α1, α2) = 0.9248; (α1, α3) = 0.9970; (α2, α3) = 0.9496}, the Richard's growth model {(α1, α2) = 0.9752; (α1, α3) = 0.9937; (α2, α3) = 0.9495} and the Weibull growth model {(α0, α3) = -0.9999; (α1, α2) = -0.9475}. When nonlinear models are fitted to a biological growth data set, a lack of statistical significance in the estimated parameters may imply one of the following:

(i) One or more parameters in the model may not be useful or, more accurately, a reparameterised model involving fewer parameters might be more appropriate;
(ii) The biological growth data used for fitting the model are not adequate for estimating all the parameters; or
(iii) The model assumptions do not conform with the modelled biological system.
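Asymptotic correlations such as those in Table 4.7 are obtained by rescaling the estimated parameter covariance matrix by the asymptotic standard errors. A minimal sketch, assuming a covariance matrix like the one a least-squares routine (e.g. SciPy's `curve_fit` pcov, analogous to the NLIN output) returns:

```python
import numpy as np

# Sketch: asymptotic correlation matrix from a parameter covariance matrix.
def asymptotic_correlation(pcov):
    se = np.sqrt(np.diag(pcov))      # asymptotic standard errors
    return pcov / np.outer(se, se)   # corr[i, j] = cov[i, j] / (se_i * se_j)
```

Values near plus or minus one on the off-diagonal, as for the von Bertalanffy, Richard's and Weibull fits, indicate near-redundant parameters.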
Table 4.8: The actual and predicted values of FFB yield, the associated measurement errors and the correlation coefficient between the actual and predicted values for the Logistic, Gompertz, von Bertalanffy, Negative exponential, Monomolecular and Log-logistic growth models

Year  FFB yield  Logistic  Gompertz  Von Bertalanffy  Neg. exponential  Monomolecular  Log-logistic
1     11.78      11.58     10.88     11.91            12.48             10.42          9.09
2     18.43      18.46     19.11     18.28            20.81             20.33          20.46
3     25.21      25.37     25.93     25.12            26.36             26.59          27.19
4     30.78      30.62     30.59     30.61            30.07             30.54          30.90
5     33.03      33.81     33.45     33.98            32.54             33.04          33.05
6     35.66      35.51     35.11     35.68            34.19             34.62          34.38
7     36.96      36.35     36.04     36.46            35.29             35.61          35.26
8     37.97      36.74     36.56     36.79            36.03             36.24          35.86
9     38.04      36.92     36.84     36.94            36.52             36.64          36.28
10    39.20      37.01     37.00     37.00            36.85             36.89          36.60
11    36.50      37.05     37.08     37.02            37.06             37.05          36.84
12    37.21      37.07     37.13     37.03            37.21             37.15          37.03
13    39.97      37.07     37.15     37.04            37.31             37.21          37.18
14    38.45      37.08     37.16     37.04            37.37             37.25          37.30
15    33.65      37.08     37.17     37.04            37.42             37.28          37.40
16    34.71      37.08     37.17     37.04            37.44             37.30          37.48
17    37.75      37.08     37.18     37.04            37.46             37.31          37.55
18    32.81      37.08     37.18     37.04            37.48             37.31          37.60
19    37.99      37.08     37.18     37.04            37.48             37.32          37.65
MSE              2.96      3.15      2.94             4.06              3.72           4.64
MAE              1.22      1.35      1.22             1.61              1.53           1.72
MAPE             0.03      0.04      0.03             0.05              0.05           0.06
r                0.97      0.97      0.97             0.96              0.96           0.96

Table 4.9: The actual and predicted values of FFB yield, the associated measurement errors and the correlation coefficient between the actual and predicted values for the Richard's, Weibull, MMF, Chapman-Richard, Chapman-Richard* (with initial value) and Stannard growth models

Year  FFB yield  Richard's  Weibull  MMF    Chapman-R.  Chapman-R.*  Stannard
0     0          -          -        -      -           0.92         -
1     11.78      11.91      10.43    12.23  13.43       10.05        11.91
2     18.43      18.28      20.33    17.51  20.01       19.51        18.28
3     25.21      25.12      26.59    25.65  25.09       26.33        25.12
4     30.78      30.61      30.54    31.21  28.71       30.71        30.61
5     33.03      33.98      33.04    34.02  31.18       33.38        33.98
6     35.66      35.68      34.61    35.40  32.82       34.97        35.68
7     36.96      36.46      35.61    36.11  33.90       35.91        36.46
8     37.97      36.79      36.24    36.50  34.60       36.45        36.79
9     38.04      36.94      36.64    36.73  35.05       36.77        36.94
10    39.20      37.00      36.89    36.87  35.34       36.95        37.00
11    36.50      37.02      37.05    36.96  35.52       37.06        37.02
12    37.21      37.03      37.15    37.02  35.64       37.12        37.03
13    39.97      37.04      37.21    37.07  35.72       37.16        37.04
14    38.45      37.04      37.25    37.10  35.76       37.18        37.04
15    33.65      37.04      37.28    37.12  35.80       37.19        37.04
16    34.71      37.04      37.30    37.14  35.82       37.19        37.04
17    37.75      37.04      37.31    37.15  35.83       37.20        37.04
18    32.81      37.04      37.31    37.16  35.84       37.20        37.04
19    37.99      37.04      37.32    37.17  35.84       37.20        37.04
MSE              2.94       3.72     3.20   6.19        3.41         2.94
MAE              1.22       1.53     1.37   2.27        1.45         1.22
MAPE             0.03       0.05     0.04   0.07        0.05         0.03
r                0.97       0.96     0.97   0.96        0.97         0.97

Figure 4.1: Residual plot for the Logistic, Gompertz, von Bertalanffy, Negative exponential, Monomolecular and Log-logistic growth models

Figure 4.2: Residual plot for the Richard's, Weibull, Morgan-Mercer-Flodin, Chapman-Richard, Chapman-Richard* and
Stannard growth models

The argument in (ii) is also applicable to the von Bertalanffy and Chapman-Richards growth models. Investigation of the differential forms and second derivatives of the von Bertalanffy and Chapman-Richards models indicated that these functions are suitable for modelling a system that encompasses the entire life cycle (i.e. the juvenile, adolescent, mature and senescent stages) of the biological response variable. However, the FFB yield growth measurements considered in this study lack data on the juvenile stage of growth; hence two of the parameters in each of the two models were rendered insignificant. To support this argument we added an initial data point (age = 0, FFB = 0) to the data and refitted the von Bertalanffy and Chapman-Richards models. Table 4.10 shows the parameter estimates, asymptotic standard errors and asymptotic 95% confidence intervals for each parameter of these two models. The predicted FFB value at the initial age using the Chapman-Richards model (with initial value) is 0.92 ton per hectare per year. Two of the parameters (α1 and α3) are statistically insignificant (based on the asymptotic confidence intervals of the parameter estimates) if the initial data point is not included (Table 4.6). However, inclusion of the initial data point caused only the Chapman-Richards growth model to show statistically significant estimates of the three parameters; the von Bertalanffy growth model did not show any improvement. Including additional data points from the early stage of growth will result in a significant improvement in the estimates of the parameters of the Chapman-Richards model. Table 4.9 also indicates that, with the initial value included, the MAPE was reduced from 0.07 to 0.05. This clearly illustrates that the significance of the parameters of the Chapman-Richards growth model depends on the range of the growth data.
Table 4.10: The parameter estimates and asymptotic correlations for the von Bertalanffy and Chapman-Richards models when an initial growth response data point is added

Model             Parameter  Estimate   Asymptotic SE  95% CI lower  95% CI upper
Von Bertalanffy   α0         37.2017    0.6140         35.9001       38.5033
                  α1         5.5385     9.5941         -14.7999      25.8769
                  α2         0.5498     0.1238         0.2873        0.8124
                  α3         0.4826     0.3959         -0.3566       1.3218
Chapman-Richards  α0         37.2036    0.6256         35.8773       38.5298
                  α1         0.8530     0.3581         0.0937        1.6122
                  α2         0.5498     0.1265         0.2816        0.8181
                  α3         0.4822     0.1018         0.3695        1.3340

Model — Asymptotic correlations
Von Bertalanffy: (α0, α1) = 0.1127; (α0, α2) = -0.4229; (α0, α3) = -0.2532; (α1, α2) = -0.8174; (α1, α3) = -0.9887; (α2, α3) = 0.8428
Chapman-Richards: (α0, α1) = 0.2264; (α0, α2) = -0.4879; (α0, α3) = -0.2988; (α1, α2) = -0.6940; (α1, α3) = -0.9571; (α2, α3) = 0.8516

This study provides the statistical requirements for estimating the parameters of nonlinear growth models, the statistical testing of the parameter estimates, and the interpretation of the relevant statistical output from the perspective of oil palm. The NLIN procedure in SAS does not guarantee that the iteration converges to the global minimum of the residual sum of squares (SAS, 1992). Hence, an alternative approach for avoiding non-convergence, or convergence to an unwanted local minimum of the residual sum of squares, is to specify a grid of values for each parameter; the NLIN procedure then evaluates the residual sum of squares at each combination of values to determine the best starting values for the iterative process. Initial values may be intelligent guesses or preliminary estimates based on available information; they may, for example, be values suggested by information gained in fitting a similar equation in a different place. Based on the meaningful biological definitions of the parameters of the nonlinear models, expressions to specify initial values for the asymptote and the biological constant were developed.
These expressions were found useful in specifying initial values of the parameters for modelling the sample FFB yield data used in the study.

4.6 CONCLUSION

It is important to note that some of the models, such as the negative exponential and monomolecular, have no point of inflection (that is, there is no change in the sign of the second derivative for any T, and the curve climbs steadily at a decreasing rate) and are not of sigmoid shape (Draper and Smith, 1981). Hence, these models are not appropriate for modelling the entire life-cycle range of biological response variables, such as oil palm yield growth, that exhibit a sigmoid pattern over time (reason (iii) in the previous section). This study found that the Gompertz, logistic, log-logistic, Morgan-Mercer-Flodin and Chapman-Richards growth models are suitable for quantifying a growth phenomenon that exhibits a sigmoid pattern over time (Draper and Smith, 1981; Ratkowsky, 1983). Based on the statistical testing and the root mean squares error (Table 4.11), the model ranked first is the logistic model, followed by the Gompertz, the Morgan-Mercer-Flodin, the Chapman-Richards (with initial stage) and the log-logistic growth models. However, the von Bertalanffy, Richard's, Weibull and Stannard growth models were found not to be statistically significant in fitting the oil palm yield growth data.
Table 4.11: The number of iterations and the root mean squares error for the nonlinear growth models considered in this study

Model                               Number of iterations  Root mean squares error
Logistic                            20                    1.7204
Gompertz                            22                    1.7748
Von Bertalanffy                     36                    1.7146
Negative exponential                8                     2.0149
Monomolecular                       26                    1.9287
Log-logistic                        22                    2.1541
Richard's                           26                    1.7146
Weibull                             18                    1.9287
Morgan-Mercer-Flodin                21                    1.7888
Chapman-Richards                    34                    2.4879
Chapman-Richards (initial stage)    43                    1.8466
Stannard                            42                    1.7146

CHAPTER 5

MODELLING OIL PALM YIELD USING MULTIPLE LINEAR REGRESSION AND ROBUST M-REGRESSION

5.1 INTRODUCTION

Multiple linear regression is widely used in research to determine the linear relationship between factors. Earlier researchers believed that foliar nutrient composition has a significant correlation with oil palm yield, but the data used had not been analysed in detail using the proposed method. We consider the multiple linear regression method as one way to understand the relationship between foliar nutrient composition and oil palm yield. In this study, we also propose the use of foliar nutrient composition ratios (the nutrient balance ratio, NBR) in leaves as independent variables. This modelling approach acts as a preliminary study that will further enhance understanding of the issues and motivate further research in the modelling of oil palm yield.

5.2 MODEL DEVELOPMENT

Let us consider data consisting of n observations on a dependent or endogenous variable y and five independent or exogenous variables N, P, K, Ca and Mg. The observations are usually represented as follows:

Obs. no.  y    N     P     K     Ca     Mg
1         y1   N11   P21   K31   Ca41   Mg51
2         y2   N12   P22   K32   Ca42   Mg52
3         y3   N13   P23   K33   Ca43   Mg53
:         :    :     :     :     :      :
n         yn   N1n   P2n   K3n   Ca4n   Mg5n

The relationship between the dependent and independent variables is formulated as the linear model

yi = θ0 + θ1N1i + θ2P2i + θ3K3i + θ4Ca4i + θ5Mg5i + εi (5.1)

where θ0, θ1, θ2, θ3, θ4, θ5 are regression coefficients and εi is the random disturbance.
It is assumed that, for any set of fixed values of the independent variables that fall within the range of the data, the linear equation (5.1) provides an acceptable approximation of the true relationship between the dependent and independent variables. The θ's are estimated by minimising the sum of squares error as in equations (3.11) and (3.12); the estimated values of the θ's can then be obtained using equation (3.13). The standard procedures in regression analysis, such as confidence intervals of the parameter estimates, the normal probability plot and the error distribution, were then undertaken as described in Chapter 3. In Chapter 1, we discussed the importance of the nutrient balance ratio (NBR) in the foliar composition to FFB production. Thus, we propose to use the NBR, the critical leaf phosphorus concentration (CLP), K deficiency and Mg deficiency as independent variables to estimate FFB production using multiple regression analysis. We then consider the regression equation

yi = θ0 + θ1N1i + θ2P2i + θ3K3i + θ4Ca4i + θ5Mg5i + θ6N-P6i + θ7N-K7i + θ8N-Ca8i + θ9N-Mg9i + θ10P-K10i + θ11P-Ca11i + θ12P-Mg12i + θ13K-Ca13i + θ14K-Mg14i + θ15Ca-Mg15i + θ16defK16i + θ17defMg17i + θ18CLP18i + θ19TLB19i + εi (5.2)

for i = 1, 2, …, n, where θ0, θ1, …, θ19 are regression coefficients and εi is the random disturbance; N-P is the ratio between N and P, N-K is the ratio between N and K, and so on; defK is the deficiency of K in the leaf, defMg is the deficiency of Mg in the leaf, CLP is the critical leaf P concentration and TLB is the total amount of bases in the leaf. We then apply the stepwise procedure in regression to select the significant independent variables in the model; nineteen independent variables are considered in equation (5.2). In the next stage we analyse the same data set using robust M-regression, again considering equation (5.1). We perform the analysis using S-Plus 2000 with the robust MM procedure.
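As a sketch of the two estimators compared in this chapter, the code below fits ordinary least squares and a Huber-type M-estimator by iteratively reweighted least squares. This is illustrative only: the thesis uses S-Plus 2000's robust MM procedure, and the tuning constant c = 1.345 and the MAD scale estimate are common textbook choices rather than the thesis's exact settings. The demonstration data are hypothetical.

```python
import numpy as np

# Sketch: OLS for equation (5.1) and a Huber M-estimator via IRLS
# (iteratively reweighted least squares). Not the thesis's S-Plus code.
def ols(X, y):
    A = np.column_stack([np.ones(len(y)), X])  # prepend intercept theta0
    return np.linalg.lstsq(A, y, rcond=None)[0]

def huber_m(X, y, c=1.345, n_iter=50):
    A = np.column_stack([np.ones(len(y)), X])
    theta = np.linalg.lstsq(A, y, rcond=None)[0]          # start from OLS
    for _ in range(n_iter):
        r = y - A @ theta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # robust (MAD) scale
        s = s if s > 0 else 1.0
        w = np.minimum(1.0, c * s / np.maximum(np.abs(r), 1e-12))  # Huber weights
        sw = np.sqrt(w)
        theta = np.linalg.lstsq(A * sw[:, None], sw * y, rcond=None)[0]
    return theta

# Hypothetical data: a clean line y = 2 + 3x with one gross outlier.
x = np.arange(10.0).reshape(-1, 1)
y = 2.0 + 3.0 * x.ravel()
y[9] += 100.0
th_ols = ols(x, y)       # coefficients pulled towards the outlier
th_hub = huber_m(x, y)   # coefficients stay close to (2, 3)
```

The bounded Huber weights limit the influence of outlying yields, which is the motivation for comparing robust M-regression with MLR on the foliar data.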
The details of the robust M-regression are given in Chapter 3.

5.3 RESULTS AND DISCUSSION

The results are presented in two sections: the first discusses the results of the MLR analysis, and the second the results of the robust M-regression. Residual analyses were also performed, and finally a comparative study of the MLR and RMR performance was conducted.

5.3.1 Multiple linear regression (MLR)

As mentioned earlier, the first stage uses the major foliar nutrient compositions N, P, K, Ca and Mg in the model. The stepwise procedure in MLR was applied in order to select the significant variables. Table 5.1 summarises the regression equation for each station in the inland and coastal areas.

The coefficient of determination is a measure of how well the explanatory variables explain the response variable (Birkes and Dodge, 1993). Taking the R2 value as the indicator of model fit, all stations recorded quite low values. For the inland area, the highest R2, 0.422, was recorded at ILD2 and the lowest, 0.118, at station ILD7. This indicates that 42.20% of the variability of FFB yield about its predicted values is explained by the linear relationship between FFB yield and the foliar nutrient composition at station ILD2. The regression equation for station ILD2 is

FFB = 5.707 - 53.642*Mg + 201.609*P - 6.298*K + 3.039*N

There is a positive relationship between FFB yield and the N and P concentrations, and a negative relationship with the K and Mg concentrations. The coefficient of P concentration is the largest, indicating that P concentration has the greatest influence on FFB yield. For the coastal area, the highest R2 value, 0.687, was recorded at station CLD3 and the lowest, 0.043, at station CLD6.
This indicates that 68.70% of the variability of FFB yield about its predicted values is explained by the linear relationship between FFB yield and the foliar nutrient composition at station CLD3. The regression equation for station CLD3 is

FFB = -9.901 + 17.664*N - 18.550*Mg

There is a positive relationship between FFB yield and N concentration, but a negative relationship with Mg concentration. Similar interpretations can be made for the other stations. Detailed MLR results are given in Appendix G and Appendix H.

Table 5.1: The regression equations and R2 values for the inland and coastal areas

Inland stations
Station   Regression equation                                               R2
ILD1      -20.311 + 342.077*P - 14.697*Ca                                   0.392
ILD2      5.707 - 53.642*Mg + 201.609*P - 6.298*K + 3.039*N                 0.422
ILD3      -30.657 + 21.679*N + 7.863*Ca - 7.738*K + 15.727*Mg               0.404
ILD4      -5.806 + 8.207*N + 11.621*Ca + 41.336*P - 4.932*K - 7.476*Mg      0.185
ILD5      -15.529 + 12.466*N + 4.321*K                                      0.400
ILD6      26.527 - 14.099*Ca + 286.344*P - 14.127*N                         0.317
ILD7      7.008 + 7.131*N - 37.198*P + 5.650*Ca                             0.118
ILDT      -1.007 + 6.782*N + 48.554*P - 11.788*Mg + 4.130*Ca                0.148

Coastal stations
CLD1      -44.721 + 30.526*K + 71.737*Mg + 198.541*P - 11.440*Ca            0.380
CLD2      0.189 + 22.750*Mg + 10.443*N - 7.509*Ca                           0.171
CLD3      -9.901 + 17.664*N - 18.550*Mg                                     0.687
CLD4      18.262 + 12.794*Ca                                                0.050
CLD5      16.528 + 11.367*Ca - 10.294*N + 185.701*P                         0.111
CLD6      40.988 - 12.548*Ca - 3.330*N                                      0.043
CLD7      6.810 + 66.648*P + 8.788*K - 25.001*Mg + 4.804*N                  0.231
CLDT      8.998 + 128.949*P - 6.424*Mg                                      0.044

For station ILD1, the P and Ca concentrations were found to significantly affect FFB yield. Only two inland stations recorded models that included all of the foliar nutrient concentrations, and no coastal station showed a significant effect of all the nutrients. In general, the response of FFB yield to the foliar nutrient composition is better in the inland areas than in the coastal areas.
This shows that the foliar nutrient composition has a greater influence and impact in the inland area than in the coastal area. We found no consistent variable across the regression models; for example, a variable that is statistically significant at station ILD1 is not significant at the other stations. The reason for the different foliar nutrient compositions in the models is very difficult to ascertain, but we believe it is most probably due to soil factors.

5.3.2 Residual Analysis for MLR

Least squares tests and estimates are optimal if the population of errors can be assumed to have a normal distribution. If the normality assumption is not satisfied, least squares procedures are still valid but they may be far from optimal. Residual analysis was performed to investigate the distribution of the modelling errors. We found that the errors for all stations were scattered about the error mean. Normal probability plots were also produced; the results are shown in Appendix I for the inland area and Appendix J for the coastal area.

A number of plots and tests based on residuals have been developed for checking the normality of the errors, but here we use only the normal probability plot. The standardised residuals are placed in increasing order and plotted against their expected values under a sample of n independent standard normal random variables. The plot should look nearly linear if the assumption of normality is valid. When a normal probability plot is strongly nonlinear, the data can sometimes be transformed so that normality is more closely approximated.

Figures 5.1 and 5.2 display the scatter plots of the error distributions for the inland and coastal stations. For station ILD1, the errors are scattered randomly about the mean, and the normal probability plot is linear (Appendix I).
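The normal probability plot described above can be computed as follows. This is an illustrative sketch; Blom's plotting positions are our choice for approximating the expected normal order statistics:

```python
import numpy as np
from statistics import NormalDist

def normal_probability_points(residuals):
    """Return (expected normal quantiles, ordered standardised residuals).
    A roughly straight line supports the normality assumption."""
    r = np.asarray(residuals, dtype=float)
    z = (r - r.mean()) / r.std(ddof=1)          # standardise the residuals
    z_sorted = np.sort(z)                       # place in increasing order
    n = len(z)
    # Blom's approximation to the expected standard normal order statistics
    probs = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    expected = np.array([NormalDist().inv_cdf(p) for p in probs])
    return expected, z_sorted
```

Plotting the two returned arrays against each other gives the normal probability plot; for normally distributed residuals the points fall close to the 45-degree line.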
These procedures were performed to ensure that the normality assumption for the errors is satisfied. Using the same procedure for the other stations, we found that all the plots are close to straight lines, which clearly indicates that the errors are normally distributed.

[Figure 5.1: The error distribution plots of the MLR model for the inland stations]

[Figure 5.2: The error distribution plots of the MLR model for the coastal stations]

The second stage of this study applies MLR with the major nutrient compositions (MNC) and the nutrient balance ratios (NBR) as independent variables. The details of the F test for each model and the t tests for the individual coefficients are given in Appendix K and Appendix L.
The regression equations and R2 values for the fitted models are displayed in Table 5.2. For stations ILD1 and CLD4 the variables in the models did not change, but the R2 values decreased from 0.392 to 0.379 and from 0.050 to 0.035 respectively. At station ILD2, the variables included in the model changed from N, P, K and Mg to N, K, the N-P ratio and the N-Mg ratio. The R2 recorded at station ILD3 is 0.478, indicating that 47.80% of the variability of FFB yield about its predicted values is explained by the linear relationship between FFB yield and N, K, the N-P ratio, the N-K ratio and the K-Mg ratio. The deficiencies of K and Mg in the foliar nutrient composition were found to be significant in the regression model at station ILD4, which recorded an R2 value of 0.215; this means that deficiency in K and Mg is reflected in FFB yield production. Only two stations, CLD3 and CLD6, recorded a significant effect of TLB on FFB yield. Critical leaf phosphorus (CLP) displayed a positive response to FFB yield, except at station ILD7; in terms of contribution, TLB has the largest weight compared with the other variables.

Generally, introducing the NBR into the regression model achieves only a slight improvement in model fit. In other words, the NBR does not give us much additional information for improving the model, whereas the MNC information is sufficient for this purpose, particularly given the complex interpretation of the NBR.
Table 5.2: The regression equations for the inland and coastal stations using MNC and NBR as independent variables

Inland stations
Station   Regression equation                                                                     R2
ILD1      -20.311 + 342.077*P - 14.697*Ca                                                         0.379
ILD2      4.150 + 130.174*P - 4.285*K - 0.952*N-P + 1.990*N-Mg                                    0.433
ILD3      19.393 + 35.888*N - 41.774*K + 0.712*N-P - 20.692*N-K - 0.484*K-Mg                      0.478
ILD4      -23.297 + 234.930*CLP - 14.829*K-Ca - 0.706*N-P + 1.019*Def-K + 0.347*Def-Mg            0.215
ILD5      262.286 - 2621.417*P - 27.160*N-P + 3724.885*CLP + 5.735*Mg-Ca                          0.474
ILD6      64.016 - 3.054*N-P + 3.013*N-Ca                                                         0.325
ILD7      10.776 - 43.857*P - 33.313*Mg + 188.723*CLP - 0.551*N-Ca - 0.236*N-Mg                   0.132
ILDT      7.622 + 8.214*N + 42.607*P - 24.641*Mg - 0.254*N-Mg - 0.591*N-Ca                        0.168

Coastal stations
CLD1      -75.969 + 185.968*P + 47.134*K + 1.358*Def-Mg - 19.911*Mg-Ca                            0.388
CLD2      -18.975 + 257.755*CLP + 0.338*Def-Mg - 1.589*N-K                                        0.181
CLD3      -18.173 + 356.146*CLP - 0.152*TLB                                                       0.676
CLD4      18.262 + 12.794*Ca                                                                      0.035
CLD5      46.292 + 11.528*Ca - 1.658*N-P                                                          0.108
CLD6      36.039 - 0.137*TLB                                                                      0.031
CLD7      -0.132 + 3.587*Def-K + 529.187*P + 4.341*N-P - 32.683*K-Ca - 15.947*K-Mg - 2.769*Def-Mg + 84.381*Mg-Ca 530.956*CLP   0.315
CLDT      -64.453 - 24.419*N + 701.608*P - 63.647*Mg + 5.148*N-P 2.371*N-Mg                       0.094

5.3.3 Robust M-Regression (RMR)

The purpose of introducing robust M-regression in this study is to improve modelling accuracy. The quantile-quantile plots in Appendix M and Appendix N confirm the existence of outlying observations in the data set. Barnett and Lewis (1996) define an outlier in a set of data as an observation (or set of observations) that appears to be inconsistent with the remainder of that set of data. Table 5.3 shows the results of the M-regression for all stations. The presence of outliers in the data set influenced the model fitting and therefore the overall performance. Using RMR, we found that the inland stations gave quite high overall performance compared with the coastal areas, a result similar to that of the MLR analysis.
The highest R2 value was recorded at station ILD2 (0.598), followed by station ILD1 (0.571); the lowest R2 value, 0.127, was recorded at station ILD7. The regression equation for ILD1 can be written as

FFB = -16.790 + 331.546*P - 5.466*K - 19.296*Ca

The regression coefficients of the P, K and Ca concentrations are 331.546, -5.466 and -19.296 respectively, showing that P concentration has the greatest influence on FFB yield. The R2 value corresponds to the variance explained by the independent variables in the model; for example, at station ILD2 the concentrations of P, Ca and Mg explain about 59.80% of the variance, and the rest is explained by unobserved variables. The regression line for ILD2 is

FFB = -5.5279 + 329.027*P - 6.7802*Ca - 31.283*Mg

As at station ILD1, the P concentration records the highest regression coefficient compared with Ca and Mg, meaning that P is the most important nutrient for producing FFB yield.

5.3.4 Residual Analysis for RMR

As with MLR, we also investigated the distribution of the errors. Figure 5.3 displays the scatter plots for the inland area; in general, the errors are normally distributed. The error distribution plots for the coastal area are shown in Figure 5.4, and using the same procedure as in Section 5.3.2 we found that these errors are also normally distributed. The Q-Q plots for the inland and coastal stations are presented in Appendices M and N; they were used for outlier detection in the data set. The plot should look nearly linear if the assumption of normality is valid. All stations showed nearly linear plots except station CLD4, so further investigation of that station is warranted. The plots are not much different from those for MLR; we therefore conclude that the fitted models are valid.
Table 5.3: Regression equations using robust M-regression for the inland and coastal areas

Inland stations
Station   Regression equation                                                  R2
ILD1      -16.790 + 331.546*P - 5.466*K - 19.296*Ca                            0.571
ILD2      -5.5279 + 329.027*P - 6.7802*Ca - 31.283*Mg                          0.598
ILD3      -29.839 + 25.6112*N - 92.915*P - 865*K + 7.2804*Ca + 22.893*Mg       0.381
ILD4      -11.733 + 8.285*N + 95.255*P - 6.513*K + 10.069*Ca - 8.955*Mg        0.199
ILD5      -24.345 + 11.6264*N                                                  0.323
ILD6      22.876 - 13.974*N + 300.489*P - 12.764*Ca                            0.313
ILD7      5.947 + 7.985*N - 47.678*P + 7.487*Ca                                0.127
ILDT      -7.850 + 6.115*N + 128.369*P + 7.274*Ca - 26.476*Mg                  0.243

Coastal stations
CLD1      -88.643 + 47.558*K + 113.074*Mg                                      0.389
CLD2      -4.914 + 15.754*N + 38.759*Mg                                        0.307
CLD3      -0.529 + 21.546*N                                                    0.616
CLD4      98.730 - 21.838*N + 280.158*P - 26.190*K - 7.804*Ca - 56.851*Mg      0.225
CLD5      23.527 - 12.813*N + 194.641*P + 13.364*Ca                            0.115
CLD6      37.175 - 3.613*N - 12.729*Ca                                         0.049
CLD7      14.811 + 5.248*N + 58.013*P - 8.066*Ca                               0.151
CLDT      -7.368 + 7.535*N + 113.067*P                                         0.140

[Figure 5.3: The error distribution plots of the RMR model for the inland stations]

[Figure 5.4: The error distribution plots of the RMR model for the coastal stations]

5.4 CONCLUSION

This section focuses on the performance of the MLR(MNC) model compared with the MLR(NBR) model, from which we can see whether including NBR, TLB, K deficiency and Mg deficiency as independent variables improves the model R2. The performance of the MLR and RMR models is then discussed.

[Figure 5.5: The R2 value for each model proposed for the inland area]

The bar chart in Figure 5.5 gives the R2 value for each inland station, comparing the MLR(MNC), MLR(NBR) and RMR models. With the MLR(NBR) model, five out of seven inland stations recorded R2 values higher than with the MLR(MNC) model. This shows that including the nutrient balance ratios as independent variables increased the R2 values, so the nutrient balance ratio can also be used to explain the behaviour of oil palm yield. The third bar in the figure represents the R2 values for the RMR method. Using RMR, the R2 values increased from 0.392 to 0.571 at station ILD1 and from 0.422 to 0.598 at station ILD2, and for the combined data set ILDT from 0.148 to 0.243. We can thus deduce that the RMR method manages to increase the accuracy of oil palm yield estimation.
[Figure 5.6: The R2 value for each model proposed for the coastal area]

For the coastal area (Figure 5.6), the MLR(MNC) and MLR(NBR) models recorded approximately equal R2 values except at station CLD7, where the R2 value increased from 0.231 to 0.315. It can be concluded that entering the nutrient balance ratios into the model does not affect model accuracy in the coastal area. With the RMR method, R2 values higher than those of MLR were recorded at five out of seven coastal stations; for example, at station CLD2 the value changed from 0.171 to 0.307 and at station CLD4 from 0.050 to 0.225. This means that the RMR method can improve oil palm yield estimation.

This study found that with a statistical approach such as the regression model, the accuracy of estimation is around 80 to 85% and the estimation error about 15 to 20%. These figures indicate that the accuracy is quite low and the estimation error still high. Even though the proposed method is able to increase the accuracy, there must still be room to explore in search of the best model; a model with higher accuracy for estimating oil palm yield is the major goal of this study. We therefore propose to explore a currently popular heuristic approach, the neural network model, because neural network models offer useful properties and capabilities such as: (i) non-linearity: an artificial neuron can be linear or nonlinear, and a neural network made up of an interconnection of nonlinear neurons is itself nonlinear; and (ii) input-output mapping.
The network is presented with an example picked at random from the set, and the synaptic weights (free parameters) of the network are modified to minimise the difference between the desired response and the actual response of the network produced by the input signal, in accordance with an appropriate statistical criterion. The training of the network is repeated for many examples in the set until the network reaches a steady state where there are no further significant changes in the synaptic weights (Haykin, 1999).

CHAPTER 6

MODELLING OIL PALM YIELD USING NEURAL NETWORK MODEL

6.1 INTRODUCTION

The neural network model is a causal model which is widely used to solve complex problems. This chapter discusses the implementation of the neural network approach to modelling oil palm yield. The implementation of this model in Matlab, a mathematical software package, requires data preparation and calculation of the degrees of freedom needed for the neural network architecture. This chapter also considers various combinations of activation functions from the input layer to the hidden layer and from the hidden layer to the output layer, and analyses the neural network's performance using analysis of variance and multiple comparison with Duncan's tests. Three experiments were conducted to investigate the effects of the number of runs, the number of hidden nodes, the learning rate, the momentum term and outliers on the neural network's performance. The results show that the neural network approach produces a good outcome. A comparative study was also conducted of the performance of the multiple linear regression, robust M-regression and neural network models in oil palm yield modelling.

6.2 NEURAL NETWORK PROCEDURE

The procedure for developing a NN model requires two stages, which are important to ensure that the results and outcomes are valid and relevant: data preparation and calculating the degrees of freedom.
6.2.1 Data preparation

Once the raw data have been collected, they may need converting into a more suitable format, depending on the software requirements. This stage involves two steps: data validity checks and data partitioning.

Data validity checks: Data validity checks reveal unacceptable data that, if retained, would produce poor results. A simple data range check is an example of validity checking. For instance, for fresh fruit bunch data collected in tonnes per hectare per year, we would expect values greater than zero and not exceeding 50 tonnes/ha/year; a value of -5 or 100, for instance, is clearly wrong. If there is a pattern to the distribution of faulty data, the cause of the pattern needs to be diagnosed. Depending on the nature of the fault, we may need to discard the data altogether (Bansal et al., 1993).

Data partitioning: Partitioning is the process of dividing the data into training sets, validation sets and testing sets. By definition, training sets are used to actually update the weights in a network, validation sets are used to decide the architecture of the network, and testing sets are used to examine the final performance of the network. The primary concerns are to ensure that (i) the training set contains enough data, with a suitable distribution, to adequately demonstrate the properties we wish the network to learn; and (ii) there is no unwarranted similarity between data in different data sets (Hornik et al., 1989).

6.2.2 Calculating the degrees of freedom

A parametric analysis would be impossible without a discussion of the degrees of freedom (df) of the network. Because networks have historically been limited to nonparametric modelling, this topic has been conspicuously absent from the literature (Ripley, 1996). In any parametric analysis, the number of degrees of freedom is defined as the number of observations minus the number of parameters that are free to vary (Gujarati, 1988; Adam, 1999).
If N represents the number of observations and k the number of estimated parameters (the weights of the neural network connections from the input layer to the hidden layer and from the hidden layer to the output layer), then the degrees of freedom, df, can be calculated as

df = N - k    (6.1)

This approach to the degrees of freedom can be applied directly to a feedforward neural network if the network has only a single output. In this case, the estimated parameters include not only the connection weights that feed into the output and the output's intercept parameter, but also the connection weights that interconnect the hidden layers, together with the bias weights corresponding to each of the hidden layers' transformation nodes. The number of parameters estimated in a feedforward neural network with one hidden layer is therefore

k = nh(ni + 2) + 1    (6.2)

where nh is the number of hidden nodes and ni the number of input nodes. A necessary condition in any parametric model is that the number of available degrees of freedom must be positive. This constraint imposes an upper limit on the size of the network: with N observations, the maximum number of hidden nodes can be calculated as

nh(max) = (N - 1) / (ni + 2)    (6.3)

As shown in Figure 6.1, the input nodes are the nitrogen concentration N, phosphorus concentration P, potassium concentration K, calcium concentration Ca and magnesium concentration Mg, and the output node is FFB yield, measured in tonnes per hectare per year. The actual data had to be transformed into the range (0, 1) before the activation function was applied. We therefore used the formula

Zi = ((xi - xmin) + 0.01) / ((xmax - xmin) + 0.01)

to transform the input and output values (Hsu, 1993). The value 0.01 is added to ensure that the transformed value is never zero.
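Equations (6.1) to (6.3) and the min-max transformation can be expressed directly in code. This sketch is ours; equation (6.3) is read with an integer floor so that the degrees of freedom stay positive, and the inverse transformation follows by rearrangement:

```python
def num_parameters(n_hidden, n_inputs):
    """Equation (6.2): weights plus biases of a one-hidden-layer
    network with a single output node."""
    return n_hidden * (n_inputs + 2) + 1

def degrees_of_freedom(n_obs, n_hidden, n_inputs):
    """Equation (6.1): df = N - k."""
    return n_obs - num_parameters(n_hidden, n_inputs)

def max_hidden_nodes(n_obs, n_inputs):
    """Equation (6.3): largest hidden-layer size keeping df positive."""
    return (n_obs - 1) // (n_inputs + 2)

def scale(x, x_min, x_max):
    """Min-max transformation applied before the activation function;
    the 0.01 offset keeps the transformed value away from zero."""
    return ((x - x_min) + 0.01) / ((x_max - x_min) + 0.01)

def unscale(z, x_min, x_max):
    """Inverse of the transformation, by rearrangement."""
    return x_min + z * ((x_max - x_min) + 0.01) - 0.01
```

For example, with the five foliar inputs (ni = 5) and nh = 3 hidden nodes, k = 3(5 + 2) + 1 = 22 parameters; a station with 243 observations then has df = 221 and at most 34 hidden nodes.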
The actual value can easily be recovered by rearranging the above equation as

xi = xmin + Zi((xmax - xmin) + 0.01) - 0.01

In this study, we used a randomised procedure to avoid site plot effects or bias (Hsu, 1993).

6.3 COMPUTER APPLICATION

In this study we ran the neural networks using the Neural Network Toolbox in Matlab 6.15. Matlab's built-in procedures provide a very simple neural network program: the user only has to write a short program that calls the built-in procedures, each of which has its own specific name. In the first part we considered only the N, P, K, Ca and Mg concentrations from the foliar analysis as input nodes, and FFB yield as the output node (Figure 6.1). The number of hidden nodes varies from one station to another because of the different numbers of observations; the maximum number of hidden nodes is obtained from equation (6.3) to ensure that the degrees of freedom of the model remain positive.

[Figure 6.1: Three-layer fully connected neural network with five input nodes (the foliar N, P, K, Ca and Mg concentrations) and one output node (FFB yield)]

In our case we consider a fully connected, feedforward, supervised neural network, because we have input and target data sets, as shown in Figure 6.1. We also assume that all the inputs have a significant influence on the production of oil palm yield. We start the network with a small number of hidden nodes, which are added one by one until the maximum number of hidden nodes defined by equation (6.3) is reached. The first step in training a feedforward network is to create the network object. The function newff creates a trainable feedforward network (Hagan et al., 1996). The user should specify the transfer functions in the first and second layers, as presented in Chapter 3.
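A newff-style three-layer network as in Figure 6.1 can be sketched outside Matlab as follows. The sigmoid-hidden, linear-output configuration and the function names are our illustrative assumptions; note that the parameter count of this sketch matches equation (6.2):

```python
import numpy as np

def init_network(n_inputs, n_hidden, seed=0):
    """Random weights in [-1, 1], as with Matlab's `rands` initialisation."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.uniform(-1, 1, (n_hidden, n_inputs)),  # input -> hidden
        "b1": rng.uniform(-1, 1, n_hidden),              # hidden biases
        "W2": rng.uniform(-1, 1, n_hidden),              # hidden -> output
        "b2": rng.uniform(-1, 1),                        # output intercept
    }

def forward(net, x):
    """Sigmoid hidden layer, linear output: one common newff set-up."""
    h = 1.0 / (1.0 + np.exp(-(net["W1"] @ x + net["b1"])))
    return float(net["W2"] @ h + net["b2"])
```

Counting the entries of W1, b1, W2 and b2 gives nh*ni + nh + nh + 1 = nh(ni + 2) + 1, exactly the k of equation (6.2).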
Once the transfer functions have been chosen and the command newff used, the network is ready for training. Before training a feedforward network, the weights and biases must be initialised with the command init. This function takes a network object as input and returns a network object with all weights and biases initialised. For feedforward networks, weight initialisation is usually set to random (rands), which sets the weights to random values between -1 and 1. Once the network weights and biases have been initialised, the network is ready for training. The network can be trained for function approximation, pattern association or pattern classification; this study trains networks for function approximation. The training process requires network inputs Input and target outputs Target. During training, the network's weights and biases are iteratively adjusted to minimise the network performance function, the mean squared error (mse), and this is repeated up to the maximum number of hidden nodes obtained from equation (6.3). The training algorithm used in our study is Levenberg-Marquardt (trainlm), because this algorithm appears to be the fastest for training moderate-sized feedforward neural networks and is also very efficient (Hagan et al., 1996). An example of the algorithm is given in Appendix O.

An early stopping technique was used to avoid overtraining the neural networks (where the mse values become constant) and to improve their generalisation. This technique requires the data to be divided into three sets. The first is the training set, used to compute the gradient and update the network weights and biases. The second is the validation set, whose error is monitored during the training process; the validation error will normally decrease during the initial phase of training.
However, when the network begins to overfit the data, the error on the validation set typically begins to rise. When the validation error has increased for a specified number of iterations, training is stopped, and the weights and biases at the minimum of the validation error are returned, as shown in Figure 6.2. Figure 6.3 presents the MSE value for each phase: the MSE decreases sharply while the number of epochs is less than five, then remains essentially constant until epoch fifteen, when training stops. We divided the data into training, validation and testing sets in the ratio 70%, 15% and 15% respectively (Zhang et al., 2001).

The correlation coefficient was used to measure the agreement between the actual and predicted values; a correlation value close to one shows that the actual and predicted values are close and that the model fits the data well. Let X and X̂ be the actual and predicted values from the specified model, σ_X² and σ_X̂² the variances of the actual and predicted observations, and X̄ and X̄̂ the mean actual and predicted observations. The correlation coefficient between the actual and predicted values is then defined as

r = Σ_{i=1}^{n} (X_i - X̄)(X̂_i - X̄̂) / √(σ_X² · σ_X̂²)    (6.4)

Matlab also provides a graph of the best-fitting line between the actual and target values, as shown in Figure 6.4.

[Figure 6.2: The early stopping procedure for feedforward neural network training (blue = training, green = validation, red = testing)]

[Figure 6.3: The mean squared error for training, validation and testing]

[Figure 6.4: The correlation coefficient between the actual (A) and predicted (T) values]

6.4 EXPERIMENTAL DESIGN FOR NEURAL NETWORK

The objective of this section is to investigate the effects of the number of hidden nodes, the number of runs, the momentum term and the learning rate on NN performance.
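The 70/15/15 partitioning, the correlation measure of equation (6.4), and the early-stopping rule described above can be sketched as follows. This is our illustrative version: the patience parameter and class name are our choices, and the denominator of (6.4) is read as the root of the summed squared deviations:

```python
import numpy as np

def partition(data, train=0.70, valid=0.15, seed=0):
    """Randomly split rows into training, validation and testing sets
    (70/15/15 by default, as used in this study)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_tr = int(train * len(data))
    n_va = int(valid * len(data))
    return data[idx[:n_tr]], data[idx[n_tr:n_tr + n_va]], data[idx[n_tr + n_va:]]

def correlation(actual, predicted):
    """Equation (6.4): correlation between actual and predicted values."""
    a = np.asarray(actual, float)
    p = np.asarray(predicted, float)
    return float(np.sum((a - a.mean()) * (p - p.mean()))
                 / np.sqrt(np.sum((a - a.mean()) ** 2)
                           * np.sum((p - p.mean()) ** 2)))

class EarlyStopping:
    """Stop when the validation error has risen for `patience`
    consecutive checks, keeping the best weights seen so far."""
    def __init__(self, patience=5):
        self.patience = patience
        self.best_error = float("inf")
        self.best_weights = None
        self.bad_steps = 0

    def update(self, val_error, weights):
        if val_error < self.best_error:
            self.best_error, self.best_weights = val_error, weights
            self.bad_steps = 0
        else:
            self.bad_steps += 1
        return self.bad_steps >= self.patience   # True => stop training
```

After training stops, the weights held in best_weights are the ones returned, mirroring the behaviour of Matlab's early-stopping procedure in Figure 6.2.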
We also performed an experiment to investigate the effect of outlying data on neural network performance. We must first clarify how the experiments were designed. In the first stage, we considered three factors which, from the literature, should have a significant effect on NN performance: the number of hidden nodes HN, the number of runs NR, and the momentum term MT. The number of hidden nodes has eight levels, from 3 to 10; the number of runs has six levels (3, 5, 7, 10, 15 and 20); and the momentum term has four levels (0.25, 0.5, 0.75 and 0.95).

We combined the information from the fertiliser trials (the nitrogen (N), phosphorus (P), potassium (K) and magnesium (Mg) fertilisers) with the information from the foliar analysis (the nitrogen, phosphorus, potassium, calcium and magnesium concentrations). This neural network architecture therefore involves nine inputs and one output, as shown in Figure 6.5. We conducted two experiments, known as experiment 1 and experiment 2; experiment 3 was conducted specifically to investigate the effect of outliers on neural network performance.

[Figure 6.5: Three-layer fully connected neural network with nine input nodes (the foliar N, P, K, Ca and Mg concentrations and the N, P, K and Mg fertiliser treatments) and one output node (FFB yield)]

6.4.1 Experiment 1

This experiment considers three factors: the number of hidden nodes, the number of runs and the momentum term. It was carried out by changing the level of one factor while holding the other two fixed when running the networks. We then moved each factor from one level to the next, recorded the estimation error for each phase (training, validation and testing) and calculated the average error. We used the correlation to measure the model's performance.
As an example, suppose the number of hidden nodes was set to three, the number of runs was also three, and the momentum term was 0.25. We can write this experiment as [3, 3, 0.25]: the first value represents the number of hidden nodes (HN), the second the number of runs (NR), and the last the momentum term (MT). In general, an experiment can be written as [HN, NR, MT]. The momentum term level is then increased and the process repeated for each factor until the maximum setting, [10, 20, 0.95], is reached.

6.4.2 Experiment 2

In the second experiment, we replaced the momentum term with the learning rate, setting the momentum term at random. We retained the other two factors, the number of hidden nodes and the number of runs. The experiment thus involved the number of hidden nodes (HN), the number of runs (NR) and the learning rate (LR), and can be written as [HN, NR, LR]. The number of hidden nodes had seven levels (2, 4, 6, 8, 10, 12 and 14), the number of runs had seven levels (3, 5, 7, 9, 11, 15 and 20) and the learning rate had six levels (0.05, 0.15, 0.25, 0.45, 0.65 and 0.95). We repeated the process as in the first experiment until the maximum levels of each factor were reached.

6.4.3 Experiment 3

Outliers are observations that may influence the overall analysis, as they decrease the correlation values and increase the variance. It is therefore essential to analyse the existence of outliers in our study. Outliers in a data set influence the modelling accuracy as well as the estimated parameters, especially in statistical analysis, as discussed by Hampel (1974), Andrew (1974), Rousseeuw and Leroy (1987), Birkes and Dodge (1993), Mokhtar (1994) and Azme and Mokhtar (2004). Barnett and Lewis (1995) and Rousseeuw and Leroy (1987) defined an outlier in a set of data as an observation, or subset of observations, which appears to be inconsistent with the remainder of that data set.
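The factor grids of experiments 1 and 2 above are simple Cartesian products of the factor levels. A sketch for experiment 1 follows; the same pattern applies to the [HN, NR, LR] grid of experiment 2 (names are illustrative):

```python
from itertools import product

# Experiment 1 factor levels: [HN, NR, MT]
hidden_nodes = list(range(3, 11))          # eight levels, 3..10
number_of_runs = [3, 5, 7, 10, 15, 20]     # six levels
momentum_terms = [0.25, 0.50, 0.75, 0.95]  # four levels

# every setting [HN, NR, MT], from [3, 3, 0.25] up to [10, 20, 0.95]
settings = list(product(hidden_nodes, number_of_runs, momentum_terms))
```

Each of the 8 × 6 × 4 = 192 settings is then run and the training, validation and testing errors recorded.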
The literature review shows that no extensive research has been conducted on the influence of outliers in neural network modelling. Klein and Rossin (1999a and 1999b) investigated the effects of data errors in neural network modelling and found that neural network performance is influenced by errors in the data set. Huber (1981) suggested that an observation is an outlier if its value lies outside the range µ ± 1.5σ̂, where µ is the average of the data and σ̂ is the estimated standard deviation of the data set. This study examines the effects of outliers on the application of neural network models to the analysis of oil palm yield data.

This study considers two factors, namely the percentage-outliers (P-O) and the magnitude-outliers (M-O). The percentage-outliers is the percentage of the observations in the relevant section of the data set which are perturbed. The magnitude-outliers is the degree to which the perturbed data deviate from the estimated mean. In this study, we considered five input variables and one output variable, with 243 observations for each variable, so the total number of observations is 1458. Six levels of the percentage-outliers factor were considered: 5%, 10%, 15%, 20%, 25% and 30%. The 5% level means that the data set contains 72 outliers; likewise, the 10% level indicates 144 observations, the 15% level 216, the 20% level 288, the 25% level 360 and the 30% level 432. This study uses five levels of magnitude-outliers, namely µ ± 2.0σ̂, µ ± 2.5σ̂, µ ± 3.0σ̂, µ ± 3.5σ̂ and µ ± 4.0σ̂. The observations were selected randomly and replaced uniformly with outliers. For each level of percentage-outliers and magnitude-outliers listed above, the number of hidden nodes was increased from five to thirty and the MSE values were recorded.

6.5 RESULTS AND DISCUSSION

The results of NN modelling will be discussed in four sections.
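The outlier-perturbation scheme of experiment 3 described above can be sketched as follows. `inject_outliers` is an illustrative helper of ours, under the assumption that each perturbed observation is placed at exactly µ ± kσ̂ at a randomly chosen position:

```python
import numpy as np

def inject_outliers(data, pct, k, seed=0):
    """Replace a fraction pct of the observations with mu ± k*sigma outliers."""
    rng = np.random.default_rng(seed)
    out = np.asarray(data, dtype=float).copy()
    mu, sigma = out.mean(), out.std(ddof=1)
    n_out = int(round(pct * out.size))
    idx = rng.choice(out.size, size=n_out, replace=False)   # random positions
    signs = rng.choice([-1.0, 1.0], size=n_out)             # mu + or mu -
    out[idx] = mu + signs * k * sigma
    return out
```

For example, `inject_outliers(series, 0.05, 2.0)` perturbs 5% of a series to µ ± 2.0σ̂, the mildest combination considered in the study.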
Section 6.5.1 is a statistical analysis: an analysis of variance (ANOVA) is used to compare model performance between the different activation functions used in the model. Section 6.5.2 discusses the neural network performance and the selection of the best NN architecture for each station in the inland and coastal areas. Section 6.5.3 elaborates on the residual analysis. The designed experiments on the factors affecting NN performance and on outlier data are discussed in Section 6.5.4, including the effects of the number of runs, the number of hidden nodes, the momentum term and the learning rate on NN performance.

6.5.1 Statistical Analysis

In this section, we discuss the effect on NN performance of the combination of activation functions in the hidden layer and output layer. Six combinations were considered, namely logsigmoid and logsigmoid (LL), logsigmoid and purelin (LP), logsigmoid and tansigmoid (LT), tansigmoid and purelin (TP), tansigmoid and logsigmoid (TL), and tansigmoid and tansigmoid (TT). We ran all the combinations and recorded the mean squares error for each number of hidden nodes. As mentioned before, the NN modelling was divided into three phases: training, validation and testing. We take the average of the MSE to study the overall performance of the NN architecture. For the purposes of this study, we treated the MSE of each phase as the test variable, and also tested the correlation between the predicted and observed values. We are interested in testing whether all combination activation functions produce equal MSE values. Two hypotheses were therefore tested: the null hypothesis, H0, that the MSE values for all combination activation functions are equal, and the alternative hypothesis, Ha, that at least one combination differs. In this case, the dependent variable was the MSE value and the independent variable was the combination of activation functions.
For further illustration, station ILD1 is considered as an example. For each combination of activation functions, the NN was run using 2 to 30 hidden nodes, and the MSE values and correlation coefficient for each phase were recorded. The number of hidden nodes depended on the maximum number obtained from equation (6.3). The MSE values were then rearranged and an analysis of variance was performed. As a standard procedure, the F statistic was used to test the null hypothesis. The results of the test for the inland area are presented in Table 6.1. The F values for training, validation, testing and average are 3.368, 17.997, 12.055 and 10.729 respectively, all statistically significant at the 5% level with (5, 198) degrees of freedom. Results for the other stations were obtained using the same procedure. Almost all the stations show p-values of less than 5% at their corresponding degrees of freedom, signifying rejection of the null hypothesis, except stations ILD6, ILD7 and ILDT. At station ILD6 the average MSE for the different combination activation functions can be assumed equal, as was also found in the training and correlation results at station ILD7.
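The one-way ANOVA F statistic used above can be computed directly from the per-combination MSE values; a self-contained numpy sketch (the function name is ours):

```python
import numpy as np

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic for k groups (e.g. MSE values per combination)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, (df_between, df_within)
```

With the six activation-function combinations as groups, df_between = 5, matching the first entry of the degrees-of-freedom column in Tables 6.1 and 6.2.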
Table 6.1: The F statistic values for different combination activation functions at the inland stations

Station   Training   Validate   Testing    Average    Correlation   Degrees of freedom
ILD1      3.368*     17.997*    12.055*    10.729*    3.062**       (5, 198)
ILD2      7.850*     15.516*    14.949*    10.091*    1.601         (5, 264)
ILD3      2.736**    12.899*    7.951*     4.431*     1.924         (5, 240)
ILD4      2.291**    15.055*    10.452*    13.700*    3.058**       (5, 265)
ILD5      1.859      2.306**    42.523*    16.715*    2.519**       (5, 168)
ILD6      3.132**    22.047*    4.755*     0.927      4.606*        (5, 132)
ILD7      1.766      13.455*    7.028*     5.812*     0.916         (5, 264)
ILDT      0.853      11.736*    12.742*    3.732*     0.613         (5, 264)
Note: * significant at 1% level; ** significant at 5% level

Table 6.2: The F statistic values for different combination activation functions at the coastal stations

Station   Training   Validate   Testing    Average    Correlation   Degrees of freedom
CLD1      0.847      11.091*    18.495*    15.103*    6.454*        (5, 180)
CLD2      3.664*     7.724*     9.166*     10.197*    2.403*        (5, 192)
CLD3      3.265**    3.762*     1.918      2.905**    4.017*        (5, 54)
CLD4      1.295      12.413*    6.006*     8.957*     5.272*        (5, 222)
CLD5      1.524      10.218*    7.112*     1.615      2.139         (5, 180)
CLD6      3.232*     6.523*     10.145*    7.518*     6.146*        (5, 264)
CLD7      1.366      5.092*     3.354*     2.865**    5.911*        (5, 264)
CLDT      2.794**    8.083*     35.022*    13.037*    2.047         (5, 264)
Note: * significant at 1% level; ** significant at 5% level

We also tested the MSE values using a nonparametric test, for which normality assumptions are not required; the test statistic is χ² (Lehman, 1998). The results are displayed in Table 6.3. Almost all the chi-square values were statistically significant at the 1% or 5% level with 5 degrees of freedom, in both the inland and coastal areas, and both tests gave similar conclusions. The Duncan test was performed using the average MSE. Table 6.4 details the similarities and differences between the combination activation functions. The test gives a different result for each station, which means that the NN performance depends on the characteristics of the data at each site.
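The text does not name the nonparametric χ² test beyond the citation (Lehman, 1998). A standard χ²-distributed nonparametric analogue of one-way ANOVA, with the k − 1 = 5 degrees of freedom reported in Table 6.3 for six groups, is the Kruskal-Wallis rank test; the sketch below is therefore an assumption about the test used, and it ignores tied values:

```python
import numpy as np

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (chi-square distributed with k-1 df; ties ignored)."""
    sizes = [len(g) for g in groups]
    data = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    ranks = data.argsort().argsort() + 1.0   # 1-based ranks of the pooled data
    n = data.size
    h, start = 0.0, 0
    for m in sizes:
        r_mean = ranks[start:start + m].mean()
        h += m * (r_mean - (n + 1) / 2.0) ** 2
        start += m
    return 12.0 / (n * (n + 1)) * h
```

Because the statistic is built from ranks rather than raw MSE values, it does not require the normality assumption behind the F test.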
At station CLD2, the LL activation function gives the smallest average MSE, compared to LP and the others. Looking at the values themselves, however, the differences between combination functions are rather small, and the MSE fluctuates as hidden nodes are added. Station CLD7 presents an interesting situation, where five combination activation functions performed comparably and only the TP combination differed; the same occurred at station ILD3 in the inland area. According to these findings, no generalisation can be made about the selection of the combination activation function, and we suggest that trial and error is a good alternative for selecting the best combination.

Table 6.3: The chi-square values of the MSE tests for the inland and coastal stations

Station   Training   Validation   Testing    Correlation   Average
Inland stations
ILD1      18.873*    89.812*      50.565*    62.292*       17.268*
ILD2      38.581*    80.377*      123.511*   77.975*       9.615
ILD3      13.002**   75.554*      16.942*    18.634*       20.476*
ILD4      28.548*    84.575*      62.190*    68.517*       19.074*
ILD5      15.013**   18.960*      116.457*   86.067*       14.342**
ILD6      14.432**   69.147*      30.352*    14.223**      19.550*
ILD7      6.888      59.872*      45.036*    23.096*       5.646
ILDT      7.949      94.111*      57.421*    40.484*       11.315**
Coastal stations
CLD1      5.245      49.758*      65.279*    48.697*       28.536*
CLD2      19.341*    44.742*      47.265*    52.928*       9.819
CLD3      16.727*    14.322**     7.375      11.229**      20.495*
CLD4      6.369      51.979*      41.195*    55.883*       29.942*
CLD5      6.150      36.313*      69.254*    30.774*       11.038*
CLD6      16.458*    27.999*      43.931*    23.837*       31.612*
CLD7      6.902      63.029*      13.042**   18.063**      9.119
CLDT      16.602*    39.407*      104.941*   64.055*       23.134*
Note: * significant at 1% level; ** significant at 5% level; df = 5

Table 6.4: Duncan test on the average MSE (homogeneous subsets) for the inland and coastal areas. In the original table, underlining marks combinations that form a homogeneous subset; for stations ILD6 and CLD5 the differences were not significant.

Inland stations
ILD1: TL TT LP TP LL LT
ILD2: LT LL TL LP TT TP
ILD3: TL LT TP TT LL LP
ILD4: TL LT LL TT TP LP
ILD5: LL TT TP TL LT LP
ILD6: not significant
ILD7: LL TT LT TL LP TP
ILDT: TL LT TP LL TT LP
Coastal stations
CLD1: TL LT TP LL LP TT
CLD2: LL LT TL TT TP LP
CLD3: TL LP LL TP LT TT
CLD4: TT LL TL LT TP LP
CLD5: not significant
CLD6: LL TL TT LT LP TP
CLD7: LL TL LT TT LP TP
CLDT: TL LT TT LL TP LP

6.5.2 Neural Network Performance

The best models, in terms of the number of hidden nodes, were selected based on the minimum average MSE and are presented in Table 6.5 for the inland area and Table 6.6 for the coastal area. The values in brackets [ ] represent the number of hidden nodes; we found that the optimum number of hidden nodes varies from station to station. The mean squares error for each model and each phase (training, validation and testing) is shown in Tables 6.5 and 6.6. The average MSE for the inland stations is in the range 0.0200 to 0.0372 (Table 6.5), and the combined station ILDT recorded the lowest value (0.0049). The MSE values in the training phase are slightly lower than those in the validation and testing phases. The average MSE values for the coastal stations range from 0.0220 to 0.0477, with the combined station CLDT recording the highest value (0.0520) in Table 6.6.
Table 6.5: Mean squares error for training, validation, testing and average of the neural network models in the inland area

Station/Model    Training   Validation   Testing    Average
ILD1: LL [24]    0.0176     0.0326       0.0372     0.0292
ILD2: LL [27]    0.0173     0.0207       0.0234     0.0200
ILD3: TP [24]    0.0177     0.0351       0.0444     0.0304
ILD4: TP [20]    0.0286     0.0313       0.0259     0.0286
ILD5: LL [13]    0.0136     0.0225       0.0247     0.0202
ILD6: TT [16]    0.0179     0.0312       0.0366     0.0285
ILD7: LL [38]    0.0354     0.0391       0.0373     0.0372
ILDT: LP [23]    0.0042     0.0057       0.0048     0.0049

Table 6.6: Mean squares error for training, validation, testing and average of the neural network models in the coastal area

Station/Model    Training   Validation   Testing    Average
CLD1: TL [16]    0.0312     0.0542       0.0436     0.0430
CLD2: LP [23]    0.0135     0.0369       0.0928     0.0477
CLD3: LP [06]    0.0094     0.0138       0.0429     0.0220
CLD4: LT [07]    0.0236     0.0538       0.0255     0.0343
CLD5: TP [06]    0.0364     0.0525       0.0545     0.0436
CLD6: LT [39]    0.0265     0.0436       0.0456     0.0394
CLD7: TP [25]    0.0202     0.0347       0.0285     0.0333
CLDT: LP [38]    0.0384     0.0493       0.0474     0.0520

Table 6.7 shows the correlation coefficient between the observed values and the values predicted by the NN model. A higher correlation coefficient indicates that the predicted values are closer to the actual values. The highest correlation was observed at station ILD1 using the LL combination (0.7984, or R² = 0.6374). The LL combination at station ILD2 also produced a high value (0.7840, or R² = 0.6146). The best linear plots for all stations are presented in Appendix P for the inland area and Appendix Q for the coastal area; they show how close the predicted and actual values are. For the inland area, the highest value of r was 0.8361 (station ILDT), followed by station ILD5 with r = 0.8114, station ILD1 with r = 0.7984, station ILD6 with r = 0.7900, station ILD3 with r = 0.7853 and station ILD2 with r = 0.7840.
For the coastal area, the highest value of r was recorded at station CLD3 (0.8703), followed by station CLD4 with r = 0.8313. In general, the r values for the inland area were greater than those for the coastal area. Thus, the effect of the foliar nutrient composition on oil palm yield is more evident in the inland areas than in the coastal areas.

Table 6.7: The correlation coefficients of the neural network models, by activation function combination (input to hidden layer and hidden layer to output layer)

Station   LL       LP       LT       TP       TL       TT
Inland stations
ILD1      0.7984   0.7461   0.7094   0.7575   0.7336   0.7274
ILD2      0.7840   0.7471   0.7574   0.7433   0.7530   0.7586
ILD3      0.7754   0.7662   0.7634   0.7853   0.7712   0.7558
ILD4      0.5687   0.5988   0.5858   0.5900   0.5559   0.5758
ILD5      0.8114   0.7529   0.7635   0.7927   0.7870   0.8051
ILD6      0.7306   0.7210   0.7114   0.7297   0.7136   0.7900
ILD7      0.4498   0.4961   0.4647   0.4912   0.4961   0.4245
ILDT      0.8288   0.8361   0.8360   0.8291   0.8331   0.8335
Coastal stations
CLD1      0.7395   0.7756   0.7836   0.7328   0.7404   0.6718
CLD2      0.5880   0.5948   0.5625   0.6425   0.6094   0.5826
CLD3      0.8633   0.8603   0.8640   0.8657   0.8703   0.8460
CLD4      0.8313   0.7974   0.7922   0.7974   0.8135   0.7936
CLD5      0.5359   0.5155   0.5180   0.5397   0.5412   0.5161
CLD6      0.4031   0.4334   0.4560   0.4625   0.3863   0.3965
CLD7      0.6663   0.6590   0.6556   0.7054   0.6763   0.6572
CLDT      0.5241   0.5489   0.5111   0.5095   0.4916   0.5105

After the NN architectures were selected, the MSE, RMSE, MAE and MAPE for each station and each combination were calculated, measuring the accuracy of the model predictions against the actual values. Table 6.8 shows the MAPE value for each station and each combination activation function (the other measures are shown in Appendix R and Appendix S). Normally, decisions can be made by looking for the model that produces the minimum error; the shaded MAPE values mark the models selected as best.
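The four error measures reported here and in Appendices R and S can be computed as follows; a minimal numpy sketch (function name ours), assuming no zero yield values so that MAPE is well defined:

```python
import numpy as np

def error_measures(actual, predicted):
    """MSE, RMSE, MAE and MAPE between actual and predicted yields."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    e = a - p
    mse = float(np.mean(e ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(e))),
        "MAPE": float(np.mean(np.abs(e / a))),  # assumes no zero yields
    }
```

MAPE is the measure tabulated in Table 6.8; multiplying it by 100 gives the percentage errors quoted in the text (e.g. 0.1159 is an 11.59% error).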
Table 6.8: The MAPE values of the neural network models, by activation function combination (input layer to hidden layer and hidden layer to output layer)

Station   LL       LP       LT       TP       TL       TT
Inland area
ILD1      0.1159   0.1318   0.1460   0.1443   0.1429   0.1456
ILD2      0.1162   0.1257   0.1244   0.1306   0.1263   0.1223
ILD3      0.1195   0.1190   0.1126   0.1092   0.1141   0.1166
ILD4      0.1356   0.1276   0.1304   0.1288   0.1350   0.1316
ILD5      0.0944   0.1133   0.1067   0.0964   0.0956   0.0968
ILD6      0.0987   0.1010   0.0992   0.1018   0.1052   0.0719
ILD7      0.1674   0.1580   0.1691   0.1642   0.1614   0.1747
ILDT      0.0576   0.0560   0.0564   0.0578   0.0570   0.0573
Coastal area
CLD1      0.1456   0.1337   0.1398   0.1437   0.1220   0.1461
CLD2      0.0725   0.0638   0.0717   0.0665   0.0779   0.0743
CLD3      0.0756   0.0657   0.0750   0.0688   0.0844   0.0789
CLD4      0.1126   0.1298   0.1118   0.1282   0.1163   0.1125
CLD5      0.1488   0.1498   0.1578   0.1431   0.1530   0.1514
CLD6      0.1279   0.1237   0.1232   0.1274   0.1335   0.1302
CLD7      0.1066   0.1086   0.1088   0.1003   0.1056   0.1103
CLDT      0.1499   0.1456   0.1516   0.1572   0.1531   0.1518

Figures 6.6 to 6.10 show the actual and predicted FFB yield values using the best NN models discussed above, with the predicted values plotted in pink and the actual values in blue. Figures 6.6 and 6.7 show the plots for the inland areas. For most stations the two lines can hardly be differentiated, except those representing stations ILD4 and ILD7, where the predicted FFB yields are noticeably smaller than the actual values. This is not surprising, because the correlation coefficients for both models were low compared to the other stations; in fact, the MAPE values for both stations are among the highest. The combined plot for all stations in the inland areas (ILDT) shows little difference between the predicted and actual values. Figures 6.8 to 6.10 present the actual and predicted FFB yield values for the coastal areas.
As in the inland area, the neural network models can be used to represent the behaviour of the yield (ton/hec/yr) data for all coastal stations.

Figure 6.6: The actual and predicted (Pred.) FFB yield for stations ILD1, ILD2 and ILD3 using the NN model

Figure 6.7: The actual and predicted (Pred.) FFB yield for stations ILD4, ILD5, ILD6 and ILD7 using the NN model

Figure 6.8: The actual and predicted (Pred.) FFB yield for stations ILDT, CLD1, CLD2 and CLD3 using the NN model

Figure 6.9: The actual and predicted (Pred.) FFB yield for stations CLD4, CLD5, CLD6 and CLD7 using the NN model

Figure 6.10: The actual and predicted (Pred.) FFB yield for station CLDT using the NN model

6.5.3 Residual Analysis

Residual analysis was also performed to examine the distribution of the modelling errors. Figures 6.11 and 6.12 present the error distributions after the NN models were fitted for the inland and coastal areas respectively. Figure 6.11 (ILD1 to ILD7) shows the error distributions for the individual inland stations, and Figure 6.11 (ILDT) that for the combination of all inland stations. The error distributions of all stations appear to scatter randomly about the zero line. Figure 6.12 (CLD1 to CLD7) presents the error distributions for the coastal stations, and Figure 6.12 (CLDT) that for the combination of all stations in the coastal area. We also found that all the error distributions were normally distributed, which indicates that the selected neural network models are adequate for fitting the oil palm yield data.
Figure 6.11: The error distribution plots of the neural network models for the inland stations (ILD1 to ILD7 and ILDT)

Figure 6.12: The error distribution plots of the neural network models for the coastal stations (CLD1 to CLD7 and CLDT)

6.5.4 Results of Experiment 1

After completing the experiment, we rearranged the data for analysis of variance and response surface analysis.
The F value for the number of hidden nodes, 8.7759 (p = 0.0000, df = (7, 1912)), indicated that the performance of the neural network model was significantly affected by the number of hidden nodes. The F values for the number of runs, 1.6950 (p = 0.1330, df = (5, 1914)), and the momentum term, 1.3300 (p = 0.2630, df = (3, 1916)), show that neither factor influenced the overall performance of the neural networks. This analysis leads to the conclusion that only the number of hidden nodes has a significant influence on NN performance; the number of runs and the momentum term have no effect.

6.5.5 Results of Experiment 2

After running the analysis of variance, we found that the F value for the hidden nodes is 8.0480 (p = 0.0000, df = (6, 2932)) and the F value for the number of runs is 2.8840 (p = 0.0080, df = (6, 2932)), indicating that both factors affect the neural network's performance. However, the F value for the learning rate is 1.6090 (p = 0.1540, df = (5, 2933)), so the null hypothesis cannot be rejected and we conclude that the learning rate does not influence the neural network's performance.

6.5.6 Results of Experiment 3

The results of the analysis of variance (ANOVA) tests and independent-sample t-tests (Norušis, 1998), conducted to test the effects of the percentage-outliers and magnitude-outliers on the MSE, are discussed here. Tests were also performed to determine which combinations of percentage-outliers and magnitude-outliers differ significantly from the base-case scenario with no outliers, and their findings are reported. For both experiments, actual and predicted values were compared using the mean squares error (MSE) as the measure of modelling accuracy.

(i) Experiment for outliers in the training data

The estimated MSE results, using simulated inaccuracies for the magnitude-outliers and percentage-outliers in oil palm yield modelling, are given in Figure 6.13.
The results show that as the percentage-outliers increases from 5% to 30%, the MSE values also increase, indicating a decrease in modelling accuracy. As the magnitude-outliers increases from 2σ̂ to 4σ̂, the MSE values likewise increase, again indicating a decrease in modelling accuracy in the training data (Figure 6.14).

Figure 6.13: The MSE values for different levels of the percentage-outliers in the training data

Figure 6.14: The MSE values for different levels of the magnitude-outliers in the training data

A one-factor ANOVA test was conducted to investigate the individual effects of the percentage-outliers and magnitude-outliers on the neural network's performance. The independent variables are the percentage-outliers (5%, 10%, 15%, 20%, 25% and 30%) and the magnitude-outliers (µ ± 2.0σ̂, µ ± 2.5σ̂, µ ± 3.0σ̂, µ ± 3.5σ̂ and µ ± 4.0σ̂). The F values were 18.481 (p = 0.000, df = (5, 179)) for the percentage-outliers and 3.988 (p = 0.002, df = (4, 179)) for the magnitude-outliers, indicating that both factors have a statistically significant effect on modelling accuracy.

A two-factor ANOVA test was then conducted to examine the effects of both independent variables on the MSE simultaneously. Significant main effects were found for the percentage-outliers (F = 28.246, df = (5, 154)) and the magnitude-outliers (F = 3.332, df = (4, 154)), as well as their interaction (F = 2.507, df = (20, 154)), as the p-values were less than 0.05. These results indicate that modelling accuracy in the training data is affected by both the percentage-outliers and the magnitude-outliers. When a factor has more than two levels, however, the ANOVA results do not indicate where the significant differences occur.
For example, while the percentage-outliers is a significant factor, the difference may result from the percentage-outliers changing from 10% to 15%, 15% to 20% or 25% to 30%, or from a larger jump such as 5% to 25% or 10% to 30%. Independent-sample t-tests with 9 degrees of freedom were therefore performed to compare the MSE values of the no-outlier case with each combination of percentage-outliers and magnitude-outliers, and so determine exactly where the significant differences occur. The results of the t-tests are presented in Table 6.9. For every level of magnitude-outliers, significant differences (p < 0.05) were found between the percentage-outliers levels of 15%, 20%, 25% and 30% and the data sets with no outliers. This means that the neural network is first influenced by outliers in the training data when the percentage-outliers reaches 15%; it is unaffected by the impact of outliers when the percentage-outliers in the training data is lower than 15%.

Table 6.9: The t-statistic values in the training data

Magnitude-outliers   5%       10%      15%       20%       25%       30%
2.0σ̂                0.410    -0.918   -2.902*   -3.797*   -3.374*   -2.722*
2.5σ̂                0.208    -0.597   -2.857*   -3.266*   -3.687*   -3.517*
3.0σ̂                -1.348   -2.080   -3.301*   -3.218*   -3.979*   -3.503*
3.5σ̂                -0.897   -0.142   -3.048*   -3.178*   -4.805*   -6.867*
4.0σ̂                -0.861   -1.991   -2.831*   -3.990*   -5.147*   -6.211*
* p-value < 0.05; degrees of freedom = 9.

(ii) Experiment for outliers in the test data

In this section, we conducted an experiment for outliers in the test data, using the same ANOVA and independent-sample t-test procedures as for the training data. The estimated MSE results, using the simulated inaccuracies for the magnitude-outliers and percentage-outliers in the oil palm yield modelling, are given in Figure 6.15. They show that as the percentage-outliers increases from 5% to 30%, the MSE also increases, indicating a decrease in estimation accuracy.
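The t statistics of Tables 6.9 and 6.10 compare a sample of no-outlier MSE values with a sample of perturbed-data MSE values. A generic pooled-variance sketch follows; note that with this form the reported 9 degrees of freedom imply the two samples together contain 11 MSE values, which is an inference of ours rather than a stated detail:

```python
import numpy as np

def pooled_t(baseline, perturbed):
    """Pooled-variance independent two-sample t statistic and degrees of freedom."""
    a = np.asarray(baseline, dtype=float)
    b = np.asarray(perturbed, dtype=float)
    na, nb = a.size, b.size
    sp2 = (((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()) / (na + nb - 2)
    t = (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))
    return t, na + nb - 2
```

A negative t indicates that the perturbed data produced a larger mean MSE than the baseline, which is the pattern seen in most cells of Tables 6.9 and 6.10.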
As the magnitude-outliers increases from 2σ̂ to 4σ̂, the MSE also increases, indicating a decrease in modelling accuracy (Figure 6.16).

Figure 6.15: The MSE values for different levels of the percentage-outliers in the test data

Figure 6.16: The MSE values for different levels of the magnitude-outliers in the test data

A one-factor ANOVA test was conducted to investigate the individual effects of the percentage-outliers and the magnitude-outliers on the neural network's performance on the test data set. The independent variables used are the percentage-outliers (six levels) and the magnitude-outliers (five levels). The F values were 12.171 (p = 0.000, df = (5, 179)) for the percentage-outliers and 3.570 (p = 0.004, df = (4, 179)) for the magnitude-outliers, indicating that both factors are statistically significant and therefore affect the modelling accuracy.

Next, a two-factor ANOVA test was conducted to investigate the effects of both independent variables on the MSE simultaneously. Significant main effects were found for the percentage-outliers (F = 11.709, df = (5, 154)) and the magnitude-outliers (F = 2.640, df = (4, 154)), as well as their interaction (F = 2.273, df = (20, 154)), as the p-values were less than 0.05. These results indicate that both the percentage-outliers and the magnitude-outliers affect modelling accuracy. Independent-sample t-tests were again performed to compare the MSE values of the no-outlier case with each combination of percentage-outliers and magnitude-outliers, and so determine exactly where the significant differences occur. The results of the t-tests, with 9 degrees of freedom, are presented in Table 6.10.
For all the σ̂'s of magnitude-outliers, significant differences (p < 0.05) were found between the percentage-outliers of 15%, 20%, 25% and 30% and the data sets with no outliers. Therefore, the conclusion can be made that the neural network was first influenced by the outliers when the percentage-outliers reached 15%. The neural network is resilient to the outliers' impact when the percentage-outliers in the test data is lower than 15%. This result is consistent with the result from the training data.

Table 6.10: The t-statistic values for the test data

Magnitude-outliers     5%       10%      15%       20%       25%       30%
2.0 σ̂                 -1.043   -1.196   -3.092*   -5.429*   -2.558*   -8.283*
2.5 σ̂                 -1.365   -0.982   -2.814*   -4.304*   -3.073*   -6.072*
3.0 σ̂                 -0.567   -1.442   -3.535*   -4.461*   -5.086*   -5.669*
3.5 σ̂                 -0.090   -0.523   -2.999*   -3.619*   -5.902*   -6.768*
4.0 σ̂                 -0.172   -0.346   -3.061*   -3.322*   -5.141*   -3.355*
* p-value < 0.05; degrees of freedom = 9.

6.6 COMPARATIVE STUDY ON YIELD MODELLING

The MLR and RMR models were discussed in Chapter 5. In this section, we examine the performance of each model and conduct a comparative study. The comparison is based on the error measures MSE, RMSE, MAE and MAPE, and on the correlation coefficient (Table 6.11). Each station is described according to its MAPE and correlation values.

The MAPE value for the MLR model at station ILD1 is 0.1623, or, as a percentage, a 16.23% error, whereas the RMR model decreased the error to 15.79%. The NN model recorded the minimum MAPE value of 0.1159, or an 11.59% error. The correlation coefficient for the NN model was also the highest at 0.7984, compared to 0.6260 and 0.7554 for the MLR and RMR models respectively. For station ILD2, we found that the MAPE values for the MLR and RMR models were 0.1483 and 0.1555 respectively, compared to 0.1162 for the NN model. This means that the overall error from the NN model was about 11.62%.
The correlation values for the RMR and NN models are quite close to each other, at 0.7732 and 0.7810 respectively.

The modelling errors for station ILD3 using the MLR, RMR and NN models are 14.04%, 14.03% and 10.92% respectively, while the correlation coefficients are 0.6360, 0.6169 and 0.7853 respectively. The error percentages recorded at station ILD4 are considered high, at 15.06%, 17.65% and 12.76% for the MLR, RMR and NN models respectively. This might be due to the nutrient composition's low response to FFB yield; the low correlation coefficients recorded for each model also support this. The modelling error recorded at station ILD5 was 14.97% for the MLR, 14.83% for the RMR and 9.44% for the NN model. Furthermore, the correlation coefficient for the NN model is comparatively high at 0.8114, compared to 0.6330 and 0.5681 for the MLR and RMR models respectively. The MAPE values for the MLR and RMR models at station ILD6 are 0.1242 and 0.1778 respectively; the NN model, however, recorded a rather low value of 0.0719. For station ILD7, all models recorded very low correlations, between 0.3430 and 0.4961, a phenomenon similar to station ILD4. For the ILDT, the percentage error is 18.31% for the MLR model, 19.01% for the RMR model and 5.60% for the NN model.
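The error measures used throughout this comparison can be computed as below. The yield vectors are hypothetical stand-ins, not the station data; MAPE is reported as a fraction, matching values such as 0.1623 in the tables:

```python
import numpy as np

def error_measures(actual, predicted):
    """MSE, RMSE, MAE, MAPE and the correlation coefficient."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / actual))          # fraction, e.g. 0.1623 = 16.23%
    corr = np.corrcoef(actual, predicted)[0, 1]   # Pearson correlation
    return mse, rmse, mae, mape, corr

# Hypothetical FFB yields (tonnes/hectare) and model predictions.
y_obs = [22.1, 25.4, 23.8, 27.0, 24.5]
y_hat = [21.0, 26.1, 22.9, 26.2, 25.3]
mse, rmse, mae, mape, corr = error_measures(y_obs, y_hat)
```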
Table 6.11: The MSE, RMSE, MAE, MAPE and correlation values for the MLR, RMR and NN models for the inland area

Station   Model   MSE       RMSE     MAE      MAPE     Correlation
ILD1      MLR     21.4611   4.6326   3.6667   0.1623   0.6260
          RMR     25.0822   5.0082   3.7602   0.1579   0.7554
          NN      13.1079   3.6204   2.6677   0.1159   0.7984
ILD2      MLR     16.8930   4.1101   3.2249   0.1483   0.6490
          RMR     18.4810   4.2989   3.3185   0.1555   0.7732
          NN      11.4734   3.3872   2.5910   0.1162   0.7810
ILD3      MLR     18.3084   4.2788   3.4338   0.1404   0.6360
          RMR     18.4817   4.2990   3.4290   0.1403   0.6169
          NN      11.9431   3.4558   2.7076   0.1092   0.7853
ILD4      MLR     17.8201   4.2214   3.4573   0.1506   0.4300
          RMR     26.0176   5.1001   4.0155   0.1765   0.4457
          NN      14.0759   3.7517   2.9545   0.1276   0.5988
ILD5      MLR     13.8848   3.7262   3.0457   0.1497   0.6330
          RMR     13.9845   3.7395   3.0222   0.1483   0.5681
          NN       5.9225   2.4336   1.8398   0.0944   0.8114
ILD6      MLR     14.0868   3.7532   3.0552   0.1242   0.5630
          RMR     27.8998   5.2820   4.4151   0.1778   0.5593
          NN       7.0829   2.6613   1.7861   0.0719   0.7900
ILD7      MLR     23.5147   4.8492   3.8944   0.1798   0.3430
          RMR     23.7693   4.8754   3.8651   0.1745   0.3568
          NN      20.4788   4.5253   3.5087   0.1580   0.4961
ILDT      MLR     24.8232   4.9822   4.0833   0.1831   0.3840
          RMR     29.0756   5.3922   4.2919   0.1901   0.4932
          NN       2.8263   1.6811   1.2689   0.0560   0.8361

Table 6.12 presents the summary of these measures for the three models for the coastal area. For example, the percentage error at station CLD1 is 17.66% for the MLR, 16.25% for the RMR and 12.20% for the NN model. These figures show that the NN model fits the data more effectively than the regression approaches. Looking at adequacy (the correlation coefficient), we found that the MLR approach is much better than the RMR; the NN model nevertheless shows a further improvement, from 0.6160 for the MLR model to 0.7404. For stations CLD2 and CLD3, the error percentages are quite low, at around 8%, which signifies that the models fit the data efficiently.
The correlation coefficient for station CLD3 does not differ much among the three fitted models. Meanwhile, at station CLD4, the correlation coefficient for the NN model is very different from those of the MLR and RMR models: the NN model recorded 0.8312, compared to 0.2240 and 0.4748 for the MLR and RMR models.

Table 6.12: The MSE, RMSE, MAE, MAPE and correlation values for the MLR, RMR and NN models for the coastal area

Station   Model   MSE       RMSE     MAE      MAPE     Correlation
CLD1      MLR     24.4451   4.9442   4.0550   0.1766   0.6160
          RMR     28.4546   5.3343   4.0660   0.1625   0.3438
          NN      15.1412   3.8911   2.8715   0.1220   0.7404
CLD2      MLR      8.3597   2.8913   2.3787   0.0813   0.4130
          RMR      9.5160   3.0848   2.4246   0.0843   0.5538
          NN       7.5109   2.7406   1.9190   0.0638   0.6422
CLD3      MLR      7.5409   2.7461   2.1067   0.0839   0.8290
          RMR      7.5526   2.7482   2.1055   0.0838   0.7186
          NN       6.6892   2.5863   1.7651   0.0657   0.8703
CLD4      MLR     40.2150   6.3415   5.6664   0.2562   0.2240
          RMR     80.9058   8.9948   5.8929   0.3064   0.4748
          NN      13.2776   3.6438   2.6550   0.1118   0.8312
CLD5      MLR     23.7183   4.8701   3.9941   0.1729   0.3320
          RMR     23.7804   4.8765   3.9849   0.1725   0.3397
          NN      18.5634   4.3085   3.3054   0.1431   0.5397
CLD6      MLR     17.6260   4.1983   3.3901   0.1394   0.2080
          RMR     17.6583   4.2022   3.3828   0.1389   0.2231
          NN      15.0867   3.8841   3.0213   0.1232   0.4560
CLD7      MLR     17.5519   4.1895   3.3821   0.1301   0.4810
          RMR     23.8404   4.8827   3.6051   0.1505   0.3883
          NN      11.6043   3.4065   2.6659   0.1003   0.7054
CLDT      MLR     27.3915   5.2337   4.2791   0.1804   0.2100
          RMR     33.7806   5.8121   4.4196   0.2000   0.3742
          NN      20.0818   4.4812   3.5495   0.1456   0.5489

At stations CLD5, CLD6 and CLDT, the MLR and RMR models recorded comparatively high MAPE values, of around 13% to 20%. Furthermore, the correlation coefficients attained are considered low compared to the other stations. Even so, the NN model successfully increased the correlation coefficient at those three stations. For example, the correlation recorded at station CLD5 increased from 0.3320 for the MLR model to 0.5397, and at station CLD6 the correlation increased from 0.2080 to 0.4560.
CLDT also shows an increase, from 0.2100 to 0.5489.

Figure 6.17 and Figure 6.18 display graphically the differences in the correlation coefficients of the three models for the inland and coastal areas. The first and second bars represent the correlation values from the MLR and RMR models respectively, while the third bar shows the correlation value from the NN model. For all inland stations, the correlation values of the NN model were significantly higher than those obtained with the regression approach, except for station ILD2, where the correlation values of the RMR and NN models were quite close, although the NN model still gave the higher result. For the coastal area, all the correlation values from the NN model were higher than those from the regression models.

Figure 6.17: The correlation coefficients from the MLR, RMR and NN models for the inland area

Figure 6.18: The correlation coefficients from the MLR, RMR and NN models for the coastal area

Figure 6.19: Comparison of the MAPE values between the MLR, RMR and NN models for the inland area

Figure 6.20: Comparison of the MAPE values between the MLR, RMR and NN models for the coastal area

Figure 6.19 presents the MAPE values of the three models for the inland area. In this figure, the third bar represents the MAPE value of the NN model; all of these are lower than those from the regression models, especially that of ILDT. Figure 6.20 likewise presents the MAPE values of the three models for the coastal area.
As with the inland area, the NN model also recorded lower MAPE values than the regression model approaches.

In this section, we discuss in detail the changes in the correlation coefficient and the gains in accuracy among the MLR, RMR and NN models. Table 6.13 shows the change in the correlation coefficient from the MLR to the NN model and from the RMR to the NN model. For station ILD1 (inland area), the change from the MLR to the NN model is considerably larger than the change from the RMR to the NN model: 27.54% compared to 5.69%. The same situation occurs at station ILD2, where the change from the MLR to the NN model is 20.34%, while the change from the RMR to the NN model is only 1.01%. On the other hand, stations ILD3, ILD4, ILD6 and ILD7 recorded large changes in the correlation coefficient, but the changes from the two regression models (MLR and RMR) are quite similar. The percentage change from the MLR to the NN model is very high for the ILDT, at 117.73%, compared with only a 69.52% change from the RMR to the NN model.

From Table 6.13, we can also see large changes in the correlation coefficient for the coastal area from the regression models to the NN model. The change from the MLR model to the NN model at station CLD1 is quite small, at 20.19%, compared with the 115.36% change from the RMR to the NN model. The highest changes from the MLR model to the NN model were recorded at stations CLD4, CLDT and CLD6, at 253.66%, 161.38% and 119.23% respectively. In addition, the percentage changes from the RMR to the NN model are quite high, between about 58% and 104%, except for stations CLD2 and CLD3, where the changes are only 7.40% and 14.17% respectively.

Table 6.14 presents the numerical accuracy (measured by MAPE) of the proposed models. The last two columns present the changes in accuracy of the regression models compared to the NN model.
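The changes tabulated in Tables 6.13 and 6.14 are relative changes in percent; for the MAPE table, the change is taken on the accuracy (1 − MAPE) × 100 used in Figures 6.21 and 6.22. A minimal sketch, using the ILD1 figures from the tables:

```python
def pct_change(old, new):
    """Relative change in percent, as tabulated in Tables 6.13 and 6.14."""
    return (new - old) / old * 100.0

# Correlation change for station ILD1 (Table 6.13): MLR 0.6260 -> NN 0.7984.
corr_change = pct_change(0.6260, 0.7984)    # ~27.54, matching 27.5399

# Accuracy change for station ILD1 (Table 6.14): accuracy = (1 - MAPE) * 100,
# with MLR MAPE 0.1623 and NN MAPE 0.1159.
acc_change = pct_change((1 - 0.1623) * 100, (1 - 0.1159) * 100)   # ~5.54, matching 5.5390
```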
In the inland area, the changes in accuracy from the MLR to the NN model range from around 2.66% to 15.56%, whilst the percentage changes from the RMR to the NN model range from around 2% to 16.56%. The highest changes were recorded at the ILDT.

Table 6.13: The correlation changes from the MLR and RMR models to the NN model

                                      Percentage change
Station   MLR      RMR      NN        MLR to NN   RMR to NN
Inland stations
ILD1      0.6260   0.7554   0.7984     27.5399      5.6923
ILD2      0.6490   0.7732   0.7810     20.3389      1.0088
ILD3      0.6360   0.6169   0.7853     23.4748     27.2978
ILD4      0.4300   0.4457   0.5988     39.2558     34.3504
ILD5      0.6330   0.5681   0.8114     28.1832     42.8269
ILD6      0.5630   0.5593   0.7900     40.3197     41.2479
ILD7      0.3430   0.3568   0.4961     44.6356     39.0410
ILDT      0.3840   0.4932   0.8361    117.7343     69.5255
Coastal stations
CLD1      0.6160   0.3438   0.7404     20.1948    115.3577
CLD2      0.4130   0.5538   0.5948     44.0194      7.4034
CLD3      0.8290   0.7186   0.8603      3.7756     14.1700
CLD4      0.2240   0.4748   0.7922    253.6607     66.8492
CLD5      0.3320   0.3397   0.5397     62.5602     58.8754
CLD6      0.2080   0.2231   0.4560    119.2307    104.3926
CLD7      0.4810   0.3883   0.7054     46.6528     81.6637
CLDT      0.2100   0.3742   0.5489    161.3809     46.6863

Table 6.14: The performance changes in MAPE from the MLR and RMR models to the NN model

                                      Percentage change of model accuracy
Station   MLR      RMR      NN        MLR to NN   RMR to NN
Inland stations
ILD1      0.1623   0.1579   0.1159      5.5390      4.9875
ILD2      0.1483   0.1555   0.1162      3.7689      4.6536
ILD3      0.1404   0.1403   0.1092      3.6296      3.6175
ILD4      0.1506   0.1765   0.1276      2.7078      5.9381
ILD5      0.1497   0.1483   0.0944      6.5036      6.3285
ILD6      0.1242   0.1778   0.0719      5.9717     12.8801
ILD7      0.1798   0.1745   0.1580      2.6579      1.9988
ILDT      0.1831   0.1901   0.0560     15.5588     16.5576
Coastal stations
CLD1      0.1766   0.1625   0.1220      6.6310      4.8358
CLD2      0.0813   0.0843   0.0638      1.9049      2.2387
CLD3      0.0839   0.0838   0.0657      1.9867      1.9756
CLD4      0.2562   0.3064   0.1118     19.4138     28.0565
CLD5      0.1729   0.1725   0.1431      3.6030      3.5529
CLD6      0.1394   0.1389   0.1232      1.8824      1.8232
CLD7      0.1301   0.1505   0.1003      3.4257      5.9094
CLDT      0.1804   0.2000   0.1456      4.2460      6.8000

As we can see from Figure 6.21, it is very clear that the NN
method consistently provides better results than the regression models. Generally, the accuracy changes in the coastal area are lower than in the inland area. For example, stations CLD2, CLD3 and CLD6 recorded the lowest changes, at around 2%. Station CLD4 shows the highest changes: 19.41% for the MLR model and 28.06% for the RMR model. Figure 6.22 shows the accuracy of the three models' estimates for the coastal area; the bars for the NN model are fairly similar to those for the regression models. Graphically, the changes in accuracy are very obvious, as shown in Figure 6.23. On the other hand, in Figure 6.24 only station CLD4 shows a sizeable difference between the regression models and the NN model.

Figure 6.21: Comparison of the accuracy of the models for the inland area

Figure 6.22: Comparison of the accuracy of the models for the coastal area

Figure 6.23: The percentage changes in model accuracy for the inland area

Figure 6.24: The percentage changes in model accuracy for the coastal area

6.7 CONCLUSION

For both experiments, the results demonstrated that the number of hidden nodes has an effect on the overall performance of the NN architecture, while the momentum term and the learning rate, taken individually, did not show a clear influence on the NN's performance. The first experiment showed that the number of runs did not affect NN performance, although changing the momentum term relative to the learning rate did show a significant effect; in the second experiment, the number of runs did influence NN performance.
For outliers in the training data, it has been demonstrated that modelling accuracy decreases as the percentage-outliers and magnitude-outliers increase. It has also been shown that the magnitude-outliers affect modelling accuracy, and that the relationship between the percentage-outliers and model accuracy is linear. When the percentage-outliers is lower than 15% (even though the magnitude of the outliers may increase), the effect on model accuracy is statistically insignificant compared with having no outliers in the training data. The loss of model accuracy becomes statistically significant, relative to the no-outlier data, from the combination of 15% percentage-outliers with magnitude-outliers at all σ̂'s onwards.

For outliers in the test data, it has likewise been demonstrated that modelling accuracy decreases as the percentage-outliers and magnitude-outliers increase. The finding that modelling accuracy decreased as the percentage of outliers increased is a departure from the work of Bansal et al. (1993), who discussed a neural network application that was not affected by the error rate of the test data. Our findings are similar to the work of Klein and Rossin (1999a and 1999b). One difference between this study and the work of Bansal et al. (1993) and Klein and Rossin (1999a and 1999b) is that the magnitude of the outliers in our research is defined using the variance of the data set and has five levels, while their studies were based on percentage alone, with only two levels considered. This work therefore shows that variations in the percentage of outliers and the magnitude of outliers in the test data may affect modelling accuracy at these higher levels.

The comparative study conducted between the MLR, RMR and NN models shows that the neural network approach outperforms both the multiple linear regression and the robust M-regression.
CHAPTER 7

THE APPLICATION OF RESPONSE SURFACE ANALYSIS IN MODELLING OIL PALM YIELD

7.1 INTRODUCTION

This chapter describes the application of response surface analysis in modelling oil palm yield. The analysis of the response surface was conducted using the Statistical Analysis System (SAS) version 6.12 package, and the results are presented in this chapter.

7.2 RESPONSE SURFACE ANALYSIS

The purpose of using response surface analysis is to determine the optimum level of oil palm yield using fertiliser information. Canonical analysis was used to investigate the shape of the predicted response surface. A brief mathematical outline of response surface analysis is as follows. For each response variable, the model can be written as

yi = xi′Bxi + b′xi + c′zi + εi,  i = 1, 2, …, n,  (7.1)

where yi is the ith observation of the response variable and xi = (xi1, xi2, …, xik)′ are the k factor variables for the ith observation. zi = (zi1, zi2, …, zip)′ are the p covariates, including the intercept term. B is the k×k symmetric matrix of quadratic parameters, whose diagonal elements equal the coefficients of the pure quadratic terms in the model and whose off-diagonal elements equal half the coefficients of the corresponding cross-product terms. b is the k×1 vector of linear parameters, c is the p×1 vector of covariate parameters, one of which is the intercept, and εi is the error associated with the ith observation. The tests performed assume that the errors are independently and normally distributed with mean zero and variance σ². The parameters in B, b and c are estimated by the least squares method. To optimise y with respect to x, take the partial derivatives, set them to zero and solve equation (7.2):

∂y/∂x = 2Bx + b = 0, so that x = −0.5B⁻¹b.  (7.2)

Canonical analysis is then used to determine whether the solution at the stationary point is a maximum or a minimum response.
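The stationary-point computation of equation (7.2) and its eigenvalue classification can be sketched numerically; the B, b and c coefficients below are hypothetical, not a fitted station model:

```python
import numpy as np

# Illustrative quadratic surface y = x'Bx + b'x + c with two factors
# (hypothetical coefficients for the sketch).
B = np.array([[-0.9, 0.2],
              [ 0.2, -1.5]])    # symmetric matrix of quadratic parameters
b = np.array([4.0, 3.0])        # linear parameters
c = 25.0                        # intercept

# Stationary point from equation (7.2): x0 = -0.5 * B^{-1} b.
x0 = -0.5 * np.linalg.solve(B, b)

# Predicted response at the stationary point.
y0 = x0 @ B @ x0 + b @ x0 + c

# The eigenvalues of B classify the stationary point, as in the canonical
# analysis: all negative -> maximum, all positive -> minimum, mixed -> saddle.
eigvals = np.linalg.eigvalsh(B)
if np.all(eigvals < 0):
    kind = "maximum"
elif np.all(eigvals > 0):
    kind = "minimum"
else:
    kind = "saddle point"
```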
We need to determine whether B is positive or negative definite by examining its eigenvalues (represented by λ in equation (3.71)). If the eigenvalues are all negative, the solution at the stationary point is a maximum; if they are all positive, it is a minimum; and if the eigenvalues have mixed signs, the stationary point is a saddle point. A detailed mathematical explanation of response surface analysis was given in Chapter 3.

If the estimated surface is found to have a maximum or minimum point, the analysis performed by model fitting and the canonical analysis may be sufficient (Myers and Montgomery, 1995; Christensen, 2001; Box and Draper, 1987; SAS, 1992). If the stationary point is a saddle point, then ridge analysis is proposed to ensure that the optimum lies inside the experimental region. The result is a set of coordinates for the maximum or minimum point, along with the predicted response at each computed point on the path. The ridge analysis solves for the estimated ridge of optimum response at increasing radii from the centre of the original design; the coded radius is the distance from the ridge's origin. Examples of the response surface plots for the fertiliser treatments in the inland and coastal areas are presented in Figure 7.1.

Figure 7.1: The response surface plots for the fertiliser treatments at the ILD1 and ILD5 stations in the inland area and the CLD2 and CLD7 stations in the coastal area

7.3 DATA ANALYSIS

The data analysis was conducted using the fertiliser treatments. The detailed procedure of the response surface analysis is depicted in Figure 7.2: the fertiliser-treatment data are submitted to response surface analysis (canonical analysis), and the stationary point is classified as a maximum, a minimum or a saddle point.
If the stationary point is a saddle point, ridge analysis is applied; once a maximum or minimum has been located, profit analysis follows, leading to the conclusions.

Figure 7.2: The data analysis procedure for obtaining the optimum level of fertiliser

The SAS package provides an easy way to perform response surface analysis via the PROC RSREG procedure (SAS, 1992). The RSREG procedure allows one of each of the following statements:

PROC RSREG options;
   MODEL response = independents / options;
   RIDGE options;
   WEIGHT variable;
   ID variables;
   BY variables;

The PROC RSREG and MODEL statements are required. The MODEL statement lists the dependent variable (oil palm yield), followed by an equals sign and then the independent variables, namely the N, P, K and Mg fertilisers. The independent variables specified in the MODEL statement must be variables in the data set. A RIDGE statement specifies that the ridge of optimum response be computed; the ridge starts at a given point x0, and the point on the ridge at radius r from x0 is the collection of factor settings that optimises the predicted response at that radius. Ridge analysis can be used as a tool to help interpret an existing response surface, or to indicate the direction in which further experimentation should be performed. A BY statement can be used with PROC RSREG to obtain separate analyses of the observations in groups defined by the BY variables; when it is used, the procedure expects the input data set to be sorted in the order of the BY variables. The ID statement names variables to be transferred to the created data set, which contains statistics for each observation, and the WEIGHT statement names a numeric variable in the input data set.

7.4 NUMERICAL EXAMPLE

Analysis of the fertiliser treatments using response surface analysis was conducted for each station. The discussion of the findings is divided into the canonical analysis and the ridge analysis of the fertiliser treatments.
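Outside SAS, the full quadratic model that PROC RSREG fits (intercept, linear, pure quadratic and cross-product terms, estimated by least squares) can be reproduced directly; a minimal numpy sketch with hypothetical fertiliser data, using two factors for brevity where the thesis uses N, P, K and Mg:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical fertiliser levels (kg/palm/year) and FFB yields.
X = rng.uniform(0.5, 6.0, size=(30, 2))
true_y = (20 + 3*X[:, 0] + 2*X[:, 1]
          - 0.4*X[:, 0]**2 - 0.3*X[:, 1]**2 + 0.1*X[:, 0]*X[:, 1])
y = true_y + rng.normal(0, 0.5, size=30)

# Design matrix of the full quadratic response surface model.
D = np.column_stack([
    np.ones(len(y)),        # intercept
    X[:, 0], X[:, 1],       # linear terms
    X[:, 0]**2, X[:, 1]**2, # pure quadratic terms
    X[:, 0]*X[:, 1],        # cross-product term
])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)

# R^2 of the fitted surface, as summarised in Tables 7.1 and 7.2.
resid = y - D @ coef
r2 = 1 - (resid @ resid) / np.sum((y - y.mean())**2)
```

The B matrix of equation (7.1) is then assembled from `coef`: the pure quadratic estimates go on the diagonal, and half of the cross-product estimate goes in each off-diagonal position.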
The stationary point was identified to determine its nature: a maximum, a minimum or a saddle point. Ridge analysis is introduced where the stationary point is a saddle point.

7.4.1 Canonical Analysis for Fertiliser Treatments

A summary of the response surface analysis, giving the average FFB yield, MSE, RMSE and R² values, is presented in Table 7.1 for the inland area and Table 7.2 for the coastal area. The average FFB yield is the mean of the FFB yield in tonnes, and R² represents the variance explained by the exploratory variables or factors. The average FFB yield in the inland area was between 21.4925 and 29.2185 tonnes/hectare/year, while in the coastal area it was within the range 26.7660 to 31.4014 tonnes/hectare/year. The highest R² value was recorded at station CLD3 (0.7613), followed by station CLD4 (0.5972); both stations are located in the coastal area.

Table 7.1: The average FFB yield, MSE, RMSE and R² values for the inland area

Station   Average FFB yield         MSE       RMSE     R²
          (tonnes/hectare/year)
ILD1      23.7382                   15.7966   3.9744   0.5802
ILD2      23.5865                   22.6885   4.7632   0.4322
ILD3      26.7956                   15.2171   3.9009   0.5330
ILD4      26.4915                    8.5504   2.9241   0.5660
ILD5      29.2185                    6.5911   2.5673   0.4532
ILD6      23.5196                   17.0863   4.1335   0.4638
ILD7      21.4925                   15.8995   3.9874   0.4529

Table 7.2: The average FFB yield, MSE, RMSE and R² values for the coastal area

Station   Average FFB yield         MSE       RMSE     R²
          (tonnes/hectare/year)
CLD1      28.1731                    8.82712  2.9710   0.5696
CLD2      28.2383                   11.2728   3.3575   0.5858
CLD3      26.6933                    7.0654   2.6581   0.7613
CLD4      30.3716                    4.3348   2.0820   0.5972
CLD5      26.7660                   16.4810   4.0596   0.5354
CLD6      30.0687                    8.1174   2.8491   0.5214
CLD7      31.4014                   13.6416   3.6934   0.5316

All the important results, including the eigenvalues, the critical values, the predicted FFB yield at the stationary point and the concluding remarks on each stationary point, are presented in Table 7.3.
As shown in Table 7.3, station ILD1 has all-negative values of λ (λ1 = -0.3439, λ2 = -0.9395, λ3 = -2.3165 and λ4 = -4.5217); thus its stationary point is a maximum. These findings indicate that 5.1480 kg of N fertiliser, 2.7054 kg of P fertiliser, 3.2195 kg of K fertiliser and 2.8126 kg of Mg fertiliser are needed to achieve the maximum FFB yield of 30.1826 tonnes per hectare per year. On the other hand, station ILD3 recorded λ values for the N, P and K fertilisers of 0.8696, -0.6600 and -1.7946 respectively; since these have mixed signs, the stationary point for ILD3 is a saddle point. As presented in Table 7.3, we found that the stationary points for all stations were saddle points except those for ILD1 and ILD2.

Table 7.3: The eigenvalues, predicted FFB yield at the stationary points, and critical values of the fertiliser levels in the inland area

Station   Fertiliser   Eigenvalue (λ)   Critical value   Predicted FFB yield   Concluding remark
ILD1      N            -0.3439          5.1480           30.1826               Maximum point
          P            -0.9395          2.7054
          K            -2.3165          3.2195
          Mg           -4.5217          2.8126
ILD2      N            -0.9069          6.4000           27.3469               Maximum point
          K            -2.2656          6.3864
ILD3      N             0.8696          6.2525           28.0684               Saddle point
          P            -0.6600          0.9165
          K            -1.7946          3.1341
ILD4      N             1.6003          5.1652           26.2051               Saddle point
          P             0.8658          2.0342
          K            -0.2154          4.8463
          Mg           -1.0857          2.0679
ILD5      N             1.6615          1.1063           29.9558               Saddle point
          P            -0.9745          0.9999
          K            -1.1204          2.9753
ILD6      N             0.7537          7.5114           24.6765               Saddle point
          P             0.3066          3.0042
          K            -0.2191          2.4278
          Mg           -1.7060          0.6666
ILD7      N             1.3554          5.8538           30.7051               Saddle point
          P            -0.1517          4.7872
          K            -0.8699          4.3905
          Mg           -4.2806          1.7877

The summary of the response surface analysis for the coastal area is shown in Table 7.4 and Table 7.5. At station CLD6, for example, the eigenvalues are λ1 = 0.4646, λ2 = -0.0116 and λ3 = -5.0618. The signs of the eigenvalues are mixed, so the stationary point is a saddle point. This occurred at all the coastal stations in this study.
The estimated FFB yield at station CLD6 was 29.2281 tonnes per hectare per year, corresponding to 3.9034 kg of N fertiliser, 0.1187 kg of P fertiliser and 6.5243 kg of K fertiliser. The canonical analysis indicated that the predicted response surface at station CLD6 is saddle-shaped. The eigenvalue for the N fertiliser, 0.4646, shows that the valley orientation of the saddle is less curved than the hill orientation associated with the K eigenvalue of -5.0618. The negative signs of the eigenvalues for the P and K factors indicate directions of downward curvature. The eigenvalue of largest absolute value is that of the K factor, meaning that the curvature of the response surface is most pronounced in the associated direction: the surface is more sensitive to changes in K than to changes in the N and P factors. For the detailed numerical results of the critical values and predicted yields at the stationary points, refer to Table 7.4 and Table 7.5.

Table 7.4: The eigenvalues, predicted FFB yield at the stationary points, and critical values of the fertiliser levels for the CLD1 and CLD2 stations

Station   Fertiliser   Eigenvalue (λ)   Critical value   Predicted FFB yield   Concluding remark
CLD1      N             1.1058          0.2328           28.5310               Saddle point
          P             1.0041          6.8132
          K            -0.1793          4.4443
          Mg           -0.4926          1.2246
CLD2      N             0.2903          1.9312           30.2809               Saddle point
          P            -0.3234          3.4270
          K            -0.7052          0.5436
          Mg           -1.8375          2.3214

Table 7.5: The eigenvalues, predicted FFB yield at the stationary points, and critical values of the fertiliser levels for the CLD3, CLD4, CLD5, CLD6 and CLD7 stations

Station   Fertiliser   Eigenvalue (λ)   Critical value   Predicted FFB yield   Concluding remark
CLD3      N             1.4988          6.3486           29.9374               Saddle point
          P             0.0634          2.9778
          K            -0.6883          2.0066
          Mg           -3.1878          1.6929
CLD4      N             0.5961          2.3739           30.9816               Saddle point
          P            -0.1260          2.0859
          K            -0.6599          4.5873
          Mg           -0.6916          2.1789
CLD5      N             4.9445          2.8598           25.9715               Saddle point
          P             1.2246          3.2396
          K            -0.8439          6.2467
          Mg           -3.7785          5.4442
CLD6      N             0.4646          3.9034           29.2281               Saddle point
          P            -0.0116          0.1187
          K            -5.0618          6.5243
CLD7      N             1.3496          5.3768           32.5936               Saddle point
          P            -1.0850          0.6474
          K            -1.1862          4.5479

7.4.2 Ridge Analysis for Fertiliser Treatments

The estimated FFB yield responses at given radii, together with the corresponding fertiliser levels for the inland stations, are presented in Tables 7.6 to 7.8. To illustrate the results of the ridge analysis, we consider only the stations whose stationary points are saddle points; as mentioned earlier, ridge analysis is used to find the optimum FFB yield when the canonical analysis indicates that the stationary point is a saddle point. Stations ILD1 and ILD2 are therefore not discussed in this section. The Mg fertiliser was not used in the experiments at stations ILD3 and ILD5.

At station ILD3, the estimated FFB yield at radius 0.0 is 29.7351 tonnes per hectare per year, corresponding to 4.7800 kg of N fertiliser, 0.9000 kg of P fertiliser and 4.1000 kg of K fertiliser. When the radius was increased from 0.0 to 0.5, the estimated FFB yield increased to 30.3281 tonnes per hectare per year, with the fertiliser inputs rising to 5.5958 kg of N, 1.0560 kg of P and 5.7555 kg of K. When the radius reached its maximum value, the estimated FFB yield was 31.1853 tonnes per hectare per year, corresponding to 5.6887 kg of N, 1.1711 kg of P and 7.8547 kg of K fertiliser.

For station ILD4 at radius 0.0, the results suggest that, given an annual application of 4.0000 kg of N fertiliser, 1.8750 kg of P fertiliser, 4.2000 kg of K fertiliser and 1.4400 kg of Mg fertiliser, the average estimated FFB yield is 25.8575 tonnes per hectare. Increasing the radius from 0.0 to 1.0 raised the average estimated FFB yield from 25.8575 to 27.7434 tonnes per hectare, with the N, P and K fertilisers increasing to 4.2611 kg, 2.1026 kg and 6.5259 kg respectively.
Meanwhile, the Mg fertiliser decreased from 1.4400 kg to 1.1745 kg. The details of the estimated FFB yield and the corresponding fertiliser levels at certain radii for the other stations are presented in Tables 7.6 to 7.8.

Table 7.6: The estimated FFB yield and fertiliser level at certain radii for stations ILD3 and ILD4 in the inland area

Station  Radius    N       P       K       Mg      Estimated FFB yield
ILD3     0.0     4.7800  0.9000  4.1000    -       29.7351
         0.1     5.0810  0.9358  4.2057    -       29.8744
         0.2     5.3312  0.9707  4.4629    -       29.9887
         0.3     5.4781  1.0025  4.8707    -       30.0938
         0.4     5.5525  1.0305  5.3158    -       30.2047
         0.5     5.5958  1.0560  5.7555    -       30.3281
         0.6     5.6247  1.0800  6.1862    -       30.4667
         0.7     5.6461  1.1035  6.6097    -       30.6212
         0.8     5.6629  1.1264  7.0282    -       30.7923
         0.9     5.6768  1.1488  7.4428    -       30.9803
         1.0     5.6887  1.1711  7.8547    -       31.1853
ILD4     0.0     4.0000  1.8750  4.2000  1.4400    25.8575
         0.1     4.2044  1.8790  4.3248  1.4542    25.9351
         0.2     4.3186  1.8998  4.5724  1.4448    26.0419
         0.3     4.3397  1.9274  4.8514  1.4133    26.1475
         0.4     4.3373  1.9539  5.1096  1.3790    26.2814
         0.5     4.3284  1.9795  5.3557  1.3446    26.4460
         0.6     4.3168  2.0046  5.5953  1.3104    26.6420
         0.7     4.3039  2.0294  5.8310  1.2763    26.8696
         0.8     4.2901  2.0539  6.0641  1.2423    27.1290
         0.9     4.2758  2.0783  6.2956  1.2083    27.4202
         1.0     4.2611  2.1026  6.5259  1.1745    27.7434
Note: fertiliser levels in kg/palm/year; estimated FFB yield in tonnes/hectare/year.

Table 7.7: The estimated FFB yield and fertiliser level at certain radii for stations ILD5 and ILD6 in the inland area

Station  Radius    N       P       K       Mg      Estimated FFB yield
ILD5     0.0     1.3650  2.2750  2.2750    -       29.5075
         0.1     1.4014  2.1037  2.4119    -       29.6488
         0.2     1.4936  1.9752  2.5417    -       29.7814
         0.3     1.6307  1.9119  2.6462    -       29.9233
         0.4     1.7757  1.8845  2.7298    -       30.0891
         0.5     1.9188  1.8712  2.8030    -       30.2842
         0.6     2.0595  1.8646  2.8706    -       30.5106
         0.7     2.1983  1.8615  2.9352    -       30.7691
         0.8     2.3358  1.8604  2.9977    -       31.0602
         0.9     2.4723  1.8606  3.0589    -       31.3840
         1.0     2.6081  1.8619  3.1192    -       31.7407
ILD6     0.0     4.6700  1.8750  3.9750  2.0800    24.4806
         0.1     4.9085  1.8581  4.0471  2.1299    24.7152
         0.2     5.1234  1.8453  4.1449  2.1918    24.9275
         0.3     5.3048  1.8384  4.2665  2.2679    25.1228
         0.4     5.4476  1.8388  4.4022  2.3587    25.3072
         0.5     5.5539  1.8479  4.5390  2.4628    25.4866
         0.6     5.6308  1.8661  4.6654  2.5774    25.6659
         0.7     5.6859  1.8934  4.7744  2.6996    25.8489
         0.8     5.7256  1.9291  4.8636  2.8265    26.0384
         0.9     5.7546  1.9716  4.9333  2.9559    26.2365
         1.0     5.7761  2.0197  4.9856  3.0861    26.4447
Note: fertiliser levels in kg/palm/year; estimated FFB yield in tonnes/hectare/year.

Table 7.8: The estimated FFB yield and fertiliser level at certain radii for station ILD7

Station  Radius    N       P       K       Mg      Estimated FFB yield
ILD7     0.0     4.3250  2.2750  3.4750  1.7400    24.4803
         0.1     4.5706  2.3088  3.4393  1.7512    24.7287
         0.2     4.8201  2.3354  3.4304  1.7757    24.9778
         0.3     5.0655  2.3536  3.4351  1.8127    25.2337
         0.4     5.3021  2.3635  3.4470  1.8623    25.5011
         0.5     5.5273  2.3664  3.4628  1.9225    25.7835
         0.6     5.7406  2.3636  3.4806  1.9909    26.0841
         0.7     5.9427  2.3564  3.4994  2.0650    26.4050
         0.8     6.1351  2.3461  3.5186  2.1431    26.7479
         0.9     6.3192  2.3333  3.5380  2.2241    27.1141
         1.0     6.4966  2.3188  3.5574  2.3068    27.5043
Note: fertiliser levels in kg/palm/year; estimated FFB yield in tonnes/hectare/year.

The results of the ridge analysis of the fertiliser treatments in the coastal areas are presented in Tables 7.9 to 7.12. The estimated FFB yield at radius 0.1 for station CLD1 is 27.4295 tonnes per hectare per year; achieving this level required 1.8872 kg of N fertiliser, 1.8940 kg of P fertiliser, 2.9434 kg of K fertiliser and 1.7663 kg of Mg fertiliser. Increasing the radius from 0.1 to 0.5 raised the estimated FFB yield to 28.2428 tonnes per hectare per year, corresponding to 2.2825 kg of N, 2.0209 kg of P, 3.8199 kg of K and 1.6061 kg of Mg fertiliser. The maximum estimated FFB yield was 29.6934 tonnes per hectare per year, reached at the maximum radius of 1.0; the required fertiliser levels increased accordingly, to 2.8791 kg of N, 2.0007 kg of P, 4.8655 kg of K and 1.4579 kg of Mg.
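The ridge paths tabulated above come from maximising a fitted second-order response surface over spheres of increasing radius around the design centre. A minimal sketch of that procedure, using illustrative coefficients rather than the surfaces estimated in this study:

```python
import numpy as np

# Ridge analysis on an illustrative second-order response surface
# y = b0 + b'x + x'Bx. The coefficients below are made up for this
# sketch; they are not the surfaces estimated in the thesis.
b0 = 25.0
b = np.array([1.2, 0.4, 0.9, -0.3])           # linear terms for N, P, K, Mg
B = np.array([[-0.50,  0.05,  0.02,  0.00],
              [ 0.05, -0.30,  0.01,  0.00],
              [ 0.02,  0.01, -0.40,  0.03],
              [ 0.00,  0.00,  0.03, -0.20]])  # symmetric quadratic matrix

def yield_at(x):
    """Estimated FFB yield at fertiliser deviation x from the centre."""
    return b0 + b @ x + x @ B @ x

def ridge_point(mu):
    """For a Lagrange multiplier mu above the largest eigenvalue of B,
    solving (B - mu*I) x = -b/2 gives the yield-maximising point on the
    sphere of radius ||x|| about the design centre."""
    x = np.linalg.solve(B - mu * np.eye(4), -b / 2.0)
    return np.linalg.norm(x), x, yield_at(x)

# The largest eigenvalue of B is negative here, so any mu > 0 lies on the
# maximising branch; the radius shrinks as mu grows.
for mu in (0.1, 0.25, 0.5, 1.0, 2.0):
    r, x, y = ridge_point(mu)
    print(f"radius {r:5.3f}  estimated yield {y:7.4f}  levels {np.round(x, 3)}")
```

Tracing the path from small to large radius reproduces the qualitative pattern in Tables 7.6 to 7.12: yield and most fertiliser levels rise together along the ridge.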
At station CLD2, however, only a small increase in the estimated average FFB yield was recorded, from 32.3348 tonnes per hectare per year at radius 0.0 to 32.5424 tonnes per hectare per year at the maximum radius of 1.0. The required fertiliser levels also increased: N fertiliser from 1.8200 kg to 2.4672 kg, P fertiliser from 1.8200 kg to 3.0549 kg, K fertiliser from 1.3600 kg to 2.0685 kg, and Mg fertiliser from 1.8200 kg to 2.5049 kg. This shows that the N, P, K and Mg fertiliser requirements at station CLD2 are quite comparable. Mg fertiliser was not applied at stations CLD6 and CLD7.

Table 7.9: The estimated FFB yield and fertiliser level at certain radii for stations CLD1 and CLD2 in the coastal area

Station  Radius    N       P       K       Mg      Estimated FFB yield
CLD1     0.0     1.8200  1.8200  2.7300  1.8200    27.2633
         0.1     1.8872  1.8940  2.9434  1.7663    27.4295
         0.2     1.9709  1.9481  3.1626  1.7194    27.6082
         0.3     2.0670  1.9850  3.3833  1.6779    27.8022
         0.4     2.1718  2.0082  3.6028  1.6405    28.0132
         0.5     2.2825  2.0209  3.8199  1.6061    28.2428
         0.6     2.3975  2.0260  4.0342  1.5739    28.4919
         0.7     2.5152  2.0253  4.2457  1.5433    28.7111
         0.8     2.6351  2.0202  4.4546  1.5140    29.0508
         0.9     2.7565  2.0117  4.6611  1.4856    29.3615
         1.0     2.8791  2.0007  4.8655  1.4579    29.6934
CLD2     0.0     1.8200  1.8200  1.3600  1.8200    32.3348
         0.1     1.8978  1.9413  1.4383  1.7833    32.3944
         0.2     1.9726  2.0687  1.5164  1.7606    32.4438
         0.3     2.0458  2.2006  1.5942  1.7575    32.4832
         0.4     2.1176  2.3356  1.6716  1.7806    32.5132
         0.5     2.1871  2.4711  1.7476  1.8362    32.5345
         0.6     2.2531  2.6031  1.8208  1.9263    32.5480
         0.7     2.3140  2.7284  1.8897  2.0464    32.5547
         0.8     2.3696  2.8452  1.9537  2.1880    32.5555
         0.9     2.4204  2.9536  2.0131  2.3428    32.5512
         1.0     2.4672  3.0549  2.0685  2.5049    32.5424
Note: fertiliser levels in kg/palm/year; estimated FFB yield in tonnes/hectare/year.

Table 7.10: The estimated FFB yield and fertiliser level at certain radii for stations CLD3 and CLD4 in the coastal area

Station  Radius    N       P       K       Mg      Estimated FFB yield
CLD3     0.0     3.6400  1.8200  3.6400  1.6200    28.2359
         0.1     4.0036  1.8121  3.6367  1.8200    28.6748
         0.2     4.3673  1.8053  3.6425  1.8256    29.0506
         0.3     4.7298  1.8003  3.6652  1.8450    29.3638
         0.4     5.0810  1.8008  3.7274  1.9124    29.6169
         0.5     5.3372  1.8192  3.8738  2.1270    29.8253
         0.6     5.4454  1.8477  4.0349  2.4012    30.0304
         0.7     5.5045  1.8741  4.1720  2.6447    30.2560
         0.8     5.5479  1.8986  4.2952  2.8671    30.5079
         0.9     5.5844  1.9218  4.4104  3.0766    30.7880
         1.0     5.6169  1.9442  4.5206  3.2781    31.0969
CLD4     0.0     3.6400  1.8200  3.6400  1.8200    30.9592
         0.1     3.8391  1.9564  3.6175  1.8868    31.0200
         0.2     4.1241  2.0637  3.5584  1.9331    31.0846
         0.3     4.4415  2.1512  3.4809  1.9663    31.1577
         0.4     4.7692  2.2279  3.3953  1.9927    31.2409
         0.5     5.1002  2.2984  3.3058  2.0153    31.3352
         0.6     5.4324  2.3654  3.2141  2.0356    31.4408
         0.7     5.7648  2.4299  3.1211  2.0544    31.5581
         0.8     6.0973  2.4929  3.0272  2.0722    31.6869
         0.9     6.4297  2.5547  2.9327  2.0893    31.8276
         1.0     6.7619  2.6157  2.8378  2.1059    31.9800
Note: fertiliser levels in kg/palm/year; estimated FFB yield in tonnes/hectare/year.

Table 7.11: The estimated FFB yield and fertiliser level at certain radii for stations CLD5 and CLD6 in the coastal area

Station  Radius    N       P       K       Mg      Estimated FFB yield
CLD5     0.0     2.7300  4.5500  9.1000  4.5500    26.0478
         0.1     2.8881  4.7353  8.9110  4.2429    26.2357
         0.2     3.0149  4.7345  8.8843  3.8037    26.4774
         0.3     3.1188  4.6571  8.8651  3.3592    26.8602
         0.4     3.2131  4.5551  8.8422  2.9228    27.2298
         0.5     3.3027  4.4429  8.8168  2.4925    27.7504
         0.6     3.3897  4.3256  8.7898  2.0661    28.3690
         0.7     3.4752  4.2053  8.7616  1.6423    29.0858
         0.8     3.5596  4.0831  8.7328  1.2204    29.9012
         0.9     3.6433  3.9596  8.7034  0.7998    30.8152
         1.0     3.7266  3.8353  8.6736  0.3801    31.8279
CLD6     0.0     2.7250  1.3650  3.4100    -       32.6262
         0.1     2.9364  1.3988  3.2121    -       33.0906
         0.2     3.1074  1.4489  2.9718    -       33.5074
         0.3     3.2377  1.5115  2.7024    -       33.8947
         0.4     3.3357  1.5816  2.4183    -       34.2664
         0.5     3.4111  1.6556  2.1284    -       34.6314
         0.6     3.4712  1.7317  1.8368    -       34.9953
         0.7     3.5207  1.8088  1.5453    -       35.3614
         0.8     3.5628  1.8864  1.2545    -       35.7317
         0.9     3.5995  1.9642  0.9646    -       36.1077
         1.0     3.6321  2.0422  0.6756    -       36.4902
Note: fertiliser levels in kg/palm/year; estimated FFB yield in tonnes/hectare/year.

Table 7.12: The estimated FFB yield and fertiliser level at certain
radii for station CLD7 in the coastal area

Station  Radius    N       P       K       Mg      Estimated FFB yield
CLD7     0.0     2.7250  1.8200  3.4100    -       31.4923
         0.1     2.9651  1.8948  3.4892    -       31.7818
         0.2     3.1851  1.9985  3.5571    -       32.0639
         0.3     3.3774  2.1309  3.6110    -       32.3452
         0.4     3.5395  2.2864  3.6506    -       32.6327
         0.5     3.6743  2.4568  3.6781    -       32.9326
         0.6     3.7877  2.6356  3.6965    -       33.2493
         0.7     3.8852  2.8185  3.7084    -       33.5861
         0.8     3.9711  3.0035  3.7158    -       33.9447
         0.9     4.0482  3.1891  3.7199    -       34.3267
         1.0     4.1189  3.3749  3.7216    -       34.7329
Note: fertiliser levels in kg/palm/year; estimated FFB yield in tonnes/hectare/year.

7.5 ECONOMIC ANALYSIS

In addition to the statistical analysis, an economic analysis should be carried out to determine the point at which the total profit of the oil palm yield is at its highest level (Nelson, 1997). The economic analysis focuses on obtaining the optimum level of fertiliser. As discussed earlier, the ridge analysis gives several optimum solutions, based on the estimated FFB yield and the levels of the N, P, K and Mg fertilisers at certain radii; an economic analysis is therefore required to obtain the optimum profit in oil palm yield modelling. To obtain the maximum profit in oil palm yield production, four types of fertiliser are considered, namely nitrogen (N), phosphorus (P), potassium (K) and magnesium (Mg). These are the fertilisers most needed by the oil palm.

Let the cost of fertiliser at a certain radius, Ci, be given as

Ci = aiNp + biPp + ciKp + diMgp,  for i = 1, 2, ..., 11   (7.3)

where ai, bi, ci and di are the weights for the N, P, K and Mg fertilisers (measured in kg per palm per year) respectively, derived from the ridge analysis, and Np, Pp, Kp and Mgp are the prices (per tonne) of the N, P, K and Mg fertilisers respectively. Since the FFB yield is measured in tonnes per hectare per year, the cost and total profit (TP) were also converted into RM per hectare per year.
The total income per hectare per year at a certain radius, Hi, is then given as

Hi = Eri * Yp,  for i = 1, 2, ..., 11   (7.4)

where Eri is the expected FFB yield at radius i and Yp is the yield price. The total profit, TPi, can therefore be formulated as

TPi = Hi - Ci,  for i = 1, 2, ..., 11   (7.5)

Thus, we can determine the optimum fertiliser levels with which a high total profit can be achieved.

7.5.1 Profit Analysis

Based on the fertiliser prices in January 2005, the price of ammonium sulphate (AS) was RM720 per tonne, Christmas Island rock phosphate (CIRP) was RM440 per tonne, muriate of potash (MOP) was RM1040 per tonne and kieserite (Mg) was RM729 per tonne. The average price of FFB in January 2005 was about RM288.00 per tonne. Other costs, such as management costs, are assumed to be constant. A simple calculation was conducted to obtain the optimum profit over the several radius levels. The total profit for each station is summarised in Table 7.13; this will help policy makers identify the maximum profit in oil palm yield. The details of the total profit calculations are shown in Appendix T for the inland area and Appendix U for the coastal area.

The results suggest that, given an annual application of 5.1480 kg of N fertiliser, 2.7054 kg of P fertiliser, 3.2196 kg of K fertiliser and 2.8126 kg of Mg fertiliser, palms grown on the Bungor series soil (ILD2) are capable of producing an average FFB yield of 30.1826 tonnes per hectare and of making a total profit of RM7254.73. The Renggam series soil (ILD3) required a combination of 4.7800 kg of N fertiliser, 0.9000 kg of P fertiliser and 4.1000 kg of K fertiliser to produce an average FFB yield of 29.7351 tonnes per hectare. The highest total profit for the inland stations is RM8309.58, at station ILD5 (Pohoi series soil), which suggests that palms grown on the Pohoi series soil are best placed to produce the maximum FFB yield at the minimum fertiliser cost.
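The profit calculation in Equations (7.3) to (7.5) can be sketched in code using the January 2005 prices quoted above. Note one assumption: the planting density used to convert the per-palm fertiliser cost into a per-hectare figure is taken here as 136 palms per hectare, since the conversion factor used in the appendices is not restated in this section.

```python
# Sketch of Eqs. (7.3)-(7.5). Prices are the January 2005 figures quoted in
# section 7.5.1 for the product supplying each nutrient (AS for N, CIRP for
# P, MOP for K, kieserite for Mg). PALMS_PER_HA = 136 is an assumed planting
# density, not a value stated in this section.
PRICE_PER_TONNE = {"N": 720.0, "P": 440.0, "K": 1040.0, "Mg": 729.0}
FFB_PRICE = 288.0      # RM per tonne of FFB
PALMS_PER_HA = 136     # assumed planting density

def cost_per_ha(levels_kg_per_palm):
    """Eq. (7.3): fertiliser cost, converted to RM per hectare per year."""
    per_palm_rm = sum(kg / 1000.0 * PRICE_PER_TONNE[f]     # kg -> tonnes
                      for f, kg in levels_kg_per_palm.items())
    return per_palm_rm * PALMS_PER_HA

def total_profit(levels_kg_per_palm, ffb_yield_t_per_ha):
    """Eqs. (7.4)-(7.5): income minus fertiliser cost, RM per hectare per year."""
    income = ffb_yield_t_per_ha * FFB_PRICE                # Eq. (7.4)
    return income - cost_per_ha(levels_kg_per_palm)        # Eq. (7.5)

# Station ILD3 ridge solution at radius 1.0 (Table 7.6); no Mg at ILD3.
level = {"N": 5.6887, "P": 1.1711, "K": 7.8547}
print(round(total_profit(level, 31.1853), 2))
```

Because the exact density conversion differs from the one used in Appendices T and U, the printed figure illustrates the mechanics of the calculation rather than reproducing the profits in Table 7.13.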
Given the combination of 2.8791 kg of N fertiliser, 2.0007 kg of P fertiliser, 4.8655 kg of K fertiliser and 1.4579 kg of Mg fertiliser on the Carey series soil (station CLD1), 29.6934 tonnes per hectare of FFB could be produced, with a total profit of RM7282.87. A combination of 5.6169 kg of N, 1.9442 kg of P, 4.5206 kg of K and 3.2781 kg of Mg fertiliser gave an average FFB yield of 31.0969 tonnes per hectare per year at CLD3 (Briah series soil). As shown in Table 7.13, RM9918.89 appears to be the highest total profit for the coastal area, recorded at station CLD6 (Briah series soil).

Table 7.13: The fertiliser level, average estimated FFB yield and total profit for the inland and coastal areas

Station     N       P       K       Mg      Estimated FFB yield   Total profit (RM)
Inland stations
ILD3      4.7800  0.9000  4.1000    *       29.7351               7429.48
ILD4      4.2611  2.1026  6.5259  1.1745    27.7434               7250.29
ILD5      2.6081  1.8619  3.1192    *       31.7407               8309.58
ILD6      5.7761  2.0197  4.9856  3.0861    26.4447               5872.45
ILD7      6.4966  2.3188  3.5574  2.3068    27.5043               6373.06
Coastal stations
CLD1      2.8791  2.0007  4.8655  1.4579    29.6934               7282.87
CLD2      1.8200  1.8200  1.3600  1.8200    32.5558               7939.66
CLD3      5.6169  1.9442  4.5206  3.2781    31.0969               7281.32
CLD4      3.6400  1.8200  3.6400  1.8200    30.9592               7723.78
CLD5      3.7266  3.8353  8.6736  0.3801    31.8279               7253.34
CLD6      3.6321  2.0422  0.6756    *       36.4902               9918.89
CLD7      4.1189  3.3749  3.7216    *       34.7329               8838.13
Note: fertiliser levels in kg/palm/year; estimated FFB yield in tonnes/hectare/year; * Mg fertiliser was not used in these treatments.

Figure 7.3: The optimum fertiliser level for each station in the inland area

After determining the optimum level of fertilisers and the maximum profit for each station, the optimum fertiliser requirements of the stations must be compared. Figure 7.3 summarises the fertilisers required by the oil palms in the inland area.
It is obvious that the predominantly required fertilisers for oil palm are N and K. Stations ILD1 (Bungor series soil), ILD3 (Munchong series soil), ILD6 (Batu Anam series soil) and ILD7 (Durian series soil) required more N fertiliser than K fertiliser, while at stations ILD4 and ILD5, where the soil series are Batu Anam and Pohoi respectively, the K fertiliser requirement was higher than that of N. The levels of N and K fertiliser needed at station ILD2 (Renggam series soil) are almost the same. In general, the need for P and Mg fertilisers in the inland area is lower than that for N and K.

Figure 7.4: The optimum fertiliser level for each station in the coastal area

Figure 7.4 illustrates the fertiliser needs of oil palm in the coastal area. As in the inland area, N and K are the dominant fertilisers required by the oil palms. Stations CLD3 (Briah series soil), CLD4 (Sedu series soil), CLD6 (Briah series soil) and CLD7 (Briah series soil) recorded a higher need for N fertiliser than for K fertiliser, whereas the palms grown on the Carey series soil (CLD1 and CLD5) needed more K fertiliser than N. The N and K requirements are quite similar at station CLD2, where the soil series is Selangor.

The foliar nutrient composition levels and the average estimated FFB yield giving the maximum profit are shown in Table 7.14 and displayed graphically in Figure 7.5. The findings show that a foliar nutrient composition of 2.5303% N, 0.1698% P, 1.0855% K, 0.5757% Ca and 0.3562% Mg is capable of producing an average FFB yield of 30.1826 tonnes per hectare at station ILD1 (Bungor series soil).
The results in Table 7.14 suggest that the Renggam series soil (ILD2) produced an estimated FFB yield of 27.3469 tonnes per hectare when the N, P, K, Ca and Mg nutrient compositions are 2.3398, 0.1677, 0.6858, 0.6957 and 0.2936% respectively.

Table 7.14: The estimated FFB yield and the foliar nutrient composition levels (%) for the inland area

Station    N       P       K       Ca      Mg      Estimated FFB yield
ILD1     2.5303  0.1698  1.0855  0.5757  0.3562    30.1826
ILD2     2.3398  0.1677  0.6858  0.6957  0.2936    27.3469
ILD3     2.6841  0.1672  1.2257  0.8615  0.1941    29.7351
ILD4     3.0223  0.1695  1.0470  0.6823  0.6548    27.7434
ILD5     2.9331  0.1693  1.1392  0.7418  0.2681    31.7406
ILD6     2.6869  0.1685  0.7642  0.7668  0.2744    26.4447
ILD7     2.6264  0.1517  0.9626  0.5795  0.5365    27.5043
Note: foliar nutrient composition in %; estimated FFB yield in tonnes/hectare/year.

Figure 7.5 shows that, for all the inland stations, the N concentration is consistently the highest, followed by the K concentration. This result is consistent with the findings on the cumulative levels of N and K fertiliser needed by the oil palm per year. The P concentration recorded the lowest level in the foliar analysis compared with the other nutrients. In ascending order, the foliar nutrient compositions rank as P, Mg, Ca, K and N (P < Mg < Ca < K < N).

Figure 7.5: The foliar nutrient composition levels for each station in the inland area

The average estimated FFB yield and the foliar nutrient composition for the coastal area are presented in Table 7.15. For station CLD1 (Carey series soil), the average estimated FFB yield giving the maximum profit was 29.6934 tonnes per hectare, with a foliar nutrient composition of 2.6222% N, 0.1620% P, 0.9306% K, 0.4821% Ca and 0.2460% Mg.
The Sedu series soil (CLD4) gave an estimated FFB yield of 30.9592 tonnes per hectare, with N, P, K, Ca and Mg concentrations of 2.6418, 0.1569, 0.8810, 0.4948 and 0.7641% respectively. Even where the soil series are the same, the foliar nutrient compositions and the average estimated FFB yield differ, depending on the local soil and climatic factors (Foster, 1995). Stations CLD3, CLD6 and CLD7 all lie on the Briah soil series, but their foliar nutrient compositions differ from one another. The foliar nutrient composition for all the stations in the coastal area is represented graphically in Figure 7.6. The N concentration recorded the highest composition at all the coastal stations. These results are similar to those for the inland area, where the nutrient concentration varies from one station to the next even when the soil series is the same.

Table 7.15: The estimated FFB yield and the foliar nutrient composition levels (%) for the coastal area

Station    N       P       K       Ca      Mg      Estimated FFB yield
CLD1     2.6222  0.1620  0.9306  0.4821  0.2460    29.6934
CLD2     2.5565  0.1536  0.8321  0.5036  0.3656    32.5558
CLD3     2.5646  0.1618  0.6196  0.4863  0.4425    31.0969
CLD4     2.6418  0.1569  0.8810  0.4948  0.7641    30.9592
CLD5     2.6689  0.1482  0.9366  0.5857  0.3226    31.8279
CLD6     2.3310  0.1494  0.7438  0.6728  0.4042    36.4902
CLD7     2.5536  0.1495  0.8503  0.7208  0.2641    34.7329
Note: foliar nutrient composition in %; estimated FFB yield in tonnes/hectare/year.

Figure 7.6: The foliar nutrient composition levels for each station in the coastal area

Figure 7.7: Comparison between the N and K fertiliser levels needed by oil palm in the coastal and inland areas

Illustrations of the different levels of N and K fertiliser required, for all stations in the inland and coastal areas, are
depicted in Figure 7.7. The Carey series soil (CLD1 and CLD5) and the Batu Anam series soil (ILD4) needed more K fertiliser than N fertiliser; the rest showed that N fertiliser is needed more by the oil palm. In the absence of P and Mg fertiliser, the same levels of N and K fertiliser are needed, as shown at station ILD2, where the soil series is Renggam.

7.6 CONCLUSION

The results discussed in the previous section clearly indicate that the R2 values for the fertiliser treatments are comparable to those of the studies by Ahmad Tarmizi et al. (1986; 1991) and Green (1976). The canonical analysis of the fertiliser levels identified stations ILD1 and ILD2 as being at maximum points; the other inland stations are at saddle points. The ridge analysis disclosed the optimal level of the estimated FFB yield. The inland areas are expected to produce around 26 to 31 tonnes of FFB per hectare per year, while the coastal areas can produce an estimated 29 to 36 tonnes per hectare per year, higher than the inland areas. These findings are similar to the research of Foster (1995) and Ahmad Tarmizi et al. (1999). In terms of profit, the total profit can be obtained at the optimal level of fertilisers. The foliar combinations found in this study are within the range of the optimal levels suggested by Foster and Chang (1977). The fertiliser level needed by the oil palm differs among the experimental stations, even when the palms are grown on the same soil series. The nutrients in the soil and the climate also appeared to be factors affecting the production of FFB yield (Foster et al., 1987; Foster, 1995; Soon and Hong, 2001).

CHAPTER 8

SUMMARY AND CONCLUSION

8.1 INTRODUCTION

This chapter summarises the study of the modelling of oil palm yield and then discusses some important results and findings.
We draw some conclusions from these observations and examine some possible directions for future research into the modelling of oil palm yield. This chapter begins by summarising the results and discussing the findings from which the conclusions were developed.

8.2 RESULTS AND DISCUSSION

The discussion of the results starts with the initial exploratory study in section 8.2.1, which covers the nonlinear growth models and the MLR. The application of neural networks, and how this approach overcomes the weaknesses of the MLR, is covered in section 8.2.2, where a comparative study of the neural network and multiple linear regression approaches is also highlighted before the conclusion is drawn. Section 8.2.3 discusses the response surface analysis results.

8.2.1 Initial Exploratory Study

This study is divided into four main types of modelling approach. It is an exploratory study which required the author to look into current practices of yield growth modelling in plants. In the first stage, the study explored the use of the nonlinear growth model, followed by the multiple linear regression and robust M-regression models. The multiple linear regression model is commonly used in the modelling of oil palm yield, but, owing to a lack of knowledge among practitioners, nonlinear growth modelling has not been widely explored. Inadequate modelling ability results in poor management and decisions, which hinders the improvement of oil palm yield. Modelling using the nonlinear growth models demonstrated that nonlinear models are suitable for the yield growth data. A comparative study among the models was performed based on the MSE, RMSE, MAE, MAPE and correlation results, presented in Table 8.1. The findings showed that the Logistic, von Bertalanffy, Richard's and Stannard growth models produced the minimum values of MAPE.
However, the von Bertalanffy, Richard's and Stannard growth models were found to be statistically unsuitable for fitting the oil palm yield growth data, because zero is included in the asymptotic confidence intervals of their parameter estimates. Based on the MAPE values, the Logistic model was selected as the best model, followed by the Gompertz, Morgan-Mercer-Flodin, Chapman-Richard (with initial stage) and Log-logistic growth models respectively. Literature reviews on oil palm modelling using various approaches provided guidelines on the direction of the study. They showed that multiple linear regression was the most popular technique for modelling oil palm yield; it is used in oil palm research to find the relationship between the factors influencing the yield.

Table 8.1: The adequacy-of-fit measurements for the nonlinear growth models

Model                   MSE     RMSE    MAE     MAPE    Correlation
Logistic                2.9552  1.7191  1.2241  0.0347  0.9714
Gompertz                3.1528  1.7756  1.3525  0.0417  0.9694
Von Bertalanffy         2.9366  1.7136  1.2175  0.0346  0.9715
Negative exponential    4.0574  2.0143  1.6125  0.0512  0.9617
Monomolecular           3.7202  1.9287  1.5316  0.0501  0.9638
Log-logistic            4.6442  2.1550  1.7164  0.0598  0.9552
Richard's               2.9365  1.7136  1.2176  0.0345  0.9715
Weibull                 3.7202  1.9287  1.5316  0.0501  0.9638
Morgan-Mercer-Flodin    3.2045  1.7901  1.3696  0.0411  0.9689
Chapman-Richard         6.1903  2.4880  2.2749  0.0696  0.9630
Chapman-Richard*        3.4127  1.8473  1.4535  0.0478  0.9670
Stannard                2.9366  1.7136  1.2175  0.0345  0.9715
* with initial stage

Due to the nature of the oil palm yield data, which depend strongly on many factors such as humidity, chemical content and soil type, we explored causal models such as the MLR model. A comparison between the MLR and RMR models was also performed, the purpose being to explore the possibility of improving the model's accuracy.
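The MLR fit and the error measures reported throughout this chapter (RMSE, MAPE, R2) can be sketched as follows. The foliar data below are synthetic stand-ins with assumed coefficients, not the trial data analysed in the thesis.

```python
import numpy as np

# Sketch: ordinary least-squares MLR of FFB yield on foliar N, P, K, Ca, Mg,
# plus the RMSE, MAPE and R^2 measures used in Tables 8.2-8.3.
# The data and true coefficients are synthetic, for illustration only.
rng = np.random.default_rng(0)
n = 120
foliar = rng.uniform([2.2, 0.14, 0.6, 0.4, 0.2],
                     [3.0, 0.18, 1.3, 0.9, 0.7], (n, 5))  # N, P, K, Ca, Mg (%)
yield_t = 10 + foliar @ np.array([4.0, 20.0, 3.0, 1.0, 2.0]) \
          + rng.normal(0, 1.5, n)                         # FFB, t/ha/yr

X = np.column_stack([np.ones(n), foliar])                 # add intercept
beta, *_ = np.linalg.lstsq(X, yield_t, rcond=None)
fitted = X @ beta

resid = yield_t - fitted
rmse = np.sqrt(np.mean(resid ** 2))                       # root mean squared error
mape = np.mean(np.abs(resid) / yield_t)                   # mean absolute % error
r2 = 1 - resid.var() / yield_t.var()                      # variance explained
print(f"RMSE {rmse:.3f}  MAPE {mape:.4f}  R2 {r2:.3f}")
```

A MAPE of, say, 0.14 corresponds to the "modelling accuracy" of 86 percent quoted in the text, since accuracy is reported as (1 - MAPE) x 100.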
The relationship between oil palm yield and foliar nutrient composition was investigated using the MLR and RMR. The modelling began by selecting the FFB yield as the dependent variable, and the concentrations of N, P, K, Ca and Mg as the independent variables. The nutrient balance ratio, K deficiency, Mg deficiency and critical leaf phosphorus level were later added to the model as independent variables. The modelling errors of the MLR and MLR‡ models in the inland area ranged from 12.42 to 18.31 percent and from 14.42 to 18.19 percent respectively, as shown in Table 8.2. The R2 values were also similar for the two models. The lowest modelling error was 8.13 percent, recorded at station CLD2, followed by station CLD3 with an error of 8.39 percent. The modelling accuracy of the MLR model ranged from 81.69 to 87.58 percent in the inland area, and that of the MLR‡ model from 81.81 to 87.58 percent. In the coastal area (Table 8.3), the accuracy of the MLR model ranged from 74.38 to 91.87 percent, and that of the MLR‡ model from 75.01 to 91.92 percent. As the MAPE values for the inland and coastal areas were approximately equal, it can be concluded that the performance of both approaches was similar.
Table 8.2: The RMSE, MAPE and R2 values for the MLR, MLR‡ and RMR models in the inland area

               RMSE                       MAPE                       R2
Station   MLR     MLR‡    RMR        MLR     MLR‡    RMR        MLR    MLR‡   RMR
ILD1    4.6326  4.6831  5.0082     0.1623  0.1634  0.1579     0.392  0.379  0.571
ILD2    4.1101  4.0536  4.2989     0.1483  0.1457  0.1555     0.422  0.433  0.598
ILD3    4.2788  5.4819  4.2990     0.1404  0.1809  0.1403     0.404  0.478  0.381
ILD4    4.2214  4.1421  5.1001     0.1506  0.1455  0.1765     0.185  0.215  0.199
ILD5    3.7262  3.4895  3.7395     0.1497  0.1376  0.1483     0.400  0.474  0.323
ILD6    3.7532  3.7331  5.2820     0.1242  0.1242  0.1778     0.317  0.325  0.313
ILD7    4.8492  4.8107  4.8754     0.1798  0.1784  0.1745     0.118  0.132  0.127
ILDT    4.9822  4.9802  5.3922     0.1831  0.1819  0.1901     0.148  0.168  0.243
‡ Independent variables: N, P, K, Ca, Mg, nutrient balance ratio, critical leaf phosphorus, K deficiency, Mg deficiency and TLB.

The presence of outlier observations affects the credibility of the model's performance. The Q-Q plots showed the presence of outliers in the data sets; for this reason, robust regression, specifically robust M-regression, was introduced in this study. The R2 values showed that six stations recorded an improvement, five stations recorded no change and five stations recorded a decrease in value. The accuracy of the RMR model ranged from 80.99 to 85.17 percent in the inland area. Meanwhile, in the coastal area, the modelling accuracy was between 69.36 and 91.62 percent. The highest R2 value was 0.687, recorded at station CLD3; this means that only 68.7 percent of the variance could be explained by the independent variables, while the remaining 31.3 percent was contributed by unexplained factors. The low R2 values are due to the lack of information on other factors that contribute to oil palm yield production, such as rainfall and soil moisture. The resulting modelling accuracy is therefore still fairly low and requires improvement.
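Robust M-regression of the kind applied here downweights large residuals instead of discarding them. A minimal sketch of Huber M-estimation fitted by iteratively reweighted least squares (IRLS), on synthetic data with one planted outlier; the tuning constant c = 1.345 is the conventional Huber default, not necessarily the value used in this study:

```python
import numpy as np

# Synthetic regression of yield on a single foliar predictor, with one
# planted outlier; data and coefficients are illustrative assumptions.
rng = np.random.default_rng(1)
n = 60
X = np.column_stack([np.ones(n), rng.uniform(2.0, 3.0, n)])  # intercept + leaf N (%)
X[0, 1] = 2.1                                  # fix the outlier's position
beta_true = np.array([5.0, 9.0])
y = X @ beta_true + rng.normal(0.0, 0.5, n)
y[0] += 25.0                                   # planted outlier

def huber_irls(X, y, c=1.345, iters=50):
    """Huber M-estimate via iteratively reweighted least squares."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]          # OLS start
    for _ in range(iters):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # robust scale (MAD)
        u = r / (s + 1e-12)
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))  # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

ols = np.linalg.lstsq(X, y, rcond=None)[0]
rob = huber_irls(X, y)
print("OLS   coefficients:", np.round(ols, 3))
print("Huber coefficients:", np.round(rob, 3))
```

The robust fit sits close to the true coefficients while the OLS fit is pulled toward the outlier, which is the behaviour exploited when the Q-Q plots flag outlying yield observations.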
Table 8.3: The RMSE, MAPE and R2 values for the MLR, MLR‡ and RMR models in the coastal area

               RMSE                       MAPE                       R2
Station   MLR     MLR‡    RMR        MLR     MLR‡    RMR        MLR    MLR‡   RMR
CLD1    4.9442  4.9085  5.3343     0.1766  0.1750  0.1625     0.380  0.388  0.118
CLD2    2.8913  2.8736  3.0848     0.0813  0.0808  0.0843     0.171  0.181  0.307
CLD3    2.7461  2.7944  2.7482     0.0839  0.0854  0.0838     0.687  0.676  0.516
CLD4    6.3415  6.3909  8.9948     0.2562  0.2499  0.3064     0.050  0.035  0.225
CLD5    4.8701  4.8758  4.8765     0.1729  0.1737  0.1725     0.111  0.108  0.115
CLD6    4.1983  4.2217  4.2022     0.1394  0.1399  0.1389     0.043  0.031  0.049
CLD7    4.1895  3.9527  4.8827     0.1301  0.1206  0.1505     0.232  0.315  0.151
CLDT    5.2337  5.0946  5.8121     0.1804  0.1746  0.2000     0.044  0.094  0.140
‡ Independent variables: N, P, K, Ca, Mg, nutrient balance ratio, critical leaf phosphorus, K deficiency, Mg deficiency and TLB.

8.2.2 Modelling Using the Neural Network

Artificial neural networks are computing systems containing many interconnected nonlinear neurons, capable of extracting linear and nonlinear regularities from a given data set. Like humans, artificial neural networks learn by example, and they are potentially useful for studying the complex relationship between the inputs and outputs of a system. Literature reviews on the application of neural networks for modelling showed that they gave more reliable results than the statistical approaches. Neural networks are also capable of learning how to perform tasks based on the data given for training and on initial experience; they are data driven, rather than model driven as the statistical approaches are. The neural network model is still new in oil palm research compared with work on short-term crops such as barley, corn and potatoes. We used the neural network model to further refine the modelling of oil palm yield. In the first stage, we used the same input and output variables as in the MLR and RMR models. The neural network model was run using different combinations of the activation function.
To ensure that the network was not overfitted, the maximum number of hidden nodes was calculated. The influence of the activation function combination on the neural network's performance was examined using F statistics (Table 8.4). The test was evaluated on the MSE values for the training, validation and testing phases, the average of the MSE values and the correlation coefficient. In the training phase, nine out of sixteen stations were statistically significant, meaning that the neural network produced different performances with different combinations of activation functions in that phase. All stations showed statistically significant F tests in both the validation and testing phases. For the average MSE, fourteen out of sixteen stations showed F tests that were statistically significant at the 0.001 and 0.05 levels at their respective degrees of freedom. The F test for the correlation was also found to be significant at eleven stations. Generally, different combinations of activation functions result in different neural network performances.
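The effect of swapping activation functions can be illustrated with a small one-hidden-layer network trained by gradient descent. The architecture, data and training settings below are illustrative assumptions, not the configurations run in the thesis:

```python
import numpy as np

# Sketch: a one-hidden-layer network whose hidden activation is swappable,
# trained by full-batch gradient descent on a synthetic nonlinear target.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 3))                 # 3 scaled inputs
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]    # nonlinear yield-like target

ACT = {
    "tanh":     (np.tanh,                        lambda a: 1.0 - a * a),
    "logistic": (lambda z: 1.0 / (1.0 + np.exp(-z)), lambda a: a * (1.0 - a)),
}

def train(act, hidden=6, lr=0.1, epochs=3000):
    """Train the network and return its final training MSE."""
    f, df = ACT[act]
    W1 = rng.normal(0, 0.5, (3, hidden)); b1 = np.zeros(hidden)
    w2 = rng.normal(0, 0.5, hidden);      b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        a1 = f(X @ W1 + b1)                      # hidden activations
        err = a1 @ w2 + b2 - y                   # linear output minus target
        gw2 = a1.T @ err / n; gb2 = err.mean()
        ga1 = np.outer(err, w2) * df(a1)         # backprop through activation
        gW1 = X.T @ ga1 / n;  gb1 = ga1.mean(axis=0)
        w2 -= lr * gw2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1
    return np.mean((f(X @ W1 + b1) @ w2 + b2 - y) ** 2)

for name in ACT:
    print(name, round(float(train(name)), 4))
```

Running the two variants generally yields different final MSEs from the same data, which is the effect the F tests in Table 8.4 quantify across repeated runs.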
Table 8.4: The F values of the analysis of variance for different activation functions, for the inland and coastal areas

Station   Training   Validation   Testing    Average    Correlation   df
Inland stations
ILD1      3.368*     17.997*      12.055*    10.729*    3.062**       (5, 198)
ILD2      7.850*     15.516*      14.949*    10.091*    1.601         (5, 264)
ILD3      2.736**    12.899*      1.924      7.951*     4.431*        (5, 240)
ILD4      2.291**    15.055*      10.452*    13.700*    3.058**       (5, 265)
ILD5      1.859      2.306**      42.523*    16.715*    2.519**       (5, 168)
ILD6      3.132**    22.047*      4.755*     0.927      4.606*        (5, 132)
ILD7      1.766      13.455*      7.028*     5.812*     0.916         (5, 264)
ILDT      0.853      11.736*      12.742*    3.732*     0.613         (5, 264)
Coastal stations
CLD1      0.847      11.091*      18.495*    15.103*    6.454*        (5, 180)
CLD2      3.664*     7.724*       9.166*     10.197*    2.403*        (5, 192)
CLD3      3.265**    3.762*       1.918      2.905**    4.017*        (5, 54)
CLD4      1.295      12.413*      6.006*     8.957*     5.272*        (5, 222)
CLD5      1.524      10.218*      7.112*     1.615      2.139         (5, 180)
CLD6      3.232*     6.523*       10.145*    7.518*     6.146*        (5, 264)
CLD7      1.366      5.092*       3.354*     2.865**    5.911*        (5, 264)
CLDT      2.794**    8.083*       35.022*    13.037*    2.047         (5, 264)
Note: * significant at the 1% level; ** significant at the 5% level

The correlation was used to measure the proximity between the target and predicted values, while the MAPE values measured the error; both are presented in Table 8.5. The highest correlation value in the inland area was 0.8361, recorded for the ILDT, followed by 0.8114 at station ILD5. Meanwhile, the minimum MAPE value of 0.0560 was also recorded for the ILDT, followed by station ILD6 with a MAPE of 0.0719. For the coastal area, the highest value of r was 0.8703, recorded at station CLD3; the second highest was recorded at station CLD4. The MAPE values recorded ranged from 0.0638 to 0.1456. At this stage, we concluded that the neural network model is able to estimate the FFB yield.
Table 8.5: The MAPE values and the correlation of the neural network model for the inland and coastal areas

Station   ILD1    ILD2    ILD3    ILD4    ILD5    ILD6    ILD7    ILDT
MAPE      0.1159  0.1162  0.1092  0.1276  0.0944  0.0719  0.1580  0.0560
r         0.7984  0.7840  0.7853  0.5988  0.8114  0.7900  0.4961  0.8361

Station   CLD1    CLD2    CLD3    CLD4    CLD5    CLD6    CLD7    CLDT
MAPE      0.1220  0.0638  0.0657  0.1118  0.1431  0.1232  0.1003  0.1456
r         0.7404  0.6425  0.8703  0.8313  0.5397  0.4560  0.7054  0.5489

In the second stage, we examined the effect of an increased number of input nodes on the neural network's performance. The fertiliser trial and foliar nutrient composition information were used as input nodes to estimate the FFB yield. The results showed that the neural network's performance was comparable to its previous performance, so the fertiliser trial information as inputs did not significantly improve it. An experimental design was conducted to determine the effects of the neural network parameters (learning rate, momentum term, number of hidden nodes and number of runs) on the network's performance. Using the analysis of variance test, it was found that the number of runs and the momentum term did not influence performance in the first experiment (Table 8.6). In the second experiment, the learning rate did not exert a statistically significant influence on performance, whereas the number of runs and the number of hidden nodes did. In both experiments, the influence of the number of hidden nodes on the neural network's performance was statistically significant at the 0.05 level.
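The one-way ANOVA behind these F tests can be sketched as follows; the per-setting MSE samples are synthetic stand-ins for repeated training runs, not the experimental results reported in Table 8.6:

```python
import numpy as np
from scipy import stats

# Sketch of the one-way ANOVA used to test whether a network parameter
# (here, the number of hidden nodes) shifts the achieved MSE. The MSE
# samples are synthetic stand-ins for repeated training runs.
rng = np.random.default_rng(2)
mse_by_hidden = {
    4:  rng.normal(0.30, 0.03, 30),   # 30 runs per setting (assumed)
    8:  rng.normal(0.24, 0.03, 30),
    16: rng.normal(0.22, 0.03, 30),
}
F, p = stats.f_oneway(*mse_by_hidden.values())
print(f"F = {F:.2f}, p = {p:.4g}")    # small p: hidden nodes matter
```

With k = 3 groups of 30 runs each, the degrees of freedom are (2, 87), by analogy with the (df1, df2) pairs quoted in Tables 8.4 and 8.6.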
Table 8.6: The F values of the analysis of variance for Experiments 1, 2 and 3

Experiment               Parameter  F-statistic  p-value  df
Experiment 1             HN         8.7759       0.0000   (7, 1912)
                         NR         1.6950       0.1330   (5, 1914)
                         MT         1.3300       0.2630   (3, 1916)
Experiment 2             HN         8.0480       0.0000   (6, 2932)
                         NR         2.8840       0.0080   (6, 2932)
                         LR         1.6090       0.1540   (5, 2933)
Experiment 3 (training)  P-O        18.481       0.000    (5, 179)
                         M-O        3.988        0.002    (4, 179)
Experiment 3 (test)      P-O        12.171       0.000    (5, 179)
                         M-O        3.570        0.004    (4, 179)

The third experiment was conducted to investigate the effects of percentage-outliers and magnitude-outliers on neural network performance in the training and test data. It was found that modelling accuracy in the training and test data decreased as the percentage-outliers and magnitude-outliers increased (Table 8.6). F-tests and t-tests were performed and showed that both factors had a statistically significant influence on the neural network modelling.

A comparative study was conducted to evaluate the performance of the three models proposed in this study. The MAPE and correlation values are presented in Table 8.7. The MAPE values of the NN models were lower than those of the MLR and RMR models in both the inland and coastal areas. This showed that the NN model can reduce the modelling error and improve prediction accuracy. The correlation values for the neural network model were also higher than those for the MLR and RMR models.
Table 8.7: The comparison of the MAPE and correlation values between the MLR, RMR and NN models, for the inland and coastal areas

                 MAPE                    Correlation
Station  MLR     RMR     NN      MLR     RMR     NN
Inland Stations
ILD1     0.1623  0.1579  0.1159  0.6260  0.7554  0.7984
ILD2     0.1483  0.1555  0.1162  0.6490  0.7732  0.7840
ILD3     0.1404  0.1403  0.1092  0.6360  0.6169  0.7853
ILD4     0.1506  0.1765  0.1276  0.4300  0.4457  0.5988
ILD5     0.1497  0.1483  0.0944  0.6330  0.5681  0.8114
ILD6     0.1242  0.1778  0.0719  0.5630  0.5593  0.7900
ILD7     0.1798  0.1745  0.1580  0.3430  0.3568  0.4961
ILDT     0.1831  0.1901  0.0560  0.3840  0.4932  0.8361
Coastal Stations
CLD1     0.1766  0.1625  0.1220  0.6160  0.3438  0.7404
CLD2     0.0813  0.0843  0.0638  0.4130  0.5538  0.6425
CLD3     0.0839  0.0838  0.0657  0.8290  0.7186  0.8703
CLD4     0.2562  0.3064  0.1118  0.2240  0.4748  0.8313
CLD5     0.1729  0.1725  0.1431  0.3320  0.3397  0.5397
CLD6     0.1394  0.1389  0.1232  0.2080  0.2231  0.4560
CLD7     0.1301  0.1505  0.1003  0.4810  0.3883  0.7054
CLDT     0.1804  0.2000  0.1456  0.2100  0.3742  0.5489

Finally, the accuracy of fit for the three models was compared, as shown in Table 8.8 for the inland area and Table 8.9 for the coastal area. The changes in value are all positive, which means that the NN model improved the modelling accuracy compared with both statistical approaches. The accuracy of fit using the neural network model ranged from 84.2 to 94.4 percent for the inland area, and from 85.44 to 93.62 percent for the coastal area. The lowest change in accuracy from the MLR model to the NN model in the inland area was 2.6579 percent, at station ILD7. The highest change was 15.5588 percent, for the ILDT. The lowest change in accuracy from the RMR model to the NN model was 1.9988 percent, recorded at station ILD7, and the highest change was 16.5576 percent, recorded for the ILDT.
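The accuracy changes quoted here are relative improvements over the baseline model. Under the assumption that change = (NN − baseline) / baseline × 100, the sketch below reproduces two entries of Table 8.8.

```python
# Accuracy change of the NN model over a baseline (MLR or RMR) model,
# assumed to be a relative percentage improvement; the two printed
# values reproduce the ILD1 and ILDT rows of Table 8.8.
def accuracy_change(baseline, nn):
    """Percentage improvement of the NN accuracy over a baseline accuracy."""
    return (nn - baseline) / baseline * 100.0

print(round(accuracy_change(83.77, 88.41), 4))   # ILD1, MLR -> NN
print(round(accuracy_change(81.69, 94.40), 4))   # ILDT, MLR -> NN
```

Both results match the tabulated changes (5.5390 and 15.5588 percent), which supports the assumed formula.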
In the coastal area, station CLD4 recorded the highest accuracy changes from the MLR model to the NN model and from the RMR model to the NN model, at 19.4138 percent and 28.0565 percent respectively. The results showed that the neural network model performs better in modelling than the regression models. The comparative study showed that the neural network model provided more reliable results than both the MLR and RMR models. Hence, the neural network's performance is robust to the impact of outlier observations when compared with the linear regression model.

Table 8.8: The accuracy of the MLR, RMR and NN models, and the accuracy changes, for the inland area

                                    Accuracy change (%)
Station  MLR      RMR      NN       From MLR to NN  From RMR to NN
Inland Area
ILD1     83.7700  84.2100  88.4100   5.5390          4.9875
ILD2     85.1700  84.4500  88.3800   3.7689          4.6536
ILD3     85.9600  85.9700  89.0800   3.6296          3.6175
ILD4     84.9400  82.3500  87.2400   2.7078          5.9381
ILD5     85.0300  85.1700  90.5600   6.5036          6.3285
ILD6     87.5800  82.2200  92.8100   5.9717         12.8801
ILD7     82.0200  82.5500  84.2000   2.6579          1.9988
ILDT     81.6900  80.9900  94.4000  15.5588         16.5576

Table 8.9: The accuracy of the MLR, RMR and NN models, and the accuracy changes, for the coastal area

                                    Accuracy change (%)
Station  MLR      RMR      NN       From MLR to NN  From RMR to NN
Coastal Area
CLD1     82.3400  83.7500  87.8000   6.6310          4.8358
CLD2     91.8700  91.5700  93.6200   1.9049          2.2387
CLD3     91.6100  91.6200  93.4300   1.9867          1.9756
CLD4     74.3800  69.3600  88.8200  19.4138         28.0565
CLD5     82.7100  82.7500  85.6900   3.6030          3.5529
CLD6     86.0600  86.1100  87.6800   1.8824          1.8232
CLD7     86.9900  84.9500  89.9700   3.4257          5.9094
CLDT     81.9600  80.0000  85.4400   4.2460          6.8000

8.2.3 Modelling using response surface analysis

Conventional analyses of variance indicated particular application rates that produced larger or smaller yields than other rates, but did not estimate an optimum application rate. In this study, response surface analysis was therefore used to estimate the optimum fertiliser application rate for generating the maximum oil palm yield.
The use of ridge analysis was proposed in this study for cases where the stationary point was a saddle point. The fertiliser combinations, the average estimated FFB yield and the estimated total profit are recorded in Tables 8.10 and 8.11. In the inland area, the Renggam (ILD2) and Durian (ILD7) soil series recorded the highest requirements for N fertiliser, at 6.4 kg/palm and 6.4966 kg/palm respectively. The N fertiliser requirements for the inland and coastal areas differed: in the inland area the N fertiliser needed ranged from 2.6081 kg/palm to 6.4966 kg/palm, whereas in the coastal area it ranged from 1.82 kg/palm to 5.6169 kg/palm. Thus, the amount of N fertiliser required in the inland area was greater than that in the coastal area. The requirements for the P and Mg fertilisers were quite similar for both areas. This study found that oil palm trees need more of the N and K fertilisers than of the other types of fertiliser.

The total profit was calculated for each fertiliser combination by considering only the fertiliser's cost and assuming that other cost factors were constant. The highest total profit for the inland area was recorded at station ILD5, which produced 31.7407 tonnes/hectare/year of FFB yield with a total profit of RM8309.58. The total profit recorded in the coastal area ranged from RM7253.34/hectare/year to RM9918.89/hectare/year; these figures are higher than those recorded in the inland area. The fertiliser combinations varied from one station to another, even for stations with the same soil series.
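The optimisation step behind these fertiliser levels can be sketched as locating the stationary point of a fitted second-order response surface. The coefficients below are illustrative assumptions (a two-fertiliser surface), not the thesis's fitted values; they are chosen so the stationary point is a maximum, the case where ridge analysis is not needed.

```python
# Hedged sketch: stationary point of a second-order response surface
# y = b0 + b.x + x'Bx for two fertiliser inputs. All coefficients are
# illustrative, not the fitted values from the thesis trials.
import numpy as np

b0 = 20.0                         # intercept (illustrative)
b = np.array([3.0, 1.5])          # linear terms for (N, K), illustrative
B = np.array([[-0.40, 0.05],      # quadratic/interaction matrix
              [0.05, -0.30]])

# Setting the gradient to zero gives x* = -0.5 * inv(B) @ b
x_star = -0.5 * np.linalg.solve(B, b)
y_star = b0 + b @ x_star + x_star @ B @ x_star

# The eigenvalues of B classify the stationary point
eigs = np.linalg.eigvalsh(B)
kind = "maximum" if eigs.max() < 0 else ("minimum" if eigs.min() > 0 else "saddle")
print("x* =", np.round(x_star, 3), "estimated yield =", round(y_star, 3), kind)
```

When B is indefinite the stationary point is a saddle, and a constrained search such as ridge analysis is used instead, as proposed in the study.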
Table 8.10: The fertiliser level, average estimated FFB yield and total profit for the inland area

                   Fertiliser level (kg/palm/year)  Estimated FFB yield    Total profit
Station/Soil type  N       P       K       Mg       (tonnes/hectare/year)  (RM)
ILD1/Bungor        5.1480  2.7054  3.2195  2.8126   30.1826                7306.09
ILD2/Renggam       6.4000  **      6.3864  *        27.3469                6357.17
ILD3/Munchong      4.7800  0.9000  4.1000  *        29.7351                7429.48
ILD4/Batu Anam     4.2611  2.1026  6.5259  1.1745   27.7434                7250.29
ILD5/Pohoi         2.6081  1.8619  3.1192  *        31.7407                8309.58
ILD6/Batu Anam     5.7761  2.0197  4.9856  3.0861   26.4447                5872.45
ILD7/Durian        6.4966  2.3188  3.5574  2.3068   27.5043                6373.06
Note: * Mg fertiliser was not applied in the trials; ** P fertiliser was not applied in the trials.

Table 8.11: The fertiliser level, average estimated FFB yield and total profit for the coastal area

                   Fertiliser level (kg/palm/year)  Estimated FFB yield    Total profit
Station/Soil type  N       P       K       Mg       (tonnes/hectare/year)  (RM)
CLD1/Carey         2.8791  2.0007  4.8655  1.4579   29.6934                7282.87
CLD2/Selangor      1.8200  1.8200  1.3600  1.8200   32.5558                7939.66
CLD3/Briah         5.6169  1.9442  4.5206  3.2781   31.0969                7281.32
CLD4/Sedu          3.6400  1.8200  3.6400  1.8200   30.9592                7723.78
CLD5/Carey         3.7266  3.8353  8.6736  0.3801   31.8279                7253.34
CLD6/Briah         3.6321  2.0422  0.6756  *        36.4902                9918.89
CLD7/Briah         4.1189  3.3749  3.7216  *        34.7329                8838.13
Note: * Mg fertiliser was not applied in the trials.

The foliar nutrient composition levels that corresponded with the average estimated FFB yield are presented in Table 8.12. The results suggest that, given an annual application of 5.1480 kg of N fertiliser, 2.7054 kg of P fertiliser, 3.2195 kg of K fertiliser and 2.8126 kg of Mg fertiliser, palms grown on the Bungor soils at station ILD1, for example, were capable of producing an average crop in the region of 30.1826 tonnes per hectare. This combination of fertilisers had an estimated foliar nutrient composition of 2.5303% N, 0.1698% P, 1.0855% K, 0.5757% Ca and 0.3562% Mg. The other results are given in Table 8.12.
Table 8.12: The average estimated FFB yield and the foliar nutrient composition levels for the inland and coastal areas

                   Estimated FFB yield    Foliar nutrient composition (%)
Station/Soil type  (tonnes/hectare/year)  N       P       K       Ca      Mg
Inland Stations
ILD1/Bungor        30.1826                2.5303  0.1698  1.0855  0.5757  0.3562
ILD2/Renggam       27.3469                2.3398  0.1677  0.6858  0.6957  0.2936
ILD3/Munchong      29.7351                2.6841  0.1672  1.2257  0.8615  0.1941
ILD4/Batu Anam     27.7434                3.0223  0.1695  1.0470  0.6823  0.6548
ILD5/Pohoi         31.7406                2.9331  0.1693  1.1392  0.7418  0.2681
ILD6/Batu Anam     26.4447                2.6869  0.1685  0.7642  0.7668  0.2744
ILD7/Durian        27.5043                2.6264  0.1517  0.9626  0.5795  0.5365
Coastal Stations
CLD1/Carey         29.6934                2.6222  0.1620  0.9306  0.4821  0.2460
CLD2/Selangor      32.5558                2.5565  0.1536  0.8321  0.5036  0.3656
CLD3/Briah         31.0969                2.5646  0.1618  0.6196  0.4863  0.4425
CLD4/Sedu          30.9592                2.6418  0.1569  0.8810  0.4948  0.7641
CLD5/Carey         31.8279                2.6689  0.1482  0.9366  0.5857  0.3226
CLD6/Briah         36.4902                2.3310  0.1494  0.7438  0.6728  0.4042
CLD7/Briah         34.7329                2.5536  0.1495  0.8503  0.7208  0.2641

8.3 CONCLUSION

This study has introduced several statistical approaches for modelling oil palm yield. These models provide important information which can be used in the oil palm industry. The nonlinear growth model was applied to the oil palm yield growth data; this method had never been implemented on oil palm data before. The nonlinear growth model was found to be a suitable approach for estimating oil palm yield at any stage of the palm's age.

When the nutrient balance ratio was applied to the oil palm yield model, there was no significant difference between using the nutrient balance ratio and the principal nutrient components. Robust M regression was also proposed, and its modelling accuracy did not differ much from that of the multiple linear regression approach. On the other hand, the accuracy of modelling with multiple linear regression remains low, owing to the use of a linear model.
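The nonlinear growth modelling referred to in the conclusion can be sketched as fitting a sigmoidal curve, such as the Gompertz model, to yield-versus-age data. The data below are synthetic (an assumed asymptotic yield of 30 t/ha with added noise), not the thesis's trial records.

```python
# Hedged sketch: fitting a Gompertz growth curve to synthetic
# yield-vs-age data, the kind of nonlinear growth modelling the
# conclusion describes. Data and starting values are assumptions.
import numpy as np
from scipy.optimize import curve_fit

def gompertz(age, a, b, c):
    """Gompertz curve: a = asymptotic yield; b, c control shape and rate."""
    return a * np.exp(-b * np.exp(-c * age))

age = np.arange(3, 26)                              # palm age in years
rng = np.random.default_rng(1)
yield_t = gompertz(age, 30.0, 4.0, 0.35) + rng.normal(0, 0.4, size=age.size)

params, _ = curve_fit(gompertz, age, yield_t, p0=[25.0, 3.0, 0.3])
print("fitted (a, b, c) =", np.round(params, 3))
```

The fitted asymptote a then serves as an estimate of mature yield, and the curve can be evaluated at any age, which is the advantage the conclusion attributes to the growth-model approach.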
The neural network model, which also had never been used in modelling oil palm yield, was introduced in this study. We found that the accuracy of modelling using the neural network is much better than that of the multiple linear regression approach. The neural network model proved to be an efficient and reliable model.

Fertiliser usage differs between the inland and coastal areas. The palms grown in the coastal area need less N, P and K fertiliser than those in the inland area. Thus, the coastal areas will produce more FFB yield and generate more income for the planters.

8.4 AREAS FOR FURTHER RESEARCH

For further research, a comprehensive model can be built and run in operational mode. There are many other factors that affect oil palm yield, not only fertiliser and foliar nutrient composition, and an experiment could be conducted to gather such data. The development of a future model must consider other factors that may influence the oil palm yield. The factors that significantly impact the oil palm yield are soil factors (including soil pH, nutrient content, soil moisture, clay content and land slope), climate factors (including rainfall, temperature, sunshine, humidity and water level), management (including the type of fertiliser used, fertiliser dosage, pruning and labour cost), planting density (including leaf area index and planting practice) and general factors (including palm age, species, inflorescence rate, abortion rate and oil palm genetics). These factors are summarised in Figure 8.1.

The investigation and identification of the effect of each factor on oil palm yield can be performed using path diagrams or causality effect models. It is necessary to identify the individual effects before integrating all the factors into one model. Too many factors will make the model very complex, and the model's development will be difficult to interpret.
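One way to tame such a large factor set before integrating it into a model is to compress correlated factors into a few uncorrelated components. The sketch below applies a centred SVD (the computation underlying principal component analysis) to a synthetic factor matrix built from two latent drivers; all data here are made up for illustration.

```python
# Hedged sketch: principal component reduction of correlated yield
# factors via SVD. The "site factor" matrix is synthetic, generated
# from two latent drivers plus noise.
import numpy as np

rng = np.random.default_rng(2)
n = 100
latent = rng.normal(size=(n, 2))                    # two hidden drivers
# Six observed, correlated "site factors" derived from the drivers
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(n, 6))

Xc = X - X.mean(axis=0)                             # centre each factor
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)                     # variance proportions
scores = Xc @ Vt[:2].T                              # reduced representation
print("variance explained by first 2 components:", round(explained[:2].sum(), 3))
```

Because the data were generated from two drivers, the first two components capture almost all the variance; the reduced scores could then replace the six original factors as model inputs.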
If necessary, the number of factors should be reduced to several component variables, while the valuable and required information should be retained. The principle component analysis is developed purposely to reduce the number of factors into several components that are not correlated to each other. This analysis may useful to reducing the dimensions of the oil palm yield’s complex system. Other techniques that can be explored include the use of neural network to select the best combination of the variables that influence oil palm yield, based on a sensitivity analysis. It is emphasised that even if all the common site factors which normally influence fertiliser requirements are taken into account, some unusual factors may still unpredictably alter the complex system of nutrients chain supply. However, the relevant data is unavailable, which could prove be a problem. 241 Soil factors pH, Soil type, Nutrient content, Soil moisture Climate Planting density Rainfall, Temperature, LAI, Planting practice Humidity, Sunshine. Oil Palm Yield Management Type of fertileser dosage, General factors Pruning, Labour cost, Palm age, Species, Fertiliser application, Abortion rate, Genetics Fertiliser Nitrogen, Phosphate, Potassium, Calcium Magnesium, etc Figure 8.1: The factors which may influence oil palm yield. 242 REFERENCES Adam, J. B. (1999). Predicting pickle harvest using a parametric feedforward neural network, Journal of Applied Science, 26(2): 165-176. Ahmad Tarmizi Mohammed and Wahid Omar (2002). Pembajaan sawit yang berkesan. Prosiding Persidangan Kebangsaan Pekebun Kecil Sawit 2002: Strategi Ke Arah Pengukuhan dan Hala Tuju Sektor Pekebun Kecil Sawit. Ahmad Tarmizi Mohammed, Foster, H. L. Zin Zawawi Zakaria and Chow C. S. (1986). Statistical and economic analysis of oil palm fertilizer trials in Peninsular Malaysia between 1970-1981. PORIM Occasional Paper. No. 22. Ahmad Tarmizi Mohammed, Hamdan Abu Bakar., Mohd Tayeb Dolmat and Chan K. W. (1999). 
Development and validation of PORIM fertilizer recommendation system in Malaysian oil palm cultivation. Proceedings of the 1999 PORIM International Palm Oil Congress (Agriculture). 203-217. Ahmad Tarmizi Mohammed, Zin Zawawi Zakaria, Mohd Tayeb Dolmat and Ariffin Darus (2004). Oil palm fertilizer programme: A proposal for higher yield. Presented at Mesyuarat Plan Tindakan MPOB dan RISDA, February, 10 2004, at Kluang. Ahmad Tarmizi Mohammed, Zin Zawawi Zakaria, Mohd Tayeb Dolmat and Ariffin Darus (2004). Oil palm fertilizer programme: A proposal for higher yield. Presented in Mesyuarat Plan Tindakan MPOB dan RISDA, at Prime City, Kluang. Ahmad Tarmizi Mohmmed, Zin Zawawi Zakaria, Mohd Tayeb Dolmat, Foster, H. L., Hamdan Abu Bakar and Khalid Haron (1991). Relative efficiency of urea to sulphate of ammonia in oil palm: Yield response and environmental factors. Proceedings of the 1991 PORIM International Palm Oil ConferenceAgriculture. 340-348. Alder, D. (1980). Forest volume estimation and yield prediction. Yield Prediction, vol. 2, FAO Forestry Paper 22/2. 243 Amer, F. A. and Williams, W. T. (1957). Leaf area growth in Pelargonium Zonale, Ann. Biot. 21, 339. Amstrong, J. S., Brodie, R. J. and McIntyre, S. H. (1987). Forecasting methods for marketing-review of empirical research. International Journal of Forecasting, 3: 355-376. Anderson, V. L. and McLean, R. A. (1974). Design of Experiments: A Realistic Approach. New York: Marcel Dekker, Inc.. Andrew, D. F. (1974). A robust method for multiple linear regression. Technometrics, 16: 523-551. Angstenberger, J. (1996). Prediction of the S and P 500 Index with Neural Networks, 43-152, Neural Networks and their Applications, edited by J. G. Taylor, John Wiley and Sons, Inc. Azme Khamis and Mokhtar Abdullah (2004). On robust environmental quality indices. Pertanika Journal of Science and Technology, 12(1): 1-10. Azme Khamis and Zuhaimy Ismail (2003). 
Perbandingan di antara regresi berganda dan regresi komponen utama dalam menganggar harga minyak sawit mentah. Prosiding Seminar Kebangsaan Sains Matematik Ke XI, 22-24 Disember 2003. Azme Khamis and Zuhaimy Ismail. (2004). Comparative study on nonlinear growth curve to tobacco leaf growth data. Journal of Agronomy, 3(2): 147-153. Azme Khamis, Zuhaimy Ismail and Ani Shabri (2003). Pemodelan harga minyak sayuran menggunakan analisis regresi linear berganda. Matematika, 19(1): 59-70. Bansal, A, Kauffman, R. and Weitz, R. (1993). Comparing the modeling performance of regression and neural networks as data quality varies: A business value approach. Journal of Management Information System, 10: 1132. Barnett, V. and Lewis, T. (1995). Outliers in Statistical Data. England: John Wiley & Sons, Bass, F. M. (1960). A new product growth model for consumer durables. Management Science, 15: 215-227. Bates, D. M. and Watts, D. V. (1988). Nonlinear Regression Analysis and its Applications, New York: John Wiley. 244 Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988). The New S Language. Wadsworth, Pacific Grove, CA. Belanger, G., Walsh, J. R., Richards, J. E., M. P. H. and Ziadi, N. (2000). Comparison of three statistical models describing potato yield response to nitrogen fertilizer. Agronomy Journal. 92: 902-908. Bewley, R. and Fiebig, D. 1988. Flexible logistic growth model with application in telecommunications. International Journal of Forecasting. 4: 177-192. Birkes, D and Dodge, Y. (1993). Alternative Methods of Regression. New York: John Wiley and Sons, Inc. Bishop, C. M. (1995). Neural Networks for Pattern Recognition, Oxford University Press. Boussabaine, A. H. and Kaka, A. P. (1998). A neural networks approach for cost flow forecasting. Construction Managemant and Economics. 16: 471-479. Box, G. E. P. and Draper, N. R. (1987). Empirical model building and response surfaces, New York: John Wiley & Sons. Causton, D. R. and Venus, J. C. (1981). 
The Biometry of Plant Growth, London: Edward Arnold, Chaddha, R. L. and Chitgopekar, S. S. (1971). A generalization of the logistic curve and long range forecast (1966-1981) of residence telephones. The Bell Journal of Economics and Management Science, 2: 542-560. Chan, K. W, Wahid, M. B., Ngan, M. A. and Basiron, Y. (2003). Climate change and its effects on Yield of Oil Palm. Proceedings of International Palm Oil Congress: Agriculture Conference: 237-260. Chan, K. W., Lim K. C. and Ahmad Alwi (1991). Fertilizer efficiency studies in oil palm. Proceedings of the 1991 PORIM International Palm Oil ConferenceAgriculture. 302-311. Chan. K. W. (1999). System approach to fertilizer management in oil palm. Proceedings of the 1999 PORIM International Palm Oil Congress – Agriculture: 171-187. Chatterjee, S. and Price, B. (1991). Regression Analysis by Example. 2nd Edition. New York: John Wiley and Sons, Inc. Chin, S. A. (2002). Narrowing the yield gap in oil palm between potential and realization. The Planters, 78(919), 541-544. 245 Chow, C. S. (1984). Forecast of Malaysian palm oil production up to year 2000. Proceedings of Int. Seminar on Market Development for Palm Oil Products. 31-47. Chow, C. S. (1987). The seasonal and rainfall effects on palm oil production in Peninsular Malaysia. Proceedings of 1987 Oil Palm Conference – Agriculture: 46-52. Chow, C. S. (1988). The seasonal and rainfall effects on palm oil production in Peninsular Malaysia. Proceedings of the 1987 International Oil Palm/Palm Oil Conference: Progress and Prospects: Agriculture. 46-55. Christensen, R. (2001). Advanced Linear Modeling. Multivariate, Time Series and Spatial Data; Nonparametric Regression and Response Surface Maximization. New York: Springer-Verlag. Cline, R. A. (1997). Leaf analyses for fruit crop nutrition. Horticultural Research Institute of Ontario. Connor, D. (1988). Data transformation explains the basics of neural networks. EDN, 33(10): 138-144. Corley, R. H. V. and Gray, B. S. 
(1976). Growth and morphology. In Corley, R. H. V., Haron, J. J. and Wood, B. J. (1976). Oil palm research: Development in crop science (1). Elsevier Scientific Publishing Company. 7-21. Corley, R. H. V. (1976). Photosynthesis and productivity. In Corley, R. H. V., Haron, J. J. and Wood, B. J. (1976). Oil palm research: Development in crop science (1). Elsevier Scientific Publishing Company. 55-76. Corley, R.H.V. (1976). Physiological aspect of nutrition. In Corley, R. H. V., Haron, J. J. and Wood, B. J. (1976). Oil palm research: Development in crop science (1). Elsevier Scientific Publishing Company. 157-164. Corley, R.H.V. and Mok, C. K. (1972). Effects of nitrogen, phosphorus, potassium and magnesium on growth of oil palm. Expl. Agri. 8: 347-353. Corne, S. A., Carver, S. J., Kunin, W. E., Lenon, J. J. and van Hees, W. W. S. (2000). Using neural network methods to predict forest characteristics in southeast Alaska. 4th International Conference on Integrating GIS and Environmental Modeling (GIS/EM4): Problems, Prospects and Research Needs. 246 Corne, S. Kneale, P., Openshaw, S. and See L. (1998). The use and evaluation of artificial neural networks in flood forecasting. http://www.ccg.leeds.ac.uk/simon/maff98.htm. Deck, S. H., Morrow, C. T., Heinemann, P. H. and Sommer, H. J. (1995). Comparison of a neural network and traditional classifier for machine vision inspection of potatoes. Applied Engineering in Agriculture, 11: 319-326. Department of Statistics, Malaysia, 1975 – 1989. Kuala Lumpur, Malaysia. Donaldson, R. G., Kamstra, M., and Kim, H. Y. (1993). Evaluating alternative models for conditional stock volatility: Evidence from international Data. Working Paper, University of British Columbia. Draper, N. R. and Smith. H. (1981). Applied regression analysis. New York: John Wiley and Sons, Drummond, S. T. Sudduth, K. A. and Birrell. (1995). Analysis and correlation methods for spatial data. ASAE Paper No. 95-1335. St. Joseph, Mich.: ASAE Drummond, S. 
T., Sudduth, K. A., Joshi, A., Birrell, S. J. and Kitchen, N. R. (2002). Statistical and neural network methods for site-specific yield prediction. http://www.nal.usda.gov/ttic/tektran/data/000013/14/0000131434.html Epstein, E. (1972). Mineral nutrition of plants: principle and perspectives. New York: Wiley. Evan, O. V. D. (1997). Short-term currency forecasting using neural networks. ICL System Journal. 11(2). Fairhurst, T. H. and Mutert, E. (1999). Interpretation and management of oil palm leaf analysis data. Better Crops International. 13(1): 48-51. Farazdaghi, H. and Harris, P. M. 1968. Plant competition and cop yield. Nature, vol. 217: 289-290. Fausett, L. (1994). Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Prentice-Hall, Inc. Fekedulegn, D., Mac Suirtain, M. P. and Colbert, J. J. (1999). Parameter estimation of nonlinear growth models in forestry, Silva Fennica 33(4): 327-336. Foong, F. S. (1991). Potential evapotranpiration, potential yield and leaching losses of oil palm. Proceedings of the 1991 PORIM International Palm Oil Congress (Agriculture): 105-119. 247 Foong, F. S. (1999). Impact of moisture on potential evapotranspiration, growth and yield of palm oil. Proceedings of the 1999 PORIM International Palm Oil Congress (Agriculture): 265-287. Foong, S. F. (2000). Faktor yang menentukan pengeluaran hasil dan mempengaruhi potensi hasil sawit. Modul Kursus Pengurusan Ladang untuk Pengurus FELDA. Foster, H. (2003). Assessment of oil palm fertilizer requirements. . In Thomas Fairhust and Rolf Hardter, Oil Palm Management for Large and Sustainable Yields. PPI, PPIC and IPI. Foster, H. L. (1995). Experience with fertilizer recommendation system for oil palm. Proceedings of 1993 PORIM International Palm Oil Congress – update and vision (Agriculture): 313-328. Foster, H. L. and Chang, K. C. (1977). The diagnosis of the nutrient status of oil palms in West Malaysia. In Earp, D. A. and Newall, W. (eds.). 
International Developments in Oil Palm. Malaysian International Agricultural Oil Palm Conference, Kuala Lumpur. 14-17 June 1976. ISP, 290-312. Foster, H. L. and Chang, K. C. Mohd Tayeb Dolmat, Ahmad Tarmizi Mohammed and Zin Zawawi Zakaria (1985). Oil palm yield responses to N and K fertilizers in different environments in Peninsular Malaysia . PORIM Occasional Paper No. 16. Palm Oil Research Institute of Malaysia Kuala Lumpur. Foster, H. L., Ahmad Tarmizi Mohammed and Zin Zawawi Zakaria. (1987). Foliar diagnosis of oil palm in Peninsular Malaysia. Proceedings of 1987 International Palm Oil Conference – Agriculture: 249-261. Foster, H. L., Mohd Tayeb Dolmat and Gurmit Singh. (1987). The effect of fertilizers on oil palm bunch components in Peninsular Malaysia. Proceedings of 1987 International Palm Oil Conference – Agriculture: 294- 305. Franses, P. H. and Homelen, P. V. (1998). On forecasting exchange rate using neural networks. Applied Financial Economics. 8: 589-596. Gallant. A. R. (1987). Nonlinear statistical models. New York: John Wiley and Sons. 248 Gan, W. S. and Ng, K. H. (1995). Multivariate FOREX forecasting using artificial neural networks. IEEE International Conference of Neural Networks. 2: 10181022. Garcia, O. (1983). The stochastic differential equation model for the height growth of forest stands, Biometrics, 39, 1059-1072. Garcia, O. (1988). Growth modeling - A review. New Zealand Forestry, 33(3): 1417. Garcia, O. (1989). Growth modeling – New development. In Nagumo, H., and Konohira, Y (Eds.), Japan and New Zealand Symposium on Forestry Management Planning, Japan Association for Forestry Statistics, 152-158. Garcia, O. (1993). Stand growth models: Theory and practice. In Advancement in Forest Inventory and Forest Management Sciences – Proceedings of IUFRO Seoul Conference. Forest Research Institute of The Republic of Korea. Garcia, R. and Gency, R. (2000). Pricing and hedging derivative securities with neural networks and a homogeneity hint. 
Journal of Econometrics. 94: 93115. Gaudart, J., Giusiano, B. and Huiart, L. (2003). Comparison of the performance of multi-layer perceptron and linear regression for epidemiological data. Computational Statistics and Data analysis. Glasbey, C. A. (1979). Correlated residual in non-linear regression applied to growth data. App. Stat. 28: 251-259. Goh, K. J., Hardter, R. and Fairhust, T. (2003). Fertilizing for Maximum Return. In Thomas Fairhust and Rolf Hardter, Oil Palm Management for Large and Sustainable Yields. PPI, PPIC and IPI. 279-306. Gorr, W. L. (1994). Research prospective on neural network forecasting. International Journal of Forecasting, 10: 1-4. Green, A. H. (1976). field Experiments as a Guide to Fertilizer Practice. In Corley, R. H. V., Hardon, J. J. and Wood, B. J. 1976. Oil Palm Research: Developments in Crop Science (1). Elsevier Scientific Publishing Company, Netherlands. Gujarati, D. N. (1988). Basic Econometrics. New York, McGraw-Hill. Hagan, M. T., Demuth, H. B. and Beale, M. H. (1996). Neural Network Design. Boston: PWS Publishing Company. 249 Hampel, F. R. (1974). The influence curve and its role in robust estimation. Journal of the American Statistical Association, 69: 383-393. Harris, J. M. and Kennedy, S. (1999). Carrying capacity in agriculture: global and regional issues. Ecological Economics, 29: 443-461. Hartley, C. W. S. (1988). The Oil Palm (Elaeis guineensis Jacq.). New York: Longman Scientific & Technical, 3rd Edition Heeler, R. M. and Hustad, T. P. (1980). Problems in predicting new product growth for consumer durables. Management Science, 26:1007-1020. Henson, I. E. (2000). Modeling the effects of ‘haze’ on oil palm productivity and yield. Journal of Oil Palm Research. 12(1): 123-134. Henson, I. E. and Chan, K. C. (2000). Oil palm productivity and its component process. Advances in Oil Palm Research, 1: 97-145. Henson, I. H. and Mohd Harun, H. (2004). Seasonal in oil palm fruit bunch production: Its origin and extent. 
The Planters, 80(937): 201-212. Hill, T. and Remus, W. (1994). Neural network models for intelligent support of managerial decision making. Decision Support Systems, II: 449-459. Holliday, R. 1960. Plant population and crop yield: Part I. Field Crop Abstract, vol. 13(3), 159-167. Holliday, R. 1960. Plant population and crop yield: Part II. Field Crop Abstract, vol. 13(4), 247-254. Hornik, K., Stinchcombe, M. and White, H (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2: 359-366. Hsieh, W. W. and Tang, B. (1998). Applying neural network models to prediction and data analysis in meteorology and oceanography. Bulletin of the American Meteorology Society. 79(9): 1885-1887. Hsu, C. C. and Chen, C. Y. (2003). Regional load forecasting in Taiwan – applications of artificial neural networks. Energy Conversion and Management, 44: 1941-1949. Hsu, K., Gupta, H. V. and Sorooshian, S. (1993). Artificial neural network modeling of the rainfall-runoff process. Water Resources Research, 29(4): 1185-1194 Huber, P. J. (1981). Robust Statistics. New York: Wiley. Hunt, R. (1982). Plant Growth Curves, London: Edward Arnold. Hykin, S. (1999). Neural networks: A comprehensive foundation. 2nd edition. New Jersey: Prentice Hall. 250 Indro, D. C., Jiang, C. X., Patuwo, B. E. and Zhang, G. P. (1999). Predicting mutual fund performance using artificial neural networks. Omega, Int. J. Mgmt. Sci. 27: 569-582. Jennrich, R. I. (1995). An introduction to computational statistics: Regression analysis. New Jersey: Prentice Hall, Englewood Cliffs. Jiang, X., Chen, M. S., Manry, M. T., Dawson, M. S. and Fung, A. K. (1994). Analysis and optimization of neural networks for remote sensing. Remote Sensing Review, 9: 97-114. Kastens, T. L. Dhuyvetter, K. C., Schmidt, J. P. and Stewart, W. M. (2000). Wheat yield modeling: How important is soil test phosphorus? Better Crops. 84(2): 8-10. Kastens, T. L., Schmidt, J. P. and Dhuyvetter, K. C. (2000). 
Wheat yield modeling with site-specific information: A Kansas farm case study. In A. J. Schlegel (ed.) Proceedings of the 2000 Great Plains Soil Fertility Conference. 41-48. Kastens, T. L., Schmidt, J. P. and Dhuyvetter, K. C. (2003). Yield models implied by traditional fertilizer recommendations and a framework for including nontraditional information. Soil Sci. Soc. Am. J. 67: 351-364. Kee, K. K. and Chew, P. S. (1991). Oil palm responses to nitrogen and drip irrigation in a wet monsoonal climate in Peninsular Malaysia. Proceedings of PORIM International Palm Oil Conferences. Module 1: Agriculture: 321-339. Klein, B. D. and Rossin, D. F. (1999a). Data errors in neural network and linear regression models: An experimental comparison. Data Quality Journal, 5(1): 1-19. Klein, B. D. and Rossin, D. F. (1999b). Data quality in neural network: effect of error rate and magnitude of error on predictive accuracy. Omega, International Journal of Management Science, 27: 569-582. Kominakis, A. P., Abas, Z., Maltaris, I. and Rogdakis, E. (2002). A preliminary study of the application of artificial neural networks to prediction of milk yield in dairy sheep. Computers and Electronics in Agriculture. 35(1): 35-48. Kruse, J. R. 1999. Trend yield analysis and yield growth assumption. Technical Report No. 06-99. Food and Agricultural Policy Research Institute. Lai, L. L. (1998). Intelligent system applications in power engineering: Evolutionary programming and neural networks. West Sussex, England: John Wiley and Sons. 251 Law, R. (2000). Back-propagation learning in improving the accuracy of neural network-based tourism demand forecasting. Tourism Management, 21: 331340. Law, R. and Au, N. (1999). A neural network model to forecast Japanese demand for travel to Hong Kong. Tourism Management, 20: 89-97. Lawrence, S., Giles, C. L. and Tsoi, A. C. (1996). What size neural network gives optimal generalization? Convergence properties of backpropagation. 
Technical Report UMIACS-TR-96-22 and CS-TR-3617, University of Maryland, College Park.
Lee, J. C., Lu, K. W. and Horng, S. C. (1992). Technological forecasting with nonlinear models. Journal of Forecasting, 11: 195-20.
Lehmann, E. L. (1998). Nonparametrics: Statistical Methods Based on Ranks. 1st edition (revised). New Jersey: Prentice Hall.
Lei, Y. C. and Zhang, S. Y. (2004). Features and partial derivatives of Bertalanffy-Richards growth model in forestry. Nonlinear Analysis: Modelling and Control, 9(1): 65-73.
Leng, T., Pin, O. K. and Zainuriah, A. (2000). Effects of fertilizer withdrawal prior to replanting on oil palm performance. Proceedings of the International Planters Conference, 233-249.
Limsombunchai, V., Gan, C. and Lee, M. (2004). House price prediction: Hedonic price model vs. artificial neural network. American Journal of Applied Science, 1(3): 193-201.
Lin, F., Yu, X. H., Gregor, S. and Irons, R. (1995). Time series forecasting with neural networks. http://www.csu.edu.au/ci/vo102/cmxhk/cmxhk.html
Lippmann, R. P. (1987). An introduction to computing with neural nets. IEEE ASSP Magazine, 4-22.
Liu, J., Goering, C. E. and Tian, L. (2001). A neural network for setting target corn yields. Transactions of the American Society of Agricultural Engineers, 44(3): 705-713.
Ludlow, A. R. (1991). Modeling as a tool for oil palm research. PORIM International Palm Oil Conference – Agriculture, 273-289.
Makowski, D., Wallach, D. and Meynard, J. M. (2001). Statistical methods for predicting responses to applied nitrogen and calculating optimal nitrogen rates. Agronomy Journal, 93: 531-539.
Malaysian Palm Oil Board Statistics (1990 – 2003). Malaysian Palm Oil Board, Kuala Lumpur, Malaysia.
Masters, T. (1993). Practical Neural Network Recipes in C++. Academic Press, Inc.
McCullagh, P. and Nelder, J. A. (1983). Generalized Linear Models. London: Chapman and Hall.
Md Yunus Jaafar. (1999). Pensuaian model taklinear Gompertz terhadap pertumbesaran pokok koko.
Matematika, 15(1): 1-20.
Meade, N. (1984). The use of growth curves in forecasting market development – a review and appraisal. Journal of Forecasting, 3: 429-451.
Meade, N. (1985). Forecasting using growth curves – an adaptive approach. Journal of the Operational Research Society, 36: 1103-1115.
Meade, N. (1988). A modified logistic model applied to human populations. Journal of the Royal Statistical Society, Ser. A, 151: 491-498.
Meade, N. and Islam, T. (1995). Forecasting with growth curves: An empirical comparison. International Journal of Forecasting, 11: 199-215.
Michaelsen, J. (1987). Cross-validation in statistical climate forecast models. J. Clim. Appl. Meteor., 26: 1589-1600.
Mihalakakou, G., Santamouris, M. and Asimakopoulos, D. N. (2000). The total solar radiation time series simulation in Athens using neural networks. Theoretical and Applied Climatology, 66: 185-197.
Mohd Haniff Harun. (2000). Yield and yield components and their physiology. Advances in Oil Palm Research, 1: 146-170.
Mokhtar Abdullah (1994). Analisis Regresi. Kuala Lumpur: Dewan Bahasa dan Pustaka.
Montgomery, D. C. (1984). Design of Experiments. 2nd edition. New York: John Wiley & Sons, Inc.
Morgan, P. H., Mercer, L. P. and Flodin, N. W. (1975). General model for nutritional response of higher organisms. Proc. Nat. Acad. Sci. USA, 72: 4327-4331.
Moshiri, S. and Cameron, N. (2000). Neural network versus econometric models in forecasting inflation. Journal of Forecasting, 19(3): 201-217.
Motiwalla, L. and Wahab, M. (2000). Predictable variation and profitable trading of US equities: A trading simulation using neural networks. Computers and Operational Research, 27: 1111-1129.
Murata, N., Yoshizawa, S. and Amari, S. (1994). Network information criterion – determining the number of hidden units for an artificial neural network model. IEEE Transactions on Neural Networks, 5: 865-872.
Myers, R. H. and Montgomery, D. C. (1995).
Response Surface Methodology: Process and Product Optimization Using Designed Experiments. New York: John Wiley and Sons.
Nasr, G. E., Badr, E. A. and Joun, C. (2003). Backpropagation neural networks for modeling gasoline consumption. Energy Conversion and Management, 44: 893-905.
Navone, H. D. and Ceccatto, H. A. (1994). Predicting Indian monsoon rainfall: a neural networks approach. Climate Dynamics, 10: 305-312.
Naylor, R., Falcon, W. and Zavaleta, E. (1997). Variability and growth in grain yields, 1950-1994. Population Dev. Rev., 23(1): 41-58.
Nelder, J. A., Austin, R. B., Bleasdale, K. A. and Salter, P. J. (1960). An approach to the study of yearly and other variation in crop yield. J. Hort. Sci., 35: 73-82.
Nelson, L. (1997). Statistics in Fertilizer Use Research (Modern Agriculture and Fertilizers Lecture Series No. 1). Potash and Phosphate Institute of Canada, India Programme, Dundahera, Gurgaon.
Norušis, M. J. (1998). SPSS® 8.0 Guide to Data Analysis. New Jersey: Prentice Hall.
Oboh, B. O. and Fakorede, M. A. B. (1999). Effects of weather on yield components of the oil palm in a forest location in Nigeria. Journal of Oil Palm Research, 11(1): 79-89.
Oil World Annual (1999 – 2003). Oil World. ISTA Mielke GmbH, Hamburg.
Oil World (2003). Oil World. ISTA Mielke GmbH, Hamburg.
Oliver, F. R. (1970). Estimating the exponential growth function by direct least squares. Applied Statistics, 19: 92-100.
Patrick, B. K. and Smagt, V. D. (1996). An Introduction to Neural Networks. The University of Amsterdam.
Patterson, D. W. (1996). Artificial Neural Networks: Theory and Applications. Singapore: Prentice Hall.
Penman, H. L. (1956). Weather and water in growth of grass. In Milthorpe, F. L. (ed.), The Growth of Leaves. London: Butterworths Scientific Publications.
Philip, M. S. (1994). Measuring Trees and Forests. 2nd edition. Wallingford, UK: CAB International.
Pienaar, F. J. and Turnbull, K. J. (1973).
The Chapman-Richards generalization of Von Bertalanffy's growth model for basal area growth and yield in even-aged stands. For. Sci., 19: 2-22.
Pushparajah, E. (1994). Leaf analysis and soil testing for plantation tree crops. Presented at the International Workshop on Leaf Diagnosis and Soil Testing as a Guide to Crop Fertilization.
Ralston, M. L. and Jennrich, R. I. (1978). DUD, a derivative-free algorithm for nonlinear least squares. Technometrics, 20: 7-14.
Rao, S. K. (1985). An empirical comparison of sales forecasting models. Journal of Product Innovation Management, 4: 232-242.
Ratkowsky, D. A. (1983). Nonlinear Regression Modeling. New York: Marcel Dekker.
Rawson, H. M. and Hackett, C. (1974). An exploration of the carbon economy of the tobacco plant III. Gas exchange of leaves in relation to position on the stem, ontogeny and nitrogen content. Aust. J. Plant Physiol., 1: 551.
Reed, R. D. and Marks, R. J. (1999). Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. The MIT Press.
Richards, F. J. (1969). The quantitative analysis of growth. In Stewart (ed.), Plant Physiology, Volume VA: Analysis of Growth: Behaviour of Plants and their Organs. New York: Academic Press.
Ripley, B. D. (1994). Neural networks and flexible regression and discrimination. Advances in Applied Statistics, 39-57.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge University Press.
Rousseeuw, P. J. and Leroy, A. M. (1987). Robust Regression and Outlier Detection. New York: John Wiley and Sons, Inc.
Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986). Learning internal representations by back-propagating errors. In Rumelhart, D. E., McClelland, J. L. and the PDP Group (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MA: MIT Press.
Sarle, W. S. (1997). Neural Network FAQ, periodic posting to the Usenet newsgroup comp.ai.neural-nets. URL: ftp://ftp.sas.com/pub/neural/FAQ.html.
Sarle, W. S. (1998).
Neural networks and statistical models. Proceedings of the Nineteenth Annual SAS Users Group International Conference.
SAS/STAT User's Guide (1992). Release 6.03 edition. SAS Institute, Cary, NC.
Schnute, J. (1981). A versatile growth model with statistically stable parameters. Can. J. Fish. Aquat. Sci., 38: 1128.
Seber, G. A. F. and Wild, C. J. (1989). Nonlinear Regression. New York: John Wiley and Sons.
Shearer, S. A., Burks, T. F., Fulton, J. P., Higgins, S. F., Thomasson, J. A., Mueller, T. G. and Samson, S. (1994). Yield prediction using a neural network classifier trained using soil landscape features and soil fertility data. Agronomy Journal, 89: 54-59.
Shrestha, D. S. and Steward, B. L. (2002). Early growth stage corn plant population measurement using neural network approach. Proceedings of the World Congress of Computers in Agriculture and Natural Resources, 8-14.
Smith, K. A. and Gupta, J. N. D. (2000). Neural networks in business: techniques and applications for the operations researcher. Computers and Operations Research, 27: 1023-1044.
Soon, B. B. F. and Hong, H. W. (2001). Oil palm responses to N, P, K and Mg fertilizers on two major soil types in Sabah. Proceedings of the 2001 PIPOC International Palm Oil Congress (Agriculture), 318-334.
S-Plus 6 for Windows: Guide to Statistics, Volume 1 (2001). Insightful Corporation, Seattle, Washington.
Tangang, F. T., Hsieh, W. W. and Tang, B. (1997a). Forecasting the equatorial Pacific sea surface temperatures by neural network models. Climate Dynamics, 13: 135-147.
Tangang, F. T., Tang, B., Monahan, A. H. and Hsieh, W. W. (1998). Forecasting ENSO events: A neural network-extended EOF approach. Journal of Climate, 29-41.
Tanner, J. C. (1978). Long term forecasting of vehicle ownership and road traffic. Journal of the Royal Statistical Society, Ser. A, 141: 14-63.
Teng, Y. and Timmer, V. R. (1996). Modeling nitrogen and phosphorus interactions in intensively managed nursery soil-plant systems.
Canadian Journal of Soil Science, 76: 523-530.
Teoh, C. H. (2000). Land use and the palm oil industry in Malaysia. Report produced under Project MY 0057 'Policy Assessment of Malaysian Conservation Issues', Kuala Lumpur.
Thiesing, F. M. and Vornberger, O. (1997). Sales forecasting using neural networks. IEEE Proceedings ICNN, Houston, 4: 2125-2128.
Timmermans, A. J. M. and Hulzebosch, A. A. (1996). Computer vision system for on-line sorting of pot plants using an artificial neural network classifier. Computers and Electronics in Agriculture, 15: 41-55.
Tinker, P. B. (1976). Soil requirements of the oil palm. In Corley, R. H. V., Haron, J. J. and Wood, B. J. (eds.), Oil Palm Research: Developments in Crop Science (1). Elsevier Scientific Publishing Company, 165-179.
Tkacz, G. and Hu, S. (1999). Forecasting GDP growth using artificial neural networks. Bank of Canada Working Paper 99-3.
Tsoularis, A. and Wallace, J. (2002). Analysis of logistic growth models. Mathematical Biosciences, 179: 21-55.
Uysal, M. and El Roubi, M. S. (1999). Artificial neural networks versus multiple regression in tourism demand analysis. Journal of Travel Research, 38(2): 111-118.
Vanclay, J. K. (1994). Modelling Forest Growth and Yield. Wallingford, UK: CAB International.
Verdooren, R. (2003). Design and analysis of fertilizer experiments. In Fairhurst, T. and Hardter, R. (eds.), Oil Palm: Management for Large and Sustainable Yields. PPI, PPIC and IPI.
Wang, D., Dowell, F. E. and Lacey, R. E. (1999). Single wheat kernel color classification using neural networks. Transactions of the ASAE, 42: 233-240.
Welch, S. M., Roe, J. L. and Dong, Z. (2003). A genetic neural network model of flowering time control in Arabidopsis thaliana. Agron. J., 95: 71-81.
Welstead, S. T. (1994). Neural Network and Fuzzy Logic Applications in C++. New York: John Wiley and Sons, Inc.
Wendroth, O., Reuter, H. I. and Kersebaum, K. C. (2003). Predicting yield of barley across a landscape: a state-space modeling approach.
Journal of Hydrology, 272: 250-263.
Wong, B. K. and Selvi, Y. (1998). Neural network applications in finance: A review and analysis of literature (1990-1996). Information and Management, 34: 129-139.
Wong, B. K., Lai, V. S. and Lam, J. (2000). A bibliography of neural network business applications research: 1994-1998. Computers and Operations Research, 27: 1045-1076.
Yang, C. C., Prasher, S. O., Lacroix, R., Sreekanth, S., Madani, A. and Masse, L. (1997). Artificial neural network model for subsurface-drained farmlands. Journal of Irrigation and Drainage Engineering, 123: 285-292.
Yang, C. C., Prasher, S. O., Landry, J. A., Ramaswamy, H. S. and Ditommaso, A. (2000). Application of artificial neural networks in image recognition and classification of crop and weeds. Canadian Agricultural Engineering, 42(3): 147-152.
Yao, J., Li, Y. and Tan, C. L. (2000). Option prices forecasting using neural networks. Omega, Int. J. Mgmt. Sci., 28: 455-466.
Yao, J. and Poh, H. L. (1995). Forecasting the KLSE Index using neural networks. IEEE, 1013-1017.
Yao, J. and Tan, C. L. (2000). A case study on using neural networks to perform technical forecasting of forex. Neurocomputing, 34: 79-98.
Yuancai, L., Marques, C. P. and Macedo, F. W. (1997). Comparison of Schnute's and Bertalanffy-Richards' growth functions. Forest Ecology and Management, 96: 283-288.
Zhang, G. and Hu, M. Y. (1998). Neural network forecasting of the British pound/US dollar exchange rate. Omega, Int. J. Mgmt. Sci., 26(4): 495-506.
Zhang, G. P., Patuwo, G. E. and Hu, M. Y. (2001). A simulation study of artificial neural networks for nonlinear time-series forecasting. Computers and Operations Research, 28: 381-396.
Zuhaimy Ismail and Azme Khamis (2001). A review on combining neural network and genetic algorithm. Laporan Teknik/M. 5. Jabatan Matematik, UTM.
Zuhaimy Ismail and Azme Khamis (2002). A review on neural network and its application in forecasting. Laporan Teknik/M. 3. Jabatan Matematik, UTM.
Zuhaimy Ismail and Azme Khamis (2003). Rangkaian neural dalam peramalan harga minyak kelapa sawit. Jurnal Teknologi: C, 39: 17-28.
Zuhaimy Ismail, Azme Khamis and Md Yunus Jaafar. (2003). Fitting nonlinear Gompertz curve to tobacco growth data. Pakistan Journal of Agronomy, 2(4): 223-236.

Appendix A
The list of oil palm experimental stations

Code   Name of station (Estate)                          Soil series   Palm age
Coastal area
CLD1   West Carey Estate, Klang, Selangor                Carey         16 - 17
CLD2   North Carey Estate, Klang, Selangor               Selangor      16 - 17
CLD3   Nordanal Estate, Muar, Johor                      Briah         19 - 20
CLD4   Dusun Durian, Selangor                            Sedu          12 - 14
CLD5   South Carey Estate, Carey Island, Selangor        Carey         12 - 15
CLD6   Athlone Estate, Kapar, Perak                      Briah         11 - 14
CLD7   Athlone Estate, Kapar, Perak                      Briah         14 - 17
Inland area
ILD1   Sungai Mahang Estate, Nilai, Negeri Sembilan      Bungor        12 - 14
ILD2   Diamond Jubilee Estate, Jasin, Melaka             Rengam        7 - 11
ILD3   Sendayan Estate, Port Dickson, Negeri Sembilan    Munchong      7 - 10
ILD4   Gomali Estate 3, Bt. Anam, Segamat, Johor         Batu Anam     10 - 13
ILD5   Lepan Kabu Estate, Kuala Krai, Kelantan           Pohoi         11 - 14
ILD6   Gomali Estate, Batu Anam, Segamat, Johor          Batu Anam     7 - 10
ILD7   Gomali Estate, Batu Anam, Segamat, Johor          Durian        9 - 12

Appendix B
The rate and actual value of fertiliser (kg/palm/year)

CLD1 (levels 0/1/2): N (kg N26) 0.00/1.82/3.64; P (kg CIRP) 0.00/1.82/3.64; K (kg KCl) 0.00/2.37/5.46; Mg (kg Kies) 0.00/1.82/3.64
CLD2 (levels 0/1/2): N (kg N26) 0.00/1.82/3.64; P (kg CIRP) 0.00/1.82/3.64; K (kg KCl) 0.00/2.72/5.44; Mg (kg Kies) 0.00/1.82/3.64
CLD3 (levels 0/1/2): N (kg ASN) 0.00/3.64/7.28; P (kg CIRP) 0.00/1.82/3.64; K (kg KCl) 0.00/3.64/7.28; Mg (kg Kies) 0.00/1.82/3.64
CLD4 (levels 0/1/2): N (kg ASN) 0.00/3.64/7.28; P (kg CIRP) 0.00/1.82/3.64; K (kg KCl) 0.00/3.64/7.28; Mg (kg Kies) 0.00/1.82/3.64
CLD5 (levels 0/1/2): N (kg CAN) 0.00/2.73/5.46; P (kg CIRP) 0.00/4.55/9.10; K (kg ASH) 0.00/9.10/18.20; Mg (kg Kies) 0.00/4.55/9.10
CLD6 (levels 0/1/2/3): N (kg N26) 0.00/1.82/3.64/5.45; P (kg CIRP) 0.00/2.73/-/-; K (kg KCl) 0.00/2.27/4.55/6.82
CLD7 (levels 0/1/2/3): N (kg N26) 0.00/1.82/3.64/5.45; P (kg CIRP) 0.00/2.73/-/-; K (kg KCl) 0.00/2.27/4.55/6.82
ILD1 (levels 0/1/2): N (kg N26) 0.00/3.18/6.36; P (kg CIRP) 0.00/1.82/3.64; K (kg KCl) 0.00/2.50/5.00; Mg (kg Kies) 0.00/1.82/3.64
ILD2 (levels 0/1/2/3): N (kg N26) 0.00/1.82/3.64/5.45; K (kg KCl) 0.00/2.73/5.45/8.18
ILD3 (levels 1/2/3/4): N (kg N26) 1.34/3.64/5.92/8.20; P (kg CIRP) 0.90/1.80/-/-; K (kg KCl) 1.34/3.64/5.92/8.20
ILD4 (levels 1/2/3): N (kg AS) 1.60/3.20/6.40; P (kg CIRP) 0.75/1.50/3.00; K (kg KCl) 1.68/3.36/6.72; Mg (kg Kies) 0.58/1.15/2.30
ILD5 (levels 0/1/2): N (kg N26) 0.00/1.36/2.73; P (kg CIRP) 0.00/2.27/4.55; K (kg KCl) 0.00/2.27/4.55
ILD6 (levels 1/2/3): N (kg AS) 1.45/2.90/4.80; P (kg CIRP) 0.75/1.50/3.00; K (kg KCl) 1.65/3.30/6.60; Mg (kg Kies) 0.60/1.20/2.40
ILD7 (levels 1/2/3): N (kg AS) 1.73/3.46/6.92; P (kg CIRP) 0.91/1.82/3.64; K (kg KCl) 1.39/2.78/5.56; Mg (kg Kies) 0.70/1.39/2.78

Appendix C
Summary of macro nutrients needed by plants

Nitrogen (N): Nitrogen is an essential
constituent of proteins, nucleic acids and various coenzymes. The majority of biochemical reactions in the plant are catalysed by enzymes, which are proteins, so nitrogen plays an essential part in virtually all physiological processes. A general symptom of nitrogen deficiency is chlorosis of the leaves, since chlorophyll synthesis is inhibited; this leads to reduced photosynthesis, while reduced protein synthesis results in a general loss of 'vigour'. Corley and Mork (1972) showed that nitrogen fertiliser increased leaf area, leaf weight and the rate of leaf production, and gave a higher net assimilation rate.

Phosphorus (P): Phosphate is an essential constituent of nucleic acids and phospholipids, while the processes of photosynthesis and respiration involve reactions among sugar phosphates, the coenzymes adenosine diphosphate and triphosphate (ADP and ATP), and nicotinamide adenine dinucleotide phosphate. ATP, in particular, is an intermediate metabolite whose breakdown to ADP and phosphate releases energy, which is used for almost all the energy-requiring processes in plant metabolism. ATP is produced in both photosynthesis and respiration, and its synthesis depends on a supply of ADP and phosphate. Hence phosphate deficiency can be expected to cause considerable disruption of metabolism.

Potassium (K): The main function of potassium is as an activator of numerous enzymes; that is, the presence of potassium ions is necessary for the activity of the enzyme, but potassium is not a constituent of the actual enzyme molecule, nor of a co-factor. In the oil palm, stomatal resistance is considerably increased when potassium is deficient (Corley 1976; Tinker 1976).

Magnesium (Mg): Magnesium has a general role as an enzyme activator, and is required by even more enzymes than potassium. The requirement is not always very specific, and other divalent cations, in particular manganese, can often substitute. Among the many systems requiring magnesium is that of fatty acid synthesis.
Magnesium is also an essential component of the chlorophyll molecule, and magnesium deficiency in maize was found to cause decreased chlorophyll content and hence decreased photosynthesis (Corley 1976).

Calcium (Ca): Calcium is an essential component of an enzyme, amylase, and is required as an activator by some enzymes. It is also a major component of the middle lamella of plant cell walls, and may therefore affect the mechanical strength of tissues. Epstein (1972) pointed out that a major function of calcium may be in maintaining cell organization.

Nutrient uptake is selective; chemically similar ions may be taken up at quite different rates. For example, the rate of potassium uptake was found to be independent of the presence of sodium in the solution, even when the concentration of sodium was one hundred times that of potassium (Epstein 1972). Interactions in the uptake of pairs of chemically dissimilar ions occur; the negative relationship between tissue levels of potassium and magnesium is well known, and probably results from competition between the ions in uptake or translocation. Uptake of cations is independent of anion uptake. For example, potassium uptake occurs at a similar rate whether the anion in the solution is chloride or sulphate, although sulphate is taken up much more slowly than chloride (Epstein 1972). Where a cation is taken up faster than the anion, the anion deficit in the cell is compensated for by the production of organic anions such as malate.

Appendix D
The list of papers published from 2001 to date

International Level
1. Zuhaimy Ismail, Azme Khamis and Md Yunus Jaafar. (2003). Fitting nonlinear Gompertz curve to tobacco growth data. Pakistan Journal of Agronomy, 2: 223-236.
2. Azme Khamis and Zuhaimy Ismail. (2004). Comparative study on nonlinear growth model to tobacco leaf growth data. Journal of Agronomy, 3(2): 147-153.
3. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed. (2005).
The effects of outliers data on neural network performance. Journal of Applied Sciences, 5(8): 1394-1398.
4. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed. (2005). Modelling oil palm yield using multiple linear regression and robust M-regression. Journal of Agronomy, 5(1): 32-36.
5. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed. (2005). Nonlinear growth models for modeling oil palm yield growth. Journal of Mathematics and Statistics, 1(3): 225-233.
6. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed. Neural network model for oil palm (Elaeis guineensis Jacq.) yield modeling. Journal of Applied Science. (In press).
7. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed. The use of response surface analysis in obtaining maximum profit in the oil palm industry. Songklanakarin Journal of Science and Technology, 28(3), May-June 2006. (In press).

National Level
1. Zuhaimy Ismail and Azme Khamis. (2001). A review on combining neural networks and genetic algorithm. Laporan Teknik, LT/M Bil. 5/2001. Jabatan Matematik, Universiti Teknologi Malaysia.
2. Zuhaimy Ismail and Azme Khamis. (2002). A review on neural networks and its application in forecasting. Laporan Teknik, LT/M Bil. 3/2002. Jabatan Matematik, Universiti Teknologi Malaysia.
3. Zuhaimy Ismail and Azme Khamis. (2003). Perbandingan di antara regresi berganda dan regresi komponen utama untuk menganggar harga minyak sawit mentah. Prosiding Simposium Kebangsaan Sains Matematik Ke-11, Kota Kinabalu, Sabah, 22-24 December 2003. 647-656.
4. Zuhaimy Ismail and Azme Khamis. (2003). Rangkaian neural dalam peramalan harga minyak kelapa sawit. Jurnal Teknologi C, 39: 17-28.
5. Azme Khamis, Zuhaimy Ismail and Ani Shabri. (2003). Pemodelan harga-harga minyak sayuran menggunakan analisis regresi linear berganda. MATEMATIKA, 19: 59-70.
6. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed.
The effects of foliar nutrient compositions on oil palm yield. (In progress).
7. Azme Khamis, Zuhaimy Ismail, Khalid Haron and Ahmad Tarmizi Mohammed. Modelling in the oil palm industry: A review. (In progress).

Seminar/Symposium
1. Malaysian Science and Technology Congress (MSTC) 2002, Symposium C: Life Science, 12-14 December 2002, Kuching, Sarawak.
2. Simposium Kebangsaan Sains Matematik Ke-X, organized by Universiti Teknologi Malaysia and Persatuan Sains Matematik (PERSAMA), Johor Bahru, Johor, 23-24 December 2002.
3. Simposium Kebangsaan Sains Matematik Ke-XI, organized by Universiti Malaysia Sabah and PERSAMA, Kota Kinabalu, Sabah, 23-24 December 2003.
4. Seminar of the Department of Mathematics, Universiti Teknologi Malaysia (UTM), 25 August 2004.
5. Annual Foundation of Science Seminar (AFSS), Universiti Teknologi Malaysia, 4-5 July 2005.
6. International Science Congress (ISC) 2005 (incorporating the Malaysian Science and Technology Convention (MASTEC) 2005 and MASO 2005), Putra World Trade Centre, Kuala Lumpur, 4-7 August 2005.

Appendix E
The ridge analysis

Consider the B matrix discussed in equation (3.65). Consider also the orthogonal matrix P which diagonalizes B, that is,

    P′BP = Λ = diag(λ1, λ2, …, λm)

where the λi are the eigenvalues of B. The solution x that produces locations where ∂L/∂x = 0 is given by

    (B − κI)x = −(1/2)b

If we pre-multiply (B − κI) by P′ and post-multiply by P we obtain

    P′(B − κI)P = Λ − κI

because P′P = Im. If (B − κI) is negative definite, the resulting solution x is at least a local maximum on the radius H = (x′x)^(1/2); on the other hand, if (B − κI) is positive definite, the result is a local minimum. Because

    Λ − κI = diag(λ1 − κ, λ2 − κ, …, λm − κ),

it follows that if κ > λmax then (B − κI) is negative definite, and if κ < λmin then (B − κI) is positive definite.
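The definiteness condition on κ is easy to verify numerically. The sketch below uses an invented 2×2 matrix B and vector b (illustrative only, not taken from the thesis): it chooses κ above the largest eigenvalue of B, solves (B − κI)x = −b/2 for the stationary point, and confirms that the shifted matrix is negative definite, so x is a local maximum on its radius H = (x′x)^(1/2).

```python
import numpy as np

# Illustrative second-order coefficient matrix B and first-order vector b;
# these numbers are invented for the example, not taken from the thesis.
B = np.array([[-2.0, 0.5],
              [0.5, -1.0]])
b = np.array([1.0, 0.8])

lam = np.linalg.eigvalsh(B)        # eigenvalues of B (the diagonal of Lambda)
kappa = lam.max() + 0.5            # kappa > lambda_max

# Stationary point: (B - kappa*I) x = -b/2
A = B - kappa * np.eye(2)
x = np.linalg.solve(A, -0.5 * b)

shifted = np.linalg.eigvalsh(A)    # all negative => A is negative definite
H = np.sqrt(x @ x)                 # radius on which x is a local maximum
print(kappa, shifted, x, H)
```

With κ below the smallest eigenvalue instead, the same code would yield an all-positive `shifted` and hence a local minimum, matching the sign argument above.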
268 Appendix F (i) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the logistic growth model Iter α β 0 27.500000 0.550000 1 27.784101 0.437181 2 28.070708 0.443325 3 28.359738 0.398224 4 28.271777 0.434605 5 28.526438 0.450143 6 36.872926 2.065726 7 36.914557 2.109364 8 36.919986 2.119640 9 36.824820 2.179654 10 36.332521 3.329466 11 36.895564 4.793287 12 36.814083 4.951146 13 36.806923 5.191885 14 37.089334 4.978397 15 37.087246 4.775178 16 37.090585 4.784499 17 37.086662 4.784170 18 37.079051 4.822734 19 37.080554 4.814801 20 37.080606 4.814883 Note: Convergence criterion met. Source DF Regression 3 Residual 16 Uncorrected Total 19 (Corrected Total) 18 κ Sum of Squares 1.250000 1549.336981 0.821661 1454.859996 0.633481 1368.691939 0.432456 1326.059200 0.586234 1322.598926 0.492724 1265.244775 0.467708 120.790077 0.515355 102.781174 0.547810 100.262635 0.542003 98.127250 0.663181 74.732572 0.820524 58.338084 0.829236 58.204258 0.837600 57.803919 0.784076 56.336902 0.778952 56.153913 0.779078 56.152935 0.778778 56.152772 0.782226 56.150410 0.781682 56.150223 0.781682 56.150223 Sum of Squares 22234.038977 56.150223 22290.189200 994.230779 Mean Square 7411.346326 3.509389 Normal Probability Plot 2.5+ ++*++ * | +*+*+* | * ***+*** | * * **+++ | ++++++ | ++++++* | +++++ * -4.5++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 269 (ii) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the Gompertz growth model Iter α β 0 35.000000 0.605000 1 35.049524 0.510863 2 35.292655 0.796643 3 35.483424 0.651081 4 35.525819 0.641890 5 35.558886 0.649027 6 35.559849 0.647886 7 36.494416 1.388648 8 36.569993 1.472994 9 37.076966 1.960631 10 37.074823 1.958641 11 37.161537 2.017509 12 36.843065 2.065772 13 37.159282 2.274528 14 37.186649 2.275576 15 37.192075 2.265236 16 37.187339 2.264565 17 37.186098 2.261244 18 37.185294 
2.261130 19 37.180485 2.265362 20 37.178973 2.268331 21 37.178910 2.268318 22 37.178870 2.268295 Note: Convergence criterion met Source DF Regression 3 Residual 16 Uncorrected Total 19 (Corrected Total) 18 κ 1.250000 0.940512 1.156999 0.504601 0.405256 0.365298 0.361218 0.373371 0.413736 0.530213 0.530140 0.538426 0.590572 0.621459 0.620041 0.618033 0.611715 0.611743 0.611758 0.612635 0.613302 0.613233 0.613232 Sum of Squares 746.480434 682.655909 647.769394 367.245225 321.924597 305.800586 305.673059 162.906709 123.313547 67.721805 67.721114 65.990714 62.799193 60.031550 60.008103 60.000902 59.907512 59.906776 59.906715 59.905620 59.905488 59.905481 59.905481 Sum of Squares Mean Square 22230.283719 7410.094573 59.905481 3.744093 22290.189200 994.230779 Variable=YRESID Normal Probability Plot 2.5+ +++*++ * | +*+*+* | * ***+*** | * * **+++ | ++++++ | +++++* | +++++ * -4.5+++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 270 (iii) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the von Bertalanffy growth model Iter α β 0 33.000000 -0.050000 1 35.023737 -0.014217 2 35.259687 -0.038748 3 35.460718 -0.049915 4 37.072682 -0.046658 5 37.075892 -0.041762 6 37.079520 -0.035101 7 37.078108 -0.033745 8 37.065582 -0.061425 9 37.068230 -0.062993 10 37.068501 -0.062515 11 37.065402 -0.062551 12 37.066245 -0.061998 13 37.087627 -0.060947 14 37.076089 -0.062608 15 37.070430 -0.059468 16 37.066570 -0.057282 17 37.068257 -0.058354 18 37.064574 -0.056308 19 37.065255 -0.056627 20 37.064611 -0.056125 21 37.063814 -0.055596 22 37.060425 -0.053924 23 37.057934 -0.052462 24 37.046978 -0.048243 25 37.046618 -0.048856 26 37.046616 -0.048777 27 37.046529 -0.048572 28 37.043026 -0.046837 29 37.041871 -0.045239 30 37.042169 -0.045275 31 37.042016 -0.045635 32 37.041767 -0.045685 33 37.041866 -0.045607 34 37.041787 -0.045561 35 37.041600 -0.045527 36 37.041603 -0.045522 Note: Convergence criterion met κ 
0.750000 1.089793 1.003824 0.981359 0.904712 0.929619 0.971731 0.967617 0.904977 0.904702 0.902104 0.901063 0.899972 0.849564 0.847887 0.851623 0.854553 0.853254 0.856166 0.855948 0.856523 0.857331 0.859822 0.862092 0.869122 0.869373 0.869316 0.869590 0.871616 0.873640 0.873495 0.872874 0.872721 0.872944 0.873010 0.873113 0.873121 δ Sum of Squares 2.500000 380.624938 3.055292 110.839035 2.630589 98.069723 2.482279 92.784953 2.506261 57.711529 2.585367 57.006317 2.716818 56.899189 2.720237 56.775826 2.446393 56.554304 2.425974 56.481544 2.421153 56.452830 2.430985 56.425526 2.426527 56.354069 2.382534 55.843272 2.369937 55.837479 2.393605 55.827506 2.410832 55.822992 2.403249 55.820656 2.420114 55.814715 2.418831 55.812588 2.422487 55.812014 2.426665 55.811648 2.440569 55.809136 2.453299 55.807034 2.491502 55.806373 2.490510 55.799233 2.490282 55.797878 2.491954 55.797666 2.507261 55.796739 2.522672 55.796239 2.522255 55.796210 2.519291 55.795827 2.518556 55.795804 2.519454 55.795800 2.519914 55.795799 2.520290 55.795798 2.520335 55.795798 Source DF Sum of Squares Regression Residual Uncorrected Total (Corrected Total) 4 15 19 18 22234.393402 55.795798 22290.189200 994.230779 Mean Square 5558.598350 3.719720 271 Normal Probability Plot 2.5+ ++*+++* | +*+*+* | * ***+*** | * * **++++ | ++++++ | ++++++* | +++++ * -4.5++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 (iv) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the negative exponential growth model Iter α β Sum of Squares 0 33.000000 0.750000 378.689391 1 34.984151 0.382846 198.582665 2 37.703541 0.363557 86.365416 3 37.754348 0.408138 78.461783 4 37.453151 0.408425 77.140826 5 37.457453 0.408447 77.140560 6 37.505210 0.404297 77.092328 7 37.501321 0.404674 77.090988 8 37.501711 0.404673 77.090988 Note: Convergence criterion met Source DF Regression 2 Residual 17 Uncorrected Total 19 (Corrected Total) 18 Sum of Squares 
22213.098212 77.090988 22290.189200 994.230779 Mean Square 11106.549106 4.534764 Normal Probability Plot 2.5+ ++++* * | **+*+* * | * ***+*++ | **++++ | ++*++ | +++*+* | +++++* -4.5+ ++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 272 (v) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the monomolecular growth model Iter α β 0 33.000000 0.050000 1 33.032719 0.055313 2 33.167509 0.087638 3 33.172952 0.088424 4 33.169883 0.088022 5 33.200838 0.095092 6 35.343956 0.639100 7 35.551615 0.681366 8 35.587558 0.629953 9 35.854233 0.692699 10 35.904523 0.699714 11 35.928524 0.703668 12 35.929749 0.703622 13 35.930117 0.703661 14 37.030801 0.985914 15 37.449930 1.116188 16 37.802734 1.214744 17 37.784063 1.210648 18 37.461509 1.172099 19 37.232354 1.164131 20 37.332588 1.138833 21 37.328771 1.139745 22 37.328512 1.139692 23 37.328118 1.139644 24 37.328278 1.140025 25 37.323417 1.140850 26 37.323469 1.140840 Note: Convergence criterion met. 
Source DF Regression 3 Residual 16 Uncorrected Total 19 (Corrected Total) 18 κ Sum of Squares 0.750000 952.737196 0.578546 934.320019 0.253167 855.574556 0.212198 855.374636 0.231795 854.973130 0.198418 846.660342 1.573268 825.135717 1.496942 801.224046 0.369259 234.290890 0.448329 224.648373 0.379435 188.420044 0.353969 182.218343 0.350383 182.196122 0.349565 182.185870 0.306075 177.849315 0.527952 97.505012 0.458968 77.006483 0.455475 76.910058 0.451685 72.445808 0.467682 70.908883 0.455252 70.731600 0.458701 70.685734 0.458658 70.685732 0.458671 70.685727 0.458940 70.685700 0.459255 70.685268 0.459251 70.685268 Sum of Squares 22219.503932 70.685268 22290.189200 994.230779 Mean Square 7406.501311 4.417829 Normal Probability Plot 2.5+ +++*+ * | ***+*+*+* | * ***++++ | **++++ | +*+*+ | ++++* | +++++ * -4.5++++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 273 (vii) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the log-logistic growth model Iter α β 0 33.000000 0.050000 1 36.873205 0.586054 2 37.765368 0.683157 3 40.797958 1.375772 4 42.764468 1.286080 5 42.364462 1.163568 6 42.443784 1.764739 7 42.438028 1.603621 8 42.510603 1.788032 9 42.519733 1.969985 10 36.176134 4.723675 11 37.831765 3.606682 12 38.010828 3.100183 13 38.108277 3.134929 14 38.105742 3.148018 15 38.105381 3.145318 16 38.111655 3.188866 17 38.114910 3.190565 18 38.118430 3.189399 19 38.117328 3.195449 20 38.117246 3.194337 21 38.117302 3.194701 22 38.117217 3.194701 Note: Convergence criterion met. 
Source Regression Residual Uncorrected Total (Corrected Total) DF 3 16 19 18 κ Sum of Squares 0.750000 925.951285 1.994110 496.804068 0.973327 336.498308 1.490174 330.177803 0.673665 301.635265 0.721770 253.547673 1.282947 246.848668 1.162195 219.956210 1.231401 216.205677 1.290002 210.482475 2.087874 166.794624 1.931179 92.358634 1.869265 88.478175 1.872005 88.275154 1.873689 88.270873 1.873690 88.270792 1.887547 88.242252 1.887523 88.241946 1.886678 88.241941 1.887102 88.241687 1.887335 88.241584 1.887450 88.241581 1.887462 88.241581 Sum of Squares 22201.947619 88.241581 22290.189200 994.230779 Mean Square 7400.649206 5.515099 Normal Probability Plot 2.5+ *+*++* * | ***+*+ | ***+++ | ***++ | ++*++ | ++*+* | ++++* -4.5+ ++++* +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 274 (viii) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the Richard’s growth model Iter α β 0 38.500000 1.000000 1 37.037098 3.791557 2 37.070057 5.463712 3 37.075208 5.206513 4 37.071095 7.077390 5 37.042877 5.077377 6 37.046336 10.548642 7 36.976194 14.796500 8 36.996401 12.599241 9 37.008334 11.950700 10 37.006412 13.561781 11 37.007115 13.522883 12 37.058076 10.278632 13 37.040608 10.476589 14 37.040910 10.471770 15 37.043505 10.466284 16 37.045387 10.513839 17 37.044652 10.622521 18 37.040266 11.174733 19 37.039364 11.193722 20 37.039959 11.185724 21 37.040101 11.196660 22 37.041969 11.086868 23 37.042311 11.014116 24 37.042011 11.017303 25 37.041910 11.045851 26 37.041868 11.043308 Note: Convergence criterion met. 
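Each run above ends with "Note: Convergence criterion met." In SAS-style nonlinear least squares this is a relative-change test on the residual sum of squares between successive iterations. A sketch of such a rule, applied to residual sums of squares from the Richards fit (an early step, 101.120493 to 80.655550, is far from convergence; the final step, where the value settles at 55.795804, passes). The tolerance 1e-5 is an assumed, typical default, not a value stated in the output.

```python
def converged(sse_prev, sse_curr, tol=1e-5):
    # Stop when the relative drop in the residual sum of squares between
    # successive iterations falls below tol (tolerance value is an assumption)
    return abs(sse_prev - sse_curr) <= tol * (abs(sse_curr) + 1e-6)

# Residual sums of squares copied from the Richards iteration log
early = converged(101.120493, 80.655550)   # iterations 0 -> 1: keep going
final = converged(55.795804, 55.795804)    # last two iterations: stop
```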
Source Regression Residual Uncorrected Total (Corrected Total) DF 4 15 19 18 κ 0.750000 0.896609 0.835800 0.835216 0.860141 0.761043 0.890059 0.894784 0.894737 0.890894 0.905913 0.905719 0.862950 0.866185 0.866327 0.866290 0.866820 0.868164 0.875043 0.875488 0.875251 0.875299 0.873544 0.872708 0.872742 0.873021 0.872991 δ Sum of Squares 0.250000 101.120493 0.855818 80.655550 1.177625 67.886130 1.136591 67.827042 1.349998 66.974132 0.973428 61.746233 1.605857 61.377600 1.616958 59.914409 1.493960 58.277801 1.524840 56.078416 1.650332 55.838269 1.646851 55.838119 1.465099 55.817393 1.486096 55.797584 1.485281 55.797474 1.484138 55.797358 1.487882 55.797112 1.494567 55.796621 1.528595 55.796294 1.527649 55.796149 1.528173 55.796040 1.528254 55.796031 1.522826 55.795815 1.518612 55.795807 1.518904 55.795807 1.520647 55.795804 1.520482 55.795804 Sum of Squares 22234.393396 55.795804 22290.189200 994.230779 Mean Square 5558.598349 3.719720 Normal Probability Plot 2.5+ ++*+++* | +*+*+* | * ***+*** | * * **++++ | ++++++ | ++++++* | +++++ * -4.5++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 275 (ix) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the Weibull growth model Iter α β 0 38.500000 1.000000 1 36.693634 -5.578793 2 37.186435 -3.876103 3 37.221938 -6.028528 4 37.164563 -5.576102 5 37.189684 -5.807271 6 37.189151 -5.798332 7 37.189417 -5.802804 8 37.189421 -5.802874 9 37.189421 -5.802874 10 37.247139 -5.802853 11 37.307196 -5.165007 12 37.311748 -5.235857 13 37.316802 -5.302401 14 37.314270 -5.268956 15 37.314591 -5.273087 16 37.327455 -5.200161 17 37.323384 -5.245264 18 37.323416 -5.245274 Note: Convergence criterion met Source Regression Residual Uncorrected Total (Corrected Total) DF 4 15 19 18 κ 0.250000 0.389093 0.351401 0.399031 0.398703 0.392160 0.359488 0.349130 0.348513 0.348513 0.348535 0.340476 0.341665 0.342307 0.342075 0.342118 0.341330 0.341550 0.341543 δ Sum 
of Squares 1.250000 117.719064 1.250000 75.335040 1.349939 74.152620 1.201141 71.303092 1.175434 70.975680 1.203346 70.957721 1.304608 70.906026 1.341709 70.904658 1.344404 70.904630 1.344404 70.904630 1.344311 70.801907 1.344214 70.697946 1.344206 70.687157 1.344198 70.686437 1.344202 70.686418 1.344202 70.686417 1.344181 70.686224 1.344188 70.685275 1.344188 70.685275 Sum of Squares 22219.503925 70.685275 22290.189200 994.230779 Mean Square 5554.875981 4.712352 Normal Probability Plot 2.5+ +++*+ * | ***+*+*+* | * ***++++ | **+++ | +*+*+ | ++++* | +++++ * -4.5++++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 276 (x) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the Morgan-Mercer-Flodin growth model Iter α β 0 38.500000 1.000000 1 38.249392 3.869795 2 37.230205 12.119868 3 37.326554 8.955990 4 37.023578 10.832959 5 36.871996 11.072769 6 36.889486 11.362276 7 37.132723 11.456073 8 37.181558 11.492333 9 37.206113 11.417663 10 37.202889 11.529857 11 37.203170 11.537770 12 37.206314 11.534682 13 37.205619 11.511673 14 37.202367 11.523844 15 37.202640 11.523035 16 37.202705 11.526734 17 37.203101 11.526568 18 37.203064 11.526166 19 37.203431 11.522900 20 37.203295 11.523517 21 37.203293 11.523466 Note: Convergence criterion met. 
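The normal probability plots above chart the ordered residuals against theoretical normal scores. A sketch of how those scores can be computed; Blom's plotting positions are assumed here (other position formulas differ only slightly):

```python
import numpy as np
from scipy.stats import norm

def normal_scores(residuals):
    """Ordered residuals paired with theoretical normal quantiles,
    using Blom's plotting positions (i - 3/8) / (n + 1/4)."""
    r = np.sort(np.asarray(residuals, dtype=float))
    n = r.size
    p = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    return norm.ppf(p), r

q, r = normal_scores([1.2, -0.5, 0.3, -2.0, 0.9, 0.1, -0.4])
# Plotting r against q gives an approximately straight line
# when the residuals are normally distributed
```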
Source Regression Residual Uncorrected Total (Corrected Total) DF 4 15 19 18 κ 0.500000 0.449415 0.341647 0.379149 0.369296 0.363338 0.357945 0.351383 0.352844 0.353986 0.353840 0.353702 0.353656 0.353584 0.353436 0.353450 0.353388 0.353383 0.353379 0.353425 0.353417 0.353418 δ Sum of Squares 2.000000 89.336065 2.191960 81.251936 2.869364 69.940937 2.957411 64.622283 3.546338 62.432376 3.451139 62.327671 3.454736 62.112501 3.562961 61.132272 3.487270 60.924105 3.422928 60.896587 3.428889 60.888174 3.429581 60.888066 3.429003 60.887743 3.426838 60.886582 3.436280 60.886174 3.435540 60.886160 3.435925 60.886156 3.435800 60.886154 3.435800 60.886153 3.434246 60.886148 3.434779 60.886146 3.434784 60.886146 Sum of Squares 22229.303054 60.886146 22290.189200 994.230779 Mean Square 5557.325763 4.059076 Normal Probability Plot 2.5+ +++*++ * | +*+*+* | ***+*** | * * ***+++ | +++++ | +++++* | +++++ * -4.5+++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 277 (xi) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the Chapman-Richards growth model (without initial stage) Iter α β 0 38.500000 0.560000 1 33.993720 0.513276 2 35.792623 0.530343 3 49.643601 0.696943 4 50.538640 0.707388 5 47.031521 0.670853 6 42.550992 0.648050 7 42.997201 0.670309 8 42.652712 0.657826 9 43.028836 0.699233 10 40.845932 0.861524 11 38.961788 0.956035 12 38.581442 0.973301 13 38.306085 0.995402 14 38.026912 1.022873 15 37.062580 1.070834 16 36.413954 1.051037 17 36.408496 1.050035 18 35.893254 0.913519 19 35.908290 0.909064 20 35.887363 0.909787 21 35.851319 0.894635 22 35.810845 0.765624 23 35.810683 0.768957 24 35.831330 0.835459 25 35.818542 0.785766 26 35.811810 0.735750 27 35.816218 0.739790 28 35.800050 0.679127 29 35.808517 0.674202 30 35.804077 0.685928 31 35.848106 0.484023 32 35.846885 0.489853 33 35.850257 0.492709 34 35.850251 0.492708 Note: Convergence criterion met. 
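A convenient way to compare the growth-model fits above is a pseudo-R-squared, 1 - Residual SS / Corrected Total SS, computed from the values printed in the summary tables (corrected total 994.230779 for the 19-observation fits; residual sums of squares copied from the output):

```python
# Residual sums of squares as printed in the summary tables above
residual_ss = {
    "monomolecular": 70.685268,
    "log-logistic": 88.241581,
    "Richards": 55.795804,
    "Weibull": 70.685275,
    "Morgan-Mercer-Flodin": 60.886146,
    "Chapman-Richards (without initial stage)": 117.615891,
}
corrected_total = 994.230779

r2 = {name: 1.0 - ss / corrected_total for name, ss in residual_ss.items()}
best = max(r2, key=r2.get)   # smallest residual SS gives the largest R2
```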
Source Regression Residual Uncorrected Total (Corrected Total) DF 4 15 19 18 κ 0.045000 0.062003 0.053794 0.072607 0.074171 0.087675 0.120025 0.128805 0.122262 0.130870 0.145267 0.166807 0.170144 0.173753 0.176798 0.206045 0.264899 0.265892 0.314263 0.312353 0.315521 0.330325 0.387839 0.386393 0.358676 0.374956 0.390200 0.384300 0.401744 0.395098 0.394744 0.453282 0.451648 0.448850 0.448852 δ Sum of Squares 0.080000 2468.767845 0.080000 2464.777917 0.083203 2460.989879 -0.071904 482.252977 -0.080944 474.707577 -0.080292 415.564276 -0.158240 327.059886 -0.214598 322.513467 -0.181961 322.074734 -0.282806 318.978586 -0.710588 241.188647 -0.983996 204.952110 -1.037388 204.731139 -1.083441 202.446113 -1.151367 201.205523 -1.110594 193.262002 -0.541846 183.878853 -0.530933 183.864537 -0.186842 171.741554 -0.219698 170.976538 -0.186385 170.786205 -0.059910 166.886532 0.346693 156.097171 0.338015 156.077072 0.103711 151.906854 0.229262 147.710313 0.328394 142.992947 0.272940 141.524912 0.408820 139.606336 0.327855 138.345518 0.341492 138.220997 0.653138 119.614321 0.644287 119.574830 0.615498 117.615892 0.615520 117.615891 Sum of Squares 22172.573309 117.615891 22290.189200 994.230779 Mean Square 5543.143327 7.841059 278 Normal Probability Plot 4.5+ ++++ * | +*+*+ * | * ***+* | * **+++ 0.5+ **++ | +++++ | +++* * * | +++++* -3.5+ +++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 279 (xii) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the Chapman-Richards growth model (with initial stage) Iter α β 0 38.500000 0.500000 1 37.217360 0.484961 2 36.907855 0.481578 3 56.656108 0.682619 4 35.289487 0.448210 5 40.754837 0.521055 6 40.048640 0.526815 7 40.462229 0.552919 8 38.210043 0.600941 9 32.950994 0.893468 10 33.655694 0.885063 11 34.742333 0.899866 12 36.103646 0.919999 13 36.043927 0.893442 14 36.166539 0.965428 15 36.190951 0.978607 16 36.333225 0.988828 17 36.457478 0.992876 18 
37.347514 0.992236 19 37.292582 0.997060 20 37.286322 0.995257 21 37.315466 0.992053 22 37.257466 0.958418 23 37.256833 0.958931 24 37.189980 0.886867 25 37.198506 0.908511 26 37.145752 0.901760 27 37.148035 0.902718 28 37.149930 0.902755 29 37.196462 0.877263 30 37.191931 0.842678 31 37.184449 0.839492 32 37.188198 0.855167 33 37.187170 0.855663 34 37.190470 0.855694 35 37.191925 0.855342 36 37.207464 0.856268 37 37.206252 0.855007 38 37.206384 0.854916 39 37.204178 0.854171 40 37.204479 0.854009 41 37.203555 0.853873 42 37.203612 0.853018 43 37.203615 0.853017 Note: Convergence criterion met. κ 0.045000 0.050819 0.052202 0.078506 0.161607 0.170883 0.156056 0.139103 0.177091 0.534869 0.540658 0.553465 0.567058 0.535999 0.549325 0.552540 0.533779 0.520870 0.423379 0.572447 0.518608 0.507655 0.507318 0.507703 0.526041 0.524380 0.537214 0.537531 0.537892 0.547903 0.551048 0.552431 0.551066 0.551220 0.550486 0.550912 0.548523 0.548815 0.548829 0.549263 0.549430 0.549784 0.549842 0.549842 δ 0.010000 0.010000 0.009990 0.004653 0.005024 0.002702 0.003065 0.002956 -0.001396 -0.031843 -0.031509 -0.032686 -0.034214 -0.032581 -0.038331 -0.040504 -0.026776 -0.003680 0.188456 0.379137 0.300896 0.300079 0.322802 0.322427 0.409870 0.391729 0.422911 0.424261 0.425508 0.459972 0.488779 0.493205 0.480301 0.480296 0.479558 0.480596 0.478032 0.479395 0.479470 0.480505 0.480889 0.481396 0.482228 0.482230 Sum of Squares 2071.048903 2031.893903 2030.732887 1152.174660 898.906215 703.919386 660.327502 642.855433 523.028032 363.140209 302.512504 230.445532 197.877670 195.366325 158.537844 154.649146 134.127691 118.898898 78.089598 69.421072 66.842736 66.687613 66.361057 66.360482 66.136845 66.011263 65.838707 65.838427 65.836593 65.707827 65.701849 65.700912 65.687721 65.687651 65.685462 65.684933 65.681859 65.681716 65.681715 65.681553 65.681490 65.681482 65.681481 65.681481 280 Source Regression Residual Uncorrected Total (Corrected Total) DF 4 16 20 19 Sum of Squares 22224.507719 
65.681481 22290.189200 2059.028700 Mean Square 5556.126930 4.105093 Normal Probability Plot 2.5+ +++*++ * | * **+*+* | ** ***++ | *+**+++ | +*+*+ | +++++* | +++++ * -4.5+++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 281 (xiii) Nonlinear least squares iterative phase, nonlinear least squares summary statistics and normal probability plot for the Stannard growth model Iter α β 0 35.000000 -1.045000 1 35.640588 -2.067772 2 35.640588 -1.771936 3 35.640588 -1.373180 4 35.463236 -1.256043 5 35.499276 -1.303724 6 36.089690 -1.105630 7 36.155718 -1.139667 8 36.159860 -1.140183 9 36.892215 -1.308567 10 36.888110 -1.295091 11 36.879269 -1.278124 12 36.911693 -1.355531 13 36.884808 -1.337089 14 36.896398 -1.332579 15 36.887265 -1.336947 16 36.895198 -1.361796 17 36.896634 -1.384481 18 37.005026 -1.464346 19 37.004930 -1.464490 20 37.171606 -1.569225 21 37.167611 -1.559235 22 37.143872 -1.505202 23 37.162064 -1.508168 24 37.159311 -1.517878 25 36.992354 -1.586426 26 36.999329 -1.570217 27 36.999731 -1.569415 28 37.000228 -1.569793 29 37.025016 -1.564521 30 37.058818 -1.583146 31 37.043097 -1.578987 32 37.053282 -1.577412 33 37.053698 -1.577679 34 37.043488 -1.578802 35 37.040573 -1.579539 36 37.040666 -1.579107 37 37.040225 -1.579493 38 37.041198 -1.580190 39 37.041105 -1.579913 40 37.041753 -1.579855 41 37.041517 -1.579922 42 37.041535 -1.579912 Note: Convergence criterion met. 
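The appendices that follow (G, H, K and L) report, for each station, the coefficient B, its standard error, the t value and the significance from multiple linear regression. A sketch of how such a table is obtained; the predictor names and data below are illustrative and synthetic, not the thesis data:

```python
import numpy as np
from scipy import stats

def ols_table(X, y):
    """B, standard errors, t values and two-sided p values for y = X b + e.
    X must include a leading column of ones for the constant term."""
    n, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (n - k)                      # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    t = b / se
    p = 2.0 * stats.t.sf(np.abs(t), df=n - k)
    return b, se, t, p

# Illustrative data with two leaf-nutrient predictors (values synthetic)
rng = np.random.default_rng(2)
leaf_p = rng.normal(0.16, 0.02, 200)
leaf_n = rng.normal(2.50, 0.30, 200)
yield_t = -20.0 + 340.0 * leaf_p - 14.0 * leaf_n + rng.normal(0.0, 2.0, 200)
X = np.column_stack([np.ones(200), leaf_p, leaf_n])
b, se, t, p = ols_table(X, yield_t)
```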
κ 0.500000 1.061590 0.800030 0.343750 0.268235 0.295337 0.230696 0.248086 0.247860 0.290225 0.287138 0.283690 0.315211 0.306488 0.304536 0.306815 0.317937 0.326825 0.382989 0.383113 0.519864 0.633889 0.621281 0.606732 0.588598 0.522211 0.532928 0.533323 0.532195 0.552621 0.577006 0.579297 0.576183 0.576647 0.574987 0.571696 0.573094 0.573887 0.574949 0.574530 0.574665 0.574375 0.574317 δ 0.025000 0.025000 0.025000 0.025000 0.022929 0.022982 0.038723 0.039568 0.039633 0.050499 0.051679 0.058741 0.276153 0.230527 0.219595 0.211809 0.236663 0.246988 0.368672 0.368919 0.615387 0.804502 0.783638 0.763620 0.729979 0.555381 0.579347 0.580369 0.578453 0.623102 0.664950 0.669078 0.664059 0.664729 0.659680 0.653109 0.655796 0.657077 0.658573 0.658080 0.658451 0.657871 0.657769 Sum of Squares 519.027261 512.087550 354.520684 104.516052 102.709378 95.856199 85.575323 83.109861 82.957766 75.117621 75.067117 74.687433 64.390085 62.107130 61.857148 61.564232 61.089886 60.930184 59.712098 59.712096 58.473657 56.638522 56.454211 56.429117 56.315092 55.998533 55.872298 55.871923 55.870812 55.820573 55.810653 55.799853 55.798769 55.798768 55.796004 55.795953 55.795872 55.795837 55.795835 55.795806 55.795803 55.795799 55.795799 282 Source DF Regression 4 Residual 15 Uncorrected Total 19 (Corrected Total) 18 Sum of Squares 22234.393401 55.795799 22290.189200 994.230779 Mean Square 5558.598350 3.719720 Normal Probability Plot 2.5+ ++*+++* | +*+*+* | * ***+*** | * * **++++ | ++++++ | ++++++* | +++++ * -4.5++++ * +----+----+----+----+----+----+----+----+----+----+ -2 -1 0 +1 +2 283 Appendix G The parameters estimate using multiple linear regression for MNC as independent variables for inland area Station B Std. 
Error   T value (df)   Sig.

ILD1  (R2 = 0.392; F = 73.190; df = 2, 240)
  (Constant)   -20.311     4.103    -4.950 (1)   0.000
  P            342.077    28.326    12.076 (1)   0.000
  CA           -14.697     2.573    -5.712 (1)   0.000

ILD2  (R2 = 0.422; F = 93.332; df = 4, 522)
  (Constant)     5.707     4.120     1.385 (1)   0.167
  MG           -53.642     4.308   -12.453 (1)   0.000
  P            201.609    31.780     6.344 (1)   0.000
  K             -6.298     1.355    -4.647 (1)   0.000
  N              3.039     1.500     2.026 (1)   0.043

ILD3  (R2 = 0.404; F = 46.235; df = 4, 282)
  (Constant)   -30.657     7.254    -4.226 (1)   0.000
  N             21.679     1.924    11.268 (1)   0.000
  CA             7.863     0.815     9.648 (1)   0.000
  K             -7.738     2.343    -3.303 (1)   0.001
  MG            15.727     7.535     2.087 (1)   0.038

ILD4  (R2 = 0.185; F = 28.322; df = 5, 625)
  (Constant)    -5.806     4.212    -1.379 (1)   0.169
  N              8.207     1.124     7.303 (1)   0.000
  CA            11.621     2.162     5.376 (1)   0.000
  P             41.336    10.595     3.902 (1)   0.000
  K             -4.932     1.491    -3.307 (1)   0.001
  MG            -7.476     3.072    -2.433 (1)   0.015

ILD5  (R2 = 0.400; F = 64.574; df = 2, 204)
  (Constant)   -15.529     3.329    -4.665 (1)   0.000
  N             12.466     1.500     8.311 (1)   0.000
  K              4.321     1.896     2.279 (1)   0.024

ILD6  (R2 = 0.317; F = 25.254; df = 3, 164)
  (Constant)    26.527     5.982     4.434 (1)   0.000
  CA           -14.099     3.447    -4.091 (1)   0.000
  P            286.344    40.512     7.068 (1)   0.000
  N            -14.127     2.077    -6.802 (1)   0.000

ILD7  (R2 = 0.118; F = 22.542; df = 3, 536)
  (Constant)     7.008     2.651     2.643 (1)   0.008
  N              7.131     0.891     8.005 (1)   0.000
  P            -37.198    12.875    -2.889 (1)   0.004
  CA             5.650     2.337     2.417 (1)   0.016

ILDT  (R2 = 0.148; F = 112.422; df = 4, 2598)
  (Constant)    -1.007     1.346    -0.748 (1)   0.455
  N              6.782     0.465    14.575 (1)   0.000
  P             48.554     7.128     6.812 (1)   0.000
  MG           -11.788     1.353    -8.713 (1)   0.000
  CA             4.130     0.589     7.012 (1)   0.000

Appendix H
The parameters estimate using multiple linear regression for MNC as independent variables for the coastal area

Station CLD1   B   Std.
Error   T value (df)   Sig.

CLD1  (R2 = 0.380; F = 32.725; df = 4, 238)
  Constant    -44.721     9.684    -4.618 (1)   0.000
  K            30.526     5.361     5.694 (1)   0.000
  Mg           71.737    10.890     6.588 (1)   0.000
  P           198.541    60.920     3.259 (1)   0.001
  Ca          -11.440     4.796    -2.385 (1)   0.018

CLD2  (R2 = 0.171; F = 15.344; df = 3, 239)
  Constant      0.189     6.732     0.028 (1)   0.978
  Mg           22.750     4.168     5.459 (1)   0.000
  N            10.443     2.458     4.248 (1)   0.000
  Ca           -7.509     2.779    -2.702 (1)   0.007

CLD3  (R2 = 0.687; F = 81.083; df = 2, 78)
  Constant     -9.901     6.258    -1.582 (1)   0.118
  N            17.664     1.925     9.178 (1)   0.000
  Mg          -18.550     6.314    -2.938 (1)   0.004

CLD4  (R2 = 0.050; F = 9.762; df = 1, 322)
  Constant     18.262     2.472     7.387 (1)   0.000
  Ca           12.794     4.095     3.124 (1)   0.002

CLD5  (R2 = 0.111; F = 8.728; df = 3, 238)
  Constant     16.528    10.879     1.519 (1)   0.130
  Ca           11.367     5.032     2.259 (1)   0.025
  N           -10.294     3.055    -3.370 (1)   0.001
  P           185.701    61.097     3.039 (1)   0.003

CLD6  (R2 = 0.043; F = 9.137; df = 2, 509)
  Constant     40.988     4.703     8.715 (1)   0.000
  Ca          -12.548     2.935    -4.275 (1)   0.000
  N            -3.330     1.477    -2.254 (1)   0.025

CLD7  (R2 = 0.231; F = 36.897; df = 4, 667)
  (Constant)    6.810     4.335     1.571 (1)   0.117
  P            66.648    17.194     3.876 (1)   0.000
  K             8.788     1.555     5.652 (1)   0.000
  Mg          -25.001     4.835    -5.171 (1)   0.000
  N             4.804     1.421     3.380 (1)   0.001

CLDT  (R2 = 0.044; F = 44.460; df = 2, 2315)
  (Constant)    8.998     2.188     4.113 (1)   0.000
  P           128.949    14.067     9.167 (1)   0.000
  Mg           -6.424     1.852    -3.469 (1)   0.001

Appendix I
Normal probability plot of multiple linear regression for the inland area
[P-P plots of expected against observed cumulative probability for stations ILD1-ILD7 and ILDT; plots omitted]
Appendix J
Normal probability plot of multiple linear regression for the coastal area
[P-P plots of expected against observed cumulative probability for stations CLD1-CLD7 and CLDT; plots omitted]

Appendix K
The parameters estimate using multiple linear regression using MNC and NBR as independent variables for the coastal area.

Station   Variable   B   Std. Error   T value (df)   Sig.

CLD1  (R2 = 0.388; F = 33.985; df = 4, 214)
  Constant    -75.969    11.265    -6.744 (1)   0.000
  P           185.968    61.015     3.048 (1)   0.003
  K            47.134     7.420     6.352 (1)   0.000
  Def-Mg        1.358     0.329     4.130 (1)   0.000
  Mg-Ca       -19.911    10.238    -1.945 (1)   0.053

CLD2  (R2 = 0.181; F = 16.802; df = 3, 228)
  Constant    -18.975     8.473    -2.239 (1)   0.026
  CLP         257.755    52.184     4.939 (1)   0.000
  Def-Mg        0.338     0.057     5.905 (1)   0.000
  N-K          -1.589     0.633    -2.511 (1)   0.013

CLD3  (R2 = 0.676; F = 81.451; df = 2, 78)
  Constant    -18.173     9.169    -1.982 (1)   0.051
  CLP         356.146    40.470     8.800 (1)   0.000
  TB           -0.152     0.051    -2.983 (1)   0.004

CLD4  (R2 = 0.035; F = 9.762; df = 1, 268)
  Constant     18.262     2.472     7.387 (1)   0.000
  Ca           12.794     4.095     3.124 (1)   0.002

CLD5  (R2 = 0.108; F = 13.076; df = 2, 215)
  Constant     46.292     7.452     6.212 (1)   0.000
  Ca           11.528     4.972     2.318 (1)   0.021
  N-P          -1.658     0.402    -4.120 (1)   0.000

CLD6  (R2 = 0.033; F = 16.814; df = 1, 500)
  Constant     36.039     2.544    14.166 (1)   0.000
  TLB          -0.137     0.033    -4.101 (1)   0.000

CLD7  (R2 = 0.315; F = 28.678; df = 8, 498)
  Constant     -0.132    17.175    -0.008 (1)   0.994
  Def-K         3.587     0.550     6.521 (1)   0.000
  P           529.187   103.760     5.100 (1)   0.000
  N-P           4.341     0.898     4.834 (1)   0.000
  K-CA        -32.683     7.026    -4.652 (1)   0.000
  K-MG        -15.947     2.913    -5.475 (1)   0.000
  Def-Mg       -2.769     0.503    -5.505 (1)   0.000
  Mg-Ca        84.381    20.387     4.139 (1)   0.000
  CLP        -530.956   128.616    -4.128 (1)   0.000

CLDT  (R2 = 0.094; F = 42.036; df = 5, 2023)
  Constant    -64.453    16.704    -3.859 (1)   0.000
  N           -24.419     6.562    -3.721 (1)   0.000
  P           701.608   108.298     6.478 (1)   0.000
  Mg          -63.647     6.922    -9.195 (1)   0.000
  N-P           5.148     0.959     5.370 (1)   0.000
  N-Mg         -2.371     0.272    -8.717 (1)   0.000

Appendix L
Multiple linear regression using foliar analysis and nutrient balance ratio as independent variables for the inland area.

Station   Variable   B   Std. Error   T value (df)   Sig.

ILD1  (R2 = 0.379; F = 73.190; df = 2, 240)
  Const.      -20.311     4.103    -4.950 (1)   0.000
  P           342.077    28.326    12.076 (1)   0.000
  Ca          -14.697     2.573    -5.712 (1)   0.000

ILD2  (R2 = 0.438; F = 101.508; df = 4, 522)
  Const.        4.150     5.718     0.726 (1)   0.468
  P           130.174    24.067     5.409 (1)   0.000
  K            -4.285     1.185    -3.616 (1)   0.000
  N-P          -0.952     0.265    -3.597 (1)   0.000
  N-Mg          1.990     0.149    13.365 (1)   0.000

ILD3  (R2 = 0.478; F = 38.949; df = 5, 213)
  Const.       19.393    22.051     0.879 (1)   0.380
  N            35.888     7.864     4.564 (1)   0.000
  K           -41.774    16.246    -2.571 (1)   0.011
  N-P           0.712     0.304     2.343 (1)   0.020
  N-K         -20.692     9.556    -2.165 (1)   0.031
  K-Mg         -0.484     0.181    -2.667 (1)   0.008

ILD4  (R2 = 0.215; F = 34.244; df = 5, 625)
  Const       -23.297     9.688    -2.405 (1)   0.016
  CLP         234.930    25.959     9.050 (1)   0.000
  K-Ca        -14.829     3.564    -4.161 (1)   0.000
  N-P          -0.706     0.150    -4.722 (1)   0.000
  Def-K         1.019     0.332     3.075 (1)   0.002
  Def-Mg        0.347     0.154     2.259 (1)   0.024

ILD5  (R2 = 0.474; F = 45.515; df = 4, 202)
  Const.      262.286    53.064     4.943 (1)   0.000
  P          2621.417   485.151    -5.403 (1)   0.000
  N-P         -27.160     4.813    -5.642 (1)   0.000
  CLP        3724.885   620.871     5.999 (1)   0.000
  Mg-Ca         5.735     2.901     1.977 (1)   0.049

ILD6  (R2 = 0.325; F = 39.661; df = 2, 165)
  Const.       64.016     5.374    11.911 (1)   0.000
  N-P          -3.054     0.353    -8.650 (1)   0.000
  N-Ca          3.013     0.619     4.867 (1)   0.000

ILD7  (R2 = 0.132; F = 16.218; df = 5, 534)
  Const.       10.776     4.001     2.694 (1)   0.007
  P           -43.857    12.981    -3.378 (1)   0.001
  Mg          -33.313    11.946    -2.789 (1)   0.005
  CLP         188.723    21.870     8.629 (1)   0.000
  N-Ca         -0.551     0.163    -3.383 (1)   0.001
  N-Mg         -0.236     0.114    -2.080 (1)   0.038

ILDT  (R2 = 0.168; F = 102.046; df = 5, 2529)
  Const.        7.622     1.517     5.026 (1)   0.000
  N             8.214     0.509    16.026 (1)   0.000
  P            42.607     7.710     5.526 (1)   0.000
  Mg          -24.641     2.353   -10.472 (1)   0.000
  N-Mg         -0.254     0.042    -6.078 (1)   0.000
  N-Ca         -0.591     0.092    -6.406 (1)   0.000

Appendix M
The Q-Q plot for inland stations.
[Q-Q plots of residuals against quantiles of the standard normal for stations ILD1-ILD7 and ILDT; plots omitted]

Appendix N
The Q-Q plot for coastal stations.
[Q-Q plots of residuals against quantiles of the standard normal for the coastal stations CLD1-CLD7 and CLDT; plots omitted]

Appendix O
Example of the Matlab programming for the neural network application

clear net; clc;
load c11_coastal.txt
rawdata = c11_coastal;                  % (randperm(243),1:6); % Randomize
[R, Q] = size(rawdata);
P1 = rawdata(:,1)';                     % input 1: pN
P2 = rawdata(:,2)';                     % input 2: pP
P3 = rawdata(:,3)';                     % input 3: pK
P4 = rawdata(:,4)';                     % input 4: pCa
P5 = rawdata(:,5)';                     % input 5: pMg
T1 = rawdata(:,6)';                     % target: FFB yield
data = [P1; P2; P3; P4; P5; T1];
[N, M] = size(data);
% Scale each row into (0, 1] using its own range
for i = 1:N
    maxv(i) = max(data(i,:));
    minv(i) = min(data(i,:));
    rawdata1(i,:) = ((data(i,:) - minv(i)) + 0.01)/((maxv(i) - minv(i)) + 0.01);
end
randdata = rawdata1(1:6, randperm(219));   % Randomize the 219 observations
P = randdata(1:5,:);
T = randdata(6,:);
trP = P(1:5,1:153);      % Input for training
v.P = P(1:5,154:197);    % Input for validation
t.P = P(1:5,198:219);    % Input for testing
trT = T(1,1:153);        % Output for training
v.T = T(1,154:197);      % Output for validation
t.T = T(1,198:219);      % Output for testing
% Neural network modelling
S1 = input(' please input hidden node(s) ');   % number of hidden nodes
net = newff(minmax(P), [S1 1], {'logsig','tansig'}, 'trainlm');
net.performFcn = 'mse';
net.trainParam.epochs = 500;
net.trainParam.goal = 1e-6;
net.trainParam.max_fail = 5;
net.trainParam.show = 5;
net = init(net);
[net, tr] = train(net, trP, trT, [], [], v, t);   % early stopping
figure
plot(tr.epoch, tr.perf, tr.epoch, tr.vperf, tr.epoch, tr.tperf);
legend('training', 'validation', 'testing', -1);
ylabel('mean squared error'); xlabel('epoch');
% Simulation
an1 = sim(net, trP);   e1 = trT - an1;
an2 = sim(net, v.P);   e2 = v.T - an2;
an3 = sim(net, t.P);   e3 = t.T - an3;
tramse = mse(e1), valmse = mse(e2), tesmse = mse(e3)
tramae = mae(e1), valmae = mae(e2), tesmae = mae(e3)
% Regression
an4 = sim(net, P);
figure
[m, b, r] = postreg(an4, T)
% (denormalise the data here if the original scale is required)
% Plotting
lg = length(randdata);
j = 1:lg;  j1 = 1:153;  j2 = 154:197;  j3 = 198:219;
figure
plot(j, T, 'k+-', j1, an1, 'b+--', j2, an2, 'g+--', j3, an3, 'r+--');
legend('actual', 'training', 'validation', 'testing', -1);
xlabel('data set'); ylabel('yield');

Appendix O
(i) Graphical illustration for the best regression line of the data for the ILD1 and ILD2 stations.
(ii) Graphical illustration for the best regression line of the data for the ILD3 and ILD4 stations.
(iii) Graphical illustration for the best regression line of the data for the ILD5 and ILD6 stations.
(iv) Graphical illustration for the best regression line of the data for the ILD7 and ILDT stations.
[Plots omitted]

Appendix P
(i) Graphical illustration for the best regression line of the data for the CLD1 and CLD2 stations.
(ii) Graphical illustration for the best regression line of the data for the CLD3 and CLD4 stations.
(iii) Graphical illustration for the best regression line of the data for the CLD5 and CLD6 stations.
(iv) Graphical illustration for the best regression line of the data for the CLD7 and CLDT stations.
[Plots omitted]

Appendix R
The MSE, RMSE, MAE and MAPE values for each neural network model in the inland area.
Inland stations

Station   Model   MSE       RMSE     MAE      MAPE
ILD1      LL      13.1079   3.6204   2.6677   0.1159
          LP      16.1591   4.0198   3.0529   0.1318
          LT      18.3725   4.2863   3.3716   0.1460
          TP      17.3737   4.1681   3.3228   0.1443
          TL      16.4507   4.0559   3.2195   0.1429
          TT      16.9262   4.1141   3.2553   0.1456
ILD2      LL      11.4734   3.3872   2.5910   0.1162
          LP      13.0502   3.6125   2.8193   0.1257
          LT      12.5840   3.5473   2.7435   0.1244
          TP      13.1767   3.6299   2.8527   0.1306
          TL      12.7739   3.5740   2.7885   0.1263
          TT      12.6520   3.5569   2.7734   0.1223
ILD3      LL      12.5471   3.5421   2.8732   0.1195
          LP      12.8565   3.5855   2.8602   0.1190
          LT      12.9833   3.6032   2.8526   0.1126
          TP      11.9431   3.4558   2.7076   0.1092
          TL      12.6355   3.5546   2.7736   0.1141
          TT      13.2690   3.6426   2.8736   0.1166
ILD4      LL      14.9820   3.8706   3.1113   0.1356
          LP      14.0759   3.7517   2.9545   0.1276
          LT      14.4406   3.8000   3.0288   0.1304
          TP      14.3058   3.7823   2.9811   0.1288
          TL      15.1920   3.8976   3.1389   0.1350
          TT      14.7159   3.8361   3.0421   0.1316
ILD5      LL       5.9225   2.4336   1.8398   0.0944
          LP       8.3286   2.8859   2.1426   0.1133
          LT       7.1422   2.6724   2.0706   0.1067
          TP       6.4416   2.5380   1.9076   0.0964
          TL       6.6394   2.5767   1.8867   0.0956
          TT       6.1199   2.4730   1.8735   0.0968
ILD6      LL       9.6650   3.1088   2.4421   0.0987
          LP      10.0368   3.1681   2.5010   0.1010
          LT      10.6006   3.2558   2.4220   0.0992
          TP       9.6884   3.1126   2.4864   0.1018
          TL      10.1928   3.1926   2.5638   0.1052
          TT       7.0829   2.6613   1.7861   0.0719
ILD7      LL      21.4372   4.6300   3.6377   0.1674
          LP      20.4788   4.5253   3.5087   0.1580
          LT      21.0046   4.5830   3.6420   0.1691
          TP      20.8042   4.5611   3.5111   0.1642
          TL      20.1759   4.4917   3.5242   0.1614
          TT      22.1352   4.7048   3.7578   0.1747
ILDT      LL       2.9298   1.7116   1.3014   0.0576
          LP       2.8263   1.6811   1.2689   0.0560
          LT       2.8168   1.6783   1.2776   0.0564
          TP       2.9255   1.7104   1.3086   0.0578
          TL       2.8642   1.6923   1.2870   0.0570
          TT       2.8567   1.6901   1.2984   0.0573

Appendix S
The MSE, RMSE, MAE and MAPE values for each neural network model in the coastal area.
Coastal stations

Station   Model   MSE       RMSE     MAE      MAPE
CLD1      LL      18.6614   4.3243   3.3415   0.1456
          LP      15.9936   3.9991   3.0397   0.1337
          LT      17.2149   4.1490   3.1762   0.1398
          TP      18.4294   4.2929   3.2861   0.1437
          TL      15.1412   3.8911   2.8715   0.1220
          TT      19.6718   4.4353   3.4314   0.1461
CLD2      LL       6.8989   2.6265   2.1306   0.0725
          LP       7.5109   2.7406   1.9190   0.0638
          LT       6.8896   2.6248   2.1182   0.0717
          TP       6.0799   2.4657   1.9766   0.0665
          TL       7.7396   2.7820   2.3227   0.0779
          TT       7.2305   2.6889   2.1741   0.0743
CLD3      LL       6.0158   2.4527   1.9229   0.0756
          LP       6.6892   2.5863   1.7651   0.0657
          LT       6.0254   2.4546   1.8708   0.0750
          TP       6.0634   2.4623   1.7934   0.0688
          TL       6.7345   2.5950   2.0811   0.0844
          TT       6.6433   2.5774   1.9708   0.0789
CLD4      LL      12.7014   3.5639   2.6862   0.1126
          LP      14.8279   3.8507   2.9406   0.1298
          LT      13.2776   3.6438   2.6550   0.1118
          TP      15.6767   3.9593   3.0043   0.1282
          TL      14.8006   3.8471   2.7499   0.1163
          TT      12.6890   3.5621   2.6248   0.1125
CLD5      LL      18.8531   4.3420   3.3444   0.1488
          LP      19.0369   4.3631   3.4643   0.1498
          LT      19.9523   4.4667   3.4873   0.1578
          TP      18.5634   4.3085   3.3054   0.1431
          TL      19.5699   4.4237   3.5144   0.1530
          TT      19.7144   4.4400   3.4900   0.1514
CLD6      LL      15.8088   3.9760   3.1577   0.1279
          LP      15.2817   3.9091   3.0726   0.1237
          LT      15.0867   3.8841   3.0213   0.1232
          TP      15.2480   3.9048   3.0911   0.1274
          TL      16.4554   4.0565   3.2557   0.1335
          TT      16.8372   4.1033   3.1893   0.1302
CLD7      LL      12.8178   3.5801   2.8541   0.1066
          LP      13.0888   3.6178   2.8347   0.1086
          LT      13.2882   3.6452   2.8827   0.1088
          TP      11.6043   3.4065   2.6659   0.1003
          TL      12.5051   3.5362   2.8061   0.1056
          TT      13.1083   3.6205   2.8880   0.1103
CLDT      LL      20.8289   4.5638   3.6248   0.1499
          LP      20.0818   4.4812   3.5495   0.1456
          LT      21.2836   4.6134   3.6542   0.1516
          TP      22.1141   4.7025   3.7652   0.1572
          TL      21.7819   4.6671   3.7182   0.1531
          TT      21.2177   4.6062   3.6741   0.1518

Appendix T
(i) The calculation of total profit (RM) for the ILD3 and ILD4 stations according to each radius
(i) The calculation of total profit (RM) for the ILD3 and ILD4 stations according to each radius

Station ILD3 (fertiliser levels N, P, K in kg/palm/year)

Radius      N        P        K       Mg    Estimated FFB yield   Total profit
 0.0     4.7800   0.9000   4.1000     -          29.7351            7429.485
 0.1     5.081    0.9358   4.2057     -          29.8744            7421.667
 0.2     5.3312   0.9707   4.4629     -          29.9887            7389.767
 0.3     5.4781   1.0025   4.8707     -          30.0938            7343.894
 0.4     5.5525   1.0305   5.3158     -          30.2047            7301.802
 0.5     5.5958   1.0560   5.7555     -          30.3281            7267.386
 0.6     5.6247   1.0800   6.1862     -          30.4667            7240.201
 0.7     5.6461   1.1035   6.6097     -          30.6212            7219.431
 0.8     5.6629   1.1264   7.0282     -          30.7923            7204.67
 0.9     5.6768   1.1488   7.4428     -          30.9803            7195.667
 1.0     5.6887   1.1711   7.8547     -          31.1853            7192.161

Station ILD4 (fertiliser levels N, P, K, Mg in kg/palm/year)

Radius      N        P        K        Mg    Estimated FFB yield   Total profit
 0.0     4.0000   1.8750   4.2000   1.4400        25.8575            6999.028
 0.1     4.2044   1.8790   4.3248   1.4542        25.9351            6983.408
 0.2     4.3186   1.8998   4.5724   1.4448        26.0419            6969.688
 0.3     4.3397   1.9274   4.8514   1.4133        26.1475            6962.206
 0.4     4.3373   1.9539   5.1096   1.379         26.2814            6969.527
 0.5     4.3284   1.9795   5.3557   1.3446        26.446             6989.154
 0.6     4.3168   2.0046   5.5953   1.3104        26.642             7020.059
 0.7     4.3039   2.0294   5.831    1.2763        26.8696            7061.783
 0.8     4.2901   2.0539   6.0641   1.2423        27.129             7114.161
 0.9     4.2758   2.0783   6.2956   1.2083        27.4202            7177.004
 1.0     4.2611   2.1026   6.5259   1.1745        27.7434            7250.288

(ii) The calculation of total profit (RM) for the ILD5 and ILD6 stations according to each radius

Station ILD5 (fertiliser levels N, P, K in kg/palm/year)

Radius      N        P        K       Mg    Estimated FFB yield   Total profit
 0.0     1.3650   2.2750   2.2750     -          29.5075            7889.188
 0.1     1.4014   2.1037   2.4119     -          29.6488            7916.833
 0.2     1.4936   1.9752   2.5417     -          29.7814            7934.744
 0.3     1.6307   1.9119   2.6462     -          29.9233            7950.476
 0.4     1.7757   1.8845   2.7298     -          30.0891            7973.126
 0.5     1.9188   1.8712   2.803      -          30.2842            8005.052
 0.6     2.0595   1.8646   2.8706     -          30.5106            8046.636
 0.7     2.1983   1.8615   2.9352     -          30.7691            8097.879
 0.8     2.3358   1.8604   2.9977     -          31.0602            8158.823
 0.9     2.4723   1.8606   3.0589     -          31.384             8229.395
 1.0     2.6081   1.8619   3.1192     -          31.7407            8309.577

Station ILD6 (fertiliser levels N, P, K, Mg in kg/palm/year)

Radius      N        P        K        Mg    Estimated FFB yield   Total profit
 0.0     4.6700   1.8750   3.9750   2.0800        24.4806            5675.753
 0.1     4.9085   1.8581   4.0471   2.1299        24.7152            5704.79
 0.2     5.1234   1.8453   4.1449   2.1918        24.9275            5724.58
 0.3     5.3048   1.8384   4.2665   2.2679        25.1228            5737.59
 0.4     5.4476   1.8388   4.4022   2.3587        25.3072            5747.368
 0.5     5.5539   1.8479   4.539    2.4628        25.4866            5757.348
 0.6     5.6308   1.8661   4.6654   2.5774        25.6659            5770.159
 0.7     5.6859   1.8934   4.7744   2.6996        25.8489            5787.439
 0.8     5.7256   1.9291   4.8636   2.8265        26.0384            5810.035
 0.9     5.7546   1.9716   4.9333   2.9559        26.2365            5838.355
 1.0     5.7761   2.0197   4.9856   3.0861        26.4447            5872.447

(iii) The calculation of total profit (RM) for the ILD7 station according to each radius

Station ILD7 (fertiliser levels N, P, K, Mg in kg/palm/year)

Radius      N        P        K        Mg    Estimated FFB yield   Total profit
 0.0     4.3250   2.2750   3.4750   1.7400        24.4803            5792.874
 0.1     4.5706   2.3088   3.4393   1.7512        24.7287            5841.644
 0.2     4.8201   2.3354   3.4304   1.7757        24.9778            5885.423
 0.3     5.0655   2.3536   3.4351   1.8127        25.2337            5928.851
 0.4     5.3021   2.3635   3.447    1.8623        25.5011            5974.67
 0.5     5.5273   2.3664   3.4628   1.9225        25.7835            6024.754
 0.6     5.7406   2.3636   3.4806   1.9909        26.0841            6080.512
 0.7     5.9427   2.3564   3.4994   2.065         26.405             6142.797
 0.8     6.1351   2.3461   3.5186   2.1431        26.7479            6212.125
 0.9     6.3192   2.3333   3.538    2.2241        27.1141            6288.832
 1.0     6.4966   2.3188   3.5574   2.3068        27.5043            6373.06

Appendix U

(i) The calculation of total profit (RM) for the CLD1 and CLD2 stations in the coastal areas according to each radius

Station CLD1 (fertiliser levels N, P, K, Mg in kg/palm/year)

Radius      N        P        K        Mg    Estimated FFB yield   Total profit
 0.0     1.8200   1.8200   2.7300   1.8200        27.2633            6975.318
 0.1     1.8872   1.894    2.9434   1.7663        27.4295            6986.194
 0.2     1.9709   1.9481   3.1626   1.7194        27.6082            6998.702
 0.3     2.067    1.985    3.3833   1.6779        27.8022            7014.663
 0.4     2.1718   2.0082   3.6028   1.6405        28.0132            7035.249
 0.5     2.2825   2.0209   3.8199   1.6061        28.2428            7061.291
 0.6     2.3975   2.026    4.0342   1.5739        28.4919            7093.169
 0.7     2.5152   2.0253   4.2457   1.5433        28.7111            7116.768
 0.8     2.6351   2.0202   4.4546   1.514         29.0508            7175.367
 0.9     2.7565   2.0117   4.6611   1.4856        29.3615            7225.931
 1.0     2.8791   2.0007   4.8655   1.4579        29.6934            7282.87

Station CLD2 (fertiliser levels N, P, K, Mg in kg/palm/year)

Radius      N        P        K        Mg    Estimated FFB yield   Total profit
 0.0     1.8200   1.8200   1.3600   1.8200        29.9191            7939.661
 0.1     1.8745   1.9835   1.9835   1.8645        30.0254            7859.443
 0.2     1.9414   2.1366   2.1366   1.913         30.1287            7845.839
 0.3     2.0145   2.2792   2.2792   1.9627        30.2305            7833.232
 0.4     2.0906   2.4108   2.4108   2.0117        30.3319            7822.558
 0.5     2.1675   2.5315   2.5315   2.0592        30.4344            7814.529
 0.6     2.244    2.6422   2.6422   2.1044        30.5389            7809.421
 0.7     2.3196   2.7439   2.7439   2.1474        30.6462            7807.296
 0.8     2.3937   2.8381   2.8381   2.1882        30.757             7808.106
 0.9     2.4665   2.9259   2.9259   2.2271        30.8718            7811.717
 1.0     2.5379   3.0086   3.0086   2.2644        30.9909            7817.925

(ii) The calculation of total profit (RM) for the CLD3 and CLD4 stations in the coastal areas according to each radius

Station CLD3 (fertiliser levels N, P, K, Mg in kg/palm/year)

Radius      N        P        K        Mg    Estimated FFB yield   Total profit
 0.0     3.6400   1.8200   3.6400   1.6200        28.2359            6959.635
 0.1     4.0036   1.8121   3.6367   1.8200        28.6748            7030.195
 0.2     4.3673   1.8053   3.6425   1.8256        29.0506            7100.774
 0.3     4.7298   1.8003   3.6652   1.8450        29.3638            7149.483
 0.4     5.081    1.8008   3.7274   1.9124        29.6169            7171.094
 0.5     5.3372   1.8192   3.8738   2.127         29.8253            7161.207
 0.6     5.4454   1.8477   4.0349   2.4012        30.0304            7156.518
 0.7     5.5045   1.8741   4.172    2.6447        30.2560            7169.401
 0.8     5.5479   1.8986   4.2952   2.8671        30.5079            7195.708
 0.9     5.5844   1.9218   4.4104   3.0766        30.7880            7233.378
 1.0     5.6169   1.9442   4.5206   3.2781        31.0969            7281.329

Station CLD4 (fertiliser levels N, P, K, Mg in kg/palm/year)

Radius      N        P        K        Mg    Estimated FFB yield   Total profit
 0.0     3.6400   1.8200   3.6400   1.8200        30.9592            7723.786
 0.1     3.8391   1.9564   3.6175   1.8868        31.0200            7709.367
 0.2     4.1241   2.0637   3.5584   1.9331        31.0846            7696.572
 0.3     4.4415   2.1512   3.4809   1.9663        31.1577            7688.178
 0.4     4.7692   2.2279   3.3953   1.9927        31.2409            7684.185
 0.5     5.1002   2.2984   3.3058   2.0153        31.3352            7684.389
 0.6     5.4324   2.3654   3.2141   2.0356        31.4408            7688.494
 0.7     5.7648   2.4299   3.1211   2.0544        31.5581            7696.443
 0.8     6.0973   2.4929   3.0272   2.0722        31.6869            7708.019
 0.9     6.4297   2.5547   3.9327   2.0893        31.8276            7577.663
 1.0     6.7619   2.6157   2.8378   2.1059        31.9800            7742.055

(iii) The calculation of total profit (RM) for the CLD5 and CLD6 stations in the coastal areas according to each radius

Station CLD5 (fertiliser levels N, P, K, Mg in kg/palm/year)

Radius      N        P        K        Mg    Estimated FFB yield   Total profit
 0.0     2.7300   4.5500   9.1000   4.5500        26.0478            5162.702
 0.1     2.8881   4.7353   8.911    4.2429        26.2357            5247.941
 0.2     3.0149   4.7345   8.8843   3.8037        26.4774            5352.977
 0.3     3.1188   4.6571   8.8651   3.3592        26.8602            5505.119
 0.4     3.2131   4.5551   8.8422   2.9228        27.2298            5655.665
 0.5     3.3027   4.4429   8.8168   2.4925        27.7504            5850.55
 0.6     3.3897   4.3256   8.7898   2.0661        28.369             6074.076
 0.7     3.4752   4.2053   8.7616   1.6423        29.0858            6326.131
 0.8     3.5596   4.0831   8.7328   1.2204        29.9012            6606.707
 0.9     3.6433   3.9596   8.7034   0.7998        30.8152            6915.787
 1.0     3.7266   3.8353   8.6736   0.3801        31.8279            7253.349

Station CLD6 (fertiliser levels N, P, K in kg/palm/year)

Radius      N        P        K       Mg    Estimated FFB yield   Total profit
 0.0     2.725    1.365    3.4100     -          32.6262            8541.086
 0.1     2.9364   1.3988   3.2121     -          33.0906            8680.256
 0.2     3.1074   1.4489   2.9718     -          33.5074            8814.959
 0.3     3.2377   1.5115   2.7024     -          33.8947            8948.736
 0.4     3.3357   1.5816   2.4183     -          34.2664            9082.954
 0.5     3.4111   1.6556   2.1284     -          34.6314            9218.124
 0.6     3.4712   1.7317   1.8368     -          34.9953            9354.639
 0.7     3.5207   1.8088   1.5453     -          35.3614            9492.779
 0.8     3.5628   1.8864   1.2545     -          35.7317            9632.742
 0.9     3.5995   1.9642   0.9646     -          36.1077            9774.748
 1.0     3.6321   2.0422   0.6756     -          36.4902            9918.895

(iv) The calculation of total profit (RM) for the CLD7 station in the coastal areas according to each radius

Station CLD7 (fertiliser levels N, P, K in kg/palm/year)

Radius      N        P        K       Mg    Estimated FFB yield   Total profit
 0.0     2.7250   1.8200   3.4100     -          31.4923            8186.494
 0.1     2.9651   1.8948   3.4892     -          31.7818            8229.529
 0.2     3.1851   1.9985   3.5571     -          32.0639            8272.324
 0.3     3.3774   2.1309   3.611      -          32.3452            8317.951
 0.4     3.5395   2.2864   3.6506     -          32.6327            8369.066
 0.5     3.6743   2.4568   3.6781     -          32.9326            8427.349
 0.6     3.7877   2.6356   3.6965     -          33.2493            8493.435
 0.7     3.8852   2.8185   3.7084     -          33.5861            8567.606
 0.8     3.9711   3.0035   3.7158     -          33.9447            8649.751
 0.9     4.0482   3.1891   3.7199     -          34.3267            8739.965
 1.0     4.1189   3.3749   3.7216     -          34.7329            8838.131
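Note: the tables above pair each set of fertiliser levels with an estimated FFB yield and a total profit figure. The sketch below is a minimal illustration of the general shape of such a profit calculation (revenue from FFB minus fertiliser expenditure); it is not the thesis's actual cost model, and the FFB price, planting density, and per-kilogram fertiliser prices used here are hypothetical assumptions chosen only to make the example runnable.

```python
# Illustrative sketch only: all constants below are assumed values,
# not parameters taken from the thesis.

FFB_PRICE_RM_PER_T = 280.0    # assumed FFB price (RM/tonne)
PALMS_PER_HECTARE = 136       # assumed planting density (palms/ha)
FERT_COST_RM_PER_KG = {       # assumed fertiliser unit costs (RM/kg)
    "N": 1.20, "P": 1.50, "K": 1.00, "Mg": 0.90,
}

def total_profit(yield_t_per_ha, fert_kg_per_palm):
    """Profit per hectare: FFB revenue minus total fertiliser cost.

    yield_t_per_ha   -- estimated FFB yield (tonnes/ha/year)
    fert_kg_per_palm -- dict of nutrient -> application rate (kg/palm/year)
    """
    revenue = yield_t_per_ha * FFB_PRICE_RM_PER_T
    cost = sum(FERT_COST_RM_PER_KG[nutrient] * kg * PALMS_PER_HECTARE
               for nutrient, kg in fert_kg_per_palm.items())
    return revenue - cost
```

With these assumed prices, a station fertilised with N and K only would be evaluated as `total_profit(30.0, {"N": 5.0, "K": 4.0})`; nutrients not applied at a station (shown as "-" in the tables) are simply omitted from the dictionary.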