Scotia Weather Services, Inc. – NSERC Engage Project Report

Research Report

Predictive Analytics for Melon Harvest Forecasting
for Scotia Weather Services, Inc., Dartmouth, NS
funded by Natural Sciences and Engineering Research Council Engage Grant

Daniel L. Silver, Ph.D.
Geoffrey Mason, B.Sc.
Acadia University

Last Revised: 2/9/16 8:26 PM

Contents

Glossary of Terms
1. Introduction
2. Summary of IRAP Project
   2.1 Business Understanding
   2.2 Data Understanding
   2.3 Data Preparation
      2.3.1 Melon Harvest Data
      2.3.2 Weather Data
   2.4 Data Analysis
   2.5 Modeling
      2.5.1 Model Structure
      2.5.2 Model Construction
   2.6 Model Evaluation
3. Overview of Work Completed During the NSERC Engage Project
4. Data Preparation and Analysis
5. Development of Statistical Models to Forecast Future Melon Harvest
6. Ongoing Acquisition of Data and Transmission of Melon Harvest Forecasts
   6.1 Weather Information
      6.1.1 Acquisition of Observed Weather Data
      6.1.2 Acquisition of Weather Forecasts
   6.2 Acquisition of Observed Melon Data
   6.3 Transmission of Melon Harvest Forecasts
   6.4 Capture of Field Data
7. Evaluation of 2014 Predictions
   7.1 Melon Harvest Predictions
   7.2 Weather Predictions
8. Summary and Recommendations
   8.1 Summary
      8.1.1 Determine Prediction Variables
      8.1.2 Build and Test Predictive Models
      8.1.3 Install and Test Equipment
      8.1.4 Evaluate Predictions for 2014
   8.2 Recommendations
9. Requirements for Automated Forecasting System
10. References
Appendices
   A. Meta Data Reports
   B. Observed Melon and Weather Data for 2010 - 2012
   C. Model Building Results
      Results from Accumulation Models
   D. 2014 Melon Prediction Accuracy
   E. 2014 Weather Prediction Accuracy

Glossary of Terms

NSCC – Nova Scotia Community College

EC – Environment Canada; normally refers to weather data

VF – Vermeulen Farms Ltd., owned and operated by Andy Vermeulen, Canning, Nova Scotia

SWS – Scotia Weather Services Inc., of Dartmouth, Nova Scotia

GDD – Growing degree days are a measure of the energy absorbed by the plant throughout the growing season. Most crops become ripe for harvesting once the GDDs accumulated from the time of planting exceed a certain value. Formula: GDD = [(Max temp for day) + (Min temp for day)] / 2 – (Base temp), where negative values are treated as zero and the base temperature for cantaloupes is 10°C.

MSE – The mean squared error is the sum of the squared error of each data point, divided by the number of data points; a standard metric for comparing the accuracies of statistical models.

Neural Network – A statistical model built using historical data and used to predict future values. Each example has a set of input variables and a target output variable, and the model is constructed to minimize the MSE between the predicted output and the target output over all training examples. A test set is used to independently evaluate the model’s accuracy.

Persistence Model – A persistence model is a simple model that predicts that the next value will be the same as the current value.
These models are often used as a simple baseline against which to judge performance.

1. Introduction

Success in the harvest of crops and their delivery to market depends on knowing when the produce is ready to be picked and on planning human and material resources to maximize crop yield [1]. Cantaloupe melons are particularly challenging, as the harvest period is short and the required human and transportation resources are demanding [2]. This requires careful planning and logistics based on the best possible prediction of crop yield. Scotia Weather Services (SWS) wishes to extend its abilities in weather forecasting to provide forecasts of crop yield to the agri-food industry, from farmers to markets [3,4,5,6].

This project takes a data mining approach and uses machine learning technology to develop forecast models that predict melon harvest (number of melons) per day up to at least two weeks in advance. The predictive models forecast daily harvest given the current day of the year, observed and forecasted weather data and field information [7,8,9]. Vermeulen Farms (VF) of Canning, NS provided domain expertise and melon harvest data from 2007 to 2013. In addition, we obtained (1) observed weather data from Environment Canada as well as an existing Nova Scotia Community College (NSCC) weather station proximal to Vermeulen Farms, and (2) high quality weather forecast data from Environment Canada and Scotia Weather Services for two weeks in advance. The major challenge anticipated in the project was making accurate harvest predictions over the full two-week period so that the advance information would be useful for farm and market planning.

This NSERC Engage project extended a collaboration originally developed during a small NRC-IRAP CtoOrg project with SWS that confirmed that it is indeed possible to predict melon yield from weather data [8,9].
During the project, which ran during and after the 2014 growing season, we improved the quality of the data, refined our forecasting models and tested them with live data using a semi-automated approach that delivered forecasts to Vermeulen Farms by email. We also installed and tested automated weather and photography equipment during the 2014 harvest season so as to more accurately record field data. Based on the results of this project, we anticipate collaborating with SWS to develop the requirements for a fully automated forecasting system. The long-term goal is to develop a cloud-based harvest forecasting system for a variety of crops. This project is a significant step toward SWS being able to develop a unique and innovative crop forecasting service in Canada, ultimately benefiting SWS and the agricultural industry as a whole.

Objectives: The combined NRC-IRAP and NSERC Engage projects had four objectives:

(1) To determine the best way to formulate the data mining problem and the most important predictor variables.

(2) To build more accurate predictive models and test them live during the 2014 growing season using a semi-automated delivery method that sends forecasts to Vermeulen Farms by email. The predictive model will forecast daily harvest up to 14 days in advance given the day of the year, observed and forecasted weather data and field information as inputs.

(3) To install and test automated weather and photography equipment in the field so as to more accurately record field data during the 2014 season.

(4) To develop the requirements for a fully automated forecasting system to be developed in the next phase of the research.

Approach: A standard data mining approach was used to develop and test the predictive models. Two Research Assistants were hired: a data collection RA and a data mining RA.
Beginning with the knowledge gained from the IRAP project, we enriched the existing data with additional soil temperature, soil moisture and solar radiation data from the weather station, as well as high quality daily weather forecast data from SWS. The historic melon yield data from Vermeulen Farms was also reviewed and enhanced. The data mining RA used this data to develop and test artificial neural network models that predict the crop yield up to 14 days in advance. A confidence interval was estimated for each daily prediction.

Beginning in mid-July, the best model was used to make live forecasts for the 2014 growing season. A semi-automated approach was taken, whereby daily observed weather and new weather forecasts, obtained from the weather station and SWS, respectively, were used to make yield predictions. These daily forecasts were delivered to Vermeulen Farms by email for their decision making. The machine learning methods we employed are tried and proven in the literature [6]. Innovation occurred in the areas of data representation, the handling of missing weather values and model development for predictions up to two weeks in advance. Following the final harvests in September, the results were analyzed and this report was prepared by the team.

The data collection RA collected the raw weather data from the weather station and prepared it for use in data mining. Because of missing data in the historic records, values had to be imputed based on best estimates from nearby weather stations. Two new cameras (with temperature loggers) were set up in the melon field. This new equipment provided a means of measuring the temperature in the field and the growth of the melons over the 2014 growing season. The camera images were analyzed to monitor melon growth over the season and identify other potentially important field information. ???
Notes were made during the project and in November the team completed a requirements analysis that resulted in a specification for a fully automated forecasting system to be developed in the next phase of the service offering.

2. Summary of IRAP Project

A nine-week IRAP project took place from mid-April to mid-June 2014. The NSERC Engage project was a continuation and expansion of this original project. Below is a brief summary of the work accomplished in the IRAP project, as much of the data and results were used as a basis for the NSERC Engage project. In coming to understand the information that would be used to generate the predictive models, we undertook a six-stage data mining approach: business understanding, data understanding, data preparation, data analysis, modeling, and evaluation.

2.1 Business Understanding

The ability to accurately forecast melon harvest is directly dependent on access to weather forecast data. Current Numerical Weather Prediction (NWP) models are able to generate weather forecast data up to 14 days into the future. This was determined to be the practical limit for our melon forecasts. Vermeulen Farms felt that this would be more than enough to meet their farm planning needs, although their markets would prefer forecasts even further in advance.

We also came to realize that observed and forecasted weather data, and harvest data, may be unavailable at times. This meant that our system would have to be robust to missing data. Our solution was to create a predictive model that estimated all required input values for the next melon forecast interval. These estimates could be used to replace any missing input values required by the model. More specifically, as shown in Figure 1, we would develop a model that accepted the previous 14 days’ weather information and melon harvest, and predicted the current day’s weather and melon harvest. This is referred to as an auto-regressive model.
By using this model iteratively, we could predict as far out as desired, substituting our own previously predicted values of weather or melon yield whenever the actual data was unavailable.

Figure 1 – Characterization of the auto-regressive approach taken to predict weather and melon harvest values.

The energy provided by the sun to plants is well understood to be the dominant factor in their growth and production of fruit. Specifically, the growing degree days (GDD) metric is considered the best indicator of plant growth in the agricultural field. GDD is defined as [(Maximum temperature for day) + (Minimum temperature for day)] / 2 – (Base temperature), where negative values are treated as zero. The base temperature varies somewhat by crop; for cantaloupes it is 10°C. Most crops become ripe for harvesting once the GDDs accumulated from the time of planting exceed a certain value. Other factors that we assumed would play a major role, based on our literature survey, were solar radiation, total precipitation and relative humidity.

2.2 Data Understanding

The harvest information obtained from Vermeulen Farms covers produce from 2-3 fields making up a total of about 30 acres. The data contained information about the number of melons harvested at each size; however, an analysis of the records indicates a number of uncertainties in the breakdown of the total melon count. Vermeulen Farms recommended that for this initial project we focus on predicting the total number of melons independent of size.

It was our understanding going into the project that the count of melons harvested was driven largely by the plant yield in the field, and that the major problem for the project team was predicting this yield. This is a factor of considerable importance that will be revisited later in this report.
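The GDD definition given in Section 2.1 can be illustrated with a minimal sketch. The function names and the sample temperatures below are assumptions made for illustration only; they are not taken from the project's code or data. The 10°C base temperature is the cantaloupe value stated in the report.

```python
from itertools import accumulate

BASE_TEMP_C = 10.0  # base temperature for cantaloupes, as stated in the report


def growing_degree_days(t_max, t_min, base=BASE_TEMP_C):
    """Daily GDD: mean of max and min temperature minus the base, floored at zero."""
    return max(0.0, (t_max + t_min) / 2.0 - base)


def accumulated_gdd(daily_max_min, base=BASE_TEMP_C):
    """Running total of daily GDD, starting from the first day in the sequence."""
    daily = [growing_degree_days(hi, lo, base) for hi, lo in daily_max_min]
    return list(accumulate(daily))


# Hypothetical (max, min) temperatures in deg C; illustrative values, not project data.
example_days = [(24.0, 12.0), (18.0, 6.0), (12.0, 4.0)]
print(accumulated_gdd(example_days))  # -> [8.0, 10.0, 10.0]
```

The third day in the example has a mean temperature below the base, so it contributes zero and the running total does not decrease.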
2.3 Data Preparation

The data used in the project can be broadly divided into melon harvest data and weather data. The initial preparation and analysis of these data sets was done independently, before combining them.

2.3.1 Melon Harvest Data

The harvest records for the melons spanned 2007 – 2013. The data for all years were provided on hand-written sheets of paper, except for 2013, which was transferred digitally as a spreadsheet. All harvest data was entered and combined into a single MS Excel spreadsheet file. An initial Meta-Data Report (MDR) was produced (see Appendix Table A.1). Multiple emails and phone calls between Acadia and Vermeulen Farms occurred while the harvest data was checked and screened for errors, and corrections were made. It was decided that the data from 2007 – 2009 was not sufficiently complete to be used for model development. It was also decided that 2013 was an anomalous year of extremely poor yield that should not be used. This analysis confirmed our decision not to break predictions down by melon size. A final MDR was produced after this data was combined with the weather data (see Appendix Table A.2).

Appendices A and B provide tables and graphs of the historical melon data for the years 2010-2012. The graphs of Appendix B show that the melon count data is not captured daily. Several days of harvest may pass before a total count is taken. For this reason we started graphing the accumulated melon harvest over the season. Figure 2 presents the three years of harvest data in this manner. It shows a wide variation in the harvest, with 2011 being a particularly poor year.

Figure 2 - Accumulated melons for 2010, 2011, and 2012 from July 1 to September 30

2.3.2 Weather Data

There were two sets of weather data: one was obtained from the publicly available Environment Canada records, and the other was provided by NSCC from their Station 60 weather station. The Environment Canada (EC) data was retrieved from their website and inspected for missing data and errors. Any problems with the data were fixed using data Acadia had previously acquired from the Kentville Agricultural Centre for other projects. A total of 1096 records spanning 2010-2012 were used in the analysis to prepare for building the statistical models (see Appendix A).

The NSCC data was received and inspected for missing data and errors. An initial MDR was produced to summarize this data (see Appendix Table A.1). As this table shows, there was enough missing NSCC weather data to warrant using the more complete EC data during this initial analysis as well as for the construction of the models used for 2014 predictions.

2.4 Data Analysis

Our initial analysis of the data focused on understanding which variables were most important for predicting the harvest date of the melons. See the graphs in Appendix B that show the accumulated melon counts graphed beside accumulated degree days, radiation, and precipitation. The weather data that was available with high quality included:

- average air temperature
- maximum air temperature
- minimum air temperature
- heating degree days
- cooling degree days
- dew point temperature
- relative humidity
- wind direction
- wind speed
- total precipitation

Solar radiation appeared to be correlated with the harvest, but the solar radiation data was missing for 2010 and would most likely not be available as a forecasted value, so it was not used for building the predictive models.
It was suspected that soil moisture, water content and temperature would be correlated with the melon harvest as well, but that data was not available in sufficient quantity or quality to investigate further. In the end, it was recommended that the predictive models should use the previous 14 days’ weather data and melon counts, along with a forecast of the current day’s GDD. In this way, forecast information could be acquired from multiple sources and used to compute the growing degree days. Further details are discussed in the next section.

2.5 Modeling

2.5.1 Model Structure

An auto-regressive method was used to develop models to predict the accumulated melon harvest. To predict the number of melons on day t, the auto-regressive model accepted the day of the year, the previous 14 days’ (days t-1 to t-14) weather data and melon data, and the weather forecast data for day t. To produce a 14 day forecast (days t to t+13), the predicted melon harvest for day t was used as an input to predict day t+1’s melon harvest, along with the weather forecast data for day t+1. This process was repeated until all 14 days into the future had a prediction. In this way, any source containing the appropriate weather data could be used to produce a 14 day melon harvest forecast.

This simple model would also predict the weather data of day t. So, even in the case where no weather data (observed or forecast) and no melon harvest data was available, the model could use the last recorded values to predict the next day’s value. Note that this technique has the advantage of being able to predict beyond the 14 days of weather forecast, but with a higher projected error in later predicted values.

2.5.2 Model Construction

All of the models built were back-propagation neural networks.
A standard approach was taken to determine the best network architecture, configuration and learning parameters. We settled on networks with 28 inputs, 50 hidden nodes, and 2 outputs. We used a learning rate of 0.001 and a momentum value of 0.9.

A serious consideration during this project was the lack of data; the harvest season spanned a maximum of 90 days (July – September) of each year, and only the data from 2010, 2011, and 2012 was judged to be accurate. This gave a maximum of 270 examples with which to train and evaluate the models. To deal with this small data set, we used cross validation. Cross validation is a technique where multiple models are built from the same data, but the data each model is tested on is always different and is never used to build that particular model. For these models we used 5-fold cross validation: the data was divided into 5 groups, each containing 20% of the data, and 5 models were built and tested. Each model was trained on 3 of the groups (60%), validated on 1 (20%), and tested on the remaining 20%. Where results are displayed by year, the values from the 5 test sets are combined and then separated by year so that each year has a meaningful error.

2.5.2.1 Direct Melon Count Models

Initial models were built that used the direct (non-cumulative) counts of the melons picked and all the weather data from the previous 14 days as inputs, and predicted the melon harvest for the current day. These models tended to have a daily RMSE of between 2500 and 3000 melons, and were very noisy, frequently predicting negative values of considerable magnitude (see Figure 3). Clearly, this would not be informative to Vermeulen Farms.

Figure 3 - Comparison of the actual harvest for 2010 to the predictions of 5 models that used only the current weather information

2.5.2.2 Melon Accumulation Models

To decrease the noise and negative values in the models, a different approach was taken. The melon harvest for every day was accumulated as a running total, which equaled the total melons picked to date. This value is monotonically increasing, which decreased the noise in the data. New models were developed to predict the accumulated melon harvest at each day. The difference between the predictions of any two consecutive days could then be calculated as the increase in melons to be harvested. As Figure 3 shows, this worked very well.

Figure 3 - Actual and predicted accumulated melons using only previous melon harvest as input.

2.6 Model Evaluation

To determine which of the weather variables are most important to the prediction of accumulated melon count, models were built that used each of the weather variables independently. The resulting models were compared to a persistence model that predicts the next value to be the same as the current value. Models were also developed using the GDD for each day (“10 Deg Days”) and the accumulated GDDs (“Deg Days Summed from May 5”). The results are presented in Figure 4. Using only the previous accumulated melon harvest gave the lowest error. The most important weather variables were found to be:

1) Accumulated GDD (summed from May 5th, the average planting date)
2) All variables together
3) Maximum temperature

Figure 4 - RMSE for models built using various weather variables.

3. Overview of Work Completed During the NSERC Engage Project

The Acadia team worked with Vermeulen Farms, Nova Scotia Community College, and Scotia Weather Services throughout this project. Vermeulen Farms supplied their records of the melon harvest for 2007 – 2013, which included the number of melons picked at each size.
NSCC supplied weather data daily from a local weather station set up closer to the field than the Kentville-based Environment Canada (EC) station. NSCC also provided observed weather data for that station going back to 2007. SWS provided a daily 14-day weather forecast for the specific location of the field, which differed slightly from EC’s forecast for Kentville.

The Acadia group worked with VF, NSCC and SWS to complete the following research and development activities:

1) To determine the best way to formulate the data mining problem and what the most important prediction variables were.

2) To build more accurate predictive models and test them live during the 2014 growing season using a semi-automated delivery method that sends forecasts to Vermeulen Farms by email. The predictive model will forecast daily harvest up to 14 days in advance given the day of the year, observed and forecasted weather data and field information as inputs.

3) To install and test automated weather and photography equipment in the field so as to more accurately record field data during the 2014 season.

4) To complete a project report detailing the quality of the predictions, and develop the requirements for a fully automated forecasting system to be developed in the next phase of the research.

4. Data Preparation and Analysis

The majority of the data preparation and analysis was done in the previous IRAP project (see Sections 2.1 – 2.4). At the beginning of the Engage project, this effort was reviewed and the data representation was refined where necessary. The final data used to develop the models for predictions during the 2014 season consisted of the Vermeulen Farms melon harvest data from 2010-2012 and matching observed weather data from Environment Canada.

5. Development of Statistical Models to Forecast Future Melon Harvest

All of the models built were neural network models. As before, the best network architecture consisted of 28 inputs, 50 hidden nodes, and 2 outputs. We used a learning rate of 0.001 and a momentum value of 0.9. The model used the previous 14 days’ accumulated melon harvest and accumulated GDD to predict the next day’s accumulated melon harvest and accumulated GDD (the predicted GDD values were never used). This model was then applied iteratively 14 times to produce a 14 day melon harvest forecast (see Section 2.1). Figure 5 shows the cross validation results from 5 models built in this way. The models have relatively low RMSE variation.

Figure 5 - RMSE of 5 fold cross validation models and the standard deviation for all years combined and each year individually

The models were tested to see how their accuracy varied over the 14 day prediction period, and were compared to a basic persistence model that always predicted the previous day’s value. Table 2 and Figure 6 show these results.

Table 2 and Figure 6 - RMSE of a persistence model, and a regressive model using observed or predicted weather data

A final production model was built from all of the 2010-2012 data using 80% of the data as the training set, and the remaining 20% as the validation set. Table 3 and Figure 7 show the estimated error per day of this model, based on earlier models built using 60/20/20 splits of the data. This production model was used to make accumulated melon forecasts twice a week throughout the harvest season of 2014.

Table 3 and Figure 7 - RMSE of the model used to make 2014 predictions for each of the 14 days

6. Ongoing Acquisition of Data and Transmission of Melon Harvest Forecasts

In order to produce a melon harvest forecast for day t, we required the past 14 days of weather data (t-1 to t-14), a 14 day forecast of the weather (t to t+13), and some estimate of the past 14 days of the melon harvest (t-1 to t-14). The NSCC weather station 60 data was used whenever possible, as this was the closest station to the field. At times this data had to be replaced by Environment Canada data. Scotia Weather Services forecast data was used for all predictions except for the first two reports.

A forecast of the expected number of melons in the field was produced approximately twice a week from July 18 to September 12. The reports were made on the dates shown in Table 4, using the specified source information. We received information about the melon harvest at VF twice during the 2014 harvest season. The last set of harvest records arrived on September 7, so it was not possible to process them in time to be used to make predictions for September 9 and 12. Unfortunately, this meant that for most of the season, predictions of accumulated melons were based on prior predictions of accumulated melons.

Table 4 – List of dates and data on which melon forecasts were made.

6.1 Weather Information

The weather information can be divided into two sets of data: the observed GDD from day t-1 to t-14, and the predicted GDD calculated from the weather forecasts for day t to t+13.

6.1.1 Acquisition of Observed Weather Data

The observed weather data came from one of two sources:

- NSCC Weather Station 60, which was located near the VF fields where the melons were growing. This data was originally emailed to us manually roughly once a week, but later the process was automated so that we received updates daily.
- Environment Canada’s records from Kentville, which was located further from the field.
We obtained this data by downloading it directly from the EC website.

We preferred to use the NSCC data, but we would use the EC records if the NSCC values were not available. The NSCC data was frequently unavailable during the first month of the harvest, but its availability later improved considerably, while the EC records were always available. Whichever set of data was available was used to calculate the GDDs from the maximum and minimum temperatures.

6.1.2 Acquisition of Weather Forecasts

The weather forecast data came from one of two sources:
- Scotia Weather Services, which provided us with a forecast localized specifically to the VF fields. We received these predictions daily as an automated email starting on July 29 and continuing until we stopped producing melon forecasts.
- Environment Canada's weather predictions for Kentville. Originally, we manually obtained this data from their website only when needed, but in mid-August we began to save this information whenever possible.

The important data from the weather forecasts were the maximum and minimum temperatures, which were used to calculate each day's GDD. We preferred to use the SWS forecast to calculate the GDD, but we would have used the EC forecast if the SWS forecast was not available, or we could have used our own predicted GDD value.

6.2 Acquisition of Observed Melon Data

We had assumed that the observed melon data would be provided to us on a frequent basis, preferably biweekly with minimal delay, but that turned out to be very challenging for Vermuellen Farms. We received melon harvest data twice during the entire harvest period: on August 2 and September 7. This meant that we had to use our model's previous predictions of the melon harvest as input for future predictions for most forecasts that were issued (except those shortly after August 2). The automated or semi-automated collection of harvest data is consequently a major area for improvement in any future projects in this area.
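The data flow described in Sections 6.1 and 6.2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The function names, the `model` callable, the simple averaging formula for GDD, and the base temperature of 10 °C are all assumptions made for the example; the report does not state which GDD formula or base temperature was used.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed base temperature in degrees Celsius; not stated in the report.
BASE_TEMP_C = 10.0

Temps = Tuple[float, float]  # (daily maximum, daily minimum)


def daily_gdd(t_max: float, t_min: float, base: float = BASE_TEMP_C) -> float:
    """GDD for one day using the simple averaging method (an assumption)."""
    return max(0.0, (t_max + t_min) / 2.0 - base)


def pick_source(preferred: Optional[Temps], fallback: Optional[Temps]) -> Optional[Temps]:
    """Use the preferred reading (e.g. NSCC or SWS) if present, else the fallback (EC)."""
    return preferred if preferred is not None else fallback


def recursive_forecast(
    model: Callable[[List[float]], Tuple[float, float]],
    past_harvest: Sequence[float],
    past_gdd: Sequence[float],
    forecast_daily_gdd: Sequence[float],
    window: int = 14,
) -> List[float]:
    """Roll a one-day-ahead model forward, feeding its own predictions back in.

    past_harvest / past_gdd: accumulated values for the last `window` days,
    oldest first. forecast_daily_gdd: daily (not accumulated) GDD derived
    from the weather forecast for day t onward.
    """
    if len(past_harvest) != window or len(past_gdd) != window:
        raise ValueError("expected exactly `window` days of history")
    harvest = list(past_harvest)
    gdd = list(past_gdd)
    predictions: List[float] = []
    for day_gdd in forecast_daily_gdd:
        # 28 inputs: 14 days of accumulated harvest, then 14 days of accumulated GDD.
        inputs = harvest[-window:] + gdd[-window:]
        next_harvest, _unused_gdd = model(inputs)  # the GDD output is ignored
        predictions.append(next_harvest)
        # The prediction becomes an "observation" for the next step.
        harvest.append(next_harvest)
        # Accumulated GDD advances using the weather forecast, not the model.
        gdd.append(gdd[-1] + day_gdd)
    return predictions
```

When no fresh harvest records are available, `past_harvest` itself would be filled with earlier predictions, which is the situation described above for most of the 2014 season; errors can then compound from one forecast to the next.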
6.3 Transmission of Melon Harvest Forecasts

Each 14-day accumulated melon harvest forecast was placed in a spreadsheet showing the actual values and an easy-to-read graph that included estimated error bars. This spreadsheet was then emailed to everyone involved in the project. Table 5 and Figure 8 show an example of one of the melon forecast reports, sent out on August 15.

Table 5 and Figure 8 - Example of the predictions that were sent to all members of the project

6.4 Capture of Field Data

Two Bushnell HD automated cameras were purchased and installed in the field to monitor the growth of the melons, record farm activities, and log the local temperature for later comparison with NSCC and EC data. Figure X shows one of the cameras in position next to the field. Unfortunately, this camera was stolen in late September, which illustrates one of the costs of deploying this type of technology in the field. The following are some of the images that were captured by the cameras.

<<more words needed here>> Also reference to time-lapse video produced from photos.

7. Evaluation of 2014 Predictions

This section provides an evaluation of how well the predictive models performed over the 2014 harvest season.

7.1 Melon Harvest Predictions

After we had received the final results of the melon harvest, we were able to compare our accumulated melon predictions over the growing season to the actual records. Figure 9 compares the actual melons harvested to our predictions of accumulated melons.
Figure 9 - Comparison of the actual number of melons harvested versus our predictions of the number available, from July 18 to September 23

It is important to note two things about the 2014 harvest:

(1) The final set of records we received from VF included both the daily harvest records and the weekly sales records. Unfortunately, the records do not match up, with almost 10,000 more melons reported sold than were harvested by the end of August. It seems likely that at least some of the harvest information was not received, particularly because there was a 10-day period (August 9 – 18) during which no melons were reported harvested. Figure 10 shows the difference in this data. We have based the following analysis on the harvest data received.

(2) We estimate that 50% of the melons yielded by the plants remained in the field and were never picked. This was due to market demand. Figure 11 compares the 2014 harvest to the previous three years used to build the predictive model. Comparing Figures 9 and 11, it is clear that our model was predicting something closer to the true yield of the melon field than to the total number of melons that the market would accommodate.

Figure 10 – Weekly harvest and sales information from July 13 to Aug 30

Figure 11 - Accumulated melons for 2010, 2011, 2012, and 2014 from July 1 to September 30

Figure 12 compares the results of predicting the melon harvest using the actual weather, SWS's forecast, and EC's forecast. Unsurprisingly, the actual weather had the least error.
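RMSE values broken down by forecast day, of the kind reported in this section, could be computed along the following lines. This is a hedged sketch only: the function names and the data layout (a list of forecasts, each tagged with the index of its first target day) are assumptions for illustration, and the multi-day persistence baseline shown is one possible reading of "always predicted the previous day's value".

```python
import math
from typing import List, Sequence, Tuple


def rmse_by_lead_day(
    forecasts: Sequence[Tuple[int, Sequence[float]]],
    actual: Sequence[float],
) -> List[float]:
    """RMSE for each lead day across a set of issued forecasts.

    forecasts: (start_index, predictions) pairs, where predictions[k] is the
    forecast of accumulated melons for day start_index + k.
    actual: observed accumulated melons, indexed by day.
    Lead days with no matching observation yield NaN.
    """
    if not forecasts:
        return []
    horizon = max(len(preds) for _, preds in forecasts)
    squared_error = [0.0] * horizon
    count = [0] * horizon
    for start, preds in forecasts:
        for k, predicted in enumerate(preds):
            day = start + k
            if 0 <= day < len(actual):
                squared_error[k] += (predicted - actual[day]) ** 2
                count[k] += 1
    return [
        math.sqrt(s / n) if n else float("nan")
        for s, n in zip(squared_error, count)
    ]


def persistence_forecast(last_observed: float, horizon: int = 14) -> List[float]:
    """Baseline that repeats the last observed value for every lead day (assumed form)."""
    return [last_observed] * horizon
```

Running `rmse_by_lead_day` once per weather source (observed, SWS forecast, EC forecast) on the same set of issue dates would give curves comparable in form to those described for Figure 12.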
Figure 12 - RMSE of the model when evaluated over the 2014 growing season, using actual weather data and predicted weather data from Scotia Weather Services and Environment Canada

7.2 Weather Predictions

Having observed the difference in accumulated melon predictions as a function of weather data, we decided to compare the accuracy of the weather forecast by SWS and EC to the weather recorded at the NSCC and Kentville weather stations. Figures 13, 14 and 15 show the results.

Figure 13 - RMSE of Environment Canada/Scotia Weather Services predictions of maximum daily temperature versus Environment Canada/NSCC Station 60 observed data

Figure 14 - RMSE of Environment Canada/Scotia Weather Services predictions of minimum daily temperature versus Environment Canada/NSCC Station 60 observed data

Figure 15 - RMSE of Environment Canada/Scotia Weather Services predictions of degree days versus Environment Canada/NSCC Station 60 observed data

The SWS NWP prediction system was fully automated, so error, particularly in the minimum daily temperature, may have come from several different sources. Possibilities include the amount of cloud that occurred, or a soil profile different than expected, resulting in different night-time cooling. SWS recommends future investigation of the temperature prediction anomalies.

8. Summary and Recommendations

8.1 Summary

The goal of this project was to develop and improve on models that can predict the number of cantaloupe melons available to harvest. The Acadia group worked with VF, NSCC, and SWS to complete four research and development activities:

8.1.1 Determine Prediction Variables

We determined that the GDD was the most important of the easily accessible weather variables, and that adding additional variables did not seem to improve results.
Although we suspected that soil and solar data would be helpful, we were unable to find a reliable source of observed data for the location.

8.1.2 Build and Test Predictive Models

We built multiple models to try to improve our predictions, and eventually we had models that had similar 60/20/20 cross-validation accuracies. We then took 80% of the data as a training set and 20% as a validation set and built a final model that would be used to generate predictions for 2014. We estimated the error in the predictions based on the average error of the cross-validation models.

We took the weather predictions that we obtained from either EC or SWS and produced melon forecasts from mid-July until the end of September. Each forecast used the previous 14 days of GDDs and melon harvests to predict the melon harvest for the current day and another 13 days into the future. We produced 15 forecasts from July 18 to September 12, which covered most of the 2014 growing season. The forecasts were emailed out to all participants in the project. We received the actual harvest information infrequently, so most of our predictions used our previous predictions as inputs.

8.1.3 Install and Test Equipment

Originally we had planned to install a small weather monitoring station (temperature logger) in one of VF's fields, and to place several cameras to record the growth of the melons throughout the summer. We did purchase and install two automated cameras that recorded the local temperature on the images for much of the harvest season. Unfortunately, one of the cameras was stolen in late September, taking with it images of the later part of the harvest season.

[We need to complete a comparison of camera temperature to EC, NSCC and SWS temperatures]

8.1.4 Evaluate Predictions for 2014

Throughout the time we were producing melon forecasts, we also recorded our predictions of the melon harvest and the weather predictions we received. The evaluation of the melon harvest predictions was hampered by the lack of actual harvest data, so the analysis did not yield results until after we had halted our forecasting. The weather evaluation also primarily took place after we had stopped generating forecasts.

The melon harvest evaluation showed that we dramatically overestimated the number of melons picked; however, the actual harvest data was the number actually picked, not the number available. At several points during the summer, the market demand was low enough that no melons were picked even though there were some available. We also found that our predictions became less accurate as we predicted multiple days into the future, but that the accuracy actually improved beyond 10 days. The cause of this is uncertain, but it may be related to the weather forecasts that were used as inputs to our models.

The evaluation of the weather forecast data that we received showed that the EC predictions were consistently better than those of SWS, when compared to either of the locations at which the weather data was recorded (EC or NSCC stations). The SWS predictions generally increased in error over all 14 days of predictions, while the EC prediction error plateaued or even decreased after approximately 10 days out. The larger SWS prediction errors may have been because the SWS system was completely automated, and SWS recommended further investigation.

8.2 Recommendations

- We were predicting crop yield, but the harvest was driven largely by the market. Next year VF will not grow as many melons. Consider this major issue for farmers: is a market model needed?
- Use cameras in the field to record agricultural activities and interventions
- Place a weather monitoring station in the field to capture local observed weather and calibrate weather forecasts
- Consider better ways to record actual harvest information daily, as ground truth for target values
- Determine additional markets for harvest forecasting
- What does Andy feel about the utility of this system?

9. Requirements for Automated Forecasting System

??

10. References

[1] S. Veenadhari, Bharat Misra, and C.D. Singh. Data Mining Techniques for Predicting Crop Productivity – A Review Article. IJCST 2:1, p. 98-100, 2011.
[2] S. Jenni, K. Stewart, G. Bourgeois and D. Cloutier. Predicting Yield and Time to Maturity of Muskmelons from Weather and Crop Observations. J. Amer. Soc. Hort. Sci. 123:2, p. 195–201, 1998.
[3] Moh. Hajiyan. Early Prediction of Crop Yield. Technical Report, School of Engineering, University of Guelph, 2012.
[4] Raju Paaswan and Shahin Begum. Regression and Neural Networks Models for Prediction of Crop Production. Int. J. of Science and Engineering Research, 14:9, p. 9810, 2013.
[5] Byoug-Hoon Lee, Phil Kenkel and B. Wade Brorsen. Pre-harvest Forecasting of County Wheat Yield and Wheat Quality using Weather Information. Agricultural and Forest Meteorology, 2013:168, p. 26-35, 2012.
[6] Jiri Stastny, Vladimir Konecny, and Oldrich Trenz. Agricultural Data Prediction by Means of Neural Network. Agric. Econ. – Czech, 57:7, p. 356-361, 2011.
[7] S. Jenni, D. Cloutier, G. Bourgeois and K. Stewart. A Heat Unit Model to Predict Growth and Development of Muskmelon to Anthesis of Perfect Flowers. J. Amer. Soc. Hort. Sci. 121:2, p. 274–280, 1996.
[8] J.T. Baker, D.I. Leskovar, V.R. Reddy and F.J. Dainello. A Simple Phenological Model of Muskmelon Development. Annals of Botany 87, p. 615-621, 2001.
[9] J.T. Baker and V.R. Reddy. Temperature Effects on Phenological Development and Yield of Muskmelon. Annals of Botany 87, p. 605-613, 2001.

Appendices

A. Meta Data Reports

Table A.1 - Combined MDR for Kentville Weather and Melon Harvest Data

Table A.2 - Initial MDR for Station 60 Weather Data from NSCC

Table A.3 - Missing Data from Station 60 Weather Data from NSCC

B. Observed Melon and Weather Data for 2010 - 2012

Figure B.1 – 2010 melon and weather data

Figure B.2 – 2011 melon and weather data

Figure B.3 – 2012 melon and weather data

C. Model Building Results

Results from Accumulation Models

Table C.1 - RMSE for models built using various weather variables

Table C.2 - RMSE of 5 fold cross validation models for all years combined and each year individually

D. 2014 Melon Prediction Accuracy

???

E. 2014 Weather Prediction Accuracy

Table E.1 - RMSE of Environment Canada/Scotia Weather Services predictions of daily temperature extremes and degree days versus Environment Canada/NSCC Station 60 observed data

***What other data should be here?***