Melon Project Final Report V6

advertisement
Scotia Weather Services, Inc. – NSERC Engage Project Report
Research Report
Predictive Analytics for Melon
Harvest Forecasting
for
Scotia Weather Services, Inc.
Dartmouth , NS
funded by
Natural Sciences and Engineering Research Council
Engage Grant
Daniel L. Silver, Ph.D.
Geoffrey Mason, B.Sc.
Acadia University
Lasted Revised: 2/9/16 8:26 PM
1
Scotia Weather Services, Inc. – NSERC Engage Project Report
Contents
Glossary of Terms ................................................................................................................................. 4
1. Introduction .................................................................................................................................... 5
2. Summary of IRAP Project ........................................................................................................... 7
2.1 Business Understanding ...................................................................................................................... 7
2.2 Data Understanding ............................................................................................................................ 8
2.3 Data Preparation ................................................................................................................................. 8
2.3.1 Melon Harvest Data......................................................................................................................... 9
2.3.2 Weather Data ................................................................................................................................ 10
2.4 Data Analysis ..................................................................................................................................... 10
2.5 Modeling ........................................................................................................................................... 11
2.5.1 Model Structure ............................................................................................................................. 11
2.5.2 Model Construction ....................................................................................................................... 11
2.6 Model Evaluation .............................................................................................................................. 13
3. Overview of Work Completed During the NSERC Engage Project .............................. 14
4. Data Preparation and Analysis ............................................................................................... 15
5. Development of Statistical Models to Forecast Future Melon Harvest .................... 15
6. Ongoing Acquisition of Data and Transmission of Melon Harvest Forecasts ........ 17
6.1 Weather Information ........................................................................................................................ 17
6.1.1 Acquisition of Observed Weather Data ...................................................................................... 17
6.1.2 Acquisition of Weather Forecasts............................................................................................... 18
6.2 Acquisition of Observed Melon Data ................................................................................................ 18
6.3 Transmission of Melon Harvest Forecasts ........................................................................................ 19
6.4 Capture of Field Data ............................................................................................................................ 19
7. Evaluation of 2014 Predictions .............................................................................................. 21
7.1
7.2
Melon Harvest Predictions ................................................................................................................ 21
Weather Predictions.......................................................................................................................... 23
8. Summary and Recommendations .......................................................................................... 24
8.1 Summary ........................................................................................................................................... 25
8.1.1 Determine Prediction Variables .................................................................................................. 25
8.1.2 Build and Test Predictive Models ............................................................................................... 25
8.1.3 Install and Test Equipment ......................................................................................................... 25
8.1.4 Evaluate Predictions for 2014..................................................................................................... 26
8.2 Recommendations ............................................................................................................................ 26
9. Requirements for Automated Forecasting System .......................................................... 27
10. References ................................................................................................................................... 27
Lasted Revised: 2/9/16 8:26 PM
2
Scotia Weather Services, Inc. – NSERC Engage Project Report
Appendices ............................................................................................................................................ 28
A.
B.
C.
Meta Data Reports .............................................................................................................................. 28
Observed Melon and Weather Data for 2010 - 2012.......................................................................... 29
Model Building Results ........................................................................................................................ 31
Results from Accumulation Models ........................................................................................................ 31
D. 2014 Melon Prediction Accuracy ........................................................................................................ 32
E. 2014 Weather Prediction Accuracy ..................................................................................................... 32
Lasted Revised: 2/9/16 8:26 PM
3
Scotia Weather Services, Inc. – NSERC Engage Project Report
Glossary of Terms
NSCC – Nova Scotia Community College
EC – Environment Canada; normally refers to weather data
VF – Vermuellen Farms Ltd, ownd and operated by Andy Vermuellen, Canning, Nova Scotia
SWS - Scotia Weather Services Inc, of Dartmouth, Nova Scotia
GDD – Growing degree days are a measure of the energy absorbed by the plant throughout the
growing season. Most crops become ripe for harvesting once the GDDs accumulated from the
time of planting exceed a certain value. Formula: GDD = [ (Max temp for day) + (Min temp for
day) ]/2 – (Base temp); where negative values are treated as zero, “Base temp” for cantaloupes
it is 10°C.
MSE – The mean squared error is the sum of the squared error of each data point, divided by
the number of data points; a standard metric for comparing the accuracies of statistical models.
Neural Network – A statistical model built using historical data and used to predict future
values; each example has an input variables and a target output variable; and the model is
constructed to minimize the MSE between the predicted output and the target output over all
training examples. A test set is used to independently evaluate the models accuracy.
Persistence Model – A persistence model is a simple model that predicts that the next value
will be the same as the current value. These models often used a simple baseline model to
judge performance.
Lasted Revised: 2/9/16 8:26 PM
4
Scotia Weather Services, Inc. – NSERC Engage Project Report
1. Introduction
Success in the harvest of crops and their delivery to market
depends on knowing when the produce is ready to be picked and
the planning of human and material resources to maximize crop
yield [1]. Cantaloupe melons are particularly challenging, as the
harvest period is short and the required human and
transportation resources demanding [2]. This requires careful
planning and logistics based on the best possible prediction of
crop yield. Scotia Weather Services (SWS) wishes to extend their
abilities in weather forecasting to provide forecasts of crop yield to the agri-food industry - from
farmers to markets [3,4,5,6].
This project takes a data mining approach
and uses machine learning technology to
develop forecast models that predict
melon harvest (number of melons) per
day up to at least two weeks in advance.
The predictive models forecast daily
harvest given the current day of the year,
observed and forecasted weather data
and field information [7,8,9]. Vermeulen
Farms (VF) of Canning, NS provided
domain expertise and melon harvest data
from 2007 to 2013. In addition, we
obtained (1) observed weather data from
Environment Canada as well as an existing
Nova Scotia Community College (NSCC)
weather station proximal to Vermeulen
Farms, and (2) high quality weather forecast data from Environment Canada and Scotia Weather
Services for two weeks in advance. The major challenge anticipated in the project was making
accurate harvest predictions out over the two week period so that the advance information would
be useful for farm and market planning.
This NSERC Engage project extended a collaboration originally developed during a small NRC-IRAP
CtoOrg project with SWS that confirmed that it is indeed possible to predict melon yield from
weather data [8,9]. During the project that occurs during and after the 2014 growing season, we
will improved on the quality of the data, refined our forecasting models and tested them with live
data using a semi-automated approach that delivered forecasts to Vermeulen Farms by email. We
Lasted Revised: 2/9/16 8:26 PM
5
Scotia Weather Services, Inc. – NSERC Engage Project Report
also installed and tested automated weather and photography equipment during the 2014 harvest
season so as to more accurately record field data.
Based on the results of this project, we anticipate collaborating with SWS to develop the
requirements for a fully automated forecasting system. The long-term goal is to develop a cloudbased harvest forecasting system for a variety of crops. This project is a significant step toward
SWS being able to develop a unique and innovative crop forecasting service in Canada – ultimately
benefiting SWS and the agricultural industry as a whole.
Objectives:
The combined NRC-IRAP and NSERC Engage projects had four objectives: (1) To determine the best
way to formulate the data mining problem and the most important predictor variables. (2) To build
more accurate predictive models and test them live during the 2014 growing season using a semiautomated delivery method that sends forecasts to Vermeulen Farms by email. The predictive
model will forecast daily harvest up to 14 days in advance given the day of the year, observed and
forecasted weather data and field information as inputs. (3)To install and test automated weather
and photography equipment in the field so as to more accurately record field data during the 2014
season. (4) To develop the requirements for a fully automated forecasting system to be developed
in the next phase of the research.
Approach:
A standard data mining approach was used to develop and test the predictive models. Two
Research Assistants were hired – a data collection RA and data mining RA. Beginning with the
knowledge gained from the IRAP project we enriched the data we have with additional soil
temperature and moisture and solar radiation data from the weather station as well as high
quality daily weather forecast data from SWS. The historic melon yield data from Vermeulen
Farms was also reviewed and enhanced. The data mining RA used this data to develop and test
artificial neural network models that predict the crop yield for up to 14 days in advance. A
confidence interval was estimated for each daily prediction.
Beginning in mid-July the best model was used to make live forecasts for the 2014 growing season.
A semi-automated approach was taken, whereby daily observed weather and new weather
forecasts, obtained from the weather station and SWS, respectfully, were used to make yield
predictions. These daily forecasts were delivered to Vermeulen Farms by email for their decision
making. The machine learning methods we employed are tried and proven in the literature [6].
Innovation occurred in the areas of data representation, the handling of missing weather values
and model development for predictions up to two weeks in advance. Following the final harvests
in September, the results were analyzed and this report was prepared by the team.
The data collection RA collected the raw weather data from the weather station and prepared it
for use through data mining. Because of missing data in the historic records, work had to be done
to impute values based on best estimates from nearby weather stations. Two new cameras (with
temperature loggers) were set up in the melon field. This new equipment provided a means of
measuring the temperature in the field and the growth of the melons over the 2014 growing
Lasted Revised: 2/9/16 8:26 PM
6
Scotia Weather Services, Inc. – NSERC Engage Project Report
season. The camera images were analyzed to monitor melon growth over the season and identify
other potentially important field information.
??? Notes were made during the project and in November the team completed a requirements
analysis that resulted in a specification for a fully automated forecasting system to be developed in
the next phase of the service offering.
2. Summary of IRAP Project
A nine-week IRAP project took place from mid-April to mid-June 2014. The NSERC Engage
project was a continuation and expansion of this original project. Below is a brief summary of
the work accomplished in the IRAP project, as much of the data and results were used as a basis
for the NSERC Engage project.
In coming to understand the information that would be used to generate the predictive models,
we undertook a six stage data mining approach: business understanding, data understanding,
data preparation, data analysis, modeling, and evaluation.
2.1 Business Understanding
The ability to accurately forecast melon harvest data is directly dependent on access to weather
forecast data. Current Numerical Weather Prediction (NWP) models are able to generate
weather forecast date up to 14 days into the future. This was determined to be the practical
limit for our melon forecasts. Vermuellen Farms felt that this would be more than enough to
meet their farm planning needs, although their markets would prefer even more advanced
forecasts.
We also came to realize that observed and forecasted weather data, and harvest data may be
unavailable at times. This meant that our system would have to be robust to missing data. Our
solution was to create an predictive model that estimated all required input values for the next
melon forecast interval. These estimates could be used to replace any missing input values
required by the model. More specifically, as shown in Figure 1, we would develop a model that
accepted the previous 14 days’ weather information and melon harvest, and predicted the
current day’s weather and melon harvest. This is referred to as an auto-regressive model. By
using this model iteratively, we could predict as far out as desired, interchangeably using our
own previously predicted values of weather or melon yield as replacements, if the actual data
was unavailable.
Lasted Revised: 2/9/16 8:26 PM
7
Scotia Weather Services, Inc. – NSERC Engage Project Report
Figure 1 – Characterization of the auto-regressive approach taken to predict weather and
melon harvest values.
The energy provided by the sun to plants is well understood to be the dominant factor in their
growth and production of fruit. Specifically, the growing degree days (GDDs) metric is
considered the best indicator of plant growth in the agricultural field. GDD is defined as
[(Maximum temperature for day) + (Miniumum temperature for day)]/2 – (Base temperature);
where negative values are treated as zero, and “Base temperature” for cantaloupes it is 10°C
(varies somewhat for each crop). Most crops become ripe for harvesting once the GDDs
accumulated from the time of planting exceeds a certain value. Other factors that we assumed
would play a major role based on our literature survey were solar radiation, total precipitation
and relative humidity.
2.2 Data Understanding
The harvest information obtained from Vermeulen Farms is about produce from 2-3 fields
making up a total of about 30 acres. The data contained information about the number of
melons harvested at each size, however an analysis of the records indicate a number of
uncertainties in the breakdown of the total melon count. Vermeulen Farms recommended that
for this initial project that we focus on predicting the total number of melons independent of
size.
It was our understanding going into the project that the count of melons harvested was driven
largely by the plant yield in the field, and that the major problem for the project team was
predicting this yield. This is a factor of considerable importance that will be revisited later in
this report.
2.3 Data Preparation
Lasted Revised: 2/9/16 8:26 PM
8
Scotia Weather Services, Inc. – NSERC Engage Project Report
The data used in the project can be broadly divided into melon harvest data and weather data.
The initial preparation and analysis of these data sets was done independently, before
combining them.
2.3.1 Melon Harvest Data
The harvest records for the melons
spanned from 2007 – 2013. The data for
all years were provided on hand-written
sheets of paper, except for 2013 which
was transferred digitally as a spreadsheet.
All harvest data was entered and
combined into a single MS Excel
spreadsheet file. An initial Meta-Data
Report (MDR) was produced (See
Appendix Table A.1). Multiple emails and
phone calls between Acadia and
Vermeulen Farms occurred while the
harvest data was checked, screened for
errors and corrections made.
It was decided that the data from 2007 – 2009 was not sufficiently complete to be used for
model development. It was also decided that 2013 was an anomalous year of extremely poor
yield that should not be used. This analysis confirmed our decision to not break predictions
down by melon size. A final MDR was produced after this data was combined with the weather
data (see Appendix Table A.2).
Appendices A and B provides tables and graphs of the historical melon data for the years 20102012. The graphs of Appendix B show that the melon count data is not captured daily. Several
days of harvest may pass before total count is taken. For this reason we started graphing the
accumulated melon harvest over the season. Figure 2 presents the three years of harvest data
in this manner. It shows a wide variation in the harvest, with 2011 being a particularly poor
year.
Lasted Revised: 2/9/16 8:26 PM
9
Scotia Weather Services, Inc. – NSERC Engage Project Report
Figure 2 - Accumulated melons for 2010, 2011, and 2012 from July 1 to September 30
2.3.2 Weather Data
There were two sets of weather data; one was obtained from the publically available
Environment Canada records, and the other one was provided by NSCC from their Station 60
weather station.
The Environment Canada (EC) data was retrieved from their website and inspected for missing
data and errors. Any problems with the data were fixed using data Acadia had previously
acquired from the Kentville Agricultural Centre for other projects. A total of 1096 records
spanning 2010-2012 were used in the analysis to prepare for building the statistical models (see
Appendix A).
The NSCC data was received, and was inspected for missing data and errors. An initial MDR was
produced to summarize this data (see Appendix Table A.1). As this table shows, there was
enough missing NSCC weather data to warrant using the more complete EC data during this
initial analysis as well as for the construction of the models used for 2014 predictions.
2.4 Data Analysis
Our initial analysis of the data focused on understanding which variables were most important
to predict the harvest date of the melons. See the graphs in Appendix B that show the
accumulated melon counts graphed beside accumulated degree days, radiation, and and
precipitation.
The weather data that was available with high quality included:
- average air temperature
Lasted Revised: 2/9/16 8:26 PM
10
Scotia Weather Services, Inc. – NSERC Engage Project Report
-
maximum air temperature
minimum air temperature
heating degree days
cooling degree days
dew point temperature
relative humidity
wind direction
wind speed
total precipitation
Solar radiation appeared to be correlated, but the solar radiation data was missing for 2010 and
would most likely not be available as a forecasted value, so it was not used for building the
predictive models. It was suspected that soil moisture, water content and temperature would
be correlated with the melon harvest as well, but that data was not available with sufficient
quantity or quality to investigate further.
In the end, it was recommended that the prodictive models should use the previous 14 days’
weather data and melon counts, along with a forecast of the current day’s GDD. In this way,
forecast information could be acquired from multiple sources and used to compute the growing
degree days. Further details will be discussed in the next section.
2.5 Modeling
2.5.1 Model Structure
An auto-regressive method was used to develop models to predict the accumulated melon. To
predict the number of melons on day t, the autoregressive model accepted the day of the year,
the previous 14 days’ (days t-1 to t-14) weather data and melon data, and the weather forecast
data for day t. To produce a 14 day forecast (days t to t+13), the predicted melon harvest for
day t was used as an input to predict day t+1’s melon harvest, along with the weather forecast
data for day t+1. This process was repeated until all 14 days into the future had a prediction. In
this way, any source of weather containing the appropriate weather data could be used to
produce a 14 day melon harvest forecast.
This simple model would also predict the weather data of day t. So, even in the case where no
weather data (observed or forecasts) was available, and no melon harvest data was available,
the model could use the last recorded values to predict the next days value. Note that this
technique has the advantage of being able to predict beyond the 14 days of weather forecast,
but with a higher projected error in later predicted values.
2.5.2 Model Construction
Lasted Revised: 2/9/16 8:26 PM
11
Scotia Weather Services, Inc. – NSERC Engage Project Report
All of the models built were back-propagation neural networks. A standard approach was taken
to determine the best network architecture and configuration and learning parameters. We
settled on networks with 28 inputs, 50 hidden nodes, and 2 outputs. We used a learning rate of
0.001 and a momentum value of 0.9.
A serious consideration during this project was the lack of data; the harvest season spanned a
maximum of 90 days (July – September) of each year, and only the data from 2010, 2011, and
2012 was judged to be accurate. This gave a maximum of 270 examples with which to train and
evaluate the models. To deal with this small data set, we used cross validation. Cross validation
is a technique where multiple models are built from the same data, but the data that they are
tested on is always different, and never used to build that particular model. For these models
we used a 5-fold cross validation, which means that 5 models were built and tested, and we
divided the data into 5 groups of 20% of the data. We trained the models on 3 (60%) of those,
validated on 1 (20%) of them, and tested on the last 20%. If the results are displayed by year,
the values from the 5 test set are combined and then separated by year so each year has a
meaningful error.
2.5.2.1 Direct Melon Count Models
Initial models were built that used the direct counts (non-cumulative) values of the melons
picked and all the weather data from the previous 14 days as inputs, and predicted the melon
harvest for the current day. These models tended to have a daily RMSE of between 2500 and
3000 melons, and were very noisy, frequently predicting negative values of considerable
magnitude (See Figure 3). Clearly, this would not be informative to Vermuellem Farms.
Figure 3 - Comparison of the actual harvest for 2010 to the predictions of 5 models that used
only the current weather information
Lasted Revised: 2/9/16 8:26 PM
12
Scotia Weather Services, Inc. – NSERC Engage Project Report
2.5.2.2 Melon Accumulation Models
To decrease the noise/negative values in the models, a different approach was taken. The
melon harvest for every day was accumulated as a running total, which equaled the total
melons picked to date. This value is monotonically increasing, so it decreased the noise in the
data. New models were developed to predict the accumulated melon harvest at each day. The
difference between in the predictions of any two consecutive days could then be calculated as
the increase in melons to be harvested. As Figure 3 shows, this worked very well.
Figure 3 - Actual and predicted accumulated melons using only previous melon harvest as input.
2.6 Model Evaluation
To determine which of the weather variables are most important to the prediction of
accumulated melon count, models were built that used each of the weather variables
independently. The resulting models were compared to a persistence model that predicts the
next value to be the same as the current value. Models were also developed using the GDD for
each day (“10 Deg Days”) and the accumulated GDDs (“Deg Days Summed from May 5”).
The results are presented in Figure 4. Using only the previous accumulated melon harvest gave
the lowest error. The most important weather variables were found to be:
1) Accumulated GDD (summed from May 5th, the average planting date)
2) All variables together
3) Maximum temperature
Lasted Revised: 2/9/16 8:26 PM
13
Scotia Weather Services, Inc. – NSERC Engage Project Report
Figure 4 - RMSE for models built using various weather variables.
3. Overview of Work Completed During the NSERC Engage Project
The Acadia team worked with Vermeulen Farms, Nova
Scotia Community College, and Scotia Weather Services
throughout this project. Vermeulen Farms supplied their
records of the melon harvest for 2007 – 2013, which
included number of melons picked of various sizes. NSCC
supplied weather data daily from a local weather station
set up closer to the field than the Kentville-based
Environment Canada (EC) station. NSCC also provided
observed weather data for that station going back to 2007.
SWS provided daily a 14-day weather forecast for the
specific location of the field, which differed slightly from
EC’s forecast for Kentville.
The Acadia group worked with VF, NSCC and SWS to
complete the following research and development
activities:
1) To determine the best way formulate the data mining problem and what the most
important prediction variables were.
2) To build more accurate predictive models and test them live during the 2014 growing
season using a semi-automated delivery method that sends forecasts to Vermeulen
Farms by email. The predictive model will forecast daily harvest up to 14 days in
advance given the day of the year, observed and forecasted weather data and field
information as inputs.
3) To install and test automated weather and photography equipment in the field so as to
more accurately record field data during the 2014 season.
Lasted Revised: 2/9/16 8:26 PM
14
Scotia Weather Services, Inc. – NSERC Engage Project Report
4) Complete a project report detailing the quality of the predictions, and develop the
requirements for a fully automated forecasting system to be developed in the next
phase of the research.
4. Data Preparation and Analysis
The majority of the data preparation and analysis was done in the previous IRAP project (see
Sections 2.1 – 2.4). A review of this effort was conducted and a refinement of the data
representation was taken were necessary at the beginning of the Engage project
The final data that was used to develop the models used for predictions during the 2014 season
consisted of the Vermuellen Farms melon harvest data from 2010-2012 and matching observed
weather data from Environment Canada.
5. Development of Statistical Models to Forecast Future Melon Harvest
All of the models built were neural network models. As before the best network architecture
consisted of 28 inputs, 50 hidden nodes, and 2 outputs. We used a learning rate of 0.001 and a
momentum value of 0.9. The model used the previous 14 days’ accumulated melon harvest
and accumulated GDD to predict the next days’ accumulated melon harvest and accumulated
GDDs (which were never used). This model was then used 14 times to produce a 14 day melon
harvest forecast (see Section 2.1).
Figure 5 shows the cross validation results from 5 models built in this way. The models have
relatively low RMSE variation.
Figure 5 - RMSE of 5 fold cross validation models and the standard deviation for all years
combined and each year individually
Lasted Revised: 2/9/16 8:26 PM
15
Scotia Weather Services, Inc. – NSERC Engage Project Report
The models were tested to see how their accuracy varied over the 14 day prediction period,
and were compared to a basic persistence model that always predicted the previous day’s
value. Table 2 and Figure 6 shows these results.
Table 2 and Figure 6 - RMSE of a persistence model, and a regressive model using observed or
predicted weather data
A final production model was built from all of the 2010-2012 data using 80% of the data as the
training set, and the remaining 20% as the validation set. Table 3 and Figure 7 shows the
estimated error per day of this model based earlier models built using 60/20/20 splits of data.
This production model was used to make accumulated melon forecasts twice a week
throughout the harvest season of 2014.
Table 3 and Figure 7 - RMSE of the model used to make 2014 predictions for each of the 14
days
Lasted Revised: 2/9/16 8:26 PM
16
Scotia Weather Services, Inc. – NSERC Engage Project Report
6. Ongoing Acquisition of Data and Transmission of Melon Harvest
Forecasts
In order to produce a melon harvest forecast for day t, we required the past 14 days of weather
data (t-1 to t-14), a 14 day forecast of the weather (t to t+13), and some estimate of the past 14
days of the melon harvest (t-1 to t-14). The NSCC weather station 60 data was used whenever
possible as this was the closest station to the field. At times this data had to be replaced by
Environment Canada data. Scotia Weather Service weather forecast data was used for a
predictions accept for the first two reports.
A forecast of the expected number of melons in the field was produced approximately twice a
week from July 18 to September 12. The reports were made on the dates shown in Table 4, and
using the specified source information.
We received information about the melon harvest at VF twice during the 2014 harvest season,.
The last set of harvest records arrived on September 7, so it was not possible to process them
in time to be used to make predictions for September 9 and 12. Unfortunately, this meant that
for most of the season predictions of accumulated melons was being based on prior predictions
of accumulated melons.
Table 4 – List of dates and data on which melon forecasts were made.
6.1 Weather Information
The weather information can be divided into two sets of data; the observed GDD from day t-1
to t-14, and the predicted GDD calculated from the weather forecasts for day t to t+13:
6.1.1 Acquisition of Observed Weather Data
Lasted Revised: 2/9/16 8:26 PM
17
Scotia Weather Services, Inc. – NSERC Engage Project Report
The observed weather data came from one of two sources:
- NSCC Weather Station 60, which was located near the VF fields where the melons were
growing. This data was originally manually emailed to us roughly once a week, but later the
process was automated so we received updates daily.
- Environment Canada’s records from Kentville, which was located further from the field. We
obtained this data by downloading it directly from the EC website.
We preferred to use the NSCC data, but we would use the EC records if the NSCC values were
not available. The NSCC data was frequently unavailable during the first month of the harvest
but later improved that considerably, while the EC records were always available. Whichever
set of data was available was used to calculate the GDDs from the maximum and minimum
temperatures.
6.1.2 Acquisition of Weather Forecasts
The weather forecast data came from one of two sources:
- Scotia Weather Service, which provided us a forecast localized specifically to the VF fields.
We received these predictions daily as an automated email starting on July 29, and
continuing daily until we stopped producing melon forecasts.
- Environment Canada’s weather predictions for Kentville. Originally, we manually obtained
this data from their website only when needed, but in mid-August began to save this
information whenever possible.
The important data from the weather forecasts was the maximum and minimum temperatures,
which were used to calculate each days GDD. We preferred to use the SWS forecast to calculate
the GDD, but we would have used the EC forecast if the SWS forecast was not available, or we
could have used our own predicted GDD value.
6.2 Acquisition of Observed Melon Data
We has assumed that the observed melon data would be be provided to us on a frequent basis,
preferably biweekly with minimal delay, but that turned out to be very challenging for
Vermuellen Farms. We received melon harvest data twice during the entire harvest period; on
August 2 and September 7. This meant that we had to use our model’s previous predictions of
the melon harvest as input for future predictions for most forecasts that were issued (except
those shortly after August 2). The automated or semi-automated collection of harvest data is
subsequently a major area of improvement for any future projects in this area.
Lasted Revised: 2/9/16 8:26 PM
18
Scotia Weather Services, Inc. – NSERC Engage Project Report
6.3 Transmission of Melon Harvest Forecasts
Each 14 day accumulated melon harvest forecast was placed in a spreadsheet showing the
actual values and an easy to read graph that included estimated error bars. This spreadsheet
was then emailed to everyone involved in the project. Table 5 and Figure 8 shows an example
of one of the melon forecast reports sent out on August 15.
Table 5 and Figure 8 - Example of the predictions that were sent to all members of the project
6.4 Capture of Field Data
Two Bushnell HD automated cameras were purchased and installed in the field to monitor the
growth of the melons, record farm activities, and to log the local temperature for later
comparison with NSCC and EC data. Figure X shows one of the cameras in position next to the
field. Unfortunately, this camera was stolen in late September indicating a cost of this type of
technology in the field. The following are some of the images that were capture by the
cameras. <<more words needed here>> Also reference to timelaps video produced from
photos.
Lasted Revised: 2/9/16 8:26 PM
19
Scotia Weather Services, Inc. – NSERC Engage Project Report
Lasted Revised: 2/9/16 8:26 PM
20
Scotia Weather Services, Inc. – NSERC Engage Project Report
7. Evaluation of 2014 Predictions
This section provides an evaluation of how well the predictive models performed over the 2014
harvest season.
7.1 Melon Harvest Predictions
After we had received the final results of the melon harvest, we were able to compare our
accumulated melon predictions over the growing season to the actual records. Figure 9
compares the actual melons harvested to our predictions of accumulated melons.
Figure 9 - Comparison of the actual number of melons harvested versus the our predictions of
the number available, from July 18 to September 23
It is important to note two things about the 2014 harvest:
Lasted Revised: 2/9/16 8:26 PM
21
Scotia Weather Services, Inc. – NSERC Engage Project Report
(1) The final set of records we received from VF included both the daily harvest records and
the weekly sales records. Unfortunately, the records do not match up, with almost
10,000 more melons reported sold than were harvested by the end of August. It seems
likely that at least some of the harvest information may not have been received,
particularly because there was a 10 day period (August 9 – 18) in during which no
melons were reported harvested. Figure 10 shows the difference in the this data. We
have base the following analysis on the harvest data received.
(2) We estimate that 50% of the melons yielded by the plants remained in the field and
were never picked. This was due to market demand. Figure 11 compares the 2014
harvest to the previous three years used to build the predictive model. Comparing
Figure 9 and 11, it is clear that our model was predicting something closer to the true
yield of the melon field versus the total number of melons that the market would
accommodate.
Figure 10 – Weekly harvest and sales information from July 13 to Aug 30
Lasted Revised: 2/9/16 8:26 PM
22
Scotia Weather Services, Inc. – NSERC Engage Project Report
Figure 11 - Accumulated melons for 2010, 2011, 2012, and 2014 from July 1 to September 30
Figure 12 compares the results of predicting the melon harvest using the actual weather, SWS’s
forecast, and EC’s forecast. Unsurprisingly, the actual weather had the least error.
Figure 12 - RMSE of the model when evaluated over the 2014 growing season, using actual
weather data and predicted weather data from Scotia Weather Services and Environment
Canada
7.2 Weather Predictions
Observing the difference in accumulated melon predictions as a function of weather data, we
decide to compare the accuracy of the weather forecasted by SWS and EC to the weather
recorded at the NSCC and Kentville weather stations. Figures 13, 14 and 15 show the results.
Figure 13 - RMSE of Environment Canada/Scotia Weather Services predictions of maximum
daily temperature versus Environment Canada/NSCC Station 60 observed data
Lasted Revised: 2/9/16 8:26 PM
23
Scotia Weather Services, Inc. – NSERC Engage Project Report
Figure 14 - RMSE of Environment Canada/Scotia Weather Services predictions of minimum daily
temperature versus Environment Canada/NSCC Station 60 observed data
Figure 15 - RMSE of Environment Canada/Scotia Weather Services predictions of degree days
versus Environment Canada/NSCC Station 60 observed data
SWS NWP prediction system was totally automated, so error, particularly in the minimum daily
temperature, may have come from several different sources. Possibilities included the amount
of cloud that occurred, or that the soil profile may have been different than expected resulting
in different night time cooling. SWS recommends future investigation of the temperature
prediction anomalies.
8. Summary and Recommendations
Lasted Revised: 2/9/16 8:26 PM
24
Scotia Weather Services, Inc. – NSERC Engage Project Report
8.1 Summary
The goal of this project was to develop and improve on models that can predict the number of
cantaloupe melons available to harvest. The Acadia group worked with VF, NSCC, and SWS to
complete four research and development activities:
8.1.1 Determine Prediction Variables
We determined that the GDD was the most important of easily accessible weather variables,
and that adding in additional variables did not seem to improve results. Although we suspected
that soil and solar data would be helpful, we were unable to find a reliable source of observed
data for the location.
8.1.2 Build and Test Predictive Models
We built multiple models to try to improve our predictions, and eventually we had models that
had similar 60/20/20 cross validation accuracies. We then took 80% of the data as training and
20% as a validation set and built a final model that would be used to generate predictions for
2014. We estimated the error in the predictions based on the average error of the cross
validation models.
We took the weather predictions that we obtained from either EC or SWS and produced melon
forecasts from mid-July until the end of September. Each forecast used the previous 14 days of
GDDs and melon harvests to predict the melon harvest for the current day and another 13 days
into the future. We produced 15 forecasts from July 18 to September 12, which covered most
of the 2014 growing season. The forecasts were emailed out to all participants in the project.
We received the actual harvest information infrequently, so most of our predictions used our
previous predictions as inputs.
8.1.3 Install and Test Equipment
Originally we had planned to install a small weather monitoring station (temperature logger) in
one of VF’s fields, and place several cameras to record the growth of the melons throughout
the summer. We did purchase and install two automated cameras that recorded local
temperature on the images for much of the harvest season. Unfortunately, one of the cameras
was stolen in late September taken with it images of the later part of the harvest season.
[We need to complete a comparison of camera temperature to EC, NSCC and SWS
temperatures]
Lasted Revised: 2/9/16 8:26 PM
25
Scotia Weather Services, Inc. – NSERC Engage Project Report
8.1.4 Evaluate Predictions for 2014
Throughout the time we were producing melon forecasts we also recorded our predictions of
the melon harvest and the weather predictions we received. The evaluation of the melon
harvest predictions were hampered by the lack of actual harvest data, so the analysis did not
yield results until after we had halted our forecasting. The weather evaluation also primarily
took place after we had stopped generating forecasts.
The melon harvest evaluation showed that we dramatically overestimated the number of
melons picked; however, the actual harvest data was the number actually picked, not the
number available. At several points during the summer, the market demand was low enough
that no melons were picked even though there were some available. We also found that our
predictions became less accurate as we predicted multiple days into the future, but that the
accuracy actually improved beyond 10 days. The cause for this is uncertain, but it may be
related to the weather forecasts that were used as inputs to our models.
The evaluation of the weather forecast data that we received showed that consistently the EC
predictions were better than those of SWS, when compared to either of the locations the
weather data was recorded (EC or NSCC stations). The SWS predictions generally increased in
error over all 14 days of predictions, while the EC prediction error plateaued or even decreased
after approximately 10 days out. The larger SWS prediction errors may have been because the
SWS system was completely automated, and they recommended further investigation.
8.2 Recommendations
-
We were predicting crop yield. But harvest was driven largely by the market. Next year VF
will not grow as many melons. Consider major issue for farmers – is market model a need.
Use cameras in field to collect agricultural activities and interventions
Place weather monitoring station in field to capture local observed weather and calibrate
weather forecasts
Consider better ways to record actual harvest information daily – ground truth for target
values
Determine additional markets for harvest forecasting
What does Andy feel about the utility of this system?
Lasted Revised: 2/9/16 8:26 PM
26
Scotia Weather Services, Inc. – NSERC Engage Project Report
9. Requirements for Automated Forecasting System
??
10.
References
[1] S. Veeenadhari, Bharat Misra, and C.D. Singh. Data Mining Techniques fro
Predicting Crop Productivity – A Review Article. IJCST 2:1, p. 98-100, 2011.
[2] S. Jenni, K. Stewart, G. Bourgeois and D. Cloutier. Predicting Yield and Time to
Maturity of. Muskmelons from Weather and Crop Observations. J. Amer. Soc. Hort. Sci.
123:2, p. 195–201. 1998.
[3] Moh. Hajiyan. Early Prediction of Crop Yield. Technical Report, School of
Engineering, University of Guelph, 2012.
[4] Raju Paaswan and Shahin Begum. Regression and Neural Networks Models for
Prediction of Crop Production. Int. J. of Science and Engineering Research, 14:9, p.9810, 2013.
[5] Byoug-Hoon Lee, Phil Kenkel and B.Wade Brorsen. Pre-harvest Forecasting of
County Wheat Yield and Wheat Quality using Weather Information. Agriculture and
Forest Meteoroogy, 2013:168, p.26-35, 2012.
[6] Jiri Statstny, Valdimir Konecny, and Oldrich Trenz. Agricultural Data Prediction by
Means of Neural Network. Agric. Econ. – Czech, 57:7, p. 356-361, 2011.
[7] S. Jenni, D. Cloutier, G. Bourgeois and K. Stewart. A Heat Unit Model to Predict
Growth and Development of Muskmelon to Anthesis of Perfect Flowers
J. Amer. Soc. Hort. Sci. 121:2, p. 274–280. 1996.
[8] J.T. Baker, D.I. Leskovar, V.R. Reddy and F.J. Dainello. A Simple Phenological
Model of Muskmelon Development. Annals of Botany 87, p.615-621, 2001.
[9] J.T. Baker and V. R. Reddy. Temperature Effects on Phenological Development and
Yield of Muskmelon. Annals of Botany 87, p.605-613, 2001.
Lasted Revised: 2/9/16 8:26 PM
27
Scotia Weather Services, Inc. – NSERC Engage Project Report
Appendices
A. Meta Data Reports
Table A.1 - Combined MDR for Kentville Weather and Melon Harvest Data
Table A.2 - Initial MDR for Station 60 Weather Data from NSCC
Lasted Revised: 2/9/16 8:26 PM
28
Scotia Weather Services, Inc. – NSERC Engage Project Report
Table A.3 - Missing Data from Station 60 Weather Data from NSCC
B. Observed Melon and Weather Data for 2010 - 2012
Lasted Revised: 2/9/16 8:26 PM
29
Scotia Weather Services, Inc. – NSERC Engage Project Report
Figure B.1 – 2010 melon and weather data
Figure B.2 – 2011 melon and weather data
Lasted Revised: 2/9/16 8:26 PM
30
Scotia Weather Services, Inc. – NSERC Engage Project Report
Figure B.2 – 2012 melon and weather data
C. Model Building Results
Results from Accumulation Models
Table C.1 - RMSE for models built using various weather variables
Table C.2 - RMSE of 5 fold cross validation models for all years combined and each year
individually
Lasted Revised: 2/9/16 8:26 PM
31
Scotia Weather Services, Inc. – NSERC Engage Project Report
D. 2014 Melon Prediction Accuracy
???
E. 2014 Weather Prediction Accuracy
Table E.1 - RMSE of Environment Canada/Scotia Weather Services predictions of daily
temperature extremes and degree days versus Environment Canada/NSCC Station 60 observed
data
***What other data should be here?***
Lasted Revised: 2/9/16 8:26 PM
32
Download