(But not as much fun as watching Stanford dominate Oregon
last night).
In a recent post I discussed the potential of multivariate
adaptive regression splines (MARS) for crop analysis, particularly because they
offer a simple way of dealing with asymmetric and nonlinear relationships. Here
I continue from where I left off, so see the previous post first if you haven’t
already.
Using the APSIM simulations (for a single site) to train
MARS resulted in the selection of four variables. One of them was related to
radiation, for which we don’t have good data, so here I will use just the other
three, which were related to July Tmax, May-August Tmax, and May-August precipitation.
Now, the key point is that we are not using those variables as the predictors
themselves, but instead hinge functions based on them. The figure below shows
the specific thresholds I am using (based on the MARS results from the
previous post) to define the hinge basis functions.
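For anyone who wants to try this, here is a minimal Python sketch of what a hinge basis function looks like. The thresholds (knots) and the direction of each hinge below are hypothetical placeholders chosen for illustration; the values I actually used are the ones in the figure, and the helper names are mine.

```python
import numpy as np

def hinge_above(x, knot):
    """max(0, x - knot): zero at or below the knot, linear above it."""
    return np.maximum(0.0, np.asarray(x, dtype=float) - knot)

def hinge_below(x, knot):
    """max(0, knot - x): zero at or above the knot, linear below it."""
    return np.maximum(0.0, knot - np.asarray(x, dtype=float))

# HYPOTHETICAL knots, for illustration only (not the values from the figure).
JULY_TMAX_KNOT = 30.0    # deg C
SEASON_PREC_KNOT = 450.0  # mm, May-August total

july_tmax = np.array([27.5, 30.0, 33.0])
season_prec = np.array([380.0, 450.0, 520.0])

# Only July Tmax above the knot contributes to this predictor.
print(hinge_above(july_tmax, JULY_TMAX_KNOT).tolist())   # [0.0, 0.0, 3.0]
# Only precipitation shortfalls below the knot contribute to this one.
print(hinge_below(season_prec, SEASON_PREC_KNOT).tolist())
```

The point of the hinge is the asymmetry: a month that is 3 °C above the knot counts, while a month that is 3 °C below it contributes nothing.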
I then compute these predictor values for each county-year
observation in a panel dataset of US corn yields, then subtract county means
from all variables (equivalent to introducing county fixed effects), and fit three
different regression models:
Model 1: Just
quadratic year trends (log(Yield) ~ year + year^2). This serves as a reference “no-weather”
model.
Model 2: log(Yield)
~ year + year^2 + GDD + EDD + prec + prec^2. This model adds the
predictors we normally use based on Wolfram and Mike’s 2009 paper, with GDD and
EDD meaning growing degree days between 8 and 29 °C and extreme degree days (above
29 °C). Note these measures rely on daily Tmin and Tmax data to compute the
degree days.
Model 3: log(Yield)
~ year + year^2 + the three predictors
shown in the figure above. Note these are based only on monthly average Tmax or
total precipitation.
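The county-demeaning step and the regression fit described above can be sketched as follows. This is only an illustration on synthetic data, assuming plain least squares with NumPy; the function names, the fake predictors, and the coefficients are all made up, and this is not the code behind the numbers in this post.

```python
import numpy as np

def demean_by_group(values, groups):
    """Subtract each group's mean from its rows (works for 1-D or 2-D input)."""
    values = np.asarray(values, dtype=float)
    _, idx = np.unique(groups, return_inverse=True)
    idx = idx.ravel()
    sums = np.zeros((idx.max() + 1,) + values.shape[1:])
    np.add.at(sums, idx, values)  # per-group column sums
    counts = np.bincount(idx).reshape((-1,) + (1,) * (values.ndim - 1))
    return values - (sums / counts)[idx]

def fit_within_ols(log_yield, X, county):
    """Demean y and X by county, then fit ordinary least squares."""
    y_d = demean_by_group(log_yield, county)
    X_d = demean_by_group(X, county)
    coef, *_ = np.linalg.lstsq(X_d, y_d, rcond=None)
    rmse = np.sqrt(np.mean((y_d - X_d @ coef) ** 2))
    return coef, rmse

# Synthetic, noise-free panel: 20 counties x 30 years (all values made up).
rng = np.random.default_rng(0)
n_counties, n_years = 20, 30
county = np.repeat(np.arange(n_counties), n_years)
year = np.tile(np.arange(n_years, dtype=float), n_counties)
hinges = rng.normal(size=(county.size, 3))  # stand-ins for the three hinge predictors
X = np.column_stack([year, year**2, hinges])
true_coef = np.array([0.02, -0.0002, -0.05, -0.03, -0.01])
county_effect = rng.normal(size=n_counties)[county]
log_yield = county_effect + X @ true_coef

coef, rmse = fit_within_ols(log_yield, X, county)
print(np.round(coef, 4), rmse)
```

Because the county effect is constant within each county, subtracting county means removes it, which is why the fit recovers the coefficients without ever estimating a dummy variable per county.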
The table below shows the calibration error as well as the mean
out-of-sample error for each model. What’s interesting here is that the model
with the three hinge functions performs just as well as (actually even a little
better than) the one based on degree day calculations. This is particularly
surprising since the hinge functions (1) use only monthly data and (2) were derived from simulations at a single
site in Iowa. Apparently they are representative enough to result in a pretty
good model for the entire rainfed Corn Belt.
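| Model | Calibration R2 | Average root mean square error for calibration | Average root mean square error for out-of-sample data (for 500 runs) | % reduction in out-of-sample error |
|-------|----------------|------------------------------------------------|----------------------------------------------------------------------|------------------------------------|
| 1     | 0.59           | 0.270                                          | 0.285                                                                | --                                 |
| 2     | 0.66           | 0.241                                          | 0.259                                                                | 8.9                                |
| 3*    | 0.68           | 0.235                                          | 0.254                                                                | 10.7                               |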
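For the out-of-sample column, here is a rough sketch of one way to get an average out-of-sample RMSE over repeated runs, plus the percent reduction relative to the no-weather model. The split scheme is an assumption on my part for this sketch (a random 20% of observations held out each run); holding out whole years would be another reasonable choice. It also assumes `y` and `X` have already been county-demeaned, and the data here are synthetic.

```python
import numpy as np

def out_of_sample_rmse(y, X, n_runs=500, test_frac=0.2, seed=0):
    """Mean test RMSE over repeated random train/test splits (ASSUMED scheme)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = max(1, int(round(test_frac * n)))
    rmses = np.empty(n_runs)
    for r in range(n_runs):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ coef
        rmses[r] = np.sqrt(np.mean(resid**2))
    return rmses.mean()

def percent_reduction(baseline_rmse, model_rmse):
    """Percent drop in error relative to the baseline model."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

# Synthetic example: a "baseline" using one predictor vs. a "full" model using two.
rng = np.random.default_rng(1)
X_full = rng.normal(size=(300, 2))
y = X_full @ np.array([0.3, -0.2]) + 0.1 * rng.normal(size=300)
rmse_base = out_of_sample_rmse(y, X_full[:, :1], n_runs=50)
rmse_full = out_of_sample_rmse(y, X_full, n_runs=50)
print(rmse_base, rmse_full, percent_reduction(rmse_base, rmse_full))
```

Applied to the rounded RMSE values in the table above, `percent_reduction(0.285, 0.259)` gives about 9.1 and `percent_reduction(0.285, 0.254)` about 10.9, close to the reported 8.9 and 10.7; the small gap is presumably because the table shows rounded errors.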
The take-home for me here is that even a few predictors
based on monthly data can tell you a lot about crop yields, BUT it’s important
to account for asymmetries. Hinge functions let you do that, and process-based crop
models can help identify the right hinge functions (although there are probably
other ways to do that too).
So I think this is overall a promising approach:
one could use selected crop model simulations from around the world, such as
those out of AgMIP, to identify hinge functions for different cropping systems,
and then use these to build robust and simple empirical models for actual
yields. Alas, I probably won’t have time to develop it much in the foreseeable
future, but hopefully this post will inspire something.