Failure can be good. I don’t mean in the “learn from your
mistakes” kind of way. Or in the “fail year after year to even put up a good
fight in the Big
Game” kind of way. But failure is often the sidekick of innovation and
risk-taking. In places with lots of innovation, like Silicon Valley, failure
doesn’t carry the stigma it has elsewhere, because people understand that
being new and different is the best way to innovate, but also the best way to
fail.
Another area where failure can be really useful is making
predictions. Not that bad predictions are useful in themselves, but they can be
useful if they are bad in different ways from other predictions. Averaging
those predictions together can then result in something quite good, the same way
a chorus can sound good even if each person is singing off key.
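To make the chorus point concrete, here is a toy sketch (all numbers made up, nothing from any real model or dataset): thirty hypothetical "models" each predict yields at fifty sites, each with its own bias and its own independent noise, and the median across models ends up closer to the truth than even the best single model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_sites = 30, 50

truth = rng.uniform(3.0, 8.0, size=n_sites)             # hypothetical yields, t/ha

# Each "model" is the truth plus its own bias and its own independent noise,
# i.e., wrong, but wrong in a different way than the others.
bias = rng.normal(0.0, 0.5, size=(n_models, 1))
noise = rng.normal(0.0, 1.0, size=(n_models, n_sites))
predictions = truth + bias + noise                       # shape (n_models, n_sites)

def rmse(pred):
    """Root-mean-square error against the true yields, over sites."""
    return np.sqrt(np.mean((pred - truth) ** 2, axis=-1))

model_rmse = rmse(predictions)                           # one score per model
ensemble_rmse = rmse(np.median(predictions, axis=0))     # score of the multi-model median

print(f"best individual model RMSE: {model_rmse.min():.2f}")
print(f"multi-model median RMSE:    {ensemble_rmse:.2f}")
```

The catch, of course, is that the averaging only helps if the errors really are different; thirty models that all share the same blind spot just sing off key in unison.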
We have a paper
out today that provides another example of this, in the context of
predicting wheat yield responses to warming. By “we” I mean a group of authors
who contributed 30 different models (29 “process-based” models and 1
statistical model by yours truly), led by Senthold Asseng at U Florida. I think
a few of the results are pretty interesting. For starters, the study used previously
unpublished data that includes experiments with artificial heating, so the
models had to predict not only average yields but responses to warming when
everything else was held constant. Also, the study was designed in phases so
that models were first asked to make predictions knowing only the sowing date and
weather data for each experiment. Then they were allowed to calibrate to
phenology (flowering and maturity dates), then to some of the yield data. Not
only was the multi-model median better than any individual model, but it didn’t
really improve much as the models were allowed to calibrate more closely (although
the individual model predictions themselves improved). This suggests that using
lots of models makes it much less important to have each model try as hard as
it can to be right.
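For the curious, the comparison itself is simple to set up. Here is a rough sketch of how one might tabulate it at each calibration phase (blind, phenology-calibrated, yield-calibrated); the function names and data layout are mine for illustration, not the study’s actual code.

```python
import numpy as np

def rmse(pred, obs):
    """Root-mean-square error over the last axis (the experiments)."""
    return np.sqrt(np.mean((pred - obs) ** 2, axis=-1))

def summarize_phase(predictions, observed):
    """predictions: (n_models, n_experiments) yields for one calibration phase."""
    per_model = rmse(predictions, observed)                    # each model on its own
    ensemble = rmse(np.median(predictions, axis=0), observed)  # the multi-model median
    return {
        "best_model_rmse": float(per_model.min()),
        "typical_model_rmse": float(np.median(per_model)),
        "ensemble_median_rmse": float(ensemble),
    }

# Hypothetical usage, with one prediction matrix per calibration phase:
# phases = {"blind": preds0, "phenology": preds1, "yield": preds2}
# for name, preds in phases.items():
#     print(name, summarize_phase(preds, observed_yields))
```

The interesting pattern in the study was that the ensemble-median score barely moves across phases, even as the individual model scores improve with more calibration.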
A couple of other aspects were particularly interesting to
me. One was how well the one statistical model did relative to the others (but
you’ll have to dig into the supplement to see that). Another was that, when the
multi-model median was used to look at responses to both past and projected
warming at 30 sites around the world, 20 of the 30 sites showed negative
impacts of past warming (for 1981-2010, see figure below). That agrees well with
our high-level statement in the IPCC report that negative impacts of warming
have been more common than positive ones. (I actually pushed to add this
analysis after I got grilled at the plenary meeting about our statement being
based on too few data points. I guess this is an example of policy-makers feeding
back to science).
Add this recent paper to a bunch of others showing how
multi-model medians perform well (like here,
here,
and here),
and I think the paradigm for prediction in the cropping world has shifted to
using at least 5 models, probably more. So what’s the paradox part? From my
perspective, the main one is that there’s not a whole lot of incentive for
modeling groups to participate in these projects. Of course there’s the benefit
of access to new datasets, and insights into processes that can be had by
comparing models. But they take a lot of time, groups generally are not
receiving funds to participate, there is not much intellectual innovation to be
found in running a bunch of simulations and handing them off, and the resulting
publications have so many authors that most of them get very little credit. In
short, I was happy to participate, especially since I had a wheat model ready
from a previous paper, but it was a non-trivial amount of work and I don’t
think I could advise one of my students to get heavily involved.
So here’s the situation: making accurate predictions is one
of the loftiest goals of science, but the incentives to pursue the best way to make predictions (multi-model ensembles) are slim in most
fields. The main areas I know of with long-standing examples of predictions
using large ensembles are weather (including seasonal forecasts) and political
elections. In both cases the individual models are run by agencies or polling
groups with specific tasks, not scientists trying to push new research frontiers.
In the long term, I’d guess the benefit of better crop predictions on seasonal
to multi-decadal time scales would be worth the investment by USDA and
its counterparts around the world to operationalize multi-model approaches. But relying on the existing incentives for
research scientists doesn’t seem like a sustainable model.