Failure can be good. I don’t mean in the “learn from your mistakes” kind of way. Or in the “fail year after year to even put up a good fight in the Big Game” kind of way. I mean that failure is often the sidekick of innovation and risk-taking. In places with lots of innovation, like Silicon Valley, failure doesn’t carry the stigma it has elsewhere, because people understand that being new and different is the best way to innovate, but also the best way to fail.
Another area where failure can be really useful is prediction. Bad predictions aren’t useful in themselves, but they can be useful if they are bad in different ways from other predictions. Averaging them together can then produce something quite good, the same way a chorus can sound good even if each person is singing off key.
We have a paper out today that provides another example of this, in the context of predicting wheat yield responses to warming. By “we” I mean a group of authors who contributed 30 different models (29 “process-based” models and 1 statistical model by yours truly), led by Senthold Asseng at the University of Florida. I think a few of the results are pretty interesting. For starters, the study used previously unpublished data that included experiments with artificial heating, so the models had to predict not only average yields but also responses to warming with everything else held constant. Also, the study was designed in phases: models were first asked to make predictions knowing only the sowing date and weather data for each experiment, then they were allowed to calibrate to phenology (flowering and maturity dates), and then to some of the yield data. Not only was the multi-model median better than any individual model, but it didn’t improve much as the models were allowed to calibrate more closely (even though the individual model predictions themselves improved). This suggests that using lots of models makes it much less important for each model to try as hard as it can to be right.
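Here is a minimal Python sketch of why the multi-model median can beat every individual model. The numbers are made up and have nothing to do with the paper’s data or models: five hypothetical models (model_a through model_e) each predict yield at four hypothetical sites, and each is off by a few tenths of a t/ha, but in a different direction at different sites. The ensemble is the site-by-site median across models, as in the study; scoring by root-mean-square error (RMSE) is an assumption chosen just for this illustration.

```python
import math
from statistics import median

# Hypothetical observed yields (t/ha) at four made-up experimental sites.
observed = [6.0, 4.5, 7.2, 3.8]

# Hypothetical predictions from five made-up models (one list per model, one
# entry per site). Each model errs by a few tenths of a t/ha, and the sign of
# the error varies from site to site and from model to model.
predictions = {
    "model_a": [6.6, 4.3, 7.7, 3.1],
    "model_b": [5.5, 5.2, 7.1, 4.2],
    "model_c": [6.1, 3.9, 7.8, 4.0],
    "model_d": [5.7, 4.8, 6.5, 3.7],
    "model_e": [6.4, 4.6, 6.8, 4.4],
}


def rmse(pred, obs):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def ensemble_median(preds_by_model):
    """Multi-model median: at each site, take the median across all models."""
    sites = zip(*preds_by_model.values())  # regroup the values by site
    return [median(site) for site in sites]


model_rmse = {name: rmse(p, observed) for name, p in predictions.items()}
ens = ensemble_median(predictions)
ens_rmse = rmse(ens, observed)

for name, err in sorted(model_rmse.items()):
    print(f"{name}: RMSE = {err:.2f} t/ha")
print(f"multi-model median: RMSE = {ens_rmse:.2f} t/ha")
```

With these invented numbers the median’s RMSE comes out to about 0.13 t/ha, versus roughly 0.41 to 0.53 t/ha for the individual models. The effect depends on the errors partly cancelling: if all five models were biased in the same direction at a site, the median would inherit that bias.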
A couple of other aspects were particularly interesting to me. One was how well the one statistical model did relative to the others (but you’ll have to dig into the supplement to see that). Another was that, when the multi-model median was used to look at responses to both past and projected warming at 30 sites around the world, 20 of the 30 sites showed negative impacts of past warming (for 1981-2010, see figure below). That agrees well with our high-level statement in the IPCC report that negative impacts of warming have been more common than positive ones. (I actually pushed to add this analysis after I got grilled at the plenary meeting about our statement being based on too few data points. I guess this is an example of policymakers feeding back into science.)
Add this recent paper to a bunch of others showing how well multi-model medians perform (like here, here, and here), and I think the paradigm for prediction in the cropping world has shifted to using at least 5 models, probably more. So what’s the paradox part? From my perspective, the main one is that there’s not a whole lot of incentive for modeling groups to participate in these projects. Of course there are benefits: access to new datasets, and insights into processes that come from comparing models. But these projects take a lot of time, groups generally don’t receive funding to participate, there is not much intellectual innovation in running a bunch of simulations and handing them off, and the resulting publications have so many authors that most of them get very little credit. In short, I was happy to participate, especially since I had a wheat model ready from a previous paper, but it was a non-trivial amount of work and I don’t think I could advise one of my students to get heavily involved.
So here’s the situation: making accurate predictions is one of the loftiest goals of science, but the incentives to pursue the best way to make predictions (multi-model ensembles) are slim in most fields. The main areas I know of with long-standing examples of predictions using large ensembles are weather (including seasonal forecasts) and political elections. In both cases the individual models are run by agencies or polling groups with specific tasks, not by scientists trying to push new research frontiers. In the long term, I’d guess the benefit of better crop predictions on seasonal to multi-decadal time scales would be worth the investment by the USDA and its counterparts around the world to operationalize multi-model approaches. But relying on the existing incentives for research scientists doesn’t seem like a sustainable model.