Monday, December 22, 2014

Prettiest pictures of 2014

The next person who says "big data" puts a fiver in the "most overused terms in meetings in the year 2014" jar. I am excited about the opportunities of ever larger micro datasets, but even more thrilled by how much thought is going into the visualization of these datasets. One of my favorite macho nerd blogs, Gizmodo, just put up a number of the best data visualizations of 2014. If you also think that these are oh so purdy, come take Sol's class at GSPP, where he will teach you how to make graphs worthy of this brave new world.


Source: Gizmodo. 

Predictions, paradigms, and paradoxes

Failure can be good. I don’t mean in the “learn from your mistakes” kind of way. Or in the “fail year after year to even put up a good fight in the Big Game” kind of way. But failure is often the sidekick of innovation and risk-taking. In places with lots of innovation, like Silicon Valley, failure doesn’t have the stigma it has in other places. Because people understand that being new and different is the best way to innovate, but also the best way to fail.

Another area where failure can be really useful is in making predictions. Not that bad predictions are useful in themselves, but they can be useful if they are bad in different ways than other predictions. Then averaging predictions together can result in something quite good, because errors that point in different directions partly cancel. The same way that a chorus can sound good even if each person is singing off key.

We have a paper out today that provides another example of this, in the context of predicting wheat yield responses to warming. By “we” I mean a group of authors that contributed 30 different models (29 “process-based” models and 1 statistical model by yours truly), led by Senthold Asseng at U Florida. I think a few of the results are pretty interesting. For starters, the study used previously unpublished data that includes experiments with artificial heating, so the models had to predict not only average yields but also responses to warming when everything else was held constant. Also, the study was designed in phases so that models were first asked to make predictions knowing only the sowing date and weather data for each experiment. Then they were allowed to calibrate to phenology (flowering and maturity dates), then to some of the yield data. Not only was the multi-model median better than any individual model, but it didn’t really improve much as the models were allowed to calibrate more closely (although the individual model predictions themselves improved). This suggests that using lots of models makes it much less important to have each model try as hard as it can to be right.
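To make the chorus idea concrete, here is a toy simulation. It is only a sketch, not the analysis from the paper: the "true" yields, the per-model biases, and the noise levels are all made-up numbers, and `simulate` and `rmse` are names I invented for this illustration. Each hypothetical model is wrong in its own way (its own bias plus its own random noise), and we compare each model's error to the error of the ensemble mean and median.

```python
import random
import statistics


def rmse(predictions, truths):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - t) ** 2 for p, t in zip(predictions, truths)]
    return (sum(squared) / len(squared)) ** 0.5


def simulate(n_models=30, n_cases=200, seed=0):
    """Return (individual RMSEs, ensemble-mean RMSE, ensemble-median RMSE).

    All numbers are synthetic assumptions, not data from any experiment.
    """
    rng = random.Random(seed)
    # Hypothetical "observed" yields, one per experiment.
    truths = [rng.uniform(2.0, 10.0) for _ in range(n_cases)]
    # Each model is off key in its own way: a persistent bias...
    biases = [rng.gauss(0.0, 1.0) for _ in range(n_models)]
    # ...plus independent noise on every single prediction.
    preds = [[t + b + rng.gauss(0.0, 1.0) for t in truths] for b in biases]

    individual = [rmse(p, truths) for p in preds]
    # Regroup so each tuple holds all models' predictions for one case.
    by_case = list(zip(*preds))
    ens_mean = rmse([statistics.fmean(c) for c in by_case], truths)
    ens_median = rmse([statistics.median(c) for c in by_case], truths)
    return individual, ens_mean, ens_median


individual, ens_mean, ens_median = simulate()
print(f"best single model RMSE: {min(individual):.2f}")
print(f"typical single model RMSE: {statistics.median(individual):.2f}")
print(f"ensemble mean RMSE: {ens_mean:.2f}")
print(f"ensemble median RMSE: {ens_median:.2f}")
```

The ensemble mean can never have a larger RMSE than the average of the individual RMSEs; that part is just the triangle inequality. Beating the single best model, as the ensemble does here, depends on the errors being reasonably independent across models, which is an assumption built into this toy setup and not something the simulation proves about real crop models.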

A couple of other aspects were particularly interesting to me. One was how well the one statistical model did relative to the others (but you’ll have to dig in the supplement to see that). Another was that, when the multi-model median was used to look at responses to both past and projected warming at 30 sites around the world, 20 of the 30 sites showed negative impacts of past warming (for 1981-2010, see figure below). That agrees well with our high-level statement in the IPCC report that negative impacts of warming have been more common than positive ones. (I actually pushed to add this analysis after I got grilled at the plenary meeting on our statement being based on too few data points. I guess this is an example of policy-makers feeding back to science).

Add this recent paper to a bunch of others showing how multi-model medians perform well (like here, here, and here), and I think the paradigm for prediction in the cropping world has shifted to using at least 5 models, probably more. So what’s the paradox part? From my perspective, the main one is that there’s not a whole lot of incentive for modeling groups to participate in these projects. Of course there’s the benefit of access to new datasets, and insights into processes that can be had by comparing models. But they take a lot of time, groups generally are not receiving funds to participate, there is not much intellectual innovation to be found in running a bunch of simulations and handing them off, and the resulting publications have so many authors that most of them get very little credit. In short, I was happy to participate, especially since I had a wheat model ready from a previous paper, but it was a non-trivial amount of work and I don’t think I could advise one of my students to get heavily involved.


So here’s the situation: making accurate predictions is one of the loftiest goals of science, but the incentives to pursue the best way to make predictions (multi-model ensembles) are slim in most fields. The main areas I know of with long-standing examples of predictions using large ensembles are weather (including seasonal forecasts) and political elections. In both cases the individual models are run by agencies or polling groups with specific tasks, not scientists trying to push new research frontiers. In the long term, I’d guess the benefit of better crop predictions on seasonal to multi-decadal time scales would probably be worth the investment by USDA and their counterparts around the world to operationalize multi-model approaches. But relying on the existing incentives for research scientists doesn’t seem like a sustainable model.

Wednesday, December 10, 2014

Adapting to extreme heat

Since we are nearing the holidays, I figured I should write something a bit more cheerful and encouraging than my standard line on how we are all going to starve. My coauthor Michael Roberts and I have emphasized for a while the detrimental effect of extreme heat on corn yields and the implications for a warming planet. When we looked at the sensitivity to extreme heat over time, we found an improvement (i.e., less susceptibility) roughly around the time hybrids were introduced in the 30s, but that improvement soon vanished again around the 1960s. Corn is as susceptible to heat now as it was in 1930. Our study simply allowed the effect of extreme heat to vary smoothly across time, but wasn't tied to a particular event.

David Popp has been working a lot on innovation and he suggested looking at the effect of hybrid corn adoption on the sensitivity to extreme heat in more detail. Richard Sutch had a nice article on how hybrid corn was adopted slowly across states, but fairly quickly within each state. David and I thought we could use the fairly rapid rollout within each state but slow rollout across states as a source of identification of the role of extreme heat. Here's a new graph of the rollout by state:

The first step was to extend the daily weather data back to 1901 to take a look at the effect of extreme heat on corn yields over time, since we wanted a pre-period. We were worried that crappy weather data in the early 1900s would result in a lot of attenuation bias, but we get significant results with comparable coefficients when we use data from the first three decades of the 20th century.

In a second step we interact the weather variables with the fraction of the planted area that is hybrid corn. We find evidence that the introduction of hybrid corn reduces the sensitivity to extreme heat from -0.53 to -0.33 in the most flexible specification in column (3b) below, which is an almost 40% reduction. Furthermore, the sensitivity to precipitation fluctuations seems to diminish as well. (Disclaimer: these are new results, so they might change a bit once I get rid of my coding errors).

The table regresses state-level yields in 41 states on weather outcomes. All regressions include state fixed effects as well as quadratic time trends. Columns (b) furthermore include year fixed effects to pick up common shocks (e.g., global corn prices). Columns (1a)-(1b) replicate the standard regression equation we have been estimating before, columns (2a)-(2b) allow the effect of extreme heat to change with the fraction of hybrid corn that is planted, while columns (3a)-(3b) allow the effect of all four weather variables to change with the fraction of hybrid corn.
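For readers who want to see what an interaction term does to the heat sensitivity, here is a small back-of-the-envelope sketch. It assumes that -0.53 is the sensitivity when no hybrid corn is planted (hybrid share 0), that -0.33 is the sensitivity when everything is hybrid (share 1), and that the effect moves linearly with the share in between. Those readings, and the function names, are my assumptions for illustration and are not taken from the table itself.

```python
# Sensitivities to extreme heat as quoted in the text (column 3b).
# Assumption: -0.53 applies at hybrid share 0 and -0.33 at hybrid share 1.
BASE_SENSITIVITY = -0.53
FULL_HYBRID_SENSITIVITY = -0.33

# Implied coefficient on the (extreme heat x hybrid share) interaction.
INTERACTION = FULL_HYBRID_SENSITIVITY - BASE_SENSITIVITY


def heat_sensitivity(hybrid_share):
    """Marginal effect of extreme heat at a given hybrid share in [0, 1]."""
    if not 0.0 <= hybrid_share <= 1.0:
        raise ValueError("hybrid_share must be between 0 and 1")
    return BASE_SENSITIVITY + INTERACTION * hybrid_share


def percent_reduction(hybrid_share):
    """Percent reduction in heat sensitivity relative to no hybrids."""
    return 100.0 * (1.0 - heat_sensitivity(hybrid_share) / BASE_SENSITIVITY)


for share in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"hybrid share {share:.2f}: "
          f"sensitivity {heat_sensitivity(share):.3f}, "
          f"reduction {percent_reduction(share):.1f}%")
```

At full adoption the reduction works out to (0.53 - 0.33) / 0.53, or roughly 38%, which is the "almost 40%" quoted above.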

In summary: there is evidence that, at least for the period when hybrid corn was adopted, innovation in crop varieties led to an improvement in heat tolerance, which would be extremely useful as climate change increases the frequency of these harmful temperatures. On that (slightly more upbeat) note: happy holidays.