
Monday, August 31, 2020

Seminar talk on Spatial First Differences

Hannah Druckenmiller and I wrote a paper on a new method we developed that lets you run cross-sectional regressions while eliminating a lot (or even all) of the omitted variables bias. We are really excited about it, and it seems to be performing well in a bunch of cases.
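For the impatient, here's a stripped-down sketch of the core idea (the paper and the posted code handle all the real-world details); the data frame dat and its columns are hypothetical stand-ins.

# A minimal sketch: 'dat' is a cross-section sorted so that row i and row i-1 are
# spatial neighbors (e.g. adjacent cells along a row of a raster)
dy  <- diff(dat$y)     # difference the outcome between neighbors
dx  <- diff(dat$x)     # difference the variable of interest between neighbors
sfd <- lm(dy ~ dx)     # local unobservables shared by neighbors drop out of the differences
summary(sfd)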

Since people seem to like watching TV more than reading math, here's a recent talk I gave explaining the method to the All New Zealand Economics Series. (Since we don't want to discriminate against the math-reading-folks-who-don't-want-to-read-the-whole-paper, we'll also post a blog post explaining the basics...)

If you want to try it out, code is here.

Wednesday, February 15, 2017

Some scintillating satellite studies

We’ve had a couple of papers come out that might be of interest to readers of the blog. Both relate to using satellites to measure crop yields for individual fields. This type of data is very hard to come by from ground-based sources. In many places, especially poor ones, farmers simply don’t keep records on production on a field-by-field basis. In other places, like the U.S., individual farmers often have data, and government agencies or private companies often have data for lots of individuals, but it’s typically not made available to public researchers.

All of this is bad for science. The less data we have on yields, the less we can understand about how to improve yields or reduce the environmental impacts of crop production. We now know exactly how much a catcher’s glove moves each time they frame a pitch, exactly how well every NBA player shoots from every spot on the floor, and exactly how fast LeBron James flops to the ground when he is touched by an opposing player. And all of this knowledge has helped improve team strategies and player performance.

But for agriculture, with all the talk of “big data”, we are still often shooting in the dark. And in some cases it’s getting worse, such as this recent post at farmdoc daily about how the farmer response rate to area and production surveys is fading faster than my hairline (but not quite as fast as Sol’s):


As we’ve discussed before on this blog (e.g., here and here), satellites offer a potential workaround. They won’t be perfect, but anyone who has done phone or field surveys, or even crop cuts within fields, knows that they have problems too. Plus, satellites are getting cheaper and better, and it’s also becoming easier to work with past satellite data thanks to platforms like Google Earth Engine.

Now to the studies. Both employ what we are calling SCYM, which stands either for Scalable Crop Yield Mapper or Steph Curry’s Your MVP, depending on the context. In the first context, the basic idea is to eliminate any reliance on ground calibration by using simulations to calibrate regression models. 

The first study (joint with George Azzari) applied SCYM to Landsat data in the Corn Belt in the U.S. for 2000-2015, and then analyzed the resulting patterns and trends for maize and soybean yields. Here’s the link to the paper, and a brief animation we had made to accompany the story. Also, we’ve made the maps of average yields available for people to visualize and inspect here. Below is a figure that summarizes what I thought was the most interesting finding: that maize yield distributions are getting wider over time.


Overall maize yields are still increasing, but this is mainly driven by growth in high-yielding areas. This is true at the scale of counties, and also within individual fields. The paper discusses reasons why (e.g. uptake of precision ag) and possible implications. Maybe I’ll return to it in a future post. A short side note is that since that paper was written we’ve expanded the dataset to a nine-state area, and made some useful tweaks to SCYM to improve performance.

The second paper (joint with Marshall) looks at yield mapping with some of the newer high-res sensors in smallholder systems. In this case, we looked at two years of data in Western Kenya, using detailed surveys with hundreds of farmers to test our yield estimates, which were derived using 1m resolution imagery from Terra Bella. The results were pretty good, especially when you consider only fields above half an acre, where problems with self-reported yields are not as big. Just like in the US, the satellites reveal a lot of yield heterogeneity both within and between fields (the top image shows the true-color image, the bottom shows the derived yield estimates for maize fields):



One interesting result is that in many ways it looks like the satellite estimates are no less accurate than the (expensive) field surveys. For example, when we correlate yields with inputs that we think should affect yields, like fertilizer or seed density, we see roughly equal correlations when using satellite or self-report yields.



None of this is to say that satellite-based yields are perfect. But that was never the goal. The goal is to get insights into what factors are (or aren’t) important for yields in different settings. Both studies show how satellites can provide insights that match or even go beyond what is possible with traditional yield measures. We hope to continue to test and apply these approaches in new settings, and plan to make the datasets available to researchers, probably at this site.  

Friday, August 5, 2016

A midsummer’s cross-section

Summer is going too fast. It seems like just yesterday LeBron James was being a sore loser, body slamming and stepping over people – and getting rewarded for it by the NBA. Apart from that, one interesting experience this summer was getting to visit some very different maize (corn) fields within a few weeks in July. First, I was in Kenya and Uganda at some field sites, and then I was visiting some farms in Iowa.

When talking maize, it’s hard to get much more different than East Africa and East Iowa. As a national average, Kenya produces a bit less than 4 million tons of maize on 2 million ha, for a yield of about 1.75 t/ha. Iowa has about seven times higher yield (12.5 t/ha), and produces nearly twenty times more maize grain. The pictures below give a sense of a typical field in each place (Kenya on the left).


Lots of things are obviously different between the two areas. There are also some things that people might think are different but really aren’t. For example, looking at annual rainfall or summer temperatures, they are pretty similar for the two areas (figures from www.climatemps.com, note different scales):



But there are also things that are less obviously different. Earlier this year I read this interesting report trying to estimate soil water holding capacity in Africa, and I’ve also been working a bunch with soil datasets in the U.S. from the USDA. Below shows the total capacity of the soil to store water in the root zone (in mm) for the two areas, plotted on the same scale. 

It’s common for people to talk about the “deep” soils of the Corn Belt, but I don’t think people typically realize just how much better they are at storing water than many other places. There’s virtually no overlap between the distribution of root zone storage in the two areas, and on average Kenya soils have about half the capacity of Iowa’s.

How much difference can this one factor make? As a quick thought experiment I ran some APSIM-maize simulations for a typical management and weather setup in Iowa, varying only the root zone storage capacity between 150 and 330mm. Simulated yields by year are shown below, with dashed lines showing the mean for each soil. 


This suggests that having half the storage capacity translates to roughly half the average yields, with much bigger relative drops in hot or dry years like 1988 or 2012. And this assumes that management is identical, when a rational farmer would surely respond by applying far fewer inputs to the worse soils.


Just something to keep in mind when thinking about the potential for productivity growth in Africa. There’s certainly room for growth, and I saw a lot of promising trends. But just like when it comes to the NBA officials, there's a lot going on under the surface, and I wouldn’t expect too much. 

Monday, May 11, 2015

Introducing SCYM

In the hopes of figuring out how to raise crop yields or farmer incomes around the world, it would be really nice if we had a quick and accurate way of actually measuring yields for individual fields. That has motivated a lot of work over the years on using satellite data, and we have a paper out this week describing another step in that direction.

As I see it there are three main ingredients needed for yield remote sensing to be successful on a meaningful scale. One is the raw data. As Marshall’s recent post explained, there are several new satellite data providers that are really transforming our ability to track individual fields, even in smallholder areas.

Second is the ability to process the data at scale. Five years ago, for example, I would have to hire a research assistant to download imagery, make sure it was geometrically and radiometrically calibrated (i.e. properly lined up and in meaningful units), and then apply whatever algorithms we had. That just didn’t scale very well, in terms of labor or on-site data storage or processing. When a collaborator would ask “could you produce yield estimates for my study area,” I would have to think about how many weeks or months of work that would entail. But a couple of years ago I was introduced to Google’s Earth Engine, which is “a planetary-scale platform for environmental data & analysis.” In practical terms, it means that they have a lot of geospatial data (including all historical Landsat imagery), a lot of built-in algorithms for processing, and an interface to run your own code on the data and visualize or save the output. Part of why it works is that data providers, like the USGS for Landsat, have gotten better at providing well calibrated data. Earth Engine is very cool, and the more I’ve worked with it, the more I can see how this transforms our ability to extract value out of data already collected.

Third, and arguably the rate-limiting step nowadays, is to have algorithms that can translate satellite data into accurate yield estimates. It’s easy enough to do this if you have lots of ground data to calibrate to for a particular site, but that’s generally not scalable (unless people get clever about crowdsourcing ground “truth”). What seemed to be lacking was a very generic, scalable algorithm. So in the last 8 months or so we’ve been working to develop and test one idea about how to do this. I’m calling it a scalable satellite-based crop yield mapper (SCYM, pronounced “skim”), and a description of it has just been published in Remote Sensing of Environment. Conveniently, SCYM also stands for Steph Curry’s Your MVP.

The basic idea is that if you don’t have lots of ground data to calibrate a model, why not generate lots of fake ground data? Then for whatever combination of observations you actually have (say, for instance, satellite images on 2 or 3 specific days, and measures of daily weather), you can look into your fake data to see what the best fit model is to predict the desired variable (“yield”) from the measured predictors. The paper provides more detail, which I won’t bore readers with here. But to give a sense of the type of output, below shows an animation of our maize yield estimates over part of Iowa for 2008-2013. Red are high yields, blue are low.
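To make the mechanics concrete, here is a minimal sketch in R. The "pseudo observations" below are faked with a toy generator purely to show the workflow; in SCYM they come from crop model simulations, and the predictor names are hypothetical.

# Step 1: pseudo training data -- in SCYM these rows come from crop model runs;
# here a toy data-generating process just illustrates the mechanics
set.seed(1)
n    <- 5000
sims <- data.frame(
  vi_date1 = runif(n, 1, 6),    # vegetation index on an early image date
  vi_date2 = runif(n, 1, 6),    # vegetation index on a later image date
  tmax     = runif(n, 26, 34)   # seasonal average maximum temperature
)
sims$yield <- 2 + 1.1*sims$vi_date1 + 1.6*sims$vi_date2 - 0.25*sims$tmax + rnorm(n)

# Step 2: calibrate the regression on the simulations (no ground data needed)
calib <- lm(yield ~ vi_date1 + vi_date2 + tmax, data = sims)

# Step 3: apply the fitted model to actual satellite + weather observations, e.g.
# predict(calib, newdata = satObs), where satObs holds the same columns per field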


The figure below shows a comparison between these and ground “truth” estimates for maize, which we take from the dataset described in a previous post.




The cool thing about this is that it’s quite generic. To illustrate that, we reran the model for soybeans, with results nearly as good as for maize.


Hopefully this type of thing will help make faster progress on understanding yields and farm productivity, and figuring out what actually works for improving them. One general lesson out of this for me is that sometimes making something really scalable requires scrapping an old approach. We had been previously running crop models for specific sites and years, but that wasn't possible within the Earth Engine system. I think SCYM (which trains a regression using simulations over lots of sites and years) is more robust than what we had, and along with the new satellite data and Earth Engine-type systems, it might just provide a way to do yield mapping at scale.


Wednesday, February 11, 2015

Measuring yields from space


[This post is co-written with Florence Kondylis at the World Bank, and a similar version was also posted over at the World Bank's Development Impact blog.]

One morning last August a number of economists, engineers, Silicon Valley players, donors, and policymakers met on the UC-Berkeley campus to discuss frontier topics in measuring development outcomes. The idea behind the event was not that economists could ask experts to create measurement tools they need, but instead that measurement scientists could tell economists about what was going on at the frontier of measuring development-related outcomes.  One topic that generated a lot of excitement -- likely due to David Lobell's charm at the podium -- was the potential for a new crop of satellites to remote-sense (i.e. measure) important development outcomes.

Why satellite-based remote sensing?
The potential ability to use satellites to measure common development outcomes of interest excites researchers and practitioners for a number of reasons, chief among them the amount of time and money we typically have to spend to measure these outcomes the “traditional” way (e.g. conducting surveys of households or firms).  Instead of writing large grants, spending days traveling to remote field sites, hiring and training enumerators, and dealing with inevitable survey hiccups, what if instead you could sit at home in your pajamas and, with a few clicks of a mouse, download the data you needed to study the impacts of a particular program or intervention?

The vision of this “remote-sensing” based approach to research is clearly intoxicating, and is being bolstered by the vast amount of high-resolution satellite imagery that is now being acquired and made available.  The recent rise of “nano-“ or “micro”-satellite technology – basically, fleets of cheap, small satellites that image the earth in high temporal and spatial resolution, such as those being deployed by our partner Skybox – could hold particular promise for measuring the types of outcomes that development folks often care about.  This is perhaps most obviously true in agriculture, where unlike in the manufacturing sector, most production takes place outside.

How does it work?
For most agricultural crops – particularly the staple crops grown by African smallholders, such as maize – pretty much anyone can look at a field and see the basic difference between a healthy, highly productive crop and a low-yielding crop that is nutrient or moisture stressed.  One main clue is color: healthy vegetation reflects and absorbs different wavelengths of light than less-healthy vegetation, which is why leaves on healthy maize plants look deep green and leaves on stressed or dead plants look brown.  Sensors on satellites can also discern these differences in the visual wavelengths, but they also measure differences at other wavelengths, and this turns out to be particularly useful for agriculture.  Healthy vegetation, it turns out, absorbs light in the visible spectrum and reflects strongly in the near infrared (which the human eye can’t see), and simple ratios of reflectance at these two wavelengths form the basis of most satellite-based measures of vegetative vigor – e.g. the familiar Normalized Difference Vegetation Index, or NDVI.   High ratios basically tell you that you’re looking at plants with a lot of big, healthy leaves.
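For concreteness, NDVI is just a normalized difference of the two bands; here is a one-line version in R (the reflectance values are made-up illustrations):

ndvi <- function(nir, red) (nir - red) / (nir + red)
ndvi(nir = 0.45, red = 0.08)   # dense, healthy canopy: NDVI around 0.7
ndvi(nir = 0.25, red = 0.15)   # sparse or stressed vegetation: NDVI around 0.25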

The trick is then to be able to map these satellite-derived vegetation indices into measures of crop yields.  There are two basic approaches (see David's nice review article for more detail).  The first combines satellite vegetation indices with on-farm yield observations as collected from the typical household or agricultural surveys.  By regressing the “true” survey-based yield measure on the satellite-based vegetation index, you get an estimated relationship between the two that can then be applied to other agricultural plots that you observe in the satellite data but did not survey on the ground.  The second approach combines the satellite data with pre-existing estimates of the relationship between final yield and vegetative vigor under various growing conditions (often as derived from a crop simulation model, which you can think of as an agronomist’s version of a structural model). Applying satellite reflectance measures to these relationships can then be used to estimate yield on a given plot.  A nice feature of this second approach is that it is often straightforward to account for the role of other time-varying factors (e.g. weather) that also affect the relationship between vegetation and final yield.
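In code, the first approach is just a calibration regression followed by prediction; a minimal sketch (the data frames surveyed and unsurveyed and their columns are hypothetical):

# Approach 1: regress survey-based yields on a satellite vegetation index for the
# plots we surveyed, then predict for the plots we only observe from space
calib <- lm(yield_survey ~ ndvi_peak, data = surveyed)
unsurveyed$yield_hat <- predict(calib, newdata = unsurveyed)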

How well does it work?
These approaches have mainly been applied to larger farm plots in the developed and developing world, at least in part because until very recently the available satellite imagery was generally too coarse to resolve the very small plot sizes (e.g. less than half an acre) common in much of Africa.  For instance, the resolution of the MODIS sensor is 250m x 250m, meaning one pixel would cover more than 15 one-acre plots. Nevertheless, these approaches have been shown to work surprisingly well on these larger fields.  Below are two plots, both again from David and co-author's work, showing the relationship between predicted and observed yields for wheat in Northern Mexico, and maize (aka “corn”, for Americans) in the US Great Plains.   Average plot sizes in both cases are > 20 hectares, equivalent to at least 50 one-acre plots.


Top plot:  wheat yields in northern Mexico, from Lobell et al 2005.  Bottom plot:  corn yields in the US Great Plains, from Sibley et al 2014



Although success is somewhat in the eyes of the beholder here, the fit between observed and satellite-predicted yields is pretty good in both of these cases, with overall R2s of 0.63 in the US case and 0.78 in the Mexico case.  And, at least in both of these cases, the “ground truth” yield data was not actually used to construct the yield prediction – i.e. they are using the second approach described above.  This was possible in this setting because these were crops and growing conditions for which scientists have a good mechanistic understanding of how final yield relates to vegetative vigor.

From rich to poor, big to small
Applying these approaches in much of the developing world (e.g. smallholder plots in Africa) has been harder.  This is not only because of the much smaller plot sizes, and thus the difficulty (impossibility, often) of resolving them in existing satellite imagery, but also because of a lack of either (i) ground truth data to develop the satellite-based predictions, and/or (ii) a satisfactory mechanistic understanding in these environments of how to map yields to reflectance measures.

New data sources from both the ground and sky are starting to make this possible.  Sensors on the new micro-satellites mentioned above often have sub-meter resolutions, meaning smallholder plots are now visible from space (a half-acre plot would be covered by over 2000 pixels).  Furthermore, this imagery is being acquired often enough to ensure at least a few cloud-free images during the growing season -- not a small problem in the rainy tropics.

Working with David and some collaborators in Kenya, Uganda, and Rwanda, we are linking this new imagery with ground-based yield data we are collecting to understand whether the satellite data can capably predict yields on heterogeneous smallholder plots.  Below is a map of some of the smallholder maize fields we have mapped and are tracking in Western Kenya, as part of an ongoing experiment with smallholder farmers in the region.

Locations of some plots we are tracking in Western Kenya

Some of the long-run goals of this work are (i) to allow researchers who already have information on plot boundaries and crop choice to use satellite images to estimate yields, and (ii) to allow researchers who do not have plot boundaries but who are interested in broader-scale agricultural performance (e.g. at the village or district level) a way to track yields at that scale. This work is ongoing, but given the experience in developed countries, we are hopeful.

Some challenges
Nevertheless, there are clear challenges to making this approach work at scale, and clear limitations (at least in the near term) to what this technology can provide.   Here are a few of the main challenges:

  1. Which boundaries and which crops. To measure outcomes at the level of the individual farm plot, satellite-based measures will be most easily employable if the researcher already knows the plot boundaries and knows what crop is being grown.   As satellite imagery improves and as computer vision algorithms are developed to remotely identify plot boundaries, both of these constraints will likely be relaxed, but the researcher will still need some ground information on which plots belong to whom. 
  2. Measurement error. Even with plot boundaries in hand, the fact that satellite imagery will not be able to perfectly predict yields means that using satellite-predicted yields as an outcome will likely reduce statistical power (although it’s not immediately clear how much noisier satellite estimates will be, given that survey-based measures of these outcomes – e.g. from farmer self-reports – are likely also measured with error; see the toy illustration after this list).  This almost certainly means that this technology will not be equipped to discern small effects in the smaller-sized ag RCTs that often get run.
  3. Moving beyond yield.  Finally, even with plot boundaries in hand and a well-powered study, satellites are going to have a hard time measuring many of the other outcomes we care about – things like profits or consumption expenditure.  Satellites might in the near term be able to get at related outcomes such as assets (something we’re also working on), but it’s clearly going to be hard to observe most household expenditures directly. 
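Here is a toy illustration of the power point in (2), using R's built-in power.t.test; the noise numbers are arbitrary assumptions, chosen only to show the direction of the effect.

sd_true <- 1.0                       # sd of true yields across plots (arbitrary units)
sd_sat  <- sqrt(sd_true^2 + 0.6^2)   # add hypothetical satellite prediction error
power.t.test(n = 200, sd = sd_true, power = 0.8)$delta  # minimum detectable effect
power.t.test(n = 200, sd = sd_sat,  power = 0.8)$delta  # larger with the noisier outcome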


Putting these difficulties together, should we just abandon this whole satellite thing?  We think not, for two reasons.  The first reason is that as we (hopefully) improve our ability to accurately measure smallholder yields from space, this ability would provide a clear complement to existing surveys.  For instance, if yields are a primary outcome, imagine just being able to do a baseline survey (where field boundaries are mapped) and then measure your outcome at follow-up from the sky.  This would make an entire study both faster and cheaper, which should allow for larger sample sizes, which will in turn help deal with the measurement error issue above.

Second, we still have a surprisingly poor understanding of why some farms, and some farmers, appear to be so much more productive than others.  Is it the case that relatively time-invariant factors like soil type and farmer ability explain most of the observed variation, or are time-varying factors like weather more important?  Satellite data might be particularly useful for this question (David's review paper, and his earlier G-FEED post, give some really nice examples), because you can assemble huge samples of farm plots that can then be easily followed over time.  Satellite data in this setting therefore might afford more power, and you can do it all in your pajamas.


Saturday, January 10, 2015

Searching for critical thresholds in temperature effects: some R code



If Google Scholar is any guide, my 2009 paper with Wolfram Schlenker on the nonlinear effects of temperature on crop outcomes has had more impact than anything else I've been involved with.

A funny thing about that paper: many reference it, and often claim that they are using techniques that follow that paper.  But in the end, as far as I can tell, very few seem to have actually read through the finer details of that paper or tried to implement the techniques in other settings.  Granted, people have done similar things that seem inspired by that paper, but not quite the same.  Either our explication was too ambiguous or people don't have the patience to fully carry out the technique, so they take shortcuts.  Here I'm going to try to make it easier for folks to do the real thing.

So, how does one go about estimating the relationship plotted in the graph above?

Here's the essential idea:  averaging temperatures over time or space can dilute or obscure the effect of extremes.  Still, we need to aggregate, because outcomes are not measured continuously over time and space.  In agriculture, we have annual yields at the county or larger geographic level.  So, there are two essential pieces: (1) estimating the full distribution of temperatures of exposure (crops, people, or whatever) and (2) fitting a curve through the whole distribution.

The first step involves constructing the distribution of weather. This was most of the hard work in that paper, but it has since become easier, in part because finely gridded daily weather is available (see PRISM) and in part because Wolfram has made some STATA code available.  Here I'm going to supplement Wolfram's code with a little bit of R code.  Maybe the other G-FEEDers can chime in and explain how to do this stuff more easily.

First step:  find some daily, gridded weather data.  The finer scale the better.  But keep in mind that data errors can cause serious attenuation bias.  For the lower 48 since 1981, the PRISM data above is very good.  Otherwise, you might have to do your own interpolation between weather stations.  If you do this, you'll want to take some care in dealing with moving weather stations, elevation and microclimatic variations.  Even better, cross-validate interpolation techniques by leaving one weather station out at a time and seeing how well the method works. Knowing the size of the measurement error can also help in correcting the bias.  Almost no one does this, probably because it's very time consuming... Again, be careful, as measurement error in weather data creates very serious problems (see here and here).
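If you do go the interpolation route, the leave-one-out check is only a few lines. Here is a sketch with a hypothetical 'stations' data frame (columns x, y, tMax for one day) and a deliberately simple inverse-distance interpolator standing in for whatever method you actually use.

idw <- function(x0, y0, x, y, z, p = 2) {
  w <- 1 / sqrt((x - x0)^2 + (y - y0)^2)^p   # inverse-distance weights
  sum(w * z) / sum(w)
}
loo.err <- sapply(seq_len(nrow(stations)), function(i) {
  pred <- idw(stations$x[i], stations$y[i],
              stations$x[-i], stations$y[-i], stations$tMax[-i])
  pred - stations$tMax[i]
})
sqrt(mean(loo.err^2))   # RMSE: a direct estimate of the size of the measurement error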

Second step:  estimate the distribution of temperatures over time and space from the gridded daily weather.  There are a few ways of doing this.  We've typically fit a sine curve between the minimum and maximum temperatures to approximate the time at each degree in each day in each grid, and then aggregate over grids in a county and over all days in the growing season.  Here are a couple R functions to help you do this:

# This function estimates time (in days) when temperature is
# between t0 and t1 using sine curve interpolation.  tMin and
# tMax are vectors of day minimum and maximum temperatures over
# range of interest.  The sum of time in the interval is returned.
# noGrids is number of grids in area aggregated, each of which 
# should have exactly the same number of days in tMin and tMax
 
days.in.range <- function( t0, t1 , tMin, tMax, noGrids )  {
  n <-  length(tMin)
  t0 <-  rep(t0, n)
  t1 <-  rep(t1, n)
  t0[t0 < tMin] <-  tMin[t0 < tMin]
  t1[t1 > tMax] <-  tMax[t1 > tMax]
  u <- function(z, ind) (z[ind] - tMin[ind])/(tMax[ind] - tMin[ind])  
  outside <-  t0 > tMax | t1 < tMin
  inside <-  !outside
  time.at.range <- ( 2/pi )*( asin(u(t1,inside)) - asin(u(t0,inside)) ) 
  return( sum(time.at.range)/noGrids ) 
}

# This function calculates all 1-degree temperature intervals for 
# a given row (fips-year combination).  Note that nested objects
# must be defined in the outer environment.
aFipsYear <- function(z){
  afips    = Trows$fips[z]
  ayear    = Trows$year[z]
  tempDat  = w[ w$fips == afips & w$year==ayear, ]
  Tvect = c()
  for ( k in 1:nT ) Tvect[k] = days.in.range(
              t0   = T[k]-0.5, 
              t1   = T[k]+0.5, 
              tMin = tempDat$tMin, 
              tMax = tempDat$tMax,
              noGrids = length( unique(tempDat$gridNumber) )
              )
  Tvect
}

The first function estimates time in a temperature interval using the sine curve method.  The second function calls the first function, looping through a bunch of 1-degree temperature intervals, defined outside the function.  A nice thing about R is that you can be sloppy and write functions like this that use objects defined outside of the function. A nice thing about writing the function this way is that it's amenable to easy parallel processing (look up the 'foreach' and 'doParallel' packages).

Here are the objects defined outside the second function:

w       # weather data that includes a "fips" county ID, "gridNumber", "tMin" and "tMax".
        #   rows of w span all days, fips, years and grids being aggregated
 
Trows   # data frame with one row per fips-year being aggregated, with columns "fips"
        #   and "year" (e.g. expand.grid( fips = fips.index, year = year.index ))
T       # a vector of integer temperatures.  I'm approximating the distribution with 
        #   the time in each degree in the index T
nT      # = length(T), the number of 1-degree temperature bins

(tempDat, the slice of w for the particular fips/year being aggregated, is created inside the function itself.)

To build a dataset, call the second function above for each fips-year in Trows and rbind the results, as sketched below.
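Assembling the dataset looks something like this (a sketch, assuming the objects above are in place); the commented foreach lines show the parallel version mentioned earlier.

T  <- 0:45                      # integer temperature bins, deg C
nT <- length(T)
binList <- lapply(seq_len(nrow(Trows)), aFipsYear)         # serial version
# library(doParallel); registerDoParallel(cores = 4)       # parallel version
# binList <- foreach(z = seq_len(nrow(Trows))) %dopar% aFipsYear(z)
TemperatureData <- cbind(Trows, do.call(rbind, binList))
# columns 3:(2 + nT) now hold time (in days) in each 1-degree bin for every fips-year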

Third step:  To estimate a smooth function through the whole distribution of temperatures, you simply need to choose your functional form, linearize it, and then cross-multiply the design matrix with the temperature distribution.  For example, suppose you want to fit a cubic polynomial and your temperature bins run from 0 to 45 C.  The design matrix would be:

D = [  0     0       0
       1     1       1
       2     4       8
      ...   ...     ...
      45  2025   91125 ]

These days, you might want to do something fancier than a basic polynomial, say a spline. It's up to you.  I really like restricted cubic splines, although they can oversmooth around sharp kinks, which we may have in this case. We have found piecewise linear works best for predicting out of sample (hence all of our references to degree days).  If you want something really flexible, just make D an identity matrix, which effectively gives a dummy variable for each temperature bin (the step function in the figure).  Whatever you choose, you will have a (T x K) design matrix, with K being the number of parameters in your functional form and T = 46 (in this case) temperature bins. 

To get your covariates for your regression, simply cross multiply D by your frequency distribution.  Here's a simple example with restricted cubic splines:


library(Hmisc)
# rcspline.eval() gives the restricted cubic spline basis at each integer temperature;
# inclx=TRUE keeps the linear term alongside the nonlinear spline terms
DMat <- rcspline.eval(0:45, inclx=TRUE)
# cross-multiply: (n x 46) exposure matrix times (46 x K) basis = (n x K) covariates
XMat <- as.matrix(TemperatureData[,3:48]) %*% DMat
fit  <- lm(yield ~ XMat, data=regData)
summary(fit)

Note that regData has the crop outcomes.  Also note that we generally include other covariates, like total precipitation during the season,  county fixed effects, time trends, etc.  All of that is pretty standard.  I'm leaving that out to focus on the nonlinear temperature bit. 
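And since we keep referring to degree days, here is the same cross-multiplication with a piecewise-linear design matrix instead of the spline; the single kink at 29 C is just an illustrative choice of threshold.

Tbins  <- 0:45
thresh <- 29
Dpl    <- cbind(pmin(Tbins, thresh), pmax(Tbins - thresh, 0))   # degree days below/above the kink
Xpl    <- as.matrix(TemperatureData[, 3:48]) %*% Dpl
fit.pl <- lm(yield ~ Xpl, data = regData)
summary(fit.pl)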

Anyway, I think this is a cool and fairly simple technique, even if some of the data management can be cumbersome.  I hope more people use it instead of just fitting to shares of days with each maximum or mean temperature, which is what most people following our work tend to do.  

In the end, all of this detail probably doesn't make a huge difference for predictions.  But it can make estimates more precise and confidence intervals tighter.  And I think that precision also helps in pinning down mechanisms.  For example, I think this precision helped us to figure out that VPD and associated drought was a key factor underlying observed effects of extreme heat.

Friday, October 10, 2014

Will the dry get drier, and is that the right question?

A “drought” can be defined, it seems, in a million different ways. Webster’s dictionary says it’s “a period of dryness especially when prolonged; specifically: one that causes extensive damage to crops or prevents their successful growth.” Wikipedia tells me “Drought is an extended period when a region receives a deficiency in its water supply.” The urban dictionary has a different take.

But nearly all definitions share the concepts of dryness and of damage or deficiency. We’ve talked a lot on this blog about drought from an agricultural perspective, and in particular how droughts in agriculture can (or at least should) often be blamed as much on high temperatures and strong evaporative demand as on low rainfall. At the same time, there’s lots of interesting work going on trying to assess drought from a hydrological perspective. Like this recent summary by Trenberth et al.

The latest is a clever study by Greve et al. that tries to pin down whether and where droughts are becoming more or less common. They looked at lots of combinations of possible data sources for rainfall, evapotranspiration (ET) and potential evapotranspiration (ETp). They then chose the combinations that produced a reasonable relationship between ET/P and ETp/P (the Budyko curve), and used them to calculate trends in dryness for 1948-2005. The figure below shows their estimate of wet and dry areas and the different instances of wet areas getting wetter, wet getting drier, etc. The main point of their paper and media coverage was that these trends don’t follow the traditional expectation of WWDD (wet get wetter and dry get drier) – the idea that warming increases the water holding capacity of the air and thus amplifies existing patterns of rainfall.
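For reference, the classical Budyko form of that relationship is simple enough to write down directly (Greve et al. work within this general framework; the exact parametric form they fit may differ):

# Classical Budyko curve: evaporative index ET/P as a function of aridity index ETp/P
budyko <- function(phi) sqrt(phi * tanh(1 / phi) * (1 - exp(-phi)))
budyko(c(0.5, 1, 2, 4))   # ET/P rises toward 1 as conditions get drier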


Also clear in the figure is that the biggest exception to the rule appears to be wet areas getting drier. There don’t seem to be many dry areas getting wetter over the last 50 years.

Other than highlighting their nice paper, I wanted to draw attention to something that seems to get lost in all of the back-and-forth in the community looking at trends in dryness and drought, but that I often discuss with agriculture colleagues: it’s not clear how useful any of these traditional measures of drought really are. The main concept of drought is about deficiency, but deficient relative to what? The traditional measures all use a “reference” ET, with the FAO version of Penman-Monteith (PM) as the gold standard for most hydrologists. But it’s sometimes forgotten that PM uses an arbitrary reference vegetation of a standard grass canopy. Here’s a description from the standard FAO reference:

“To avoid problems of local calibration which would require demanding and expensive studies, a hypothetical grass reference has been selected. Difficulties with a living grass reference result from the fact that the grass variety and morphology can significantly affect the evapotranspiration rate, especially during peak water use. Large differences may exist between warm-season and cool season grass types. Cool-season grasses have a lower degree of stomatal control and hence higher rates of evapotranspiration. It may be difficult to grow cool season grasses in some arid, tropical climates. The FAO Expert Consultation on Revision of FAO Methodologies for Crop Water Requirements accepted the following unambiguous definition for the reference surface:
"A hypothetical reference crop with an assumed crop height of 0.12 m, a fixed surface resistance of 70 s m-1 and an albedo of 0.23."
The reference surface closely resembles an extensive surface of green grass of uniform height, actively growing, completely shading the ground and with adequate water."

Of course, there are reasons to have a reference that is fixed in space and time – it makes it easier to compare changes in the physical environment. But if the main concern of drought is about agricultural impacts, then you have to ask yourself how much this reference really represents a modern agricultural crop. And, more generally, how relevant is the concept of a static reference in agriculture, where the crops and practices are continually changing? It’s a bit like when Dr. Evil talks about “millions of dollars” in Austin Powers.  

Here’s a quick example to illustrate the point for those of you still reading. Below is a plot I made for a recent talk that shows USDA-reported corn yields for a county of Iowa where we have run crop model simulations. I then use the simulations (not shown) to define the relationship between yields and water requirements. This is a fairly tight relationship, since water use and total growth are closely linked, and it depends mainly on average maximum temperature. The red line then shows the maximum yield that could be expected (assuming current CO2 levels) in a dry year, defined as the 5th percentile of historical annual rainfall. Note that for recent years, this amount of rainfall is almost always deficient and will lead to large amounts of water stress. But 50 years ago the yields were much smaller, and even a dry year provided enough water for typical crop growth (assuming not too much of it was lost to other things like runoff or soil evaporation).  
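For what it's worth, the red line only takes a few lines to construct once you have the simulations; a rough sketch (object and column names are hypothetical, and this ignores the temperature dependence mentioned above):

req.fit  <- lm(water_req ~ yield, data = sims)   # simulated yield vs. seasonal water requirement
dry.rain <- quantile(rain, 0.05)                 # a "dry year": 5th percentile of annual rainfall
# highest yield whose water requirement is still covered by dry-year rainfall:
(dry.rain - coef(req.fit)[1]) / coef(req.fit)[2]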


An alternative to the PM approach is to have the reference ET defined by the potential growth of the vegetation. This was described originally, also by Penman, as a “sink strength” alternative to PM, and is tested in a nice recent paper by Tom Sinclair. It would be interesting to see the community focused on trends try to account for trends in sink strength. That way they’d be looking not just at changes in the dryness part of drought, but also the deficiency part.


As someone interested in climate change, it’s nice to see continued progress on measuring trends in the physical environment. But if you're concerned about whether agriculture needs to prepare for more drought, in the sense of more water limitations to crop growth, then I think the answer in many cases is a clear yes, regardless of what’s happening to climate. As yield potential becomes higher and higher, the bar for what counts as "enough" water continues to rise.

Tuesday, August 12, 2014

Big swings and shock absorbers

Two years ago when we started this blog, the Midwest was going through a major drought and ended up eking out just above 123 bushels per acre (bu/ac) in corn yield. Today the USDA released its latest projection for 2014, with a forecast for record corn yields of 167.4 bu/ac, due to really good weather (as Wolfram summarized in the last post).

The difference of 44 bu/ac between two years this close apart is bigger than anything experienced in the history of US agriculture. The closest thing was in 1994, when yields were 139 bu/ac after just 101 in the flood year of 1993. When expressed as a percentage swing, to account for the fact that yields overall have been going up, the swing from 2012 to 2014 (36%) is near historic highs but still less than some of the swings seen in the 1980’s and early 1990’s (see figure below).


We’ve talked a lot on this blog about what contributes to big swings in production, and why it shouldn’t be surprising to see them increase over time. Partly it’s because really bad years like 2012 become more likely as warming occurs, and partly it’s because farmers are getting even better at producing high yields in good conditions. Sometimes I think of good weather like a hanging curveball – a decent hitter will probably manage a hit, but a really good hitter will probably hit a home run. So the difference between good and bad weather grows as farmers get better at hitting the easy pitches.

Moving on from bad analogies, the point of this post is to describe some of the changes we might see as the world comes to grips with more volatile production. What kind of shock absorbers will farmers and markets seek out? Four come to mind, though I can’t be sure which if any will actually turn out to be right. First, and most obvious, is that all forms of grain storage will increase. There are some interesting reports of new ways farmers are doing this on-site with enormous, cheap plastic bags. We have a working paper coming out soon (hopefully) on accounting for these storage responses in projections of price impacts.

Second will be new varieties that help reduce the sensitivity to drought. Mark Cooper and his team at Pioneer have some really interesting new papers here and here on their new Aquamax seeds, describing the approach behind them and how they perform across a range of conditions.

A third response is more irrigation in areas where it hasn’t been very common. I haven’t seen any great data on recent irrigation for grain crops throughout the Corn Belt, but I’ve heard some anecdotes that suggest it is becoming a fairly widespread insurance strategy to boost yields in dry years.

The fourth is a bit more speculative, but I wouldn’t be surprised to see new approaches to reducing soil evaporation, including cheap plastic mulches. There are some interesting new papers here and here testing this in China showing yield gains of 50% or more in dry years. Even in the Midwest, where farmers practice no-till and crops cover the ground fairly quickly, as much as a third of the water in the soil is commonly lost to evaporation rather than via plant uptake. That represents a big opportunity cost, and if the price of grain is high enough, and the cost of mulch is low enough, it’s possible that it will take hold as a way of raising yields in drier years. So between storage and mulching, maybe “plastics” is still the future.

Thursday, May 22, 2014

Regressing to the mean

As a general rule, people like to take credit when good things happen to them, but blame bad luck when things go wrong. This probably helps us all get through the day, feeling better about ourselves and other people we like. For example, I’d like to think that the paper rejection I got last month was bad luck, while the award I got last year was totally deserved. But anyone who pays attention to professional sports, or the stock market, knows that success in any single day or even year has a lot to do with luck. It’s not that luck is all you need, but it often makes the difference between very evenly-matched competitors. That’s why people or teams who perform particularly well in one year tend to drop back the next, a.k.a. regressing to the mean.

Take tennis. There’s clearly a skill separation between professionals and amateurs, and between the top four or five professionals and everyone else. But among the top, it’s hard to know who will win on any given day, and it’s very hard to sustain a streak of victories against other top players. So even when someone like Rafael Nadal, who has some remarkable winning streaks, talks about how he got a few key bounces, he’s as much being an astute observer as a gracious winner.

Or the stock market. It’s well known that even the best fund managers have a hard time outperforming the market for a long time. Even if someone has beaten it five years in a row, there’s a good chance it was just luck given how many fund managers are out there. I don’t tend to watch interviews with fund managers as much as athletes, but something tells me they might be a little less inclined to turn down the credit.

So what about agriculture? It’s not a source of entertainment or income for most of us, so we don’t spend much time thinking about it, and you won’t find any posts on fivethirtyeight about it. But one thing that struck me early on working in agriculture is how farmers are just as prone to thinking they are above average as the rest of us. More specifically, if you ask why some fields around them don’t look as good, they will talk about how that farmer is lazy, has another job, doesn’t take care of his soil, etc.

As far as I can tell, this isn’t a purely academic question. Understanding how much farmers vary in their ability to consistently produce a good crop is important if you want to know the best ways of improving agricultural yields. If there really are a bunch of star performers out there, then letting them take over from the laggards by buying up their land, or training other farmers to use best-practices, could be a good source of growth in the next decade. For example, here’s a cool presentation about a new effort in India to have farmers spread videos of best practices through social networks.

There’s a fairly obvious but not perfect link to the idea of yield gaps. People who say that closing yield gaps is a big “low-hanging fruit” for yield improvement often have a vision of better agronomy being able to drastically raise yields. The link isn’t perfect, because it could be that even the best farmers in a region are underperforming, agronomically speaking, for instance if fertilizers are very expensive. And it could be that some farmers consistently outperform not because of better management, but because they are endowed with better soils (though this can be sorted out partly by using data on soil properties). Even with these caveats, understanding how much truly better the “best” farmers are could help give a more realistic view of what could be achieved with improved agronomy.

The key here is the “how much” part of it. Nobody can argue that some farmers aren’t better than others, just as nobody can say that Warren Buffett isn’t better than an average fund manager. The question is whether this is a big opportunity, or if it’s best to focus efforts elsewhere. I’ve been trying to get a handle on the “how much” over the years by using satellites to track yields over time. I’ve used various ways of trying to display this in a simple way, but nothing was too satisfying. So let me give it another shot, based on some suggestions from Tony Fischer during a visit to Canberra.

The figures below show wheat yield anomalies (relative to average) for fields ranked in the top 10% for the first year we have satellite data (green line). Each panel is for a different area, and soils don’t vary too much within each area. Then we track those fields over the following years to see if those fields are able to consistently outperform their neighbors. Similarly we can follow fields in any other group, and I show both the bottom decile (0-10% in blue) and the fifth (40-50% in orange). The two horizontal lines show the mean yield anomaly in the first group for the year they ranked in the top, and their mean yield anomaly in all the other years. If it was all skill (or something else related to a place like the soil quality) the second line would be on top of the first. If it was all luck, the second line would be at zero.
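The calculation behind those two horizontal lines is simple; here is a sketch with a hypothetical matrix anom of yield anomalies (rows are fields, columns are years):

top <- which(rank(-anom[, 1]) <= 0.10 * nrow(anom))   # top decile in the first year
mean(anom[top, 1])    # mean anomaly in the ranking year (upper line)
mean(anom[top, -1])   # mean anomaly in all other years (lower line)
# the ratio of the two is the share of the initial advantage that persists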


So what’s the verdict? The top performers in the first year definitely show signs of regressing to the mean, as their mean yield drops much closer to the overall average in other years. Similarly, the worst performers “regress” back up toward the mean. But neither group jumps all the way back to zero, which says that some of the yield differences are persistent. In the two left panels, the anomalies are a little more than one-third the size they were in the initial year. In the right panel, the anomalies are about half their original value, indicating relatively more persistence. That makes sense, since we know the right panel is a region where some farmers consistently sow too late (see this paper for more detail).

So a naive look at any snapshot of performance over a year or two would be way too optimistic about how big the exploitable yield gap might be. It’s important to remember that performance tends to regress to the mean. At the same time, there are some consistent differences that amount to roughly 10% of average yields in these regions (where mean yield is around 5-6 tons/ha). And with satellites we can pinpoint where the laggards are and target studies on what might be constraining them. At least that's the idea behind some of our current work. 


Whether other regions would look much different than the 3 examples above, I really don’t know. But it shouldn’t be too hard to find out. With the current and upcoming group of satellites, we now have the ability to track performance on individual fields in an objective way, which should serve as a useful reality check on discussions of how to improve yields in the next decade. 

Thursday, May 1, 2014

A flood of crop data, and a study on drought

As the growing season gets under way in the Midwest, we have a paper out today in Science on yield trends in the region, and the role of drought. The paper is here and a summary story here. (I'll also post a link asap to a pdf for those without access to Science here). There's also a very nice companion piece by Don Ort and Steve Long here.

The paper relates to this post a while back on yield sensitivity. The short story is that lots of things could be making crops more or less sensitive to weather, so it’s an interesting empirical question as to whether overall sensitivity is actually rising or falling. And it seems a useful question, at least in the context of estimating the costs of climate change, because if sensitivity is declining then it would mean impacts of climate change might be overstated (i.e. systems are adapting), whereas if sensitivity is growing it would mean impacts could be bigger for future cropping systems than the current ones (what’s the opposite of adapting? Is anti-adapting a word?).

The new paper shows that for corn in the US, sensitivity to drought appears to be rising. Yields now seem about twice as sensitive to hot, dry conditions as they were 20 years ago. It’s not that farmers get worse yields in drought years than they used to, but they are really making fast progress under better conditions. Progress has been much slower for hot, dry conditions, which we think may indicate farmers and their crops are pushing the limits of what’s possible in those years. Or at least what's possible with current technology (Ort and Long mention this paper about how changing canopy structure might help, and there are other ideas out there too. Maybe a topic for another day).

See the paper for more details. What I wanted to focus this post on was something that may get lost in the paper's and the coverage's emphasis on the "big picture", which is how cool the dataset we worked with is. And it’s posted here or here for people to play with if they want. The animation below gives a sense of the yield data for three of the states over time (red = high yields, purple = low). There are over a million observations of corn and soy fields around the country since 1995, along with associated weather variables. (We are not allowed to release anything that could be used to identify farmers, so the fields are random samples of a larger dataset and don’t have location information beyond the county level.)



The study also relates to this previous post on how cool the MARS approach to data analysis is. It played a very helpful role in this study, and it was surprising just how well it picked up effects of weather. The figure below shows the six factors identified as important, and the functional form identified for each. The most important (consistent with previous work) is high vapor pressure deficit in the period 61-90 days after sowing, which is around flowering for corn. The MARS model with just these six variables explained about 32% of the variance in the dataset (a number we somehow neglected to mention in the paper!). That’s an awful lot of variance for a field-level dataset with all sorts of variation going on, including in soils and off-season weather. I’d be curious if any other simple model could do as well (the data are there if you want to have a go at it).  
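For anyone who wants to try this on the posted data, MARS is easy to run in R via the earth package; a minimal sketch (the covariate names below are hypothetical placeholders for the weather variables in the dataset):

library(earth)
fit <- earth(yield ~ vpd_61_90 + prec_61_90 + tmax_31_60 + prec_91_120,
             data = fieldData, degree = 1)   # degree = 1: additive hinge functions, no interactions
summary(fit)   # selected basis functions and GCV statistics
evimp(fit)     # variable importance
plotmo(fit)    # plots of each fitted partial relationship (plotmo loads with earth)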


(Update 1: An interview I did is here, which are things I actually said, as opposed to what I'm sometimes quoted as saying based on a reporter's memory or attempt to jumble three things into one sentence.)

Thursday, December 19, 2013

The three wise men (of agriculture)

There’s a new book coming out soon that should be of interest to many readers of this blog. It’s written by Tony Fischer, Derek Byerlee, and Greg Edmeades, and called “Crop yields and global food security: will yield increases continue to feed the world?” At 550 pages, it’s not a quick read, but I found it incredibly well done and worthwhile. I’m not sure yet when the public release will be, but I’m told it will be a free download in early 2014 at the Australian Centre for International Agricultural Research website.

The book starts by laying out the premise that, in order to achieve improvements in global food security without massive land use change, yields of major crops need to increase about 1.3% of current levels per year for the next 20 years. They explain very clearly how they arrive at this number given trends in demand, with a nice comparison with other estimates. The rest of the book is then roughly in two parts. First is a detailed tour of the world’s cropping systems to assess the progress over the last 20 years, and second is a discussion of the prospects for, and changes needed to achieve, the target yield gains.

For some, the scope of the book may be too narrow, and the authors fully recognize that yield progress is not alone enough to achieve food security. But for me, the depth is a welcome change from a lot of more superficial studies of yield changes around the world. These are three men who understand the different aspects of agriculture better than just about anyone.

The book is not just a review of available information; the first part presents a lot of new analysis as well. Tony Fischer has dug into the available data on farm and experimental plot yields in each region, with his keen eye for what constitutes a credible study or yield potential estimate (think Warren Buffett reading a financial prospectus). This effort results in an estimate of yield potential and yield gap (the difference between potential and farm yields) by mega-environment, and their linear rates of change for the past 20 years. The authors then express all trends as a percentage of trend yield in 2010, which makes it much easier to compare estimates from various studies that often report in kg/ha or bushels/acre or some other unit.

There are lots of insights in the book, but here is a sample of three that seemed noteworthy:

  1. Yield potential continues to exhibit significant progress for all major crops in nearly all of their mega-environments. This is counter to many claims of stagnating progress in yield potential.
  2. Yield gaps for all major crops are declining at the global scale, and these trends can account for roughly half of farm yield increases globally since 1990. But there’s a lot of variation. I thought it interesting, for example, that maize gaps are declining much faster in regions that have adopted GM varieties (US, Brazil, Argentina) than regions that haven’t (Europe, China). Of course, this is just a simple correlation, and the authors don’t attempt to explain any differences in yield gap trends.
  3. Yield gaps for soy and wheat are both quite small at the global level. Soy in particular has narrowed yield gaps very quickly, and in all major producers it is now at ~30%, which is the lower limit of what is deemed economically feasible with today’s technology. One implication of this is that yield potential increases in soy are especially important. Another is that yield growth in soy could be set to slow, even as demand continues to rise the most of any major crop, setting up a scenario for even more rapid soy area expansion.

Any of these three points could have made for an important paper on their own, and there are others in the book as well. But to keep this post at least slightly shorter than the actual book, I won’t go on about the details. One more general point, though.  The last few years of high food prices have brought a flurry of interest to the type of material covered in this book. For those of us who think issues of food production are important in the long term, this is generally a welcome change. But one downside is that the attention attracts all sorts of characters who like to write and say things to get attention, but don’t really know much about agriculture or food security. Sometimes they oversimplify or exaggerate. Sometimes they claim as new something that was known long ago. This book is a good example of the complete opposite of that – three very knowledgeable and insightful people homing in on the critical questions and taking an unbiased look at the evidence.


(The downside is that it is definitely not a light and breezy read. I assigned parts of it to my undergrad class, and they commented on how technical and ”dense” it was. For those looking for a lighter read, I am nearly done with Howard Buffett’s “40 Chances”. I was really impressed with that one as well – lots of interesting anecdotes and lessons from his journeys around the world to understand food security. It’s encouraging that a major philanthropist has such a good grasp of the issues and possible solutions.)