Wednesday, February 15, 2017

Some scintillating satellite studies

We’ve had a couple of papers come out that might be of interest to readers of the blog. Both relate to using satellites to measure crop yields for individual fields. This type of data is very hard to come by from ground-based sources. In many places, especially poor ones, farmers simply don’t keep records of production on a field-by-field basis. In other places, like the U.S., individual farmers often have data, and government agencies or private companies often have data for lots of individuals, but it’s typically not made available to public researchers.

All of this is bad for science. The less data we have on yields, the less we can understand how to improve yields or reduce the environmental impacts of crop production. We now know exactly how much a catcher’s glove moves each time they frame a pitch, exactly how well every NBA player shoots from every spot on the floor, and exactly how fast LeBron James flops to the ground when he is touched by an opposing player. And all of this knowledge has helped improve team strategies and player performance.

But for agriculture, with all the talk of “big data”, we are still often shooting in the dark. And in some cases it’s getting worse, as shown in this recent post at farmdoc daily about how the farmer response rate to area and production surveys is fading faster than my hairline (but not quite as fast as Sol’s).

As we’ve discussed before on this blog (e.g., here and here), satellites offer a potential workaround. They won’t be perfect, but anyone who has done phone or field surveys, or even crop cuts within fields, knows that they have problems too. Plus, satellites are getting cheaper and better, and it’s also becoming easier to work with past satellite data thanks to platforms like Google Earth Engine.

Now to the studies. Both employ what we are calling SCYM, which stands either for Scalable Crop Yield Mapper or Steph Curry’s Your MVP, depending on the context. In the first context, the basic idea is to eliminate any reliance on ground calibration by using crop model simulations to calibrate regression models.
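To make the "simulations instead of ground data" idea concrete, here is a toy sketch of the two-step logic: fit a regression only on simulated output, then apply it to satellite observations. This is not SCYM itself. The function `toy_crop_simulation` is a hypothetical stand-in for a real process-based crop model, and the single vegetation index (VI) plus one weather covariate, the linear form, and every coefficient are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_crop_simulation(n_runs, rng):
    """HYPOTHETICAL stand-in for a process-based crop model.

    Returns, for each simulated run, a mid-season vegetation index (VI),
    a standardized weather covariate, and the simulated final yield (t/ha).
    """
    vi = rng.uniform(0.5, 8.0, n_runs)        # assumed: a GCVI-like value near peak season
    weather = rng.normal(0.0, 1.0, n_runs)    # assumed: e.g. standardized late-season heat
    yield_t_ha = 1.0 + 1.2 * vi - 0.8 * weather + rng.normal(0.0, 0.5, n_runs)
    return vi, weather, yield_t_ha

# Step 1 (calibrate): fit a linear regression to simulated runs only, no ground data.
vi_sim, wx_sim, y_sim = toy_crop_simulation(5000, rng)
X_sim = np.column_stack([np.ones_like(vi_sim), vi_sim, wx_sim])
coef, *_ = np.linalg.lstsq(X_sim, y_sim, rcond=None)

# Step 2 (apply): plug satellite-observed VI and observed weather into the fitted model.
def predict_yield(vi_obs, weather_obs, coef):
    X_obs = np.column_stack([np.ones_like(vi_obs), vi_obs, weather_obs])
    return X_obs @ coef

vi_sat = np.array([2.0, 4.5, 6.0])   # hypothetical per-field satellite VI values
wx_obs = np.array([0.0, 0.5, -1.0])  # hypothetical per-field weather values
print(predict_yield(vi_sat, wx_obs, coef))
```

The point of the design is that the ground truth never enters step 1; field data, where available, is only needed afterwards to check the predictions.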

The first study (joint with George Azzari) applied SCYM to Landsat data in the Corn Belt in the U.S. for 2000-2015, and then analyzed the resulting patterns and trends for maize and soybean yields. Here’s the link to the paper, and a brief animation we had made to accompany the story. Also, we’ve made the maps of average yields available for people to visualize and inspect here. Below is a figure that summarizes what I thought was the most interesting finding: that maize yield distributions are getting wider over time.

Overall maize yields are still increasing, but this is mainly driven by growth in high-yielding areas. This is true at the scale of counties, and also within individual fields. The paper discusses reasons why (e.g., uptake of precision ag) and possible implications. Maybe I’ll return to it in a future post. A short side note is that since that paper was written we’ve expanded the dataset to a nine-state area, and made some useful tweaks to SCYM to improve performance.
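One simple way to check whether a yield distribution is widening is to track a few percentiles across years and compare their trends: if the upper percentile rises faster than the lower one, the spread is growing and gains are concentrated among high yielders. The sketch below is an assumed approach for illustration, not the paper’s method, and it runs on synthetic data rather than the Landsat yield maps.

```python
import numpy as np

def percentile_trends(years, yields, qs=(10, 50, 90)):
    """Fit a linear trend (yield units per year) to selected yield percentiles.

    years, yields: 1-D arrays with one entry per field (or pixel) per season.
    Returns a dict mapping each percentile q to its fitted slope.
    """
    unique_years = np.unique(years)
    trends = {}
    for q in qs:
        # q-th percentile of the cross-sectional yield distribution in each year
        per_year = np.array([np.percentile(yields[years == yr], q) for yr in unique_years])
        slope, _intercept = np.polyfit(unique_years, per_year, 1)
        trends[q] = slope
    return trends

# Synthetic illustration only: the mean rises 0.1 t/ha/yr and the spread also grows.
rng = np.random.default_rng(42)
years = np.repeat(np.arange(2000, 2016), 2000)
t = years - 2000
yields = rng.normal(9.0 + 0.1 * t, 1.5 + 0.05 * t)

trends = percentile_trends(years, yields)
for q, slope in trends.items():
    print(f"p{q}: {slope:+.3f} t/ha per year")
```

Comparing percentile slopes rather than just the standard deviation also shows *which* tail is moving, which is what the "growth in high-yielding areas" reading depends on.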

The second paper (joint with Marshall) looks at yield mapping with some of the newer high-res sensors in smallholder systems. In this case, we looked at two years of data in Western Kenya, using detailed surveys of hundreds of farmers to test our yield estimates, which were derived from 1 m resolution imagery from Terra Bella. The results were pretty good, especially when you consider only fields larger than half an acre, where problems with self-reported yields are less severe. Just like in the U.S., the satellites reveal a lot of yield heterogeneity both within and between fields (the top image shows the true-color image, and the bottom shows the derived yield estimates for maize fields):

One interesting result is that in many ways it looks like the satellite estimates are no less accurate than the (expensive) field surveys. For example, when we correlate yields with inputs that we think should affect yields, like fertilizer or seed density, we see roughly equal correlations when using satellite or self-reported yields.
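The comparison described above boils down to computing the same input-yield correlation twice, once per yield measure. Below is a minimal sketch on synthetic data; the fertilizer range, error sizes, and variable names are all hypothetical and are not the Kenya survey data.

```python
import numpy as np

def compare_input_correlations(inputs, yield_sat, yield_survey):
    """Pearson correlation of one input with each of two yield measures."""
    r_sat = np.corrcoef(inputs, yield_sat)[0, 1]
    r_survey = np.corrcoef(inputs, yield_survey)[0, 1]
    return r_sat, r_survey

# Synthetic illustration only: both measures are the same "true" yield plus
# independent errors of equal size, so the two correlations should be similar.
rng = np.random.default_rng(1)
n = 2000
fert = rng.uniform(0, 100, n)                       # hypothetical fertilizer rate, kg N/ha
true_yield = 1.5 + 0.02 * fert + rng.normal(0, 0.6, n)
yield_sat = true_yield + rng.normal(0, 0.5, n)      # hypothetical satellite estimate error
yield_survey = true_yield + rng.normal(0, 0.5, n)   # hypothetical self-report error

r_sat, r_survey = compare_input_correlations(fert, yield_sat, yield_survey)
print(f"r(satellite) = {r_sat:.2f}, r(self-report) = {r_survey:.2f}")
```

The logic behind the test: random error in a yield measure weakens its correlation with inputs, so the noisier measure should tend to show the lower correlation. Roughly equal correlations therefore suggest, without proving, roughly comparable error levels.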

None of this is to say that satellite-based yields are perfect. But that was never the goal. The goal is to get insights into what factors are (or aren’t) important for yields in different settings. Both studies show how satellites can provide insights that match or even go beyond what is possible with traditional yield measures. We hope to continue to test and apply these approaches in new settings, and plan to make the datasets available to researchers, probably at this site.  


  1. Thank you for this excellent post! I have a question: why are you using GCVI instead of WDRVI for relatively high LAI values?

  2. Thanks for the comment. It was mainly because GCVI captures some effects of different chlorophyll content, which is useful in Africa. In the U.S., we originally used WDRVI but switched to GCVI, which worked slightly better.
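For readers unfamiliar with the indices in this exchange, here are their commonly published band-ratio definitions, with NDVI included for comparison. These are textbook formulas, not necessarily the exact bands or settings used in the papers; the WDRVI weight `alpha` (often taken in the 0.1 to 0.2 range) and the reflectance values below are assumptions for illustration.

```python
import numpy as np

def gcvi(nir, green):
    """Green Chlorophyll Vegetation Index: NIR / Green - 1."""
    return nir / green - 1.0

def wdrvi(nir, red, alpha=0.1):
    """Wide Dynamic Range Vegetation Index: (alpha*NIR - Red) / (alpha*NIR + Red).

    alpha down-weights NIR to reduce saturation at high LAI; 0.1 is an
    assumed default here.
    """
    return (alpha * nir - red) / (alpha * nir + red)

def ndvi(nir, red):
    """Normalized Difference Vegetation Index, shown for comparison."""
    return (nir - red) / (nir + red)

# Hypothetical surface reflectances for a moderately dense and a very dense canopy.
nir = np.array([0.45, 0.55])
red = np.array([0.03, 0.025])
green = np.array([0.06, 0.05])

print("NDVI :", ndvi(nir, red).round(3))
print("WDRVI:", wdrvi(nir, red).round(3))
print("GCVI :", gcvi(nir, green).round(3))
```

With these made-up numbers, NDVI barely separates the two canopies, while WDRVI and GCVI spread them much further apart. That saturation behavior at high LAI is what motivates both indices; GCVI’s use of the green band is what ties it to chlorophyll content.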