In the hopes of figuring out how to raise crop yields or
farmer incomes around the world, it would be really nice if we had a quick and
accurate way of actually measuring yields for individual fields. That has
motivated a lot of work over the years on using satellite data, and we have a paper out this week
describing another step in that direction.
As I see it, there are three main ingredients needed for yield
remote sensing to be successful on a meaningful scale. One is the raw data. As Marshall’s
recent post explained, there are several new satellite data providers that
are really transforming our ability to track individual fields, even in
smallholder areas.
Second is the ability to process the data at scale. Five
years ago, for example, I would have had to hire a research assistant to download
imagery, make sure it was geometrically and radiometrically calibrated (i.e.
properly lined up and in meaningful units), and then apply whatever algorithms
we had. That just didn’t scale very well, in terms of labor or on-site data
storage or processing. When a collaborator would ask “could you produce yield
estimates for my study area,” I would have to think about how many weeks or
months of work that would entail. But a couple of years ago I was introduced to
Google’s Earth Engine,
which is “a planetary-scale platform for environmental data & analysis.” In
practical terms, it means that they have a lot of geospatial data (including
all historical Landsat imagery), a lot of built-in algorithms for processing,
and an interface to run your own code on the data and visualize or save the
output. Part of why it works is that data providers, like the USGS for Landsat,
have gotten better at providing well calibrated data. Earth Engine is very
cool, and the more I’ve worked with it, the more I can see how this transforms
our ability to extract value out of data already collected.
Third, and arguably the rate-limiting step nowadays, is to
have algorithms that can translate satellite data into accurate yield
estimates. It’s easy enough to do this if you have lots of ground data to
calibrate to for a particular site, but that’s generally not scalable (unless
people get clever about crowdsourcing ground “truth”). What seemed to be
lacking was a very generic, scalable algorithm. So in the last 8 months or so
we’ve been working to develop and test one idea about how to do this. I’m
calling it a scalable satellite-based crop yield mapper (SCYM, pronounced
“skim”), and a description of it has just been published in Remote
Sensing of Environment. Conveniently, SCYM also stands for Steph Curry’s Your MVP.
The basic idea is that if you don’t have lots of ground data
to calibrate a model, why not generate lots of fake ground data? Then for
whatever combination of observations you actually have (say, for instance,
satellite images on 2 or 3 specific days, and measures of daily weather), you
can look into your fake data to see what the best-fit model is for predicting the desired
variable (“yield”) from the measured predictors. The paper provides more
detail, which I won’t bore readers with here. But to give a sense of the type
of output, the animation below shows our maize yield estimates over part of
Iowa for 2008-2013. Red indicates high yields and blue indicates low yields.
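
To make the "fake ground data" idea a bit more concrete, here is a minimal sketch in Python. It is not the SCYM code from the paper. The function `simulate_season` is a made-up toy standing in for a real crop model, and every name, coefficient, and input value in it is an assumption chosen purely for illustration. The sketch only shows the general pattern: simulate many seasons, keep the simulated values for whichever image dates you happen to have, fit a regression to predict yield from them, and apply that regression to the real observations.

```python
import math
import random


def simulate_season(rng):
    """Toy stand-in for one crop model run (hypothetical, not a real crop model).

    Returns a daily vegetation-index curve, seasonal rainfall, and yield.
    """
    peak = rng.uniform(0.4, 0.9)      # peak canopy greenness
    peak_day = rng.uniform(50, 80)    # day of season when the peak occurs
    rain = rng.uniform(200, 700)      # seasonal rainfall in mm
    vi = [peak * math.exp(-((d - peak_day) / 30.0) ** 2) for d in range(120)]
    # Assumed relationship: yield (t/ha) rises with greenness and rainfall.
    yield_t_ha = 2.0 + 10.0 * peak + 0.004 * rain + rng.gauss(0, 0.3)
    return vi, rain, yield_t_ha


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_regression(rows, targets):
    """Ordinary least squares via the normal equations.

    The first returned coefficient is the intercept.
    """
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(coefs, row):
    """Apply fitted coefficients (intercept first) to one row of predictors."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def train_for_dates(image_days, n_sims=2000, seed=0):
    """Fit yield ~ VI on the available image days + rainfall, using simulations only."""
    rng = random.Random(seed)
    rows, targets = [], []
    for _ in range(n_sims):
        vi, rain, y = simulate_season(rng)
        rows.append([vi[d] for d in image_days] + [rain])
        targets.append(y)
    return fit_regression(rows, targets)


# Suppose usable images exist only for days 45 and 75 of the season.
coefs = train_for_dates(image_days=[45, 75])
# Apply to one made-up field: VI on the two image days, then rainfall in mm.
estimate = predict(coefs, [0.55, 0.70, 450.0])
```

The point of the design is that no ground data enter the fit: if a different field has images on different days, you just call `train_for_dates` again with those days and get a regression tailored to that set of observations.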
The figure below shows a comparison between these and ground
“truth” estimates for maize, which we take from the dataset described in a previous post.
The cool thing about this approach is that it’s quite generic. To
illustrate that, we reran the model for soybeans, with results nearly as good as
for maize.
Hopefully this type of thing will help make faster progress
on understanding yields and farm productivity, and figuring out what actually
works for improving them. One general lesson out of this for me is that
sometimes making something really scalable requires scrapping an old approach.
We had previously been running crop models for specific sites and years, but
that wasn’t possible within the Earth Engine system. I think SCYM (which
trains a regression using simulations over lots of sites and years) is more
robust than what we had, and along with the new satellite data and Earth
Engine-type systems, it might just provide a way to do yield mapping at scale.