Saturday, February 18, 2017

Targeting poverty with satellites

[This post is co-authored with Matt Davis, co-author and RA extraordinaire...]

About six months ago, our Stanford SustainLab crew had a paper in Science showing that you can make pretty good predictions about local-level economic wellbeing in Africa by combining satellite imagery with fancy tools from machine learning.  To us this was (and is) a promising finding, as it suggests a way to address the fundamental lack of data on economic outcomes in much of the developing world.   As has been widely acknowledged, these data gaps inhibit our ability to both evaluate what interventions reduce poverty and to target assistance to those who need it most.

A natural question that comes up (e.g. here) is: are these satellite-based estimates good enough to actually be useful for either evaluation or targeting? Our original paper didn't really answer that question. We're in the process of putting together a follow-up paper that looks at this question on an expanded country set and with an improved machine learning pipeline, but in the meantime we [and by "we" I mean Matt and Neal] wanted to use some of the data from our original paper to explore this question more quantitatively.

Folks have been thinking for decades about whether using geographic information to inform the targeting of anti-poverty programs could improve their efficiency. The standard thought experiment goes like this. Imagine a policymaker who has a fixed budget F that she can distribute as cash transfers to anyone in the country (this sort of cash transfer program happens all the time these days, it turns out). Let's say in particular that the poverty metric this policymaker cares about is the squared poverty gap (SPG), a common poverty measure that takes into account the distance of individuals from the poverty line. [If you're having trouble sleeping at night: For a given poverty line P, an individual with income Y<P has a poverty gap of P-Y and an SPG of (P-Y)^2. The SPG in a region is the average over all individuals in that region, where anyone with Y>=P has an SPG of 0. So this measure gives a lot of weight to people far below the poverty line.] If your goal is to reduce the SPG, you do best by giving money to the poorest person you can find until they're equal to the next poorest, then giving them both enough money until they're equal to the third poorest, and so on.
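To make the SPG definition and the "top up the poorest" rule concrete, here's a minimal Python sketch. It assumes incomes come as a plain list of numbers and uses the unnormalized (P-Y)^2 definition above. The function names are our own, hypothetical ones. The key observation is that the rule amounts to finding a common income floor L that exactly exhausts the budget (often called "water-filling").

```python
def squared_poverty_gap(incomes, poverty_line):
    """Average of (P - Y)^2 over all individuals; anyone with Y >= P contributes 0."""
    if not incomes:
        raise ValueError("incomes must be non-empty")
    shortfalls = ((poverty_line - y) ** 2 for y in incomes if y < poverty_line)
    return sum(shortfalls) / len(incomes)


def water_fill_level(incomes, budget):
    """Income floor L such that topping everyone below L up to L costs exactly `budget`.

    Equivalent to repeatedly raising the poorest person(s) to the next-poorest
    person's income until the money runs out.
    """
    if not incomes:
        raise ValueError("incomes must be non-empty")
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ys = sorted(incomes)
    running_total = 0.0
    for k, y in enumerate(ys, start=1):
        running_total += y                    # total income of the k poorest people
        level = (budget + running_total) / k  # floor if only the k poorest get money
        if k == len(ys) or level <= ys[k]:    # floor doesn't reach the next person up
            return level


def targeted_transfers(incomes, budget):
    """Per-person transfers under the water-filling rule (same order as `incomes`)."""
    level = water_fill_level(incomes, budget)
    return [max(0.0, level - y) for y in incomes]
```

For example, `targeted_transfers([3, 1, 4, 1, 5], 2.5)` gives `[0.0, 1.25, 0.0, 1.25, 0.0]`: the two poorest people are lifted to a common floor of 2.25, and nobody else receives anything.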

So how should the policymaker distribute the cash?  If she knows nothing about where poor people are, a naive approach would be to distribute money uniformly -- i.e. to just give each of n constituents F/n dollars.  Clearly this could be pretty inefficient, since people already above the poverty line will get money and this won't reduce the SPG.
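A tiny worked example (made-up numbers, purely illustrative) shows where the inefficiency comes from. Take three people with incomes Y = (1, 2, 5), a poverty line P = 4, a budget F = 3, and the unnormalized SPG defined earlier:

```latex
\begin{aligned}
\text{no transfer:}\quad & \tfrac{1}{3}\left[(4-1)^2 + (4-2)^2 + 0\right] = \tfrac{13}{3} \approx 4.33\\
\text{uniform } (F/n = 1 \text{ each}):\quad & \tfrac{1}{3}\left[(4-2)^2 + (4-3)^2 + 0\right] = \tfrac{5}{3} \approx 1.67\\
\text{poorest first (incomes become } 3, 3, 5\text{):}\quad & \tfrac{1}{3}\left[(4-3)^2 + (4-3)^2 + 0\right] = \tfrac{2}{3} \approx 0.67
\end{aligned}
```

Under the uniform rule, the dollar handed to the person already at 5 does nothing for the SPG.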

An alternate approach, now a few decades old in the economics literature, has been to construct "small area estimates" (SAE) of poverty by combining a detailed household survey with a less detailed but more geographically comprehensive census. The idea is that while only the household survey measures your outcome of interest (typically consumption expenditure), there is a small set of questions common to both the detailed household survey and the census (call these X). These are typically questions about respondent age, gender, education, and perhaps a few basic questions on assets. So using the household survey you can fit a model Y = f(X), which tells you how the Xs map into consumption expenditure (your outcome of interest), and then apply f(X) to the same Xs in the census to predict consumption expenditure for everyone in the census. Then you can aggregate these predictions to any level of interest (e.g. village or district), and use them to potentially inform your cash transfers. This has been explored in a number of papers, e.g. here and here, and apparently has been used to inform policy in a number of settings.
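Here's a stripped-down sketch of the SAE idea. It assumes a single covariate shared by the survey and census and a linear f(X) fit by ordinary least squares. Real SAE implementations are considerably more involved (many covariates, modeling log consumption, error structures). The function names and toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares with one covariate: y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("covariate has no variation")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def small_area_estimates(survey, census):
    """Fit f(X) on the survey, predict for every census record, average by village.

    survey: list of (x, consumption) pairs; consumption is only observed here.
    census: list of (village, x) pairs; covers everyone but lacks consumption.
    Returns {village: predicted mean consumption}.
    """
    a, b = fit_linear([x for x, _ in survey], [y for _, y in survey])
    predictions = defaultdict(list)
    for village, x in census:
        predictions[village].append(a + b * x)
    return {village: mean(preds) for village, preds in predictions.items()}
```

With a toy survey `[(0, 1.0), (2, 2.0), (4, 3.0)]` and census `[("A", 0), ("A", 2), ("B", 4), ("B", 6)]`, the fitted line is y = 1 + 0.5x, so village A is predicted at 1.5 and village B at 3.5.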

Our purpose here is to compare a targeting approach that uses our satellite-based estimates to either the naive (uniform) transfer or a transfer that's informed by SAE estimates. To actually evaluate these approaches against each other, we are going to just use the household survey data, aggregated to the cluster (village) level. In particular, we estimate both the SAE and the satellite-based model on a subset of our household survey data in each country, make predictions for the remainder of the data that the models have not seen, and in this holdout sample, evaluate for a fixed budget the reduction in the SPG you'd get if you allocated using either the naive, SAE, or satellite-based model. The allocation rule for SAE and satellites is the one described above: giving money to the poorest village until it is equal to the next poorest, then giving both enough money until they're equal to the third poorest, and so on.
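Roughly, the village-level allocation and the holdout evaluation might look like the sketch below. It rests on our own simplifying assumptions, not the code behind the results: each village receives a per-person transfer based on its *predicted* mean consumption (from SAE or satellites), every household record in the village gets that same amount, household size weights are ignored, and the SPG is then computed on households' *actual* consumption. All names are hypothetical.

```python
def squared_poverty_gap(incomes, poverty_line):
    """Average of (P - Y)^2 over everyone; those at or above P contribute 0."""
    return sum((poverty_line - y) ** 2 for y in incomes if y < poverty_line) / len(incomes)


def village_per_capita_transfers(predicted_means, sizes, budget):
    """Water-filling over villages on predicted means, weighted by village size.

    Raises the poorest-looking village's per-capita level to the next poorest,
    and so on, until `budget` (total money) is spent. Sizes must be positive.
    """
    order = sorted(range(len(predicted_means)), key=lambda v: predicted_means[v])
    people, weighted_sum, level = 0, 0.0, 0.0
    for rank, v in enumerate(order):
        people += sizes[v]
        weighted_sum += sizes[v] * predicted_means[v]
        level = (budget + weighted_sum) / people  # common per-capita floor so far
        is_last = rank + 1 == len(order)
        if is_last or level <= predicted_means[order[rank + 1]]:
            break
    return [max(0.0, level - m) for m in predicted_means]


def spg_after_targeting(households_by_village, predicted_means, budget, poverty_line):
    """Holdout SPG after village-targeted transfers.

    households_by_village: one list of actual household consumption per village.
    """
    sizes = [len(hh) for hh in households_by_village]
    per_capita = village_per_capita_transfers(predicted_means, sizes, budget)
    post = [y + t for hh, t in zip(households_by_village, per_capita) for y in hh]
    return squared_poverty_gap(post, poverty_line)


def spg_after_uniform(households_by_village, budget, poverty_line):
    """Holdout SPG after the naive transfer: budget split equally across all households."""
    everyone = [y for hh in households_by_village for y in hh]
    share = budget / len(everyone)
    return squared_poverty_gap([y + share for y in everyone], poverty_line)
```

In a toy case with villages `[[1.0, 3.0], [5.0, 7.0]]`, predicted means `[2.0, 6.0]`, a budget of 2 and a poverty line of 4, targeting leaves an SPG of 1.0 versus 1.625 under the uniform transfer. Note that the poorer household in the richer village would be missed entirely if it existed, which is the "targeting on the village mean" limitation.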

Below is what we get, with the figure showing how much reduction in SPG you get from each targeting scheme under increasing program budgets. The table summarizes the results, showing the cross-validated R2 for the SAE and satellite-features models (the goodness of fit of the models whose predictions we then use for targeting), and the amount of money each approach saves, relative to the uniform transfer, to achieve a 50% decline in SPG.

[Table: cross-validated R-squared (SAE, satellites) | % reduction in budget to achieve 50% decline in SPG]
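The "budget saved to reach a 50% decline in SPG" number can be computed by searching for the smallest budget that hits the target under each scheme. Here's a minimal sketch assuming each scheme is wrapped as a function mapping a budget to the resulting holdout SPG, and that SPG never rises as the budget grows (true for both water-filling and uniform transfers). Bisection is our choice of search method; the names are hypothetical.

```python
def budget_for_target(spg_after, target_reduction=0.5, max_budget=1e12, rel_tol=1e-9):
    """Smallest budget with spg_after(budget) <= (1 - target_reduction) * spg_after(0).

    Assumes spg_after is non-increasing in the budget.
    """
    goal = (1.0 - target_reduction) * spg_after(0.0)
    hi = 1.0
    while spg_after(hi) > goal:  # double until the budget is large enough
        hi *= 2.0
        if hi > max_budget:
            raise ValueError("target not reachable within max_budget")
    lo = 0.0
    while hi - lo > rel_tol * hi:  # bisect between too-small and large-enough
        mid = 0.5 * (lo + hi)
        if spg_after(mid) > goal:
            lo = mid
        else:
            hi = mid
    return hi


def budget_savings(spg_targeted, spg_uniform, target_reduction=0.5):
    """Fraction of the uniform-transfer budget saved by targeting."""
    b_targeted = budget_for_target(spg_targeted, target_reduction)
    b_uniform = budget_for_target(spg_uniform, target_reduction)
    return 1.0 - b_targeted / b_uniform
```

With made-up SPG curves `lambda b: 100 / (1 + 2 * b)` (targeted) and `lambda b: 100 / (1 + b)` (uniform), the targeted scheme reaches a 50% decline at a budget of 0.5 versus 1.0, i.e. a 50% saving.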

What do we learn from these simulations? First, geographical targeting appears to save you money relative to the naive transfer. You achieve a 50% reduction in SPG for 7-40% less budget than a naive transfer when you use either SAE or satellites to target. Second, and not surprisingly, when the satellite model and the SAE model fit the data roughly equally well (e.g. Malawi, Nigeria), they deliver similar savings relative to a uniform transfer. But the amount of budget that you save by using SAE or satellites to target transfers can differ even for similar R2. Compare Malawi to Nigeria: the targeted approaches help a lot more in Nigeria than in Malawi, which is consistent with Malawi having poor people all over the place (e.g. see the maps we produced for Malawi and Nigeria), including in somewhat better-off villages, which in turn makes targeting on the village mean less helpful. Third, SAE leads to more efficient targeting in the two countries where the SAE model has more predictive power: Tanzania and Uganda.

We're somewhat biased of course, but this to us is fairly promising from a satellite perspective. First, these SAE estimates are probably an upper bound on actual SAE performance, since it's very rarely the case that you have a household survey and a census in the same year, and we've been generous in the variables we included to calculate the SAE (some of which are not available in many censuses). Second, since many countries lack either a census or a household survey, it's not clear whether we can use SAE in these countries, whereas in our Science paper we showed decent out-of-country fits for the satellite-based approach. Third, we're working on improvements to the satellite-based estimates and anticipate meaningfully higher R2 relative to these benchmarks. And finally, and perhaps most importantly, the satellite-based approach is going to be incredibly cheap to implement relative to SAE in areas where surveys don't already exist. So you might be willing to trade off some loss in targeting performance given the low expense of developing the targeting tool.

So our tentative conclusion is that satellites might have something to offer here.  They're probably going to be even more useful when combined with other approaches and data -- something that we are exploring in ongoing work. 

Wednesday, February 15, 2017

Some scintillating satellite studies

We’ve had a couple of papers come out that might be of interest to readers of the blog. Both relate to using satellites to measure crop yields for individual fields. This type of data is very hard to come by from ground-based sources. In many places, especially poor ones, farmers simply don’t keep records on production on a field-by-field basis. In other places, like the U.S., individual farmers often have data, and government agencies or private companies often have data for lots of individuals, but it’s typically not made available to public researchers.

All of this is bad for science. The less data we have on yields, the less we can understand how to improve yields or reduce environmental impacts of crop production. We now know exactly how much a catcher’s glove moves each time they frame a pitch, exactly how well every NBA player shoots from every spot on the floor, and exactly how fast LeBron James flops to the ground when he is touched by an opposing player. And all of this knowledge has helped improve team strategies and player performance.

But for agriculture, with all the talk of “big data”, we are still often shooting in the dark. And in some cases it’s getting worse, such as this recent post at farmdoc daily about how the farmer response rate to area and production surveys is fading faster than my hairline (but not quite as fast as Sol’s):

As we’ve discussed before on this blog (e.g., here and here), satellites offer a potential workaround. They won’t be perfect, but anyone who has done phone or field surveys, or even crop cuts within fields, knows that they have problems too. Plus, satellites are getting cheaper and better, and it’s also becoming easier to work with past satellite data thanks to platforms like Google Earth Engine.

Now to the studies. Both employ what we are calling SCYM, which stands either for Scalable Crop Yield Mapper or Steph Curry’s Your MVP, depending on the context. In the first context, the basic idea is to eliminate any reliance on ground calibration by using simulations to calibrate regression models. 
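As a toy illustration of the calibration idea (not SCYM itself), the sketch below fits a regression of yield on a satellite-observable variable using *only* simulated runs, then applies it to observations without any ground data. The simulator is a hypothetical stand-in with a made-up yield vs. vegetation-index relationship; whatever crop model and predictors the actual method uses are not shown here.

```python
import random
from statistics import mean


def toy_crop_simulator(n_runs, seed=0):
    """HYPOTHETICAL stand-in for a crop simulation model.

    Returns (peak vegetation index, yield in t/ha) pairs from a made-up
    linear relationship plus noise.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        vi = rng.uniform(0.3, 0.9)
        simulated_yield = 2.0 + 12.0 * vi + rng.gauss(0.0, 0.5)
        runs.append((vi, simulated_yield))
    return runs


def calibrate_on_simulations(runs):
    """Fit yield = a + b * VI by least squares on simulated runs only."""
    xs = [vi for vi, _ in runs]
    ys = [y for _, y in runs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b


def predict_yields(observed_vi, coefficients):
    """Apply the simulation-calibrated model to satellite observations."""
    a, b = coefficients
    return [a + b * vi for vi in observed_vi]
```

The design point is that no field-level yield measurements enter `calibrate_on_simulations`; ground data, where it exists, is only needed to check the predictions afterwards.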

The first study (joint with George Azzari) applied SCYM to Landsat data in the Corn Belt in the U.S. for 2000-2015, and then analyzed the resulting patterns and trends for maize and soybean yields. Here’s the link to the paper, and a brief animation we had made to accompany the story. Also, we’ve made the maps of average yields available for people to visualize and inspect here. Below is a figure that summarizes what I thought was the most interesting finding: that maize yield distributions are getting wider over time.

Overall maize yields are still increasing, but mainly this is driven by growth in high yielding areas. This is true at the scale of counties, and also within individual fields. The paper discusses reasons why (e.g. uptake of precision ag) and possible implications. Maybe I’ll return to it in a future post. A short side note is that since that paper was written we’ve expanded the dataset to a nine state area, and made some useful tweaks to SCYM to improve performance.

The second paper (joint with Marshall) looks at yield mapping with some of the newer high-res sensors in smallholder systems. In this case, we looked at two years of data in Western Kenya, using detailed surveys with hundreds of farmers to test our yield estimates, which were derived using 1m resolution imagery from Terra Bella. The results were pretty good, especially when you consider only fields above half an acre, where problems with self-reported yields are not as big. Just like in the US, the satellites reveal a lot of yield heterogeneity both within and between fields (the top image shows the true-color image, the bottom shows the derived yield estimates for maize fields):

One interesting result is that in many ways it looks like the satellite estimates are no less accurate than the (expensive) field surveys. For example, when we correlate yields with inputs that we think should affect yields, like fertilizer or seed density, we see roughly equal correlations when using satellite or self-reported yields.

None of this is to say that satellite-based yields are perfect. But that was never the goal. The goal is to get insights into what factors are (or aren’t) important for yields in different settings. Both studies show how satellites can provide insights that match or even go beyond what is possible with traditional yield measures. We hope to continue to test and apply these approaches in new settings, and plan to make the datasets available to researchers, probably at this site.  

Tuesday, February 14, 2017

Food fights

Eoin McGuirk and I have a new NBER working paper out that (we argue) sheds important new light on the economic origins of conflict in Africa.  This is a topic that has received a lot of scrutiny over the years, but one which has remained tricky to sort out empirically, as conflict could be both a cause and a consequence of deteriorating economic conditions.

Our approach is to utilize plausibly random variation in local-level food prices induced by (1) variation in global food prices and (2) variation in what crops are grown and consumed in different regions across Africa.  There's a lot of variation in both, which is helpful.  For instance, here's a plot from the paper of where some of the main crops in Africa are grown, with the colors showing the percentage of each cell planted to different crops (using the SAGE data):
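Combining (1) and (2) can be sketched as a share-weighted price measure for each grid cell, so that the same world price series hits different cells differently depending on their crop mix. This is a simplified illustration under our own assumptions (weights are crop area shares normalized within the cell, inputs are changes in log world prices, producer side only). It is not necessarily the exact construction in the paper, and it ignores the consumption side the paper also uses.

```python
def cell_price_exposure(crop_shares, world_price_changes):
    """Share-weighted average of world price changes for one grid cell.

    crop_shares: {crop: fraction of the cell planted to that crop}
    world_price_changes: {crop: change in log world price}
    Returns None for cells with no listed crop area (no producer exposure).
    """
    total_share = sum(crop_shares.values())
    if total_share == 0:
        return None
    weighted = sum(share * world_price_changes[crop] for crop, share in crop_shares.items())
    return weighted / total_share
```

With hypothetical log price changes of +0.10 for maize, -0.05 for rice and +0.20 for coffee, a cell planted 60/40 to maize and rice gets an exposure of about +0.04, while a coffee-only cell gets +0.20.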

Clearly, world price movements for maize are going to have geographically distinct impacts from global price movements in rice or coffee.

We find strong evidence that changes in economic conditions induced by food price movements have substantial effects on conflict across the continent.  Eoin wrote a nice summary of the paper for the IGC blog, which I shamelessly copy below:


Food Fights: Food Prices and Civil Conflict in Africa

Over one million people have died from direct exposure to civil conflict in Africa since the end of the Cold War alone. What causal role do economic factors play in this tragedy? We study the effect of world food prices on violence across the continent, finding that the impact depends on both the type of conflict and whether a region mainly produces or consumes food.

The role that economic conditions have in shaping conflict risk is notoriously difficult to identify. Civil conflict can cause enormous economic damage by destroying human and physical capital, so any correlations between economic conditions and conflict could be explained by reverse causality. Moreover, the type of institutional environment in which conflict occurs is likely to be the type that also depresses economic activity, raising the additional concern of omitted factors that confound the analysis.

Identifying economic effects using high-resolution data on prices and conflict

To overcome these challenges, we study the impact of food price shocks on local conflict using panel data constructed at the level of a 0.5 x 0.5-degree subnational grid cell (roughly 55km x 55km at the equator) across the African continent. Studying conflict at this local level has several advantages over the more traditional country-level approach. Since food crops represent a higher average share of both production and consumption in Africa than in any other region, a price spike can be expected to significantly raise income for agricultural producers and reduce real income for consumers at the same time. With the help of spatial data on where specific crops are grown and consumed, we are able to separate these producer and consumer effects within countries. Moreover, we can confidently dismiss the potentially confounding concern that African civil conflict affects world cereal prices, given that the entire continent accounts for less than 6% of global production over our sample period. The recent emergence of highly detailed data on African conflict also allows us to consider the effects of price movements on different varieties of violence within countries.
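One generic way to separate producer and consumer effects in a cell-level panel is to estimate the price-conflict slope separately for food-producing and non-producing cells, absorbing time-invariant cell characteristics with cell fixed effects (the "within" transformation). The sketch below is that textbook estimator, shown on made-up, noise-free data; the paper's actual specification (other fixed effects, controls, conflict measures, standardization) is not reproduced here, and the names are hypothetical.

```python
from statistics import mean


def within_slope(panel):
    """Fixed-effects (within) slope of conflict on price.

    panel: {cell_id: [(price, conflict), ...]} observed over several periods.
    Demeaning within each cell removes any time-invariant cell effect.
    """
    num = den = 0.0
    for observations in panel.values():
        p_bar = mean(p for p, _ in observations)
        c_bar = mean(c for _, c in observations)
        for p, c in observations:
            num += (p - p_bar) * (c - c_bar)
            den += (p - p_bar) ** 2
    if den == 0:
        raise ValueError("no within-cell price variation")
    return num / den
```

Running it on producer cells and non-producer cells separately gives two slopes whose signs can differ, even when cells start from very different baseline levels of conflict.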

Factor conflict: for permanent control of land

We first look at the type of large-scale armed conflict that tends to be the subject of most attention among researchers. We label this “factor conflict”, as these battles typically relate to the permanent control of land. Our results identify the differences that one would expect to see between places with crop agriculture and those without. A rise in food prices of one standard deviation leads to a 17% decline in conflict in regions where food production is common and farmers benefit from higher prices, and a 9% increase in conflict in populated areas with no food production – areas where consumers are hurt by rising food prices.

This finding is consistent with the idea that opportunity costs are important drivers of violence: when food prices are high, it makes more sense for producers to harvest crops; when they are low, farmers must look for other ways to make ends meet. At the same time, high prices can push some of the poorest consumers toward joining armed conflict groups in order to maintain a living wage; a fall in prices, on the other hand, pushes their real wages up and allows them to return to the productive sector. In both cases, real income is a fundamental factor in the decision to fight.

Output conflict: the taking of movable goods

The second type of violence we consider relates not to the control of territory but to the appropriation of output. This “output conflict”, as we call it, is more transitory, more atomistic, and less organised than factor conflict. The goal of output conflict is simply to take food, money, or other movable goods. It can appear in our dataset as looting, raiding, rioting, or interpersonal theft.

As with factor conflict, higher food prices cause an increase in output conflict in areas without crop agriculture (e.g., urban centres). This finding is consistent with the idea that declining real wages push some consumers towards appropriation in order to make ends meet. The more interesting distinction, however, comes in areas with crop agriculture. In contrast to factor conflict, we see clear evidence that output conflict increases with rising food prices in regions that produce the associated crops. What can explain this difference? Even in food-producing regions, not everyone makes a living from farming. For net consumers in these regions, higher food prices simultaneously (i) lower real income, as a given wage buys less food, and (ii) increase the value of appropriable output, as nearby farmers have assets that have appreciated in value. These forces provide incentives for consumers to engage in output conflict when prices rise.

We corroborate this finding using a large multi-country survey dataset with information on self-reported crime. Commercial farmers who grow food crops in food-producing regions are more likely to report being victims of theft and physical assault following a spike in food prices. Interestingly, farmers who instead grow cash crops such as cotton or coffee do not report being victimised following an equivalent price spike. In this case, higher prices are unlikely to have reduced real income for consumers, who tend not to purchase cash crops in order to get by.

Key relationships and policy implications

In sum, the effect of food prices on conflict in Africa depends on both the sub-national location and the type of conflict. Global price increases lower large-scale factor conflict in agricultural areas, and increase it in urban areas. For output conflict, the effect is more straightforward: higher prices will increase the incidence of conflict in both rural and urban areas. Overall, our results provide clear evidence that much of the observed conflict in Africa has economic roots. Our approach allows us to isolate the role of changing economic conditions from other factors that might affect conflict risk (such as governance), and to show that these changes can have quantitatively large effects on different types of conflict.

Our findings suggest that policies to minimise conflict risk in the face of food-price shocks must be tailored to sub-national areas. Incentives to farm rather than fight can reduce the risk of factor conflict in rural areas when global prices are critically low. Similarly, policy makers ought to be prepared for potential instability in urban areas when global prices are critically high. Potential policy responses include guaranteed prices for producers and well-timed releases of buffer-stock food for consumers when global prices reach critical levels in either direction. A complementary approach could take the form of local “workfare” programmes—welfare conditional upon work—that shift from urban to rural regions as prices fall, and vice versa.