Monday, May 8, 2017

Chat with Nature editor Michael White about publishing, interdisciplinary research, and the state of climate economics

I recently had a long discussion with Dr. Michael White (the editor at Nature who handles climate science and economics) about a whole bunch of issues that probably interest g-feeders: working in an interdisciplinary space, deciding when to publish in econ vs. general interest journals, what we're all doing in climate econ these days, and the things he thought were funny (as a non-economist) about the AEA meetings. (There are also a few stories about my oddball childhood in there for Marshall to laugh at.)

You can listen here (1 hr).

He thought he was interviewing me for his podcast, but really I was interviewing him for our blog;) The session was an episode of Michael's podcast Forecast, which he produces to cover climate science.  Michael has been a regular attendee at our climate economics lunch and it's been helpful to get his take on the recent developments in our field.

Saturday, May 6, 2017

Are the curious trends in despair and diets related?

There’s a new working paper out by Anne Case and Angus Deaton on one of the most curious (and saddest) trends in America – mortality rates for whites have been rising. There have been various stories about these trends, such as this one that first turned me onto it. It’s clear that the proximate causes are an increase in “deaths of despair,” namely suicides, drugs, and alcohol. It’s also clear that self-reported mental health has been declining in this group. But why?

Any explanation has to account for the fact that the same trends aren’t seen in other racial groups, even though many of them have lower incomes (see figure below from Case and Deaton):

It should also account for the fact that the same trend isn’t seen in other predominantly white countries:

One explanation that seems to have gained the most traction is that whites “lost the narrative of their lives.” That is, maybe rising economic inequality and other economic trends have affected all groups, but whites expected more. To me this seems plausible but not really convincing.

Ok, so let me offer another theory. I’ll first say that I think it’s a bit off the wall, but I don’t think I’m going crazy (despite Marshall’s frequent hinting that I am). The idea basically stems from another body of literature that I’ve recently been exploring, mainly because I was interested in allergies. Yeah, allergies. Specifically, a bunch of people I know have sworn that their allergies were fixed by eliminating certain foods, and given that some people in my family have bad seasonal allergies, I decided to look into it.

It turns out that wheat is thought by many to trigger inflammation and allergies. But what’s relevant here is that it’s also thought to affect mental health. More than that, there are actually clinical studies like this one showing that depression increases with gluten intake. There are only 22 subjects in that study, which seems low to me but obviously I don’t do that sort of work. A good summary of scientific and plenty of non-scientific views on the topic can be found in this article. Incredibly, there was even a study in the 1960s showing how hospital admission rates for schizophrenia varied up and down with gluten grain rations during World War 2.

So what’s the connection to the trends in deaths of despair? Well, the striking thing to me is that wheat effects are generally only seen in white non-Hispanics. Celiac disease, for instance, is much lower in other racial groups. Second, it’s apparently known that celiac has been rising over time, which is thought to indicate increased exposure (of all people) to gluten early in life. And the trends are most apparent in whites, as seen in the figure below from this paper.

Just to be clear, I realize this is mostly speculation. Not only is this not my area of expertise, but I don’t have any data on the regional trends in gluten or wheat intake in the U.S. to compare to the regional trends in death. I’m not even sure that such data exist. It seems that studies like this one looking at trends in gluten consumption just assume the gluten content of foods is fixed, but it also seems a lot of products now have gluten added to make them rise quicker and better. (Some blame the obsession with whole grain foods, which don't rise as quickly.) If anyone knows of good data on trends in consumption, let me know. It would also be interesting to know if they add less gluten in other countries, where mortality rates haven’t risen.

(As an aside: there’s also a recent study looking at wheat and obesity in cross-section. Apparently country obesity rates are related to wheat availability, but not much else.)

Also to be clear, I still like wheat. Maybe having spent most of my career studying wheat producing systems has made me sympathetic. Or maybe it’s the fact that it has sustained civilization since the dawn of agriculture. But I think it’s possible that we recently have gone overboard in how much is eaten or, more specifically, in how much gluten is added to processed food in this country. And even if there’s only a small chance it’s partly behind the trends of despair (which aren’t just causing mortality, but all sorts of other damage), it’s worth looking into.

Wednesday, April 26, 2017

Better than the real thing

Why read a stack of research papers when you can watch a 3 min cartoon?

Thanks to Eric Roston of Bloomberg for putting this together.

Saturday, February 18, 2017

Targeting poverty with satellites

[This post is co-authored with Matt Davis, co-author and RA extraordinaire...]

About six months ago, our Stanford SustainLab crew had a paper in Science showing that you can make pretty good predictions about local-level economic wellbeing in Africa by combining satellite imagery with fancy tools from machine learning.  To us this was (and is) a promising finding, as it suggests a way to address the fundamental lack of data on economic outcomes in much of the developing world.  As has been widely acknowledged, these data gaps inhibit our ability both to evaluate which interventions reduce poverty and to target assistance to those who need it most.

A natural question that comes up (e.g. here) is: are these satellite-based estimates good enough to actually be useful for either evaluation or targeting?  Our original paper didn't really answer that question.  We're in the process of putting together a follow-up paper that looks at this question on an expanded country set and with an improved machine learning pipeline, but in the meantime we [by which I mean "we", meaning Matt and Neal] wanted to use some of the data from our original paper to more quantitatively explore this question.

Folks have been thinking for decades about whether using geographic information to inform the targeting of anti-poverty programs could improve their efficiency.  The standard thought experiment goes like this.  Imagine you're a policymaker who has a fixed budget F that she can distribute as cash transfers to anyone in the country (this sort of cash transfer program happens all the time these days, it turns out).  Let's say in particular that the poverty metric this policymaker cares about is the squared poverty gap (SPG), a common poverty measure that takes into account the distance of individuals from the poverty line.  [If you're having trouble sleeping at night:  For a given poverty line P, an individual with income Y<P has a poverty gap of P-Y and an SPG of (P-Y)^2.  The SPG in a region is the average over all individuals in that region, where anyone with Y>=P has an SPG of 0.  So this measure gives a lot of weight to people far below the poverty line].  If your goal is to reduce the SPG, you do best by giving money to the poorest person you can find until they're equal to the next poorest, then giving them both enough money until they're equal to the third poorest, and so on.
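To make the SPG definition concrete, here is a minimal Python sketch (the function names are ours, purely illustrative). It also computes how much one extra dollar lowers each person's (P-Y)^2 term, which is 2(P-Y): that number is largest for the poorest person, which is why filling from the bottom is the efficient way to spend a budget when the SPG is the target.

```python
import numpy as np

def squared_poverty_gap(incomes, poverty_line):
    """Mean of (P - Y)^2 over everyone, counting zero for anyone with Y >= P."""
    gaps = np.clip(poverty_line - np.asarray(incomes, dtype=float), 0.0, None)
    return float(np.mean(gaps ** 2))

def marginal_spg_gain(incomes, poverty_line):
    """Drop in each person's (P - Y)^2 term per marginal dollar: 2 * (P - Y), or 0 if Y >= P."""
    return 2.0 * np.clip(poverty_line - np.asarray(incomes, dtype=float), 0.0, None)
```

For incomes 1, 3 and 6 with P = 5, the gaps are 4, 2 and 0, so the SPG is (16 + 4 + 0)/3, about 6.67, and a marginal dollar is worth 8, 4 and 0 respectively.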

So how should the policymaker distribute the cash?  If she knows nothing about where poor people are, a naive approach would be to distribute money uniformly -- i.e. to just give each of n constituents F/n dollars.  Clearly this could be pretty inefficient, since people already above the poverty line will get money and this won't reduce the SPG.

An alternate approach, now a few decades old in the economics literature, has been to construct "small area estimates" (SAE) of poverty by combining a detailed household survey with a less detailed but more geographically-comprehensive census.  The idea is that while only the household survey measures your outcome of interest (typically consumption expenditure), there is a small set of questions common to both the detailed household survey and the census (call these X).  These are typically questions about respondent age, gender, education, and perhaps a few basic questions on assets.  So using the household survey you can fit a model Y = f(X), which tells you how the Xs map into consumption expenditure (your outcome of interest), and then use the same Xs in the census, together with your model f(X), to predict consumption expenditure for everyone in the census.  Then you can aggregate these to any level of interest (e.g. village or district), and use them to potentially inform your cash transfers.  This has been explored in a number of papers, e.g. here and here, and apparently has been used to inform policy in a number of settings.
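A stripped-down version of the SAE recipe might look like the following Python sketch, assuming a plain linear f(X) fit by ordinary least squares. Production SAE methods typically model log consumption and simulate the error structure to quantify uncertainty, which this sketch leaves out; the function names are hypothetical.

```python
import numpy as np

def fit_sae(X_survey, y_survey):
    """Fit y = b0 + X b by ordinary least squares on the household survey."""
    X = np.asarray(X_survey, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y_survey, dtype=float), rcond=None)
    return coef

def predict_and_aggregate(coef, X_census, village_ids):
    """Predict consumption for each census household, then average by village."""
    X = np.asarray(X_census, dtype=float)
    y_hat = np.column_stack([np.ones(len(X)), X]) @ coef
    ids = np.asarray(village_ids)
    return {v: float(y_hat[ids == v].mean()) for v in np.unique(ids)}
```

The columns of X here stand for the handful of variables asked in both the survey and the census (age, gender, education, a few assets).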

Our purpose here is to compare a targeting approach that uses our satellite-based estimates to either the naive (uniform) transfer or a transfer that's informed by SAE estimates.   To actually evaluate these approaches against each other, we are going to just use the household survey data, aggregated to the cluster (village) level.  In particular, we estimate both the SAE and the satellite-based model on a subset of our household survey data in each country, make predictions for the remainder of the data that the models have not seen, and in this holdout sample, evaluate for a fixed budget the reduction in the SPG you'd get if you allocated using either the naive, SAE, or satellite-based model.  The allocation rule for SAE and satellites is the one described above:   giving money to the poorest village until equal to the next poorest, giving them both enough money until they're equal to the third poorest, and so-on.
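The evaluation just described can be sketched in a few lines of Python. This is a simplified version under stated assumptions: each village is treated as one unit with the same population, transfers are per person, and the budget is the sum of those per-person transfers. The allocation only ever sees the predicted village means (from SAE or satellites), while the SPG is scored on the true means in the holdout sample. Function names are hypothetical.

```python
import numpy as np

def squared_poverty_gap(incomes, poverty_line):
    """Mean of (P - Y)^2, counting zero for anyone at or above the line."""
    gaps = np.clip(poverty_line - np.asarray(incomes, dtype=float), 0.0, None)
    return float(np.mean(gaps ** 2))

def fill_from_bottom(predicted, budget):
    """Transfers that raise the (predicted) poorest units to a common floor.

    The floor is the level L at which topping everyone below L up to L costs
    exactly the budget ("water-filling").
    """
    y = np.asarray(predicted, dtype=float)
    s = np.sort(y)
    csum = np.cumsum(s)
    for k in range(1, len(s) + 1):
        level = (budget + csum[k - 1]) / k     # floor if only the k poorest get money
        if k == len(s) or level <= s[k]:       # next-poorest is already at or above it
            break
    return np.clip(level - y, 0.0, None)

def spg_reduction(true_means, transfers, poverty_line):
    """Fraction of the true SPG removed by a vector of per-person transfers."""
    true_means = np.asarray(true_means, dtype=float)
    before = squared_poverty_gap(true_means, poverty_line)
    after = squared_poverty_gap(true_means + transfers, poverty_line)
    return 1.0 - after / before

def compare_rules(true_means, predicted_means, budget, poverty_line):
    """SPG reduction from a uniform transfer vs. targeting on predicted means."""
    n = len(true_means)
    uniform = np.full(n, budget / n)
    targeted = fill_from_bottom(predicted_means, budget)  # sees predictions only
    return {"uniform": spg_reduction(true_means, uniform, poverty_line),
            "targeted": spg_reduction(true_means, targeted, poverty_line)}
```

With true means of 1, 3 and 6, a poverty line of 5 and a budget of 4, perfect predictions let targeting remove 90% of the SPG versus about 62% for a uniform transfer; if the predictions rank the villages backwards, targeting removes only 15%.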

Below is what we get, with the figure showing how much reduction in SPG you get from each targeting scheme under increasing program budgets.  The table summarizes the results, showing the cross-validated R2 for the SAE and satellite-features models (the goodness-of-fits from the models that we then use to make predictions that are then used in targeting), and the amount of money each approach saves relative to the uniform transfer to achieve a 50% decline in SPG.

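[Table: cross-validated R-squared for the SAE and satellite models, and % reduction in budget needed to achieve a 50% decline in SPG relative to a uniform transfer, by country]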
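The "% reduction in budget" figure can be computed by searching for the smallest budget that reaches a 50% SPG decline under each rule. Here is a hedged Python sketch using bisection. It assumes the SPG reduction never falls as the budget grows, which holds for both the uniform and fill-from-the-bottom rules since no village's transfer shrinks when the budget rises; the reduction functions passed in are placeholders for whatever simulation you run.

```python
def budget_for_target(reduction_at, target=0.5, upper=1.0, n_iter=100):
    """Smallest budget B with reduction_at(B) >= target, found by bisection."""
    # Double the upper bracket until the target is reachable (give up eventually).
    for _ in range(200):
        if reduction_at(upper) >= target:
            break
        upper *= 2.0
    else:
        raise ValueError("target reduction not reachable")
    lower = 0.0
    for _ in range(n_iter):
        mid = 0.5 * (lower + upper)
        if reduction_at(mid) >= target:
            upper = mid
        else:
            lower = mid
    return upper


def budget_savings(reduction_targeted, reduction_uniform, target=0.5):
    """Fraction of the uniform-transfer budget saved by targeting."""
    return 1.0 - (budget_for_target(reduction_targeted, target)
                  / budget_for_target(reduction_uniform, target))
```

With made-up linear curves where the uniform rule hits a 50% decline at a budget of 5 and the targeted rule at 3, the savings come out to 40%.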

What do we learn from these simulations?  First, geographical targeting appears to save you money relative to the naive transfer.  You achieve a 50% reduction in SPG for 7-40% less budget than a naive transfer when you use either SAE or satellites to target.  Second, and not surprisingly, when the satellite model and the SAE model fit the data roughly equally well (e.g. Malawi, Nigeria), they deliver similar savings relative to a uniform transfer.  But the amount of budget that you save by using SAE or satellites to target transfers can differ even for similar R2.  Compare Malawi to Nigeria:  the targeted approaches help a lot more in Nigeria than in Malawi, which is consistent with Malawi having poor people all over the place (e.g. see the maps we produced for Malawi and Nigeria), including in somewhat better-off villages, which in turn makes targeting on the village mean not as helpful.  Third, SAE leads to more efficient targeting in the two countries where the SAE model has more predictive power -- Tanzania and Uganda.

We're somewhat biased of course, but this to us is fairly promising from a satellite perspective.  First, these SAE estimates are probably an upper bound on actual SAE performance, since it's very rarely the case that you have a household survey and a census in the same year, and we've been generous in the variables we included to calculate the SAE (some of which are not available in many censuses).  Second, since many countries lack either a census or a household survey, it's not clear whether we can use SAE in these countries, whereas in our Science paper we showed decent out-of-country fits for the satellite-based approach.  Third, we're working on improvements to the satellite-based estimates and anticipate meaningfully higher R2 relative to these benchmarks.  And finally, and perhaps most importantly, the satellite-based approach is going to be incredibly cheap to implement relative to SAE in areas where surveys don't already exist.  So you might be willing to trade off some loss in targeting performance given the low expense of developing the targeting tool.

So our tentative conclusion is that satellites might have something to offer here.  They're probably going to be even more useful when combined with other approaches and data -- something that we are exploring in ongoing work. 

Wednesday, February 15, 2017

Some scintillating satellite studies

We’ve had a couple of papers come out that might be of interest to readers of the blog. Both relate to using satellites to measure crop yields for individual fields. This type of data is very hard to come by from ground-based sources. In many places, especially poor ones, farmers simply don’t keep records on production on a field by field basis. In other places, like the U.S., individual farmers often have data, and government agencies or private companies often have data for lots of individuals, but it’s typically not made available to public researchers.

All of this is bad for science. The less data we have on yields, the less we can understand how to improve yields or reduce environmental impacts of crop production. We now know exactly how much a catcher’s glove moves each time they frame a pitch, exactly how well every NBA player shoots from every spot on the floor, and exactly how fast LeBron James flops to the ground when he is touched by an opposing player. And all of this knowledge has helped improve team strategies and player performance.

But for agriculture, with all the talk of “big data”, we are still often shooting in the dark. And in some cases it’s getting worse, such as this recent post at farmdoc daily about how the farmer response rate to area and production surveys is fading faster than my hairline (but not quite as fast as Sol’s):

As we’ve discussed before on this blog (e.g., here and here), satellites offer a potential workaround. They won’t be perfect, but anyone who has done phone or field surveys, or even crop cuts within fields, knows that they have problems too. Plus, satellites are getting cheaper and better, and it’s also becoming easier to work with past satellite data thanks to platforms like Google Earth Engine.

Now to the studies. Both employ what we are calling SCYM, which stands either for Scalable Crop Yield Mapper or Steph Curry’s Your MVP, depending on the context. In the first context, the basic idea is to eliminate any reliance on ground calibration by using simulations to calibrate regression models. 
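The calibration idea can be illustrated with a toy Python sketch. Everything here is a hypothetical stand-in: `toy_crop_simulator` replaces a real process-based crop model, and its coefficients, the vegetation index and the heat covariate are invented purely to show the workflow (simulate, fit a regression on the simulations, then apply that regression to satellite observations without any ground yield data). It is not the actual SCYM implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

def toy_crop_simulator(n):
    """HYPOTHETICAL stand-in for many runs of a process-based crop model.

    Returns a simulated vegetation index (VI), a simulated heat covariate and the
    simulated yield. The functional form and coefficients are invented.
    """
    vi = rng.uniform(0.2, 0.9, n)
    heat = rng.uniform(0.0, 30.0, n)
    sim_yield = 2.0 + 12.0 * vi - 0.05 * heat + rng.normal(0.0, 0.3, n)
    return vi, heat, sim_yield

# Step 1: calibrate a regression on simulated data only (no ground yield data).
vi_sim, heat_sim, yield_sim = toy_crop_simulator(5000)
design = np.column_stack([np.ones_like(vi_sim), vi_sim, heat_sim])
coef, *_ = np.linalg.lstsq(design, yield_sim, rcond=None)

# Step 2: apply the fitted regression to satellite-observed VI and weather per field.
def predict_yield(vi_obs, heat_obs):
    return coef[0] + coef[1] * np.asarray(vi_obs) + coef[2] * np.asarray(heat_obs)
```

The design choice being illustrated is that the simulator supplies the "training labels", so the regression can be built anywhere the crop model and weather data are available.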

The first study (joint with George Azzari) applied SCYM to Landsat data in the Corn Belt in the U.S. for 2000-2015, and then analyzed the resulting patterns and trends for maize and soybean yields. Here’s the link to the paper, and a brief animation we had made to accompany the story. Also, we’ve made the maps of average yields available for people to visualize and inspect here. Below is a figure that summarizes what I thought was the most interesting finding: that maize yield distributions are getting wider over time.

Overall maize yields are still increasing, but mainly this is driven by growth in high-yielding areas. This is true at the scale of counties, and also within individual fields. The paper discusses reasons why (e.g. uptake of precision ag) and possible implications. Maybe I’ll return to it in a future post. A short side note is that since that paper was written we’ve expanded the dataset to a nine-state area, and made some useful tweaks to SCYM to improve performance.

The second paper (joint with Marshall) looks at yield mapping with some of the newer high-res sensors in smallholder systems. In this case, we looked at two years of data in Western Kenya, using detailed surveys with hundreds of farmers to test our yield estimates, which were derived using 1m resolution imagery from Terra Bella. The results were pretty good, especially when you consider only fields above half an acre, where problems with self-reported yields are not as big. Just like in the US, the satellites reveal a lot of yield heterogeneity both within and between fields (the top image shows the true-color image, the bottom shows the derived yield estimates for maize fields):

One interesting result is that in many ways it looks like the satellite estimates are no less accurate than the (expensive) field surveys. For example, when we correlate yields with inputs that we think should affect yields, like fertilizer or seed density, we see roughly equal correlations when using satellite or self-report yields.

None of this is to say that satellite-based yields are perfect. But that was never the goal. The goal is to get insights into what factors are (or aren’t) important for yields in different settings. Both studies show how satellites can provide insights that match or even go beyond what is possible with traditional yield measures. We hope to continue to test and apply these approaches in new settings, and plan to make the datasets available to researchers, probably at this site.  

Tuesday, February 14, 2017

Food fights

Eoin McGuirk and I have a new NBER working paper out that (we argue) sheds important new light on the economic origins of conflict in Africa.  This is a topic that has received a lot of scrutiny over the years, but one which has remained tricky to sort out empirically, as conflict could be both a cause and a consequence of deteriorating economic conditions.

Our approach is to utilize plausibly random variation in local-level food prices induced by (1) variation in global food prices and (2) variation in what crops are grown and consumed in different regions across Africa.  There's a lot of variation in both, which is helpful.  For instance, here's a plot from the paper of where some of the main crops in Africa are grown, with the colors showing the percentage of each cell planted to different crops (using the SAGE data):

Clearly, world price movements for maize are going to have geographically distinct impacts from global price movements in rice or coffee.
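One simple way to combine these two ingredients into a local, producer-side price shock is a crop-share-weighted sum of world prices. The Python sketch below assumes log world prices and fixed planted shares as weights; it illustrates the general idea and is not necessarily the exact index constructed in the paper.

```python
import numpy as np

def local_price_index(crop_shares, world_log_prices):
    """Producer-side exposure of each grid cell to world food prices, by year.

    crop_shares:      (n_cells, n_crops) share of each cell planted to each crop,
                      held fixed over time (e.g. from a crop-area dataset like SAGE).
    world_log_prices: (n_years, n_crops) log world price of each crop.
    Returns an (n_years, n_cells) array: sum over crops of share * log price.
    Cells with no cropland get zero exposure.
    """
    shares = np.asarray(crop_shares, dtype=float)
    prices = np.asarray(world_log_prices, dtype=float)
    return prices @ shares.T
```

Because the shares differ across cells and the world prices differ across years, two cells in the same country can face very different shocks in the same year.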

We find strong evidence that changes in economic conditions induced by food price movements have substantial effects on conflict across the continent.  Eoin wrote a nice summary of the paper for the IGC blog, which I shamelessly copy below:


Food Fights: Food Prices and Civil Conflict in Africa

Over one million people have died from direct exposure to civil conflict in Africa since the end of the Cold War alone. What causal role do economic factors play in this tragedy? We study the effect of world food prices on violence across the continent, finding that the impact depends on both the type of conflict and whether a region mainly produces or consumes food.

The role that economic conditions have in shaping conflict risk is notoriously difficult to identify. Civil conflict can cause enormous economic damage by destroying human and physical capital, so any correlations between economic conditions and conflict could be explained by reverse causality. Moreover, the type of institutional environment in which conflict occurs is likely to be the type that also depresses economic activity, raising the additional concern of omitted factors that confound the analysis.

Identifying economic effects using high-resolution data on prices and conflict

To overcome these challenges, we study the impact of food price shocks on local conflict using panel data constructed at the level of a 0.5 x 0.5-degree subnational grid cell (roughly 55km x 55km at the equator) across the African continent. Studying conflict at this local level has several advantages over the more traditional country-level approach. Since food crops represent a higher average share of both production and consumption in Africa than in any other region, a price spike can be expected to significantly raise income for agricultural producers and reduce real income for consumers at the same time. With the help of spatial data on where specific crops are grown and consumed, we are able to separate these producer and consumer effects within countries. Moreover, we can confidently dismiss the potentially confounding concern that African civil conflict affects world cereal prices, given that the entire continent accounts for less than 6% of global production over our sample period. The recent emergence of highly detailed data on African conflict also allows us to consider the effects of price movements on different varieties of violence within countries.
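The "roughly 55km" figure follows from simple geometry: one degree of latitude spans about 111 km on a spherical Earth, so half a degree is about 55.6 km. East-west, cells narrow with the cosine of latitude, which matters for a continent that extends well beyond the tropics. A small Python check, assuming a spherical Earth with mean radius 6371 km:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth approximation

def cell_size_km(lat_deg, res_deg=0.5):
    """Approximate (north-south, east-west) extent in km of a res_deg grid cell."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0          # about 111.2 km
    north_south = res_deg * km_per_degree
    east_west = north_south * math.cos(math.radians(lat_deg))  # meridians converge
    return north_south, east_west
```

At the equator this gives about 55.6 km by 55.6 km; at 30 degrees latitude the east-west side shrinks to about 48 km.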

Factor conflict: for permanent control of land

We first look at the type of large-scale armed conflict that tends to be the subject of most attention among researchers. We label this “factor conflict”, as these battles typically relate to the permanent control of land. Our results identify the differences that one would expect to see between places with crop agriculture and those without. A rise in food prices of one standard deviation leads to a 17% decline in conflict in regions where food production is common and farmers benefit from higher prices, and a 9% increase in conflict in populated areas with no food production – areas where consumers are hurt by rising food prices.

This finding is consistent with the idea that opportunity costs are important drivers of violence: when food prices are high, it makes more sense for producers to harvest crops; when they are low, farmers must look for other ways to make ends meet. At the same time, high prices can push some of the poorest consumers toward joining armed conflict groups in order to maintain a living wage; a fall in prices, on the other hand, pushes their real wages up and allows them to return to the productive sector. In both cases, real income is a fundamental factor in the decision to fight.

Output conflict: the taking of movable goods

The second type of violence we consider relates not to the control of territory but to the appropriation of output. This “output conflict”, as we call it, is more transitory, more atomistic, and less organised than factor conflict. The goal of output conflict is simply to take food, money, or other movable goods. It can appear in our dataset as looting, raiding, rioting, or interpersonal theft.

As with factor conflict, higher food prices cause an increase in output conflict in areas without crop agriculture (e.g., urban centres). This finding is consistent with the idea that declining real wages push some consumers towards appropriation in order to make ends meet. The more interesting distinction, however, comes in areas with crop agriculture. In contrast to factor conflict, we see clear evidence that output conflict increases with rising food prices in regions that produce the associated crops. What can explain this difference? Even in food-producing regions, not everyone makes a living from farming. For net consumers in these regions, higher food prices simultaneously (i) lower real income, as a given wage buys less food, and (ii) increase the value of appropriable output, as nearby farmers have assets that have appreciated in value. These forces provide incentives for consumers to engage in output conflict when prices rise.

We corroborate this finding using a large multi-country survey dataset with information on self-reported crime. Commercial farmers who grow food crops in food-producing regions are more likely to report being victims of theft and physical assault following a spike in food prices. Interestingly, farmers who instead grow cash crops such as cotton or coffee do not report being victimised following an equivalent price spike. In this case, higher prices are unlikely to have reduced real income for consumers, who tend not to purchase cash crops in order to get by.

Key relationships and policy implications

In sum, the effect of food prices on conflict in Africa depends on both the sub-national location and the type of conflict. Global price increases lower large-scale factor conflict in agricultural areas, and increase it in urban areas. For output conflict, the effect is more straightforward: higher prices will increase the incidence of conflict in both rural and urban areas. Overall, our results provide clear evidence that much of the observed conflict in Africa has economic roots. Our approach allows us to isolate the role of changing economic conditions from other factors that might affect conflict risk (such as governance), and to show that these changes can have quantitatively large effects on different types of conflict.

Our findings suggest that policies to minimise conflict risk in the face of food-price shocks must be tailored to sub-national areas. Incentives to farm rather than fight can reduce the risk of factor conflict in rural areas when global prices are critically low. Similarly, policy makers ought to be prepared for potential instability in urban areas when global prices are critically high. Potential policy responses include guaranteed prices for producers and well-timed releases of buffer-stock food for consumers when global prices reach critical levels in either direction. A complementary approach could take the form of local “workfare” programmes—welfare conditional upon work—that shift from urban to rural regions as prices fall, and vice versa.

Friday, January 27, 2017

Measuring temperature from space

[We are going to try to bring G-FEED out of its fall/winter hibernation with a few posts on papers that various of us have coming out.  Below is the first of these, written by our excellent post-doc Sam Heft-Neal.]

Can we use satellites instead of ground stations to measure temperature and estimate climate response functions?
Guest post by Sam Heft-Neal

Temperature data are commonly used to estimate the sensitivity of societally relevant outcomes to ongoing climate changes. In order to derive a temperature measure, researchers typically interpolate nearby ground station data or use one of the publicly available global gridded data products put out by groups like CRU or UDel. However, Max and Sol and Wolfram and Adam have shown that there are a number of pitfalls associated with using interpolated station data to measure temperature in areas with sparse coverage. Temperature estimates far from weather stations, or in areas where stations go on and offline frequently, are subject to substantial measurement error that can bias impact estimates. This problem is particularly acute in tropical regions, where many of the largest climate impacts are expected to occur, but where we still have a limited understanding of key climate/society relationships. 
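As a rough illustration of what station interpolation involves, here is inverse-distance weighting (IDW) in Python, one simple scheme among many; gridded products such as CRU or UDel use their own, more elaborate methods. The sketch assumes planar coordinates. Its point is that a location far from every station gets filled with a weighted average of distant readings, which is where the measurement error discussed above comes from.

```python
import numpy as np

def idw_interpolate(station_xy, station_values, target_xy, power=2.0):
    """Inverse-distance-weighted estimate at each target point (planar distances)."""
    s = np.asarray(station_xy, dtype=float)
    v = np.asarray(station_values, dtype=float)
    t = np.atleast_2d(np.asarray(target_xy, dtype=float))
    d = np.linalg.norm(t[:, None, :] - s[None, :, :], axis=2)  # (n_targets, n_stations)
    out = np.empty(len(t))
    for i, row in enumerate(d):
        hit = row == 0
        if hit.any():                  # target sits exactly on a station
            out[i] = v[hit][0]
        else:
            w = row ** -power          # closer stations get larger weights
            out[i] = np.sum(w * v) / np.sum(w)
    return out
```

With stations reading 10 at x = 0 and 20 at x = 2, the estimate is 15 halfway between them and 11 at x = 0.5.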

David, Marshall and I have a new paper that evaluates whether temperature measures derived only from satellites can be used to estimate climate response functions. Several satellites measure surface emission of thermal energy, which can be converted into estimates of skin surface temperature (Ts) – a product that MODIS has provided at 1km resolution daily for nearly a decade. Past studies have evaluated agreement between MODIS and air temperature measured by weather stations (Ta) on daily time scales, often finding weak correlations for daytime temperatures because factors other than Ta, such as cloudiness and soil moisture, can affect Ts. However, these results could be of limited relevance for our purpose, since the climate response functions we care about usually rely on year-to-year variations in seasonally aggregated measures of temperature exposure, and correlations between station and satellite data tend to increase as the period of aggregation lengthens.
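
Why aggregation helps can be seen in a toy Python simulation. It assumes, purely for illustration, that Ts equals Ta plus daily noise that is independent of Ta, with made-up variances. In reality the Ts-Ta differences are partly driven by systematic factors such as cloudiness and soil moisture, so this only illustrates the averaging argument.

```python
import numpy as np

rng = np.random.default_rng(1)
n_years, n_days = 200, 120   # e.g. 200 location-years of a 120-day season

# Interannual signal shared by both measures, plus day-to-day weather.
year_anomaly = rng.normal(0.0, 1.0, n_years)
ta = year_anomaly[:, None] + rng.normal(0.0, 3.0, (n_years, n_days))
# ASSUMPTION: Ts departs from Ta by independent daily noise.
ts = ta + rng.normal(0.0, 4.0, (n_years, n_days))

daily_corr = np.corrcoef(ta.ravel(), ts.ravel())[0, 1]
seasonal_corr = np.corrcoef(ta.mean(axis=1), ts.mean(axis=1))[0, 1]
print(f"daily r = {daily_corr:.2f}, seasonal r = {seasonal_corr:.2f}")
```

With these variances the daily correlation should land near sqrt(10/26), about 0.62, while averaging over 120 days shrinks the noise enough that the seasonal correlation rises above 0.9.
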
In order to test whether Ts could be used in place of Ta to study temperature impacts, we revisited three previous studies (each led by a different G-Feed blog contributor) that had used standard measures of Ta to study climate impacts. For each study we replicated the analysis with both Ta and Ts and compared model performance. The first study examined temperature effects on maize yields in Africa using historical field trial data from plots under optimal management and plots under drought management. The second study looked at county-level maize yields in the U.S., and the third study, which we selected to look at an application outside of agriculture, estimated temperature effects on county-level GDP in the U.S.
In order to assess model performance we calculated out-of-sample prediction error by repeatedly estimating the models on randomly selected 75% subsets of locations, predicting values for the 25% of locations that had been excluded from estimation, and calculating the RMSE of out-of-sample predicted values relative to actual values. In each case we found similar relationships between temperature and the outcome of interest (panels 1 and 2 below). For both types of African maize field trials we actually found that models with satellite temperature had lower prediction error than models with ground station temperature. However, when we compared models in the U.S. (panel 3), with its high density of stations, the models had very similar predictive power. This finding suggests that Ts is more useful in regions with poor station coverage, where Ta is measured with significant error.
Response functions for maize yields in Africa (left 2 plots) and maize yields in US (right plots).  Orange is surface temperature from MODIS, blue is air temperature from stations. 
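
The location-based cross-validation described above can be sketched in Python as follows. Plain OLS stands in for the study-specific models, which include other terms (for example fixed effects) that this sketch omits; the function name and defaults are hypothetical.

```python
import numpy as np

def location_cv_rmse(X, y, location_ids, n_reps=100, train_frac=0.75, seed=0):
    """Mean out-of-sample RMSE from repeated 75/25 splits by location.

    Whole locations are held out, so the model never sees any observation from a
    test location.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    locs = np.asarray(location_ids)
    unique_locs = np.unique(locs)
    n_train = int(round(train_frac * len(unique_locs)))
    rmses = []
    for _ in range(n_reps):
        train_locs = rng.choice(unique_locs, size=n_train, replace=False)
        train = np.isin(locs, train_locs)
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        resid = y[~train] - design[~train] @ coef
        rmses.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(rmses))
```

Running it once with Ta-based and once with Ts-based temperature variables, on the same outcomes and location IDs, gives the kind of head-to-head comparison described above.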

While Ts predicts crop yields well, it is less clear whether it could be used to estimate response functions for non-agricultural applications like GDP. The GDP study also differs from the agricultural studies because instead of using seasonal averages for temperature we used temperature bins. The satellite data we used were 8-day composites, so for every observation that fell into a given temperature bin we assigned eight days to that bin (in other words, we assumed that temperature stayed constant, or at least within the same bin, over each 8-day period). Even with this assumption, the model with Ts (somewhat surprisingly to us) reproduced a similar non-linear response function over most of the temperature support, particularly at the upper end of the temperature distribution where income appears to be most sensitive to temperature.

Replication of Deryugina and Hsiang (2014), using ground stations (left) and MODIS (right).
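
The 8-day assumption amounts to weighting each composite by 8 when counting days per bin. A minimal Python sketch (the function name is hypothetical); note that np.histogram ignores values outside the outer bin edges, and that the final composite of a year can cover fewer than 8 days, which this sketch does not adjust for.

```python
import numpy as np

def bin_days_from_composites(composite_temps, bin_edges, days_per_composite=8):
    """Days per temperature bin when only multi-day composites are available.

    Each composite's temperature is assumed to hold for every day it covers, so
    each composite adds `days_per_composite` days to the bin it falls into.
    """
    counts, _ = np.histogram(composite_temps, bins=bin_edges)
    return counts * days_per_composite
```

For composites of 12, 18, 25, 31 and 33 degrees with bins 10-20, 20-30 and 30-40, this yields 16, 8 and 16 days.
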
Lastly, as a final comparison, we compared the aggregated impacts from 1C warming estimated with both models. In doing so we again find similar estimates for all applications.

Impact of +1C warming for different models. 
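
For a binned model like the GDP one, one way to compute the impact of +1C warming is to shift every day's temperature up by one degree, recount days per bin, and weight the change in counts by the estimated bin coefficients. The Python sketch below assumes daily temperatures are available and that coefficients are measured relative to an omitted bin (entered as 0); for the seasonal-average crop models the analogous calculation is simply f(T + 1) minus f(T). The bin edges must cover the warmed temperatures, or np.histogram silently drops those days.

```python
import numpy as np

def warming_impact(daily_temps, bin_edges, bin_coefs, delta=1.0):
    """Outcome change implied by a binned response function under uniform warming.

    bin_coefs[k] is the estimated effect of one extra day in bin k, relative to
    an omitted reference bin whose coefficient is entered as 0.
    """
    temps = np.asarray(daily_temps, dtype=float)
    before, _ = np.histogram(temps, bins=bin_edges)
    after, _ = np.histogram(temps + delta, bins=bin_edges)
    return float(np.dot(bin_coefs, after - before))
```

With days at 19.5, 25 and 29.5 degrees, bins 10-20, 20-30 and 30-40, and hypothetical coefficients 0, -0.1 and -0.5, warming moves one day from the first bin to the second and one from the second to the third, for a net impact of -0.5.
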
This overall consistency is perhaps somewhat surprising, given the often low correlations between anomalies in Ts and Ta at the daily or 8-day time scale. In the paper we argue that there are at least four reasons Ts could outperform Ta:

1.   Some of the “noise” in Ts vs. Ta relationships stems from errors in the Ta measures, particularly in regions such as Africa where Ta is often interpolated from anomalies at stations tens of kilometers away. So just because Ts and Ta don’t always agree it doesn’t necessarily mean Ts is wrong.
2.   Much of the noise likely cancels out when aggregating temperatures to the monthly or seasonal time scales that are used in regressions that relate outcomes to temperature. For applications that require finer temporal resolution of temperature measures, the noise in Ts may become more important – although again, whether it is larger than noise in high-temporal-resolution Ta remains an empirical question.
3.  Unlike ground measurements, satellite data come from a consistent sensor. Relative spatial variations could therefore be captured more precisely with satellites than with ground measurements from different instruments.
4.   There is reason to believe that Ts could be more appropriate than Ta for agricultural applications. In vegetated areas much of the noise in the daytime Ts vs. Ta relationship arises from anomalous canopy transpiration rates, with stressed canopies often several degrees warmer than Ta whereas healthy canopies are typically several degrees below Ta. Thus, Ts provides a more direct measure of crop condition than Ta, and this represents an advantage of Ts for agricultural applications that may compensate for some of its deficiencies.

Overall this exercise increased our optimism that Ts can serve as a replacement for Ta in some applications. One of the primary downsides of using satellite data is that the records do not go back as far as ground station records so this approach will not be appropriate for many studies with longer timescales. There are also some issues surrounding the transformation of Ts units into Ta units that we discuss in the paper. Despite these caveats, for studies covering recent years our results suggest that Ts is a viable option for replacing Ta, and that in areas with poor ground station coverage, using satellites to measure temperature may in fact be the better option.