Sunday, November 25, 2012

Apps or agriculture?

In one of the most emailed NYTimes articles from last week, it was reported that there are now more software engineers in the US (about a million) than there are farmers.  A follow-up article on the "Silicon Prairie" suggests that a growing percentage of these software jobs are, ahem, cropping up in farming strongholds like Iowa and Kansas.

The articles point out that while building iPhone apps nobody wants isn't particularly lucrative, building apps that people do want can be (e.g. the guys who thought to launch birds at pigs pulled in over $100 million in revenue in 2011).  And on the whole, getting a job as a software engineer appears to pay pretty well, at least judging by what private shuttle bus access to Silicon Valley has done to rents in the Mission neighborhood in SF where I live. 

This transformation from agriculture to apps is what economists call the "structural transformation" of economies, in which they metamorphose from poorer, rural, and agrarian societies to societies that are rich(er), more urbanized, and more devoted to industry and services.  You start out with everyone poor and farming, and you end with everyone wealthy and making apps for each other.  

Understanding why and how this transition occurs is a central (and old) question in development economics, and one that remains crucial for economic policy in the developing world.  A key debate is to what extent improvements in agricultural productivity help contribute to broader economic development.  That is, do poor countries undergo the structural transformation because they have improved the productivity of their (majority) rural population, or do they undergo it because they have focused their attention and investments on other, seemingly more modern sectors?

Agricultural development and economic growth are clearly related. Below are some plots for African countries (left) and all countries (right) over the period 1961-2008 using annual data from the World Bank, with overall GDP growth on the y-axis and agricultural growth on the x-axis.  On one level this relationship is mechanical:  in countries where agriculture makes up a large percentage of total output, it has to be the case that growth in agricultural productivity increases overall output and makes people better off.  But because agriculture is a relatively small share of total output in most countries outside of Africa, and the relationship is still relatively strong and highly significant using all countries in the world (right plot), there is probably something more interesting going on.  But which way does the causal arrow go?  Does an improving economy lead to agricultural growth, or is agriculture itself an important engine of that growth?
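The "mechanical" part can be made explicit with a standard growth-accounting identity, added here for illustration. Here g_Y, g_A and g_N are the growth rates of total, agricultural and non-agricultural output, and s_A is agriculture's share of total output at the start of the period:

```latex
g_Y \approx s_A \, g_A + (1 - s_A) \, g_N
```

With s_A = 0.3, an extra percentage point of agricultural growth mechanically adds about 0.3 points to overall growth. With s_A = 0.05 it adds only 0.05 points, which is why a strong relationship in countries where agriculture is a small share of output hints at more than the identity.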

There are a few ways that agricultural growth could spur growth in other sectors.  First, a more productive agriculture could provide stuff that other sectors need to grow:  an agricultural surplus generates capital that can be invested in non-agricultural enterprises and frees up labor to work in them.  Second, increased agricultural productivity means better-off farmers which means increased demand for non-agricultural stuff:  rich(er) people spend a smaller and smaller percentage of their income on food, and have extra money to spend on a new widget or that Angry Birds app.  Agricultural growth could harm other sectors too, however, by raising their costs for land and labor:  increases in ag productivity will likely raise wages and land prices for everyone. Furthermore, if capital and labor are mobile, maybe non-agricultural sectors can get what they need from elsewhere.

A couple of recent papers try to shed some light on the causal relationship between agricultural development and broader economic performance.  The basic empirical approach is familiar:  find something that shifts around agricultural productivity in a plausibly random way, and see what this implies for economic performance outside agriculture.  Finding that magical "exogenous" shifter is where the cleverness comes in.

In a new working paper, Hornbeck and Keskin study what happens when changes in technology (better pumps) suddenly allowed farmers in parts of the US Great Plains to tap the Ogallala Aquifer for irrigation.  They show that access to the Ogallala greatly increased agricultural productivity in the counties where it could be accessed, and they then compare how the non-agricultural economy performed in these counties to its performance in nearby counties that were otherwise the same but that could not access the Aquifer.  They show that agricultural productivity gains do not seem to have helped the long-run performance of either industry or services in these counties.  
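Stripped to its core, this kind of comparison is a difference-in-differences design. Below is a minimal 2x2 sketch with made-up numbers. The paper itself works with many counties and years plus controls, so this illustrates the logic rather than its specification; the function name is hypothetical.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    return (treated_after - treated_before) - (control_after - control_before)


# HYPOTHETICAL log outputs for aquifer (treated) and nearby non-aquifer (control) counties
ag_effect = diff_in_diff(treated_before=2.0, treated_after=2.6,
                         control_before=2.0, control_after=2.2)
non_ag_effect = diff_in_diff(treated_before=10.0, treated_after=10.5,
                             control_before=10.0, control_after=10.5)
print(f"agricultural effect: {ag_effect:.2f}, non-agricultural effect: {non_ag_effect:.2f}")
```

In this invented example the aquifer counties gain in agriculture (0.40) but show no relative gain outside it (0.00), which is the shape of the pattern described above.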

Hornbeck and Keskin conclude that "a large windfall gain in the agricultural sector does not appear to have encouraged broader economic development of the local non-agricultural economy", and that "public support of the agricultural sector does not appear to generate positive economic spillovers that might justify its distortionary impacts."  Citing other work by Hornbeck and co-authors, they note that the opening of large manufacturing plants in the US appeared to have much larger local spillovers.  

Call this support for the "apps not agriculture" view:  if you want to improve outcomes, invest in something besides agriculture.  You hear this a lot.  Here, for instance, is econo-celebrity blogger Chris Blattman a couple weeks ago: "Helping poor women set up a market stall, or small farmers double their profits, is a good and noble goal.... But it is a humanitarian strategy, not a growth strategy. Real economic change will wait for industry."

So, apps not agriculture?  A paper by Nunn and Qian published last year in the QJE, a top econ journal, paints a very different picture.  They study what happened when European sailors brought potatoes back from the Americas to the "Old World" (i.e. Eastern Hemisphere) beginning in the 1500s and 1600s.  They show that relative to what was being grown in much of the Old World at the time, growing potatoes offered a remarkable bang for the buck in terms of both calories and vitamins.  Their empirical approach, similar to that of Hornbeck and Keskin, compares how outcomes evolved in Old World areas that were able to adopt the potato to how they evolved in nearby areas that were not suitable to potato cultivation.  

They find that potato-related increases in agricultural productivity explain a remarkable 25-30% of the (dramatic) increase in population and urbanization that these areas experienced between 1700 and 1900.  That is, agricultural productivity improvements played a driving role in the ensuing structural transformation of these economies.  This finding suggests a very different policy prescription:  if you want to set the structural transformation in motion, you would do well to invest in agricultural productivity improvements. Call this the "from agriculture grows apps" view. 

Which view is correct? And what explains the difference between the findings in these two papers? It seems to me that the Hornbeck and Keskin finding describes an economic setting (post WWII United States) that is very different from many rural agricultural settings of today's poor countries:  trade and capital markets in the US were/are fairly well integrated, which would limit local demand spillovers (just buy that widget you want off Amazon) and limit the need for capital to come from local sources (borrow from Chase instead of your buddy down the street).  

The setting in Nunn and Qian is perhaps more relevant for many of today's developing countries -- and in fact includes a lot of them -- but much of their data and anecdotes are from Europe and so the implications for, say, Africa are not automatic.  Some equally rigorous studies on modern developing countries are going to be key for understanding how we get from agriculture to apps  -- in particular, in understanding whether the insights from papers like Nunn and Qian hold for an increasingly globalized world.  As in the two papers above, this is going to require some serious cleverness in identifying exogenous shifters of longer-term agricultural productivity.

(Concluding sidenote:  much like March Madness, apps can have their own decidedly negative economic spillovers...)

Monday, November 12, 2012

Grow, Canada?

A quick post on an interesting article in Bloomberg last week about the expansion of corn in Canada. I’ve been keeping an eye out for stories about possible adaptations in the wake of this summer’s poor corn harvest in the U.S. (or, as Stephen Colbert called it, our shucking disaster). As we’ve discussed before on this blog, having crops migrate northward is a commonly cited adaptation response. In addition to being encouraged by the warming trends, the northward migration of corn could be helped by new varieties that have shorter cycles and better cold hardiness.

It’s certainly interesting to see the expansion of corn in areas like Alberta and Manitoba, or for that matter in North Dakota. But too often media stories, or even scientific studies, present the changes as de facto proof of adaptation. The question is not really if crops will move around – they always have moved around in the past, and will continue doing so in the future. And I don’t even think the key question is whether climate change will be an important factor in them moving around – the evidence is still slim on this question but the Canada transition is clearly one made easier by the climate trends. Instead, the most important issue is how much is gained by these adaptive moves relative to the overall impacts of climate trends.

Sol has been analyzing this in some detail for different regions around the world, so he will hopefully have some more to say with numbers to back it up. But let me just point to two relevant questions that were each alluded to in the Bloomberg article. First, is the scale of expansion large relative to the main zones of production? In the case of Canadian corn, the article mentions about 120,000 ha of corn area sown in three provinces of Canada (Manitoba, Saskatchewan, and Alberta). That is less than one-half of one percent of the U.S. corn area.

Second, what is being displaced by the crop expansion? In most cases, including Canada’s, corn is being grown by farmers that used to grow wheat or barley. So the gain in corn production is offset to a large degree by the loss in wheat production (although not completely, since corn typically produces more grain per hectare than wheat). Modeling studies of adaptation typically assume there is a net expansion of total cropland in cold areas, not just expansion of individual crops. Without net area expansion, it is hard to offset losses incurred at lower latitudes.
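A back-of-the-envelope version of both questions. The 120,000 ha figure is from the article; the U.S. corn area (roughly 39 million ha, about 97 million acres planted in 2012) is an approximate assumption, and both yields are hypothetical values chosen for illustration.

```python
# From the article: new corn area in Manitoba, Saskatchewan and Alberta
canada_new_corn_ha = 120_000
# ASSUMED: approximate U.S. corn area (about 97 million acres planted in 2012)
us_corn_ha = 39_000_000

share = canada_new_corn_ha / us_corn_ha
print(f"Canadian prairie corn as a share of U.S. corn area: {share:.2%}")

# HYPOTHETICAL yields, tonnes per hectare
corn_yield_t_ha = 8.0
wheat_yield_t_ha = 3.0

gross_corn_gain_t = canada_new_corn_ha * corn_yield_t_ha
# Corn replaces wheat on existing cropland, so the wheat harvest is lost
lost_wheat_t = canada_new_corn_ha * wheat_yield_t_ha
net_grain_gain_t = gross_corn_gain_t - lost_wheat_t
print(f"gross corn gain: {gross_corn_gain_t:,.0f} t; net grain gain: {net_grain_gain_t:,.0f} t")
```

Even before subtracting the displaced wheat, the new area is a fraction of a percent of U.S. corn area; after subtracting it, the net grain gain is smaller still.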

There is certainly an argument to be made that big expansions of net area will only be seen with more substantial shifts in climate, since only then will it pay to make the large capital investments to open up new areas for agriculture. But it’s also possible that the constraints on expanding into new areas (poor soils, lack of infrastructure, property rights, rules on foreign investment) are large enough that only a modest amount of expansion will happen.

The main point is that changes in crop area have to be judged not just by whether or not they happen, but by whether their impact is large enough to matter. For most readers of Bloomberg, “large enough to matter” may simply mean that it’s an opportunity to make a lot of money. But for those of us interested in global food supply and price dynamics, the scale of interest is much larger. A 25% drop in U.S. corn production leaves a big hole to fill, and it’s not enough to drop a few shovelfuls of dirt in and call it a day. 

Friday, November 9, 2012

Climate and Conflict in East Africa

Andrew Revkin asked what I thought about this recent PNAS article:

John O’Loughlin, Frank D. W. Witmer, Andrew M. Linke, Arlene Laing, Andrew Gettelman, and Jimy Dudhia
Abstract: Recent studies concerning the possible relationship between climate trends and the risks of violent conflict have yielded contradictory results, partly because of choices of conflict measures and modeling design. In this study, we examine climate–conflict relationships using a geographically disaggregated approach. We consider the effects of climate change to be both local and national in character, and we use a conflict database that contains 16,359 individual geolocated violent events for East Africa from 1990 to 2009. Unlike previous studies that relied exclusively on political and economic controls, we analyze the many geographical factors that have been shown to be important in understanding the distribution and causes of violence while also considering yearly and country fixed effects. For our main climate indicators at gridded 1° resolution (∼100 km), wetter deviations from the precipitation norms decrease the risk of violence, whereas drier and normal periods show no effects. The relationship between temperature and conflict shows that much warmer than normal temperatures raise the risk of violence, whereas average and cooler temperatures have no effect. These precipitation and temperature effects are statistically significant but have modest influence in terms of predictive power in a model with political, economic, and physical geographic predictors. Large variations in the climate–conflict relationships are evident between the nine countries of the study region and across time periods.
Here is my full reply:

This is a useful paper, and the results are important, but the framing by the authors and the press coverage (which I suppose the authors guide) is strange.

The authors report that months with hotter and drier conditions have much more violence. And the size of this effect is *large*: rates of violence climb 29.6% when temperatures are 2 degrees higher than normal, and 30.3% when rainfall declines from wet to normal/dry (a 2 standard deviation change).  These numbers are really big! To get a sense of scale, shifting from a violent hot/dry anomaly to a more peaceful cool/wet anomaly decreases violence by more than 50%, which is similar to the reduction in crime that NYC felt during Rudy Giuliani's tenure! (see here). Whether or not you think Giuliani was responsible for that decline, it is widely recognized that a reduction in NYC violence on that scale had a large effect on the welfare of New Yorkers.

Now, while these effects are large, they are not new. Our 2011 Nature paper found an almost identical result.  And the 2009 PNAS paper by Burke et al. also reported effects of the same size. Burke et al. was at the national scale, and our paper was at the global scale, so I think that the main contribution of O'Loughlin et al. is to demonstrate that the findings of those two earlier papers continue to hold up at the local scale.

What I find surprising about the paper's presentation and the press coverage is that it looks like the authors are trying to bury this finding within the paper by suggesting that these effects are small or unimportant. I don't know why (but it certainly has the feel of Nils Petter Gleditsch's attempt to bury similar findings in his 2012 Special Issue of the Journal of Peace Research here).  Had I found these results, I would have put them front and center in the article rather than reporting them in the text on page 3.

The main way that these authors try to downplay their findings is to argue that temperature and precip anomalies don't have a lot of predictive power compared to "other variables", but this is a red herring. The comparison the authors make is not an apples-to-apples comparison. The statistical model the team uses has two dimensions, time and space. In modeling violence, the team first tries to model *where* violence will occur to get a location-specific baseline. Then, conditional on a location's baseline level of violence, they model *when* violence in a specific location is higher or lower than this baseline.  Their main finding has to do with the "when" part of the question, showing that violence within a specific location fluctuates in time, reflecting temperature and precip anomalies. But then they go on to compare whether they can predict the timing or the location of violence better, which is not a useful exercise. They conclude that location variables like "population density" or "capital city" are much stronger predictors of violence than timing variables like "temperature" or "presidential election", but spatial variation and temporal variation in violence are completely different in magnitude and dynamics, so it is unclear what this comparison tells us. The authors argue that this tells them that the doublings of violence brought on by hot/dry conditions are only a "modest" effect, but this claim doesn't have any statistical foundation and doesn't jibe with common sense. 
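The two-step logic (first "where", then "when") can be sketched on simulated data. Everything here is hypothetical, including the assumed +30% effect per +2 C built into the simulation, and the authors' actual model also includes rainfall, year and country effects and other controls. This only illustrates a location fixed-effects (within) estimator and how variance splits between location and temperature.

```python
import numpy as np

rng = np.random.default_rng(0)
n_loc, n_months = 200, 240

# HYPOTHETICAL "where": log baseline violence differs enormously across grid cells
alpha = rng.normal(0.0, 2.0, size=n_loc)
# HYPOTHETICAL "when": monthly temperature anomalies, degrees C
temp_anom = rng.normal(0.0, 1.0, size=(n_loc, n_months))
beta = np.log(1.3) / 2.0  # ASSUMED: +30% violence per +2 C
noise = rng.normal(0.0, 0.5, size=(n_loc, n_months))
log_violence = alpha[:, None] + beta * temp_anom + noise

# Step 1 ("where"): each location's own mean is its fixed effect
loc_mean = log_violence.mean(axis=1, keepdims=True)
# Step 2 ("when"): regress within-location deviations on within-location anomalies
y_within = log_violence - loc_mean
x_within = temp_anom - temp_anom.mean(axis=1, keepdims=True)
beta_hat = (x_within * y_within).sum() / (x_within ** 2).sum()

# Goodness-of-fit shares of the TOTAL variance (the apples-to-oranges yardstick)
total_ss = ((log_violence - log_violence.mean()) ** 2).sum()
r2_where = 1.0 - (y_within ** 2).sum() / total_ss
r2_when = beta_hat ** 2 * (x_within ** 2).sum() / total_ss

print(f"beta_hat = {beta_hat:.3f} (true {beta:.3f})")
print(f"variance share, location: {r2_where:.2f}; temperature: {r2_when:.3f}")
print(f"implied effect of +2 C: {np.exp(2 * beta_hat) - 1:.0%}")
```

Location means absorb nearly all of the variance, so by a goodness-of-fit yardstick temperature looks negligible, even though the estimated effect recovers the large 30% per 2 C that was built into the simulation.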

To see why, consider the NYC/Giuliani example. If we ran the O'Loughlin et al. model for New York State during 1980-2010 with variables describing "population density" and a trend (to capture the decline in the 1990s), we'd see that on average places with higher population densities have more crime (i.e. NYC has more crime than small towns in Upstate New York) and that violence fell by 50% in the 1990s. But if one used the same measures as O'Loughlin et al. to ask whether location or trend was "more important", we'd find that location is a much more "important" predictor of violence because the difference between violence in NYC and the rest of the state is much more dramatic than the 50% decline within NYC during the 1990s. Using this logic, we would conclude that the 50% decline in the 1990s was "modest" or "not important", but anyone who's been living in NYC for the last couple of decades will say that conclusion is nuts. Halving violence is a big deal. One reason this conclusion doesn't make sense is that many rural locations have very low crime rates (in part because they don't have many people), so violence could double, triple or increase by a factor of ten and those changes would be trivial (in terms of total violent events) compared to a 50% change in NYC. The fallacy of these kinds of apples-to-oranges comparisons (mixing "where" with "when" variables) is why using "goodness of fit" statistics to assess "importance" doesn't make sense when working with data that covers both time and space. An aside: this mistake also shows up as one of the critical errors made by Buhaug in his 2010 PNAS article that got a lot of press and is widely misunderstood and misinterpreted.
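The NYC thought experiment can be run with invented numbers (these are not actual crime statistics). A goodness-of-fit comparison ranks location far above timing, even though the within-NYC decline is exactly the 50% that matters to New Yorkers, and even though a tenfold rise in a small town adds only a handful of events.

```python
import numpy as np

# HYPOTHETICAL annual violent-crime counts, 1980-2010: NYC plus nine small upstate towns
years = np.arange(1980, 2011)
# NYC: flat at 100,000, halving during the 1990s, flat afterward
nyc = np.interp(years, [1980, 1990, 2000, 2010], [100_000, 100_000, 50_000, 50_000])
town_levels = np.array([20, 30, 40, 50, 60, 70, 80, 90, 100], dtype=float)
towns = np.repeat(town_levels[:, None], len(years), axis=1)
log_c = np.log(np.vstack([nyc, towns]))  # rows = places, columns = years

total_ss = ((log_c - log_c.mean()) ** 2).sum()
# "Where" model: each place's own mean (location effects)
resid_where = log_c - log_c.mean(axis=1, keepdims=True)
# "When" model: each year's mean across places (year effects, i.e. the trend)
resid_when = log_c - log_c.mean(axis=0, keepdims=True)
r2_where = 1.0 - (resid_where ** 2).sum() / total_ss
r2_when = 1.0 - (resid_when ** 2).sum() / total_ss
print(f"R^2 location: {r2_where:.3f}, R^2 year: {r2_when:.3f}")

# In counts, halving NYC dwarfs even a tenfold rise in the smallest town
print(f"NYC decline: {nyc[0] - nyc[-1]:,.0f} events; "
      f"tenfold rise in smallest town: {town_levels[0] * 9:,.0f} events")
```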

So in short: The paper is important and useful because the authors confirm at the local-scale previously discovered large-scale results linking climatic events and violence. But their conclusion that these effects are "modest" is based on an incorrect interpretation of goodness-of-fit statistics.

In a follow-up email, I sent him a graph that he ended up posting. This is the full explanation:
Marshall Burke and I were looking at the data from this paper and put together this plot (attached).  It displays the main result from the paper, but more clearly than any of the figures or language in the paper does (we think).  Basically, relatively small changes in temperature lead to really large changes in violence, illustrating my earlier point that this study finds a very large effect of climate on violence in East Africa. 
About the plot:
The thin white line is the average response of violence to temperature fluctuations after rainfall, location, season and trend effects are all removed. The red "ink" indicates the probability that the true regression line is at a given location, with more ink reflecting a higher probability (more certainty).  There's a fixed amount of "ink" at each temperature value; it just appears lighter if the range of possible values is more spread out (less certain).  We estimate the amount of uncertainty in the regression by randomly resampling the data 500 times, re-estimating the regression each time and looking at how much the results change (so-called "bootstrapping"). This "watercolor regression" is a new way of displaying uncertainty in these kinds of results; I describe it in more detail here.  A similar and related plot showing the relationship between rape and temperature is here.
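A minimal sketch of the bootstrap behind the "ink", assuming NumPy. It uses a quadratic fit as a stand-in for whatever regression the actual plot uses (which also removes rainfall, location, season and trend effects first); the function name and all data are hypothetical.

```python
import numpy as np


def watercolor_ink(x, y, x_grid, y_bins, n_boot=500, degree=2, seed=0):
    """Bootstrap a polynomial fit of y on x (x, y are NumPy arrays).

    Returns the full-sample fit (the thin line), every bootstrap fit, and an
    "ink" array of shape (len(y_bins) - 1, len(x_grid)) holding the share of
    bootstrap fits in each y bin at each x value. Each column sums to 1 over
    the fits inside the bins: a fixed amount of ink per x value.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    center = np.polyval(np.polyfit(x, y, degree), x_grid)
    fits = np.empty((n_boot, len(x_grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample observations with replacement
        fits[b] = np.polyval(np.polyfit(x[idx], y[idx], degree), x_grid)
    ink = np.zeros((len(y_bins) - 1, len(x_grid)))
    for j in range(len(x_grid)):
        counts, _ = np.histogram(fits[:, j], bins=y_bins)
        if counts.sum() > 0:
            ink[:, j] = counts / counts.sum()  # spread-out fits -> lighter ink
    return center, fits, ink


# Usage with HYPOTHETICAL data: temperature anomalies and residual violence
rng = np.random.default_rng(1)
temp = rng.normal(0.0, 1.0, 300)
viol = 0.15 * temp + rng.normal(0.0, 0.5, 300)
grid = np.linspace(-2.0, 2.0, 41)
bins = np.linspace(-2.0, 2.0, 81)
center, fits, ink = watercolor_ink(temp, viol, grid, bins)
```

The `ink` array can then be drawn as an image (for example with matplotlib's `imshow`), with darker cells where the bootstrap fits bunch together.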

Wednesday, November 7, 2012

An American, a Canadian and a physicist walk into a bar with a regression... why not to use log(temperature)

Many of us applied statisticians like to transform our data (prior to analysis) by taking the natural logarithm of variable values.  This transformation is clever because it transforms regression coefficients into elasticities, which are especially nice because they are unitless. In the regression

log(y) = b*log(x)

b represents the percentage change in y that is associated with a 1% change in x. But this transformation is not always a good idea.  
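Writing out the derivative shows why b behaves like an elasticity (a standard result, included here as a sketch):

```latex
\log y = b \log x
\quad\Longrightarrow\quad
b = \frac{d \log y}{d \log x} = \frac{dy / y}{dx / x}
```

So for small changes, b is approximately the percentage change in y divided by the percentage change in x.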

I frequently see papers that examine the effect of temperature (or control for it because they care about some other factor) and use log(temperature) as an independent variable.  This is a bad idea because a 1% change in temperature is an ambiguous value. 

Imagine an author estimates

log(Y) = b*log(temperature)

and obtains the estimate b = 1. The author reports that a 1% change in temperature leads to a 1% change in Y. I have seen this done many times.

Now an American reader wants to apply this estimate to some hypothetical scenario where the temperature changes from 75 Fahrenheit (F) to 80 F. She computes the change in the independent variable  D:

D_American = log(80) - log(75) = 0.065

and concludes that because temperature is changing 6.5%, then Y also changes 6.5% (since 0.065*b = 0.065*1 = 0.065).

But now imagine that a Canadian reader wants to do the same thing.  Canadians use the metric system, so they measure temperature in Celsius (C) rather than Fahrenheit. Because 80F = 26.67C and 75F = 23.89C, the Canadian computes

D_Canadian = log(26.67) - log(23.89) = 0.110

and concludes that Y increases 11%.

Finally, a physicist tries to compute the same change in Y, but physicists use Kelvin (K) and 80F = 299.82K and 75F = 297.04K, so she uses

D_physicist = log(299.82) - log(297.04) = 0.009

and concludes that Y increases by a measly 0.9%.
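The three readers' arithmetic can be reproduced in a few lines. This is a sketch assuming the hypothetical estimate b = 1 from the example and the standard conversion formulas; the helper names are our own.

```python
import math


def f_to_c(f):
    """Fahrenheit to Celsius."""
    return (f - 32.0) * 5.0 / 9.0


def f_to_k(f):
    """Fahrenheit to Kelvin."""
    return f_to_c(f) + 273.15


def log_change(convert, before=75.0, after=80.0):
    """Change in log temperature after converting both readings to another scale."""
    return math.log(convert(after)) - math.log(convert(before))


b = 1.0  # the hypothetical estimate from the example
readers = [("American (F)", lambda t: t), ("Canadian (C)", f_to_c), ("physicist (K)", f_to_k)]
for name, convert in readers:
    d = log_change(convert)
    print(f"{name:>14}: D = {d:.3f}, predicted change in Y = {b * d:.1%}")
```

Rounded to three decimals, this reproduces the 0.065, 0.110 and 0.009 above: the same physical change, three different "percent changes".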

What happened? Usually we like the log transformation because it makes units irrelevant. But here changes in units dramatically changed the prediction of this model, causing it to range from 0.9% to 11%! 

The answer is that the log transformation is a bad idea when the value x = 0 is not anchored to a unique [physical] interpretation. When we change from Fahrenheit to Celsius to Kelvin, we change the meaning of "zero temperature" since 0 F does not equal 0 C which does not equal 0 K.  This causes a 1% change in F to not have the same meaning as a 1% change in C or K.   The log transformation is robust to a rescaling of units but not to a recentering of units.

For comparison, log(rainfall) is an okay measure to use as an independent variable, since zero rainfall is always the same, regardless of whether one uses inches, millimeters or Smoots to measure rainfall.
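A quick check of the rescaling-versus-recentering point, using hypothetical rainfall and temperature values (the numbers are illustrative only):

```python
import math


def log_diff(a, b):
    """log(b) - log(a)."""
    return math.log(b) - math.log(a)


# Rescaling (multiplying by a constant k) cancels: log(k*b) - log(k*a) = log(b) - log(a)
rain_in = (2.0, 3.0)                        # hypothetical rainfall, inches
rain_mm = tuple(25.4 * r for r in rain_in)  # 1 inch = 25.4 mm
print(log_diff(*rain_in), log_diff(*rain_mm))  # equal up to floating-point error

# Recentering (adding a constant c) does not: log(b + c) - log(a + c) != log(b) - log(a)
temp_c = (23.89, 26.67)                      # degrees Celsius
temp_k = tuple(t + 273.15 for t in temp_c)   # same temperatures in Kelvin
print(log_diff(*temp_c), log_diff(*temp_k))  # very different
```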

Sunday, November 4, 2012

Too close to call?

All of the attention on the presidential election has brought up some issues that are familiar to those of us who work in the world of anticipating and preparing for climate change impacts. In particular, there's been a clear contrast in the election coverage between, on the one hand, a lot of media stories that describe the race as a "toss-up" or "too close to call" and, on the other hand, careful analysis of the actual data on polls in swing states that say the odds are overwhelmingly in favor of another term for President Obama. Nate Silver has become a nerd celebrity for his analysis and daily blog posts (his new book is also really good). But there are many others who come to similar or even stronger conclusions, like Sam Wang at Princeton, who has put Obama's chances at over 98%.

I think there are a few things going on here. One is that the popular media has basically no incentive to report anything but a very close race. It keeps readers checking back frequently, and campaigns may be more likely to spend more money with media outlets on advertising if the narrative is of a very close race (although admittedly, they have so much money that the narrative may not make much difference). A more fundamental reason, though, is just a basic misunderstanding of probability. Not being able to entirely rule something out (e.g., Romney winning) is not the same as saying it could easily happen. People mistake the possible for the probable. They want black and white, not shades of gray (at least not fewer than 50 shades of gray).

(Also in the news this week: Hurricane Sandy. Another case where people who understand probabilities, like Mayor Bloomberg, have little trouble seeing the link to global warming, while others continue the silly argument that if it was possible for such things to happen in the past, then global warming can't play a role. In their black and white world, things can either happen or they can't. There is no understanding of probability or risk. I call this the Rava view of the world, based on the Seinfeld episode in which Elaine tries to convince Rava that there are degrees of coincidence:

RAVA: Maybe you think we're in cahoots.
ELAINE: No, no.. but it is quite a coincidence.
RAVA: Yes, that's all, a coincidence!
ELAINE: A big coincidence.
RAVA: Not a big coincidence. A coincidence!
ELAINE: No, that's a big coincidence.
RAVA: That's what a coincidence is! There are no small coincidences and big coincidences!
ELAINE: No, there are degrees of coincidences.
RAVA: No, there are only coincidences! ..Ask anyone! (Enraged, she asks everyone in the elevator) Are there big coincidences and small coincidences, or just coincidences? (Silent) ..Well?! Well?!..)

Back to my point (you have a point!?): when we turn to climate impacts on agriculture, it's still quite common to hear people say that we just don't know what will happen. Usually this comes in some form of a "depends what happens to rainfall, and models aren't good with rainfall" type of argument. It's true that we do not know with complete certainty which direction climate change will push food production or hunger. But we do know a lot about the probabilities. Given what we know about how fast temperature extremes are increasing, and how sensitive crops are to these extremes, it's very probable in many cases, like U.S. corn, that impacts on crop yields will be negative. (For example, a few years back Claudia Tebaldi and I tried to estimate the probabilities that climate change would negatively impact global production of key crops by 2030. For maize, we put the odds at over 95%.) Even in cases where rainfall goes up, the negatives tend to predominate. It's also very likely that in some cases, like potatoes in England, impacts will be positive. In either case we cannot say anything with absolute certainty, but that doesn't mean we should describe impacts as "too close to call".
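To make "we know a lot about the probabilities" concrete, here is a toy Monte Carlo sketch of how uncertainty about warming, crop sensitivity and rainfall can be combined into a probability of a negative impact. Every distribution is hypothetical, and this is not a reproduction of the study with Claudia Tebaldi.

```python
import random


def prob_negative_impact(n=100_000, seed=42):
    """Share of Monte Carlo draws in which the combined yield impact is negative.

    Every distribution below is HYPOTHETICAL and chosen only for illustration.
    """
    rng = random.Random(seed)
    negative = 0
    for _ in range(n):
        warming_c = rng.gauss(1.0, 0.3)    # local warming, degrees C
        temp_sens = rng.gauss(-8.0, 3.0)   # yield change, % per degree C
        rain_change = rng.gauss(0.0, 5.0)  # rainfall change, %: could go either way
        rain_sens = rng.gauss(0.3, 0.1)    # yield change, % per 1% rainfall change
        impact = temp_sens * warming_c + rain_sens * rain_change
        if impact < 0:
            negative += 1
    return negative / n


print(f"P(yield impact < 0) = {prob_negative_impact():.2f}")
```

Even though rainfall here is equally likely to rise or fall, the temperature term dominates and most draws come out negative: an outcome can be uncertain yet far from a toss-up.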

We academics can probably learn a thing or two from how Nate Silver is trying to explain risk and probability in his daily posts. But it's also fair to say that our task is a little harder, for a couple of reasons. First, there are lots of data on past polls and election results, which people can use to figure out empirically how accurate their methods would have been in past cases. With climate change, we are often talking about changes that have not been seen in the past, or at least not in enough cases to develop a large sample size for testing. A second and, in my view, more critical difference is that climate impacts happen on top of many other changes in society. Elections provide a clear outcome: a candidate wins or loses. But what does a climate impact look like? How do we know if our predictions are right or not?  A lot of the entries in this blog are around that question, but the short answer is that we can't directly measure impacts; we have to be clever in thinking of ways to pull them out of the data.

So maybe all of the attention to the election forecasts will help the public understand probabilities a little better. If nothing else, people should understand the difference between a 50% chance and an 80% chance of something happening. Reporting the latter as if it were the former is annoying in the context of the election (or, as Paul Krugman says, "Reporting that makes you stupid"). But confusing the two in the case of climate impacts is more than annoying: it can lead to a lot more wishful thinking and a lot fewer smart investments than would otherwise be the case.

One final note: even when people are on board with the meaning of probabilities, it's still not so easy to get them right. Silver puts Obama's chances at ~85%. That's high, but his chance of a Romney win is about 10 times higher than Wang's. So, just as with climate impacts, smart people can disagree, and it usually comes down to what they assume about model bias (Silver seems to allow a much higher chance that all polls are wrong in the same direction). But even if smart analysts disagree, very few if any of them think the election results (or climate impacts) are a toss-up.
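A stylized sketch of why assumptions about all polls missing in the same direction matter so much. It treats the polling lead as normally distributed with independent sampling error plus a shared error; the lead and both standard deviations are hypothetical, and neither Silver's nor Wang's actual model is this simple.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def win_probability(lead, sampling_sd, systematic_sd):
    """P(true margin > 0) given a polling lead with independent sampling error
    plus a shared (systematic) error that moves all polls in the same direction."""
    total_sd = math.sqrt(sampling_sd ** 2 + systematic_sd ** 2)
    return normal_cdf(lead / total_sd)


lead = 2.0  # HYPOTHETICAL polling lead, percentage points
for systematic_sd in (0.0, 1.8):  # HYPOTHETICAL shared-error assumptions
    p = win_probability(lead, sampling_sd=0.9, systematic_sd=systematic_sd)
    print(f"shared poll error sd = {systematic_sd}: P(win) = {p:.0%}")
```

With no shared error the hypothetical 2-point lead looks nearly certain; allowing a modest shared error pulls the probability down into the mid-80s, without ever making the race a toss-up.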