Monday, March 18, 2013
Land Prices
There's a nice article in the New York Times about farmland prices. The article suggests (as do most NYT articles on land values) that it's just another bubble. Since we've had a couple bubbles in relatively recent memory, now it seems like everything's a bubble.
I'm not so sure. There is a lot that's different about farmland values as compared to houses. For one, we don't have quite the same level of debt relative to asset value. It's reported as 25.5% for farmland. That's the highest since the 1980s boom. But then given today's low interest rates, it's not really a fair comparison to the 1980s. It's also less than half the 60%+ level of debt we had for housing at the peak of the bubble. Also, we don't have stated-income mortgages. Banks are more cautious with farms than they were with houses during the Countrywide Mortgage years.
The simple analysis compares the price-to-rent ratio to current long-run interest rates. Today it looks like land that sells for around $10,000/acre rents for $500-600/year. That's a 5-6% yield, where 10-year Treasury rates at this writing are a little under 2% and 30-year Treasury rates are a little over 3%.
Now, there are costs and benefits to owning land instead of Treasuries. A benefit to land is that it has built-in inflation protection. Inflation will push up commodity and land prices like everything else. So, the better comparison may be real rates; those are currently negative for 10-20 years out, which makes a 5-6% yield look pretty good.
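Since I'm throwing numbers around, here is the back-of-the-envelope comparison in code, using the rough figures above. The expected-inflation number is my own assumption for illustration, not from any forecast:

```python
# Back-of-the-envelope land yield vs. real interest rate comparison.
price_per_acre = 10_000.0   # rough sale price, $/acre
rent_per_acre = 550.0       # midpoint of the $500-600/year range

rental_yield = rent_per_acre / price_per_acre   # ~5.5%

nominal_10yr = 0.02         # 10-year Treasury rate, roughly, at this writing
expected_inflation = 0.025  # assumed; this is what makes the real rate negative

real_10yr = nominal_10yr - expected_inflation

# Land rents should roughly track inflation, so the relevant hurdle
# rate is the real rate, not the nominal one.
spread_over_real = rental_yield - real_10yr
print(f"rental yield {rental_yield:.1%}, spread over real rate {spread_over_real:.1%}")
```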
A cost to owning land is that it's risky. Rents are high because commodity prices are high. If commodity prices fall, so will rents and yields. So, what's the forecast for prices? I'd say that's hard to tell. They could go up further, or they might crash if rapid productivity growth resumes. The best bet is to look at futures prices, which suggest prices will fall somewhat from today's levels, but not that much. Then there is the question of how to price uncertainty around that best guess.
It's not clear how costly that risk really is. For a family with all their wealth piled into the farm, the risk is a big deal. Most family farms own their land outright, or have relatively small mortgages. A downturn in commodity prices, rents and land values would hurt for sure. But typically there's not the kind of leverage we had in homes (the Times article has a notable exception).
Consider, in contrast, the risk to outside investors. Wall Street types are piling into land, probably because they see a reasonable yield. But their exposure to risk is quite different. These are highly diversified investors who have large portfolios of assets. If farmland values crash, it's unlikely to be closely associated with the rest of the economy. The fact that agriculture is a small share of our economy, and that commodity prices have little association with the rest of the economy, means that risk is far less important to these outside investors than it is to the family farm. This basic point was implicitly made by Gary Gorton and Geert Rouwenhorst a while back.
So, the basic math suggests that land investment is a reasonable deal right now. With prices this high, it may not be quite as good a deal as buying a home in some post-bubble cities with low price-to-rent ratios. But it's not bad, and probably not as difficult for an arms-length investor to get into.
So, I don't think this is a bubble. However, you should be forewarned that I've been wrong about this kind of thing before.
Monday, March 4, 2013
Why heat hurts
In a recent post I wrote about why rainfall is sometimes
given too much credit for variations in crop production. Or, put differently,
temperature deserves more credit than it often gets. A paper we have out this
week delves into the reasons for this. Below is a brief background and summary,
but see full paper here.
As background, there are now several studies showing close
correlations between crop yields and temperature, in both rainfed and irrigated
areas. More specifically, we now see that a lot of these correlations are
driven by the tails of the temperature distribution – more very hot days means
lower yields. Those of you familiar with our blog will know we often represent
this effect by summing up degree days above some threshold, like 29 or 30 °C.
See posts here, and here.
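For concreteness, the degree-day sum is simple to compute from daily temperatures. Here's a minimal sketch using daily means; published work typically interpolates the within-day temperature cycle from daily min/max, so treat this as a simplification:

```python
def degree_days_above(temps_c, threshold=29.0):
    """Sum of degrees above `threshold` across daily temperatures (deg C)."""
    return sum(max(t - threshold, 0.0) for t in temps_c)

# A hypothetical week of daily mean temperatures
week = [25.0, 28.5, 31.0, 33.0, 30.0, 27.0, 24.0]
print(degree_days_above(week))  # 2.0 + 4.0 + 1.0 = 7.0
```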
But correlations don’t tell us much about mechanisms of causation.
And as a result, many people are wary about using correlations to project
future impacts. So why might hot days in particular be so important? Agronomists
are usually quick to talk about the key time of flowering when some really hot
days can spell disaster. This is confirmed by many experimental studies,
including one recent one here. But there are other things that could be going
on in farmers’ fields in addition to this. One is the fact that hot air can
hold more moisture, so that hotter days tend to have higher vapor pressure
deficits (VPD). When air has higher VPD, plants lose more water per amount of
CO2 uptake, which lowers their water use efficiency. The response of
most plants is to then slow down growth, which avoids losing too much water
during the parts of the day where efficiency is lowest. Tom Sinclair, among
others, has a boatload of interesting papers on this dynamic and the tradeoffs
involved with selecting varieties that slow down more or less under high
VPD.
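For those who want numbers, VPD is easy to approximate from temperature and relative humidity. The sketch below uses the Tetens formula for saturation vapor pressure; crop models like APSIM use more elaborate formulations, so this is just to show why hot days are thirsty days:

```python
import math

def saturation_vapor_pressure_kpa(t_c):
    """Tetens approximation to saturation vapor pressure (kPa) at t_c deg C."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def vpd_kpa(t_c, rel_humidity):
    """Vapor pressure deficit: saturation minus actual vapor pressure."""
    es = saturation_vapor_pressure_kpa(t_c)
    return es * (1.0 - rel_humidity)

# At the same 50% relative humidity, a 35 C day has a much larger VPD
# than a 25 C day -- hotter air "demands" more water from the crop.
for t in (25.0, 35.0):
    print(f"{t:.0f} C: VPD = {vpd_kpa(t, 0.50):.2f} kPa")
```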
Now to the study. We wanted to see if the VPD effect could
explain the observed importance of hot days for corn in the U.S. So we took an
Australian crop model, APSIM, that handles the VPD effect and simulated some
long time series of yields at different locations. We then looked at whether the
model could reproduce the observed relationships, and the match was quite good:
Why does this matter? First, it suggests that a lot of the
effects of extreme heat (at least for this crop, in this region, in today’s
climate) are related to drought stress. So when the media refers to the big
drought, that is technically correct (at least in this case). But it’s
important to be clear what we mean by drought – we don’t necessarily mean low
rainfall (a common meteorological definition of drought), or even low soil
moisture (a common definition of “agricultural drought”), but we mean a more
agronomic definition of drought, such as “not enough water to grow as fast as a
plant can”. The key is that water stress is not just about water supply to the
plant, but about how much water it has relative
to how much water it needs to maintain growth rates. The “needs” or water
demand part depends on VPD, and hotter days on average tend to have higher VPD
(and extended heat waves tend to have much higher VPD).
Second, it implies that efforts to adapt to warming in U.S.
maize should probably, at least for the near future, focus on dealing with
water stress associated with high VPD, rather than, say, the direct effects of
heat damage during flowering.
One thing we didn’t have the horsepower to do for this study
was to repeat the analysis with other types of crop models, most of which
handle water stress slightly differently than APSIM. That would tell us how
many models that have been used to project climate change impacts on U.S.
agriculture are actually getting the key process right. It’s the type of model
comparison that hopefully AGMIP will tackle – not just comparing projections of
different models, but seeing how well they perform in reproducing historical
responses to extreme heat.
Friday, March 1, 2013
Dr. Seuss on G-FEED
Apologies for the lack of posting. Should pick up more soon. Meanwhile, my kids are celebrating Dr. Seuss's birthday at school today. So on my way to work, without time for a real post, I thought about what good old Theodore might post on our site. Here's my attempt:
Haze and heat for maize and wheat
Are not helpful at all.
They wilt and sweat and then worse yet
Yield trends may start to stall.
What happens next is hard to guess,
but may be change in diets.
For us that’s fine, but if poor can’t dine,
it may mean death and riots.
But maybe things won’t be so bad and we can find salvation,
With new genes or crops or lots more drops of pricey
irrigation.
But how to know? And what to do?
Ask many worried leaders
What’s in the cards? Well, ask the blowhards
who call themselves G-FEEDers
The data, they say, is where to look,
if we are really wise
To know what works or what will just
keep prices on the rise
That’s great you say, but since last May,
they have only five
posts
But that’s the speed you get, indeed,
when working on left
coasts
So don’t forget to check the site and come again real soon
What’s soon, you ask? Well, that depends. For Schlenker
could be June.
Until then, they work and work,
taking breaks only for soup
Except the one that they call Burke,
who blogs on wars and
poop
But in the spring, our tweets will ring,
and we’ll have no
excuses
With no wisdom to spread, for now instead,
I leave you with Dr. Seuss’s:
Just never forget to be dexterous and deft. And never mix up
your right foot with your left.
Wednesday, February 20, 2013
From cell towers cometh rainfall estimates?
Getting trustworthy data on rainfall and temperature over space and time is really critical to science across a range of disciplines. Sadly, the world's basic weather monitoring network has been in serious decline over the last few decades. Here is a depressing plot from a recent paper by Lorenz and Kunstmann showing the decline in the number of reporting weather stations around the world:
As the number of stations declines, our estimates of how much rain fell in a given area at a given time get worse and worse, and this can create really bad problems for applied social scientists. Rainfall and temperature typically show up as independent variables in social scientists' statistical analyses. If these variables are measured with error, and this error is "classical", i.e. uncorrelated with the "true" underlying quantity being measured, then your estimates of the effect of these variables on the outcome of interest are biased toward zero. That is, you (perhaps erroneously) conclude that rainfall or temperature had no effect -- or a much smaller effect -- on crop yields or infant mortality or conflict than what actually occurred.
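If you've never seen attenuation bias in action, a quick simulation makes the point. All numbers here are purely illustrative, not from any real rainfall dataset:

```python
import random

random.seed(0)
n = 20_000
beta = 2.0  # true effect of "rainfall" on the outcome

x = [random.gauss(0, 1) for _ in range(n)]              # true rainfall
y = [beta * xi + random.gauss(0, 1) for xi in x]        # outcome
x_noisy = [xi + random.gauss(0, 1) for xi in x]         # mismeasured rainfall

def ols_slope(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

# With var(x) = var(noise) = 1, theory says the estimated slope shrinks
# by var(x) / (var(x) + var(noise)) = 0.5, i.e. from 2.0 to about 1.0.
print(ols_slope(x, y), ols_slope(x_noisy, y))
```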
There are some work-arounds to having limited gauge data. One is to use "reanalysis" data, in which some kindly climate scientists sitting in a basement somewhere take the available observations and use them as inputs to a climate model, spitting out a physically consistent record of weather over time. Another option is to use data from satellites, but satellites appear to measure rainfall better than temperature, and the available satellite output is often at a fairly coarse scale (e.g. 200km grids). A third is to use interpolations of the station data, which impute missing observations using a weighted average of nearby observations. This works well when there are nearby observations, but the station density is so low in places like Africa that interpolation might introduce substantial error.
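As a concrete (if oversimplified) version of that third option, inverse-distance weighting is about the simplest interpolation scheme going; real gridded products use fancier weighting, and the gauges here are made up:

```python
def idw_estimate(stations, target, power=2.0):
    """Inverse-distance-weighted rainfall at `target` from (x, y, rain) stations.

    A deliberately simple stand-in for the weighted-average interpolation
    described above.
    """
    num = den = 0.0
    for x, y, rain in stations:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0:
            return rain  # target sits exactly on a station
        w = d2 ** (-power / 2.0)
        num += w * rain
        den += w
    return num / den

# Three hypothetical gauges; estimate rainfall at the origin.
# The two nearby gauges dominate the far one.
gauges = [(1.0, 0.0, 10.0), (0.0, 1.0, 20.0), (3.0, 4.0, 50.0)]
print(idw_estimate(gauges, (0.0, 0.0)))
```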
Enter a fourth option. Cell phones (and cell phone infrastructure) can apparently be used to do more than update your Facebook status. From a new paper in PNAS this week:
Country-wide rainfall maps from cellular communication networks
Aart Overeem, Hidde Leijnse, and Remko Uijlenhoet
Accurate and timely surface precipitation measurements are crucial for water resources management, agriculture, weather prediction, climate research, as well as ground validation of satellite-based precipitation estimates. However, the majority of the land surface of the earth lacks such data, and in many parts of the world the density of surface precipitation gauging networks is even rapidly declining. This development can potentially be counteracted by using received signal level data from the enormous number of microwave links used worldwide in commercial cellular communication networks. Along such links, radio signals propagate from a transmitting antenna at one base station to a receiving antenna at another base station. Rain-induced attenuation and, subsequently, path-averaged rainfall intensity can be retrieved from the signal’s attenuation between transmitter and receiver. Here, we show how one such a network can be used to retrieve the space–time dynamics of rainfall for an entire country (The Netherlands, ∼35,500 km2), based on an unprecedented number of links (∼2,400) and a rainfall retrieval algorithm that can be applied in real time. This demonstrates the potential of such networks for real-time rainfall monitoring, in particular in those parts of the world where networks of dedicated ground-based rainfall sensors are often virtually absent.
This seems potentially really cool, particularly for places like Africa where the cell network is expanding incredibly fast. Here's a plot of how they did in the Netherlands, comparing their cell-tower derived estimates (y-axis) to what you get off radar (x-axis).
I'm not sure why they didn't compare against rainfall gauge estimates (have the Dutch stopped operating their rainfall stations too?), but nevertheless this doesn't look too bad. Really small average difference between the two measurements, and a high correlation. It's pretty much impossible to generate a similar plot over a region of interest in, say, Africa, comparing what you get from interpolated station data or satellite data to what the "true" observation is on the ground, but I would be surprised if it looks this good.
The authors do note that cell phone towers in the tropics tend to operate at a lower frequency, which "can increase errors in rainfall estimates due to the increased nonlinearity of the relationship between rainfall intensity and specific attenuation at lower frequencies". But given the low baseline number of weather stations in much of the tropics, and the fact that the number of stations is getting even smaller, it seems like this sort of tool might have a lot of potential.
Hopefully, as the authors say, "this research report will contribute to persuade cellular communication companies worldwide to provide received signal level (RSL) data from their radio link networks free of charge for both research purposes and other applications of societal relevance." It would be really neat if they could show how this works in some place with slightly sparser cell coverage. Oh, and it would be nice if they could get us temperature too.
Tuesday, February 19, 2013
Hot to trot: does Genghis Khan owe it all to climate change?
There is now a voluminous literature seeking to understand links between climate and various types of human conflict. So far this literature has been a lot better at describing the "reduced form" relationship between climate and conflict -- the statistical effect of changes in a given climate variable on a given conflict outcome -- than in understanding the underlying mechanism(s) that link climate variables to conflict events.
There are many such mechanisms that have been proposed. To stylize the debate somewhat, economists' favorite explanation has to do with opportunity costs: an individual becomes more likely to start or join a conflict if a change in climate harms his alternate modes of income generation, such as farming. Political scientists often favor explanations about state capacity: an adverse shift in climate can reduce government revenues (e.g. through reduced tax revenue or export earnings from agricultural commodities), and thus weaken the government's ability to repress insurgents. Psychologists often focus on aggression: for whatever reason, hotter temperatures cause humans to respond more aggressively and violently to one another.
A fourth potential mechanism concerns direct climatic effects on the logistics of war: it's hard to wage war if your cavalry is stuck in the mud, if floods wash out all the roads and bridges, or if your troops are freezing. This mechanism is echoed by noted warmonger Frederick the Great, who apparently said that "It is always necessary to shape operation plans... on estimates of the weather". The logistics mechanism has also been the focus of some recent research that has received significant coverage in high-visibility outlets (see here, here, and here) - somewhat remarkable because, as far as I can tell, these researchers have yet to produce any sort of publication or working paper on the topic!
What this group of researchers has apparently done is to use tree rings to reconstruct a long-run record of climate in Mongolia, with a focus on the climate around the emergence of Genghis Khan in the 13th century. They show that Khan's rise to power around 1206 correlated with two decades in the climate record that were wetter than any other 20-year stretch over the previous 900 years. Mongolia is typically quite dry, so this wetness was likely a boon to fodder production, which in turn would have improved Khan's meat supply and (particularly) his ability to feed the horses that he and his hordes depended on for transportation. And this, in turn, would have allowed Khan to exercise his charming habit of "wholesale massacres of civilian populations" and would have abetted his remarkably rapid establishment of the largest contiguous land empire in the history of the world. All of this would seem to make G. Khan one of history's premier climate change beneficiaries.
[Figure: Genghis Khan: a climate-made man? (Source: Wikipedia)]
What does this tell us about the logistics hypothesis that links climate to conflict? First, aside from testing this specific mechanism, hopefully the authors will use their long time series to perform a rigorous statistical test of how variation in the strength of the Mongol Empire was linked to variation in climate. As mentioned previously, these sorts of rigorous tests are largely absent in the paleo literature on climate and conflict, which makes one worry that changes in climate could be correlated with other time-trending variables that also affect conflict. Then the trick to establishing support for the logistics hypothesis would be to try to shut down other potential explanations for what's going on. This is generally really hard to do empirically, as you almost never have data on all the intervening variables of interest, and as it is often the case that the variables you do have all happen to respond to climate. (For instance, distinguishing between "state capacity" and "opportunity cost" arguments is pretty thorny, because both individual incomes and government revenues respond to changes in climate in a measurable way).
So promoting the "logistics" hypothesis here will probably boil down to good ol' storytelling. But in this particular case, this story might have legs. It's hard to see how climatic conditions that created more favorable economic conditions would lower the opportunity cost of joining a rebellion or reduce state capacity. The argument here is the opposite: a favorable climatic shift enriched the state and expanded its capabilities. A bigger worry is that rainfall and temperature appear positively correlated in this setting -- these guys note that it was relatively warm during the same period -- and so more rainfall could just be proxying for higher temperatures, and these could have had a direct effect on Mongol aggression. But you would have to REALLY be pissed off by higher temperatures to repeatedly go on a pan-continental killing spree, so this also seems a little unlikely.
The big question at the end of the day is probably more about whether climate's effects on 13th century conflict logistics have any implications for today's world, and today's climate/conflict debates. Hopefully the researchers will be as adept at answering this question as they have been in generating pre-publication press coverage!
Friday, January 18, 2013
Blame it on the rain?
One of the things I’ve been puzzling over lately is why so
many agronomists and others in agriculture seem to have the mantra that July
rainfall makes the corn crop. These are generally smart people whose opinion I
respect. For example, I’m a fan of Scott Irwin and Darrel Good’s blog over at
Farmdoc daily, where they recently discussed the prospects for next July’s rainfall. They are definitely not the only ones to focus on rainfall. Whenever I
present empirical work on weather and yield to a group of agronomists, at least
one will invariably argue that the strong temperature effects are just an
artifact of temperature tending to be high when rainfall is low. One problem
with that argument is that a lot of the recent empirical work shows temperature
as being important, and not all of it uses datasets with high correlations
between temperature and rainfall.
Usually I try to explain the various mechanisms that link
temperature to yields. And these discussions can often lead to interesting
studies about which mechanisms matter more, such as one paper we have coming
out soon. But I never really delved into the reason that rainfall is given so
much credit for good or bad years. One likely reason is it’s just a lot easier
to see whether or not it rains than to detect a shift in temperatures. So people
will tend to remember dry years more than hot years. But there’s also some
analysis that appears to support the rainfall hypothesis.
A lot of the work on US corn and weather traces back to Louis
M. Thompson’s work 30 years ago. These were basically time series models at the
state level, looking for instance at how Illinois yields changed over time in
relation to weather. More recent work has updated this type of analysis, and
emphasizes the role of July rainfall and temperature, but with a bigger role
for rainfall. To recreate that type of analysis, I plot below detrended corn
yields for Illinois (detrended by fitting a linear slope to represent gradual
technology change, and then adjusting all years to 2006 technology) versus July
rainfall (prec) and average maximum temperature (Tmax). I also plot prec and
Tmax against each other. (Correlation coefficients are given in the bottom
panels). Thanks to Wolfram for providing updated data.
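As an aside, the detrending step is easy to sketch in code. The yields below are made up, not the actual Illinois series; the procedure is the one described above (fit a linear trend, then express every year at the 2006 trend level):

```python
def detrend_to_year(years, yields, base_year=2006):
    """Fit a linear technology trend and express each year's yield
    at the trend level of `base_year`."""
    n = len(years)
    my = sum(years) / n
    mz = sum(yields) / n
    slope = sum((t - my) * (z - mz) for t, z in zip(years, yields)) / \
            sum((t - my) ** 2 for t in years)
    intercept = mz - slope * my
    base = intercept + slope * base_year
    # residual from trend, added back onto the base-year trend level
    return [z - (intercept + slope * t) + base for t, z in zip(years, yields)]

years = [2000, 2001, 2002, 2003, 2004]
yields = [140.0, 145.0, 130.0, 155.0, 160.0]  # bu/acre, hypothetical
print(detrend_to_year(years, yields))
```

After detrending, only the deviation from trend survives (here, 2002 stands out as a bad year), which is the variation you want to relate to weather.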
You can see that yields are clearly low at low levels of
rainfall, and that yields are also low at high Tmax. But you can also see that
Tmax and rainfall have a strong negative correlation, which makes it hard to
say whether Tmax, rainfall, or some combination (or neither) is the actual cause
of yield loss. I also show three recent years in color (red = 2009, green =
2010, blue = 2011). What’s mildly interesting about these years is they don’t
follow the normal correlation between Tmax and rainfall. 2009 was especially
cool but with medium rainfall, and 2010 and 2011 were both unusually warm for
the given amount of rainfall.
The main point, though, is that the colinearity issue cuts
both ways. You can’t just decide it’s rainfall and say that temperatures are
less important, any more than you can decide it’s all temperature. Many
empirical studies try to move beyond these simple time series specifically
because of the colinearity problem. One way to reduce colinearity between Tmax
and rainfall is to restrict yourself to looking over a narrow range of one of
the variables, so that it is essentially being held fixed while the other one
is varying. This is hard to do with a time series that is about 50 years long
in total, because you quickly run into problems with small sample sizes.
But it’s easier to do this if we look at time series from
lots of counties at the same time, or a so-called panel analysis. As a simple
illustration, the plot below shows all points for 1950-2011 for counties in the
“three I” states: Illinois, Iowa, and Indiana. The left-hand plot just shows
the same scatter as before between Tmax and rainfall. Notice there is still a
strong negative correlation. But we can now select only points that are within
a narrow range of rainfall (shown as green points) or a narrow range of
temperature (shown as red). Then we can take the red points and see how
rainfall matters when holding temperature constant (middle panel). Or we can
take the green points and see how temperature matters when holding
precipitation constant (right panel). The black lines in the right two panels
show local polynomial fits to the data (using loess in R).
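To see the logic of this selection trick, here's a small simulation sketch (synthetic data, not the actual county panel). It builds in negatively correlated Tmax and rainfall, and a yield that by construction responds only to extreme heat, then checks that holding rainfall in a narrow band still recovers the temperature effect:

```python
import random

random.seed(1)

# Synthetic county-year observations: Tmax and rainfall negatively
# correlated, yield hurt only by degrees above 29 C. All made up.
obs = []
for _ in range(5000):
    tmax = random.gauss(30.0, 2.0)                                  # deg C
    prec = 150.0 - 5.0 * (tmax - 30.0) + random.gauss(0.0, 30.0)    # mm
    yld = 200.0 - 8.0 * max(tmax - 29.0, 0.0) + random.gauss(0.0, 5.0)
    obs.append((tmax, prec, yld))

def ols_slope(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
           sum((a - mx) ** 2 for a in xs)

# Hold rainfall roughly constant (90-110 mm) and ask how yield
# varies with Tmax inside that band: the heat effect survives.
band = [(t, y) for t, p, y in obs if 90.0 <= p <= 110.0]
slope_in_band = ols_slope([t for t, _ in band], [y for _, y in band])
print(len(band), slope_in_band)
```

The same selection in the other direction (a narrow Tmax band, varying rainfall) would show no rainfall effect here, since none was built in; the point is just that the bands break the colinearity.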
What do we see? Well, at least for these particular places
and values of Tmax and rainfall, there does not appear to be much effect of
changing rainfall when July Tmax is constant (at around 30C). But temperature
changes do appear to be important when rainfall is held constant (at around
100mm).
Now, there are obviously lots of other combinations we could
try, and the point isn’t that rainfall is always and completely unimportant. For
example, if we hold Tmax constant at a higher level, where I’d expect rainfall to
be more critical, we do in fact see a bigger effect of rainfall at low levels (see
below). But if nothing else, it should
be clear that a lot of the credit given to July rainfall for US corn is not necessarily well deserved. Coincidentally, the singers of “blame it on the rain” also got a lot
of undeserved credit!
In future posts, I’ll try to get more into the reasons that
temperature can dominate the effects of rainfall, even in a rainfed system.
Wednesday, January 2, 2013
Some new - and not that hopeful - evidence on adaptation
Quantitative estimates of the potential impacts of future climate change are an important input to policy discussions, and researchers across a wide range of disciplines are starting to get in on the action. Most impact studies proceed something like this:
- Choose an outcome and location of interest (say, corn yields in the US)
- Assemble some historical data and investigate how this outcome has responded to past changes in climate in that location
- Use these historical estimates to say something about what might happen in the future, given estimates of future changes in the climate variables you examined in (2).
A key assumption in this approach is that past responses to climate variables (step 2) tell you something about how future populations might respond to similar changes (step 3). But one problem is that there is often a mismatch between the types of climatic changes that are used to estimate responses in step 2, and the types of future changes in climate that we are particularly worried about in step 3.
For instance, the historical response of (say) corn yields to climate is often estimated using year-to-year variation in temperature (what we typically call "weather") -- e.g. these estimates ask, if temperatures at a given location were 1C hotter than average in a given year, how much lower were corn yields in that year? But the future climate changes that we're mainly worried about are gradual changes in temperature and precipitation that will play out over many decades. The worry is then that if farmers can recognize and adapt to these gradual changes, for instance by changing when they plant or switching the cultivar or crop they grow, and these adaptations offset some or all of the losses they otherwise would have experienced, then estimates of "short-run" responses to climate fluctuations might be a poor guide to the impacts of longer-run more gradual changes.
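Steps 2 and 3 above can be caricatured in a few lines. Everything here is hypothetical: made-up anomalies, a made-up warming scenario, and the strong assumption that the short-run response carries over:

```python
# Step 2: estimate a yield response from year-to-year weather variation.
temp_anom = [0.5, -0.3, 1.2, -0.8, 0.1, 0.9, -1.1, 0.4]      # deg C
yield_anom = [-2.1, 1.5, -5.8, 3.9, -0.2, -4.1, 5.2, -1.8]   # % of trend yield

def ols_slope(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
           sum((a - mx) ** 2 for a in xs)

response = ols_slope(temp_anom, yield_anom)   # % yield per deg C

# Step 3: apply it to a projected gradual warming -- valid only if
# short-run responses persist (the assumption the post is questioning).
projected_warming = 2.0   # deg C by mid-century, assumed
print(f"projected impact: {response * projected_warming:.1f}%")
```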
How can we know whether longer run responses to gradual changes in climate differ from shorter-run responses to climate fluctuations? In a new working paper, Kyle Emerick and I try to shed some light on this by exploiting the surprisingly large variation in recent longer-run trends in temperature and precipitation across US agricultural areas. It turns out that some US counties have warmed up a bunch over the last 30 years, and others have actually cooled slightly. Why this has happened is an active area of climate research, but it appears tied to some combination of aerosols and natural climate variability (e.g. see new paper by Meehl et al, or google "U.S. warming hole" for more on this - probably better to have safe search on…).
Below is a plot of changes in average growing season temperature and precipitation between 1980-2000 for US counties east of the 100th meridian. The color scale for each map is given by the histogram beneath, and as you can see, some places have cooled almost half a degree C, others warmed 1.5C, and some counties have seen precip fall by 40% and others rise by 40%.
Figure: Change in average growing season temperature, precipitation, and corn yield between 1980 and 2000.
What we do in the paper is estimate how yields of the main US crops (corn and soy) have responded to these gradual, multi-decade changes in climate. We can then compare these longer-run responses to estimates of shorter-run responses to inter-annual fluctuations in temperature and precipitation. The difference between these two responses is our quantitative estimate of "adaptation". If farmers indeed have a lot of options available to them in the long run that are not available in the short run, then we would expect exposure to hot temperatures to be much less damaging in the long run than in the short run.
We find very little evidence for longer-run adaptation to climate: yield responses to longer-run increases in temperature are large, negative, and statistically indistinguishable from responses to shorter-run (annual) fluctuations in temperature. This appears true for both corn and soy, and doesn't seem to depend much on the period over which we measure the changes or how we do the econometrics.
From a future impact perspective this is bad news, because it would imply that farmers do not currently appear to have a lot of options for mitigating damages from extreme heat exposure. Indeed, we can generate impact estimates by combining our estimates of these recent longer-run responses with projections of future climate change, and our median projection is a roughly 15% decline in corn yields by 2050 -- almost on par with yield declines during the well-publicized 2012 heat wave/drought across the US.
Nevertheless, there are some potential concerns with our approach and with our impact estimates. One is that we are focusing on yield impacts, and it could be the case that farmers are making other adjustments that a narrow focus on yield will not pick up -- for instance they could be switching to other crops that we are not looking at, or they could be getting out of agriculture altogether, planting golf courses instead of corn. Were this the case, though, we would expect to see substantial declines in area planted to corn in the counties that warmed dramatically, and we find little evidence that this is the case.
Another concern is that maybe farmers have simply not realized that temperatures are changing, and that if future changes are particularly large or salient, they will be recognized and quickly responded to. We have a couple responses to this. The first is that recognition is an important part of adaptation: because warming over the past few decades in many counties is roughly on par with warming expected over the next 3-4 decades, it's not immediately clear how near-term future changes in climate will somehow be way more obvious than past changes. Second, we show in the paper that some factors that you might think would be correlated with recognition of or belief in recent climate change do not predict how counties responded to recent warming. In particular, neither counties that are better educated (who might have better access to information on climate change, be loyal readers of G-FEED, etc) nor counties that voted more Democratic in the 2000 Presidential election (who studies show are more likely to believe in climate change) were any less harmed by recent warming than less educated or more Republican counties. These are obviously not perfect tests of "recognition", but they were the best we could think of (other suggestions welcome!).
A final concern is that we are again relying on historical relationships to make claims about the future impacts. That is, although we now use historical longer-run responses (instead of short-run responses) to predict impacts under future climate changes, we are still assuming that past responses to climate variables are a guide to how future populations will respond. While in principle you could arbitrarily specify any relationship between past response and future response that you wanted to (e.g. future responses to 1C will be half as large as past responses), our "business-as-usual" assumption seems the most relevant to the policy question at hand: what's a reasonable estimate of what might happen in the absence of new investments in adaptation. To us, our findings of substantial losses in the face of past long-run changes in climate suggest that it would be dangerous to assume that farmers will easily adapt to future changes. More likely, adaptation is going to require investment, and without these investments the summer of 2012 is not a bad picture of what the "new normal" might look like.
Friday, December 21, 2012
The good and bad of fixed effects
If you ever want to scare an economist, the two words "omitted variable" will usually do the trick. I was not trained in an economics department, but I imagine they drill it into you from the first day. It’s an interesting contrast to statistics, where I have much of my training and where the focus is much more on out-of-sample prediction skill. In economics, showing causality is often the name of the game, and it’s very important to make sure a relationship is not driven by a “latent” variable. Omitted variables can still be important for out-of-sample skill, but only if their relationships with the model variables change over space or time.
A common way to deal with omitted variable bias is to introduce dummy variables for space or time units. These “fixed effects” greatly reduce (but do not completely eliminate) the chance that a relationship is driven by an omitted variable. Fixed effects are very popular, and some economists seem to like to introduce them to the maximum extent possible. But as any economist can tell you (another lesson on day one?), there are no free lunches. In this case, the cost of reducing omitted variable problems is that you throw away a lot of the signal in the data.
Consider a bad analogy (bad analogies happen to be my specialty). Let’s say you wanted to know whether being taller caused you to get paid more. You could simply look at everyone’s height and income, and see if there was a significant correlation. But someone could plausibly argue that omitted variables related to height are actually causing the income variation. Maybe very young and old people tend to get paid less, and happen to be shorter. And women get paid less and tend to be shorter. And certain ethnicities might tend to be discriminated against, and also be shorter. And maybe living in a certain state that has good water makes you both taller and smarter, and being smarter is the real reason you earn more. And on and on and on we could go. A reasonable response would be to introduce dummy variables for all of these factors (gender, age, ethnicity, location). Then you’d be looking at whether people who are taller than average given their age, sex, ethnicity, and location get paid more than an average person of that age, sex, ethnicity, and location.
In other words, you end up comparing much smaller changes than if you were to look at the entire range of data. This helps calm the person grumbling about omitted variables (at least until they think of another one), and would probably be ok in this example, since all of these things can be measured very precisely. But think about what would happen if we could only measure age and income with 10% error. Taking out the fixed effects removes a lot of the signal but not any of the noise, which means in statistical terms that the power of the analysis goes down.
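A small simulation shows the cost. Two noisy measurements of the same variable agree well in the raw data, where big between-group differences dominate, but once the group means (the fixed effects) are removed, the remaining within-group signal is small relative to the measurement noise. All numbers here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups, per_group = 200, 30

# True variable: large between-group differences, small within-group variation
group_means = rng.normal(0, 3.0, n_groups)
true_x = np.repeat(group_means, per_group) + rng.normal(0, 0.3, n_groups * per_group)

# Two independent noisy measurements of the same underlying variable
x1 = true_x + rng.normal(0, 0.5, true_x.size)
x2 = true_x + rng.normal(0, 0.5, true_x.size)

# Raw correlation: the between-group signal swamps the measurement noise
raw_corr = np.corrcoef(x1, x2)[0, 1]

# Fixed-effects step: subtract each group's mean, keeping only within variation
g = np.repeat(np.arange(n_groups), per_group)

def demean(x):
    means = np.bincount(g, weights=x) / per_group
    return x - means[g]

fe_corr = np.corrcoef(demean(x1), demean(x2))[0, 1]
print(raw_corr, fe_corr)  # fe_corr is far lower: same noise, much less signal
```

The noise is untouched by demeaning, so the signal-to-noise ratio, and with it statistical power, drops sharply.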
Now to a more relevant example. (Sorry, this is where things may get a little wonkish, as Krugman would say.) I was recently looking at some data that colleagues at Stanford and I are analyzing on weather and nutritional outcomes at the district level in India. As in most developing countries, the weather data in India are far from perfect. And as in most regression studies, we are worried about omitted variables. So what is the right level of fixed effects to include? Inspired by a table in a recent paper by some eminent economists (including a couple who have been rumored to blog on G-FEED once in a while), I calculated the standard deviation of residuals from regressions with different levels of fixed effects. The 2nd and 3rd columns in the table below show the results for summer (June-September) average temperatures (T) and rainfall (P). Units are not important for the point, so I’ve left them out:
| | sd(T) | sd(P) | Cor(T1,T2) | Cor(P1,P2) |
|---|---|---|---|---|
| No FE | 3.89 | 8.50 | 0.92 | 0.28 |
| Year FE | 3.89 | 4.66 | 0.93 | 0.45 |
| Year + State FE | 2.20 | 2.18 | 0.84 | 0.26 |
| Year + District FE | 0.30 | 1.63 | 0.33 | 0.22 |
The different rows here correspond to the raw data (no fixed effect), after removing year fixed effects (FE), year + state FE, and year + district FE. Note how including year FE reduces P variation but not T, which indicates that most of the T variation comes from spatial differences, whereas a lot of the P variation comes from year-to-year swings that are common to all areas. Both get further reduced when introducing state FE, but there’s still a good amount of variation left. But when going to district FE, the variation in T gets cut by a factor of more than seven, from 2.20 to 0.30! That means the typical temperature deviation a regression model would be working with is less than a third of a degree Celsius.
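Mechanically, each row of a table like this is just the standard deviation of what's left after sweeping out group means. Here's a sketch on fabricated data (the state/district/year counts and variance components are invented, chosen so the pattern mimics the table: annual swings common to all districts, big state differences, smaller within-state district differences, and a little local noise):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_states, per_state, n_years = 4, 5, 15

# Balanced panel: every district observed in every year
df = pd.DataFrame([(s, s * per_state + d, y)
                   for s in range(n_states)
                   for d in range(per_state)
                   for y in range(n_years)],
                  columns=["state", "district", "year"])

state_fx = rng.normal(0, 2.0, n_states)                  # spatial differences
district_fx = rng.normal(0, 0.8, n_states * per_state)   # within-state differences
year_fx = rng.normal(0, 1.0, n_years)                    # common annual swings
df["T"] = (state_fx[df["state"].to_numpy()]
           + district_fx[df["district"].to_numpy()]
           + year_fx[df["year"].to_numpy()]
           + rng.normal(0, 0.3, len(df)))                # local noise

def resid_sd(x, groups):
    """Sd of residuals after sweeping out group means, one sweep per factor
    (exact for the fixed-effects residuals here because the panel is balanced)."""
    r = x.copy()
    for gcol in groups:
        r = r - r.groupby(df[gcol]).transform("mean")
    return r.std()

s0 = df["T"].std()                            # no FE
s1 = resid_sd(df["T"], ["year"])              # year FE
s2 = resid_sd(df["T"], ["year", "state"])     # year + state FE
s3 = resid_sd(df["T"], ["year", "district"])  # year + district FE
print(s0, s1, s2, s3)  # each finer level of FE leaves less variation to work with
```

The residual standard deviation shrinks at each step, and with district effects only the small local-noise component survives.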
None of this is too interesting, but the 4th and 5th columns are where things get more related to the point about signal to noise. There I’m computing the correlation between two different datasets of T or P (details of which ones are not important). When there is a low correlation between two datasets that are supposed to be measuring the same thing, that’s a good indication that measurement error is a problem. So I’m using this correlation here as an indication of where fixed effects may really cause a problem with signal to noise.
Two things to note. First, the precipitation data seem to have a lot of measurement issues even before taking any fixed effects. Second, temperature seems ok even after state fixed effects are introduced (a correlation of 0.84 indicates some measurement error, but still more signal than noise). But when district effects are introduced, the correlation plummets by more than half.
The take-home here is that fixed effects may be valuable, even indispensable, for empirical research. But like turkey at Thanksgiving, or presents at Christmas, more of a good thing is not always better.
Wednesday, December 12, 2012
What poop tells us about the social impacts of climate change
A growing literature in paleoclimate and archeology explores the extent to which past fluctuations in climate have shaped the evolution of human societies. These papers get to tackle pretty sexy topics: did climate help cause the collapse of the Maya? Is climate implicated in dynastic transitions in China? How about in the fall of Angkor Wat?
That a lot of these papers are answering "yes" to the question of whether climate is implicated in large historical social upheavals could tell us something important about the impact of future climatic changes on social outcomes. But there are a few things you might worry about in this literature. One is whether the studies are actually measuring what they say they're measuring -- i.e. that they're picking up meaningful variation in human activity, and that changes in societies and in climate happened when they say they did. Most of the papers you see on these topics spend most of their time convincing you that this is the case, and given my mere hobbyist's understanding of paleolimnology I have to take them at their word.
The second concern is one that is more familiar to folks that are used to running regressions: can we say with certainty that the variations in climate are causally linked to the socioeconomic variation of interest? The hard part with these papers is that they're often dealing with one-off events -- e.g. the collapse of the Maya -- that don't give you the repeated observations you need to carry out the typical statistical tests. Basically, you'd be worried that even though the collapse event you measured was coincident with a large climate shock, by chance something unobserved might also have happened at the same time that in fact caused the collapse. Given this, you might be worried that these studies are looking under the proverbial lamppost for the proverbial keys: we'd like to observe the universe of all climate events and all collapse events over time, but instead we focus on a few iconic ones.
A new paper in PNAS helps overcome some of these concerns. D'Anjou and coauthors use coprostanol concentrations (Wikipedia: chemical compounds found in fecal matter of higher order mammals - i.e. poop) that they dug up in a Norwegian lake to estimate the variation in local human activity in the nearby area over the last 2000 or so years. They then compare this to existing reconstructions of local summertime temperature, which is the time of year when agriculture would have been possible. The nice thing about their paper is that they have a lot of observations of the same place over time, and so can run some of the basic statistical tests you often want to see in these papers (and can't). The other nice thing is that poop appears to be a much better indicator of human activity than many of the proxies used in the past, which could have been directly affected by climate (e.g. charcoal from fires, which could have been manmade or could have risen naturally as temperatures changed).
Here is the money plot comparing poop and temperature (their Figure 5):
While there are a couple things you can still complain about -- e.g. you probably want to see Panel C as a plot of the time-detrended data -- this to me is one of the more convincing relationships that has shown up in these Paleo papers. As in other studies looking at cold regions, they show that human activity responded strongly and positively to warmer temperatures: drops of ~4C caused total abandonment of (poop-related) human activity in the region.
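On the detrending point: two series that share a slow trend will correlate strongly even when their fluctuations are unrelated, which is why you'd want to see that panel detrended. A toy illustration (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200  # hypothetical number of time steps in the record
t = np.arange(n)

# Two series sharing a slow linear trend but with unrelated fluctuations
shared_trend = 0.01 * t
x = shared_trend + rng.normal(0, 0.5, n)
y = shared_trend + rng.normal(0, 0.5, n)

def detrend(v):
    """Residuals from an OLS fit of v on a linear time trend."""
    slope, intercept = np.polyfit(t, v, 1)
    return v - (slope * t + intercept)

raw_corr = np.corrcoef(x, y)[0, 1]
dt_corr = np.corrcoef(detrend(x), detrend(y))[0, 1]
print(raw_corr, dt_corr)  # the shared trend inflates raw_corr; dt_corr is near zero
```

A relationship that survives detrending is much harder to attribute to two coincident long-run drifts.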
While both the broader welfare effects and the modern implications of this and related studies are not immediately obvious (did people die or just migrate south? what do Iron Age societies' sensitivities to climate imply for modern societies?), the methodological difference between this and most of the past studies is to me a nice contribution. And hopefully the grad students who had to dig up the poop got a PNAS paper out of the deal...
Tuesday, December 11, 2012
The summer of 2013
Last week at AGU I gave a talk about the lessons of the US corn harvest in 2011 and 2012, both of which were below trend line (see figure). That got me thinking a little more about what to look for in 2013. The obvious point is that it is likely to be better than 2012, because it can’t get much worse. But that’s not too insightful; it’s like saying that Cal’s football team will be better next year, since they were so bad this year. (By the way, welcome to Max Auffhammer, our newest blogger! With Wolfram’s move to Berkeley that brings our Cal contingent up to 3. I sure hope I don’t say anything to offend them.)
As we’ve talked about in other posts, the summer of 2012 might be considered normal in a few decades, but not now. And some recent work from Justin Sheffield and colleagues in Nature argues that drought trends globally, and in North America, are not significantly positive if calculated properly (which contradicts some earlier work). We can leave aside for now the question of whether soil moisture trends are the best measure of drought exposure if one cares about corn yields (though that’s a good topic for a future post), and simply say that conditions in 2012 were well below trend.
This means we’d expect next year to be closer to the trend, and that seems to be the overriding sentiment of markets. As Darrell Good over at farmdoc daily explains, “In the past five decades, extreme drought conditions in the U.S., like those experienced in 2012, have been followed by generally favorable growing conditions and yields near trend values.”
But two things work against this tendency to revert to the mean. First, the drought still persists throughout much of the country, as seen at UNL’s drought monitor site. As Good goes on to say, “current dry soil moisture conditions in much of the U.S. and some recent forecasts that drought conditions could persist well into next year have raised concerns that such a rebound in yields may not occur in 2013.” In other words, if the Corn Belt does not get a wet winter and/or spring, expect prices to start climbing again.
Second, though, good initial moisture does not eliminate the chance of drought during the season. There’s an interesting piece by folks at the National Climatic Data Center (NCDC) in the AGU newsletter I got today (it was actually published Nov. 20, but it takes about 3 weeks for me to get it!). They note that 2012 was not like previous droughts in the 1930s and 1950s, or even 1988, in that it was very much driven by high temperatures rather than low starting moisture. As they say:
“For example, at the end of February in both 2011 and 2012 the national PDSI (calculated using the observed monthly mean temperature and precipitation averaged across the contiguous United States) was 1.2 (mildly wet) and –2.5 (moderate drought), respectively, compared to 1934 and 1954 of –5.7 and –4.6, respectively.”
This is also shown pretty effectively in an animation by Climate Central. So the high temperatures in recent years have made drought come on much more quickly than usual. As the NCDC piece says, “By the end of September, every month since June 2011 had above normal average temperatures, a record that is unprecedented.” That's 16 straight months of above normal temperatures!
So my seat-of-the-pants guess is that next year’s yields are likely to still be below trend line (which would be at around 160 bushels/acre). Obviously lots of things could push it above trend line (including changing the definition of the trend!), and it’s way too early to have much confidence about how 2013 will end up. But following Sol’s lead on the Sandy damage prediction, I’ll go out on a limb (his mean was too low by a factor of two, but the true damage was within the confidence interval!). And to pair a risky bet with a safe one, I’ll also predict Stanford wins big at the Rose Bowl.