
Thursday, August 10, 2017

Climate change, crop failure, and suicides in India (Guest post by Tamma Carleton)

[This is a guest post by Tamma Carleton, a Doctoral Fellow at the Global Policy Lab and PhD candidate in the Ag and Resource Econ department here at Berkeley]

Last week, I published a paper in PNAS addressing a topic that has captured the attention of media and policymakers around the world for many years – the rising suicide rate in India. As a dedicated student of G-FEED contributors, I focus on the role of climate in this tragic phenomenon. I find that temperatures during India’s main growing season cause substantial increases in the suicide rate, amounting to around 65 additional deaths if all of India gained a degree day. I show that over 59,000 suicides can be attributed to warming trends across the country since 1980. With a range of different approaches I’ll talk about here, I argue that this effect appears to materialize through an agricultural channel in which crops are damaged, households face economic distress, and some cope by taking their own lives. It’s been a pretty disheartening subject to study for the last couple years, and I’m glad to see the findings out in the world, and now here on G-FEED.

First, a little background on suicides in India. The national suicide rate has approximately doubled since 1980, from around 6 per 100,000 to over 11 per 100,000 (for reference, the rate in the U.S. is about 13 per 100,000). The size of India’s population means this number encompasses many lives – today, about 135,000 are lost to suicide annually. There have been a lot of claims about what contributes to the upward trend, although most focus on increasing risks in agriculture, such as output price volatility, costly hybrid seeds, and crop-damaging climate events like drought and heat (e.g. here, here, and here). While many academic and non-academic sources have discussed the role of the climate, there was no quantitative evidence of a causal effect. I wanted to see if this relationship was in the data, and I wanted to be able to speak to the ongoing public debate by looking at mechanisms, a notoriously thorny aspect of the climate impacts literature.

The first finding in my paper is that growing season temperatures, which damage crops (as the G-FEED authors have shown us many times over), increase the annual suicide rate, while the same temperatures outside the growing season have no effect on suicides. The results are much less certain for rainfall (I’m stuck with state-by-year suicide data throughout the analysis), but a similar pattern emerges there, with higher growing season rainfall appearing to cause reductions in suicide: 


These effects seem pretty large to me. As I said above, a degree day of warming throughout the country during the growing season causes about 65 suicides throughout the year, equivalent to a 3.5% increase in the suicide rate per standard deviation increase in growing season degree days. 

The fact that the crop response functions are mirrored by the suicide response functions is consistent with an agricultural mechanism. However, this isn’t really enough evidence. As in other areas of climate impacts research, it’s difficult to find exogenous variation that turns the hypothesized mechanism on or off – I don’t have an experiment where I randomly let some households’ farm income be unaffected by temperature while others’ incomes suffer. Aspects of life that differ between the growing and non-growing seasons could therefore be driving the heterogeneous response functions linking temperature and suicide. Because this mechanism is so important to policy, I turn to a couple of additional tests.

I first show that there are substantial lagged effects, which would be unlikely if the link between climate and suicide ran through direct, non-economic channels (like the psychological channels linking temperature to violence discussed in Sol and Marshall’s work). I also estimate spatial heterogeneity in both the suicide response to temperature and the yield response, and find that the locations where suicides are most sensitive to growing season degree days also tend to be the locations where yields are most sensitive: 



The fact that higher temperatures mean more suicides is troubling as we think about warming unfolding in the next few decades. However, I’m an economist, so I should expect populations to reallocate resources and re-optimize behaviors to adapt to a gradually warming climate, right? Sadly, after throwing all of the main adaptation tests I’m aware of from the literature at the data, I find no evidence of adaptation, looking both across space (e.g. are places with hotter average climates less sensitive?) and across time (e.g. has the response function flattened over time? What about long differences?):


Keeping in mind that there is no evidence of adaptation, my last calculation is to estimate the total number of deaths that can be attributed to warming trends observed since 1980. Following the method in David, Wolfram and Justin Costa-Roberts’ 2011 article in Science, I find that by the end of the sample in 2013, over 4,000 suicides per year across the country can be attributed to warming. Integrating from 1980 to today and across all states in India, I estimate that over 59,000 deaths in total can be attributed to warming. Because warming trends and population density vary across space, these deaths are distributed very unevenly across the country:


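Since the attribution exercise boils down to applying the estimated response function to the trend-induced shift in degree days, here is a minimal R sketch of that accounting. This is not the paper’s actual code: the warming trend below is a made-up placeholder, and the real calculation works state by state with estimated trends before aggregating.

beta <- 65                              # suicides per +1 nationwide growing season degree day (from above)
years <- 1980:2013
trendSlope <- 1.0                       # hypothetical warming trend, in degree days per year
ddShift <- trendSlope * (years - 1980)  # trend-induced degree days relative to 1980
attributedByYear <- beta * ddShift      # deaths attributed to the trend in each year
sum(attributedByYear)                   # total deaths attributed over 1980-2013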

While the tools I use are not innovative by any means (thanks to the actual authors of this blog for developing most of them), I think this paper is valuable to our literature for a couple reasons. First, while we talk a lot about integrating our empirical estimates of the mortality effects of climate change into policy-relevant metrics like the SCC, this is a particular type of death I think we should be incredibly concerned about. Suicide indicates extreme distress and hardship, and since we care about welfare, these deaths mean something distinct from the majority of the deaths driving the mortality rate responses that we often study. 

Second, mechanisms really matter. The media response to my paper has been shockingly strong, and everyone wants to talk about the mechanism and what it means for preventative policy. While I have by no means nailed the channel down perfectly here, a focus on testing the agricultural mechanism has made my findings much more tangible for people battling the suicide epidemic on the ground in India. I look forward to trying to find ways to improve the tools at our disposal for identifying mechanisms in this context and in others.

Finally, as climate change progresses, I think we could learn a lot from applying David, Wolfram, and Justin’s method more broadly. While attribution exercises have their own issues (e.g. we can’t, of course, attribute with certainty the entire temperature trend in any location to anthropogenic warming), I think it’s much easier for many people to engage with damages being felt today, as opposed to those likely to play out in the future. 

Thursday, June 22, 2017

Drink more coffee and run faster! Get fit in only 9 seconds! A rant on NYTimes exercise coverage.

As a fairly half-assed exerciser, I am always on the lookout for quick/easy/delicious ways to get in shape without having to really do anything.  Thus a lot of the exercise articles on NYTimes's "Well" blog are irresistible clickbait for me -- as they are for the presumably millions of other NYT-reading half-assed exercisers who ensure that these articles consistently top the NYT most-read lists.  Drink more caffeine and run faster!  Take a hot bath and run faster!  Get all the exercise you need in only seven minutes!  Scratch that -- only four!!  Scratch that -- only one!!!

image stolen from NYT article


Here's one from this week:  Hot weather workout?  Try a hot bath beforehand.  Triple clickbait for me, as this week is really hot in California, I was hoping to get in some runs, and I certainly enjoy hot baths (+long walks on the beach, white wine, etc).  Original study here (can't tell if that's paywalled), published in the Journal of Strength and Conditioning Research.  This study takes n=9 (!) people, 8 dudes and 1 woman, and first has them run 5k on a treadmill in a lab where they cranked the temperature to 90F.  Then all subjects underwent about a week of heat acclimation, where they pedaled a stationary bike in the hot lab for five 90-minute sessions.  Then, finally, they ran the 5k again in the hot lab. Lo and behold, times had improved!  The now-heat-acclimated participants shaved an average of about a minute and a half off their 5k times (~ 6% reduction) when running in hot conditions for the second time.

But wait a sec.  You took some mildly fit runners (5k time of 25min is not exactly East African pace), had them put in almost 8 hours of training, and then tested to see whether they got faster??  As a lazy, mildly-fit runner myself, 8 hours of training constitutes a good month for me, so I would be PISSED if I didn't get faster with that much training.  Now, you might say, they were pedaling a bike and not running, and this is indeed what the paper tries to argue in some hard-to-understand phrasing ("Cycling training controlled for performance that could arise from increased training volume were participants to acclimate through running").  But to me it seems pretty unlikely that a lot of biking is going to give you no running benefit at all, and some quick googling found about 1000 blog posts by runner/bikers that claim that it does (and also 1000 that claim that it doesn't, go figure).  In any case, clearly this is not anywhere close to the optimal research design you'd want for figuring out the causal effect of training-when-hot on performing-when-hot.   Strangely, the "hot bath" part does not appear in the paper at all, but just in an off-hand comment by a researcher quoted by the NYT.  So that's weird too.

Or take the one-minute-of-exercise-is-all-you-need study (original study; NYT clickbait).   This study takes n=25 "sedentary" men and divides them into three groups:  a control group (n=6), a group that does moderate-intensity stationary-bike pedaling for 45min at a time (n=10), and a group that does high-intensity ("all out") stationary-bike pedaling for 3x 20 seconds (n=9).  Only 60 total seconds!  This happens three times a week for 12 weeks, after which researchers compare various measures of fitness, including how much oxygen you can take up at your peak (i.e. VO2-max).  Both the moderate (VO2 = 3.2) and intense groups (VO2=3.0) had improved significantly upon the control group (VO2 = 2.5), but the post-training VO2max levels in the moderate and intense groups were not statistically different from each other.  Hence the paper's exciting title:  "Twelve Weeks of Sprint Interval Training Improves Indices of Cardiometabolic Health Similar to Traditional Endurance Training despite a Five-Fold Lower Exercise Volume and Time Commitment", and the NYT clickbait translation:  "1 Minute of All-Out Exercise May Have Benefits of 45 Minutes of Moderate Exertion".

But wait a sec.  Sample size is n=19 in these treatment groups combined!  A quick calculation suggests that, at this sample size, the study is DRAMATICALLY underpowered to find an effect. That is, the study has a high chance of failing to find an effect when there actually is one.  I calculate that to detect a significant difference in means of 0.2 between these two groups (on a control group standard deviation of 0.7, given in the table), the study would have needed 388 participants, or about 20x what they had! (This assumes the researchers would want an 80% chance of correctly rejecting the null that the two groups had the same mean;  in Stata, if I did it right:   power twomeans 3.2 3.0, sd(0.7)).  Even reliably detecting a 20% increase in VO2 max between the two treated groups would have needed 46 participants, more than twice the number they had.  Put more simply:  with sample sizes this small, you are very unlikely to find significant differences between two groups, even when differences actually exist.  So maybe the moderate exercise group actually did do better, or maybe they didn't, but either way this study can't tell us.  The same thing appears to be true with the 4-min-exercise paper (n=26) -- it's way underpowered.  And I haven't looked systematically, but my guess is this is true of a lot of the studies they cover that find no effect.  Andrew Gelman is always grumping about studies with large effects, but we should probably be just as cautious believing small-N studies that find no effect.
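
(For R users: the built-in power.t.test reproduces the same numbers as the Stata command above.)

# Detecting the 0.2 difference in means (3.2 vs 3.0) with sd = 0.7,
# 5% significance and 80% power:
power.t.test(delta = 0.2, sd = 0.7, sig.level = 0.05, power = 0.80)
# n ~ 194 per group, i.e. ~388 participants in total

# Even a 20% difference in VO2max (delta ~ 0.6) needs ~23 per group (~46 total):
power.t.test(delta = 0.6, sd = 0.7, sig.level = 0.05, power = 0.80)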

So should the NYT stop covering these underpowered or poorly designed studies?  There's not a lot at stake here I guess, so one reasonable reaction is, "who gives a sh*t"?  But surely this sort of coverage crowds out coverage of other higher-quality science, such as why your roller-bag wheels wobble so much when you're running to catch a flight, or why eggs are egg-shaped.  And cutting down on this crappy exercise coverage will save me the roughly 20min/week of self-loathing I feel when I click on yet another too-good-to-be-true NYT exercise article, look up the article it actually references, and find myself not very convinced.  Twenty minutes I could have spent exercising!!  That's like 20 1-minute workouts...


Tuesday, August 18, 2015

Daily or monthly weather data?

We’ve had a few really hot days here in California. It won’t surprise readers of this blog to know the heat has made Marshall unusually violent and Sol unusually unproductive. They practice what they preach. Apart from that, it’s gotten me thinking back to a common issue in our line of work - getting “good” measures of heat exposure. It’s become quite popular to be as precise as possible in doing this – using daily or even hourly measures of temperature to construct things like ‘extreme degree days’ or ‘killing degree days’ (I don’t really like the latter term, but that’s beside the point for now).

I’m all for precision when it is possible, but the reality is that in many parts of the world we still don’t have good daily measures of temperature, at least not for many locations. But in many cases there are more reliable measures of monthly than daily temperatures. For example, the CRU has gridded time series of monthly average max and min temperature at 0.5 degree resolution.

It seems a common view is that you can’t expect to do too well with these “coarse” temporal aggregates. But I’m going to go out on a limb and say that sometimes you can. Or at least I think the difference has been overblown, probably because many of the comparisons between monthly and daily weather show the latter working much better. But I think it’s overlooked that most comparisons of regressions using monthly and daily measures of heat have not been a fair fight.

What do I mean? On the one hand, you typically have the daily or hourly measures of heat, such as extreme degree days (EDD) or temperature exposure in individual bins of temperature. These then enter into some fancy pants model that fits a spline or some other flexible function that captures all sorts of nonlinearities and asymmetries. Then on the other hand, for comparison you have a model with a quadratic response to growing season average temperature. I’m not trying to belittle the fancy approaches (I bin just as much as the next guy), but we should at least give the monthly data a fighting chance. We often restrict the monthly data to growing season averages rather than using the monthly values, often use average daily temperatures rather than average maximums and minimums, and, most importantly, often impose symmetry by using a quadratic. Maybe this is just out of habit, or maybe it’s the soft bigotry of low expectations for those poor monthly data.

As an example, suppose, as we’ve discussed in various other posts, that the best predictor of corn yields in the U.S. is exposure to very high temperatures during July. In particular, suppose that degree days above 30°C (EDD) is the best. Below I show the correlation of this daily measure for a site in Iowa with various growing season and monthly averages. You can see that average season temperature isn’t so good, but July average is a bit better, and July average daily maximum even better. In other words, if a month has a lot of really hot days, then that month's average daily maximum is likely to be pretty high.



You can also see that the relationship isn’t exactly linear. So a model with yields vs. any of these monthly or growing season averages likely wouldn’t do as well as EDD if the monthly data entered as a linear or quadratic response. But as I described in an old post that I’m pretty sure no one has ever read, one can instead define simple asymmetric hinge functions based on monthly temperature and rainfall. In the case of U.S. corn, I suggested these three based on a model fit to simulated data:


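To make the hinge idea concrete, here’s a minimal R sketch of how covariates like these get built. The temperature knot below is a placeholder (the real ones came from the model fit); only the 450mm rainfall knot is the one discussed next:

# Piecewise linear ("hinge") covariates from monthly/seasonal data.
# The temperature knot is a placeholder; 450mm is the rainfall knot from the text.
hinge_above <- function(x, knot) pmax(x - knot, 0)  # slope kicks in above the knot
hinge_below <- function(x, knot) pmin(x, knot)      # response goes flat above the knot

# hypothetical data frame d with monthly and seasonal weather:
d$heat <- hinge_above(d$julyTmax, 29)     # placeholder 29C knot on July average max
d$rain <- hinge_below(d$seasonPrec, 450)  # yields respond up to 450mm, flat after
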
This is now what I’d consider more of a fair fight between daily and monthly data. The table below is from what I posted before. It compares the out-of-sample skill of a model using two daily-based measures (GDD and EDD) to a model using the three monthly-based hinge functions above. Both models include county fixed effects and quadratic time trends. In this particular case, the monthly model (3) even works slightly better than the daily model (2). I suspect that its edge relates less to the temperature terms than to the rainfall terms: model (2) uses a quadratic in growing season rainfall, which is probably less appropriate than the more asymmetric hinge function – which says yields respond up to 450mm of rain and are flat afterwards. 


Model   Calibration R2   RMSE (calibration)   RMSE (out-of-sample, 500 runs)   % reduction in out-of-sample error
1       0.59             0.270                0.285                            --
2       0.66             0.241                0.259                            8.9
3*      0.68             0.235                0.254                            10.7


Overall, the point is that monthly data may not be so much worse than daily for many applications. I’m sure we can find some examples where it is, but in many important examples it won’t be. I think this is good news given how often we can’t get good daily data. Of course, there’s a chance the heat is making me crazy and I’m wrong about all this. Hopefully at least I've provoked the others to post some counter-examples. There's nothing like a good old fashioned conflict on a hot day.


Monday, February 16, 2015

Siestas and climate change

Lately I’ve been thinking about siestas. And not just because of the extremely warm summer – I mean February – we are having in California. Taking an afternoon break is probably one of the oldest and most widespread adaptations humans have in the face of hot weather. Better to work hard in the morning and late afternoon when the heat is not too bad, and take a break when you would otherwise be unproductive. For example, I was never so willing to wake up at 4am as when I was doing field work in Sonora, Mexico in July – it meant I could get most of the day’s work done by lunch and not face the afternoon sun and heat. I can only guess that a similar strategy explains Max’s daily naps in his office, though Berkeley isn’t typically all that hot.

Although humans have widely adopted this practice, we don’t typically think of it as a strategy for how we grow crops. But various studies seem to suggest that the best way to adapt to extreme heat may be to have crops take a break. This can be done on two different time scales. In a day, we can select for crops that slow down growth during hours that are the hottest, and have the highest vapor pressure deficits. This avoids using water at the times when the efficiency of using water would be lowest. A nice recent review by Vincent Vadez and colleagues discusses some of the ways crop scientists are selecting for this trait. As I’ve mentioned in past posts, it’s a strategy we are trying to evaluate as a possible adaptation to climate change. It probably helps a little, but it’s not clear how much.

At longer time scales, it’s possible to just scrap the idea of growing a crop during the peak of the summer, and instead try to fit in two crops that straddle this period. This is the idea behind a recent paper led by my student, Chris Seifert, on double cropping trends and potentials in the US. Part of the challenge of double cropping is that it pushes the late summer crop (typically soybean) into greater risk of frost damage at the end of the season. But with warming temperatures, crops develop more quickly and the frost dates recede. This idea has been around for a while, but the idea of the paper was to simulate how much more viable the practice is actually becoming and will continue to become. 

The basic approach was to develop a realistic model of phenology for a wheat-soy double crop, and see how often the two crops can be squeezed in between the last frost of spring and the first frost of fall. Below is a summary of the projected suitability (% of years with survival) for current and future conditions under two emissions scenarios (top is low emissions, RCP4.5, and bottom is high emissions, RCP8.5). The current simulated limits match pretty well where double cropping actually exists (roughly to the south of Illinois). But by mid-century the suitable area has shifted at least a full state northward, and by 2100 it is roughly double or triple the current suitable area, depending on the emissions scenario. It’s also slightly interesting that really warm scenarios actually make this particular system less suitable in the South, because wheat can’t vernalize properly.


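For intuition, here’s a toy version of the suitability calculation. It is emphatically not Chris’s phenology model, which tracks each crop’s development separately; all of the thermal-time parameters below are made-up placeholders:

# A year counts as "suitable" if enough thermal time accumulates between the
# last spring frost and the first fall frost to mature both crops.
suitable_year <- function(dailyTavg, lastFrost, firstFrost,
                          gddWheat = 1500, gddSoy = 1300, base = 10) {
  # thermal time available in the frost-free window (day-of-year indices)
  gddAvailable <- sum(pmax(dailyTavg[lastFrost:firstFrost] - base, 0))
  gddAvailable >= gddWheat + gddSoy
}

# suitability for a site = share of years passing the check, e.g.
# mean(sapply(yearList, function(y) suitable_year(y$tavg, y$lastFrost, y$firstFrost)))
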
The other thing the paper looks at is whether warming has already encouraged a spread of double cropping. The data to look at double-cropping are pretty coarse in spatial resolution and short in time, but the trends do seem consistent with the simulated increase in suitability. For example, the figure below shows the fraction of reported double-cropping (red line) in states that historically have had frost limitations vs. the simulated suitability in those states for three different thresholds of survival (survival = soy maturing before first frost).


Just to be clear, this isn’t a definite gain relative to sticking with a single summer crop of corn or soybean. That would depend on prices and yields, both of which will also depend on what climate is doing. And even if it’s a gain, it could well still be worse than what could be achieved in the current climate with a single crop (for reasons we get into in the paper). Just like taking a siesta may make you more productive than if you didn’t, but not more productive than you’d be without the hot weather. But if nothing else it is an intriguing option that could make sense in a lot more places than it used to. And for many places it falls in the category of things that wouldn’t make much sense without climate change, because of the frost constraint, which I think is a key criterion for a truly effective adaptation.

Overall, maybe the lesson here is the corollary of what Marshall blogged about a few months ago. If we are learning that people’s response to heat is empirically not that different from crops, maybe it follows that strategies people have adopted to deal with heat can give us an idea of what might work well with crops. And maybe with all the heat lately – at least in the Western U.S. – people should start taking more siestas too. Max must just be testing out his adaptation strategy.


Saturday, January 10, 2015

Searching for critical thresholds in temperature effects: some R code



If google scholar is any guide, my 2009 paper with Wolfram Schlenker on the nonlinear effects of temperature on crop outcomes has had more impact than anything else I've been involved with.

A funny thing about that paper: Many reference it, and often claim that they are using techniques that follow that paper.  But in the end, as far as I can tell, very few seem to have actually read through the finer details of that paper or tried to implement the techniques in other settings.  Granted, people have done similar things that seem inspired by that paper, but not quite the same.  Either our explication was too ambiguous or people don't have the patience to fully carry out the technique, so they take shortcuts.  Here I'm going to try to make it easier for folks to do the real thing.

So, how does one go about estimating the relationship plotted in the graph above?

Here's the essential idea:  averaging temperatures over time or space can dilute or obscure the effect of extremes.  Still, we need to aggregate, because outcomes are not measured continuously over time and space.  In agriculture, we have annual yields at the county or larger geographic level.  So, there are two essential pieces: (1) estimating the full distribution of temperatures of exposure (crops, people, or whatever) and (2) fitting a curve through the whole distribution.

The first step involves constructing the distribution of weather. This was most of the hard work in that paper, but it has since become easier, in part because finely gridded daily weather is available (see PRISM) and in part because Wolfram has made some STATA code available.  Here I'm going to supplement Wolfram's code with a little bit of R code.  Maybe the other G-FEEDers can chime in and explain how to do this stuff more easily.

First step:  find some daily, gridded weather data.  The finer scale the better.  But keep in mind that data errors can cause serious attenuation bias.  For the lower 48 since 1981, the PRISM data above is very good.  Otherwise, you might have to do your own interpolation between weather stations.  If you do this, you'll want to take some care in dealing with moving weather stations, elevation and microclimatic variations.  Even better, cross-validate interpolation techniques by leaving one weather station out at a time and seeing how well the method works.  Knowing the size of the measurement error can also help correct the resulting bias.  Almost no one does this, probably because it's very time consuming... Again, be careful, as measurement error in weather data creates very serious problems (see here and here).
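
Here's a minimal sketch of that leave-one-out check, with simple inverse-distance weighting standing in for whatever interpolation method you actually use, and 'stations' a hypothetical data frame with lon, lat, and one day's tMax per station:

# Leave-one-station-out cross-validation of an interpolation scheme.
# Planar distance is a simplification; use great-circle distances in practice.
idw_predict <- function(target, others, power = 2) {
  d <- sqrt((others$lon - target$lon)^2 + (others$lat - target$lat)^2)
  w <- 1/d^power
  sum(w*others$tMax)/sum(w)
}

looErrors <- sapply(1:nrow(stations), function(i) {
  idw_predict(stations[i, ], stations[-i, ]) - stations$tMax[i]
})
sqrt(mean(looErrors^2))  # RMSE: a handle on the size of the measurement error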

Second step:  estimate the distribution of temperatures over time and space from the gridded daily weather.  There are a few ways of doing this.  We've typically fit a sine curve between the minimum and maximum temperatures to approximate the time at each degree in each day in each grid, and then aggregate over grids in a county and over all days in the growing season.  Here are a couple R functions to help you do this:

# This function estimates time (in days) when temperature is
# between t0 and t1 using sine curve interpolation.  tMin and
# tMax are vectors of day minimum and maximum temperatures over
# range of interest.  The sum of time in the interval is returned.
# noGrids is number of grids in area aggregated, each of which 
# should have exactly the same number of days in tMin and tMax
 
days.in.range <- function( t0, t1 , tMin, tMax, noGrids )  {
  n <-  length(tMin)
  t0 <-  rep(t0, n)
  t1 <-  rep(t1, n)
  t0[t0 < tMin] <-  tMin[t0 < tMin]
  t1[t1 > tMax] <-  tMax[t1 > tMax]
  u <- function(z, ind) (z[ind] - tMin[ind])/(tMax[ind] - tMin[ind])  
  outside <-  t0 > tMax | t1 < tMin
  inside <-  !outside
  time.at.range <- ( 2/pi )*( asin(u(t1,inside)) - asin(u(t0,inside)) ) 
  return( sum(time.at.range)/noGrids ) 
}

# This function calculates all 1-degree temperature intervals for 
# a given row (fips-year combination).  Note that nested objects
# must be defined in the outer environment.
aFipsYear <- function(z){
  afips    = Trows$fips[z]
  ayear    = Trows$year[z]
  tempDat  = w[ w$fips == afips & w$year==ayear, ]
  Tvect = c()
  for ( k in 1:nT ) Tvect[k] = days.in.range(
              t0   = T[k]-0.5, 
              t1   = T[k]+0.5, 
              tMin = tempDat$tMin, 
              tMax = tempDat$tMax,
              noGrids = length( unique(tempDat$gridNumber) )
              )
  Tvect
}

The first function estimates time in a temperature interval using the sine curve method.  The second function calls the first function, looping through a bunch of 1-degree temperature intervals, defined outside the function.  A nice thing about R is that you can be sloppy and write functions like this that use objects defined outside of the environment. A nice thing about writing the function this way is that it's amenable to easy parallel processing (look up 'foreach' and 'doParallel' packages).

Here are the objects defined outside the second function:

w       # weather data that includes a "fips" county ID, "gridNumber", "tMin" and "tMax".
        #   rows of w span all days, fips, years and grids being aggregated
 
tempDat #  pulls the particular fips/year of w being aggregated.
Trows   # = expand.grid( fips.index, year.index ), rows span the aggregated data set
T       # a vector of integer temperatures.  I'm approximating the distribution with 
        #   the time in each degree in the index T
nT      # = length(T), the number of temperature bins looped over in aFipsYear

To build a dataset, call the second function above for each fips-year in Trows and rbind the results, for example:
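
# Assemble the dataset: one row per fips-year, one column per 1-degree bin.
# Column indices here line up with the TemperatureData[,3:48] used below.
nT <- length(T)
binRows <- lapply(1:nrow(Trows), aFipsYear)   # or parallelize with foreach/doParallel
TemperatureData <- cbind(Trows, do.call(rbind, binRows))
names(TemperatureData)[3:(2 + nT)] <- paste0("bin", T)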

Third step:  To estimate a smooth function through the whole distribution of temperatures, you simply need to choose your functional form, linearize it, and then cross-multiply the design matrix with the temperature distribution.  For example, suppose you want to fit a cubic polynomial and your temperature bins run from 0 to 45 C.  The design matrix would be:

D = [  0     0      0
       1     1      1
       2     4      8
       ...
      45  2025  91125 ]

These days, you might want to do something fancier than a basic polynomial, say a spline. It's up to you.  I really like restricted cubic splines, although they can over-smooth around sharp kinks, which we may have in this case.  We have found piecewise linear works best for predicting out of sample (hence all of our references to degree days).  If you want something really flexible, just make D an identity matrix, which effectively gives a dummy variable for each temperature bin (the step function in the figure).  Whatever you choose, you will have a (T x K) design matrix, with K being the number of parameters in your functional form and T=46 (in this case) temperature bins. 
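
For concreteness, here's what those design matrix choices look like in R (the restricted cubic spline version is built with rcspline.eval in the next snippet):

tBins <- 0:45
D_poly <- cbind(tBins, tBins^2, tBins^3)  # the cubic polynomial D written out above
D_step <- diag(length(tBins))             # identity matrix: one dummy per 1-degree bin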

To get your covariates for your regression, simply cross multiply D by your frequency distribution.  Here's a simple example with restricted cubic splines:


library(Hmisc)
# inclx=TRUE keeps the linear term in the spline basis (rcspline.eval drops
# it by default, which would leave the basis incomplete)
DMat <- rcspline.eval(0:45, inclx=TRUE)
XMat <- as.matrix(TemperatureData[,3:48])%*%DMat
fit <- lm(yield~XMat, data=regData)
summary(fit)

Note that regData has the crop outcomes.  Also note that we generally include other covariates, like total precipitation during the season,  county fixed effects, time trends, etc.  All of that is pretty standard.  I'm leaving that out to focus on the nonlinear temperature bit. 

Anyway, I think this is a cool and fairly simple technique, even if some of the data management can be cumbersome.  I hope more people use it instead of just fitting to shares of days with each maximum or mean temperature, which is what most people following our work tend to do.  

In the end, all of this detail probably doesn't make a huge difference for predictions.  But it can make estimates more precise and confidence intervals tighter.  And I think that precision also helps in pinning down mechanisms.  For example, I think this precision helped us to figure out that VPD and associated drought was a key factor underlying observed effects of extreme heat.

Thursday, January 2, 2014

Massetti et al. - Part 3 of 3: Comparison of Degree Day Measures

Yesterday's blog entry outlined the differences between Massetti et al.'s derivation of degree days and our own.  To quickly recap: our measure shows much less variation within a county over the years, i.e., the standard deviation of fluctuations around the mean outcome in a county is about a third of theirs. One possibility is that our measure over-smoothes the year-to-year fluctuations; alternatively, Massetti et al.'s fluctuations might include measurement error, which would result in attenuation bias (paper).

Below are tests comparing various degree day measures in a panel of log corn and soybean yields. It seems preferable to test the predictive power in a panel setting, as one does not have to worry about omitted variable bias (as mentioned before, Massetti et al. did not share their data with us, and we hence can't match the same controls in a cross-sectional regression of farmland values). We use the optimal degree day bounds from the earlier literature.

The following two tables regress log corn and soybean yields, respectively, for all counties east of the 100 degree meridian (except Florida) in 1979-2011 on four weather variables, state-specific restricted cubic splines with 3 knots, and county fixed effects. Column definitions are the same as in yesterday's post: Columns (1a)-(3b) use the NARR data to derive degree days, while column (4b) uses our 2008 procedure. Columns (a) use the approach of Massetti et al. and derive the climate in a county as the inverse-distance weighted average of the four NARR grids surrounding a county centroid.  Columns (b) calculate degree days for each 2.5x2.5mile PRISM grid within a county (squared inverse-distance weighted average of all NARR grids over the US) and derive the county aggregate as the weighted average of all grids, where the weight is proportional to the cropland area in each grid. 

Columns (0a)-(0b) are added as baseline using a quadratic in growing season average temperature. Columns (1a)-(1b) follow Massetti et al. and first derive average daily temperatures and degree days using daily averages, i.e., degree days are only positive if the daily average exceeds the threshold. Columns (2a)-(2b) calculate degree days for each 3-hour reading. Degree days will be positive if part of the temperature distribution is above the threshold, but not the daily average.  Columns (3a)-(3b) approximate the temperature distribution within a day by linearly interpolating between the 3-hour measures.  Column (4b) uses a sinusoidal approximation between the daily minimum and maximum to approximate the temperature distribution within a day.
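
For readers who want to set up something similar, here is a hedged sketch of the panel specification in R. All variable names are hypothetical; I'm assuming the four weather variables are degree days 8-32C, degree days above the high threshold, and a quadratic in season-total precipitation, with ns() standing in for the restricted cubic spline time trends:

library(splines)
# log yields on weather, county fixed effects, and state-specific spline trends
fit <- lm(logYield ~ gdd + edd + prec + I(prec^2) +
            factor(fips) +                    # county fixed effects
            factor(state):ns(year, df = 3),   # state-specific smooth time trends
          data = panelData)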

Explaining log corn yields 1979-2011.

Explaining log soybean yields 1979-2011.

The R-square is lowest for regressions using a quadratic in average temperature (0.37 for corn and 0.33 for soybeans).  It is slightly higher when we use degree days based on the NARR data set in columns (1a)-(3b), ranging from 0.39-0.41 for corn and 0.35-0.36 for soybeans.  It is much higher when our degree days measure is used in column (4b): 0.51 for corn and 0.48 for soybeans.

The second row in the footer lists the percent reduction in root mean squared error (RMSE) compared to a model with no weather controls (just county fixed effects and state-specific time trends). Weather variables that add nothing would have 0%, while weather measures that explain all remaining variation would reduce the RMSE by 100%.  Column (4b) reduces the RMSE by twice as much as measures derived from NARR. Massetti et al.'s claim that they introduce "accurate measures of degree days" seems very odd given that their measure performs half as well as previously published measures that we shared with them.

The NARR data set likely includes more measurement error than our previous data set. Papers making comparisons between degree days and average temperature should use the best available degree days construction in order not to bias the test against the degree days model.

Correction (January 30th): An earlier version had a mistake in the code by calculating the RMSE both in and out-of-sample. The corrected version only calculates the RMSE out-of-sample.  While the reduction in RMSE increased for all columns, the relative comparison between models is not impacted.

Wednesday, January 1, 2014

Massetti et al. - Part 2 of 3: Calculation of Degree Days

Following up on yesterday's post, let's look at the differences in how to calculate degree days. Recall that degree days just count the number of degrees above a threshold and sum them over the growing season.  Massetti et al. argue in their abstract that "The paper shows that [...] hypotheses of the degree day literature fail when accurate measures of degree days are used." This claim rests on the premise that Massetti et al. use better data and hence get more accurate readings of degree days; however, no empirical evidence for this is provided. They use data from the North American Regional Reanalysis (NARR) that provides temperatures at 3-hour intervals. The authors proceed to first calculate average temperatures for each day from the eight readings per day, and then calculate degree days as the difference between the average temperature and the threshold.

Before we compare their method of calculating degree days to ours, a few words on the NARR data. Reanalysis data combine observational data with differential equations from physical models to interpolate data. For example, they utilize mass and energy balance, i.e., a certain amount of moisture can only fall once as precipitation.  If precipitation comes down in one grid, it can't also come down in a neighboring grid.  On the plus side, the physical models construct an entire series of data (solar radiation, dew point, fluxes, etc) that normal weather stations do not measure.  On the downside, the imposed differential equations that relate all weather measures imply that interpolated data do not always match actual observations.

So how do the degree days in Massetti et al. compare to ours? Here's a little detour on degree days - this is a bit technical and dry, so please be patient.  The first statistical study my coauthors and I published using degree days, in 2006, used monthly temperature data since we did not have daily temperature data at the time.  Since degree days depend on how many times a temperature threshold is passed, monthly averages can be a challenge, as a temporal average will hide how many times a threshold is passed.  The literature has gotten around this problem by estimating an empirical link between the standard deviation in daily and monthly temperatures, called Thom's formula. We used this formula to derive fluctuations in average daily temperatures, and from those, degree days.

The interpolation of the temperature distribution when only knowing monthly averages is certainly not ideal, and we hence went to great lengths to better approximate the temperature distribution. All of my subsequent work with various coauthors hence not only looked at the distribution of daily average temperatures within a month, but went one step further by looking at the temperature distribution within a day.  The rationale is that even if average daily temperatures do not cross a threshold, the daily maximum might.  We interpolated daily maximum and minimum temperature, and fit a sinusoidal curve between the two to approximate the distribution within a day (see Snyder). This is again an interpolation and might have its own pitfalls, but one can empirically test whether it improves predictive power, which we did before and will do again in part 3 of this series.
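
To see what is at stake in these interpolation choices, here is a compact R sketch of degree days above a base temperature computed two ways: from daily averages (in the spirit of the Massetti et al. calculation) and with a single-sine approximation between the daily minimum and maximum (in the spirit of the Snyder reference):

# Degree days above 'base'; tMin and tMax are daily vectors.
dd_daily_avg <- function(tMin, tMax, base) {
  sum(pmax((tMin + tMax)/2 - base, 0))  # positive only if the daily MEAN exceeds base
}

dd_sine <- function(tMin, tMax, base) {
  M <- (tMin + tMax)/2                  # daily mean
  W <- (tMax - tMin)/2                  # daily amplitude
  dd <- numeric(length(tMin))
  above <- tMin >= base                 # whole day above the threshold
  cross <- tMin < base & tMax > base    # threshold crossed within the day
  dd[above] <- M[above] - base
  th <- asin((base - M[cross])/W[cross])
  dd[cross] <- ((M[cross] - base)*(pi/2 - th) + W[cross]*cos(th))/pi
  sum(dd)
}
# dd_sine picks up hot afternoons even on days whose average stays below base.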

Here is my beef with Massetti et al.: our subsequent work in 2008 showed that calculating degree days using the within-day distribution of temperatures is much better.  We even emphasize that in a panel setting average temperatures perform better than degree days derived using Thom's formula (but not in the cross-section, as Thom's approximation works much better at getting the average number of degree days correct than the year-to-year fluctuations around the mean). What I find disingenuous in Massetti et al. is that the paper makes a general statement about comparing degree days to average temperature, yet only discusses the inferior approach of calculating degree days using Thom's formula.  What makes things worse is that we shared our "better" degree days data that use the within-day distribution with them (which they acknowledge).

Unfortunately, Massetti et al. decided not to share their data with us, so the analysis below uses our construction of their variables.  We downloaded surface temperature from NARR.  The reanalysis data provides temperature readings at several altitude levels above ground; in general, the higher the reading above the ground, the lower the temperatures, which will result in lower degree day numbers.

The following table constructs degree days for counties east of the 100 degree meridian in various ways.  Columns (1a)-(3b) use the NARR data, while column (4b) uses our 2008 procedure. Columns (a) use the approach of Massetti et al. and derive the climate in a county as the inverse-distance weighted average of the four NARR grids surrounding a county centroid.  Columns (b) calculate degree days for each 2.5x2.5mile PRISM grid within a county (squared inverse-distance weighted average of all NARR grids over the US) and derive the county aggregate as the weighted average of all grids, where the weight is proportional to the cropland area in each grid. Results don't differ much between (a) and (b).

Columns (1a)-(1b) follow Massetti et al. and first derive average daily temperatures and degree days using daily averages, i.e., degree days are only positive if the daily average exceeds the threshold. Columns (2a)-(2b) calculate degree days for each 3-hour reading. Degree days will be positive if part of the temperature distribution is above the threshold, but not the daily average.  Columns (3a)-(3b) approximate the temperature distribution within a day by linearly interpolating between the 3-hour measures.  Column (4b) uses a sinusoidal approximation between the daily minimum and maximum to approximate the temperature distribution within a day.

Average temperature and average season-total degree days 8-32C in 1979-2011 are fairly consistent between all columns.  We give the mean outcome in a county as well as two standard deviations: the between standard deviation (in round brackets) is the standard deviation in the average outcome between counties, while the within standard deviation [in square brackets] is the average standard deviation of the year-to-year fluctuations around a county mean. The between standard deviation is fairly consistent across columns, but the within-county standard deviation is much lower for our interpolation in column (4b).

As a result of the lower within-county variation, fluctuations are lower and hence the threshold is passed less often in column (4b).  Extreme heat as measured by degree days above 29C or 34C is hence lower when the within-day distribution is used in column (4b) compared to columns (2a)-(3b). There are two possible interpretations: either our data over-smooths and hence under-predicts the variance, or NARR has measurement error, which would lead to attenuation bias.  We will test both possible theories in part 3 tomorrow.

Tuesday, December 31, 2013

Massetti et al. - Part 1 of 3: Convergence in the Effect of Warming on US Agriculture

Emanuele Massetti has posted a new paper (joint with Robert Mendelsohn and Shun Chonabayashi) that takes another look at the best climate predictor of farmland prices in the United States.  He'll present it at the ASSA meetings in Philadelphia - I have seen him present the paper at the 2013 NBER spring EEE meeting and at the 2013 AERE conference, and wanted to provide a few discussion points for people interested in the material.

A short background: several articles by contributors to this blog have found that temperature extremes are crucial for predicting agricultural output. To name a few: Maximilian Auffhammer and coauthors have shown that rice has opposite sensitivities to minimum and maximum temperatures, and that this relationship can differ over the growing season (paper). David Lobell and coauthors found a highly nonlinear relationship between corn yields and temperature using data from field trials in Africa (paper), which is comparable to what Michael Roberts and I have found in the United States (paper).  The same relationship was observed by Marshall Burke and Kyle Emerick when looking at yield trends and climate trends over the last three decades (paper).

Massetti et al. argue that average temperature is a better predictor of farmland values than nonlinear transformations like degree days.  They exclusively rely on cross-sectional regressions (in contrast to the aforementioned panel regressions), re-examining earlier work by Michael Hanemann, Tony Fisher and me, in which we found that degree days are better and more robust predictors of farmland values than average temperature (paper).

Before looking into the differences between the studies, it might be worthwhile to emphasize an important convergence in the sign and magnitude of the predicted effect of a rise in temperature on US agriculture.  There has been an active debate about whether a warmer climate would be beneficial or detrimental. My coauthors and I have usually been on the more pessimistic side, i.e., arguing that warming would be harmful. For example, +2C and +4C increases, respectively, predicted 10.5 and 31.6 percent decreases in farmland values in the cross-section (short-term B1 and long-term B2 scenarios in Table 5), and 14.9 and 35.3 percent decreases in corn yields in the panel regression (Appendix Table A5).

Robert Mendelsohn and various coauthors have consistently found the opposite, and the effects have gotten progressively more positive over time.  For example, their initial innovative AER paper that pioneered the cross-sectional approach in 1994 argued that "[...] our projections suggest that global warming may be slightly beneficial to American agriculture." Their 1999 book added climate variation as an additional control and argued that "Including climate variation suggests that small amount of warming are beneficial," even in the cropland model.  A follow-up paper in 2003 further controls for irrigation and finds that "The beneficial effect of warmer temperatures increases slightly when water availability is included in the model."

Their latest paper finds results that are consistent with our earlier findings, i.e., a +2C warming predicts decreases in farmland values of 20-27 percent (bottom of Table 1), while a +4C warming decreases farmland values by 39-49 percent. These numbers are even more negative than our earlier findings and are rather unaffected by whether average temperatures or degree days are used in the model.  While the authors go on to argue that average temperatures are better than degree days (more on this in future posts), this choice does not change the predicted effect of warming: it is harmful.

Sunday, December 1, 2013

It's not the model. Really it isn't

There is a most lively discussion as to whether climate change will have significant negative impacts on US agriculture. There are a number of papers by my co-bloggers (I am not worthy!) showing that extreme heat days will have significant negative impacts on yields for all major crops except for rice. I will talk about rice another day. For the main crop growing regions in the US, climate models project a significant increase in these extreme heat days. This will likely, short of miraculous adaptation, lead to significant yield losses. To put it simply, this part of the literature has shown a sensitivity of yields to extreme temperatures and linked it with projected increases in these extreme temperature events.

On the other hand, there are a number of papers which argue that climate change will have no significant impacts on US agriculture. Seo, in a recent issue of Climatic Change, essentially argues that the literature projecting big impacts confuses weather ("panel models") and climate ("cross sectional models"), and that using weather instead of climate as a source of identification leads to big impacts. As Wolfram Schlenker and I note in a comment, this is simply not true, for five reasons:

1) Even the very limited number of papers he cites, which use weather as the source of variation to identify a sensitivity, clearly state what this means when interpreting the resulting coefficients. There is no confusion here.

2) He fails to discuss the fact that the bias from adaptation when using weather as a source of variation could go in either direction.

3) It is simply not true that all panel models find big impacts and all Ricardian cross sectional models find small impacts. There are big and small impacts to be found in both camps.

4) There is recent work by Burke and Emerick, which uses the fixed effects identification strategy with climate on the right hand side! I wish I would have thought of that. They can compare their "long differences" (a.k.a. climate) sensitivity results to more traditional weather sensitivity results and find no significant difference between the two. This will either enrage both camps or make them very happy, since it suggests that the difference between sources of variation (weather versus climate) in this setting is not huge. 

5) The big differences across studies may ultimately be due not to differences in sensitivities, but to differences in the climate model used. Burke et al. point out that uncertainty over future climate is a major driver of variation in impacts. We refer the reader to this excellent study, which covers a much broader universe of studies and very carefully discusses the sources of uncertainty in impacts estimates.

We are of the humble opinion that the most carefully done studies using both identification strategies yield similar estimates for the Eastern United States.