
Monday, August 31, 2020

Seminar talk on Spatial First Differences

Hannah Druckenmiller and I wrote a paper on a new method we developed that allows you to do cross-sectional regressions while eliminating a lot of (or all) omitted variables bias. We are really excited about it, and it seems to be performing well in a bunch of cases.

Since people seem to like watching TV more than reading math, here's a recent talk I gave explaining the method to the All New Zealand Economics Series. (Since we don't want to discriminate against the math-reading-folks-who-don't-want-to-read-the-whole-paper, we'll also post a blog post explaining the basics...)
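Roughly, the idea is to compare immediate spatial neighbors: order observations along a spatial transect, difference adjacent observations, and regress differences on differences, so that anything neighbors have in common drops out. Here's a toy sketch of that idea in R (the data frame d and its columns are hypothetical placeholders, not our packaged code):

# order units along a spatial transect (e.g. west-to-east within each row of a grid)
d <- d[order(d$row, d$col), ]
same_row <- c(FALSE, diff(d$row) == 0)   # TRUE when the previous unit is a true neighbor
dy <- c(NA, diff(d$y))                   # first differences of the outcome...
dx <- c(NA, diff(d$x))                   # ...and of the regressor
fit <- lm(dy ~ dx, subset = same_row)    # "differences on differences"
summary(fit)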

If you want to try it out, code is here.

Monday, September 26, 2016

Errors drive conclusions in World Bank post on ivory trade and elephant poaching: Response to Do, Levchenko, & Ma

(Spoiler alert: if you want to check your own skills in sniffing out statistical errors, skip the summary.) 

Summary: 


In their post on the World Bank blog, Dr. Quy-Toan Do, Dr. Andrei Levchenko, and Dr. Lin Ma (DLM for short) make three claims that they suggest undermine our finding that the 2008 one-time sale of ivory corresponded with a discontinuous 66% increase in elephant poaching globally. First, they claim that the discontinuity vanishes for sites reporting a large number of carcasses—i.e., there was only a sudden increase in poaching in 2008 at sites reporting a small number of total carcasses. Second, they claim that price data for ivory (which they do not make public) from TRAFFIC do not show the increase in 2008 that they argue would be expected if the one-time sale had affected ivory markets. Third, they claim that re-assigning a seemingly small proportion of illegal carcasses counted in 2008 to be legal carcasses makes the finding of a discontinuity in 2008 less than statistically significant, and they speculate that a MIKE (CITES) initiative to improve carcass classification may explain the discontinuous increase in poaching in sites with small numbers of carcasses.

In this post, we systematically demonstrate that none of these concerns is valid. First, as it turns out, the discontinuity does hold for sites reporting a large number of carcasses, so long as one does not commit a coding error that systematically omits data where poaching was lower before the one-time sale, as DLM did. Furthermore, we show that DLM misreported methods and excluded results that contradicted their narrative, and that they made other, smaller coding errors in this part of their analysis. Second, we note that, notwithstanding various concerns we have about the ivory price data, our original paper had already derived why an increase in poaching due to the one-time sale would not have a predictable effect (and possibly no effect) on black market ivory prices. Finally, we note that (i) DLM provide no evidence that training on carcass identification could have led to a discontinuous change in PIKE (in fact, the evidence they provide contradicts that hypothesis), and that (ii) in the contrived reclassification exercise modeled by DLM, the likelihood of surveyors making the seven simultaneous errors that DLM describe as very likely is in fact, under generous assumptions, less than 0.35%—i.e., extraordinarily unlikely.

Overall, while DLM motivated an interesting exercise in which we show that our result is robust to classification of sites based on the number of carcasses found, they provided no valid critiques of our analytical approach or results. Their central conclusion, that our results should be dismissed, rests on a sequence of coding, inferential, and logical errors.

Evidence Should be Used in Global Management of Endangered Species: reply to the CITES Technical Advisory Group

In anticipation of the 17th meeting of the Conference of the Parties of the Convention on the International Trade in Endangered Species (CITES), the CITES Technical Advisory Group (TAG) on elephant poaching and ivory smuggling released a document (numbered CoP17 Inf. 42) in which they directly addressed a working paper released by Nitin Sekar and myself.

The document did not introduce new substantive arguments (see end of post for one exception regarding dates)—in fact, the document’s claims were nearly identical to those made in Fiona Underwood’s prior blog posts about our analysis (here and here). In her second post, Dr. Underwood (one of a handful of members of the TAG) openly states that the TAG asked her to evaluate our analysis, and it seems they have simply adopted her view as the official stance. These arguments provide the basis for the TAG's conclusion:
13) The claims in the working paper by Hsiang and Sekar are fundamentally flawed, both in logic and methodology. The MIKE and ETIS TAG is therefore of the view that the study should not be used to inform CITES policy on elephants.
Given that the arguments have not changed, we have previously responded to almost all the document’s claims in our two prior blogposts, here and here. Below we summarize the relevant points and address new points.

As shown below,
  • numerous statements about our analysis (either in the manuscript or follow-up analysis in these prior posts) made by the CITES white paper are factually incorrect;
  • in supporting its narrative, the white paper misreports results and conclusions of other studies;
  • the white paper makes erroneous statistical arguments;
  • when the approach advocated for in the white paper is correctly implemented, it provides results virtually identical to the original findings in our analysis.
Thus, overall, there is no evidence to support the claims in the CITES white paper and therefore no reason for CITES to refuse to consider these results when developing policy.

Tuesday, September 20, 2016

Normality, Aggregation & Weighting: Response to Underwood’s second critique of our elephant poaching analysis

Nitin Sekar and I recently released a paper where we tried to understand if a 2008 legal ivory sale coordinated by CITES and designed to undercut black markets for ivory increased or decreased elephant poaching. Our results suggested the sale abruptly increased poaching:


Dr. Fiona Underwood, a consultant hired by CITES and serving on the Technical Advisory Group of CITES' MIKE and ETIS programs, had previously analyzed the same data in an earlier paper with coauthors and arrived at different conclusions. Dr. Underwood posted criticisms of our analysis on her blog.  We replied to every point raised by Dr. Underwood and posted our replication code (including an Excel replication) here. We also demonstrated that when we analyzed the data published alongside Dr. Underwood's 2011 study, in an effort to reconcile our findings, we essentially recovered our main result that the sale appeared to increase poaching rates. Dr. Underwood has responded to our post with a second post. This post responds to each point raised in Dr. Underwood's most recent critique.

Dr. Underwood's latest post and associated PDF file makes three arguments/criticisms:

1) Dr. Underwood repeats the criticism that our analysis is inappropriate for poaching data because it assumes normality in residuals.  Dr. Underwood's intuition is that the data do not have this structure, so we should instead rely on generalized linear models she proposes that assume alternative error structures (as she did in her 2011 PLoS ONE article). Dr. Underwood's preferred approach assumes that the number of elephants that die at each site in each year is deterministic, and that the fraction poached in each site-year (once the total number marked to die is fixed) is determined by a weighted coin toss in which PIKE reflects the weighting (we discuss our concerns with this model in our last post).

2) Dr. Underwood additionally and specifically advocates for the evaluation of Aggregate PIKE to deal with variation in surveillance and elephant populations (as has been done in many previous reports) rather than average PIKE, as we do. Dr. Underwood argues that this measure better accounts for variation in natural elephant mortality.

3) To account for variation in the number of elephants discovered, Dr. Underwood indicates that we should be running a weighted regression where the weight of each site-year is determined by total mortality, equal to the sum of natural and illegal carcasses discovered in each site-year.

We respond to each of these points individually below. We then point out that these three criticisms are themselves not consistent with one another.  Finally, we note that this debate demonstrates the importance of having multiple approaches to analyzing a data set as critical as PIKE.

Monday, April 25, 2016

A decade of advances in climate econometrics

I have a new working paper out reviewing various methods, models, and assumptions used in the econometrics literature to quantify the impact of climatic conditions on society.  My colleagues here on G-FEED have all made huge contributions, in terms of methodological innovations, and it was really interesting and satisfying to try and tie them all together.  The process of writing this was much more challenging than I expected, but rereading it makes me feel like we as a community really learned a lot during the last decade of research. Here's the abstract:
Climate Econometrics (forthcoming in the Annual Reviews)
Abstract: Identifying the effect of climate on societies is central to understanding historical economic development, designing modern policies that react to climatic events, and managing future global climate change. Here, I review, synthesize, and interpret recent advances in methods used to measure effects of climate on social and economic outcomes. Because weather variation plays a large role in recent progress, I formalize the relationship between climate and weather from an econometric perspective and discuss their use as identifying variation, highlighting tradeoffs between key assumptions in different research designs and deriving conditions when weather variation exactly identifies the effects of climate. I then describe advances in recent years, such as parameterization of climate variables from a social perspective, nonlinear models with spatial and temporal displacement, characterizing uncertainty, measurement of adaptation, cross-study comparison, and use of empirical estimates to project the impact of future climate change. I conclude by discussing remaining methodological challenges.
And here are some highlights:


1. There is a fairly precise tradeoff in assumptions that are implied when using cross-sectional vs. panel data methods. When using cross-sectional techniques, one is making strong assumptions about the comparability of different units (the unit homogeneity assumption). When using panel based within-estimators to understand the effects of climate, this unit homogeneity assumption can be substantially relaxed but we require a new assumption (that I term the marginal treatment comparability assumption) in order to say anything substantive about the climate. This is the assumption that small changes in the distribution of weather events can be used to approximate the effects of small changes in the distribution of expected events (i.e. the climate).  In the paper, I discuss this tradeoff in formal terms and talk through techniques for considering when each assumption might be violated, as well as certain cases where one of the assumptions can be weakened.
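To make the contrast concrete, here is a minimal sketch in R (a hypothetical data frame d with columns y, temp, unit, and year): the cross-sectional regression leans on unit homogeneity, while the panel within-estimator leans on marginal treatment comparability.

# Cross-sectional design: compare long-run averages across units.
# Identification requires that units are comparable (unit homogeneity).
dbar <- aggregate(cbind(y, temp) ~ unit, data = d, FUN = mean)
cs   <- lm(y ~ temp, data = dbar)

# Panel within-estimator: compare each unit to itself over time.
# Unit-level confounders drop out, but interpreting the coefficient as an
# effect of climate requires marginal treatment comparability.
fe   <- lm(y ~ temp + factor(unit) + factor(year), data = d)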


2. The marginal treatment comparability assumption can be directly tested by filtering data at different frequencies and re-estimating panel models (this is the intuition behind the long-differences approach that David and Marshall have worked on). Basically, we can think of all panel data as superpositions of high- and low-frequency data, and we can see if relationships between climate variables and outcomes over time hold at these different frequencies.  If they do, this suggests that slow (e.g. climatic) and fast (e.g. weather) variations have similar marginal effects on outcomes, implying that marginal treatments are comparable.  Here's a figure where I demonstrate this idea by applying it to US maize (since we already know what's going on here).  On the left are time series from an example county, Grand Traverse, when the data is filtered at various frequencies. On the right is the effect of temperature on maize using these different data sets.  The basic relationship recovered with annual data holds at all frequencies, changing substantially only in the true cross-section, which is the equivalent of frequency = 0 (this could be because something happens in terms of adaptation at time scales longer than 33 yrs, or because the unit-homogeneity assumption fails for the cross-section).
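One way to implement this kind of check (a sketch only, assuming a panel data frame d with columns fips, year, yield, and temp, sorted by fips and then year) is to smooth both series within each county using moving averages of increasing length and re-estimate the panel model at each frequency:

# within-county moving average of length k (a simple low-pass filter)
smooth_by <- function(x, g, k) ave(x, g, FUN = function(v)
  as.numeric(stats::filter(v, rep(1/k, k), sides = 2)))

for (k in c(1, 5, 11, 33)) {
  d$yield_f <- smooth_by(d$yield, d$fips, k)
  d$temp_f  <- smooth_by(d$temp,  d$fips, k)
  fit <- lm(yield_f ~ temp_f + factor(fips) + factor(year), data = d)
  cat(sprintf("%2d-yr filter: marginal effect = %.3f\n", k, coef(fit)["temp_f"]))
}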





3. I derive a set of conditions when weather can exactly identify the effect of climate (these are sufficient, but not necessary).  I'm pretty sure this is the first time this has been done (and I'm also pretty sure I'll be taking heat for this over the next several months).  If it is true that
  • the outcome of interest (Y) is the result of some maximization, where we think that Y is the maximized value (e.g. profits or GDP),
  • all adaptations to climate can take on continuous values (e.g. I can choose to change how much irrigation to use by continuous quantities),
  • the outcome of interest is differentiable in these adaptations, 
then by the Envelope Theorem the marginal effect of weather variation on Y is identical to the marginal effect of identically structured changes in climate (that "identically structured" idea is made precise in the paper; Mike discussed this general idea earlier). This means that even though "weather is not equal to climate," we can still have that "the effect of weather is equal to the effect of climate" (exactly satisfying the marginal treatment comparability assumption above). If we're careful about how we integrate these marginal effects, then we can use this result to construct estimated effects of non-marginal changes in climate (invoking the Gradient Theorem), which is often what we are after when we are thinking about future climate changes.
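Schematically (suppressing the regularity conditions spelled out in the paper), if b denotes the vector of adaptations, c the climate, and b*(c) the optimal adaptation, the argument is:

\[
Y(c) = \max_{b}\ \pi(b, c) = \pi\big(b^*(c), c\big)
\;\;\Longrightarrow\;\;
\frac{dY}{dc} = \underbrace{\frac{\partial \pi}{\partial b}\bigg|_{b^*}}_{=\,0}\cdot\frac{db^*}{dc} + \frac{\partial \pi}{\partial c}\bigg|_{b = b^*(c)} = \frac{\partial \pi}{\partial c}\bigg|_{b = b^*(c)}.
\]

The re-optimization term vanishes at the optimum, so the marginal effect of climate on Y equals the direct effect of an identically structured weather shock that holds adaptations fixed.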

Monday, December 21, 2015

From the archives: Friends don't let friends add constants before squaring

I was rooting around in my hard-drive for a review article when I tripped over this old comment that Marshall, Ted and I drafted a while back.

While working on our 2013 climate meta-analysis, we ran across an interesting article by Ole Theisen at PRIO, where he coded up all sorts of violence at a highly local level in Kenya to investigate whether local climatic events, like rainfall and temperature anomalies, appeared to be affecting conflict. Theisen was estimating a model analogous to:

\[
conflict_{it} = \beta_1 T_{it} + \beta_2 T_{it}^2 + \beta_3 P_{it} + \beta_4 P_{it}^2 + \mu_i + \theta_t + \varepsilon_{it}
\]

and reported finding no effect of either temperature or rainfall. I was looking through the replication code of the paper to check the structure of the fixed effects being used when I noticed something: the squared terms for temperature and rainfall were offset by a constant, so that the minimum of the squared terms did not occur at zero:



(Theisen was using standardized temperature and rainfall measures, so they were both centered at zero.) This offset was not apparent in the linear terms of these variables, which got us thinking about whether this matters. Often, when working with linear models, we get used to shifting variables around by a constant, usually out of convenience, and it doesn't matter much. But in nonlinear models, adding a constant incorrectly can be dangerous.

After some scratching of pen on paper, we realized that

\[
\tilde{T}_{it} = T_{it} + C
\]

for the squared term in temperature (C is a constant), which when squared gives:

\[
\tilde{T}_{it}^2 = T_{it}^2 + 2C\,T_{it} + C^2
\]

Because this constant was not added to the linear terms in the model, the actual regression Theisen was running is:

\[
conflict_{it} = \tilde\beta_1 T_{it} + \tilde\beta_2 \tilde{T}_{it}^2 + \dots
= \underbrace{(\tilde\beta_1 + 2C\tilde\beta_2)}_{\beta_1} T_{it} + \underbrace{\tilde\beta_2}_{\beta_2} T_{it}^2 + \tilde\beta_2 C^2 + \dots
\]

(the constant \(\tilde\beta_2 C^2\) is absorbed by the intercept), which can be converted to the earlier intended equation by computing linear combinations of the regression coefficients (as indicated by the underbraces); but directly interpreting the beta-tilde coefficients as the linear and squared effects is not right--except for beta-tilde_2, which is unchanged. Weird, huh? If you add a constant prior to squaring for only the measure that is squared, then the coefficient for that term is fine, but it messes up all the other coefficients in the model.  This didn't seem intuitive to us, which is part of why we drafted up the note.
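To see the algebra in action, here's a toy simulation in R (made-up data, not Theisen's) confirming that the squared coefficient survives the offset while the linear one gets contaminated--and that the correct linear effect can be recovered from the linear combination above:

set.seed(1)
n    <- 1e4
Cst  <- 2                                      # hypothetical constant added before squaring
temp <- rnorm(n)                               # standardized temperature
y    <- 1 + 0.5*temp - 0.3*temp^2 + rnorm(n)   # true beta_1 = 0.5, beta_2 = -0.3

right <- lm(y ~ temp + I(temp^2))              # intended specification
wrong <- lm(y ~ temp + I((temp + Cst)^2))      # constant added before squaring

coef(right); coef(wrong)                       # squared terms agree; linear terms do not
coef(wrong)["temp"] + 2*Cst*coef(wrong)["I((temp + Cst)^2)"]   # recovers ~0.5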

To check this theory, we swapped out the T-tilde-squared measures for the correct T-squared measures and re-estimated the model in Theisen's original analysis. As predicted, the squared coefficients don't change, but the linear effects do:


This matters substantively, since the linear effect of temperature had appeared to be insignificant in the original analysis, leading Theisen to conclude that Marshall and Ted might have drawn incorrect conclusions in their 2009 paper finding that temperature affected conflict in Africa. But just removing the offending constant term revealed a large, positive and significant linear effect of temperature in this new high-resolution data set, agreeing with the earlier work. It turns out that if you compute the correct linear combination of coefficients from Theisen's original regression (the stuff above the brace for beta_1 above), you actually see the correct marginal effect of temperature (and it is significant).

The error was not at all obvious to us originally, and we guess that lots of folks make similar errors without realizing it. In particular, it's easy to show that a similar effect shows up if you estimate interaction effects incorrectly (after all, temperature-squared is just an interaction with itself).

Theisen's construction of this new data set is an important contribution, and when we emailed this point to him he was very gracious in acknowledging the mistake. This comment didn't get seen widely because, when we submitted it to the original journal that published the article, we received an email back from the editor stating that the "Journal of Peace Research does not publish research notes or commentaries."

This holiday season, don't let your friends drink and drive or add constants the wrong way in nonlinear models.

Monday, December 14, 2015

The right way to overfit

As the Heisman Trophy voters showed again, it is super easy to overfit a model. Sure, the SEC is good at playing football. But that doesn’t mean that the best player in their league is *always* the best player in the country. This year I don’t think it was even close.

At the same time, there are still plenty of examples of overfitting in the scientific literature. Even as datasets become larger, this is still easy to do, since models often have more parameters than they used to. Most responsible modelers are pretty careful about presenting out-of-sample errors, but even that can get misleading when cross-validation techniques are used to select models, as opposed to just estimating errors.

Recently I saw a talk here by Trevor Hastie, a colleague at Stanford in statistics, which presented a technique he and Brad Efron have recently started using that seems more immune to overfitting. They call it spraygun, which doesn’t seem too intuitive a description to me. But who am I to question two of the giants of statistics.

Anyhow, a summary figure he presented is below. The x-axis shows the degree of model variance or overfitting, with high values on the left-hand side, and the y-axis shows the error on a test dataset. In this case they're trying to predict beer ratings from over 1M samples. (Statistics students will know beer has always played an important role in statistics, since the origin of the "t-test".) The light red dots show the out-of-sample error for a traditional lasso model fit to the full training data. The dark red dots show models fit to subsets of the data, which unsurprisingly tend to overfit sooner and have worse overall performance. But what's interesting is that the average of the predictions from these overfit models does nearly as well as the model fit to the full data, until the tuning parameter is turned up enough that the full model overfits. At that point, the average of the models fit to subsets of the data continues to perform well, with no notable increase in out-of-sample error. This means one doesn't have to be too careful about optimizing the calibration stage. Instead, just (over)fit a bunch of models and take the average.
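I don't have their code, but the flavor of the idea can be reproduced with an ordinary lasso (glmnet) by averaging test-set predictions from fits to random subsamples and comparing them with a single fit to the full training set. A rough sketch on simulated data, not their implementation:

library(glmnet)
set.seed(1)
n <- 2000; p <- 100
x <- matrix(rnorm(n * p), n, p)
y <- drop(x[, 1:5] %*% rep(1, 5)) + rnorm(n)
train <- 1:1000; test <- 1001:2000
lams  <- exp(seq(log(1), log(0.001), length.out = 50))   # tuning path (overfits at the small end)

full      <- glmnet(x[train, ], y[train], lambda = lams)
pred_full <- predict(full, newx = x[test, ])              # n_test x 50 predictions

B <- 20; pred_avg <- 0
for (b in 1:B) {
  idx      <- sample(train, 200)                          # small random subsample
  fit      <- glmnet(x[idx, ], y[idx], lambda = lams)
  pred_avg <- pred_avg + predict(fit, newx = x[test, ]) / B
}

rmse <- function(pred) sqrt(colMeans((pred - y[test])^2))
plot(log(lams), rmse(pred_full), type = "l", xlab = "log(lambda)", ylab = "test RMSE")
lines(log(lams), rmse(pred_avg), col = 2)                 # average of subsample fits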

This obviously relates to the superior performance of ensembles of process-based models, such as I discussed in a previous post about crop models. Even if individual models aren't very good, because they are overfit to their training data or for other reasons, the average model tends to be quite good. But in the world of empirical models, maybe we have also been too guilty of trying to find the ‘best’ model for a given application. This maybe makes sense if one is really interested in the coefficients of the model, for instance if you are obsessed with the question of causality. But often our interest in models, and even in identifying causality, is that we just want good out-of-sample prediction. And for causality, it is still possible to look at the distribution of parameter estimates across the individual models.

Hopefully for some future posts one of us can test this kind of approach on models we’ve discussed here in the past. For now, I just thought it was worth calling attention to. Chances are that when Trevor or Brad have a new technique, it’s worth paying attention to. Just like it’s worth paying attention to states west of Alabama if you want to see the best college football player in the country.

Tuesday, August 18, 2015

Daily or monthly weather data?

We’ve had a few really hot days here in California. It won’t surprise readers of this blog to know the heat has made Marshall unusually violent and Sol unusually unproductive. They practice what they preach. Apart from that, it’s gotten me thinking back to a common issue in our line of work - getting “good” measures of heat exposure. It’s become quite popular to be as precise as possible in doing this – using daily or even hourly measures of temperature to construct things like ‘extreme degree days’ or ‘killing degree days’ (I don’t really like the latter term, but that’s beside the point for now).

I’m all for precision when it is possible, but the reality is that in many parts of the world we still don’t have good daily measures of temperature, at least not for many locations. But in many cases there are more reliable measures of monthly than daily temperatures. For example, the CRU has gridded time series of monthly average max and min temperature at 0.5 degree resolution.

It seems a common view is that you can’t expect to do too well with these “coarse” temporal aggregates. But I’m going to go out on a limb and say that sometimes you can. Or at least I think the difference has been overblown, probably because many of the comparisons between monthly and daily weather show the latter working much better. But I think it’s overlooked that most comparisons of regressions using monthly and daily measures of heat have not been a fair fight.

What do I mean? On the one hand, you typically have the daily or hourly measures of heat, such as extreme degree days (EDD) or temperature exposure in individual bins of temperature. Then they enter into some fancy pants model that fits a spline or some other flexible function that captures all sorts of nonlinearities and asymmetries. Then on the other hand, for comparison, you have a model with a quadratic response to growing season average temperature. I'm not trying to belittle the fancy approaches (I bin just as much as the next guy), but we should at least give the monthly data a fighting chance. We often restrict it to growing season averages rather than monthly averages, often using average daily temperatures rather than average maximums and minimums, and, most importantly, we often impose symmetry by using a quadratic. Maybe this is just out of habit, or maybe it's the soft bigotry of low expectations for those poor monthly data.

As an example, suppose, as we’ve discussed in various other posts, that the best predictor of corn yields in the U.S. is exposure to very high temperatures during July. In particular, suppose that degree days above 30°C (EDD) is the best. Below I show the correlation of this daily measure for a site in Iowa with various growing season and monthly averages. You can see that average season temperature isn’t so good, but July average is a bit better, and July average daily maximum even better. In other words, if a month has a lot of really hot days, then that month's average daily maximum is likely to be pretty high.



You can also see that the relationship isn't exactly linear. So a model with yields vs. any of these monthly or growing season averages likely wouldn't do as well as EDD if the monthly data entered in as a linear or quadratic response. But as I described in an old post that I'm pretty sure no one has ever read, one can instead define simple asymmetric hinge functions based on monthly temperature and rainfall. In the case of U.S. corn, I suggested these three based on a model fit to simulated data:


This is now what I’d consider more of a fair fight between daily and monthly data. The table below is from what I posted before. It compares the out-of-sample skill of a model using two daily-based measures (GDD and EDD) to a model using the three monthly-based hinge functions above. Both models include county fixed effects and quadratic time trends. In this particular case, the monthly model (3) even works slightly better than the daily model (2). I suspect the fact it’s even better relates less to the temperature terms than to the fact that model (2) uses a quadratic in growing season rainfall, which is probably less appropriate than the more asymmetric hinge function – which says yields increase with rainfall up to 450 mm and are flat afterwards.


Model   Calibration R²   Avg. RMSE (calibration)   Avg. RMSE (out-of-sample, 500 runs)   % reduction in out-of-sample error
1       0.59             0.270                     0.285                                 --
2       0.66             0.241                     0.259                                 8.9
3*      0.68             0.235                     0.254                                 10.7
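For readers who want to see what such a hinge specification looks like in code, here is a generic sketch in R. The variable names and the temperature knot are illustrative placeholders (the rainfall knot follows the 450 mm kink described above), so don't mistake this for the calibrated model behind the table:

hinge_above <- function(x, knot) pmax(x - knot, 0)      # e.g. heat beyond a threshold
hinge_below <- function(x, knot) pmax(knot - x, 0)      # e.g. rainfall shortfall

d$heat <- hinge_above(d$tmax_jul, 30)       # hypothetical July Tmax knot at 30 C
d$dry  <- hinge_below(d$prec_season, 450)   # yields rise with rain up to 450 mm, flat after
fit <- lm(yield ~ heat + dry + factor(fips) + poly(year, 2), data = d)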


Overall, the point is that monthly data may not be so much worse than daily for many applications. I’m sure we can find some examples where it is, but in many important examples it won’t be. I think this is good news given how often we can’t get good daily data. Of course, there’s a chance the heat is making me crazy and I’m wrong about all this. Hopefully at least I've provoked the others to post some counter-examples. There's nothing like a good old fashioned conflict on a hot day.


Saturday, January 10, 2015

Searching for critical thresholds in temperature effects: some R code



If google scholar is any guide, my 2009 paper with Wolfram Schlenker on the nonlinear effects of temperature on crop outcomes has had more impact than anything else I've been involved with.

A funny thing about that paper: Many reference it, and often claim that they are using techniques that follow that paper.  But in the end, as far as I can tell, very few seem to actually have read through the finer details of that paper or try to implement the techniques in other settings.  Granted, people have done similar things that seem inspired by that paper, but not quite the same.  Either our explication was too ambiguous or people don't have the patience to fully carry out the technique, so they take shortcuts.  Here I'm going to try to make it easier for folks to do the real thing.

So, how does one go about estimating the relationship plotted in the graph above?

Here's the essential idea:  averaging temperatures over time or space can dilute or obscure the effect of extremes.  Still, we need to aggregate, because outcomes are not measured continuously over time and space.  In agriculture, we have annual yields at the county or larger geographic level.  So, there are two essential pieces: (1) estimating the full distribution of temperatures of exposure (crops, people, or whatever) and (2) fitting a curve through the whole distribution.

The first step involves constructing the distribution of weather. This was most of the hard work in that paper, but it has since become easier, in part because finely gridded daily weather is available (see PRISM) and in part because Wolfram has made some STATA code available.  Here I'm going to supplement Wolfram's code with a little bit of R code.  Maybe the other G-FEEDers can chime in and explain how to do this stuff more easily.

First step:  find some daily, gridded weather data.  The finer the scale, the better.  But keep in mind that data errors can cause serious attenuation bias.  For the lower 48 since 1981, the PRISM data above is very good.  Otherwise, you might have to do your own interpolation between weather stations.  If you do this, you'll want to take some care in dealing with moving weather stations, elevation and microclimatic variations.  Even better, cross-validate interpolation techniques by leaving one weather station out at a time and seeing how well the method works. Knowing the size of the measurement error can also help in correcting bias.  Almost no one does this, probably because it's very time consuming... Again, be careful, as measurement error in weather data creates very serious problems (see here and here).

Second step:  estimate the distribution of temperatures over time and space from the gridded daily weather.  There are a few ways of doing this.  We've typically fit a sine curve between the minimum and maximum temperatures to approximate the time at each degree in each day in each grid, and then aggregate over grids in a county and over all days in the growing season.  Here are a couple R functions to help you do this:

# This function estimates time (in days) when temperature is
# between t0 and t1 using sine curve interpolation.  tMin and
# tMax are vectors of day minimum and maximum temperatures over
# range of interest.  The sum of time in the interval is returned.
# noGrids is number of grids in area aggregated, each of which 
# should have exactly the same number of days in tMin and tMax
 
days.in.range <- function( t0, t1 , tMin, tMax, noGrids )  {
  n <-  length(tMin)
  t0 <-  rep(t0, n)
  t1 <-  rep(t1, n)
  t0[t0 < tMin] <-  tMin[t0 < tMin]
  t1[t1 > tMax] <-  tMax[t1 > tMax]
  u <- function(z, ind) (z[ind] - tMin[ind])/(tMax[ind] - tMin[ind])  
  outside <-  t0 > tMax | t1 < tMin
  inside <-  !outside
  time.at.range <- ( 2/pi )*( asin(u(t1,inside)) - asin(u(t0,inside)) ) 
  return( sum(time.at.range)/noGrids ) 
}

# This function calculates all 1-degree temperature intervals for 
# a given row (fips-year combination).  Note that nested objects
# must be defined in the outer environment.
aFipsYear <- function(z){
  afips    = Trows$fips[z]
  ayear    = Trows$year[z]
  tempDat  = w[ w$fips == afips & w$year==ayear, ]
  Tvect = c()
  for ( k in 1:nT ) Tvect[k] = days.in.range(
              t0   = T[k]-0.5, 
              t1   = T[k]+0.5, 
              tMin = tempDat$tMin, 
              tMax = tempDat$tMax,
              noGrids = length( unique(tempDat$gridNumber) )
              )
  Tvect
}

The first function estimates time in a temperature interval using the sine curve method.  The second function calls the first function, looping through a bunch of 1-degree temperature intervals defined outside the function.  A nice thing about R is that you can be sloppy and write functions like this that use objects defined outside the function's own environment. A nice thing about writing the function this way is that it's amenable to easy parallel processing (look up the 'foreach' and 'doParallel' packages).

Here are the objects defined outside the second function:

w       # weather data that includes a "fips" county ID, "gridNumber", "tMin" and "tMax".
        #   rows of w span all days, fips, years and grids being aggregated
 
tempDat #  pulls the particular fips/year of w being aggregated.
Trows   # = expand.grid( fips.index, year.index ), rows span the aggregated data set
T       # a vector of integer temperatures.  I'm approximating the distribution with 
        #   the time in each degree in the index T

To build a dataset call the second function above for each fips-year in Trows and rbind the results.
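Concretely, that looks something like the sketch below (nT just needs to match the length of T; with T = 0:45 this produces the 46 bin columns used in the spline step that follows):

nT <- length(T)                                  # number of 1-degree bins
binList <- lapply(1:nrow(Trows), aFipsYear)      # or foreach() %dopar% to parallelize
TemperatureData <- data.frame(Trows, do.call(rbind, binList))
# columns 1:2 are fips and year; columns 3:48 hold the 46 temperature-bin exposures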

Third step:  To estimate a smooth function through the whole distribution of temperatures, you simply need to choose your functional form, linearize it, and then cross-multiply the design matrix with the temperature distribution.  For example, suppose you want to fit a cubic polynomial and your temperature bins run from 0 to 45 C.  The design matrix would be:

D = [  0     0      0
       1     1      1
       2     4      8
      ...   ...    ...
      45  2025  91125 ]

These days, you might want to do something fancier than a basic polynomial, say a spline. It's up to you.  I really like restricted cubic splines, although they can oversmooth around sharp kinks, which we may have in this case. We have found piecewise linear works best for predicting out of sample (hence all of our references to degree days).  If you want something really flexible, just make D an identity matrix, which effectively becomes a dummy variable for each temperature bin (the step function in the figure).  Whatever you choose, you will have a (T x K) design matrix, with K being the number of parameters in your functional form and T=46 (in this case) temperature bins. 

To get your covariates for your regression, simply cross multiply D by your frequency distribution.  Here's a simple example with restricted cubic splines:


library(Hmisc)
DMat <- rcspline.eval(0:45)
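# (note: by default rcspline.eval() returns only the nonlinear spline basis terms;
#  pass inclx=TRUE if you also want the linear term included in DMat)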
XMat <- as.matrix(TemperatureData[,3:48])%*%DMat
fit <- lm(yield~XMat, data=regData)
summary(fit)

Note that regData has the crop outcomes.  Also note that we generally include other covariates, like total precipitation during the season,  county fixed effects, time trends, etc.  All of that is pretty standard.  I'm leaving that out to focus on the nonlinear temperature bit. 

Anyway, I think this is a cool and fairly simple technique, even if some of the data management can be cumbersome.  I hope more people use it instead of just fitting to shares of days with each maximum or mean temperature, which is what most people following our work tend to do.  

In the end, all of this detail probably doesn't make a huge difference for predictions.  But it can make estimates more precise and confidence intervals tighter.  And I think that precision also helps in pinning down mechanisms.  For example, I think this precision helped us to figure out that VPD and associated drought was a key factor underlying observed effects of extreme heat.

Wednesday, October 29, 2014

Fanning the flames, unnecessarily


This post is a convolution of David's post earlier today and Sol's from a few days ago.  Our debate with Buhaug et al that Sol blogged about has dragged on for a while now, and engendered a range of press coverage, the most recent by a news reporter at Science.  "A debate among scientists over climate change and conflict has turned ugly", the article begins, and goes on to catalog the complaints on either side while doing little to engage with the content.

Perhaps it should be no surprise that press outlets prefer to highlight the mudslinging, but this sort of coverage is not really helpful.  And what was lost in this particular coverage were the many things I think we've actually learned in the protracted debate with Buhaug.

We've been having an ongoing email dialog with ur-blogger and statistician Andrew Gelman, who often takes it upon himself to clarify or adjudicate these sorts of public statistical debates, which is a real public service.  Gelman writes in an email:
In short, one might say that you and Buhaug are disagreeing on who has the burden of proof.  From your perspective, you did a reasonable analysis which holds up under reasonable perturbations and you feel it should stand, unless a critic can show that any proposed alternative data inclusion or data analytic choices make a real difference.  From their perspective, you did an analysis with a lot of questionable choices and it’s not worth taking your analysis seriously until all these specifics are resolved.
I'm sympathetic with this summary, and am actually quite sympathetic to Buhaug and colleagues' concern about our variable selection in our original Science article.  Researchers have a lot of choice over how and where to focus their analysis, which is a particular issue in our meta-analysis since there are multiple climate variables to choose from and multiple ways to operationalize them.  Therefore it could be that our original effort to bring each researcher's "preferred" specification into our meta-analysis might have doubly amplified any publication bias -- with researchers of the individual studies we reviewed emphasizing the few significant results, and Sol, Ted, and I picking the most significant one out of those.  Or perhaps the other researchers are not to blame and the problem could have just been with Sol, Ted, and my choices about what to focus on.

Monday, October 27, 2014

One effect to rule them all? Our reply to Buhaug et al's climate and conflict commentary

For many years there has been a heated debate about the empirical link between climate and conflict. A year ago, Marshall, Ted Miguel and I published a paper in Science where we reviewed existing quantitative research on this question, reanalyzed numerous studies, synthesized results into generalizable concepts, and conducted a meta-analysis of parameter estimates (watch Ted's TED talk).  At the time, many researchers laid out criticisms in the press and blogosphere, which Marshall fielded through G-FEED. In March, Halvard Buhaug posted a comment signed by 26 authors on his website strongly critiquing our analysis, essentially claiming that they had overturned our analysis by replicating it using an unbiased selection of studies and variables. At the time, I explained numerous errors in the comment here on G-FEED.

The comment by Buhaug et al. was published today in Climatic Change as a commentary (version here), essentially unchanged from the version posted earlier with none of the errors I pointed out addressed. 

You can read our reply to Buhaug et al. here. If you don't want to bother with lengthy details, our abstract is short and direct:
Abstract: A comment by Buhaug et al. attributes disagreement between our recent analyses and their review articles to biased decisions in our meta-analysis and a difference of opinion regarding statistical approaches. The claim is false. Buhaug et al.’s alteration of our meta-analysis misrepresents findings in the literature, makes statistical errors, misclassifies multiple studies, makes coding errors, and suppresses the display of results that are consistent with our original analysis. We correct these mistakes and obtain findings in line with our original results, even when we use the study selection criteria proposed by Buhaug et al. We conclude that there is no evidence in the data supporting the claims raised in Buhaug et al.

Friday, March 21, 2014

When evidence does not suffice

Halvard Buhaug and numerous coauthors have released a comment titled “One effect to rule them all? A comment on climate and conflict” which critiques research on climate and human conflict that I published in Science and Climatic Change with my coauthors Marshall Burke and Edward Miguel

The comment does not address the actual content of our papers.  Instead it states that our papers say things they do not say (or that our papers do not say things they actually do say) and then uses those inaccurate claims as evidence that our work is erroneous.

Below the fold is my reaction to the comment, written as the referee report that I would write if I were asked to referee the comment.

(This is not the first time Buhaug and I have disagreed on what constitutes evidence. Kyle Meng and I recently published a paper in PNAS demonstrating that Buhaug’s 2010 critique of an earlier paper made aggressive claims that the earlier paper was wrong without actually providing evidence to support those claims.)

Friday, August 2, 2013

A climate for conflict


Sol, Ted Miguel, and I are excited to have a paper we've been working on for a while finally come out. The title is "Quantifying the impact of climate on human conflict", and it's out in this week's issue of Science.  Seems like this should get Sol's tenure package at Berkeley off to a pretty nice start.  

Our goal in the paper is rather modest, and it's all in the title:  we collect estimates from the rapidly growing number of studies that look at the relationship between some climate variable and some conflict outcome, make sure they are being generated by a common and acceptable methodology (one that can credibly estimate causal effects), and then see how similar these estimates are.  We want to understand whether all these studies spread across different disciplines are actually telling us very different things, as the ongoing debate on climate and conflict might seem to suggest.


We compute things in terms of standardized effects -- percent change in conflict per standard deviation change in climate -- and find that, by and large, estimates across the studies are not all that different.  All the exciting/tedious details are in the paper and the supplement (here's a link to the paper for those without access, and here is a link to all the replication data and code).  We've also gotten some nice (and not so nice) press today that discusses different aspects of what we've done.  See here, here, here, here, here, here, here, here, here, and here for a sampler.  Most importantly, see here.

We want to use this space to answer some FAQ about the study.  Or, really, some FHC:  Frequently Heard Criticisms.  A bunch of these have popped up already in the press coverage.  Some are quite reasonable, some (we believe) stem from a misunderstanding or misrepresentation of what we're trying to do in the paper, and some are patently false. 


So, in no particular order, some FHC we've gotten (in bold), and our responses.  We include direct quotes from our critics where possible.  Apologies for the lengthy post, but we are trying to get it all out there. 

UPDATE (Friday Aug 2, 4pm): We have now heard from 3 different folks - Andy Solow, Idean Salehyan, and Cullen Hendrix - who each noted that what they were quoted as saying in these articles wasn't the full story, and that they had said lots of nice things too.  After talking to them, it seems there is much more agreement on many of these issues than the press articles would have you believe, and in many cases it looks like the journalists went far out of their way to highlight points of disagreement. I guess this shouldn't be surprising, but we believe it does the debate a pretty serious disservice.  So our responses below should be interpreted as responses to quotes as they appeared, not necessarily responses to particular individuals' viewpoints on these issues, which the quotes might not adequately represent.  We are not trying to pick fights, just to clarify what our paper does and doesn't do.

1.  You confuse weather and climate. 
(verbatim from Richard Tol, courtesy Google Translate)

This is an old saw that you always get with these sorts of papers.  The implied concern is that most of the historical relationships are estimates using short-run variation in temperature and precip (typically termed "weather"), and then these are used to say something about future, longer-run changes in the same variables ("climate").  So the worry is that people might respond to future changes in climate -- and in particular, slower-moving changes in average temperature or precip -- differently than they have to past short-run variation.

This is a very sensible concern.  However, there are a couple reasons why we think our paper is okay on this front. First, we document in the paper that the relationship between climate variables and conflict shows up at a variety of time scales, from hourly changes in temperature to century-scale changes in temperature and rainfall.  We use the word "climate" in the paper to refer to this range of fluctuations, and we find similar responses across this range, which provides some evidence that today's societies are not that much better at dealing with long-run changes than short-run changes.  This is consistent with evidence on climate effects in other related domains (e.g. agriculture).  

Second, we do not make explicit projections about future impacts - our focus is on the similarity in historical responses.  Nevertheless, the reader is definitely given the tools to do their own back-of-the-envelope projections:  e.g. we provide effects in terms of standard deviations of temperature/rainfall, and then provide a map of the change in temperature by 2050 in terms of standard deviations (which are really really large!).  That way, the reader can assume whatever they want about how historical responses map into future responses.  If you think people will respond in the future just like they did in the past, it's easy multiplication.  If you think they'll only be half as sensitive, multiply the effect size by 0.5, etc etc.  People can adopt whatever view they like on how future long-run responses might differ from past short-run responses; our paper does not take a stand on that. 



2. Your criterion for study inclusion are inconsistently applied, and you throw out studies that disagree with your main finding.
(Paraphrasing Halvard Buhaug and Jürgen Scheffran)

This one is just not true.  We define an inclusion criterion -- which mainly boils down to studies using standard techniques to account for unobserved factors that could be correlated with both climate and conflict --  and include every study that we could find that meets this criterion. In more technical terms, these are panel or longitudinal studies that include fixed effects to account for time-invariant or time-varying omitted variables.

In a few cases, there were multiple studies that analyzed the exact same dataset and outcomes, and in those cases we included either the study that did it first (if the studies were indistinguishable in their methods), or the study that did the analysis correctly (if, as was sometimes the case, one study met the inclusion criteria and another did not).  

So, the inclusion criterion had nothing to do with the findings of the study, and in the paper we highlight estimates from multiple studies that do not appear to agree with the main findings of the meta-analysis (see Figure 5). We did throw out a lot of studies, and all of these were studies that could not reliably identify causal relationships between climate and conflict -- for instance if they only relied on cross-sectional variation.  We also threw out multiple studies that agreed very strongly with our main findings!  We provide very detailed information on the studies that we did and did not include in Section A of the SOM.


Scheffran refers to other recent reviews, and complains that our paper does not include some papers in those reviews.  This is true, for the methodological reasons just stated.  But what he does not mention is that none of these reviews even attempts to define an inclusion criterion, most of them review a very small fraction of the number of papers we do, and none makes any attempt to quantitatively compare results across papers.  This is why our study is a contribution, and presumably why Science was interested in publishing it as a Research Article.

3. You lump together many different types of conflict that shouldn't be lumped together.  Corollary: There is no way that all these types of conflict have the same causal mechanism.  
(Idean Salehyan: "It’s hard to see how the same causal mechanism that would lead to wild pitches would be linked to war and state collapse")

This quote is a very substantial misrepresentation of what we do.  First, nowhere do we make the claim that these types of conflict have the same causal mechanism.  In fact, we go to great lengths to state that climate affects many different things that might in turn affect conflict, and the fact that the effect sizes are broadly similar across different types of conflict could be explained by climate's pervasive effect on many different potential intervening variables (economic conditions, food prices, institutional factors, ease of transportation, etc). 

Second, we take great pains to separate out conflicts into different categories, and only make comparisons within each category.  So we calculate effect sizes separately for individual-level conflict (things like assault and murder), and group level conflict (things like civil war).  So, again contra Idean's quote, we are not actually comparing baseball violence (which we term individual-level) with war and state collapse (which are group conflicts).  Read the paper, Idean! 

But the whole point of the paper is to ask whether effect sizes across these different types of conflict are in fact similar!  What we had in the literature was a scattering of studies across disciplines, often looking at different types of conflict and using different methodologies.  This disarray had led to understandable confusion about what the literature as a whole was telling us.  Our goal was to put all the studies on the same footing and ask, are these different studies actually telling us that different types of conflict in different settings respond very differently to climate?  Our basic finding is that there is much more similarity across studies than what is typically acknowledged in the debate.  

Whether this similarity is being driven by a common underlying mechanism, or by multiple different mechanisms acting at the same time, is something we do not know the answer to -- and what we highlight very explicitly as THE research priority going forward.  

4. You cherry pick the climate variables that you report
(paraphrasing Halvard Buhaug).

We try really hard not to do this.  Where possible, we focus on the climate variable that the authors focused on in the original study.  However, the authors themselves in these studies are almost completely unrestricted in how they want to parameterize climate.  You can run a linear model, a quadratic model, you can include multiple lags, you can create binary measures, you can create fancy drought measures that combine temperature and rainfall, etc etc.  Authors do all these things, and often do many of them in the same paper.  Since we can't include all estimates from every single paper, we try to pick out the author's preferred measure or estimate, and report that one.  In the cases where authors tested many different permutations and did not hint at their "preferred" estimate (e.g. in Buhaug's comments on our earlier PNAS paper), we pick the median estimate across all the reported estimates.  Section A2 in the SOM provides extra detail on all of these cases.


5. This paper is not based on any theory of conflict, so we learn nothing.
This is very related to FHC #3 above, and we get this from the political science crew a lot.  The thing is, there are a ton of different theories on the causes of conflict, and empirical work so far has not done a great job of sorting them out.  In some sense, we are being atheoretical in the paper -- we just want to understand whether the different estimates are telling us very different things.  As noted above, though, the fact that they're generally not would seem to be very important to people interested in theory! 

Claire Adida, a political science prof at UCSD (and @ClaireAdida on twitter), put it really nicely in an email:  "I don't understand this ostrich politics. How about saying something like 'we really want to thank these economists for doing a ton of work to show how confident we can be that this relationship exists. It's now our turn - as political scientists - to figure out what might be the causal mechanisms underlying this relationship.' " (Btw, I have no idea what "ostrich politics" means, but I really like it!) 


6. People will adapt, so your results are a poor guide for impacts under climate change. 
(paraphrasing Jürgen Scheffran, courtesy Google Translate; Cullen Hendrix: "I'm optimistic.  Unlike glaciers, humans have remarkable adaptive capacity"; as well as audience members in every seminar we've ever given on this topic).

This is very related to FHC #1 above. It is definitely possible that future societies will become much better at dealing with extreme heat and erratic rainfall.  However, to just assume that this is the case seems to us a dangerous misreading of the existing evidence.  As stated above, available evidence suggests that human societies are remarkably bad at dealing with deviations from average climate, be it a short-lived temperature increase or a long-term one.  See here and here for other evidence on this topic.

And it has to be the case that knowing something about how yesterday's and today's world respond to climate tells us more about future impacts than knowing nothing about how societies have responded to climate.  The alternative - that the present world tells us nothing about the future world - just does not appear consistent with how nearly anybody sees things. 


7.  A lot of your examples - e.g. about civilization collapse - do not pertain to the modern world.

We got this in peer review. It's true that a lot of these collapse stories are from way back.  The Akkadian empire collapsed before 2000 BC, after all!  In a companion paper, forthcoming in Climatic Change, Sol and I look a little more carefully at this, and it actually turns out that per capita incomes in many of these societies, pre-collapse, were remarkably similar to incomes in many developing countries today.  To the extent that economic conditions shape conflict outcomes -- a common belief in economics and political science -- then this provides at least some evidence that these historical events are not completely irrelevant to today. 

More basically, though, it seems like hubris to just assume that "this time things are different".  At the time of their collapse, each of these societies (the Akkadians, the Maya, some of the Chinese dynasties, Angkor Wat) were incredibly advanced by global standards, and they probably also did not figure that climate would play a role in their demise.  Because we don't yet have a firm grasp on why climate affects conflict, it again seems dangerous to assume that things are completely different today -- just as it seems dangerous to conclude that modern societies are going to be completely destroyed by climate change, a claim we make nowhere in the paper. 

However, we do hope that "this time is different"!  It would be quite nice if the Mayan and Angkor Wat examples did not, in fact, pertain to the modern world. 



8. You can't claim that there is an overall significant relationship between climate and conflict if many of the studies you analyze do not show a statistically significant effect. 
(Halvard Buhaug: "I struggle to see how the authors can claim a remarkable convergence of quantitative evidence when one-third of their civil conflict models produce a climate effect statistically indistinguishable from zero, and several other models disagree on the direction of a possible climate effect")

This is a basic misunderstanding of what a meta-analysis does.  The beauty of a meta-analysis is that, by pooling a bunch of different studies, you can dramatically increase statistical power by increasing your sample size. It's even possible to find a statistically significant result across many small studies even if no individual study found a significant result.  This happens in medical meta-analyses all the time, and is why they are so popular in that setting:  each individual study of some expensive drug or procedure often only includes a few individuals, and only by combining across studies do you have enough statistical power to figure out what's going on.
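For intuition, the textbook inverse-variance (fixed-effect) pooling of study estimates -- not necessarily the exact estimator we use in the paper -- looks like:

\[
\hat\beta_{\text{pooled}} = \frac{\sum_i w_i \hat\beta_i}{\sum_i w_i},
\qquad w_i = \frac{1}{s_i^2},
\qquad \text{se}\big(\hat\beta_{\text{pooled}}\big) = \Big(\sum_i w_i\Big)^{-1/2},
\]

where \(\hat\beta_i\) and \(s_i\) are each study's estimate and standard error. The pooled standard error shrinks as studies are added, even when every individual \(s_i\) is too large to reject zero on its own.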

So the fact that some of the individual studies were statistically significant, and others were not, does not necessarily affect the conclusions you draw when you average across studies.  In our case, it did not:  the mean across studies can be estimated very precisely, as we show in Figures 4 and 5, and discuss in detail in the SOM.


A final point:  we find a striking consistency in findings in the studies that look at temperature in particular.  Of the 27 modern studies that looked at a relationship between temperature and conflict, all 27 estimated a positive coefficient.  This is extremely unlikely to happen by chance - i.e. very unlikely to happen if there were in fact no underlying relationship between temperature and conflict.  Think of flipping a coin 27 times and getting heads all 27 times.  The chance of that is less than 1 in a million.  This is not a perfect analogy -- coin flips of a fair coin are independent, our studies are not fully independent (e.g. many studies share some of the same data) -- but we show on page 19 in the SOM that even if you assume a very strong dependence across studies, our results are still strongly statistically significant. 


9. Conflict has gone down across the world, as temperatures have risen. This undermines the claims about a positive relationship between temperature and conflict.

(Idean Salehyan: "We've seen rising temperatures, but there's actually been a decline in armed conflict".)

There are a couple of things wrong with this one. First, many types of conflict that we look at have not declined at all over time.  Here is a plot of civil conflicts and civil wars since 1960 from the PRIO data, summed across the world.  As coded in these data, civil conflicts are conflicts that result in at least 25 battle deaths (light gray in the plot), and civil wars are those that result in at least 1000 deaths (dark gray).  As you can see, both large wars and smaller conflicts peaked in the mid-1990s, and while the incidence of larger wars has fallen somewhat, the incidence of smaller conflicts is currently almost back up to its 1990s peak.  These types of conflicts are examined by many of the papers we study, and they have not declined.   




As another check on this, I downloaded the latest version of the Social Conflict in Africa Dataset, a really nice dataset that Idean himself was instrumental in assembling.  This dataset tracks the incidence of protests, riots, strikes, and other social disturbances in Africa.  Below is a plot of event counts over time in these data.  Again, you'd be very hard pressed to say that this type of conflict has declined either.  So I just don't understand this comment.
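
For anyone who wants to reproduce this kind of figure, the tally itself is trivial. The sketch below assumes a hypothetical CSV export of an event dataset with a 'year' column (check the SCAD or PRIO codebooks for the actual file and variable names); it is not the exact code behind the plots shown here.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names -- consult the dataset codebook.
events = pd.read_csv("scad_events.csv")
counts = events.groupby("year").size()

counts.plot(kind="bar")
plt.ylabel("number of recorded events")
plt.title("Conflict events per year (illustrative)")
plt.tight_layout()
plt.show()
```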





Second, and more importantly, there are about a bazillion other things that are also trending over this period.  The popularity of the band New Kids On The Block has also fallen fairly substantially since the 1990s, but no one is attributing changes in conflict to changes in NKOTB popularity (although maybe this isn't implausible).  The point is that identifying causal effects from these trends is just about impossible, since so many things are trending over time.  

Our study instead focuses on papers that use detrended data -- i.e. those that use variation in climate over time within a particular place.  These papers, for instance, compare what happens to conflict in a hot year in a given country to what happens in a cooler year in that same country, after accounting for any generic trends in both climate and conflict that might be in the data.  Done this way, you are very unlikely to erroneously attribute to climate changes in conflict that are actually driven by other trending factors. 
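
To make "detrended, within-place variation" concrete, here is a generic sketch of the kind of fixed-effects specification many of these papers use. The data file and variable names are placeholders, and no single study in our sample runs exactly this regression; it just shows where the identifying variation comes from.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical country-year panel with columns 'conflict', 'temp',
# 'country', and 'year'.
df = pd.read_csv("panel.csv")

# Country fixed effects absorb all time-invariant differences across
# countries; country-specific linear time trends soak up smooth trends in
# both climate and conflict. What remains is whether a country's unusually
# hot years are also its unusually violent years.
model = smf.ols("conflict ~ temp + C(country) + C(country):year", data=df)
res = model.fit(cov_type="cluster",
                cov_kwds={"groups": df["country"].astype("category").cat.codes})

print(res.params["temp"], res.bse["temp"])
```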



10. You don't provide specific examples of conflicts that were caused by climate.  

(Halvard Buhaug: "Surprisingly, the authors provide no examples of real conflicts that plausibly were affected by climate extremes that could serve to validate their conclusion. For these and other reasons, this study fails to provide new insight into how and under what conditions climate might affect violent conflict")

I do not understand this statement.  We review studies that look at civil conflict in Somalia, studies that look at land invasions in Brazil, studies that look at domestic violence in one city in one year in Australia, studies that look at ethnic violence in India, studies that look at murder in a small set of villages in Tanzania.  The historical studies looking at civilization collapses in particular try to match single events to large contemporaneous shifts in climate.  We highlight these examples in both the paper and in the press materials that we released, and they were included in nearly every news piece on our work that we have seen.  So, again, this comment just does not make sense. 

Perhaps implicit in this claim is the belief that we are climate determinists.  But as we say explicitly in the paper, we are not arguing that climate is the only factor that affects conflict, nor even that it is the most important factor affecting conflict.  Our contribution is to quantify its role across a whole host of settings, and we hope our findings will help motivate a bunch more research on why climate shapes conflict so dramatically (see Claire's quote above).


11.  You are data mining. Corollary: What you guys are demonstrating is a severe publication bias problem -- only studies that show a certain result get published.

(Andy Solow: "In the aggregate, if you work the data very hard, you do find relationships like this. But when you take a closer look, things tend to be more complicated." As an aside, Andy sent us a very nice email, noting in reference to the press coverage of our article: "From what I've seen so far, all the nice things I said - that you are smart, serious researchers working on an important and difficult problem, that your paper will contribute to the discussion, that you may well be right - have been lost in favor of concerns I have and that, as I took pains to point out, you are already aware of.")

This is related to FHC #2 and #4 above. We have defined a clear inclusion criterion, and we only include studies that meet this criterion.  As detailed in SOM Section A2, we do not include a number of studies that agree very strongly with our main findings -- for instance Melissa Dell's very nice paper on the Mexican Revolution.  Again, our inclusion criterion is based on methodology, not findings. 


The publication bias issue is a tougher one, and one we explicitly address in the paper -- it even gets its own section, so it's not something we're trying to hide from.  We test formally for it in the SOM (Section C) and find little evidence that publication bias is driving our results.  We also note that it's not clear where the professional incentives now lie in terms of the sorts of results that are likely to get published or noticed.  The handful of climate/conflict skeptics have garnered a lot of press by very publicly disagreeing with our findings, and this has presumably been good for their careers.  Had they instead published papers that agreed with our findings, it's likely that the press would not have turned to them as their go-to sources this time around.  Similarly, journals are probably becoming less interested in publishing yet another paper showing that higher temperatures lead to more conflict.  And because so many of the papers we review are quite recent (the median publication date across our studies was 2011), we think it is unlikely that all of these results are false positives.
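
For readers who want to see what a formal publication bias check can look like in practice, here is one common diagnostic, an Egger-style funnel-asymmetry regression: regress each study's estimate on its standard error, weighting by precision. A slope near zero is consistent with no small-study publication filter. The numbers below are the same made-up placeholders as in the pooling sketch above, and this is not necessarily the specific test implemented in our SOM Section C.

```python
import numpy as np
import statsmodels.api as sm

# Placeholder study-level estimates and standard errors.
effects = np.array([0.12, 0.05, 0.20, 0.03, 0.09, -0.01, 0.15])
ses     = np.array([0.06, 0.04, 0.10, 0.05, 0.03, 0.06, 0.08])

# Egger-style test: under publication bias, small noisy studies tend to be
# published only when they find large effects, producing a positive slope
# of effect size on standard error.
X = sm.add_constant(ses)
egger = sm.WLS(effects, X, weights=1.0 / ses**2).fit()
print(egger.params, egger.pvalues)
```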