Friday, March 21, 2014

When evidence does not suffice

Halvard Buhaug and numerous coauthors have released a comment titled “One effect to rule them all? A comment on climate and conflict” which critiques research on climate and human conflict that I published in Science and Climatic Change with my coauthors Marshall Burke and Edward Miguel.

The comment does not address the actual content of our papers. Instead it states that our papers say things they do not say (or that our papers do not say things they actually do say) and then uses those inaccurate claims as evidence that our work is erroneous.

Below the fold is my reaction to the comment, written as the referee report that I would write if I were asked to referee the comment.

(This is not the first time Buhaug and I have disagreed on what constitutes evidence. Kyle Meng and I recently published a paper in PNAS demonstrating that Buhaug’s 2010 critique of an earlier paper made aggressive claims that the earlier paper was wrong without actually providing evidence to support those claims.)

My response:

The commentary by Buhaug et al. critiques two recent articles by Hsiang and Burke (Climatic Change, 2014) and Hsiang, Burke and Miguel (Science, 2013) [henceforth HBM]. The authors suggest that their criticisms undermine the conclusions of these earlier two studies and that “research on climate and conflict to date has produced mixed and inconclusive results.”

I have two major concerns, each of which is fatal to the paper taken on its own.

First, the article provides three qualitative criticisms of HBM’s analysis, approach and discussion that the authors suggest undermine HBM’s original findings: 1) assuming independence across data sets in HBM suggests their meta-analysis findings are too strong, 2) the authors pool too many types of conflict in their meta-analysis, 3) study selection and variable selection in HBM’s meta-analysis are biased. However, all of these claims, as they are stated in the manuscript, grossly misrepresent the content of HBM’s actual analysis and article. All of the concerns expressed by the authors are dealt with in great detail in the original HBM article, in direct contradiction to the authors’ assertions. It is unclear why the authors have chosen to disregard the actual content of the HBM article, claiming that it omits analyses, papers, and discussions that it includes, but the image of the HBM paper painted by the authors is far coarser and more naïve than the actual HBM paper. For example, the manuscript’s title “One effect to rule them all? A comment on climate and conflict” misrepresents the HBM analysis as promoting a single effect, when in reality multiple summary effects for different classes of conflict are clearly presented in HBM’s abstract. In another example, the authors state that HBM ignore the cross-study correlations in data, when in fact HBM present replications of their main analysis where they assume extremely strong cross-study correlations. In another example, the authors list five articles that they claim were ignored by HBM, when all five were specifically discussed and analyzed in HBM. Below I detail each of the criticisms presented in the manuscript and present quotes and material from the original articles demonstrating that the claims made in the manuscript about the content of HBM are false.

Second, the authors claim to replicate the meta-analysis of HBM and use this analysis to suggest that the results in HBM are biased and that accurate results would show that research is inconclusive. However, the authors make a major and critical adjustment to the approach they are using without providing sufficient detail for a reader to replicate their analysis, instead only saying “we also applied a consistent lag of one time period (t–1) to all climate parameters across all models”. Below, I detail three possible interpretations of this ambiguous statement and explain why either (1) the approach of the authors is reasonably consistent with the literature, in which case the authors have made obvious mathematical/transcription errors or (2) the approach of the authors is completely inconsistent with the literature and objectively misrepresents findings from all studies, in which case the authors have made obvious errors and the authors’ use of these results is inappropriate. In neither case does the analysis accurately present findings from the literature or replicate the findings of HBM. Furthermore, the authors’ rhetorical use of this replication exercise to suggest that the un-replicated results in HBM are invalid is neither self-evident nor mathematically logical.

Details on both of the concerns above are presented below.

Buhaug et al.’s first criticism:

The authors state:

First, HBM’s main analysis rests on the assumption that sample studies are fully independent, although it is clear that there is considerable overlap between them. Every civil conflict study considered by Hsiang and Burke (2013) and 19 of the 22 studies of modern climate–intergroup conflict link in Hsiang et al. (2013) include African countries and more than half of these are limited to post-1980 Sub-Saharan Africa or a subset of country years. In one case, the cross-study correlation is estimated at r=0.6 (see supplementary information). Accordingly, the precision-weighted calculation of climate effects conducted by HBM returns unrealistically precise estimates and the true uncertainty around the average climate effect is much larger than reported.

However this exact issue was explained and explored by HBM on page 18 of their supplement. HBM find that even if they assume all studies are correlated with all other studies with rho = 0.7 (higher than Buhaug et al suggest) then HBM’s main conclusions would remain unchanged. HBM write (emphasis added):

As discussed above, the estimates of beta are unlikely to be independent across all studies. We cannot estimate all of these covariances, but we think it is both reasonable and conservative to assume that they are likely to be weakly positive in our setting – as they are probably an increasing function of the spatial and temporal overlap in a given set of studies. To develop a sense of whether our assumption of “no cross-study correlation” is generating a false sense of statistical significance in our precision-weighted average effects, we assume that all cross-study covariances take on arbitrary positive values and ask how this alters our estimates of Var(beta) using Equation 9.… we assume rho = {0.1,0.3,0.5,0.7} and then estimate

Cov(beta_i, beta_j) = rho_ij sigma_i sigma_j

which are then used in Equation 9. Using these four values for rho_ij, we obtain estimates for sqrt(Var(beta)) of 2.1, 3.1, 3.8, and 4.5, respectively, for the studies of intergroup conflict (our least-precise result). Since the estimated mean effect is 11.1 for this set of studies, we infer that this result would still be statistically significant even if rho_ij = 0.7 for all pairs of studies. Since it is very likely that rho_ij is much lower for all pairs of studies, this shows that our central conclusion about the general relationship between climate and conflict is not dependent on our assumption of cross-study correlations.

Thus the authors’ concern regarding cross-study correlation was addressed in the original article.
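The arithmetic behind this robustness check can be sketched in a few lines of Python. This is only an illustration: Equation 9 is not reproduced in this post, so the function below assumes the standard variance formula for a weighted average, Var = w' Cov w, with inverse-variance weights and Cov(beta_i, beta_j) = rho * sigma_i * sigma_j for every pair of studies. The standard errors in `example_se` are hypothetical and are not HBM's data. The only numbers taken from the quote above are the mean effect (11.1) and the four reported standard errors.

```python
import numpy as np

def pooled_se(se, rho):
    """Standard error of a precision-weighted mean when every pair of
    studies shares the same correlation rho (an assumed, simplified setup)."""
    se = np.asarray(se, dtype=float)
    w = 1.0 / se**2
    w = w / w.sum()                        # inverse-variance weights, summing to 1
    corr = np.full((se.size, se.size), float(rho))
    np.fill_diagonal(corr, 1.0)            # each estimate is perfectly correlated with itself
    cov = corr * np.outer(se, se)          # Cov_ij = rho * sigma_i * sigma_j
    return float(np.sqrt(w @ cov @ w))

# Hypothetical study-level standard errors, for illustration only.
example_se = [4.0, 6.0, 9.0, 12.0, 15.0]
for rho in (0.0, 0.1, 0.3, 0.5, 0.7):
    print(f"rho={rho:.1f}  pooled SE={pooled_se(example_se, rho):.2f}")

# Figures quoted from HBM's supplement above.
mean_effect = 11.1
reported_se = {0.1: 2.1, 0.3: 3.1, 0.5: 3.8, 0.7: 4.5}
z_scores = {rho: mean_effect / s for rho, s in reported_se.items()}
for rho, z in z_scores.items():
    print(f"rho={rho:.1f}  z={z:.2f}  significant at 5%: {z > 1.96}")
```

With the quoted figures, the ratio of the mean effect to its standard error stays above the conventional two-sided 5% threshold of 1.96 in every case, falling to about 2.5 at rho = 0.7, which is the sense in which the result "would still be statistically significant."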

Buhaug et al.’s second criticism:

The authors’ second critique of HBM concerns the breadth of types of conflict studied in HBM:

Second, HBM’s sample of candidate studies covers a wide range of phenomena from horn honking to imperial war, involves temporal scales from hours to millennia, concerns actors that range from individuals to ancient civilizations, and assumes climate effects that sometimes are linear, at other times parabolic; sometimes instant and at other times materialize after a distinct temporal lag. Claiming the same underlying climate effect across these heterogeneous studies is certainly a bold exercise, but this assumption is essential for the meta-analysis to be meaningful. A careful reading of the literature, or inspection of Figure 1, reveals a variation in findings that is inconsistent with the assumption of causal homogeneity.

However, this is a gross misrepresentation of the analysis and conclusions of HBM. HBM do not claim that the effects of climate across these studies are the same. HBM explicitly separate their discussion and analysis of studies into three groups, clearly stating on page 8:

We divide this section topically, examining, in turn, the evidence on how climatic changes shape personal violence, group-level violence, and the breakdown of social order and political institutions.

HBM then proceed to analyze these three types of human conflict in entirely different (and separately titled) sections of the text.  To ensure no confusion between these types of conflict, HBM conduct separate meta-analyses of interpersonal conflict and inter-group conflict and present these results separately in both the main text and in the abstract. Furthermore, analyses of historical civilizations are completely omitted from the meta-analysis and are separately presented.  Thus, HBM unambiguously do not assume that dramatically different forms of conflict are identical in their response to climate, and nowhere is this assumption made either implicitly or explicitly as Buhaug et al state in their critique.

HBM have a much more nuanced interpretation of their analysis than Buhaug et al. claim. In the introduction to their approach, Hsiang and Burke clearly state:

We use the terms conflict and social instability (“conflict” for brevity) to describe events where regular patterns of dispute resolution fail or social orders change. These events may or may not be violent in nature, they may involve individuals or groups of individuals, they may be organized or disorganized, and they may be personally, politically or otherwise motivated. Most studies examine only one type of conflict at a time, however we examine this comprehensive set of outcomes because it appears that many of these patterns are potentially related and that their responses to climate exhibit certain commonalities. For this reason it is likely that evaluating these phenomena together might help us better understand each individually.

The authors do not in any way claim “causal homogeneity” when reporting their results. For example, in their summary of findings Hsiang and Burke state that associations are observable at many scales and temporal frequencies, but they do not claim that the causal mechanism is the same:

Once attention is restricted to those studies capable of making causal claims, a growing consensus emerges from the recent literature: we find strong support for a causal association between climatological changes and conflict across a range of geographies, a range of different time periods, a range of spatial scales and across climatic events of different duration. Although some disagreement between studies remains, many purported discrepancies disappear when a common statistical approach is employed and hypothesis tests are carefully interpreted.

Furthermore, the meta-analytic technique used in HBM explicitly assumes that effects across studies are not the same even within a given class of conflict, e.g. the approach explicitly assumes that different types of intergroup conflict in different regions respond to different climate variables differently. In contrast to what Buhaug et al. suggest, HBM explicitly discuss this heterogeneity and characterize it using their Bayesian hierarchical approach. They do not sweep these differences under the rug but instead give them a complete treatment in the original text. For example, on pg. 10 HBM systematically characterize and discuss cross-study heterogeneity within each class of conflict (emphasis added):

We estimate the precision-weighted probability distribution of study-level effect sizes in Figs. 4 and 5 and in table S1. These distributions are centered at the precision-weighted averages described above and can be interpreted as the distribution of results from which studies’ findings are drawn. The distribution for interpersonal conflict is narrow around its mean, probably because most interpersonal conflict studies focus on one country (the United States) and use very large samples and derive very precise estimates. The distribution for intergroup conflict is broader and covers values that are larger in magnitude, with an interquartile range of 6 to 14% per 1sigma and the 5th to 95th percentiles spanning –5 to 32% per 1sigma (table S1). We estimate that for the intergroup and interpersonal conflict studies, respectively, 10 and 0% of the probability mass of the distributions of effect sizes lies below zero.

Figures 4 and 5 make it clear that even though there is substantial agreement across results, some heterogeneity across estimates remains. It is possible that some of this variation is meaningful, perhaps because different types of climate variables have different impacts or because the social, economic, political, or geographic conditions of a society mediate its response to climatic events. For instance, poorer populations appear to have larger responses, consistent with prior findings that such populations are more vulnerable to climatic shifts (51). However, it is also possible that some of this variation is due to differences in how conflict outcomes are defined, measurement error in climate variables, or remaining differences in model specifications that we could not correct in our reanalysis.

To formally characterize the variation in estimated responses across studies, we use a Bayesian hierarchical model that does not require knowledge of the source of between-study variation (92) (see supplementary materials). Under this approach, estimates of the precision-weighted mean are essentially unchanged, and we recover estimates for the between-study SD (a measure of the underlying dispersion of true effect sizes across studies) that are half of the precision-weighted mean for interpersonal conflict and two-thirds of the precision-weighted mean for intergroup conflict (median estimates; see supplementary materials, fig. S3, and tables S2 and S3). By comparison, if variation in effect sizes across studies was driven by sampling variation alone, then this SD in the underlying distribution of effect sizes would be zero. This finding suggests that true effects probably differ across settings, and understanding this heterogeneity should be a primary goal of future research.

HBM go on to provide an extensive four-page analysis of between-study heterogeneity in their supplement (pg. 19-22), going so far as to characterize the posterior distribution of hyperparameters tau that describe the true underlying heterogeneity across studies using its variance (Fig S3). Notably, HBM conduct this analysis separately for different classes of conflict – i.e. HBM do not even assume that the cross-study differences in results are the same for different types of conflict.
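To make the idea of a between-study SD concrete, here is a minimal sketch. It does not implement HBM's Bayesian hierarchical model. Instead it swaps in the simpler DerSimonian-Laird method-of-moments estimator, a common frequentist stand-in for the same quantity (usually written tau). The inputs in the usage lines are hypothetical numbers chosen only to show the two limiting cases.

```python
import numpy as np

def dersimonian_laird_tau(effects, se):
    """Method-of-moments estimate of the between-study SD (tau).
    A simplified stand-in for a hierarchical model, not HBM's procedure."""
    effects = np.asarray(effects, dtype=float)
    se = np.asarray(se, dtype=float)
    w = 1.0 / se**2                                  # inverse-variance weights
    fixed_mean = np.sum(w * effects) / np.sum(w)     # precision-weighted mean
    q = np.sum(w * (effects - fixed_mean) ** 2)      # Cochran's Q statistic
    k = effects.size
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau_squared = max(0.0, (q - (k - 1)) / c)        # truncated at zero
    return float(np.sqrt(tau_squared))

# Hypothetical inputs: identical estimates imply no excess dispersion ...
tau_same = dersimonian_laird_tau([10.0, 10.0, 10.0], [1.0, 2.0, 3.0])
# ... while precise estimates that disagree imply a large between-study SD.
tau_spread = dersimonian_laird_tau([0.0, 10.0, 20.0], [1.0, 1.0, 1.0])
print(tau_same, tau_spread)
```

When the study estimates differ by no more than their sampling error would predict, the estimate of tau is zero, which matches the benchmark in the quoted passage; estimates that disagree by far more than their standard errors allow produce a tau well above zero.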

Buhaug et al.’s third criticism:

Buhaug et al’s third criticism concerns study selection and variable selection. Buhaug et al. state:

Third, aggregating and generalizing results from selected studies serves no larger purpose unless the sample constitutes a representative subset of all relevant scientific research. Yet, HBM’s sample inclusion strategy favors form over function by using strictly methodological selection criteria. The result is a meta-analysis that disregards modern studies that revisit previously investigated climate-conflict associations, regardless of whether they complement, contrast or correct earlier findings. For example, the country-level relationship between rainfall and civil conflict is represented by a single peer-reviewed article (Miguel et al. 2004), ignoring several more recent investigations that reach different conclusions (e.g., Buhaug 2010; Burke et al. 2009; Ciccone 2011; Couttenier and Soubeyran 2013; Koubi et al. 2012). Moreover, HBM’s meta-analysis considers just one climate indicator from each study, in many cases the one that indicated the strongest effect, despite most of the original studies exploring multiple alternative and complementary climate measures that sometimes produce contrasting results.

Yet again, this critique misrepresents the content of HBM and Hsiang and Burke.

First, both articles discuss the results and implications of Buhaug 2010, Burke et al. 2009, Ciccone 2011 and Couttenier and Soubeyran 2013. Koubi et al. 2012 is discussed in the supplement of HBM, where HBM clearly explain why it is omitted. Buhaug et al’s suggestion that they are somehow unaccounted for in HBM is highly misleading.

Second, Buhaug et al.’s statement that “HBM’s sample inclusion strategy favors form over function by using strictly methodological selection criteria” is inaccurate. The function of using methodological criteria to select studies is clearly stated in both HBM and Hsiang and Burke; it is not cosmetic. HBM restrict their analysis to studies that are capable of making credible causal claims, something that can only be achieved if appropriate methods are employed. Because these methods are so fundamental to the credibility of a study, these methods were described in surprising length and detail (for a study in Science) in the main text of HBM in the section on “Research Design.”

Third, HBM’s approach of analyzing individual climate variables in their meta-analysis is reasonable because they are simply presenting the results of prior studies. HBM focus on the findings presented in a prior paper, so if earlier papers focused on a specific variable, that is the variable HBM examine. However, the concern raised by Buhaug et al. is clearly not lost on HBM and they address it directly in their main text on page 11:

Climate variables that have been analyzed previously, such as seasonal temperatures, precipitation, water availability indices, and climate indices, may be correlated with one another and autocorrelated across both time and space. For instance, temperature and precipitation time series tend to be negatively correlated in much of the tropics, and drought indices tend to be spatially correlated (51, 126). Unfortunately, only a few of the existing studies account for the correlations between different variables, so it may be that some studies mistakenly measure the influence of an omitted climate variable by proxy [see (126) for a complete discussion of this issue]. Except for the experiments linking temperature to aggression (27, 28), only a few studies demonstrate that a specific climate variable is more important for predicting conflict than other climate variables or that climatic changes during a specific season are more important than during other seasons. Furthermore, no study isolates a particular type of climatic change as the most influential, and no study has identified whether temporal or spatial autocorrelations in climatic variables are mechanistically important. Identifying the climatic variables, timing of events, and forms of autocorrelation that influence conflict will help us better understand the mechanisms linking climatic changes to conflict.

Fourth, the most direct treatment of Buhaug et al.’s critique that HBM are biased in the climate variables they examine would be to examine a single climate variable for all studies.  HBM do just that, repeating their analysis for all studies of just temperature, thereby removing any potential conflation of different climate variables in their analysis. As stated in the main text of HBM on page 10,

If we restrict our attention to only the effects of temperature, the precision-weighted average effect is similar for interpersonal conflict (2.3%); however, for intergroup conflict, the effect rises to 13.2% per 1sigma in temperature (SE = 2.0, P < 0.001; Fig. 5).

Further, HBM report that these studies of temperature exhibit the least evidence of publication bias (pg 22-26 of HBM’s supplement), indicating that this approach is unlikely to be biased by researcher selectivity.

Buhaug et al.’s effort to replicate HBM’s meta-analysis:

Buhaug et al. conclude by claiming to replicate HBM’s meta-analysis after applying three adjustments to the analysis in HBM.

The authors’ first restriction is to limit their analysis to civil conflict outcome variables. This restriction is reasonable, although it means that any conclusions drawn from the analysis will only be relevant to civil conflicts. The authors incorrectly use this new analysis to infer something about HBM’s analysis of other non-civil-conflict outcomes.

The second adjustment is to include all types of climate variables from all studies, not just those variables that were the focus of analysis in specific studies. Given the strong potential for omitted variables bias in those studies that do not include multiple climate parameters and the fact that control variables are generally not rigorously evaluated in these studies, it is unclear a priori what effect this should have on the results and it is not at all obvious that it should de-bias estimates. A more appropriate strategy would be to focus on a single type of variable, such as temperature, across all studies, as was done in the original analysis of HBM.

The third adjustment is to change the statistical model that is used in each paper and/or to change what parameter from the model is being reported. This is probably the most critical alteration, although I cannot determine exactly what was done from the text. The authors only state “we also applied a consistent lag of one time period (t–1) to all climate parameters across all models” but this is not enough information to know what was done. I can only guess at what the authors have done, and based on these guesses it seems that either the authors have made transcription errors that misrepresent the findings of earlier studies and/or they are reporting the incorrect numbers from these studies.

To be clear, the authors may have estimated the model

Conflict_t = A * climate_t + B * climate_{t-1} + C * controls + error

and are now reporting the coefficient A, which is the focus of the HBM article. However, if this is what they did, then casual inspection of the figure indicates that the values shown are erroneous. For example, Levy et al and Hsiang et al are depicted with large negative coefficients when they should be positive.

Alternatively the authors may have reported the coefficient B from the equation above or estimated a new equation

Conflict_t = B2 * climate_{t-1} + C * controls + error

and report the coefficient B2. Normally I would not suspect that this is what is done because none of the reported studies claim to find any lagged association between climate variables and conflict and all studies focus on the coefficient A, so it would be a dramatic distortion of reported results to suggest that lagged coefficients capture the findings of previous papers. [The only reason I think this might be the case is that the lead author did run the above model (and reported finding no lagged effect) in his recent Climatic Change article about conflict in Asia.] If this is what was done, then it is a major error and would imply that the presentation of these findings is not attempting to accurately depict results in the literature.

However, again, casual inspection suggests that if lagged coefficients B or B2 are what are reported, then the authors have again made mathematical errors and are not reporting accurate numbers. For example, the coefficient on Hsiang et al (2011) is shown as roughly -12%, but the lagged coefficient reported in Supplementary Table 14 (model 2) of Hsiang et al (2011) is -0.319, which would be -7% in standardized terms. 

Thus it is completely unclear to me how the numbers presented were generated as the authors do not provide enough information to replicate their analysis, and casual inspection suggests that the authors have either made basic mathematical errors in transcription or they are misrepresenting the results of studies by reporting coefficients that are known to be irrelevant, or both.  Thus, the mathematical result presented in the new meta-analysis contains at least one of these fatal errors.
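Why the choice between A, B, and B2 matters can be illustrated with a small simulation. This sketch says nothing about what Buhaug et al. actually computed; it only shows what the three coefficients recover under stated assumptions. The data are simulated, the "true" contemporaneous effect of 0.5 is hypothetical, the climate series is serially uncorrelated by construction, there is no true lagged effect, and the controls term is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated, serially uncorrelated climate series (an assumption of this sketch).
climate = rng.normal(size=n + 1)
climate_t = climate[1:]        # climate in period t
climate_lag = climate[:-1]     # climate in period t-1

true_A = 0.5                   # hypothetical contemporaneous effect; no lagged effect
conflict = true_A * climate_t + rng.normal(size=n)

def ols_slopes(y, *regressors):
    """Ordinary least squares with an intercept; returns the slope coefficients."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

# Model with both terms: Conflict_t = A * climate_t + B * climate_{t-1} + error
A_hat, B_hat = ols_slopes(conflict, climate_t, climate_lag)

# Lag-only model: Conflict_t = B2 * climate_{t-1} + error
(B2_hat,) = ols_slopes(conflict, climate_lag)

print(f"A  = {A_hat:.3f}")
print(f"B  = {B_hat:.3f}")
print(f"B2 = {B2_hat:.3f}")
```

In this setup A lands close to the assumed effect, while B and B2 land close to zero. A meta-analysis built from B or B2 would therefore report "no effect" for a study whose contemporaneous effect is real, which is why the text needs to say which coefficient is plotted.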

For the above reasons, the “replication” of HBM’s meta-analysis is at best uninformative because it does not accurately represent findings in the literature and contains major errors. Furthermore, because the analysis is restricted to only civil conflicts, it is misleading that the authors suggest it is informative of other types of conflict analyzed in HBM.
