Monday, December 21, 2015

From the archives: Friends don't let friends add constants before squaring

I was rooting around on my hard drive for a review article when I tripped over this old comment that Marshall, Ted and I drafted a while back.

While working on our 2013 climate meta-analysis, we ran across an interesting article by Ole Theisen at PRIO. He coded up all sorts of violence at a highly local level in Kenya to investigate whether local climatic events, like rainfall and temperature anomalies, appeared to be affecting conflict. Theisen was estimating a model analogous to:

$$\text{conflict}_{it} = \beta_1 T_{it} + \beta_2 T_{it}^2 + \beta_3 R_{it} + \beta_4 R_{it}^2 + \text{fixed effects} + \varepsilon_{it}$$

Here T is temperature and R is rainfall for location i in period t. He reported finding no effect of either temperature or rainfall. I was looking through the replication code of the paper to check the structure of the fixed effects when I noticed something: the squared terms for temperature and rainfall were offset by a constant, so that the minimum of each squared term did not occur at zero:

$$\tilde{T}_{it}^2 = (T_{it} + \text{constant})^2, \qquad \tilde{R}_{it}^2 = (R_{it} + \text{constant})^2$$

(Theisen was using standardized temperature and rainfall measures, so both were centered at zero.) This offset was not applied to the linear terms of these variables, which got us thinking about whether it matters. When working with linear models, we often get used to shifting variables around by a constant, usually out of convenience, and it doesn't matter much. But in models that are nonlinear in the variables, adding a constant incorrectly can be dangerous.

After some scratching with pen on paper, we realized that

$$\tilde{T}_{it} = T_{it} + C$$

for the squared term in temperature (C is a constant), which when squared gives:

$$\tilde{T}_{it}^2 = T_{it}^2 + 2C\,T_{it} + C^2$$

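Because this constant was not added to the linear terms in the model, the regression Theisen was actually running is as follows. Only the temperature terms are written out; the rainfall terms, fixed effects and error are collected in the dots, and the constant is absorbed by the fixed effects:

$$\text{conflict}_{it} = \tilde{\beta}_1 T_{it} + \tilde{\beta}_2 \tilde{T}_{it}^2 + \cdots = \underbrace{(\tilde{\beta}_1 + 2C\tilde{\beta}_2)}_{\beta_1} T_{it} + \underbrace{\tilde{\beta}_2}_{\beta_2} T_{it}^2 + \tilde{\beta}_2 C^2 + \cdots$$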

This can be converted to the originally intended equation by computing linear combinations of the regression coefficients (as indicated by the underbraces). However, directly interpreting the beta-tilde coefficients as the linear and squared effects is not right, except for beta-tilde_2, which is unchanged. Weird, huh? Suppose you add a constant to a variable before squaring it but leave its linear term unshifted. The coefficient on the squared term comes out fine, but the coefficient on the linear term (and the intercept) is shifted by an amount proportional to the constant. This didn't seem intuitive to us, which is part of why we drafted the note.
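A quick way to convince yourself of this is a small simulation. The sketch below is our own illustration, not Theisen's replication code. It assumes a made-up data-generating process with a single standardized temperature variable `T`, a hypothetical offset `C = 3.0`, and plain OLS via NumPy in place of the fixed-effects model. Regressing on `T` and `(T + C)**2` should leave the squared coefficient alone but shift the linear one by `-2*C` times the squared coefficient.

```python
import numpy as np

def fit_ols(X, y):
    """Return OLS coefficients for y = X @ b (X already includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1, beta2 = 0.5, 1.0, -0.3     # hypothetical "true" effects
C = 3.0                                  # hypothetical offset added before squaring

T = rng.standard_normal(n)               # standardized temperature, centered at zero
y = beta0 + beta1 * T + beta2 * T**2 + 0.1 * rng.standard_normal(n)

ones = np.ones(n)
# Intended specification: intercept, T, T^2
b_correct = fit_ols(np.column_stack([ones, T, T**2]), y)
# Offset specification: constant added only inside the squared term
b_offset = fit_ols(np.column_stack([ones, T, (T + C)**2]), y)

print("correct:", np.round(b_correct, 3))
print("offset: ", np.round(b_offset, 3))
# Recover the intended linear effect: beta1 = tilde_beta1 + 2*C*tilde_beta2
print("recovered beta1:", round(b_offset[1] + 2 * C * b_offset[2], 3))
```

Both design matrices span the same columns (1, T, T² + 2CT + C²), so the fitted values are identical. The coefficient mapping is therefore exact, not just approximately true in large samples.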

To check this theory, we swapped out the T-tilde-squared measures for the correct T-squared measures and re-estimated the model in Theisen's original analysis. As predicted, the squared coefficients don't change, but the linear effects do:

This matters substantively. The linear effect of temperature had appeared to be insignificant in the original analysis, leading Theisen to conclude that Marshall and Ted might have drawn incorrect conclusions in their 2009 paper finding that temperature affected conflict in Africa. But simply removing the offending constant revealed a large, positive and significant linear effect of temperature in this new high-resolution data set, agreeing with the earlier work. It turns out that if you compute the correct linear combination of coefficients from Theisen's original regression (the terms above the underbrace for beta_1 above), you recover the correct marginal effect of temperature (and it is significant).
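The linear combination beta-tilde_1 + 2C·beta-tilde_2 needs its own standard error, which comes from the coefficient covariance matrix. Stata's `lincom` command performs this kind of calculation. The sketch below assumes simulated data, a hypothetical offset `C = 3.0`, and classical homoskedastic OLS standard errors rather than whatever errors the original paper used. It checks that the combination computed from the offset regression matches, estimate and standard error alike, the linear coefficient from the correctly specified regression.

```python
import numpy as np

def ols_with_vcov(X, y):
    """OLS coefficients and classical (homoskedastic) covariance matrix."""
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return coef, sigma2 * XtX_inv

def lincom(coef, vcov, weights):
    """Estimate and standard error of the linear combination weights @ coef."""
    w = np.asarray(weights, dtype=float)
    return float(w @ coef), float(np.sqrt(w @ vcov @ w))

rng = np.random.default_rng(1)
n, C = 5_000, 3.0                                   # hypothetical sample size and offset
T = rng.standard_normal(n)
y = 1.0 * T - 0.3 * T**2 + rng.standard_normal(n)   # made-up data-generating process

ones = np.ones(n)
coef_c, vcov_c = ols_with_vcov(np.column_stack([ones, T, T**2]), y)
coef_o, vcov_o = ols_with_vcov(np.column_stack([ones, T, (T + C)**2]), y)

# Linear temperature effect from the offset regression: tilde_beta1 + 2*C*tilde_beta2
est, se = lincom(coef_o, vcov_o, [0.0, 1.0, 2 * C])
print(f"lincom from offset model:  {est:.3f} (se {se:.3f})")
print(f"direct from correct model: {coef_c[1]:.3f} (se {np.sqrt(vcov_c[1, 1]):.3f})")
```

The two regressions are reparameterizations of each other, so the weights [0, 1, 2C] map the offset coefficients exactly back onto the intended linear effect, standard error included.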

The error was not at all obvious to us originally, and we suspect that lots of folks make similar errors without realizing it. In particular, it's easy to show that the same problem arises with interaction terms if one variable is shifted by a constant inside the interaction but not in its own linear term (after all, temperature-squared is just temperature interacted with itself).
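The interaction version can be checked the same way. This sketch assumes a made-up model with two standardized variables `X` and `Z` and a hypothetical offset `C = 2.0` added to `X` only inside the interaction. Because (X + C)·Z = X·Z + C·Z, the interaction coefficient is unchanged. The main effect of `Z` absorbs a shift of `-C` times the interaction coefficient, while the coefficient on `X` is untouched.

```python
import numpy as np

rng = np.random.default_rng(2)
n, C = 50_000, 2.0                       # hypothetical sample size and offset
X = rng.standard_normal(n)
Z = rng.standard_normal(n)
# Made-up process: y = 0.5*X + 0.8*Z + 0.4*X*Z + noise
y = 0.5 * X + 0.8 * Z + 0.4 * X * Z + rng.standard_normal(n)

ones = np.ones(n)
b_right, *_ = np.linalg.lstsq(np.column_stack([ones, X, Z, X * Z]), y, rcond=None)
# Shift X by C only inside the interaction term
b_wrong, *_ = np.linalg.lstsq(np.column_stack([ones, X, Z, (X + C) * Z]), y, rcond=None)

print("interaction:  ", round(b_right[3], 3), round(b_wrong[3], 3))   # unchanged
print("Z main effect:", round(b_right[2], 3), round(b_wrong[2], 3))   # shifted by -C * interaction
print("recovered Z effect:", round(b_wrong[2] + C * b_wrong[3], 3))
```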

Theisen's construction of this new data set is an important contribution, and when we emailed this point to him he was very gracious in acknowledging the mistake. This comment didn't get seen widely because when we submitted it to the journal that published the original article, we received an email back from the editor stating that the "Journal of Peace Research does not publish research notes or commentaries."

This holiday season, don't let your friends drink and drive or add constants the wrong way in nonlinear models.
