Thursday, September 12, 2013

The noise lovers

I try not to use this blog much for rants, but it’s an easy way to post more frequently. So… we’ve been getting some feedback on our recent paper on drought stress in U.S. maize, some of it positive, some not. One thing that comes up, as with a lot of prior work, is doubt about how important temperature really is. Agronomists often talk about how important the timing of rainfall is, and about how heat doesn’t hurt nearly as much if soils are very moist. To me this is another way of saying (1) other factors matter too and (2) there are interactions between temperature and other factors. The answer to both of these is “Of course!”

I am struck by the similarity of these discussions to those that Marshall posted about the empirical work on conflict. They go something like this:
Person A: “We’ve looked at the data and see a clear response that people tend to wake up when you turn the lights on”
Person B: “But people also wake up because they went to bed early last night and aren’t tired any more.”
A: “Yeah, ok”
B: “And the lights probably won’t wake them up if they are passed out drunk.”
A: “Ok, great point”
B: “I’ve woken up thousands of times in my 30-year career, and rarely did I get woken because someone turned the light on”
A: “ok”
B: “So then how can you possibly claim that turning on lights causes people to wake up”
A: “What? Are you serious? Did you even read the paper?”
B: “I don’t need to, I am an expert on waking up.”
And so on. I sometimes don’t know whether people seriously don’t understand the difference between explaining some vs. all of the variance, or whether they just look for any opportunity to plug their own area of expertise. When we claim to see a clear signal in the data, it is not a claim that there is no noise around that signal. And some of that noise might come from interactions with other variables. In fact, if there weren’t any noise, the signal would have been known long ago.

The other day I was thinking of replacing my old tennis racquet, so I went to Google and typed in “prince thunder” (the name of my current racquet). Turns out the top results were about a song that the artist Prince wrote called Thunder. Does that mean that Google is entirely useless? No, it means there is some error if you try to predict what someone wants based only on a couple of words they type. But with their enormous datasets, they are pretty good at picking up signals and getting the answer right more often than not. Most people typing those words were probably looking for the song.

Back to the crop example. Of course, heat will matter less or more depending on the situation. And of course getting rainfall right before key stages is more important than getting it after. These are both reasons for the scatter in any plot of heat (or any other variable) against yields. But neither of those refutes the fact that higher temperatures tend to result in more water stress, and lower yields. Or in Sol and Marshall’s case, that higher temperatures tend to increase the risk of conflict.
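
To make this concrete, here is a toy simulation in Python. It is only a sketch with invented numbers, not the analysis from our paper: the heat index, the soil moisture variable, the effect sizes, and the function names `simulate` and `ols` are all assumptions made up for illustration. Heat hurts yields less when soils are moist (an interaction), and there is plenty of unrelated noise on top, yet a simple regression of yield on heat should still recover a clearly negative slope while explaining only a small share of the variance.

```python
import random


def simulate(n=5000, seed=0):
    """Generate made-up (heat, yield) pairs with an interaction and noise."""
    rng = random.Random(seed)
    heat, yields = [], []
    for _ in range(n):
        h = rng.uniform(0, 10)        # hypothetical heat exposure index
        moisture = rng.uniform(0, 1)  # hypothetical soil moisture: 0 dry, 1 wet
        # Heat always hurts, but less when soils are moist (the interaction).
        heat_effect = -1.0 * h * (1.5 - moisture)
        # Everything else: rainfall timing, pests, management, and so on.
        noise = rng.gauss(0, 8)
        heat.append(h)
        yields.append(100 + heat_effect + noise)
    return heat, yields


def ols(x, y):
    """Slope and R^2 for a one-variable least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


heat, yields = simulate()
slope, r_squared = ols(heat, yields)
print(f"slope of yield on heat: {slope:.2f}")
print(f"share of variance explained by heat alone: {r_squared:.2f}")
```

In this setup the average effect of heat is about one unit of yield lost per unit of heat, and the fitted slope should land close to that even though most of the scatter is left unexplained. A low R² and a real, well-estimated effect are perfectly compatible.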


I sometimes think if one of us were to discover some magic combination of predictors that explained 95% of the variance in some outcome, there would be a chorus of people saying “you left out my favorite 5%!” Don’t get me wrong, there are lots of legitimate questions about cause and effect, stationarity, etc. that are worth looking into. But how much time should we really spend having the same old conversation about the difference between a signal and noise? 
