Monday, December 7, 2015

Warming makes people unhappy: evidence from a billion tweets (guest post by Patrick Baylis)

Everyone likes fresh air, sunshine, and pleasant temperatures. But how much do we like these things? And how much would we be willing to pay to gain more of them, or to prevent a decrease in the current amount that we get?

Clean air, sunny days, and moderate temperatures can all be thought of as environmental goods. If you're not an environmental economist, it may seem strange to think about different environmental conditions as "goods". But, if you believe that someone prefers more sunshine to less and would be willing to pay some cost for it, then a unit of sunshine really isn't conceptually much different from, say, a loaf of bread or a Playstation 4.

The tricky thing about environmental goods is that they're usually difficult to value. Most of them are what economists call nonmarket goods, meaning that we don't have an explicit market for them. So unlike a Playstation 4, I can't just go to the store and buy more sunshine or a nicer outdoor temperature (or maybe I can, but it's very, very expensive). This also makes it more challenging to study how much people value these goods. Still, there is a long tradition in economics of using various nonmarket valuation methods to study this kind of problem.

New data set: a billion tweets

Understanding preferences over temperature in particular is important to climate economists. We are getting better and better at measuring the different costs of climate change, but for the most part we have focused on estimating the value of the "indirect" impacts of changes in temperature such as temperature-induced changes in agricultural yields, income, conflict, or other inputs to our utility functions that are altered by temperature.

For example, to compute the value of the climate-induced change in assault, we estimate the increase in assaults expected from a one-degree rise in temperature, and then multiply it by a previously measured willingness-to-pay to avoid one assault. This procedure gives us an estimate of the "assault cost" of one degree of warming. Then, we sum the value for assault with the values for all of the other changes caused by climate change to get a net cost (empirically speaking, the net costs are usually positive) of one degree of warming. This is essentially the procedure used to estimate a social cost of carbon, and the one used by the Risky Business project, although I'm glossing over a huge amount of detail for the sake of brevity.
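
To make the arithmetic concrete, here is a minimal sketch of that calculation. Every number and channel name below is a made-up placeholder chosen for illustration, not an estimate from any study, and the function names are my own.

```python
# Illustrative only: all quantities are hypothetical placeholders.

def marginal_cost(extra_events_per_degree, wtp_per_event):
    """Cost of one degree of warming through a single channel."""
    return extra_events_per_degree * wtp_per_event


def net_cost_per_degree(channels):
    """Sum the channel costs; a benefit enters as a negative cost."""
    return sum(marginal_cost(d, w) for d, w in channels.values())


# Each entry: (change in events per degree, willingness-to-pay per event).
channels = {
    "assault": (100.0, 50_000.0),
    "crop_loss": (2_000.0, 300.0),
    "heating_savings": (-500.0, 400.0),  # a benefit of warming
}

total = net_cost_per_degree(channels)
print(f"Net cost of one degree of warming: {total:,.0f}")
```

The amenity cost discussed next would enter this sum as one more channel, which is exactly why it needs its own estimate.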

However, if we think that people in general have preferences over temperature itself (and not just the things that might be increased or decreased by changes in temperature), it seems likely that there is an "amenity cost" to climate change as well. This is trickier to measure. One way to assess the value of temperature is to use people's location choices to back out an implicit value for different climates, an approach used most recently by Albouy, Graf, Kellogg, and Wolff (2013). This is a useful technique because it produces a direct monetary valuation for different climates by comparing how much people are willing to pay to live in places with different climates. However, because this approach compares different regions of the country to each other, it relies on the important assumption that there are no unobservable factors varying across space that might bias the results.

As a complement to their work, and as a general method for measuring nonmarket goods, I have a recently released working paper (which is also my job market paper) that instead uses hedonic, or emotional, state to better understand people's preferences for temperature. To measure hedonic state, I use a unique source of data: Twitter updates, or tweets. Using Twitter's Streaming API, I download more than a billion tweets from 2014 to the present. I then use the text of each tweet to infer the hedonic state it expresses, using a set of text analysis algorithms derived from work in computational linguistics. I use four measures of hedonic state to ensure that the results are not driven by idiosyncratic measure design:
  1. The "Expert" measure scores a set of words that are deemed to be indicative of mood in psychological studies, adapted for performance in a microblogging setting like Twitter.
  2. The "Crowd-sourced" measure uses a word list constructed by surveying thousands of users on their emotional associations with about 10,000 frequently-used words, originally created by the folks at
  3. The "Profanity" measure is quite simple: it codes each tweet for the presence or absence of profanity, assuming that most profanity usage indicates negative hedonic state.
  4. The "Emoticon" measure trains a machine-learning algorithm to classify the tweets as negative or positive based on whether or not they resemble tweets with negative or positive emoticons.
Because all of the tweets I use are geo-located, I'm able to match my estimates of hedonic state to the temperature (and other conditions) that the user was experiencing at the precise time she (or he) wrote a given tweet.
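
As a rough sketch of how word-list and profanity measures like the ones above could be computed: the word lists, scores, and function names here are tiny hypothetical stand-ins, not the lists or code used in the paper.

```python
import re

# Hypothetical stand-ins: real word lists contain thousands of entries.
WORD_SCORES = {"happy": 8.0, "love": 8.5, "sad": 2.0, "hate": 2.5}
PROFANITY = {"damn", "hell"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_list_score(text, scores=WORD_SCORES):
    """Mean score of the words found in the list, or None if none match."""
    hits = [scores[t] for t in tokenize(text) if t in scores]
    return sum(hits) / len(hits) if hits else None


def has_profanity(text, vocab=PROFANITY):
    """True if any token appears in the profanity list."""
    return any(t in vocab for t in tokenize(text))


print(word_list_score("I love this happy day"))
print(has_profanity("Damn, it's hot"))
```

Returning None for tweets with no scored words keeps them out of the average instead of treating them as neutral, which is one of several reasonable design choices.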

Regular readers of this blog can probably guess where this is going: next, I regress the measures of hedonic state on a flexible function of temperature. The benefit of this dataset is that I can control very carefully for unobserved correlates of both temperature and mood, since I can include both spatial and temporal fixed effects that take out unobserved variation across both space and time. Here's a hypothetical example of why this might be important: Minnesota is much colder than Alabama. However, Minnesotans are on average considerably wealthier than Alabamians. If the way people express themselves is a function of wealth, then a statistical model that compares temperature to mood without accounting for income differences is likely to be biased. Of course, income isn't the only possible confound. In order to control for all time-invariant covariates (like income, which is mostly invariant in my sample), I include area fixed effects. This means that instead of comparing Alabama to Minnesota, I compare Minnesota on a hot day to Minnesota on a cool day and Alabama on a hot day to Alabama on a cool day. I control similarly for seasonal differences. This allows me to estimate the hedonic response to changes in temperature with a high degree of confidence that the response I document is the same one I would find if I were able to randomly assign different temperatures to different people and observe their response. You can see the findings in the figure below:
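
The logic of area fixed effects can be sketched with a toy example. This is a simplification under stated assumptions: it uses a linear temperature effect instead of a flexible function, only area (not time) fixed effects, and synthetic data in which the colder area has a higher baseline mood. The helper names are hypothetical.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a one-regressor OLS fit of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def within_slope(areas, x, y):
    """Demean x and y within each area, then fit OLS on the demeaned data."""
    groups = defaultdict(list)
    for i, area in enumerate(areas):
        groups[area].append(i)
    xd, yd = [0.0] * len(x), [0.0] * len(y)
    for idx in groups.values():
        mx = sum(x[i] for i in idx) / len(idx)
        my = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            xd[i], yd[i] = x[i] - mx, y[i] - my
    return ols_slope(xd, yd)


# Synthetic data: the true effect of temperature on mood is -0.01 per degree.
TRUE_EFFECT = -0.01
areas = ["MN"] * 3 + ["AL"] * 3
temps = [40.0, 60.0, 80.0, 70.0, 90.0, 110.0]
baseline = {"MN": 1.0, "AL": 0.0}  # area-level confound (e.g. income)
mood = [baseline[a] + TRUE_EFFECT * t for a, t in zip(areas, temps)]

pooled = ols_slope(temps, mood)            # compares MN to AL
within = within_slope(areas, temps, mood)  # compares each area to itself
print(f"pooled: {pooled:.4f}, within-area: {within:.4f}")
```

In this toy data the pooled comparison overstates the effect of heat because it attributes the baseline difference between areas to temperature, while the within-area estimate recovers the true effect.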

Response functions: effects of daily temperature on measures of happiness

I find that people strongly disprefer hot temperatures, which is perhaps little surprise to anyone who has spent time in Midwestern summers. To give you a sense for how big these effects are, the size of the difference in hedonic state between a 60-70 degree F day and an 80-90 degree F day is roughly equivalent to the average difference in hedonic state that I observe between a Sunday and a Monday.

What's odd is that I don't find a similar dispreference for cooler temperatures, as some other work does. However, the standard errors are large, so I can't reject either a large positive or a large negative response. I show in the paper that part of the difference has to do with seasonal preferences. Speculatively, I suspect that it may also be due to people in cold weather having more adaptation margins: they can go inside or put on more clothes, whereas people in warm temperatures may not be able to further adjust to those temperatures, either because they don't have air conditioning or because they can't take off any more clothes. Since this finding differs from other work, I'm continuing to think about this.

Since one of my research interests is in understanding how adaptation behavior might interact with the effects of climate change, I next use the density of my data to estimate different response functions for different regions of the country. For the sake of argument, I assume that differences in these response functions are driven solely by the historical temperature regimes experienced by their residents. Next, I combine the response functions I estimate with projections of end-of-century climate change to understand how this might affect different areas in the United States. I show that allowing for different degrees of adaptation has substantially different implications for the net hedonic cost of a changing climate. Below is a map from the paper, showing the change in average hedonic state we would expect if the end-of-century temperature changes occurred tomorrow, but allowing for adaptation. For some areas, the size of the projected change is similar to turning all of our Saturdays and Sundays into Mondays.
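
A stripped-down version of the projection step might look like the following. The response coefficients, the temperature series, and the uniform 10F shift are all hypothetical placeholders, and this sketch applies a single response function with no adaptation, unlike the map from the paper.

```python
# Hypothetical response function: effect on hedonic state (in SD) for each
# 10F bin, keyed by the bin's lower edge and normalized to 0 at 60-70F.
RESPONSE = {50: 0.0, 60: 0.0, 70: -0.005, 80: -0.010, 90: -0.020}


def bin_floor(temp_f):
    """Lower edge of the 10F bin, clamped to the bins RESPONSE covers."""
    edge = int(temp_f // 10) * 10
    return max(min(RESPONSE), min(max(RESPONSE), edge))


def mean_hedonic_change(daily_temps_f, warming_f):
    """Average change in hedonic state if every day warmed by warming_f."""
    changes = [
        RESPONSE[bin_floor(t + warming_f)] - RESPONSE[bin_floor(t)]
        for t in daily_temps_f
    ]
    return sum(changes) / len(changes)


# Placeholder daily temperatures for one area, in degrees F.
temps = [55.0, 65.0, 75.0, 85.0]
change = mean_hedonic_change(temps, 10.0)
print(f"Mean change in hedonic state: {change:.4f} SD")
```

Allowing for adaptation would amount to swapping in a different response function for each area, for example one estimated from regions that already experience the projected temperatures.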

Projected changes in hedonic state

If you're interested in reading more, the working paper is available here. I'd love to hear your thoughts.

This is a guest post by Patrick Baylis, a doctoral candidate at UC Berkeley.


  1. Interesting work!

    I have a little trouble interpreting your graphs. What's the shaded area? A confidence interval or something like that? And why is it 0 at +/- 65F?

    And what's on the y-axis? Standardized effect size (e.g., 1 = 1 SD)? If so, those strike me as rather small effects, rather than large ones, not really of practical significance. It's still interesting that you can find the differences though.

  2. Hi Jonas, thanks for reading and for the good questions.

    Yes, the shaded area is the 95% confidence interval. The graphs are 0 at 65F because I normalize the response function to be 0 at the 60-70F bin. So you can think of the size of the point estimate for, say, 80-90 degrees as representing the difference in hedonic state caused by changing the temperature of a given day from 60-70F to 80-90F.

    You are correct that 1 = 1SD. 0.01SD seemed small to me as well when I first ran these regressions. But, it turns out that the hedonic difference between a Sunday and Monday is actually only about 0.01SD as well! Also, the hedonic difference between a local football team winning and losing is a little bit over 0.01SD. I take from this that there's quite a bit of variation in the measures I use, which probably reflects both underlying variation in hedonic state and noise in the measures themselves. Both likely contribute to making the standardized effect sizes appear small, which is why I think it's useful to compare them to other things (like day of week) that we have a clearer intuitive sense for. Thanks for reading!

  3. I always thought that Hawai`i's high happiness ranking had something to do with the weather. But maybe it's something else.