Tuesday, August 27, 2019

On the efficacy of the "sniff test" for understanding climate impacts


Economists (and other high-statured people) often pride themselves on having good noses.  When encountering a research finding, they often apply a quick "sniff test" to judge whether the finding is likely true or false.

We encounter the sniff test frequently when we discuss our estimates of the potential impacts of future warming on domestic or global economic output. Economic orthodoxy has long held that 3-4C of warming will reduce global GDP by a few percentage points over the next century relative to a world where temperature was held fixed.  These estimates are based largely on earlier cross-sectional studies relating average output to average temperature, and have provided critical input to the construction of "damage functions" used in benchmark integrated assessment models.  Output from these models, in turn, has played a pivotal role in policy-making around the social cost of carbon (e.g. see here).

Newer analyses by your G-FEED bloggers and others using panel-based methods have found much larger potential effects of warming on output (e.g. see here): a ~20% loss of global GDP in a +4C world, relative to a world where temperature had stayed fixed.  Because these estimates are so much larger than previous estimates, they have frequently failed the sniff test, both publicly and in referee reports we've received.  Some people have told us the results just seem "too big".  No way can climate change make us 20% poorer than we would have been without it.

Rather than adjudicating the methods underlying these newer results, I want to evaluate this sniff test itself by simply looking at some historical data.  This is in the spirit of Sol's last post: before we argue so strongly for a particular approach or result based on our priors, how bout we just look at some data.  Is there historical evidence for some regions performing 20% better than others over century or shorter time scales?

Exhibit 1:  let's just look at income growth in US states.  Below is a plot of real per capita income growth since 1980 in selected US states, with the y-axis normalized to show the % change in income relative to 1980.  Even in this relatively short 35-year time span (a third of a century, for those scoring at home), there is huge variation in performance: income in Massachusetts grew >130% over this period, while incomes in Nevada grew only 30%.  Even the difference between coastal-elite Massachusetts and California over this 35-year period exceeds 20%.  So variation in performance of around 20% is clearly within our very recent historical experience in this country, on a time scale much shorter than a century.
Data: Bureau of Labor Statistics

Ah, you say, but this is just the US, and we know we have rampant inequality in this country, etc etc.  This variation can't be representative of the rest of the world, right?

Okay:  below is the plot for a small sample of OECD countries, using real GDP/cap from World Bank WDI.  Again, plotting data since only 1980, there's been a ton of variation in observed growth in per capita incomes, with neighboring high-income countries varying by way more than 20% in how much they've grown.  The plot is of course way more stark if you include developing countries, with some countries only growing a few percentage points over the period, and others (e.g. China) growing by >1000% (and screwing up the axes of any plot you try to make...).  So:  massive variation at the country level within just a few decades.
Data: World Bank World Development Indicators

Ah, you say, this is just ~35 years of data -- way too short a time scale to look at these long run trends.  Can't possibly be true for longer time scales, right? Don't countries converge on longer time scales?

Okay:  below is a plot of real per capita growth rates over multiple centuries, based on the Maddison data, for a random selection of countries with data back to 1800.  At this time scale, you see absolutely gargantuan differences in changes in per capita incomes.  German incomes grew nearly 5000% since 1800, Portuguese incomes grew 2000%, while South African incomes grew only about 500%.
Data: Maddison Project
These century scale differences in income over time are roughly two orders of magnitude larger than what our climate impact estimates would predict to be the difference between unmitigated climate change, and a world where warming didn't happen.

So in light of these historical data, can we please do away with the vaunted "sniff test"?  Our estimates of climate impacts are small relative to the variation in historical performance we've seen, and that's true whether you look within countries over recent decades, or (even more so) across countries over very long time periods.  None of this, of course, tells us whether climate change will reduce economic output by 20%;  it just tells us that we cannot reject a 20% estimate out of hand for being "way too big".

But this raises the question:  why are our noses so bad?  Why is the sniff test so un-efficacious? Perhaps we continue to under-appreciate the power of compounding.  A key result from our earlier paper (building on key earlier work by Dell, Jones, and Olken, and shown by multiple other groups as well, see here, here, here) is that changes in temperature can affect the growth rate of GDP.  And small effects on the growth rate can have large effects on the level of income over long time scales.

E.g. consider an economy growing at 2% for a century (the US, roughly).  Knocking only a quarter of a percentage point off the growth rate -- i.e. growing at 1.75% instead of 2% -- leads to an economy that's >20% poorer after 100 years than it would have been.  Very small growth effects can lead to large level effects over long time scales.
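The arithmetic is easy to check directly. Here is a minimal sketch in Python, assuming simple annual compounding; the function name `level_gap` is just an illustrative choice, and the inputs are the two growth rates from the example above.

```python
def level_gap(base_rate, reduced_rate, years):
    """Fractional income shortfall after `years` of growing at
    `reduced_rate` instead of `base_rate` (annual compounding)."""
    baseline = (1 + base_rate) ** years
    reduced = (1 + reduced_rate) ** years
    return 1 - reduced / baseline


# 2% vs. 1.75% annual growth, compounded over 100 years.
gap = level_gap(0.02, 0.0175, 100)
print(f"Shortfall after a century: {gap:.1%}")
```

Under these assumptions the shortfall comes out a little above 20%, even though the two growth rates differ by only a quarter of a percentage point.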

Another analogy (h/t to my colleague and co-author Noah Diffenbaugh) is the continued success of managed mutual funds, despite the clear evidence that the small management fees charged by these funds lead to huge reductions in your accumulated investment over longer time scales, relative to investing in index funds with no fees.  Just as small fees eat away at your total portfolio earnings, small hits to the growth rate cumulate into large level effects down the line.  But the mutual fund industry in the US is $18 trillion (!!).

So let's continue to argue about methodological issues and about how best to understand the response of aggregate output to warming.  More posts from us on that soon.  But, please, can we stop applying the sniff test to these results?  Our noses work less well than we think.

Monday, August 26, 2019

Do GDP growth rates have trends? Evidence from GDP growth data

Over the years, there have been a handful of critiques of some of our prior research on GDP growth centering on the concern that we model country-specific time trends in growth rates. This concern seems to be experiencing a resurgence, due in part to an RFF working paper by Richard Newell, Brian Prest, and Steven Sexton which claims that including trends in GDP growth regressions is "overfitting" (I'll probably post about this paper in greater depth later). In addition, Richard Rosen recently posted a PNAS comment asserting that this approach was obviously flawed, and Richard Tol has been tweeting something similar.

So is it obviously wrong to account for trends in growth rates when looking at GDP growth data?  One way to settle this would be to look at some GDP growth data.

Grabbing the GDP growth data from our 2015 paper, I write a single (very boring) command that regresses GDP per capita growth rates for the USA on time and a constant (for the period 1960-2013).
reg TotGDPgrowthCap year if iso == "USA"
The result shows that GDP growth for the USA has been declining by a steady 0.043% per year per year (p-value = 0.018):


So it certainly seems that growth is unlikely to be constant over time. (Remember that a trend in growth rates means that log(income) over time is not linear, but curved.) Newell et al. advocate for treating the above time series as if it is centered around a constant (not a downward trend), which seems at odds with the data itself, the macro-economic theory of convergence, as well as the findings of the Newell et al. analysis itself (more on that in the later post). Nowhere in the Newell et al. paper, the Rosen comment, or the Tol tweets did the authors look at the data and ask if there was statistical evidence of a trend.
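For readers without Stata, the same kind of check can be sketched in plain Python. The snippet below is only an illustration: it runs on a synthetic series (the 3% starting value and the noise level are made up, and the -0.043 slope is borrowed from the estimate above), not on the data from the 2015 paper, and `ols_trend` is a hypothetical helper name.

```python
import math
import random


def ols_trend(years, growth):
    """OLS of growth on a constant and year.

    Returns (slope, standard error of the slope, t-statistic).
    """
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(growth) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, growth))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(years, growth))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se, slope / se


# SYNTHETIC series, not the actual data: growth (in %) drifting down by
# 0.043 per year from an assumed 3% in 1960, plus Gaussian noise.
random.seed(0)
years = list(range(1960, 2014))
growth = [3.0 - 0.043 * (y - 1960) + random.gauss(0, 1) for y in years]

slope, se, t_stat = ols_trend(years, growth)
print(f"trend = {slope:.3f} per year, se = {se:.3f}, t = {t_stat:.2f}")
```

A t-statistic well away from zero is the evidence of a trend; treating the series as centered on a constant amounts to assuming the slope is zero without testing it.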

The modeling approach in our previous work goes further than this, however. For example, in our 2015 paper we allow each country to exhibit a growth rate that has a (potentially) quadratic country-specific time trend.  This modeling choice was not arbitrary, but actually resulted from looking at our data fairly carefully.

First, countries sometimes have trends in growth that are not linear. As shown above, the trend in USA growth is extremely linear, so in a quadratic model the coefficient on the quadratic term will basically be zero and have no effect on the model (it's precisely -2.89e-06 if you try it with the USA data). But in other cases, the trend is clearly not linear. Take China for example: economic policies in China have changed dramatically over the last several decades, and the rate of these changes has itself not been steady, so the rate of changes in growth rates might itself change (implying curvature in a trend). This turns out to be reasonable intuition, and the quadratic term in a China-GDP time series regression proves to be nonzero and significant.
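To see why adding a quadratic term is harmless when the trend really is linear, here is a small sketch in plain Python. Both series are synthetic and noise-free, with made-up coefficients (they are not the USA or China data), and `fit_quadratic` is an illustrative helper that solves the least-squares normal equations directly.

```python
def fit_quadratic(t, y):
    """Least-squares fit of y = a + b*t + c*t**2 via the normal equations."""
    # Build X'X (3x3) and X'y (3x1) for the regressors 1, t, t^2.
    rows = [[1.0, ti, ti * ti] for ti in t]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    # Solve by Gaussian elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, 3):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, 4):
                aug[row][c] -= factor * aug[col][c]
    # Back-substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        known = sum(aug[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (aug[i][3] - known) / aug[i][i]
    return coef  # [a, b, c]


# Time in years since the sample midpoint, which keeps the normal
# equations well conditioned.
t = [year - 1986.5 for year in range(1960, 2014)]

# SYNTHETIC, noise-free series (illustrative numbers only).
linear_growth = [2.0 - 0.043 * ti for ti in t]                  # straight-line trend
curved_growth = [6.0 + 0.10 * ti + 0.004 * ti ** 2 for ti in t]  # bending trend

_, _, c_linear = fit_quadratic(t, linear_growth)
_, _, c_curved = fit_quadratic(t, curved_growth)
print(f"quadratic coefficient, linear series: {c_linear:.2e}")
print(f"quadratic coefficient, curved series: {c_curved:.4f}")
```

The quadratic coefficient is numerically zero for the straight-line series and recovers the curvature for the bending one, so the extra term only matters where the data call for it.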

Second, different countries have different trends in growth, so it's important that trends in growth are country-specific. This too is pretty obvious if one looks at GDP growth data. For example, India's growth rate has been increasing over time (0.1% per year per year, p-value < 0.001) while the USA's growth rate has been declining (-0.043% per year per year, p-value = 0.018). A simple F-test strongly rejects the hypothesis that they are the same trend (p-value < 0.001). Newell et al. and others seem to advocate for models that assume trends are the same across all (or many) countries without testing if there is statistical evidence that these trends are different. Halvard Buhaug made a similar error in his critique of some of Marshall and David's work on climate and conflict (Kyle Meng and I demonstrated the single-command F-test Buhaug should have used in a PNAS paper).  Here are plots from four major economies demonstrating the diversity of trends that show up very clearly in the data:



Taken together, the data suggest that there is strong evidence that (1) there are nonzero trends in GDP growth, (2) these trends are not the same across countries, (3) these trends are not linear in many cases.
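Point (2) is the kind of claim an F-test is built for. The following is a rough sketch of that test in plain Python, comparing a model with country-specific intercepts and one shared linear trend against a model where each country gets its own trend. The two series are synthetic (the slopes are borrowed from the India and USA estimates above; the intercepts and noise are invented), and all function names are hypothetical.

```python
import random


def demean(values):
    """Subtract the mean, which absorbs a group-specific intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def rss_common_slope(groups):
    """RSS with country-specific intercepts but one shared trend."""
    pairs = [(x, y) for xs, ys in groups
             for x, y in zip(demean(xs), demean(ys))]
    slope = sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)
    return sum((y - slope * x) ** 2 for x, y in pairs)


def rss_separate_slopes(groups):
    """RSS with country-specific intercepts and country-specific trends."""
    return sum(rss_common_slope([g]) for g in groups)


def f_stat_equal_trends(groups):
    """F-statistic for H0: all groups share the same linear trend."""
    n_obs = sum(len(xs) for xs, _ in groups)
    k = len(groups)
    rss_r = rss_common_slope(groups)     # restricted model
    rss_u = rss_separate_slopes(groups)  # unrestricted model
    df_num, df_den = k - 1, n_obs - 2 * k
    return ((rss_r - rss_u) / df_num) / (rss_u / df_den), df_num, df_den


# SYNTHETIC series, not the actual data: one rising and one falling trend.
random.seed(1)
years = list(range(1960, 2014))
rising = [1.0 + 0.100 * (y - 1960) + random.gauss(0, 1) for y in years]
falling = [3.0 - 0.043 * (y - 1960) + random.gauss(0, 1) for y in years]

f_stat, df_num, df_den = f_stat_equal_trends([(years, rising), (years, falling)])
print(f"F({df_num}, {df_den}) = {f_stat:.1f}")
```

With 54 years for each of two countries the test has 1 and 104 degrees of freedom, for which the 5% critical value is roughly 3.9; a statistic far above that rejects the common-trend assumption.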

The safest way to deal with these facts (at least in the climate-economy contexts that we study) is to allow trends in growth rates to be modeled flexibly so that they can be different across countries and they can be curved if the data suggest they should be. There are multiple ways to do that, and country-specific quadratic trends are one straightforward way to do it. If the trends are actually linear, they will be modeled as basically linear (statistical software may actually drop the quadratic term if trends are linear enough). And if they are the same across countries, they will be estimated to be the same. But if they are in fact different across countries, or curved over time, the model will not erroneously assume they are homogeneous and constant.

I think the meta-moral of the story is to simply look at your data and listen to what it is telling you.  F-tests and the "plot" command are both compelling and often underutilized.