Following up on yesterday's post, let's look at the differences in how degree days are calculated. Recall that degree days simply count the number of degrees above a threshold and sum them over the growing season. Massetti et al. argue in their abstract that "The paper shows that [...] hypotheses of the degree day literature fail when accurate measures of degree days are used." This claim rests on the premise that Massetti et al. use better data and hence get more accurate readings of degree days; however, no empirical evidence for this is provided. They use data from the North American Regional Reanalysis (NARR), which provides temperatures at 3-hour intervals. The authors first calculate average temperatures for each day from the eight readings per day, and then calculate degree days as the difference between the average temperature and the threshold.
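To make the degree day definition concrete, here is a minimal sketch of the daily-average approach described above. The temperatures and threshold are made-up illustrative values, not data from either paper:

```python
# Hypothetical illustration: degree days computed from daily *average*
# temperatures, as Massetti et al. reportedly do. Temperatures in Celsius;
# all values below are made up.
daily_avg_temps = [7.5, 9.0, 11.2, 14.8, 10.1]  # one value per day
threshold = 10.0  # base temperature for growing degree days

# A day contributes max(T_avg - threshold, 0) degree days; sum over the season.
degree_days = sum(max(t - threshold, 0.0) for t in daily_avg_temps)
print(round(degree_days, 1))
```

Note that days 1, 2, and 5 contribute nothing under this method even though parts of those days may well have been above the threshold, which is exactly the issue discussed below.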
Before we compare their method of calculating degree days to ours, a few words on the NARR data. Reanalysis data combine observational data with differential equations from physical models to interpolate data. For example, they impose mass and energy balance, i.e., a certain amount of moisture can only fall once as precipitation. If precipitation comes down in one grid cell, it can't also come down in a neighboring one. On the plus side, the physical models construct an entire series of variables (solar radiation, dew point, fluxes, etc.) that normal weather stations do not measure. On the downside, the imposed differential equations that relate all weather measures imply that interpolated data do not always match actual observations.
So how do the degree days in Massetti et al. compare to ours? Here's a little detour on degree days - this is a bit technical and dry, so please be patient. The first statistical study my coauthors and I published using degree days in 2006 used monthly temperature data, since we did not have daily temperature data at the time. Since degree days depend on how many times a temperature threshold is passed, monthly averages can be a challenge, as a temporal average will hide how many times a threshold is passed. The literature has gotten around this problem by estimating an empirical link between the standard deviation in daily and monthly temperatures, called Thom's formula. We used this formula to derive fluctuations in average daily temperatures, and from those, degree days.
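The general idea behind Thom-style approximations can be sketched as follows. This is not Thom's exact published formula, just the underlying normal-approximation logic: if daily average temperatures within a month are treated as draws from a normal distribution with the monthly mean and an estimated standard deviation, expected degree days have a closed form (the expectation of a truncated normal):

```python
import math

def expected_degree_days(mu, sigma, base, n_days=30):
    """Expected degree days above `base` over `n_days`, assuming daily
    average temperatures are Normal(mu, sigma). A sketch of the
    normal-approximation idea behind Thom-style formulas, not the
    exact expression used in the literature."""
    z = (mu - base) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    per_day = sigma * pdf + (mu - base) * cdf  # E[max(T - base, 0)]
    return n_days * per_day

# Even when the monthly mean (8C) sits below the threshold (10C),
# day-to-day fluctuations push some days across it:
print(round(expected_degree_days(mu=8.0, sigma=3.0, base=10.0), 1))
```

This is exactly why a monthly average alone is not enough: the degree days depend on the spread of daily temperatures around that average, which is what Thom's formula supplies.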
The interpolation of the temperature distribution when only knowing monthly averages is certainly not ideal, and we hence went to great lengths to better approximate the temperature distribution. All of my subsequent work with various coauthors not only looked at the distribution of daily average temperatures within a month, but went one step further by looking at the temperature distribution within a day. The rationale is that even if the daily average temperature does not cross a threshold, the daily maximum might. We interpolated daily maximum and minimum temperatures and fit a sinusoidal curve between the two to approximate the distribution within a day (see Snyder). This is again an interpolation and might have its own pitfalls, but one can empirically test whether it improves predictive power, which we did and will do for part 3 of this series.
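A rough sketch of the within-day idea looks like this. The sine fit and numerical integration below are a simplified stand-in for the Snyder-style interpolation described above, not the exact implementation from our papers:

```python
import math

def sine_degree_days(tmin, tmax, threshold, steps=288):
    """Approximate one day's degree days above `threshold` by fitting a
    sine curve between the daily minimum and maximum and integrating
    numerically. A simplified sketch of the within-day interpolation
    described in the text, not the authors' exact procedure."""
    mean = (tmax + tmin) / 2.0
    amp = (tmax - tmin) / 2.0
    total = 0.0
    for i in range(steps):
        t = 2.0 * math.pi * i / steps       # one full sine cycle per day
        temp = mean + amp * math.sin(t)     # within-day temperature path
        total += max(temp - threshold, 0.0)
    return total / steps                    # average excess over the day

# The daily mean is 27.5C -- below a 29C threshold -- yet the afternoon
# peak of 35C still generates extreme-heat degree days:
print(round(sine_degree_days(tmin=20.0, tmax=35.0, threshold=29.0), 2))
```

A daily-average method would record zero extreme-heat degree days for this day; the within-day curve captures the hours spent above the threshold.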
Here is my beef with Massetti et al.: our subsequent work in 2008 showed that calculating degree days using the within-day distribution of temperatures is much better. We even emphasized that in a panel setting average temperatures perform better than degree days derived using Thom's formula (but not in the cross-section, as Thom's approximation works much better at getting the average number of degree days correct than the year-to-year fluctuations around that mean). What I find disingenuous about the Massetti et al. paper is that it makes a general statement comparing degree days to average temperature, yet only discusses the inferior approach of calculating degree days using Thom's formula. What makes things worse is that we shared with them our "better" degree days data that uses the within-day distribution (which they acknowledge).
Unfortunately, Massetti et al. decided not to share their data with us, so the analysis below uses our construction of their variables. We downloaded surface temperature from NARR. The reanalysis data provide temperature readings at several altitude levels above ground, and in general, the higher the reading above the ground, the lower the temperature, which results in lower degree day numbers.
The following table constructs degree days for counties east of the 100 degree meridian in various ways. Columns (1a)-(3b) use the NARR data, while column (4b) uses our 2008 procedure. Columns (a) use the approach of Massetti et al. and derive the climate in a county as the inverse-distance weighted average of the four NARR grids surrounding the county centroid. Columns (b) calculate degree days for each 2.5x2.5 mile PRISM grid within a county (squared inverse-distance weighted average of all NARR grids over the US) and derive the county aggregate as the weighted average of all grids, where the weight is proportional to the cropland area in each grid. Results don't differ much between (a) and (b).
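The spatial aggregation in columns (a) can be sketched as follows. The coordinates and temperatures are made-up illustrative numbers; the function simply implements inverse-distance weighting (with `power=2` giving the squared version used for the PRISM grids in columns (b)):

```python
# Sketch of the inverse-distance weighting described above: the value at a
# county centroid is a weighted average of the four surrounding NARR grid
# cells, with weights proportional to 1/distance. All numbers are made up.
def idw(points, target, power=1):
    """points: list of ((x, y), value); target: (x, y) location.
    power=1 gives inverse-distance weights, power=2 squared inverse-distance.
    Assumes the target does not coincide exactly with a grid point."""
    num = den = 0.0
    for (x, y), value in points:
        d = ((x - target[0]) ** 2 + (y - target[1]) ** 2) ** 0.5
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Four hypothetical grid-cell temperatures around a county centroid:
grids = [((0, 0), 20.0), ((1, 0), 22.0), ((0, 1), 21.0), ((1, 1), 23.0)]
centroid = (0.25, 0.25)
print(round(idw(grids, centroid), 2))
```

The nearest cell dominates the average, so the result lands close to 20C rather than the simple mean of 21.5C.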
Columns (1a)-(1b) follow Massetti et al. and first derive average daily temperatures, then calculate degree days from those daily averages, i.e., degree days are only positive if the daily average exceeds the threshold. Columns (2a)-(2b) calculate degree days from each 3-hour reading. Degree days will then be positive if part of the temperature distribution is above the threshold, even if the daily average is not. Columns (3a)-(3b) approximate the temperature distribution within a day by linearly interpolating between the 3-hour measures. Column (4b) uses a sinusoidal approximation between the daily minimum and maximum to approximate the temperature distribution within a day.
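The difference between the first two methods is easy to see on a single made-up day of eight 3-hourly readings whose average sits below the threshold even though the afternoon does not:

```python
# Hypothetical 3-hourly readings (eight per day, in Celsius) for one day:
readings = [21.0, 20.0, 22.0, 26.0, 31.0, 33.0, 30.0, 25.0]
threshold = 29.0

# Method of columns (1a)-(1b): degree days from the daily average.
# Zero here, because averaging hides the afternoon peak.
daily_avg = sum(readings) / len(readings)
dd_from_average = max(daily_avg - threshold, 0.0)

# Method of columns (2a)-(2b): degree days from each 3-hour reading,
# each representing 1/8 of a day. This captures the hours above threshold.
dd_from_readings = sum(max(r - threshold, 0.0) for r in readings) / len(readings)

print(round(daily_avg, 1), dd_from_average, round(dd_from_readings, 3))
```

The daily average is 26C, so method (1) records zero extreme-heat degree days, while method (2) records a positive amount from the three readings above 29C. Linear interpolation between readings, as in columns (3a)-(3b), would smooth the path between points but behaves similarly.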
As a result of the lower within-county variation, fluctuations are lower and hence the threshold is passed less often in column (4b). Extreme heat as measured by degree days above 29C or 34C is hence lower when the within-day distribution is used in column (4b) compared to columns (2a)-(3b). There are two possible interpretations: either our data is over-smoothing and hence under-predicting the variance, or NARR has measurement error, which will lead to attenuation bias. We will test both possibilities in part 3 tomorrow.