
Thursday, July 6, 2017

Yesterday's maximum temperature is... today's maximum temperature? (Guest post by Patrick Baylis)

[For you climate-data-wrangling nerds out there, today we bring you a guest post by Patrick Baylis, current Stanford postdoc and soon-to-be assistant prof at UBC this fall. ]
You may not know this, but Kahlil Gibran was actually thinking about weather data when he wrote that yesterday is but today’s memory, and tomorrow is today’s dream. (Okay, not really.)
Bad literary references aside, readers of this blog know that climate economists project the impacts of climate change by observing the relationships between historical weather realizations and economic outcomes. Fellow former ARE PhD Kendon Bell alerted me to an idiosyncrasy in one of the weather datasets we frequently use in our analyses. Since many of us (myself included) rely on high-quality daily weather data to do our work, I investigated. This post is a fairly deep dive into what I learned, so if you happen not to be interested in the minutiae of daily weather data, consider yourself warned.
The PRISM AN81-d dataset provides daily minimum and maximum temperature, precipitation, and minimum and maximum vapor pressure deficit for the continental United States from 1981 to the present. It is created by the PRISM Climate Group at Oregon State, and it is really nice. Why? It's a gridded data product: it is composed of hundreds of thousands of 4km by 4km grid cells, where the value for each cell is determined by a complex interpolation method from weather station data (GHCN-D) that accounts for topographic factors. Importantly, it's consistent: there are no discontinuous jumps in the data (see figure below). And it's a balanced panel: the observations are never missing.
[Figure: PRISM 30-year normals. Source: PRISM Climate Group]
These benefits are well-understood, and as a result many researchers use the PRISM dataset for their statistical models. However, there is a particularity of these data that may be important to researchers making use of the daily variation in the data: most measurements of temperature maximums, and some measurements of temperature minimums, actually refer to the maximum or minimum temperature of the day before the date listed.
To understand this, you have to understand that daily climate observations are actually summaries of many within-day observations. The reported maximum and minimum temperatures are just the maximum and minimum of the observations within a given period, like a day. The tricky part is that stations define a "day" as "the 24 hours since I previously recorded the daily summary", but not all stations record their summaries at the same time. While most U.S. stations record in the morning (i.e., "morning observers"), a hefty proportion are afternoon or evening observers. PRISM aggregates data from these daily summaries, but in order to ensure consistency it tries to incorporate only morning observers. This leads to the notion of a "PRISM day", which the PRISM documentation defines as:
Station data used in AN81d are screened for adherence to a "PRISM day" criterion. A PRISM day is defined as 1200 UTC-1200 UTC (e.g., 7 AM-7 AM EST), which is the same as [the National Weather Service's hydrologic day]. Once-per-day observation times must fall within +/- 4 hours of the PRISM day to be included in the AN81d tmax and tmin datasets. Stations without reported observation times in the NCEI GHCN-D database are currently assumed to adhere to the PRISM day criterion. The dataset uses a day-ending naming convention, e.g., a day ending at 1200 UTC on 1 January is labeled 1 January.
This definition means that generally only morning observers should be included in the data. The last sentence is important: because a day runs from 4am-4am PST (or 7am-7am EST) and because days are labeled using the endpoint of that time period, most of the observations from which the daily measures are constructed for a given date are taken from the day prior. A diagram may be helpful here:
[Diagram: within-day temperature observations over roughly two days, showing midnights, the PRISM day cutoff, and the true daily maximums and minimums]
The above is a plot of temperature over about two days, representing a possible set of within-day monitor data. Let's say that this station takes a morning reading at 7am PST (10am EST), meaning that it would be included in the PRISM dataset. The top x-axis is the actual date, while the bottom x-axis shows which observations are used under the PRISM day definition. The red lines are actual midnights, the dark green dotted line is the PRISM day cutoff, and the orange (blue) dots are the observations that represent the true maximums (minimums) of each calendar day. Because of the definition of a PRISM day, the maximum temperatures ("tmaxes" from here on out) given for Tuesday and Wednesday in PRISM are actually observations recorded on Monday and Tuesday, respectively. Meanwhile, the minimum temperature ("tmin") given for Tuesday in PRISM is indeed drawn from Tuesday, but the tmin given for Wednesday is also from Tuesday.
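If it helps to see that bookkeeping in code, here is a minimal sketch (my own illustration, not anything from the PRISM group) of how an observation timestamp maps onto a PRISM day label under the day-ending convention:

    from datetime import datetime, timedelta

    def prism_day(obs_time_utc):
        """Return the PRISM day label (a date) for an observation time in UTC.

        A PRISM day runs from 1200 UTC to 1200 UTC and is labeled with the
        date on which it ends, so anything observed at or after 1200 UTC is
        credited to the next calendar date.
        """
        if obs_time_utc.hour >= 12:
            return (obs_time_utc + timedelta(days=1)).date()
        return obs_time_utc.date()

    # A hot Monday afternoon: 3pm CDT on 2017-07-03 is 2000 UTC, so the
    # observation counts toward PRISM day 2017-07-04 (Tuesday).
    print(prism_day(datetime(2017, 7, 3, 20, 0)))  # -> 2017-07-04

This is exactly the bookkeeping that makes Tuesday's PRISM tmax mostly a Monday measurement.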
To see this visually, I pulled the GHCN data and computed the average reporting time for the stations that report an observation time (66% of stations in the United States). The histogram below shows average observation time (in UTC) by station for all GHCN-D stations in the continental United States, colored by whether or not they would be included in PRISM according to the guidelines above.
[Figure: histogram of average observation time by station (UTC), colored by PRISM inclusion]
This confirms what I asserted above: most, but not all, GHCN-D stations are morning observers, and the PRISM day definition does a good job capturing the bulk of that distribution. On average, stations fulfilling the PRISM criterion report at 7:25am or so.
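For completeness, the inclusion rule itself boils down to a one-line check. Here's a rough paraphrase of the documentation in code (mine, not PRISM's):

    def meets_prism_criterion(obs_hour_utc, reports_time=True):
        """Rough check of the PRISM day criterion for a once-per-day observer.

        Observation times must fall within +/- 4 hours of 1200 UTC (i.e.,
        0800-1600 UTC); stations with no reported observation time in
        GHCN-D are assumed to comply.
        """
        if not reports_time:
            return True  # assumed to adhere, per the PRISM documentation
        return 8 <= obs_hour_utc <= 16

    print(meets_prism_criterion(12))  # 7am EST morning observer -> True
    print(meets_prism_criterion(23))  # 6pm EST evening observer -> False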
The next step is to look empirically at how many minimum and maximum temperature observations are likely to fall before or after the observation time cutoff. To do that, we need some raw weather station data, which I pulled from NOAA's Quality Controlled Local Climatological Data (QCLCD). To get a sense of which extreme temperatures would be reported as occurring on the actual day they occurred, I assumed that all stations report at 7:25am, the average observation time in the PRISM dataset. The next two figures show histograms of the times of day at which the daily maximum and minimum temperatures were observed.
[Figures: histograms of the times of day at which the daily maximum and minimum temperatures were observed]
I’ve colored the histograms so that all extremes (tmins and tmaxes) after 7:25am are red, indicating that extremes after that time will be reported as occurring during the following day. As expected, the vast majority of tmaxes (>94%) occur after 7:25am. But surprisingly, a good portion (32%) of tmins do as well. If you’re concerned about the large number of minimum temperature observations around midnight, remember that a midnight-to-midnight summary is likely to have this sort of “bump”, since days with a warmer-than-usual morning and a colder-than-usual night will have their lowest temperature at the end of the calendar day.
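If you want to reproduce this kind of check yourself, the calculation is roughly the following. This is a hedged pandas sketch with a tiny made-up stand-in for the QCLCD pull (the real data have millions of sub-daily rows, and the column names here are my own):

    from datetime import time
    import pandas as pd

    # Made-up stand-in for the QCLCD sub-daily observations.
    qclcd = pd.DataFrame({
        "station": ["KDSM"] * 4,
        "local_time": pd.to_datetime(["2013-07-01 04:00", "2013-07-01 07:00",
                                      "2013-07-01 15:00", "2013-07-01 23:00"]),
        "temp": [68, 71, 94, 77],
    })

    CUTOFF = time(7, 25)  # assumed average observation time
    qclcd["date"] = qclcd["local_time"].dt.date

    def extreme_after_cutoff(day, find_max=True):
        """True if the day's extreme falls after the assumed observation
        time, i.e. it would be credited to the following day's summary."""
        idx = day["temp"].idxmax() if find_max else day["temp"].idxmin()
        return day.loc[idx, "local_time"].time() > CUTOFF

    by_day = qclcd.groupby(["station", "date"])
    print(by_day.apply(extreme_after_cutoff, find_max=True).mean())   # share of late tmaxes
    print(by_day.apply(extreme_after_cutoff, find_max=False).mean())  # share of late tmins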
As a more direct check, I compared PRISM leads of tmin and tmax to daily aggregates (that I computed using a local calendar date definition) of the raw QCLCD data described above. The table below shows the pairwise correlations between the PRISM day-of observations, leads (next day), and the QCLCD data for both maximum and minimum daily temperature.
Measure            PRISM day-of    PRISM lead
tmin (calendar)    0.962           0.978
tmax (calendar)    0.934           0.992
As you can see, the PRISM leads, i.e., observations from the next day, correlate more strongly with my aggregated data. The difference is substantial for tmax, as expected. The result for tmin is more surprising: it also correlates more strongly with the PRISM tmin lead. I'm not quite sure what to make of this - it may be that the stations that fail to report their observation times and the occasions when the minimum temperature occurs after the station observation time combine to make the lead of tmin correlate more closely with the local daily summaries I've computed. But I'd love to hear other explanations.
So who should be concerned about this? Mostly, researchers with econometric models that use daily variation in temperature on the right-hand side and fairly high-frequency variables on the left-hand side. The PRISM group isn't doing anything wrong, and I'm sure that folks who specialize in working with weather datasets are very familiar with this particular limitation. Their definition matches a widely used convention for summarizing daily weather observations, and presumably they've thought carefully about the consequences of this definition and of including more data from stations that don't report their observation times. But researchers who, like me, are not specialists in using meteorological data and who, like me, use PRISM to examine daily relationships between weather and economic outcomes should tread carefully.
As is, using the PRISM daily tmax data amounts to estimating a model with lagged rather than day-of temperature. A quick fix, particularly for models that include only maximum temperature, is to simply use the leads, i.e., the observed weather for the next day, since it will almost always reflect the maximum temperature for the day of interest. A less-quick fix is to make use of the whole distribution using the raw monitor data, but then you would lose the nice gridded quality of the PRISM data. Models with average or minimum temperature should, at the very least, be tested for robustness using the lead values.
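In the gridded panel, that quick fix amounts to a one-line shift within each grid cell. A minimal sketch using a made-up long-format extract of the PRISM data (the column names are my own):

    import pandas as pd

    # Made-up long-format PRISM extract: one row per grid cell and date.
    prism = pd.DataFrame({
        "cell": ["A", "A", "A", "B", "B", "B"],
        "date": pd.to_datetime(["2017-07-03", "2017-07-04", "2017-07-05"] * 2),
        "tmax": [31.0, 33.5, 29.0, 27.5, 30.0, 26.0],
    })

    # The quick fix: treat tomorrow's labeled tmax as today's actual tmax.
    prism = prism.sort_values(["cell", "date"])
    prism["tmax_fixed"] = prism.groupby("cell")["tmax"].shift(-1)
    print(prism)

The last day of the sample becomes missing, which is usually a small price to pay relative to attributing most maximums to the wrong day. Let's all do Gibran proud.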

Tuesday, August 18, 2015

Daily or monthly weather data?

We’ve had a few really hot days here in California. It won’t surprise readers of this blog to know the heat has made Marshall unusually violent and Sol unusually unproductive. They practice what they preach. Apart from that, it’s gotten me thinking back to a common issue in our line of work - getting “good” measures of heat exposure. It’s become quite popular to be as precise as possible in doing this – using daily or even hourly measures of temperature to construct things like ‘extreme degree days’ or ‘killing degree days’ (I don’t really like the latter term, but that’s beside the point for now).

I’m all for precision when it is possible, but the reality is that in many parts of the world we still don’t have good daily measures of temperature, at least not for many locations. But in many cases there are more reliable measures of monthly than daily temperatures. For example, the CRU has gridded time series of monthly average max and min temperature at 0.5 degree resolution.

It seems a common view is that you can’t expect to do too well with these “coarse” temporal aggregates. But I’m going to go out on a limb and say that sometimes you can. Or at least I think the difference has been overblown, probably because many of the comparisons between monthly and daily weather show the latter working much better. But I think it’s overlooked that most comparisons of regressions using monthly and daily measures of heat have not been a fair fight.

What do I mean? On the one hand, you typically have the daily or hourly measures of heat, such as extreme degree days (EDD) or temperature exposure in individual bins of temperature, entering some fancy-pants model that fits a spline or some other flexible function to capture all sorts of nonlinearities and asymmetries. Then on the other hand, for comparison, you have a model with a quadratic response to growing-season average temperature. I'm not trying to belittle the fancy approaches (I bin just as much as the next guy), but we should at least give the monthly data a fighting chance. We often restrict them to growing-season rather than monthly averages, often use average daily temperatures rather than average maximums and minimums, and, most importantly, often impose symmetry by using a quadratic. Maybe this is just out of habit, or maybe it's the soft bigotry of low expectations for those poor monthly data.

As an example, suppose, as we’ve discussed in various other posts, that the best predictor of corn yields in the U.S. is exposure to very high temperatures during July. In particular, suppose that degree days above 30°C (EDD) is the best. Below I show the correlation of this daily measure for a site in Iowa with various growing season and monthly averages. You can see that average season temperature isn’t so good, but July average is a bit better, and July average daily maximum even better. In other words, if a month has a lot of really hot days, then that month's average daily maximum is likely to be pretty high.



You can also see that the relationship isn't exactly linear. So a model of yields vs. any of these monthly or growing-season averages likely wouldn't do as well as EDD if the monthly data entered as a linear or quadratic response. But as I described in an old post that I'm pretty sure no one has ever read, one can instead define simple asymmetric hinge functions based on monthly temperature and rainfall. In the case of U.S. corn, I suggested these three based on a model fit to simulated data:


This is now what I'd consider more of a fair fight between daily and monthly data. The table below is from what I posted before. It compares the out-of-sample skill of a model using two daily-based measures (GDD and EDD) to a model using the three monthly-based hinge functions above. Both models include county fixed effects and quadratic time trends. In this particular case, the monthly model (3) even works slightly better than the daily model (2). I suspect the fact that it's even better relates less to the temperature terms than to the fact that model (2) uses a quadratic in growing-season rainfall, which is probably less appropriate than the more asymmetric hinge function - which says yields respond positively up to 450mm of rain and are flat afterwards.


Model    Calibration R2    Avg. RMSE (calibration)    Avg. RMSE (out-of-sample, 500 runs)    % reduction in out-of-sample error
1        0.59              0.270                      0.285                                  --
2        0.66              0.241                      0.259                                  8.9
3*       0.68              0.235                      0.254                                  10.7
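As an aside, here is a minimal sketch of how hinge terms like these can be built from monthly data. The 450mm rainfall cap is the one discussed above; the temperature threshold is just an illustrative placeholder, not a value from the fitted model:

    # Hinge ("hockey stick") terms built from monthly aggregates.
    def hinge_above(x, threshold):
        """Linear above the threshold, flat (zero) below it."""
        return max(x - threshold, 0.0)

    def capped(x, cap):
        """Linear up to the cap, flat afterwards."""
        return min(x, cap)

    # Example: July average daily maximum of 31C, season rainfall of 520mm.
    heat_term = hinge_above(31.0, 29.0)  # placeholder 29C threshold -> 2.0
    rain_term = capped(520.0, 450.0)     # no extra credit past 450mm -> 450.0

    # These terms then enter the regression linearly, so the fitted response
    # to the underlying variable is piecewise linear and asymmetric.
    print(heat_term, rain_term)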


Overall, the point is that monthly data may not be so much worse than daily for many applications. I’m sure we can find some examples where it is, but in many important examples it won’t be. I think this is good news given how often we can’t get good daily data. Of course, there’s a chance the heat is making me crazy and I’m wrong about all this. Hopefully at least I've provoked the others to post some counter-examples. There's nothing like a good old fashioned conflict on a hot day.


Saturday, April 12, 2014

Daily weather data: original vs knock-off

Any study that focuses on nonlinear temperature effects requires precise estimates of the temperature distribution.  Unfortunately, most gridded weather data sets only give monthly estimates (e.g., CRU, University of Delaware, and, up until recently, PRISM).  Monthly averages can hide extremes - both hot and cold: monthly means don't capture how often and by how much temperatures pass a certain threshold.

At the time Michael Roberts and I wrote our article on nonlinear temperature effects in agriculture, the PRISM climate group only made its monthly aggregates publicly available for download, but not the underlying daily data.  In the end, we reverse-engineered the PRISM interpolation algorithm: we regressed monthly averages at each PRISM grid cell on monthly averages at the closest publicly available weather stations (7 or 10, depending on the version).  Once we had the regression estimates linking monthly PRISM averages to weather stations, we bravely applied them to the daily weather data at the stations to get daily data at the PRISM cells (for more detail, see the paper).  Cross-validation suggested we weren't that far off, but then again, we could only do cross-validation tests in areas that have weather stations.
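To give a flavor of that reverse-engineering, here is a stylized sketch for a single PRISM cell using synthetic data (the actual station selection, missing-data handling, and estimation details are in the paper):

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic example: one PRISM cell and its 7 closest stations.
    n_months, n_days, k = 300, 9000, 7
    monthly_stations = rng.normal(20, 8, size=(n_months, k))  # station monthly means
    true_w = rng.dirichlet(np.ones(k))                        # unknown interpolation weights
    monthly_prism = 1.0 + monthly_stations @ true_w           # the cell's monthly means

    # Step 1: regress monthly PRISM values on monthly station values (with intercept).
    X = np.column_stack([np.ones(n_months), monthly_stations])
    coef, *_ = np.linalg.lstsq(X, monthly_prism, rcond=None)

    # Step 2: apply the estimated coefficients to *daily* station data to get
    # a daily series for the cell.
    daily_stations = rng.normal(20, 10, size=(n_days, k))
    daily_cell = np.column_stack([np.ones(n_days), daily_stations]) @ coef
    print(daily_cell[:5])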

Recently, the PRISM climate group made their daily data available from the 1980s onwards.  I finally got a chance to download them and compare them to the daily data we had previously constructed from monthly averages.  This was quite a nerve-wracking exercise: how far off were we, and does it change the results - or, in the worst case, did I screw up the code and get garbage for our previous paper?

Below is a table that summarizes PRISM's daily data for the growing season (April-September) in all counties east of the 100th meridian, except Florida, that grow either corn or soybeans - basically the set of counties we had used in our study (small change: our study used 1980-2005, but since PRISM's daily data are only available from 1981 onwards, the tables below use 1981-2012). The summary statistics are:

First sigh of relief! It looks like the numbers are rather close (strangely enough, the biggest deviations seem to be for precipitation, yet there we had used PRISM's monthly aggregates to derive season totals and did not rely on any interpolation, so the new daily PRISM data must be a bit different from the old monthly PRISM data). Also, recall from a recent post that looked at the NARR data that degree days above 29°C can differ a lot between data sets, as small differences in the daily maximum temperature will give vastly different results.

Next, I plugged both data sets into a panel of corn and soybean yields to see which one explains those yields better (i) in sample and (ii) out of sample.  I used models with only temperature variables (columns a and b) as well as models with the same four weather variables we used before (columns c and d). PRISM's daily data are used in columns a and c; our re-engineered data are in columns b and d:

Second sigh of relief: it seems to be rather close again. In all four comparisons, (1b) to (1a), (1d) to (1c), (2b) to (2a), and (2d) to (2c), our reconstruction for some strange reason has a larger in-sample R-square.  The reduction in RMSE is given in the second row of the footer: it is the reduction in out-of-sample prediction error compared to a model with no weather variables. I take 1000 random draws of 80% of the data as the estimation sample and derive the prediction error for the remaining 20%; the number shown is the average over the 1000 draws. For RMSE reductions, the picture is mixed: for the corn models that only include the two degree days variables, the PRISM daily data does slightly better, while the reverse is true for soybeans.  In the models that also include precipitation, season-total precipitation seems to do better when constructed by adding up the monthly PRISM totals (columns d) rather than the new daily PRISM precipitation totals (columns c).
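For anyone who wants to replicate the out-of-sample exercise, the mechanics are simple.  Below is a minimal sketch with synthetic stand-ins for the yield panel (the real models also include county fixed effects and quadratic time trends):

    import numpy as np

    def oos_rmse(X, y, n_draws=1000, train_share=0.8, seed=0):
        """Average out-of-sample RMSE of an OLS fit over repeated random
        train/test splits of the observations."""
        rng = np.random.default_rng(seed)
        n = len(y)
        Xd = np.column_stack([np.ones(n), X])  # add an intercept
        rmses = []
        for _ in range(n_draws):
            idx = rng.permutation(n)
            split = int(train_share * n)
            train, test = idx[:split], idx[split:]
            beta, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
            resid = y[test] - Xd[test] @ beta
            rmses.append(np.sqrt(np.mean(resid ** 2)))
        return float(np.mean(rmses))

    # Synthetic stand-ins: two weather regressors and log yields.
    rng = np.random.default_rng(1)
    X_weather = rng.normal(size=(500, 2))
    y = 0.5 * X_weather[:, 0] - 0.3 * X_weather[:, 1] + rng.normal(size=500)

    baseline = oos_rmse(np.empty((500, 0)), y)  # model with no weather variables
    print(100 * (1 - oos_rmse(X_weather, y) / baseline))  # % reduction in RMSE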

Finally, since the data we constructed are a knock-off, how can they do better than the original in some cases?  My wild guess (and this is really only speculation) is that we took great care in filling in missing data for weather stations to get a balanced panel.  That way we ensured that year-to-year fluctuations are not due to the fact that one averages over a different set of stations.  I am not aware of how exactly PRISM deals with missing weather station data.