As a fairly half-assed exerciser, I am always on the lookout for quick/easy/delicious ways to get in shape without having to really do anything. Thus a lot of the exercise articles on NYTimes's "Well" blog are irresistible clickbait for me -- as they are for the presumably millions of other NYT-reading half-assed exercisers who ensure that these articles consistently top the NYT most-read lists. Drink more caffeine and run faster! Take a hot bath and run faster! Get all the exercise you need in only seven minutes! Scratch that -- only four!! Scratch that -- only one!!!
Here's one from this week: Hot weather workout? Try a hot bath beforehand. Triple clickbait for me, as this week is really hot in California, I was hoping to get in some runs, and I certainly enjoy hot baths (+long walks on the beach, white wine, etc). Original study here (can't tell if that's paywalled), published in the Journal of Strength and Conditioning Research. This study takes n=9 (!) people, 8 dudes and 1 woman, and first has them run 5k on a treadmill in a lab where they cranked the temperature to 90F. Then all subjects underwent about a week of heat acclimation, where they pedaled a stationary bike in the hot lab for five 90-minute sessions. Then, finally, they ran the 5k again in the hot lab. Lo and behold, times had improved! The now-heat-acclimated participants shaved an average of about a minute and a half off their 5k times (~6% reduction) when running in hot conditions for the second time.
But wait a sec. You took some mildly-fit runners (5k time of 25min is not exactly East African pace), had them put in almost 8 hours of training, and then tested to see whether they got faster?? As a lazy, mildly-fit runner myself, 8 hours of training constitutes a good month for me, so I would be PISSED if I didn't get faster with that much training. Now, you might say, they were pedaling a bike and not running, and this is indeed what the paper tries to argue in some hard-to-understand phrasing ("Cycling training controlled for performance that could arise from increased training volume were participants to acclimate through running"). But to me it seems pretty unlikely that a lot of biking is going to give you no running benefit at all, and some quick googling found about 1000 blog posts by runner/bikers that claim that it does (and also 1000 that claim that it doesn't, go figure). In any case, clearly this is not anywhere close to the optimal research design you'd want for figuring out the causal effect of training-when-hot on performing-when-hot. Strangely, the "hot bath" part does not appear in the paper at all, but just in an off-hand comment by a researcher quoted by the NYT. So that's weird too.
Or take the one-minute-of-exercise-is-all-you-need study (original study; NYT clickbait). This study takes n=25 "sedentary" men and divides them into three groups: a control group (n=6), a group that does moderate-intensity stationary-bike pedaling for 45min at a time (n=10), and a group that does high-intensity ("all out") stationary-bike pedaling for 3x 20 seconds (n=9). Only 60 total seconds! This happens three times a week for 12 weeks, after which researchers compare various measures of fitness, including how much oxygen you can take up at your peak (i.e. VO2max). Both the moderate (VO2max = 3.2) and intense (VO2max = 3.0) groups had improved significantly upon the control group (VO2max = 2.5), but the post-training VO2max levels in the moderate and intense groups were not statistically different from each other. Hence the paper's exciting title: "Twelve Weeks of Sprint Interval Training Improves Indices of Cardiometabolic Health Similar to Traditional Endurance Training despite a Five-Fold Lower Exercise Volume and Time Commitment", and the NYT clickbait translation: "1 Minute of All-Out Exercise May Have Benefits of 45 Minutes of Moderate Exertion".
But wait a sec. Sample size is n=19 in these treatment groups combined! A quick calculation suggests that, at this sample size, the study is DRAMATICALLY underpowered to find an effect. That is, the study has a high chance of failing to find an effect when there actually is one. I calculate that to detect a significant difference in means of 0.2 between these two groups (on a control group standard deviation of 0.7, given in the table), the study would have needed 388 participants, or about 20x what they had! (This assumes the researchers would want an 80% chance of correctly rejecting the null that the two groups had the same mean; in Stata, if I did it right: power twomeans 3.2 3.0, sd(0.7).) Even reliably detecting a 20% increase in VO2max between the two treated groups would have needed 46 participants, more than twice the number they had. Put more simply: with sample sizes this small, you are very unlikely to find significant differences between two groups, even when differences actually exist. So maybe the moderate exercise group actually did do better, or maybe they didn't, but either way this study can't tell us. The same thing appears to be true of the 4-min-exercise paper (n=26) -- it's way underpowered. And I haven't looked systematically, but my guess is this is true of a lot of the studies they cover that find no effect. Andrew Gelman is always grumping about small studies that find implausibly large effects, but we should probably be just as cautious about believing small-N studies that find no effect.
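For the curious, here is the same back-of-the-envelope power calculation sketched in Python (using statsmodels rather than Stata). The group means (3.2 vs 3.0) and SD (0.7) come from the paper's table; the 5% significance level and 80% power are just the usual conventions, and the two-sample t-test setup is my assumption about what the comparison would look like -- treat it as a sanity check, not a reanalysis.

    # Back-of-the-envelope power calculation for a two-sample t-test.
    # Means (3.2 vs 3.0) and SD (0.7) are taken from the paper's table;
    # alpha = 0.05 and power = 0.80 are the standard conventions.
    import math
    from statsmodels.stats.power import TTestIndPower

    sd = 0.7
    solver = TTestIndPower()

    # Sample size per group to detect the observed 0.2 difference in means:
    d_observed = (3.2 - 3.0) / sd  # standardized effect size (Cohen's d), ~0.29
    n_small = solver.solve_power(effect_size=d_observed, alpha=0.05, power=0.80)
    print(math.ceil(n_small))  # ~194 per group, i.e. ~388 participants total

    # Sample size per group to detect a 20% improvement instead (3.0 -> 3.6):
    d_large = (3.6 - 3.0) / sd  # Cohen's d, ~0.86
    n_large = solver.solve_power(effect_size=d_large, alpha=0.05, power=0.80)
    print(math.ceil(n_large))  # ~23 per group, i.e. ~46 participants total

Both numbers line up with the Stata output above: you'd need roughly 20 times the enrolled sample to reliably detect the observed difference, and still more than double it to reliably detect even a 20% difference.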
So should the NYT stop covering these underpowered or poorly designed studies? There's not a lot at stake here I guess, so one reasonable reaction is, "who gives a sh*t?" But surely this sort of coverage crowds out coverage of other higher-quality science, such as why your roller-bag wheels wobble so much when you're running to catch a flight, or why eggs are egg-shaped. And cutting down on this crappy exercise coverage will save me the roughly 20min/week of self-loathing I feel when I click on yet another too-good-to-be-true NYT exercise article, look up the article it actually references, and find myself not very convinced. Twenty minutes I could have spent exercising!! That's like 20 1-minute workouts...
[image stolen from NYT article]
Okay, my turn to rant. This post is full of lies! Marshall is anything BUT "a lazy, mildly-fit runner."
sol, i was going to say the same. anyone who did R2R2R in the past few months is decidedly NOT a half-assed runner.
Well I'm glad at least our g-feed blogging team reads g-feed :)
I read it too, sweetie.
ReplyDeleteHi, thanks for your blog entry. As the author of the heat training study, I got a notification about it being discussed on your blog. I hope you don't mind me responding, in the interests of accuracy, to elaborate on a couple of points.
The study didn't seek to see if training in the heat for 5 days would work; rather, it compared heat training against precooling, a combination of heat training + precooling, and a control condition. So you are quite right -- it's no surprise that they improved; the novelty was in the comparison against the other treatments, which athletes routinely use. Previously, no one had studied which may be better or by how much.
Training studies such as this chronically suffer from a low n, predominantly because they are so time-consuming to conduct. In this instance, it took something in the region of 30 hours in the lab per person... so it is tough from both the researcher's and the participants' perspectives.
Also please note the ~25 min running time was in the heat; all runners were of <22 min standard and trained at least 3 times per week. So whilst not elite, we recruited the best runners we could.
Finally, the 5 training sessions were not all exercise. Rather, participants trained for ~45 min of each session, with the rest of the time spent sitting in the hot environment to stay hot. So yes, training volume may be a contributing factor, but I would suggest not to the extent that you describe.
For interest, the link to the hot baths originated from a discussion about a previous study we did, as hot baths can be used in combination with exercise to achieve heat acclimation (https://www.ncbi.nlm.nih.gov/pubmed/27330883), but you are completely correct that it didn't feature in this paper.
Thanks again for taking the time to read the paper.
Carl
Hi Carl - thanks a lot for the clarifications, and for the link to the earlier hot bath study. Definitely appreciate the difficulty of doing these sorts of studies. My own beef was more with the NYT coverage. For instance their article opens: "There are many ways to cope with exercising in hot weather. But one of the most effective may be, surprisingly, to soak in long, hot baths in the days beforehand..." They then go on to describe your study, which doesn't really say anything about heat acclimation relative to a do-nothing control (and nothing about baths), but rather (as you point out) draws conclusions about heat acclimation vs. heat acclimation + precooling. So even setting aside issues of research design, the NYT coverage doesn't make much sense.
At any rate, thanks again for the clarifications. -Marshall