As a fairly half-assed exerciser, I am always on the lookout for quick/easy/delicious ways to get in shape without having to really do anything. Thus a lot of the exercise articles on NYTimes's "Well" blog are irresistible clickbait for me -- as they are for the presumably millions of other NYT-reading half-assed exercisers who ensure that these articles consistently top the NYT most-read lists. Drink more caffeine and run faster! Take a hot bath and run faster! Get all the exercise you need in only seven minutes! Scratch that -- only four!! Scratch that -- only one!!!
Here's one from this week: Hot weather workout? Try a hot bath beforehand. Triple clickbait for me, as this week is really hot in California, I was hoping to get in some runs, and I certainly enjoy hot baths (+long walks on the beach, white wine, etc). Original study here (can't tell if that's paywalled), published in the Journal of Strength and Conditioning Research. This study takes n=9 (!) people, 8 dudes and 1 woman, and first has them run 5k on a treadmill in a lab where they cranked the temperature to 90F. Then all subjects underwent about a week of heat acclimation, where they pedaled a stationary bike in the hot lab for five 90-minute sessions. Then, finally, they ran the 5k again in the hot lab. Lo and behold, times had improved! The now-heat-acclimated participants shaved an average of about a minute and a half off their 5k times (a ~6% reduction) when running in hot conditions for the second time.
But wait a sec. You took some mildly fit runners (a 25min 5k is not exactly East African pace), had them put in almost 8 hours of training, and then tested to see whether they got faster?? As a lazy, mildly fit runner myself, 8 hours of training constitutes a good month for me, so I would be PISSED if I didn't get faster with that much training. Now, you might say, they were pedaling a bike and not running, and this is indeed what the paper tries to argue in some hard-to-understand phrasing ("Cycling training controlled for performance that could arise from increased training volume were participants to acclimate through running"). But to me it seems pretty unlikely that a lot of biking is going to give you no running benefit at all, and some quick googling found about 1000 blog posts by runner/bikers claiming that it does (and also 1000 claiming that it doesn't, go figure). In any case, this is clearly nowhere close to the research design you'd want for identifying the causal effect of training-when-hot on performing-when-hot. Strangely, the "hot bath" part does not appear in the paper at all, but only in an off-hand comment by a researcher quoted by the NYT. So that's weird too.
Or take the one-minute-of-exercise-is-all-you-need study (original study; NYT clickbait). This study takes n=25 "sedentary" men and divides them into three groups: a control group (n=6), a group that does moderate-intensity stationary-bike pedaling for 45min at a time (n=10), and a group that does high-intensity ("all out") stationary-bike pedaling for 3x 20 seconds (n=9). Only 60 total seconds! This happens three times a week for 12 weeks, after which researchers compare various measures of fitness, including how much oxygen you can take up at your peak (i.e., VO2max). Both the moderate (VO2 = 3.2) and intense (VO2 = 3.0) groups had improved significantly relative to the control group (VO2 = 2.5), but the post-training VO2max levels in the moderate and intense groups were not statistically different from each other. Hence the paper's exciting title: "Twelve Weeks of Sprint Interval Training Improves Indices of Cardiometabolic Health Similar to Traditional Endurance Training despite a Five-Fold Lower Exercise Volume and Time Commitment", and the NYT clickbait translation: "1 Minute of All-Out Exercise May Have Benefits of 45 Minutes of Moderate Exertion".
But wait a sec. Sample size is n=19 in these treatment groups combined! A quick calculation suggests that, at this sample size, the study is DRAMATICALLY underpowered to find an effect. That is, the study has a high chance of failing to find an effect when there actually is one. I calculate that to detect a significant difference in means of 0.2 between these two groups (on a control-group standard deviation of 0.7, given in the table), the study would have needed 388 participants, or about 20x what they had! (This assumes the researchers would want an 80% chance of correctly rejecting the null that the two groups had the same mean; in Stata, if I did it right: power twomeans 3.2 3.0, sd(0.7)). Even reliably detecting a 20% difference in VO2max between the two treated groups would have needed 46 participants, more than twice the number they had. Put more simply: with sample sizes this small, you are very unlikely to find significant differences between two groups, even when differences actually exist. So maybe the moderate-exercise group actually did do better, or maybe it didn't, but either way this study can't tell us. The same thing appears to be true of the 4-min-exercise paper (n=26) -- it's way underpowered. And I haven't looked systematically, but my guess is this is true of a lot of the studies they cover that find no effect. Andrew Gelman is always grumping about studies with large effects, but we should probably be just as cautious about believing small-N studies that find no effect.
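If you don't have Stata handy, the back-of-the-envelope version of that power calculation is just the standard two-sample formula, which needs nothing beyond the Python standard library. A sketch (note: this is the plain normal approximation; Stata's `power twomeans` refines it slightly, so its totals of 388 and 46 come out a few participants higher than the numbers below):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Participants needed per group to detect a true difference in
    means of `delta`, with common standard deviation `sd`, in a
    two-sided two-sample test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

# Difference of 0.2 (3.2 vs 3.0) on the control-group SD of 0.7:
print(2 * n_per_group(0.2, 0.7))  # 386 total -- vs. the n=19 actually enrolled
# A 20% difference (0.6 on a mean of ~3.0):
print(2 * n_per_group(0.6, 0.7))  # 44 total -- still more than twice n=19
```

The intuition falls straight out of the formula: required n scales with (sd/delta) squared, so halving the effect size you want to detect quadruples the participants you need, which is why small effects and n=19 don't mix.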
So should the NYT stop covering these underpowered or poorly designed studies? There's not a lot at stake here, I guess, so one reasonable reaction is, "who gives a sh*t?" But surely this sort of coverage crowds out coverage of other higher-quality science, such as why your roller-bag wheels wobble so much when you're running to catch a flight, or why eggs are egg-shaped. And cutting down on this crappy exercise coverage will save me the roughly 20min/week of self-loathing I feel when I click on yet another too-good-to-be-true NYT exercise article, look up the study it actually references, and find myself not very convinced. Twenty minutes I could have spent exercising!! That's like 20 1-minute workouts...
*(image stolen from NYT article)*