The devices are overwhelmingly popular. For instance, since its inception, the leading brand, Fitbit, has sold at least 30 million of them. The company promises on its website that the devices “track steps, distance, calories burned, floors climbed, active minutes & hourly activity.” Others, such as PulseOn, Apple Watch, Basis Peak, Samsung Gear S2 and Microsoft Band, promise the same.
A team of Stanford researchers, however, recently cried foul after testing these trackers. The scientists said in a paper published Wednesday in the Journal of Personalized Medicine that though the devices purport to help users track their calories burned (daily energy expenditure), the number is often markedly incorrect.
The least accurate, PulseOn, was off by an average of 93 percent. The most accurate device, Fitbit Surge, was off by an average of 27 percent, the Guardian reported.
In a statement to NPR, PulseOn said the extremely high level of inaccuracy may “suggest that the authors may not have properly set all the user parameters on the device.”
The consequences of such large margins of error could, of course, be significant.
“People are basing life decisions on the data provided by these devices,” Euan Ashley, a professor of cardiovascular medicine at Stanford and co-author of the study, said in a news release.
Let’s say, as a hypothetical, some users check their device at the end of a long day and discover to their delight they burned 1,000 calories when they actually only burned 730. They might have an extra dessert or glass of wine since they think they’ve met their goal.
Over time, that adds up. In this scenario, that's 1,890 extra calories each week the users don't know about. A pound of fat equates to roughly 3,500 calories, so the unnoticed surplus works out to more than half a pound per week.
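The arithmetic behind that scenario can be verified with a short snippet. The figures are the article's hypothetical, and the 3,500-calories-per-pound rule is the common approximation cited above:

```python
# Hypothetical numbers from the scenario above: a tracker reports
# 1,000 calories burned when the true figure is 730.
reported = 1000
actual = 730

daily_gap = reported - actual      # 270 unnoticed calories per day
weekly_gap = daily_gap * 7         # 1,890 unnoticed calories per week

# Widely used rule of thumb: one pound of body fat ~ 3,500 calories.
CALORIES_PER_POUND = 3500
pounds_per_week = weekly_gap / CALORIES_PER_POUND

print(f"{weekly_gap} extra calories/week is about {pounds_per_week:.2f} lb of fat")
# prints: 1890 extra calories/week is about 0.54 lb of fat
```

At that rate, the unnoticed calories would offset a typical half-pound-per-week weight-loss plan entirely.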
“It’s just human nature,” Tim Church, a professor of preventative medicine at the Pennington Biomedical Research Center at Louisiana State University who wasn’t involved in the study, told NPR. “People are checking these inaccurate counts and they think they’ve earned a muffin or earned some ice cream and they’re sabotaging their weight-loss program.”
Of course, some margin of error when using a device like this is inevitable, but the scientists said it should be far lower.
“For a lay user, in a non-medical setting, we want to keep that error under 10 percent,” Anna Shcherbina, a Stanford graduate student and study co-author, said in a news release.
One of the key issues, Shcherbina hypothesized, was the difference in users’ body compositions.
The study participants included a “diversity of ages, male and female, and then also we looked at diversity of skin tone, and then size and weight to try and represent the population generally,” Ashley told the Guardian.
The devices proved most accurate for white women who were already fit, meaning “for those for whom it might matter the most, who are trying to lose weight, the error was actually greater,” Ashley told NPR, speculating that perhaps the companies only test the devices on a narrow group of people.
While the energy expenditure numbers were woefully off, Shcherbina pointed out that it’s much easier to assess heart rate, which can be measured directly and not through proxy calculations.
Indeed, Ashley said, “The heart rate measurements performed far better than expected.” Most were off by only about 5 percent.
There have long been hints that these devices aren’t useful for weight loss. A multiyear study published last September in JAMA split almost 500 people hoping to lose weight into two groups: one used fitness trackers, while the other did not.
Those with the trackers lost about 50 percent less weight than those without.
At the time, the study’s lead author, John Jakicic, a health and physical activity researcher at the University of Pittsburgh, thought it had to do with people incorrectly interpreting the data from their fitness trackers.
“These technologies are focused on physical activity, like taking steps and getting your heart rate up,” Jakicic told NPR. “People would say, ‘Oh, I exercised a lot today, now I can eat more.’ And they might eat more than they otherwise would have.”
The Stanford study, though, suggests that perhaps the participants were merely working with faulty data.