A Food and Drug Administration advisory committee Tuesday will take up the issue of whether pulse oximeters, the ubiquitous medical devices that became a mainstay for assessing patient oxygen levels during the Covid-19 pandemic, need to be regulated differently — or even completely reconceived — based on research showing the devices are less accurate in people with darker skin.
For many, the question is what took so long.
Studies dating back to 2005 show pulse oximeters tend to overestimate blood oxygen levels in patients with darker skin. It’s simple physics: Melanin in skin absorbs some of the light the devices analyze to make their readings. The darker the skin, the more melanin there is and the less light passes through.
But for decades, this knowledge remained largely out of sight and was not acted on by manufacturers, who said the inaccuracies were minuscule and did not affect patient care. The findings weren’t published in widely read journals, weren’t taught in medical schools, and never penetrated the consciousness of most physicians who rely on the devices daily to triage patients, guide treatment decisions, and keep them safe while they are under anesthesia.
“I’m a trained pulmonary and critical care physician and was not aware of these old studies … they never made it into the textbooks I used,” said Michael Sjoding, an associate professor of internal medicine at the University of Michigan, who led a 2020 study on pulse oximeters that drew the first widespread attention to the issue of poorer accuracy in darker-skinned patients. “The fact that this went unrecognized for so long was really jarring.”
The FDA advisory committee, made up of pulmonary experts, plans to hold a nine-hour virtual meeting to discuss the available real-world evidence on the accuracy of pulse oximeters and the factors that may affect it. According to meeting documents, it will also make recommendations for health care providers and patients, as well as on study design and analysis. It’s attention that many, including five U.S. senators and a number of physicians, think is overdue.
“I am horrified that these deficiencies in the pulse oximeters have been known for decades,” Uché Blackstock, a Black physician-advocate and founder of antiracist health care consulting firm Advancing Health Equity, told STAT. “I would like the FDA to explain how they have allowed deficient pulse oximeters to be used across the country on millions of patients.”
The agency has analyzed recent studies and pre-market data from device manufacturers and may reassess its current guidance for the devices, which dates to 2013. It issued a safety alert in February 2021, two months after Sjoding’s study showed that people with darker skin who were receiving supplemental oxygen were more likely to have hidden hypoxemia, meaning their blood oxygen levels were lower than their pulse oximeter readings suggested. The devices “have limitations and a risk of inaccuracy under certain circumstances” and require further evaluation, the FDA warning cautioned.
“Will the FDA, the people who have the stick, finally make the companies do something?” asked Valeria Valbuena, a general surgery resident at Michigan who co-authored a study showing the problems with pulse oximeters extended beyond critically ill patients, to those on general and surgical floors. “Market pressure hasn’t occurred because the FDA hasn’t said anything.”
Just why has it taken so long for this issue to gain traction within medicine? To explore this question, STAT spoke to people who will be speaking to the FDA panel or have been thinking deeply about the question. Like most things involving race in America, the answer is complex and sometimes painful. To some, it’s also a collision of science, history, and culture occurring within a society and medical system that may value convenience and low cost over the health of some of its most vulnerable patients.
The pulse oximeter was forged in a white population. The first devices, developed more than half a century ago, were built for use in high-altitude environments and tested on populations that at the time were predominantly white: fighter pilots, astronauts, and mountaineers.
But even in those early days, there was an equitable oximeter. Developed in the 1960s for use on astronauts by Hewlett-Packard, the device used eight wavelengths of light for analysis rather than the two used today. Engineers working on the device for NASA discussed ways to build a machine that would work on all skin colors. They tested their device on more than 200 Black people. But in the 1980s, when Hewlett-Packard shifted its focus to personal computing, the device was shelved.
The pulse oximeters in use today, which are light, convenient, and relatively cheap, have their roots in technology invented by Takuo Aoyagi, who developed and tested his device in Japan, a country with a relatively homogeneous, lighter-skinned population.
As the devices were developed and used in the U.S., they continued to be tested, for decades, on mainly lighter-skinned people. It’s not a surprise. The lack of diversity in clinical trials and medical testing is a long-standing national problem.
The technology came into wide use in the 1980s and revolutionized medicine by offering a way to quickly assess oxygen levels without the need for a painful arterial blood draw. Early on, some studies showed skin color might affect readings on the devices, while others did not. Because of these mixed results and their own observations, leaders of the Hypoxia Research Laboratory at the University of California, San Francisco, widely respected for its work on the effects of low oxygen on the body and on pulse oximeters, decided to test the devices.
In a carefully controlled study published in 2005, lab scientists compared readings from 11 darkly pigmented individuals and 10 lightly pigmented individuals at various oxygen levels. They found the devices read 1% higher for dark-skinned individuals at higher oxygen levels and an average of 3% higher at lower (and more dangerous) oxygen levels. Some readings were as much as 8% higher, the authors noted, adding that the issue “deserves attention and possible provision of correction factors, tables, or even built-in user-optional adjustments.”
But even after a follow-up study in 2007 confirmed and expanded the results, nothing was done. The information remained largely unnoticed. Why wasn’t this issue, which affects a majority of the world’s population, more widely known? For one thing, the studies were published largely in the journals of a single medical specialty, anesthesiology, and never reached a wider audience. Looking back, many now think that lack of interest was a clear example of structural racism in medicine. “It speaks to the fact that some scientific knowledge is not prioritized. And this was not,” said Sjoding.
Because those studies never trickled out to a wider audience and weren’t made part of medical school curricula, few physicians — even those who have darker skin themselves, like Blackstock — realized they were using a device that wasn’t working equally well on all patients.
Sjoding and his colleague Thomas Valley are pulmonologists who rely on the devices daily. At their hospital in Ann Arbor, they never questioned readings on patients, who are predominantly white. But when the first Covid wave overwhelmed Detroit, causing hospitals there to send many Black patients to Ann Arbor, they noticed that readings in Black patients didn’t always match numbers taken from blood draws. “We kept seeing this discrepancy,” Sjoding said. “We didn’t know what was going on.”
Sjoding doesn’t have a hypoxia lab, but he has an interest in leveraging big data to improve care. After he read a prescient article about problems with pulse oximeters, he decided to use electronic health record data to compare thousands of measurements of blood oxygen levels taken from arterial blood draws to those from oximeters.
Sjoding and his colleagues found Black patients were three times more likely to have hidden hypoxemia than white patients, raising the possibility that errors in the devices may have clinical ramifications for patients with darker skin. Subsequent studies have buttressed his findings, showing that patients with darker skin and less accurate readings received less supplemental oxygen and had delayed access to Covid treatments.
The devices can’t be directly blamed for higher Covid mortality in Black and brown patients, of course. There were many factors involved, including that people from these groups were more likely to be frontline workers, to live in multigenerational households, and to have less access to insurance and good medical care. But many physicians remain haunted by questions about patients they sent home from busy hospitals who may have been sicker than the devices made it appear, and angered that little action is being taken to fix the problem.
“We have made a conscious decision to not fix this for Black patients,” said Theodore J. Iwashyna, one of Sjoding’s co-authors and a professor of pulmonary and critical care at Johns Hopkins. “It’s a little coincidental that it just happens to work really well in white people and not in Black people and that’s OK.”
A key question the FDA will take up is whether errors in the devices matter clinically, and whether they may cause patient harm.
There are many, including people who helped develop and manufacture pulse oximeters, who think the errors are too small to be clinically relevant, except in a few uncommon medical conditions such as cyanotic heart disease. They say that this is the reason the issue hasn’t drawn more attention, and argue that recent studies suggesting the devices affect patient care may be drawing the wrong conclusions.
Kevin Tremper is a professor of anesthesiology at the University of Michigan who, for his Ph.D. research in chemical engineering, studied new ways to non-invasively monitor oxygen. He’s open about the fact that he benefits financially from work he’s done on devices and through related companies he has started and sold. But he said he’s confident the devices used today are not causing clinical harm and said they have helped improve care for all patients.
Tremper has pushed back against the newer studies. The findings that people with darker skin received less oxygen or had Covid treatment delayed, he said, are associations, not proof that flaws in pulse oximeters caused those outcomes.
Other issues, such as insurance status, poorer overall health, or the type of hospital a patient visited, could also explain those results, he said. He is also concerned because the studies relied on race being self-reported, which is an imperfect proxy for skin color.
In an editorial titled “The Pulse Oximeter is Amazing, but Not Perfect,” Tremper argued that the errors in patients with darker skin are too small to affect care and noted that errors exist for lighter-skinned patients as well. “Everyone thinks these devices are more accurate than they really are,” he told STAT.
While the errors for darker-skinned patients do get larger at lower and more dangerous oxygen levels, Tremper said at these lower levels, clinicians should be measuring oxygen directly from the blood anyway. Clinicians should also pay attention to trends and not just single readings, he said.
While he’s confident in the current devices, Tremper said he is eager to see manufacturers develop devices that are more accurate at lower oxygen levels and in patients with darker skin, and expects that to happen soon.
Many manufacturers of pulse oximeters, including Nonin, Edwards Lifesciences, and Masimo, say their devices do work on a range of skin colors. Masimo has released internal data showing one of its newer devices works well on a variety of skin tones.
“It’s a valid concern to make sure pulse oximeters or other technologies work on all people,” said Joe Kiani, the founder, chairman, and CEO of Masimo, adding that he can’t speak for the quality of all devices. He said his company has emphasized recruiting diverse populations to test and calibrate its devices and has released data showing the results of these studies.
“I believe there is racial bias in the treatment of patients. I believe Caucasians get better treatment than Black and brown people, but it’s not the pulse oximeter. It’s not our pulse oximeter,” he said.
In an editorial and in an interview with STAT, Kiani questioned the results of Sjoding’s study, saying the findings could have been confounded by patients with sickle cell disease or poor circulation, and criticized the fact that blood gas readings were often taken 10 minutes after pulse oximetry readings — a problem because oxygen levels can fluctuate rapidly in very sick patients. “They’re being sloppy. They’re lumping things together,” he said of the studies.
Sjoding agrees the recent studies aren’t perfect — conducting science in the real world on patients is much harder than testing devices in controlled lab settings — but says the findings can’t be ignored because they all point to the same conclusion: The devices don’t work as well in patients with dark skin, patients who happen to be among medicine’s most vulnerable.
“When our study came out, people said, this can’t be right,” he said. “I was grateful these other studies came out that confirmed what we found.”
The debate has gotten a bit ugly and confrontational on all sides. Kiani and Tremper both dismiss the Sjoding study as a mere letter to the editor. (While it is labeled “Correspondence,” it is a scientific submission that is fully peer-reviewed, a spokesperson for the New England Journal of Medicine told STAT.) “The research first describing the double helix was a research letter,” noted Iwashyna, Sjoding’s co-author.
Though the issue of race in medicine is much more openly discussed than it was when pulse oximeters first arrived on the scene, language around race remains fraught. Tremper, for example, is concerned with how the word bias is being used. The studies measured statistical bias, he said, a systematic tendency that causes a difference between results and true facts. “Bias is a statistical term, it’s not a social commentary term,” he said.
Kiani titled his editorial “Pulse Oximeters are not Racist.” But many argue that they are. Not the inanimate objects themselves, of course, but the data and information that went into creating and calibrating them. This is something Ruha Benjamin, an associate professor of African American studies at Princeton, calls “discriminatory design” and “coded inequity,” unfairness that occurs when creators of technology do not consider systemic racism as they create software and devices.
How much the errors matter depends partly on where a pulse oximeter is used. A matter of a few percentage points may not matter in an operating suite where oxygen levels can be exquisitely controlled, but may matter very much in critical care.
This is especially an issue in patients with severe Covid, where, in what’s called “happy hypoxia,” oxygen levels can drop dangerously even in patients without labored breathing.
“You’re in an overloaded ED, trying to come up with a triage tool to decide if this patient needs to be admitted or not. You’re going to rely on a single pulse oximeter reading,” Sjoding said.
One or two points of difference on a pulse oximeter, he said, can determine whether someone with a severe Covid infection is sent home from the hospital or admitted, whether they are given supplemental oxygen, and whether that oxygen is paid for. Medicare will pay for oxygen if a patient’s pulse oximeter reading is 88 or 89, but not if it’s 90. Because the devices tend to overestimate oxygen levels in patients with darker skin, those patients may have to be sicker before they receive the treatment they need.
“You see more failing organs, more risk of death, more people not receiving treatment they should because of Covid or not receiving enough oxygen,” said Michigan’s Valley. “These decisions are being made because of faulty measurement.”
Many engineers, including several who are Black, are working to develop redesigned devices that work equally well, regardless of skin tone, but creating and validating them will take years.
In the meantime, numerous editorials in medical journals have urged immediate and better regulation of the devices. Many are calling for them to be calibrated on more people with dark skin. The FDA currently requires such studies to include at least two people with darker skin, or 15% of subjects, whichever is larger. But Sjoding said that’s too few people to run reliable statistics on.
Experts scheduled to speak before the FDA panel told STAT they would also like to see calibration data disaggregated by race or skin tone. When only a few people with dark skin are tested in a study, pooling their readings with those of many light-skinned participants can mask a device’s poor performance in the smaller group.
Some would also like an independent lab to test different pulse oximeters head-to-head, including cheaper consumer devices sold in drugstores and online, to assess manufacturers’ claims that some devices work on all skin tones. They also want controlled studies, with skin color carefully measured, conducted in hospital settings.
Much of this work may come out of the lab that probed the accuracy of the devices more than two decades ago: UCSF’s Hypoxia Lab. The lab recently launched openoximetry.org, a project to better understand the magnitude of the problem, and to identify solutions.
Michael Lipnick, an associate professor of anesthesia at UCSF who leads that project, is starting a study of pulse oximeters with colleagues Phil Bickler and Carolyn Hendrickson at Zuckerberg San Francisco General Hospital and Trauma Center, a safety-net hospital, at the request of the FDA.
Lipnick agrees with colleagues in anesthesiology that there are issues with the retrospective and uncontrolled studies done so far (as there are with all studies of that type) that limit what conclusions can be drawn about the danger of the devices. But he said the signal revealed by the studies is too important to be ignored.
“One of the questions we get commonly is, ‘How big of a problem is this? Do we really need to invest a lot of resources?’” he said. “The answer is yes, we do. We have enough evidence to suggest this needs to be looked into in greater detail and addressed.”
There’s a lot for the lab to sort out. Do the devices work less well in a clinical setting than in the lab? Do some devices work better than others? Can people with darker skin trust their readings? Lipnick can’t say for sure. “It’s a big deal if for any patient — regardless of the color of their skin — we can’t say we are taking the best care we can of you,” he said. “The bar potentially was set too low.”
Regardless of what happens at Tuesday’s meeting, it’s clear that change is already afoot. Many doctors are taking a more skeptical look at pulse oximeter readings taken in darker-skinned patients and testing oxygen levels using blood samples. A petition is also circulating, calling for action on the issue from the World Health Organization and national regulatory agencies.
And some medical schools, including the University of Washington, are now teaching medical students about the issues with the devices, said Andrew M. Luks, a professor of medicine in its division of pulmonary, critical care, and sleep medicine. Luks revised his course curriculum shortly after Sjoding’s article came out in 2020. “People in medical education are much more attuned to racial disparities now, and that’s appropriate,” he said.
Like others, Luks noted that many other medical devices may contribute to health disparities as well. The spirometer, for example, assumes Black and white people have different lung capacities.
There are fears that racial discrepancies in a wide range of medical devices and function tests could be amplified as more algorithms that rely on instruments, and AI systems calibrated using white populations, become embedded within health care, carrying with them their inherently inequitable heritages. For example, a ventilator controlled by pulse oximeter readings may not deliver as much oxygen as a patient with darker skin needs, which could result in lasting brain damage.
By fixing pulse oximeters, the FDA could be fixing a whole lot more.
This is part of a series of articles exploring racism in health and medicine that is funded by a grant from the Commonwealth Fund.