Doctors often turn to Google Translate to talk to patients. They want a better option

The patient had just undergone a cesarean section, and now was struggling to put words to her pain in her native Taiwanese. The physician making rounds, Natasha Mehandru, was used to communicating with patients who didn’t speak English as a first language at her county hospital in Phoenix. But this time, calling in an interpreter by phone wasn’t working.

“The service was not really good,” she said — and soon, she realized the patient and the interpreter weren’t even speaking the same dialect. “It was difficult to communicate, even with the interpreter.”

So Mehandru turned to a familiar tool: Google Translate. Typing translations back and forth — Taiwanese to English, English to Taiwanese — she and the patient slowly came to an understanding with the help of the interpreter still on the line. Her pain wasn’t from the C-section, in her abdomen, but from a separate and long-standing issue, lower in her body. “That changed how I managed her that day,” said Mehandru, who was at the time a gynecological resident and is now a surgeon at Kaiser San Jose Medical Center. With the help of the machine translation tool, “we changed around medications, and then over the course of a couple days she ended up feeling better.”

Like many health systems, the hospital complied with federal requirements for meaningful access to language services by staffing in-person interpreters for frequent needs like Spanish, and could call up interpreters for less commonly spoken languages. But it was an imperfect system — there were sometimes delays, or a dialect for which it was hard to track down an interpreter — and Google Translate came to serve as a fallback.

Google Translate has become a ubiquitous, if under-examined, part of patient care. “It’s sort of [used] under the table,” said Elaine Khoong, an internist and assistant professor of medicine at the University of California, San Francisco. The practice is hidden in part because it is formally discouraged by health systems and state medical registration boards that see it as a liability. There’s a growing push by Khoong and other researchers to bring it to the surface — both to study Google Translate’s use and risk in the clinic, and to build better versions to backstop traditional language services.

“I do think it is the future,” said Breena Taira, a clinical emergency medicine researcher at UCLA Health whose recent study evaluated Google-translated discharge instructions in seven languages. Tech giants like Google and Microsoft, which have invested heavily in voice recognition software, have expressed interest in exploring medical translation.

“We just have to be really aware of what the limitations are,” Taira said, including significantly lower accuracy rates for languages that aren’t widely spoken. Machine translation could fill an especially large gap in services by providing personalized written instructions for non-English speakers. Sanjana Rao, a doctor at a family medicine practice in Tacoma, Washington, said she’s seen colleagues give patients after-visit notes translated in full by Google Translate without any vetting, a practice she doesn’t trust.

“We have to do the work to make sure that we can convey written information in non-English languages in a safe way,” said Taira.

Research from Khoong, Taira, and others has highlighted that Google Translate can be specifically unsafe for translating emergency room discharge instructions, delivering inaccurate results that could lead to serious errors. While the tool has gotten more accurate since Google switched to a neural machine translation approach, mistakes are still common when the acronym- and jargon-filled lexicon of clinical communication collides with an algorithm trained on everyday language.

“Obviously, Google Translate wasn’t built for health care applications,” said Nikita Mehandru, a Ph.D. student in clinical artificial intelligence at the University of California, Berkeley and the sister of Natasha. “Maybe something should be.”

Along with fellow student Samantha Robertson and human-computer interaction researcher Niloufar Salehi, Mehandru recently surveyed 20 health care providers about their interpretation and translation resources, aiming to understand the scope of communication challenges before trying to design something like a Google Translate for doctors — starting with the written instructions emergency doctors give patients when they’re discharged.

They plan to train their tool on the text it aims to translate: more than 1,500 emergency discharge records from UCSF, accessed in collaboration with Khoong. “One of the things that makes it a hard problem is that almost none of these black box deep learning models are trained on medical data,” said Salehi. “They’re mostly trained on web forum data, so they don’t work really well with medical information.”

But they’re not simply turning neural networks loose on a new clinical corpus. Discharge instructions are often very structured and modeled after a template, “so it doesn’t really make sense to use a black box deep learning model,” said Salehi. Instead, they’re trying to combine deep learning with a pre-translated dictionary of common phrases, making certain results highly reliable and leaving the potential to show providers where uncertainty remains. “We could say, 80% of this discharge info is verified translation, and we could even mark the parts where we’re not so sure,” said Salehi.
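The hybrid approach Salehi describes can be sketched in a few lines of code. This is a hypothetical illustration, not the team’s actual system: template sentences are matched against a small human-verified phrase dictionary (the entries below are invented examples), anything unmatched falls back to a stand-in machine translation function, and each result is flagged so a provider can see which parts are verified.

```python
# Illustrative sketch of a hybrid discharge-instruction translator:
# verified template phrases come from a human-translated dictionary;
# everything else falls back to machine translation and is flagged
# as unverified for provider review.

# Human-verified Spanish translations of common discharge phrases
# (invented entries, for illustration only).
VERIFIED_PHRASES = {
    "take with food": "tomar con alimentos",
    "return if fever exceeds 101 f": "regrese si la fiebre supera los 101 F",
}


def machine_translate(sentence: str) -> str:
    """Stand-in for a black-box neural translation model."""
    return f"[MT] {sentence}"


def translate_discharge(sentences):
    """Translate each sentence, marking which results are verified,
    and report what share of the output came from verified phrases."""
    results = []
    for s in sentences:
        key = s.strip().lower()
        if key in VERIFIED_PHRASES:
            results.append({"text": VERIFIED_PHRASES[key], "verified": True})
        else:
            results.append({"text": machine_translate(s), "verified": False})
    verified_pct = 100 * sum(r["verified"] for r in results) // len(results)
    return results, verified_pct
```

With two input sentences, one matching the dictionary and one not, the function would report 50% verified translation — the kind of figure Salehi imagines surfacing to providers, alongside markers on the unverified parts.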

Like other clinical decision support tools, such a system could nudge clinicians toward smarter actions rather than providing a pat solution. A tool could prod doctors to write their English instructions in simpler ways, for example, making the machine translation more likely to be accurate, said Khoong.

Even if machine translation tools prove accurate enough for clinical use, there are still significant regulatory and legal hurdles for companies to make them and for health systems to embrace them. The tools would have to be HIPAA compliant, and providers and developers would have to sort out who is liable for failed translations that cause harm — potentially in very public ways.

“We’re already using ML and AI tools in health care, but it’s usually hidden on the backend where people don’t see it — for image interpretation, risk stratification tools,” said Khoong. “But when you bring it up to the front end where patients can see it, the legality issues and the liability issues are a lot more concerning.”

That’s one reason why Khoong is calling to advance the type of research done on medical machine translation systems. In a paper she recently penned with Jorge Rodriguez, a hospitalist and technology equity researcher at Brigham and Women’s Hospital, they lay out a framework for analysis that focuses not just on translation accuracy, but patient outcomes.

The viability of machine translation, they argue, should be judged not just by comparing it with gold-standard interpretation, but current practice — which sometimes is nothing at all.

“For a lot of patients who have non-English language preference, what actually happens is either the clinical team doesn’t talk to them, or they use sign language, or they try to mime,” said Khoong. Interpretation can be especially scarce in safety net facilities, which often end up paying higher rates for call-in services. And physicians can be reluctant to call in an interpreter for anything but the most mission-critical moments in a patient’s stay, like surgical consent, because it can take precious minutes away from their interaction with a patient.

That leaves out many of the small moments that make up a patient’s care. “If you want to ask the patient, ‘Are you cold?’ ‘Open your eyes, take a deep breath,’ the time it can take to prepare for those two sentences can be untenable,” said Won Lee, an anesthesiologist at UCSF who is investigating Google Translate’s accuracy in those interstitial moments of care. Research consistently shows that patients who do not share a language with their provider fare more poorly.

“Is [machine translation] better than what’s going on there?” asks Khoong. “I think we don’t have a good sense, and that’s what we should evaluate.”

Understanding patient outcomes is especially critical because of the potential for machine translation to introduce new disparities in health care. If a validated but imperfect technology makes it easier for health systems to avoid calling on interpreters, non-English speaking patients could still get shortchanged on care and communication. “I don’t want it to feel like once we have Google Translate validated, interpreters will go by the wayside,” said Rodriguez. Research will be necessary to understand how to use the tools without undermining patient care and when human interpreters are needed.

That’s why, once Salehi and her team finish building their discharge translation tool, they hope to conduct a randomized controlled trial of patient outcomes, testing to see “whether giving people information in their own language is more helpful,” she said.

It’s the kind of expensive research that commercial developers — with their deeper pockets and broad reach — could help conduct. “The technology is there to be able to build these algorithms,” said Rodriguez. “It’s just a matter of getting all the right players in the room, and incentivizing it.” 

For Nuance Communications, the voice recognition company that was acquired by Microsoft earlier this month for $16 billion, the incentives may already be in place. The company has a tool, DAX, that listens in on doctors’ appointments and produces automatic English transcriptions to feed into visit records. Machine translation of those transcripts into other languages is a leading request from its users, said Peter Durlach, chief strategy officer for Nuance.

“It’s one of the first things we’re going to be looking to integrate with Microsoft, since they have world class machine translation,” he said. “Since DAX is already recording the conversation, it already identifies the different speakers, why couldn’t it automatically translate in real time? It’s not a massive technical lift to do it.”  

For patients and providers still wrestling to understand each other, validated clinical machine translation could be a boon. “We’ve wanted this for so long, and it’s just not there,” said Rao. “We’re doing last resort things like Google Translate because different providers have to make different calls,” knowing that they’re underserving many patients who speak less common languages. “This technology is absolutely imperative to be launched and be used as soon as possible.” 

Source: STAT