The notion that people will regularly use computers to diagnose their own illnesses has been discussed for decades. Of course, millions of people try to do that today, consulting Dr. Google, though often with little success. Given the low quality of many online health sources, such searches may even be harmful. Some governments have launched “Don’t Google It” campaigns to urge people not to use the internet for health concerns.
But the internet may suddenly become a lot more helpful for people who want to determine what is wrong with them. ChatGPT, a new artificial intelligence chatbot, has the potential to be a game-changer with medical diagnosis.
ChatGPT is not the first innovation in this space. Over the last decade, symptom checkers have emerged on websites and in smartphone apps to aid people searching for health information. Symptom checkers serve two main functions: they facilitate self-diagnosis and assist with self-triage. They typically provide the user with a list of potential diagnoses and a recommendation of how quickly to seek care, such as “see a doctor right now” vs. “you can treat this at home.”
Our team once tested the performance of 23 symptom checkers using 45 clinical vignettes across a range of clinical severity. The results raised substantial concerns. On average, symptom checkers listed the correct diagnosis within the top three options just 51% of the time and advised seeking care two-thirds of the time.
When the same vignettes were given to physicians, they did, reassuringly, much better: they listed the correct diagnosis within the top three options 84% of the time. Physicians outperformed symptom checkers, but, consistent with prior research, misdiagnosis was still common.
Since it was introduced in late November 2022, the artificial intelligence model known as ChatGPT has garnered substantial interest from the media and the general public. It builds on a previous AI model, the Generative Pre-Trained Transformer 3 (GPT-3), a general-purpose AI model trained to predict the next word in a sentence using a large collection of unstructured text from the internet. What makes GPT-3 unique is its size: at the time of its creation, it was one of the largest AI models, trained on hundreds of gigabytes of online textual data.
ChatGPT is a user-friendly version of GPT-3 that includes an easy-to-use chatbox to which individuals can direct questions. It also includes modifications to GPT-3 that make it more likely to produce text that users will find helpful. The resulting output is often remarkable; users quickly found they could use ChatGPT for a wide range of applications, from fixing errors in their computer code to writing original, analytical essays. Within five days, more than 1 million people had created accounts to use ChatGPT.
We gave ChatGPT the same 45 vignettes previously tested with symptom checkers and physicians (see the example below; all of the vignettes can be found here). It listed the correct diagnosis within the top three options in 39 of the 45 vignettes (87%, beating symptom checkers’ 51%) and provided appropriate triage recommendations for 30 vignettes (67%). Its performance in diagnosis already appears to be improving with updates: when we tested the same vignettes with an older version of ChatGPT, its accuracy was 82%.
Vignette of a query about leg pain. ChatGPT’s output follows each question.
I am a 65-year-old woman with a history of high blood pressure and have had leg pain and swelling for the past 5 days in one of my legs. I was recently hospitalized for pneumonia and had been recuperating at home. After I started walking, my right leg became red, painful, tender, and swollen. What is likely causing my symptoms?
Your symptoms of leg pain, swelling, redness, and tenderness after a recent hospitalization for pneumonia could be indicative of a DVT (Deep Vein Thrombosis) or a blood clot. It’s important that you seek medical attention as soon as possible, as DVT can be serious and potentially lead to other complications, such as pulmonary embolism. Other possible causes of leg swelling and pain include an injury, cellulitis (a skin infection), or Lymphedema (an accumulation of lymphatic fluid). Your doctor can perform a physical examination and imaging tests (such as ultrasound) to determine the cause of your symptoms.
How quickly should I seek medical care?
It is important to seek medical attention as soon as possible if you have symptoms of DVT or a blood clot, such as leg pain, swelling, redness, and tenderness. The faster you receive a diagnosis and appropriate treatment, the lower the risk of complications. In some cases, DVT can lead to serious health problems such as pulmonary embolism, which is a potentially life-threatening condition. If you are experiencing any symptoms that you believe may be related to a DVT, you should contact your healthcare provider or seek emergency medical attention immediately.
What does this mean?
Some caveats first: We tested a small sample, just 45 cases, and used the kind of clinical vignettes that are used to test medical students and residents, which may not reflect how the average person might describe their symptoms in the real world. So we are cautious about the generalizability of our results. In addition, we have noticed that ChatGPT’s results are sensitive to how information is presented and what questions are being asked. In other words, more rigorous testing is needed.
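To make the small-sample caveat concrete, here is a minimal Python sketch that recomputes the two reported percentages and attaches an approximate 95% interval to each. The Wilson score interval is our illustrative choice, not a method used in the testing described above, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin


# Counts reported in the article: correct out of 45 vignettes.
results = {
    "diagnosis in top three": (39, 45),
    "appropriate triage": (30, 45),
}

for label, (k, n) in results.items():
    low, high = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.0%} "
          f"(approx. 95% interval {low:.0%} to {high:.0%})")
```

Under these assumptions, the interval for 39 of 45 runs from roughly 74% to 94%. With only 45 cases, the data are compatible with a fairly wide range of true accuracy, which is one reason more rigorous testing is needed.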
That said, our results show that ChatGPT’s performance is a substantial step forward from using Google search or online symptom checkers. Indeed, we are seeing a computer come close to the performance of physicians in terms of diagnosis, a critical milestone in the development of AI tools. ChatGPT is only the start. Google has recently announced its own AI chatbot and many other companies are likely to follow suit.
Given the interactive nature of these chatbots, we can see a future where people frequently turn to these types of tools for advice. Such tools could be particularly helpful for individuals living with uncommon conditions or who lack easy access to care. For physicians, AI tools could become a standard part of clinical care to reduce misdiagnosis, which unfortunately remains much too common in health care: An estimated 10% to 15% of diagnoses are wrong. There are many underlying reasons for misdiagnoses, ranging from physicians anchoring too quickly on a diagnosis to overconfidence. Tools like ChatGPT could be used as an adjunct, just as adjunctive AI tools are being used for other clinical applications. Radiologists who read CT images, for example, now use AI algorithms to flag those showing an intracranial hemorrhage or a blood clot in the lungs.
While this future is exciting, unknowns and pitfalls exist. A key one is how a patient’s history, physical exam findings, and test results would be fed into an algorithm in a clinic’s workflow. Another is that while AI algorithms are prone to errors — as are humans — people sometimes place undue trust in AI output. If a physician disagrees with the AI’s output, how will this affect patient and physician interactions, and will such disagreements need to be adjudicated?
AI models are also prone to bias. However large the internet-based source material is, its size does not ensure that the AI’s responses will reflect diverse perspectives. Instead, the AI runs the risk of amplifying harmful biases and stereotypes that are embedded in the source material. The sheer volume of material used to train these algorithms may also make it difficult, or even impossible, for them to adapt to changing social views and clinical norms.
Despite these unknowns, it appears the future of computer-assisted diagnosis is suddenly here, and the health care system will now need to respond to these challenges.
Ruth Hailu is a research assistant in the Department of Health Care Policy at Harvard Medical School. Andrew Beam is an assistant professor in the Department of Epidemiology at the Harvard T.H. Chan School of Public Health whose research focuses on medical artificial intelligence. Ateev Mehrotra is a physician at Beth Israel Deaconess Medical Center and a professor of health care policy at Harvard Medical School whose research focuses on evaluating delivery innovations.