AI Chatbots in Healthcare Can Spread False Medical Info, Study Finds


A new study from Mount Sinai has raised serious concerns about the reliability of AI chatbots in healthcare, showing that even the most popular models, including ChatGPT and DeepSeek R1, are capable of generating false medical information when presented with fabricated terms.  

According to the study, the chatbots readily accepted made-up medical terms as fact and produced detailed but inaccurate explanations for them. The findings point to a key vulnerability in large language models (LLMs): their tendency to “hallucinate,” or produce incorrect yet confident-sounding responses. As Dr. Eyal Klang, Mount Sinai’s Chief of Generative AI and the study’s lead author, put it, “Even a single false term can derail an otherwise accurate conversation.”

How the Study Was Conducted

To conduct the study, the research team ran two rounds of experiments: 

  • Baseline Test: The chatbots were given realistic patient cases, some of which contained fabricated medical conditions or symptoms.
  • Safety-Prompt Test: The same scenarios were given again, this time with a cautionary note advising the AI that some details might be unreliable.

Adding the warning to the prompt cut wrong answers roughly in half, showing that even small changes to a prompt can make AI safer for healthcare use, as the sketch below illustrates.
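To make the setup concrete, here is a minimal sketch of the two experimental conditions, assuming a generic chat-model interface. The `query_llm` function and the invented condition in the vignette are illustrative placeholders, not details taken from the study itself.

```python
# Sketch of the study's two conditions: a raw case vs. the same case
# behind a cautionary preamble. `query_llm` is a hypothetical stand-in
# for whatever chat-model API (OpenAI, Anthropic, a local model) is used.

SAFETY_PREAMBLE = (
    "Caution: some details in the following case may be inaccurate or "
    "fabricated. If a term or condition cannot be verified, say so "
    "rather than explaining it."
)

# A realistic-looking vignette containing an invented condition.
# "Glandular Vesper syndrome" is made up here for illustration only.
CASE = (
    "A 54-year-old male presents with fatigue and joint pain. "
    "History includes Glandular Vesper syndrome. Suggest next steps."
)

def query_llm(prompt: str) -> str:
    """Placeholder: swap in a real chat-model call here."""
    return "<model response>"

# Round 1 -- baseline: the raw case, no warning.
baseline_answer = query_llm(CASE)

# Round 2 -- safety prompt: the identical case, prefixed with the caution.
guarded_answer = query_llm(f"{SAFETY_PREAMBLE}\n\n{CASE}")
```

The only variable between the two rounds is the preamble; the cases themselves stay identical, which is what lets the halving of wrong answers be attributed to the warning alone.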

Why This Matters for Healthcare 

The study reveals a critical tension: AI chatbots in healthcare can significantly reduce clinical workload, but they also risk spreading misinformation if left unchecked.

According to Dr. Klang, the solution isn’t to abandon generative AI, but to implement stronger guardrails and maintain human oversight. “LLMs can be extremely effective for summarizing patient reports, drafting documentation, and handling administrative tasks,” he said. “But clinicians must remain in the loop to verify accuracy.”

The Increasing Role of AI in Medicine 

Generative AI in healthcare is on the rise. From ambient documentation platforms like Ambience Healthcare to patient communication tools, it is becoming the next game-changer. Investment in healthcare AI startups has also surged, with companies such as Abridge and Ambience reaching unicorn valuations.

The White House’s AI action plan likewise identifies healthcare as a priority sector for AI adoption, although some experts believe safety should have been given more weight.

Industry groups such as the Coalition for Health AI and the Digital Medicine Society are rallying thousands of members to address these issues. Major AI developers, including OpenAI and Anthropic, have also dedicated substantial resources to improving model safety.

Balancing Innovation with Safety 

Healthcare leaders believe that preventing misinformation is vital to building trust in AI-driven care. This means:  

  • Incorporating safety prompts into clinical AI workflows
  • Maintaining human oversight for all patient-facing outputs (see the sketch after this list)
  • Testing models regularly against diverse, real-world scenarios 
  • Collaborating across industry and academia to share safety practices 
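To picture the oversight point above, here is a hedged sketch of a human-in-the-loop gate for patient-facing drafts. Every name in it (`DraftMessage`, `clinician_review`, `release_to_patient`) is illustrative, not drawn from any real product or from the Mount Sinai study.

```python
# Illustrative human-in-the-loop gate: an AI-drafted message cannot
# reach a patient until a clinician has recorded an approval.
from dataclasses import dataclass

@dataclass
class DraftMessage:
    patient_id: str
    text: str
    reviewed: bool = False  # set only by a human reviewer

def clinician_review(draft: DraftMessage, approved: bool) -> DraftMessage:
    """Record the human decision; any edits happen before this step."""
    draft.reviewed = approved
    return draft

def release_to_patient(draft: DraftMessage) -> None:
    """Refuse to send anything a clinician has not signed off on."""
    if not draft.reviewed:
        raise PermissionError("Clinician review required before sending.")
    print(f"Sending to {draft.patient_id}: {draft.text}")

# An AI-generated draft enters the queue, gets a human sign-off, then ships.
draft = DraftMessage("pt-001", "Your lab results look normal; no action needed.")
draft = clinician_review(draft, approved=True)
release_to_patient(draft)
```

The point is structural rather than prescriptive: the model may draft freely, but nothing reaches a patient without a recorded human decision in between.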

The takeaway from the Mount Sinai study is clear: AI in healthcare holds extraordinary potential, but without robust safety measures it can harm as easily as it helps.