Researchers flag OpenAI’s Whisper AI used in hospitals as problematic, here’s why


OpenAI’s Whisper, an artificial intelligence tool designed for transcription, is said to achieve near-human accuracy. However, a major flaw has surfaced: Whisper is prone to “hallucinations,” fabricating chunks of text or even entire sentences. This has raised concerns, especially as the tool is used across industries, including healthcare, where accuracy is critical.

Hallucinations, in AI terms, refer to instances where the model invents information. According to researchers and software engineers, Whisper’s hallucinations often include problematic content like racial remarks, violent language, or even made-up medical treatments. Such errors are troubling because Whisper is used to generate subtitles for videos, transcribe interviews, and even assist in healthcare settings by transcribing doctor-patient consultations, reports The Associated Press.


Experts are particularly worried about Whisper’s use in hospitals. Despite OpenAI’s warnings that Whisper should not be used in “high-risk domains,” some medical centres are employing it to transcribe patient consultations. Over 30,000 clinicians in various health systems, including Children’s Hospital Los Angeles, are using a Whisper-based tool developed by Nabla.

A University of Michigan researcher conducting a study found hallucinations in 80% of the Whisper transcriptions he reviewed, while a machine learning engineer discovered similar issues in half of the transcriptions he analysed. The problem isn’t limited to complex audio; even clear, short recordings have shown errors.

While OpenAI is aware of these issues and continues to improve the model, such mistakes could have “really grave consequences,” especially in healthcare, as highlighted by Alondra Nelson, a professor at the Institute for Advanced Study in Princeton, New Jersey.


Moreover, Whisper’s use extends beyond healthcare. It’s integrated into popular platforms like ChatGPT and Microsoft’s cloud services, where millions of people worldwide rely on it for transcription and translation.
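
For readers wondering how the model is typically invoked outside those platforms, here is a minimal sketch using the open-source whisper Python package (installed with "pip install openai-whisper"); the file name and model size are placeholders, and as the researchers note, the output can contain hallucinated text.

import whisper

# Load one of the released checkpoints; "base" is a small, fast option.
model = whisper.load_model("base")

# Transcribe an audio file in its original language.
# "meeting.mp3" is a placeholder file name.
result = model.transcribe("meeting.mp3")
print(result["text"])

# The same call can also translate non-English speech into English text.
translated = model.transcribe("meeting.mp3", task="translate")
print(translated["text"])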

Although most developers expect transcription tools to occasionally misspell words or make minor mistakes, engineers and researchers noted that they had never encountered another AI-powered transcription tool that hallucinated as much as Whisper.

Ayushi Jain

Tech news writer by day, BGMI player by night. Combining my passion for tech and gaming to bring you the latest in both worlds.