Google's AI health chatbot has achieved a passing grade on a US medical licensing examination, according to a peer-reviewed study published on Wednesday.

The study noted that while the chatbot's performance is commendable, it still falls short of human doctors' expertise, AFP reported. OpenAI's 2022 release of ChatGPT, an artificial intelligence language model, sparked a race among tech giants in the AI field.

Although the potential and risks of AI have been widely discussed, the application of AI in healthcare has already demonstrated tangible progress, with algorithms capable of interpreting specific medical scans with comparable accuracy to human experts.

(Photo: Johannes Simon/Getty Images — Autonomous delivery robot OttoBot Yeti of Ottonomy.IO on display at the UN's AI for Good Summit in Geneva, Switzerland, July 7, 2023.)

Google AI Health Chatbot: Med-PaLM

In a preprint study last December, Google revealed its AI-based medical question-answering tool, Med-PaLM. Unlike ChatGPT, Med-PaLM has not been made publicly available. 

According to Google, Med-PaLM is the first large language model trained on extensive human-produced text to pass the US Medical Licensing Examination (USMLE), which is a significant milestone.

A passing score for the USMLE, taken by medical students and physicians-in-training in the United States, is approximately 60 percent. Earlier this year, a study indicated that ChatGPT achieved passing or near-passing results on the exam.

Google researchers also revealed that Med-PaLM scored 67.6 percent on USMLE-style multiple-choice questions. The study acknowledged that while Med-PaLM's performance is encouraging, it still lags behind that of clinicians.

Google developed a new evaluation benchmark to address the issue of "hallucinations," which occur when AI models provide inaccurate information. 

The company reported that a newer version of the model, called Med-PaLM 2, achieved 86.5 percent on the USMLE exam, surpassing the previous version by nearly 19 percentage points, as stated in a preprint study released in May.


Assistants Not Decision-Makers

Experts not involved in the research highlighted the distinction between answering medical questions and practicing actual medicine, which involves diagnosing and treating real health conditions.

They cautioned that AI-powered chatbots should be treated as assistants rather than final decision-makers. Google researcher Karan Singhal said Med-PaLM could eventually be used to suggest alternative options that doctors might not have considered.

Singhal did not disclose specific partnerships but mentioned that any testing would not involve direct clinical care or pose a risk to patients. Instead, the focus would be on automating administrative tasks with low stakes. 

Since April, Med-PaLM 2 has been undergoing testing at the renowned Mayo Clinic research hospital in the US, The Wall Street Journal reported.



ⓒ 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.