Google's AI Doesn't Work in Real World Testing; Researchers Will Try Again

It's a sad story for artificial intelligence (AI) enthusiasts around the world: an AI screening tool built by Google's researchers fell short during real-world testing.

Google's AI still needs work after reported failure; Researchers will try again

Google researchers have been humbled by the results from clinics in Thailand. Google Health created a deep learning system that examines images of the eye for evidence of diabetic retinopathy, a leading cause of vision loss for diabetics around the globe.

Despite the AI's high theoretical accuracy, the tool proved useless in a real-world application. Nurses and patients were frustrated by how poorly the AI fit with their existing practices.

Google still published these less-than-stellar findings, regardless of how they might be perceived, and it seems that the team of researchers took the lessons to heart.

The research paper states that the tool was meant to support the existing process that several clinics in Thailand were already using to screen for diabetic retinopathy.

Nurses would usually take images of diabetic patients' eyes (a "fundus photo") and send them to ophthalmologists, who would then evaluate them and return the results in four to five days.

What the AI should've done

Google's system was supposed to deliver the results that ophthalmologists usually provide, but in seconds rather than days and with a 90% accuracy rating. Nurses, meanwhile, could make a preliminary recommendation for a referral or further testing in just a minute instead of a month. The theory sounded great, but in actual testing it did not deliver promising results.

The researchers had this to say: "We observed a high degree of variation in the eye-screening process across the 11 clinics in our study. The processes of capturing and grading images were consistent across clinics, but nurses had a large degree of autonomy on how they organized the screening workflow, and different resources were available at each clinic."

They added, "The setting and locations where eye screenings took place were also varied across clinics. Only two clinics had a dedicated screening room that could be darkened to ensure patients' pupils were large enough to take a high-quality fundus photo."

The study's authors learned from this real-world failure and had this to say with regard to the evaluation of the AI system:

"When introducing new technologies, planners, policy makers, and technology designers did not account for the dynamic and emergent nature of issues arising in complex healthcare programs. The authors argue that attending to people - their motivations, values, professional identities, and the current norms and routines that shape their work - is vital when planning deployments."