Google's Gemini and Microsoft's Copilot, two of the biggest artificial intelligence (AI) chatbots, have once again shown that they may fabricate data when asked a question, this time hallucinating the stats and outcome of one of the biggest games in American sports, Super Bowl LVIII, as reported by TechCrunch.

Gemini, powered by Google's GenAI models of the same name, is reportedly responding to queries about Super Bowl LVIII as though the game had already concluded weeks or even days ago, according to a Reddit thread. Like many bookmakers, it favors the Chiefs over the 49ers.

Gemini embellishes quite a bit. In one instance, it provides a statistical breakdown suggesting that San Francisco 49ers quarterback Brock Purdy managed only 253 rushing yards and one touchdown, while Kansas City Chiefs quarterback Patrick Mahomes ran for 286 yards with two touchdowns and an interception.

(Photo: OLIVIER MORIN/AFP via Getty Images)
This illustration photograph taken in Helsinki on June 12, 2023, shows an AI (Artificial Intelligence) logo blended with four fake Twitter accounts bearing profile pictures apparently generated by Artificial Intelligence software.

When asked a similar question, Microsoft's Copilot chatbot also fabricated its answer, claiming that the 49ers, not the Chiefs, had won with a final score of 24-21, and offering false citations to support its claim.

Notably, the GenAI model that powers Copilot is comparable, if not identical, to the one that powers OpenAI's ChatGPT. However, TechCrunch reports that ChatGPT was reluctant to commit the same hallucinations.

As per TechCrunch, however, the hallucinated answers can no longer be replicated by asking the chatbots the same questions or prompts.

AI's Biggest Flaw

AI hallucinations remain a persistent flaw in generative AI and chatbots. As per The Straits Times, a recent Stanford University study found AI hallucinations to be "pervasive and disturbing" based on the answers three cutting-edge generative AI models gave to 200,000 legal queries. OpenAI's GPT-3.5 hallucinated 69% of the time when posed precise, verifiable questions about random federal court cases, while Meta's Llama 2 model hallucinated 88% of the time.

Cases of AI hallucination remain prevalent. Recently, lawyers reportedly used ChatGPT to write a legal brief that they submitted to a Manhattan federal judge; the brief referenced fictitious court cases and used phony quotes.

Google and OpenAI on AI Hallucination

Both OpenAI and Google now reportedly caution users that their AI chatbots may make mistakes and advise them to double-check responses. The two companies are also investigating methods to lessen hallucinations.

One method Google uses for this is user feedback: according to the company, users can help Bard learn and improve by clicking the thumbs-down button and explaining why a response was incorrect.

OpenAI, meanwhile, has adopted a technique known as "process supervision," which rewards the AI model for each sound reasoning step it takes toward the desired result, rather than rewarding the system only for producing an accurate final response to a user's command.
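
To make that distinction concrete, here is a minimal, hypothetical Python sketch contrasting the two reward schemes; the function names, the toy verifier, and the reward values are illustrative assumptions, not OpenAI's actual implementation.

    # Hypothetical sketch: outcome supervision vs. process supervision.
    # All names and reward values are illustrative, not OpenAI's code.

    def outcome_reward(final_answer: str, correct_answer: str) -> float:
        # Outcome supervision: one reward based only on the final answer.
        return 1.0 if final_answer == correct_answer else 0.0

    def process_reward(reasoning_steps, step_is_sound) -> float:
        # Process supervision: every intermediate reasoning step is scored,
        # so the model is rewarded for *how* it reached the answer.
        if not reasoning_steps:
            return 0.0
        scores = [1.0 if step_is_sound(s) else 0.0 for s in reasoning_steps]
        return sum(scores) / len(scores)  # average per-step reward

    def toy_verifier(step: str) -> bool:
        # Toy stand-in for a learned reward model: "sound" steps cite a source.
        return "source:" in step

    steps = [
        "Super Bowl LVIII has not been played yet. source: NFL schedule",
        "Therefore, no final score or player stats can be reported.",
    ]
    print(outcome_reward("no result yet", "no result yet"))  # 1.0 - only the end matters
    print(process_reward(steps, toy_verifier))               # 0.5 - each step matters

In this toy setup, a model that reaches the right answer through an unsupported step still loses reward under process supervision, which is the behavior the technique is meant to encourage.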

Written by Aldohn Domingo

ⓒ 2024 TECHTIMES.com All rights reserved. Do not reproduce without permission.