Is ChatGPT Becoming Dumber? New Study Claims AI Chatbot's Performance Is Deteriorating

OpenAI's ChatGPT has gained widespread popularity and sparked an AI race due to its impressive performance as an artificial intelligence chatbot.

Renowned figures in the tech industry and authors alike have showered it with accolades, deeming it a groundbreaking achievement in the world of AI.

ChatGPT's abilities have been so impressive that some even question whether it has passed the Turing test, the long-standing benchmark for measuring a machine's capability to emulate human intelligence.

The language model has demonstrated remarkable proficiency across various fields, showcasing its prowess in math (89th percentile), law (90th percentile), and GRE verbal (99th percentile).

Moreover, a study by researchers from New York University's medical school earlier this month highlighted ChatGPT's ability to provide medical advice that closely resembles responses from human medical staff. However, not all researchers are entirely convinced that ChatGPT is consistently reliable in critical decision-making scenarios.


ChatGPT's Deteriorating Performance

Lingjiao Chen, Matei Zaharia, and James Zou from Stanford University and the University of California, Berkeley, have echoed concerns expressed by some users, suggesting that ChatGPT's performance may not be entirely consistent and may even be deteriorating in some instances, Science X Network reported.

Their investigation found considerable variations in the performance and behavior of GPT-3.5 and GPT-4. Particularly noteworthy was a significant decline in the quality of responses to specific tasks over the four months from March to June.

The researchers concentrated on evaluating ChatGPT's aptitude in math problem solving and computer code generation. They found a sharp decline in GPT-4's accuracy on prime number problems, which plunged from 97.6% in March to a mere 2.4% in June.
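As a rough illustration of how an accuracy rate on prime number problems could be computed, here is a minimal Python sketch. It is not the researchers' code: the helper names (is_prime, parse_yes_no, accuracy) and the rule of reading the last "yes" or "no" in a reply are assumptions made for this example.

```python
import re


def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def parse_yes_no(answer: str):
    """Map a free-text reply to True/False, or None if no verdict is found.

    Assumption for this sketch: the last standalone "yes" or "no"
    in the reply is treated as the chatbot's final answer.
    """
    matches = re.findall(r"\b(yes|no)\b", answer.lower())
    if not matches:
        return None
    return matches[-1] == "yes"


def accuracy(numbers, answers) -> float:
    """Fraction of replies that agree with the ground truth.

    Replies with no clear verdict (None) count as wrong.
    """
    correct = sum(
        parse_yes_no(a) == is_prime(n) for n, a in zip(numbers, answers)
    )
    return correct / len(numbers)


# Hypothetical replies, invented for illustration only.
numbers = [7, 9, 11, 15]
answers = ["Yes", "No, 9 = 3 x 3", "I think so", "Yes"]
print(f"accuracy: {accuracy(numbers, answers):.1%}")
```

Running the same question set against two snapshots of a model and comparing the two accuracy figures is the kind of before-and-after comparison the reported percentages describe.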

ChatGPT's role in aiding coders with programming and debugging tasks also encountered obstacles. In March, GPT-4 demonstrated an impressive ability to complete accurate, ready-to-run scripts in over 50% of cases.

However, this success rate dropped dramatically to 10% by June. Similarly, GPT-3.5's accuracy fell from 22% in March to a mere 2% in June, according to the study.
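One simple way to think about a "ready-to-run" success rate is to count how many responses can be executed without any manual editing. The Python sketch below makes a loud simplifying assumption: it only checks whether a response parses as valid Python, which says nothing about whether the script is correct. The function names is_ready_to_run and success_rate are hypothetical and are not taken from the study.

```python
def is_ready_to_run(response: str) -> bool:
    """Return True if the raw response parses as Python source.

    Assumption for this sketch: "ready-to-run" means the text can be
    handed to the interpreter as-is. Only syntax is checked here.
    """
    try:
        compile(response, "<response>", "exec")
    except (SyntaxError, ValueError):
        return False
    return True


def success_rate(responses) -> float:
    """Share of responses that pass the syntax check."""
    return sum(is_ready_to_run(r) for r in responses) / len(responses)


# Hypothetical chatbot replies, invented for illustration only.
responses = [
    "print(1 + 1)",                  # plain code: parses
    "Here is the code:\nprint(1)",   # extra prose breaks parsing
    "def f(:\n    pass",             # malformed function header
    "x = [1, 2",                     # unclosed bracket
]
print(f"ready-to-run: {success_rate(responses):.0%}")
```

Under this check, a reply that wraps otherwise valid code in explanatory text counts as a failure, so a change in how answers are formatted can move the rate as much as a change in coding ability.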

The researchers faced challenges pinpointing a definitive cause for these inconsistencies, but they speculated that system modifications and upgrades might be contributing factors. The opaque nature of these language models makes it difficult to fully comprehend the reasons behind such performance fluctuations.

Conspiracy Theories

Interestingly, conspiracy theorists have floated accusations that OpenAI may be experimenting with smaller versions of LLMs to save costs. Others have suggested that OpenAI could be intentionally downgrading GPT-4 to drive users toward purchasing GitHub's LLM accessory, Copilot.

OpenAI denied such claims. In a tweet, Peter Welinder, OpenAI's VP of Product, said the company is continually striving to improve ChatGPT, making each new version smarter than its predecessor.

However, some remain concerned about the potential impact of "drift" in the model's results. To address these apprehensions, observers urge OpenAI to be more transparent by disclosing training material sources, code, and other structural elements of GPT-4.

The study's findings were posted on the arXiv preprint server.