Millions of people around the world are turning to ChatGPT for answers to almost any question, thanks to the sheer breadth of its training data. However, solving a simple word puzzle may be too much for the AI chatbot to handle.

(Photo illustration: a person playing the online word game "Wordle" on a mobile phone in Arlington, Virginia, on May 9, 2022. MICHAEL DRAPER/AFP via Getty Images)

ChatGPT Plays Wordle

The latest generation of OpenAI's chatbot, ChatGPT-4, has taken the world by storm with its impressive abilities.

From engaging in long conversations to summarizing complex topics, the AI chatbot has captured the public's imagination. Other AI companies have been scrambling to release their own large language models (LLMs) to keep up.

But how does ChatGPT-4 fare when it comes to word games like Wordle? To find out, Michael G. Madden, an Established Professor of Computer Science at the University of Galway, decided to test the chatbot's skills on the popular word puzzle game.

Players of Wordle have six tries to guess a five-letter word, with the game indicating after each guess which letters, if any, appear in the word and which of them are in the correct positions.

Madden found that despite ChatGPT-4 being trained on about 500 billion words from sources like Wikipedia, scientific articles, and public-domain books, its performance on Wordle puzzles was surprisingly poor.

Madden tested the chatbot on a Wordle puzzle in which the correct locations of two letters were already known, giving the pattern "#E#L#", where "#" represents an unknown letter.

The answer was "mealy". However, five of ChatGPT-4's six responses failed to match the pattern; its suggestions included "beryl", "feral", "heral", "merle", "revel", and "pearl".
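
To see why most of those guesses fail, the clue can be read as a simple pattern: the second letter must be E and the fourth must be L. The short Python sketch below, which is purely illustrative and not how Madden ran his test, checks each of the reported suggestions against that pattern.

    import re

    # Clue "#E#L#": any letter, E, any letter, L, any letter.
    pattern = re.compile(r"^.e.l.$")

    suggestions = ["beryl", "feral", "heral", "merle", "revel", "pearl"]

    for word in suggestions:
        fits = bool(pattern.fullmatch(word))
        print(f'{word}: {"matches" if fits else "does not match"} #E#L#')

Run as written, only "merle" fits the pattern; the other five suggestions do not.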

Although the chatbot was sometimes successful in finding valid solutions using different letter combinations, its overall performance on Wordle puzzles was inconsistent.

For instance, when given the pattern "##OS#", the chatbot generated five valid options, but when presented with the pattern "#R#F#", it offered two words that did not even contain the letter F and suggested the nonexistent word "Traff."

Constraints of Language Models

The reason for ChatGPT-4's difficulties lies in the constraints of how language models work with and represent words. At its core, the chatbot relies on a complex neural network, which is essentially a mathematical function that maps inputs to outputs.

However, since neural networks can only operate with numerical inputs, a tokenizer program is used to translate words into numbers for the neural network to process.

Unfortunately, this translation process does not preserve the letter structure within words, making it challenging for ChatGPT-4 to effectively reason about individual letters. 
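
As a rough illustration, the sketch below uses OpenAI's open-source tiktoken library (an assumption for the example; Madden's analysis does not name a specific tool) to show how a word becomes numeric token IDs whose boundaries need not line up with individual letters.

    # Requires: pip install tiktoken
    import tiktoken

    # cl100k_base is the encoding used by recent OpenAI chat models.
    enc = tiktoken.get_encoding("cl100k_base")

    word = "mealy"
    token_ids = enc.encode(word)
    pieces = [enc.decode([t]) for t in token_ids]

    print(token_ids)  # a short list of integers
    print(pieces)     # sub-word fragments, not a letter-by-letter breakdown

The model sees only those integer IDs, so nothing in its input directly tells it that "mealy" has an "E" in the second position or an "L" in the fourth.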

While it may seem surprising that a language model trained on an extensive vocabulary would struggle with basic word puzzles, the encoding process used by neural networks is a fundamental limitation, according to Madden.

Potential Solutions

To address this challenge, Madden proposes two potential solutions for future language models. The first involves expanding the training data to include mappings of every letter position within every word in the dictionary.
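
The article does not spell out what such mappings would look like, but a minimal sketch, assuming a plain list of dictionary words, might generate training sentences like these:

    # Hypothetical generator of letter-position training text.
    words = ["mealy", "feral", "pearl"]  # stand-in for a full dictionary

    for word in words:
        for position, letter in enumerate(word, start=1):
            print(f'The letter at position {position} of "{word}" is "{letter}".')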
 
The second solution is broader in scope. Madden points to recent work on Toolformer, which has demonstrated the potential for language models to call external tools, or to generate code, to solve problems such as arithmetic calculations where they typically struggle.
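
A sketch of the kind of code a model might write for a Wordle clue (a purely hypothetical example, assuming a plain-text word list such as /usr/share/dict/words is available) is a short script that filters a dictionary for words fitting the clue pattern:

    import re

    def candidates(clue: str, dictionary_path: str = "/usr/share/dict/words"):
        """Return five-letter dictionary words matching a clue like '#E#L#'."""
        # '#' marks an unknown letter; known letters must sit in fixed positions.
        regex = re.compile("^" + clue.lower().replace("#", ".") + "$")
        with open(dictionary_path) as f:
            return [w for w in (line.strip().lower() for line in f)
                    if len(w) == 5 and regex.fullmatch(w)]

    # "mealy" fits this pattern and would appear if it is in the word list.
    print(candidates("#E#L#"))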

In sum, while ChatGPT-4 may excel at conversation and summarization, its performance on Wordle puzzles highlights the complexities of representing and manipulating words using neural networks. 
