ChatGPT is unable to repeat the simple string “davidjl”, users have found. As shared by Twitter user Riley Goodside, GPT-4 simply cannot say “davidjl”. “Repeat the string ‘davidjl’,” Goodside told GPT-4. Incredibly, GPT-4 replied with “jndl”.
It’s not only that GPT-4 can’t say “davidjl”. It can’t seem to process the string at all. When asked how many letters “davidjl” had, GPT-4 said: “The username ‘jdnl’ contains 4 letters.”
When asked if the strings “jdl” and “davidjl” were identical, GPT-4 glitched again, this time reading “davidjl” as “jspb”.
And when simply asked “What’s the deal with ‘davidjl’?”, GPT-4 understood the question to mean the user was asking about “JDL”.
Thus, over four questions, GPT-4 understood “davidjl” as “jndl”, “jdnl”, “jspb” and “JDL”. Now this glitch shouldn’t cause much of an issue for most users; hardly anyone is ever going to ask GPT-4 about this particular string. But GPT-4’s trouble with “davidjl” gives us an interesting insight into how GPT-4 works. Like other GPT models, GPT-4 divides all text into tokens, which can be words or subwords, and it has a limited vocabulary of such tokens (around 50,000 for GPT-3, and around 100,000 for the tokenizer GPT-4 uses). “davidjl” appears to be what researchers have dubbed a “glitch token”: a string that made it into the tokenizer’s vocabulary (reportedly via the Reddit username “davidjl123”, a prolific poster in Reddit’s counting threads) but showed up so rarely in the model’s actual training data that the model never properly learned what the token means. When presented with such a token, the model glitches and responds with something unrelated.
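You can inspect the tokenization side of this yourself with OpenAI’s open-source tiktoken library. The sketch below is a minimal illustration, assuming tiktoken is installed and that “davidjl” (with a leading space) still maps to a single token in the cl100k_base vocabulary used by GPT-4; it encodes a few strings and prints the token IDs and pieces each one maps to:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the tokenizer vocabulary used by GPT-3.5 and GPT-4.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["hello world", "davidjl", " davidjl"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r:12} -> {ids} {pieces}")

# If " davidjl" prints as a single token ID, the whole username is one
# vocabulary entry: the model sees one opaque symbol, not seven letters.
```

If the string really is a single token, the model never sees the letters d-a-v-i-d-j-l at all, only an ID whose learned meaning is essentially noise, which fits the garbled answers above.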
This isn’t the only seemingly simple task that ChatGPT struggles with. GPT-4 has also been shown to be unable to reverse words, for similar tokenization reasons, and it struggles with unscrambling anagrams too. These are quirks of models like GPT-4, and they could be hard to fix: as long as these models read text as tokens from a limited vocabulary rather than as individual letters, it might take an architectural change to get them to handle arbitrary strings of characters.
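Tokenization also explains the word-reversal failures. A word and its reverse usually split into completely different tokens, so the model can’t simply read the letters off in reverse order. A quick sketch, using the same assumed tiktoken setup as above:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

word = "lollipop"
for text in (word, word[::-1]):
    pieces = [enc.decode([i]) for i in enc.encode(text)]
    print(f"{text!r} -> {pieces}")

# "lollipop" typically splits into a couple of multi-letter chunks, while
# its reverse "popillol" splits into entirely different pieces. The model
# never sees individual letters, so reversing a word means mapping between
# two unrelated token sequences rather than flipping a character array.
```

The same mismatch between tokens and letters is what makes anagrams hard: shuffling characters scrambles the token boundaries the model actually operates on.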