When it was first released, GPT-4 wowed users with its skills and abilities. But strangely, instead of improving over time, some users are reporting that it's getting worse.
A top post on Hacker News claims that GPT-4's quality has degraded since its original release. "It is much faster than before but the quality of its responses is more like a GPT-3.5++," a user complained. "It generates more buggy code, the answers have less depth and analysis to them, and overall it feels much worse than before," they added.
The post was heavily upvoted, and other users seemed to agree. “Yes. Before the update, when its avatar was still black, it solved pretty complex coding problems effortlessly and gave very nuanced, thoughtful answers to non-programming questions. Now it struggles with just changing two lines in a 10-line block of CSS and printing this modified 10-line block again. Some lines are missing, others are completely different for no reason. I’m sure scaling the model is hard, but they lobotomized it in the process. The original GPT-4 felt like magic to me, I had this sense of awe while interacting with it. Now it is just a dumb stochastic parrot,” wrote user bbotond.
“I just tried a comparison of ChatGPT, Claude and Bard to write a python function I needed for work and ChatGPT (using GPT-4) whined and moaned about what a gargantuan task it was and then did the wrong thing. Claude and Bard gave me what I expected,” wrote user jonathan-kosgei.
On Twitter, too, many users reported that GPT-4's quality seemed to have mysteriously degraded. "woah yeah gpt-4 feels significantly worse, worse at following some instructions than 3.5 right now? seems like there's a bug somewhere," wrote user @willdepue.
Now, LLMs don't produce deterministic outputs, so it's hard to say with certainty whether GPT-4 has actually gotten worse. But many users across social media platforms do seem to be reporting a drop in quality over the past week. It could be because of an update that OpenAI silently made to the model, or perhaps the company's Trust and Safety guardrails are degrading its performance. The incident also highlights how important open-source LLMs could be — OpenAI's GPT-4 is a black box, and there's no telling what's causing its performance to improve or degrade. If comparable open-source alternatives existed, people could tinker with the code and figure things out for themselves.
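To see why anecdotal before-and-after comparisons are shaky, consider how LLMs pick their next token. The sketch below is a minimal, hypothetical illustration of temperature-based sampling (it is not OpenAI's actual decoding code): at any temperature above zero, the same prompt can legitimately produce different outputs on different runs, so a single worse-looking answer is weak evidence of a model change.

```python
import math
import random

def sample_token(logits, temperature=1.0, rng=None):
    """Sample a token index from raw logits.

    With temperature > 0 the choice is stochastic, so repeated calls
    on identical input can return different tokens -- which is why two
    runs of the same prompt are not directly comparable.
    """
    rng = rng or random.Random()
    if temperature <= 0:
        # Greedy decoding: always pick the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax with temperature scaling (subtract max for numerical stability).
    m = max(logits)
    weights = [math.exp((x - m) / temperature) for x in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]

# Toy scores for a three-token vocabulary.
logits = [2.0, 1.5, 0.5]

# Greedy decoding is reproducible: same input, same token every time.
greedy = sample_token(logits, temperature=0)

# At temperature 1.0, different random states yield different tokens.
samples = {sample_token(logits, 1.0, random.Random(seed)) for seed in range(50)}
print(greedy, samples)
```

Properly testing for a regression would mean running a fixed benchmark many times and comparing score distributions, not eyeballing individual chat transcripts.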