
StableLM Compares Poorly To Year-Old GPT-Neo And CodeGen-NL Models, Users Find

Stability AI launched its StableLM models yesterday, much to the delight of the open-source community, but initial results suggest that the released models compare unfavorably even to models released a year ago.

StableLM performs poorly on standard benchmarks compared to much older models such as GPT-Neo and CodeGen-NL, Twitter users have found. StableLM’s base model has 3 billion parameters, yet it scored lower than both GPT-Neo and CodeGen-NL, each of which has only 2.7 billion parameters. Across a range of benchmarks, StableLM was the worst-performing model of the three. “The full benchmarks I ran against the new StableLM. The other two models were released over a year ago. Something is missing considering the amount of tokens that StableLM has seen,” wrote Twitter user @abacaj while sharing the results of his tests.

Other users reported seeing similar results. “I got similar results on the benchmarks I ran. Something must be wrong, it places in between lm-eval scores of gpt2_774M_q8 and pythia_deduped_410M in my testing (pic w/ headers now),” wrote Twitter user @lhl.

An engineer at Stability AI, meanwhile, offered a workaround to get better results out of StableLM. “Try adding ‘User: ’ to the prompt. Because of the way these models were trained, prepending your evals with ‘User: ’ should make things *much* better,” Stanislav Fort said.
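The suggested workaround amounts to rewriting each evaluation prompt before it is sent to the model. A minimal sketch of that preprocessing step might look like the following (the function name and default prefix are illustrative; only the “User: ” prefix itself comes from Fort’s tweet):

```python
def format_eval_prompt(prompt: str, prefix: str = "User: ") -> str:
    """Prepend a role prefix so the prompt matches the chat-style
    formatting the model reportedly saw during training.

    Avoids double-prefixing if the prompt already starts with it.
    """
    if prompt.startswith(prefix):
        return prompt
    return prefix + prompt


# Example: every benchmark prompt gets the prefix before inference.
raw_prompts = ["What is the capital of France?", "User: Summarize this text."]
formatted = [format_eval_prompt(p) for p in raw_prompts]
```

Here `formatted` would be `["User: What is the capital of France?", "User: Summarize this text."]`; each rewritten string would then be passed to the model in place of the raw benchmark prompt.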

Stability AI CEO Emad Mostaque said that the StableLM models currently aren’t as good as GPT-3.5, or even LLaMA. “It does not stack up as it is not even trained on one epoch yet and still coordinating the ideal dataset. We are training lots in parallel,” he wrote during a Twitter AMA.

While it might be early days for Stability AI’s StableLM models, these initial results indicate how hard it can be for newer companies to catch up with the state of the art in AI. Stability AI is already a big name, having created the hugely successful Stable Diffusion model, but it appears to have a long way to go with LLMs. This is exactly why companies and businesses are racing to catch the AI wave: any lead in technology or process established now could yield results for many years to come.