
I don’t usually compare models publicly, but after testing a few this week, a pattern stood out.
Right now, GLM-5 and DeepSeek feel like the only two models playing in the same weight class. Not because of hype, but because of how they behave when you give them real work.
I didn’t go hunting for benchmarks, but it’s worth noting that GLM-5 is the model behind “Pony Alpha,” the anonymous system that quietly dominated OpenRouter for a while. That context helped explain what I was seeing.
I ran a single prompt inside Z.ai and recorded the output.
A few things I noticed:
✓ Agent Mode handles longer, structured tasks without drifting.
✓ Writing can be fast, research-first, or step-by-step, depending on how much control you want.
✓ Data Insight doesn’t just visualize files, it explains what changed and why.
✓ Chat Mode works well for quick experiments and vibe coding.
At some point, it stopped feeling like “testing a model” and started feeling like checking whether I could trust it with real work.
If you’re curious to test it yourself, GLM-5 is available here:
👉 https://chat.z.ai
I'd be genuinely interested to hear what others notice when they try it.
