I don’t usually compare models publicly, but after testing a few this week, a pattern stood out.

Right now, GLM-5 and DeepSeek feel like the only two models playing in the same weight class. Not because of hype but because of how they behave when you give them real work.

I didn’t go hunting for benchmarks, but it’s worth noting that GLM-5 is the model behind “Pony Alpha,” the anonymous system that quietly dominated OpenRouter for a while. That context helped explain what I was seeing.

I ran a single prompt inside Z.ai and recorded the output.

A few things I noticed:

✓ Agent Mode handles longer, structured tasks without drifting.
✓ Writing can be fast, research-first, or step-by-step, depending on how much control you want.
✓ Data Insight doesn’t just visualize files, it explains what changed and why.
✓ Chat Mode works well for quick experiments and vibe coding.

At some point, it stopped feeling like “testing a model” and started feeling like checking whether I could trust it with real work.

If you’re curious to test it yourself, GLM-5 is available here:
👉 https://chat.z.ai

I'd be genuinely interested to hear what others notice when they try it.
