Hacker News

JustSkyfall 7 hours ago

The problem with these benchmarks is that the Chinese models tend to be incredible on paper, and absolutely terrible in practice :/

CuriouslyC 6 hours ago

This was mostly a problem with older Qwen/MiMo/Kimi models. GLM has always been on the more robust side, and newer iterations from all those labs have improved as well. The only lab I've seen regressing this way is DeepSeek: 3.2 was fairly robust, but 4.0 feels more benchmaxxed.

Mashimo 7 hours ago

I have used GLM since version 4.8 I think and do enjoy using them, more than other models like Kimi or DeepSeek. Though I have only tested them on smaller private projects.

Alifatisk 6 hours ago

> I have used GLM since version 4.8 I think

You are probably referring to GLM-4.7.

bel8 6 hours ago

I beg to differ. I replaced a $40/mo GitHub Copilot subscription, where I used Opus 4.6 and GPT 5.5, with a $10/mo opencode Go plan, where I mostly use DeepSeek V4 Flash and am testing MiMo 2.5.

I work on mid-sized projects currently (200k to 1kk lines of code).

Alifatisk 6 hours ago

> 1kk lines of code

Isn't that a million?

bel8 6 hours ago

Yep. I consider up to a million lines of code as mid-sized.

When I worked in banking, the codebases were often larger than a million lines.

segmondy 7 hours ago

You are obviously lying, because it shows you have no experience with them. GLM models since 4.5 have been crushing it. All their models since then haven't skipped a beat: 4.5/4.5-Air, 4.6, 4.7, 4.8, 5, 5.1. That aside, MiMo V2.5, MiniMax from 2.0, DeepSeek from V3, Kimi since V2, Qwen since 3, and Hy3 have all been amazing models. All from China; we need to get over it. China is not losing yet as far as the AI race is concerned.

Alifatisk 6 hours ago

Is there a GLM-4.8 model?
