Kind of, you need at least 256 gb of vram and 24-40 gb of vram to run the 2bit quantization, because it's a moe you just need the expert to fit in vram to get significant improvement over a pure CPU setup. At 2bits though expect significant quality loss.

2bits is a joke for serious work. You'd be better with Qwen3.6 under 30G probably.

But there are EU only providers for GLM5.2. For example tensorx. Depending on your definition of "secure" it may be acceptable.

> 2bits is a joke for serious work.

I have not tried it but I will take your word on it. I don't think Qwen3.6 cuts it for large scale coding work. Reading issues, reading code sure, but biting into large issues no, it goes off the track consistently.

Depending on budget it may also be affordable to spin up servers to run it on demand.