It also means giving up vision, which I don't know how I would deal with. I think I would prefer a weaker model with vision over a stronger one without.

It's odd that the model doesn't support it directly, but they at least have https://docs.z.ai/devpack/mcp/vision-mcp-server

OpenRouter definitely supports vision models. Why would you have to give up vision?

> Why would you have to give up vision?

Because you would have to switch models.

You can't just say "Oh, button X looks weird, see [screenshot]" while coding with GLM. You would need to switch to another model and then maybe switch back.

For example if I want to paste a screenshot of what I mean, I can't.

If you're using opencode or similar, you can just temporarily switch models, in the same session, to something that has vision, have it look at your image, and then switch back.

Or create an agent or subagent that just looks at images, and specify a vision model for that agent.

I don't see how that helps. I would still need to somehow get the image into the coding model's context.

Vision runs just fine locally for most use cases, so it's really just a skill that calls that Ollama instance.
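Roughly, such a skill could look like the sketch below. It assumes a local Ollama server on its default port (11434) with a vision-capable model already pulled; `llava` is only an example, and the function names are hypothetical. The screenshot goes to the vision model, and what comes back is plain text. A text-only coding model like GLM can take that text as an ordinary tool result, so the image itself never has to enter its context.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Assumptions: a local Ollama server on the default port, and a vision-capable
# model already pulled (e.g. `ollama pull llava`). The model name is an example.
OLLAMA_URL = "http://localhost:11434/api/generate"
VISION_MODEL = "llava"


def build_vision_request(image_bytes: bytes, question: str, model: str = VISION_MODEL) -> dict:
    """Build an Ollama /api/generate payload carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": question,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of streamed chunks
    }


def describe_screenshot(path: str, question: str = "Describe any UI problems in this screenshot.") -> str:
    """Hypothetical skill: ask the local vision model about a screenshot.

    Returns plain text, which the non-vision coding model can read as a
    normal tool/skill result.
    """
    payload = build_vision_request(Path(path).read_bytes(), question)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False, the generated text is in the "response" field.
        return json.loads(resp.read())["response"]
```

The agent would then call something like `describe_screenshot("button.png", "Why does button X look off?")` and paste the answer into the conversation. You lose some fidelity compared to a model that sees the pixels directly, but the coding model stays the same.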

Why's that?