That's what I've been doing. I use crush normally. While the codebases are by no means huge, they're not tiny either.

Are you using it in an agentic workflow? Just reading the codebase will consume a lot of cached tokens, but z.ai seemingly counts these as normal input tokens for rate-limiting purposes.

I'm not entirely sure what an agentic workflow could mean today, but I think so. I use a coding agent (crush) and prompt it to brainstorm an implementation with me (or sometimes I know exactly how I want to implement it but ask it to challenge my approach). I correct any wrong assumptions, or ask for the implementation to look different from what it suggested if I don't like it. Finally, when I'm positive I've cleared up the most important assumptions, I ask it to actually write and edit files, run tests and such (this just ends up being an "implement this").

With any model I've tried, I've found it a huge pain to have it fix things where it made a wrong assumption without the code becoming a mess and burning a lot of tokens. I'm aware that not everyone works like this, but I'm still very opinionated on what the end result should look like, so that I can still work on it without an LLM.