For the past month, I've been claiming that $20/mo codex is the best deal in AI.

Now I'm going to have to find the new best deal.

Check out the Z.ai coder plan. The $27/mo plan is roughly the same usage as the 20x $200 Claude plan. I have both and Claude is a little better, but GLM 5.1 is much better value.

Agreed, I use Z.ai and the usage is fantastic. I'd only temper that recommendation by noting that it's often unreliable. Perhaps a few times per week it's unresponsive. Maybe more often it seems to become flaky.

It's very variable, though recently I'm noticing it's more reliable. There was a patch where it was nearly unusable some days.

I guess I won't complain for the price and YMMV.

Agreed. They had a rough patch around the 4.7 to 5 upgrade. The new architecture required a hardware migration. The 5 to 5.1 upgrade was much smoother (same architecture, new weights). As you say, a little rough around the edges, but still great value. A trick I learned is that it's max 2 parallel requests per user. You can put a billion tokens a month through it, but you need to manage your parallelism.
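For anyone wondering what "manage your parallelism" can look like in practice, here is a minimal sketch that caps in-flight requests at 2 with an `asyncio.Semaphore`. The `call_model` function is a hypothetical placeholder, not Z.ai's actual client, and the limit of 2 is just the number quoted above; swap in whatever client and limit apply to you.

```python
import asyncio

MAX_PARALLEL = 2  # assumption: the per-user limit mentioned in the comment above


async def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real API call; replace with your own client."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"response to: {prompt}"


async def run_all(prompts, limit=MAX_PARALLEL, worker=call_model):
    """Run every prompt, but never more than `limit` at the same time."""
    sem = asyncio.Semaphore(limit)

    async def guarded(prompt):
        # Each task waits here until one of the `limit` slots is free.
        async with sem:
            return await worker(prompt)

    # gather() returns results in the same order as the input prompts.
    return await asyncio.gather(*(guarded(p) for p in prompts))


if __name__ == "__main__":
    results = asyncio.run(run_all([f"task {i}" for i in range(6)]))
    print(len(results), "responses")
```

You can queue as many prompts as you like; the semaphore just makes the extras wait their turn instead of getting rejected by the provider.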

If you're ok with a model provider that goes down all the time and has such a poor inference engine setup that once you get past 50k tokens you're going to get stuck in endless reasoning loops.

GH Copilot is still the best deal, while it lasts

I feel they will go token-based at some point. Currently, if you only use it with precise prompts and not random suggestions, and switch between models 5.4 and 5.4 mini depending on the work, it is the best deal.

Yeah, it's really good. Probably going to be the next best deal until they cut back.

I need to try the command line version.

> I need to try the command line version.

Is there any other?

Already paying for Google photo storage, AI Pro for an extra $7 is a steal with Antigravity.

Good luck sticking within limits, I have been burning up my baseline limits insanely fast within a few prompts, a marked change from a few weeks ago.

There's a few complaints online about the same happening to multiple users.

Otherwise Antigravity has been great.

I use the free Chat AIs all the time; Claude, ChatGPT, Gemini, Grok, Mistral.

In the last month they have all clamped down quite heavily. I used to be able to deep-dive into a subject, or fix a small Python project, multiple times per day on the free Web UIs.

Claude, this morning, modified a small Python project for me and that single act exhausted all my free usage for the day. In the past I could do multiple projects per day without issue.

Same with ChatGPT. Gemini at least doesn't go full on "You can use this again at 1100AM", but it does fall back to a model that works very poorly.

Grok and Mistral I don't really use that much, but Grok's coding isn't that bad. The problem is that it is not such a good application for deep-diving a topic, because it will perform a web search before answering anything, making it take long.

Mistral tends to run out of steam very quickly in a conversation. Never tried code on it though.

I use a quota monitor and grind out code on Gemini 3 flash. I only go to sonnet or pro if there are issues flash can't deal with or I have a critical architecture I need nailed on the first try.

I still review every line generated.

Gemini 3.1 pro on the web interface still works if my problems are scoped to a single module or two and my better model quotas are exhausted in the IDE.

For $7 over what I was already paying for storage, primarily using flash is still a good development experience for me.

That's only good for the web based UI. If you want Gemini API access, which is what this article is about, then you must go the AIStudio route, and pricing is API usage based. It does have a free usage tier and new signups can get $300 in free credits for the paid tier, so I think it's still a good deal, just not as good as using the subscriptions would be.

No? Isn't the article about Codex, which is roughly equivalent to "Gemini CLI" and Google's Antigravity? Google's subscriptions include quotas for both of those, albeit the $20 monthly "Pro" plan has had its "Pro" model quota slashed in the last few weeks. You still get a large number of "Gemini 3 Flash" queries, which has been good enough for the projects I've toyed with in Antigravity.

I guess that's true, but I find Google's models better than their public tooling. The Pro subscription includes "Gemini Code Assist and Gemini CLI", but the Gemini Code Assist plugin for IntelliJ, which is my daily driver, is broken most of the time to the degree that it's completely unusable. Sometimes you can't even type in the input box.

The only way I can do serious development with Gemini models is with other tooling (Cline, etc) that requires API based access which isn't available as part of the subscription.

I agree. Gemini models are held back by their segmentation of usage between multiple products, combined with their awful harnesses and tooling. Gemini cli, antigravity, Gemini code assist, Jules.... The list goes on. Each of these products has only a small limit and they must share usage.

It gets worse than that though. Most harnesses that are made to handle codex and Claude cannot handle Gemini 3.1 correctly. Google has trained Gemini 3.1 to return different json keys than most harnesses expect resulting in awful results and failure. (Based on me perusing multiple harness GitHub issues after Gemini 3.1 came out)

Google is by far the best deal for AI, they give you so many 'buckets' of usage for a variety of products, and they seem to keep adding them.

If you aggressively use all buckets, Google is incredibly generous. In theory, for one AI Pro subscription you can get a ridiculous return on investment with a family plan.

You could probably be charging google literally thousands if all 6 members were spamming video and image generation and antigravity.

The family sharing is the real hack lol. I don't think any other provider does that.

I bought one of the google AI packages that came with a pile of drive storage and Gemini access.

Unfortunately gemini as a coding agent is a steaming useless pile. They have no right selling it, cheap open weight Chinese models are better at this point.

It's not stupid, it's just incompetent at tool use and makes bad mistakes. It constantly gets itself into weird dysfunctional loops when doing basic things like editing files.

I'm not sure what GOOG employees are using internally, but I hope they're not being saddled with Gemini 3.1. It's miles behind.

Are you using gemini CLI or antigravity? The former is not really comparable to the latter in terms of quality. I wouldn't say antigravity is as good as the competition but it's pretty close. Miles behind is overstating it.

Gemini CLI but also used the Gemini models via opencode. They're terrible at CLI tool use. Like I said, just editing text files, they fall over rapidly, constantly making mistakes and then mistakes fixing their mistakes.

Antigravity wants me to switch IDEs, and I'm not going to do that.

Gemini 3.1 is a good coding agent. We've been totally spoiled now. Also, if you use Antigravity you can burn up Opus 4.6 credits off your Goog account instead, before you have to switch to Gem 3.1.

What has actually changed? It's unclear how much you can do right now, unless they've already switched you to the new plan and you're speaking from experience.

We are exiting a hype cycle, well into the adoption curve. Subscriptions were never going to last.

My next step is going to be evaluating open and local models to see if they are sufficiently close to par with frontier models.

My hope is that the end of seat based pricing comes with this tech cycle. I was looking for a document signing provider that doesn't charge a monthly fee, since I only need a few docs a year.

I'm developing software in this area right now, so I try a lot of the new models. They're not even close for coding tasks. It basically comes down to 26b parameters vs 1T parameters / quantisation / smaller context sizes, there's no comparison. However, for agentic work, tool calling, text summarisation, local LLMs can be quite capable. Workloads that run as background tasks where you're not concerned about TTFB, cold starts, tok/s etc., this is where local AI is useful.
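To put rough numbers on the 26b vs 1T gap, here is a back-of-the-envelope sketch of the memory needed just to hold the weights at different quantisation levels. It assumes memory is simply parameters times bits per weight, and it ignores the KV cache, activations, and runtime overhead, so real requirements will be higher. The function name is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at `bits_per_weight` bits, with
    no allowance for KV cache, activations, or framework overhead.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, params in [("26B", 26), ("1T", 1000)]:
        for bits in (16, 8, 4):
            print(f"{name} @ {bits}-bit: ~{weight_memory_gb(params, bits):.0f} GB")
```

Under these assumptions a 26B model at 4-bit needs about 13 GB for weights, which fits on a single consumer machine, while a 1T model at 4-bit still needs about 500 GB.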

If you have an M processor then I would recommend that you ditch Ollama because it performs slowly. We get double or triple tok/s using omlx or vmlx, respectively, but vmlx doesn't have extensive support for some models like gpt-oss.

Kimi K2.5 (as an example) is an open model with 1T params. I don't see a reason it has to be local for most use cases- the fact that it's open is what's important.

That is just idealism. Being "open" doesn't get you any advantage in the real world. You're not going to meaningfully compete in the new economy using "lesser" models. The economy does not care about principles or ethics. No one is going to build a long term business that provides actual value on open models. They can try. They can hype. And they can swindle and grift and scalp some profit before they become irrelevant. But it will not last.

Why? Because what was built with an open model can be sneezed into existence by a frontier model run via first party API with the best practice configurations the providers publish in usage guides that no one seems to know exist.

The difference between the best frontier model (gpt-5.4-xhigh or opus 4.6) and the best open model is vast.

But that is only obvious when your use case is actually pushing the frontier.

If you're building a crud app, or the modern equivalent of a TODO app, even a lemon can produce that nowadays so you will assume open has caught up to closed because your use case never required frontier intelligence.

A model with open weights gives you a huge advantage in the real world.

You can run it on your own hardware, with perfectly predictable costs and predictable quality, without having to worry about how many tokens you use, or whether your subscription limits will be reached at the most inconvenient moment, forcing you to wait until they are reset, or whether the token price will be increased, or your subscription limits will be decreased, or whether your AI provider will swap the model for a worse one, and so on.

Moreover, no matter how good a "frontier model" may be, it can still produce worse results than a worse model when the programmer who manages it does not also have "frontier intelligence". When liberated from the constraints of a paid API, you may be able to use an AI coding assistant in much more efficient ways, exactly like when time-sharing access to powerful mainframes was replaced with the unconstrained use of personal computers.

When I was very young, I went through the transition from remotely using a mainframe to using my own computer. I certainly do not want to return to that straitjacket style of work.

The vision has been that the open and/or small models, while 8-16 months behind, would eventually reach sufficient capabilities. In this vision, not only do we have freedom of compute, we also get less electricity usage. I suspect long-term the frontier mega models will mainly be used for distillation, like we see from Gemini 3 to Gemma 4.

first session with gemma4:31b looks pretty good, like it may actually be up to coding tasks like gemini-3-flash levels

you can tell gemma4 comes from gemini-3

I recently experimented creating a Python library from scratch with Codex. After I was done, I took the PRD and Task list that was generated and fed them to opencode with Qwen 3.5 running locally.

Opencode was able to create the library as well. It just took about 2x longer.

Which version of Qwen 3.5 did you use?

which quant as well

Not at my computer now, either 27 or 35b not quantized.

Next week I will be trying qwopus 27b.