Antigravity may well top whatever benchmark, but:
My (forced) Antigravity replacement for Gemini CLI requires me to log on via browser every time I use it, and my Antigravity IDE won't update at all, so:
If it's OK, I'd prefer they just work on reaching a baseline acceptable rollout before worrying about being top in anything.
PS: actual title:
OpenSCAD LLM Benchmark: Building the Pantheon
I agree, my main concern regarding Google AI products is this endless pain around the UX of login / billing / upgrades / product sunsets... but their LLM models are good and Antigravity 2.0 is not that bad either (unless you lost all your Antigravity 1.0 setup and projects, like many people did).
I just left Google I/O feeling less confident about Google's execution here.
- Gemini 3.5 Flash is strange. Old cutoff, basically better than 3.1 Pro at some things, worse at others, sometimes cheaper, sometimes more expensive than 3.1 Pro.
- Antigravity had seemed abandoned, and people speculated they would cut it off, and they kind of did, migrating everyone to a new Antigravity.
- Google "shipped the org chart": they have so many AI products and none seem best of breed (e.g. the Gemini integration in Google Docs is worse than Claude).
I was actually hoping for an "Opus level intelligence at Haiku costs" model or a "Sonnet level performance at Gemini 3.0 pricing" model. Either of these would have been a workhorse, plus a competitor to Claude/Codex (1 app to do things). I got neither.
I just use Claude Code and IntelliJ, so I don't understand why so many people complain about Antigravity ditching VS Code. What's the surface not covered by using Antigravity CLI + VS Code (or any other IDE)?
Gemini CLI was open source. Antigravity CLI is not. It's not at feature parity, it's missing many features, and now we are forced to migrate away from Gemini CLI before Antigravity CLI is ready.
The difference in its ability is immense. Even with fewer features it makes a lot of sense to switch. It really shows that the harness matters almost as much as the model.
At least one of the missing features is a basic piece of functionality (showing token quota used). Without it, you're pretty much guaranteed to get locked out for a week with no warning.
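For what it's worth, the kind of warning being asked for is not much code. Below is a minimal sketch of a usage meter that reports each threshold as it is crossed. `QuotaMeter`, the thresholds, and the idea that the client knows its limit up front are all assumptions for illustration, not anything taken from Antigravity CLI or Gemini CLI.

```python
from dataclasses import dataclass


@dataclass
class QuotaMeter:
    """Hypothetical client-side meter; assumes the quota limit is known."""
    limit: int                      # tokens allowed per reset window
    used: int = 0                   # tokens consumed so far in this window
    warn_at: tuple = (50, 80, 95)   # warning thresholds, in percent of limit

    def record(self, tokens: int) -> list[str]:
        """Add usage and return one warning per threshold newly crossed."""
        before = self.used * 100 / self.limit
        self.used += tokens
        after = self.used * 100 / self.limit
        return [
            f"{pct}% of quota used ({self.used}/{self.limit} tokens)"
            for pct in self.warn_at
            if before < pct <= after
        ]
```

Calling `record()` after every request and printing whatever it returns would give at least some notice before a lockout, assuming the service exposes the limit at all.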
I'm not GP, but I am somewhat excited about Antigravity CLI. I adopted Gemini CLI early and really liked it, though over time it got dumber and dumber until I realized it was foolish to use it instead of Claude/Codex. I'm hopeful that Antigravity CLI won't go down that path, but I also can't shake my skepticism.
I don’t think it’s the CLI that was dumber, just the model it was using. They drastically reduced limits on their best model, so that’s likely how you got stuck on a downgraded model and getting worse results.
I'm sensing in reality that behind the scenes there is a difficult trade-off between quantization and usage limits. You can have a "smart" model but poor limits, or good limits and a "dumb" model.
This seems very similar to mobile data limits (remember those years?), where there wasn't enough tower bandwidth to serve everyone unlimited data, so telcos were in constant tension between data caps and bandwidth throttling.
It wasn't until 5G came along with 100x network capacity that they could finally give everyone "unlimited" data.
The forced upgrade from Gemini CLI, which I liked as much as, and in some ways better than, Claude Code, was bad. But them just sending out that email on Wednesday that basically said "Thanks for subscribing to Google One AI Pro, as of right now we're adding limits to your account. Tough shit you get nothing." left a REALLY bad taste in my mouth. I had previously praised the "AI Pro" subscription as a good value.
I quit AI Pro earlier this year for the same reason. I went to use it one day (I don't think I'd even used it much in the preceding week) and found that my limits had been reduced overnight and my usage was already too high. I had something like a 7 day wait until it reset.
I get that you have to change limits, but reducing limits in a way that both applies retroactively and has a really long reset period is just infuriating. If they'd applied the new limits more gently or at the next billing period I'd probably have continued paying.
I don't mind paying a fair price for a service that provides value, but I really hate having a service I think I'm paying for rug-pulled with no clear justification.
Having my workflow disrupted is the main reason I never adopted Antigravity, despite liking it. I'm glad to see G is invested, but the older I get the more protective I am of my workflow.
And the only realistic way to protect our workflow is by avoiding vendor lock-in like the plague.
Exactly. I admit it's a bit extreme, but this is a big reason why I insist that neovim is my IDE, and I won't adopt anything else. If I can't make it work in neovim, I will move to something else (unless I have no choice, but that happens very rarely at this point).
I've got an AI Pro plan and haven't been able to log in for months. Endless checking in with my Google support guy. At least Dinesh wishes me good health every week, so that's nice.
Wild that it doesn't cache the creds.
Just to clarify: I believe it should cache them (it works for me).
So far I like it much more than Gemini CLI (my previous daily driver for personal projects). Seems more mature and "feels more intelligent" (very subjective ofc)
It does. It uses go-keyring under the hood, which has its own issues with certain systems.
If you're on WSL, getting dbus to work is a PITA. There may be other OS-level issues that folks are running into.
It requires a keyring service to be installed (accessed over dbus), and if there isn't one it just silently doesn't store them anywhere. Pretty bad UX.
My (unfounded) guess is this is to prevent usage by other tools/openclaw. The browser login will have fingerprinting to make sure you are a human.
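To make the complaint concrete, here is a rough sketch of the less surprising behavior: try the keyring, and if no service is reachable, say so and fall back to a file only the current user can read. This is not how Antigravity CLI or go-keyring actually work; `KeyringUnavailable`, `NoKeyring`, `store_token`, and the backend's `set()` method are hypothetical names made up for the example, and a plaintext file is a weaker store than a real keyring.

```python
import json
import os
import sys
from pathlib import Path


class KeyringUnavailable(Exception):
    """Raised by a backend when no OS keyring service can be reached."""


class NoKeyring:
    """Hypothetical backend standing in for a system with no keyring service."""

    def set(self, service, user, secret):
        raise KeyringUnavailable("no keyring service reachable")


def store_token(backend, service, user, secret, fallback_dir):
    """Store a secret in the keyring, or in a private file with a warning.

    Returns "keyring" or "file" so the caller knows where the secret went.
    """
    try:
        backend.set(service, user, secret)
        return "keyring"
    except KeyringUnavailable as err:
        # Tell the user instead of silently dropping the credentials.
        print(f"warning: {err}; storing token in a private file", file=sys.stderr)

    path = Path(fallback_dir) / f"{service}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with owner-only permissions before writing the secret.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump({"user": user, "token": secret}, fh)
    os.chmod(path, 0o600)  # also tighten a file that already existed
    return "file"
```

Usage would be something like `store_token(NoKeyring(), "demo-cli", "alice", token, "/some/config/dir")`. The point is only that the failure is reported and the login survives, so the user isn't sent back to the browser every time.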
"Pantheon" bloody hell, why is it people writing these articles are so up themselves, it's so overbearing.
The article is literally about asking these models to generate 3d models of the Pantheon.