If Chrome has the #optimization-guide-on-device-model and #prompt-api-for-gemini-nano flags enabled, either because it's part of some Origin Trial / Early Stable Release or something, then web pages have access to the new Prompt API, which allows any webpage to initiate the (one-time) download of the ~2.7 GiB CPU or ~4.0 GiB GPU model by calling LanguageModel.create().

https://developer.chrome.com/docs/ai/prompt-api
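Per the linked docs, the flow looks roughly like this. A sketch only, assuming the documented `LanguageModel` shape; exact availability values and behaviour vary by channel and build:

```javascript
// Feature-detect first: LanguageModel only exists where the flags/API ship.
async function getOnDeviceSession() {
  if (typeof LanguageModel === 'undefined') return null; // API not exposed
  // One of 'unavailable' | 'downloadable' | 'downloading' | 'available'
  const availability = await LanguageModel.availability();
  if (availability === 'unavailable') return null;
  // create() is what kicks off the one-time multi-GiB download if needed.
  return LanguageModel.create({
    monitor(m) {
      m.addEventListener('downloadprogress', (e) => {
        console.log(`model download: ${Math.round(e.loaded * 100)}%`);
      });
    },
  });
}
```

In a browser without the API, the function simply returns null, which is also a reasonable feature-detection pattern for real pages.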

When Chrome 148 releases tomorrow, this will be the default behaviour on desktop.

Before downloading, it should check for 22 GiB of free disk space on the volume holding your Chrome data dir, and at least twice the model size free in your tmp dir.

First the tabs came for the RAM and I did not protest, for I had plenty. Then they came for the chip and I did not protest, for it was dark silicon anyway. Then they came for the HDD.

And then they made the RAM and SSD so expensive :)

I am curious whether it reuses the LLM across all tabs; it's hard to imagine most machines could spin up more than one or two copies of any 4 GB model unless it's a fairly powerful system.

I think it obviously will, what would be the benefit to spinning up more than one copy?

It should only need to load one copy of the weights, but each tab/site will need a separate context and KV cache.
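Under that assumption, a rough sketch of what that looks like at the API level, using the session shape from the Chrome Prompt API docs (`create()` and `clone()` are documented there; the weight-sharing is a browser-internal detail, not something the API exposes):

```javascript
// Sketch: the browser loads the multi-GiB weights once, but each
// create()/clone() call yields a session with an independent context.
async function sessionsForSites(siteCount) {
  if (typeof LanguageModel === 'undefined') return []; // API not exposed
  const base = await LanguageModel.create();
  const sessions = [base];
  while (sessions.length < siteCount) {
    // clone() copies the initial state but keeps histories separate,
    // so each site fills its own context window / KV cache.
    sessions.push(await base.clone());
  }
  return sessions;
}
```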

Okay, but the browser is basically the computer for most people.

Told ya.

The more severe problem is that Google installs model weight files on a per-user basis, meaning Chrome occupies 4 more GB of space for every OS user on your device.

The company I work at has several environments and hundreds of VDI users in each environment. Chrome is the default browser in all of them. By my rough napkin math, this one small change by Google will eat up at least 15 terabytes of new disk space in total. (I sure hope we are using deduplication at the physical storage layer...)
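The napkin math checks out under plausible assumptions; the figures below are illustrative stand-ins, not the poster's actual numbers:

```javascript
// Illustrative assumption: ~10 environments x ~400 VDI users each,
// with the ~4 GiB GPU model installed per user profile.
const environments = 10;
const usersPerEnv = 400;
const modelGiB = 4;
const totalGiB = environments * usersPerEnv * modelGiB; // 16000 GiB
console.log(`${(totalGiB / 1024).toFixed(1)} TiB`); // ~15.6 TiB
```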

It's fine. Network and disk space are free, right?

Compared to human labor it is.

Only because those who can save on the labor are not paying for the increased resource use in the first place.

Shouldn't the filesystem be set to encrypt everything before it hits the physical storage layer?

Thankfully deduplication is a thing ;)

I certainly hope you don't automatically update.

Does your place review every line of every update patch note? Do you think you would catch this implication?

For every profile.

Does each Playwright (or similar automation system) count as a different user, and does it keep the model around?

If yes, it's an interesting API to call when an AI crawler hits your website.

4 GB, $0.10 (whatever the HD price): that is the equivalent of a high-school-level intelligent brain that can perform many cognitive tasks (and in the future even PhD-level intelligence), for free?

Oh, the horror!!!

Wait, let me pay my HVAC guy $500 he deserved because he came all the way from his home to replace a fuse

It doesn't make sense to apply wholesale prices for mass storage. People are running Chrome on specific devices that they already own. Storage is not fungible in this way.

If you’re pissed you had to pay your HVAC guy to drive to your house and do something you think is trivial, why didn’t you do it yourself?

As the saying goes, gp didn't pay $500 to have the fuse replaced, he paid $500 for the training and experience that was required to know that the fuse had to be replaced.

> 4 GB, $0.10 (whatever the HD price): that is the equivalent of a high-school-level intelligent brain that can perform many cognitive tasks for free?

How is this better than my current solution of an actual human with master's-degree-level intelligence performing all my cognitive tasks for free? I mean, I'm the first to admit I'm extremely lazy, and even I'm over here like "really??"

> Wait, let me pay my HVAC guy $500 he deserved because he came all the way from his home to replace a fuse

Right, because it's totally something an LLM can do, right?

Here is your google brain on your device, whether you want it or not.

I don’t think you understand what “free” means

Tell that to Apple, I'm sure they will allow me to pay $0.025/GB for additional storage on my Macbook /s

It's annoyingly impossible to add more disk space to laptops. I think mine is soldered.

Apple laptops maybe. In many others it's just a normal M.2 NVMe module behind a screwed on bottom case plate.

Also note the Mozilla standards position on this API: https://github.com/mozilla/standards-positions/issues/1213#i...

Or this summary on its status:

> Mozilla: Opposed

> WebKit: Opposed

> Microsoft: Several concerns

> W3C TAG: Several concerns

> Developers: Mostly negative

From https://mastodon.social/@jaffathecake/116527007495775507

You can already trigger a 2 GB model download with the Summarizer API[0], which is already shipped in Chrome.

    Summarizer.create()
[0]: https://developer.chrome.com/docs/ai/summarizer-api#model-do...

I think this is a distinct model from the Prompt API, since the other shipped AI APIs use fine tuned models.

Both of them say they use Gemini Nano.

[deleted]

So now we're up to 6 GB

Per user

The problem is that some of us are still on connections that charge per GB in rural areas. Here in Montana it's very common to pay about $0.25 per GB regardless of how much you use, so this is a $1 additional cost per desktop device. Places like public school districts have hundreds of computers and this will be somewhat significant for them.

I was thinking a similar thing. Many of our customers have special-purpose computers that rarely see fixed-line internet, but need a modern browser (many chose Chrome on their own; we never recommended it).

They're going to get blasted with cellular data charges when they fire up their computer in the field.

Google's updater service also currently ignores the Windows 11 metered-connection hint. It will gladly download that model over your cell connection even if you have a data cap.

This is infuriating behavior.

Silicon Valley must wake up and understand the entire world does not live like them.

It is a small model, so what utility can I / Google expect from it? What is the on-board model used for?

It's not a very good small model to be honest.

That said, you might be surprised to learn that some of the models from 3b-9b could probably replace 80% of the things nonvibe coders use chatgpt for.

It's a good idea to run small models locally if your computer can host them, for privacy and cost-saving reasons. But how can you trust Google to auto-install one on your machine in 2026? I just couldn't do it.

Sure, local models are good, and yes, there's no way we can trust Google.

We can be positive the entire motivation of Chrome is user behavior surveillance. There's not a nano-chance in all the multiverses that the Chrome model is doing anything privately. They've gone to extraordinary lengths to accomplish this. It's not for free.

It is entirely about user surveillance as well as pushing their product on to their users because they have the install base. Google Chrome has become Microsoft IE6 in hostile user behavior.

If Google were focused on surveillance, why haven't they been collecting keystroke data (like grammarly) for years?

You either die a hero or live long enough to see yourself become a villain.

What did we expect when they dropped "don't be evil" from their company values?

A claim about as useful then as it is now. They never wanted to be anything but, once Sergey left. The Schmidt era had them publicly declare one thing while doing something else entirely behind the curtain.

They were corporate evil from day 1. The rest was just PR slogans, and playing the good guy as long as you don't need to squeeze profits.

Isn’t it really “pushing a feature to their products”?

Not when you are appropriating 2GB or more of space for that feature.

I don't trust them either, but the same Google makes Gemma 4 available to run as locally and privately as you want, and those models are pretty amazing for their size.

Both can be true: they give a nice local model so you find it useful AND the chrome harness captures every token in and out for exfiltration.

LLMs are costing Google a ton of money in compute and storage right now. If they can farm any of that off to the users, it makes economical sense.

But yes, there is a 100% chance that logs will get sent back to Google too.

> farm

Ooh, this is interesting. There's nothing stopping them from sending jobs down to local machines. That's some 3 billion nodes. We went through this with coin mining and spam botting.

Nothing stopping it except your ire if it's discovered.

> But how can you trust Google to autoinstall one on your machine

Why are AI models something I'd be uniquely unable to trust Google to install, compared all the other code included in Chrome updates? Is your point just that you shouldn't trust Chrome in general?

Yes, I would not trust Google or Chrome. They have a history of class-action lawsuits for doing shady things to users. Enabling them to condense data on your machine and transmit it however they want, should they choose to, is suspect to me.

Google is probably still sucking up the contents of your LLM requests even with the model running locally.

Yeah, so unclear why yet again everyone is so quickly running for the pitchforks & torches. The model doesn't do anything; it's just a sandbox.

I'm really tired of such overinflated, shrill ridiculousness against Google. Yes, there are very real tensions with this company, and their ads business is scary as heck.

But folks don't seem capable of processing duality; they don't seem to be able to do much but ad-hominem until they pass out. It's really so exhausting having such empty energy charging in every single time, and it keeps obstructing any ability to think straight or assess.

I was waiting for Google to pull a local LLM onto Chrome/Android devices. It opens up some revenue streams that weren't easily possible before: for example the often memed "I was talking about cigars with my wife one single time and now all I see are adsense ads for cigars" gets much easier with a local model doing speech to text and topic classification.

> Yeah, so unclear why yet again everyone is so quickly running for the pitchforks & torches.

Cause everyone loves a good bonfire and a fresh hot roast.

> The model doesn't do anything, it's just a sandbox.

Doesn't that make it worse? They forced everyone to download 4GB of crap for nothing. They could have done one of two things:

(1) bundle the model with the application so you can tell ahead of time you're signing up for 4GB of bandwidth usage or

(2) make downloading the model some kind of opt-in thing.

Either of those would have worked. Just because you can easily tolerate 4GB of unplanned bandwidth usage doesn't mean everyone who can't is wrong.

The point is that what you're "sick of" isn't actually authentic human thought, but in reality you're responding to a recent european-driven propaganda campaign with the goal of deriding anything and everything related to US tech.

All that matters is some MBA product manager at Google was celebrated for shipping this. Hooray!

Everyone who implemented or approved this should be prosecuted under the Computer Fraud and Abuse Act (18 U.S.C. § 1030). If I was on a jury, I wouldn't hesitate to send them to prison where they belong.

What is the principle you’re using here?

A fair and impartial jury is a fundamental part of freedom. I genuinely cannot believe that we have been reduced to wanting to destroy the jury system to punish companies we don’t agree with. At this point, this is less activism and more weaponized disrespect for fundamental freedoms.

[dead]

> That said, you might be surprised to learn that some of the models from 3b-9b could probably replace 80% of the things nonvibe coders use chatgpt for.

Really? I'm a total amateur when it comes to doing anything with local models, but I tried a few in this range using Ollama, and they didn't seem to know much about anything, and I couldn't figure out how to get them to search the web or run other tools, so that was where the experiment ended.

A small local model that can use bash would be a bit of a game-changer for me.

The latest small models are now reliable enough at simple tools like web search, I think. It's just that, AFAIK, none of the user-friendly harnesses like Ollama or LM Studio have a real one-click setup flow for this. You'll need to download models and do a fair bit of tool configuration.

Gemini CLI can use bash and run on the Gemma local model.

Local models are improving quickly so if you keep an eye open you’ll find something soon enough. But from experience, I’ll warn you that local models can lose the plot very quickly. Their little self arguments when they get stuck usually come down to:

- It failed? This must be a mistake, I’ll try it again. It failed? This must be a mistake, I’ll try it again because then I will complete the task (repeat about every six seconds until you rescue it).

- You know, the best way to deal with a permissions problem is to erase the entire system. That’ll definitely solve those pesky permissions and I’ll complete the task.

Which is why I uninstalled Chrome a (short...) while ago and my life went on unbothered.

I am amused when people fret about not using Chrome. I get it but… I have literally NEVER used Chrome. Perhaps I just don’t know what I am missing but the web seems to work just fine for me without it?

Touché…

Half of the reason to use local AI is to circumvent the censorship that Google, OpenAI and so on have. I don't want this Google crap on my computer.

It's based on Gemma 3n, and it's not the best.

I find it works fine for simple classification, translation, interpretation of images & audio. It can write longer prose, but it's pretty bad.

It can also produce output constrained to a JSON schema or a regexp, for anything you might want to do with structured data.
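Structured output goes through the Prompt API's responseConstraint option, per the Chrome docs. A hedged sketch; the schema fields here are made up for illustration:

```javascript
// Sketch: constrain the model's output to a JSON schema via
// responseConstraint (option name per the Chrome Prompt API docs).
async function extractEvent(text) {
  if (typeof LanguageModel === 'undefined') return null; // API not exposed
  const session = await LanguageModel.create();
  const schema = {
    type: 'object',
    properties: {
      title: { type: 'string' },
      date: { type: 'string' }, // e.g. '2026-03-14'
    },
    required: ['title', 'date'],
  };
  const result = await session.prompt(
    `Extract the event title and date from: ${text}`,
    { responseConstraint: schema },
  );
  return JSON.parse(result);
}
```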

I wonder why they’re using Gemma 3 and not Gemma 4?

Google has been trialling the Prompt API in Chrome for over a year, so since before Gemma 4 existed. But they are indicating they'll move to Gemma 4: https://groups.google.com/a/chromium.org/g/blink-dev/c/iR6R7...

So that the big news in non-tech news sites will be the update. Thus ensuring that this is received in a positive light.

It'll probably update to that without telling you at some point.

I find models of this size (I haven't tested this one specifically) to be very good at simple data extraction from user input. Think of things like parsing the date and time of an event from a description, or parsing a human-typed description of a repeating-event rule.

This is considered a large model. I think you might be surprised how many "small" models Chrome has already pulled down onto your disk.

But to answer your question, one of the services that uses a small model is PermissionsAIv4:

""" Use the Permission Predictions Service and the AIv4 model to surface permission notification requests using a quieter UI when the likelihood of the user granting the permission is predicted to be low. Requires `Make Searches and Browsing Better` to be enabled. – Mac, Windows, Linux, ChromeOS, Android """

[dead]

I ran a fairly large production test of this, and on _every_ measure except privacy it was worse than a free-tier, server-hosted LLM.

Not happy about that as I would like to see more local models but that's the current state of things.

https://sendcheckit.com/blog/ai-powered-subject-line-alterna...

> on _every_ measure except for privacy it was worse than a free tier server hosted LLM

Would you be able to compare this to other local models in its class and above that would fit consumer-grade hardware?

> It is a small model, so what utility can I / Google expect from it?

Precedent for shipping models alongside consumer software.

Potentially without consent if it truly is a silent install.

Something to do with serving more ads. My guess is they will use this to “better target” or to drain more information from you for their ads.

Those two (and more) exist in chrome://flags in Chrome 147. I'm disabling them now, with the expectation that will prevent the new default.

One option I'm leaving as default is "Use LiteRT-LM runtime for on-device model service inference." Any comment on that?

I'm on Chrome 147 too and disabled:

"optimization-guide-on-device-model"

- Enables optimization guide on device

"prompt-api-for-gemini-nano"

- Prompt API for Gemini Nano

- Prompt API for Gemini Nano with Multimodal Input

and deleted weights.bin and the 2025.x folder in "OptGuideOnDeviceModel"

Will report if Chrome 148 downloads the model again.

If you touch those files into existence, chown them to root, and chmod them to 0, it shouldn't ever be able to overwrite them, right?

You want to use chattr +i (make the empty file immutable)

I'm on my phone now so I can't check if something has changed, but what you want to protect from change is the directory, not the files. A file can be deleted and created again if the process can write the directory.

Yeah, should work. Will try read-only on Windows too.

Now I can't see it anymore, but shouldn't the model be under chrome://on-device-internals/ -> model-status?

Maybe you can uninstall there too.

Maybe I was on the wrong side of the early release, but I've deleted this model many times in the last year. I've had it for at least 12 months.

Thanks; I went to flags in Vivaldi and, just in case, disabled all flags containing "gemini" and the first five results for "model".

Those flags will exist already, but will default to enabled in 148.

That other flag is for using a different, open-source inference engine instead of the (from what I can tell) closed-source one that's used by default.

[dead]

So my understanding is that the download happens only when sites call the Prompt API, right?

Because my Chrome stable has been updated to v148 now, and I don't see any AI models in my user profile folder. My profile size is only 328 MB, with the Code Cache subfolder occupying the most space (135 MB).

In my understanding, yes. I wrote a blog post about some of the internals here: https://news.ycombinator.com/item?id=48028662

Searching about:flags for model comes up with a whole bunch:

#omnibox-ml-url-scoring-model

#omnibox-on-device-tail-suggestions

#optimization-guide-on-device-model

#text-safety-classifier

#prompt-api-for-gemini-nano

#writer-api-for-gemini-nano

#rewriter-api-for-gemini-nano

#proofreader-api-for-gemini-nano

#summarizer-api-for-gemini-nano

#on-device-model-litert-lm-backend

Then around gemini but not caught by the search for models: #skills (maybe? I think this is implied by "gemini in chrome"?)

edit: I don't see a carte blanche AI-disabling option. As much as I dislike Mozilla's growing obsession with AI, at least they give me a top-level option to disable all AI stuff. I only keep Chrome around for occasional testing reasons.

I wrote a more detailed blog post here:

https://news.ycombinator.com/item?id=48028662

Next step: Invoke the Prompt API from within online ads and run a "p2p" AI inference provider which forwards incoming LLM queries to website visitors. :-)

This sounds perfectly reasonable. No objection from me.

Do you know if Chromium also has these flags enabled?

Depends on where you get it. By default the flags will be enabled, but some packagers may choose to disable them. I haven't seen a major distro release chromium 148 yet.

Weirdly though, chromium won't be able to actually use the model even though it can download it, because the inference engine is a closed-source blob.

https://adsm.dev/posts/prompt-api/#which-browsers-support-th...

I believe webpages that use the API must request access to the Prompt API from the user via a system permissions dialogue, according to the docs from a few months ago.

It can only be called after the user has interacted with the page, but there's no dialogue from the browser.

https://developer.chrome.com/docs/ai/get-started#user-activa...
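Per that user-activation requirement, the call has to be gated behind a gesture. A minimal sketch; the handler name and button are hypothetical:

```javascript
// Chrome requires a user gesture before create() may trigger the model
// download, so wire it to an explicit click rather than running on load.
function wireUpPromptButton(button) {
  button.addEventListener('click', async () => {
    if (typeof LanguageModel === 'undefined') return; // API not exposed
    const session = await LanguageModel.create();
    console.log(await session.prompt('Say hello.'));
  });
}
```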

[dead]