Can you? I imagine e.g. Google is using material not available to the public to train their models (uncensored Google Books, etc.). Also, chatbots like Gemini are not just pure LLMs anymore; they also utilize other tools as part of their computation. I've asked Gemini computationally heavy questions, and it successfully invokes Python scripts to answer them. I imagine it can also use tools other than Python, some of which might not even be publicly known.

I'm not sure what the situation is currently, but I can easily see private data and private resources leading to much better AI tools, which cannot be matched by open source solutions.

While they will always have premier models that only run on data center hardware at first, the good news about the tooling is that tool calls are computationally very minimal and no problem to sandbox and run locally, at least in theory; we would still need to do the plumbing for it.
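To illustrate how thin that plumbing is, here is a minimal sketch (names like `run_tool_call` are made up for this example, not any real API): the model emits a JSON tool request, and the host executes it in an isolated subprocess with a timeout before feeding the output back into the conversation.

```python
# Hypothetical "plumbing" for a local tool call: parse a model-emitted
# JSON request, run it sandboxed, return the output as text.
import json
import subprocess
import sys

def run_tool_call(request_json: str, timeout: float = 5.0) -> str:
    """Execute a model-emitted tool request and return its output."""
    request = json.loads(request_json)
    if request["tool"] == "python":
        # Run the snippet in a separate interpreter so a crash or hang
        # can't take down the host; -I isolates it from the user's
        # environment and site-packages.
        result = subprocess.run(
            [sys.executable, "-I", "-c", request["code"]],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.stdout.strip() or result.stderr.strip()
    raise ValueError(f"unknown tool: {request['tool']}")

# A "computationally heavy" question the model can delegate instead of
# answering from its weights alone:
call = json.dumps({"tool": "python",
                   "code": "print(sum(i*i for i in range(10**6)))"})
print(run_tool_call(call))
```

The expensive part is still the model inference; the tool execution itself is a cheap subprocess, which is why there is no fundamental obstacle to doing this locally.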

So I agree that open source solutions will likely lag behind, but that's fine. Gemini 2.5 wasn't unusable when Gemini 3 didn't exist, etc.

Yes, because local models can run Internet search tools. Even with the big players like OpenAI, I prefer the quality of the results when the model has done a search, and they seem to have realised this too: the majority of my queries now kick off searches.
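Wiring a search tool into a local model loop follows the same pattern: a registry maps tool names to plain functions. This sketch is illustrative only; the in-memory corpus stands in for a real web-search backend, which would be an HTTP call to whatever search API you plug in.

```python
# Illustrative tool registry: "search" is just another registered
# function the local model loop can dispatch to by name.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

# Tiny in-memory corpus standing in for real web results.
CORPUS = [
    "Gemini 3 is a proprietary model from Google.",
    "Local models can be extended with tool calls.",
    "Tool calls are cheap compared to inference.",
]

@tool("search")
def search(query: str) -> str:
    """Return corpus lines containing any query term."""
    terms = query.lower().split()
    hits = [doc for doc in CORPUS if any(t in doc.lower() for t in terms)]
    return "\n".join(hits) or "no results"

# The model's decoded tool request is just a (name, argument) pair:
print(TOOLS["search"]("tool calls"))
```

Swapping the stub for a real search backend changes nothing about the dispatch logic, which is why local models get the same quality boost from search that the hosted ones do.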