> The core idea is that if a model’s weights or training artifacts could enable catastrophic harm, you should treat them like top-tier secrets and secure them accordingly.
My read here is that you're implying that if an attacker has access to, for example, weight data, they can invariably find a way to exploit it.
If that's a correct assumption, I think you're playing an unwinnable game, since attackers always have indirect access to the model through inference (they can query it and fit a surrogate to its behaviour). It feels like locking down weights/training data/etc. is the AI version of security through obscurity.
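To make the "indirect access" point concrete, here's a toy sketch of a model-extraction attack (everything here is mine and purely illustrative, not anything from your post): an attacker who can only call the inference API still ends up with a surrogate that behaves like the protected model.

```python
# Toy sketch of a model-extraction ("stealing") attack via black-box queries.
# The oracle/surrogate setup is illustrative; real models are far harder to copy,
# but the principle is the same: behaviour leaks through the inference interface.
import numpy as np

rng = np.random.default_rng(0)

# The "protected" model: the attacker never sees these weights directly.
hidden_W = rng.normal(size=(16, 4))

def oracle(x: np.ndarray) -> np.ndarray:
    """Black-box inference API: inputs in, predictions out, weights stay hidden."""
    return x @ hidden_W

# Attacker: query the API on chosen inputs and fit a surrogate to the outputs.
queries = rng.normal(size=(2000, 16))
answers = oracle(queries)                        # only the API is ever touched
surrogate_W, *_ = np.linalg.lstsq(queries, answers, rcond=None)

# The surrogate matches the hidden weights to numerical precision,
# even though no weight file was ever exfiltrated.
print(np.max(np.abs(hidden_W - surrogate_W)))
```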
Just my 2c, for what it's worth
Thanks for the insight!
I think this is exactly why some of the work is moving away from “assume unrestricted API inference forever.”
For example, we’re prototyping ideas like air-gapped or very low-bandwidth inference gateways, where interaction happens over narrow channels (serial, optical, audio, etc.), with explicit threat models and monitoring. The point isn’t that this is practical for today’s models, but to reason about what inference might look like for AGI/ASI-level systems, where the risk profile is fundamentally different.
Others are thinking along similar lines too. For example, this SPAR project on constrained and minimal inference pathways: https://sparai.org/projects/sp26/rec7NyTst8Upfp83l
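To give a feel for what I mean by a narrow, monitored channel, here's a rough sketch (everything in it is hypothetical: the `model_infer` backend and the byte budget are placeholders, and a real gateway would sit on isolated hardware, not in a Python class): all inference goes through a mediator that enforces a hard bandwidth budget and logs every exchange for review.

```python
# Hypothetical sketch of a low-bandwidth inference gateway.
# `model_infer(prompt) -> str` is an assumed backend; budget numbers are made up.
import time

class NarrowChannelGateway:
    """Mediates all inference through a strict byte budget plus an audit log."""

    def __init__(self, model_infer, bytes_per_hour: int = 4096):
        self.model_infer = model_infer
        self.bytes_per_hour = bytes_per_hour
        self.window_start = time.monotonic()
        self.bytes_used = 0
        self.audit_log = []

    def query(self, prompt: str) -> str:
        now = time.monotonic()
        if now - self.window_start > 3600:       # reset the hourly budget window
            self.window_start, self.bytes_used = now, 0

        prompt_cost = len(prompt.encode())
        if self.bytes_used + prompt_cost > self.bytes_per_hour:
            raise RuntimeError("bandwidth budget exhausted; request refused")

        response = self.model_infer(prompt)
        # Response bytes count against the budget too; truncate rather than leak extra.
        remaining = self.bytes_per_hour - self.bytes_used - prompt_cost
        response = response.encode()[:remaining].decode(errors="ignore")

        self.bytes_used += prompt_cost + len(response.encode())
        self.audit_log.append((time.time(), prompt, response))  # transcript for monitoring
        return response

# Usage with a stand-in backend:
gateway = NarrowChannelGateway(lambda p: p.upper())
print(gateway.query("hello over a narrow channel"))
```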
Great, thanks for adding some nuance!