It’s still so trivial to jailbreak even the latest Anthropic models (via the API, and I’m not talking about the silly ENI or Pliny breaks) that I don’t understand where the safety teams are doing their work. Is it all in the default chat-trained model?

It's more of a research program than a product feature. No one knows how to fully prevent a model from responding based on what's in its base training data, and that's what you're seeing with jailbreaks.

And going after one of the roots of the issue - the base training data itself - comes with its own set of unsolved challenges, not least of which is the unavoidable subjectivity of what is or isn't "safe".