I think that Anthropic is gaslighting us with their new model releases. Specifically, I think they have some good base model and are just fine-tuning it until they achieve desired outcome, or the desired outcome is achieved accidentally as part of fine-tuning. My theory is based on the fact that as a long-term (if you can call it that way) Claude user I keep noticing the same patterns it outputs. It's not trivial but certainly possible to see when something has been written by Claude because it has a different style than GPT.

However they have quite good harness in their backend which is the actual model.