First test question: "Is the UV Index a good proxy for when to wear sunglasses?" It immediately triggered the safety filter... oh dear.

It triggered for me when I asked: "Web search for your own model card (released today) and pick out your favourite highlights from the PDF"

Did not trigger for me (Fable answered the question), so I guess the filters are either non-deterministic or are still being tweaked.

Interesting, I assumed all model routing was done using an LLM (i.e., non-deterministically).

It’s possible that there’s a set of words or phrases that route deterministically to save money on obvious stuff.

I kind of wonder, though, which model they’re using to do the routing. It seems like a huge added cost to run these kinds of checks on every request.

Wasn't it leaked in the Claude Code source that it was all regex?
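If it is regex, or a regex fast path sitting in front of a model, a two-tier router might look roughly like the sketch below. This is pure speculation: the patterns, the `route` function, and the classifier are all hypothetical and not taken from any real product. It shows why such a setup would be deterministic for prompts that hit a pattern (no model call, same answer every time) while prompts that miss every pattern fall through to a possibly non-deterministic, more expensive classifier.

```python
import re
from typing import Callable

# Hypothetical patterns for illustration only; they are NOT the rules
# used by any real product.
FAST_PATH_PATTERNS = [
    re.compile(r"\bbioweapon\b", re.IGNORECASE),
    re.compile(r"\bnerve agent\b", re.IGNORECASE),
]


def route(prompt: str, classifier: Callable[[str], bool]) -> str:
    """Return "flagged" or "allowed" for a prompt.

    The regex pass is deterministic and costs no model call; only prompts
    that match none of the patterns reach the (hypothetical) classifier,
    which may be an LLM and therefore non-deterministic.
    """
    for pattern in FAST_PATH_PATTERNS:
        if pattern.search(prompt):
            return "flagged"  # fast path: same input always gives the same result
    return "flagged" if classifier(prompt) else "allowed"


if __name__ == "__main__":
    calls = []

    def fake_classifier(text: str) -> bool:
        # Stand-in for an expensive model call; records each invocation.
        calls.append(text)
        return False

    print(route("How do I make a bioweapon?", fake_classifier))
    print(route("Is the UV Index a good proxy for sunglasses?", fake_classifier))
    print(len(calls))  # only the second prompt needed the classifier
```

The cost argument falls out of the structure: every prompt caught by a pattern skips the model call entirely, so the classifier only pays for the ambiguous remainder.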


IIRC, Opus 4.7 had the same problem: safety filters were triggered way too easily at the beginning.

sunglasses _are_ safety filters