First test question: "Is the UV Index a good proxy for when to wear sunglasses." Immediately triggered the safety filter ... oh dear.
It triggered for me when I asked "Web search for your own model card (released today) and pick out your favourite highlights from the pdf"
Did not trigger for me (Fable answered the question), so I guess the filters are either non-deterministic or are still being tweaked.
Interesting, I assumed all model routing was done with an LLM (i.e. non-deterministic).
It’s possible that there’s a set of words or phrases that route deterministically to save money on obvious stuff.
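To make that guess concrete, here is a minimal sketch of a two-stage router: a cheap regex pass first, with a costlier classifier only as the fallback. Everything in it is an assumption for illustration. The phrase list, the `route` function, and the `fallback_classifier` callback are hypothetical and not taken from any real product.

```python
import re

# Hypothetical phrase list; nothing in this thread says what a real filter matches on.
OBVIOUS_PATTERNS = [
    re.compile(r"\bexample blocked phrase\b", re.IGNORECASE),
    re.compile(r"\banother obvious phrase\b", re.IGNORECASE),
]


def route(prompt, fallback_classifier):
    """Return "flagged" on a cheap regex hit, else defer to the costlier classifier."""
    for pattern in OBVIOUS_PATTERNS:
        if pattern.search(prompt):
            # Deterministic path: the same prompt always gets the same result.
            return "flagged"
    # Only prompts that miss every pattern pay for the expensive check
    # (assumed here to be an LLM call, which may be non-deterministic).
    return fallback_classifier(prompt)
```

With a setup like this, only the regex stage would be repeatable. Anything that falls through to the fallback could still vary between users, which would fit the mixed reports above.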
I kind of wonder, though, which model they’re using to do the routing. It seems like a huge added cost to run these kinds of checks on every request.
Wasn't it leaked in the Claude Code source that it was all regex?
IIRC, Opus 4.7 had the same problem: safety filters were triggered way too easily at the beginning.
sunglasses _are_ safety filters