Hacker News

You can see their general approach to guardrail classifiers in these posts:

It's not just keyword matching, but I'm sure they tuned the Fable classifiers pretty aggressively to avoid false negatives, which usually means accepting more false positives.
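
To make the false-negative tuning concrete, here's a toy sketch. The scores, labels, and the `confusion_at` helper are all made up for illustration and have nothing to do with the actual classifiers; it only assumes the classifier emits a score and blocks anything at or above a threshold.

```python
# Hypothetical scores from an imaginary guardrail classifier
# (higher = more likely harmful). Labels: 1 = actually harmful, 0 = benign.
# All values are invented for illustration.
SAMPLES = [
    (0.95, 1), (0.80, 1), (0.55, 1), (0.35, 1),
    (0.60, 0), (0.40, 0), (0.20, 0), (0.05, 0),
]


def confusion_at(threshold, samples=SAMPLES):
    """Return (false_negatives, false_positives) when blocking scores >= threshold."""
    false_negatives = sum(
        1 for score, label in samples if label == 1 and score < threshold
    )
    false_positives = sum(
        1 for score, label in samples if label == 0 and score >= threshold
    )
    return false_negatives, false_positives


# Lowering the threshold lets fewer harmful items through,
# but blocks more benign ones.
for threshold in (0.7, 0.5, 0.3):
    fn, fp = confusion_at(threshold)
    print(f"threshold={threshold}: false negatives={fn}, false positives={fp}")
```

On this toy data, dropping the threshold from 0.7 to 0.3 takes false negatives from 2 to 0 while false positives go from 0 to 2, which is the kind of trade you'd expect from a classifier tuned hard against misses.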