Toxicity is dehumanizing language, threats, doxxing, encouraging self-harm, that sort of thing. We have taught it examples of various levels, so it can align with those to report a score. Something like an unpleasant, insulting attitude to someone personally is fairly low on the toxic scale (but still toxic, it's not the right way to interact), whereas threats of violence or encouraging self-harm are very high.
Toxicity is dehumanizing language, threats, doxxing, encouraging self-harm, that sort of thing. We have taught it examples of various levels, so it can align with those to report a score. Something like an unpleasant, insulting attitude to someone personally is fairly low on the toxic scale (but still toxic, it's not the right way to interact), whereas threats of violence or encouraging self-harm are very high.