The only benchmark they show against SOTA models is in bioweapons refusal.
Edit: nvm I can't read, regular benchmarks against SOTA are there