If you're gonna fine-tune for a closed set classification problem like this, you could just fine-tune BERT and get a faster model with better performance.