It looks like it's Python, so it might be possible to use it via pythonx (https://github.com/livebook-dev/pythonx)? The parallel huggingface/bumblebee idea was also good, though. I hadn't seen or thought of that, and it definitely works for a lot of other models. Curious if you get it working! There's some chance I'll play with this myself in a few months, so feel free to report back here or DM me!
I just tried this quickly and hit some issues on my Mac, FYI. It might work better on Linux, but I hit a compilation issue with `curated-tokenizers`, possibly caused by a typo in its setup.py or pyproject.toml that AI spotted: `-Wno-sign-compare-Wno-strict-prototypes` should be `-Wno-sign-compare -Wno-strict-prototypes`. That could perhaps be fixed with a PR to curated-tokenizers or by forking it...
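A quick sketch of why that missing space matters, assuming the flags end up being split on whitespace (shlex-style) before reaching the compiler. The flag strings are copied from the error above, but the variable names and the "forgotten comma" scenario are hypothetical illustrations, not taken from curated-tokenizers' actual build config:

```python
import shlex

# Hypothetical illustration only: not copied from curated-tokenizers' build files.
broken = "-Wno-sign-compare-Wno-strict-prototypes"
fixed = "-Wno-sign-compare -Wno-strict-prototypes"

# Assumption: the flag string is tokenized on whitespace before being passed
# to the compiler. Without the space, the compiler sees ONE unknown flag.
broken_args = shlex.split(broken)
fixed_args = shlex.split(fixed)
print(broken_args)  # ['-Wno-sign-compare-Wno-strict-prototypes']
print(fixed_args)   # ['-Wno-sign-compare', '-Wno-strict-prototypes']

# One common way this kind of typo appears in Python build scripts: a missing
# comma makes adjacent string literals concatenate into a single list element.
flags_missing_comma = ["-Wno-sign-compare" "-Wno-strict-prototypes"]
flags_with_comma = ["-Wno-sign-compare", "-Wno-strict-prototypes"]
print(len(flags_missing_comma), len(flags_with_comma))  # 1 2
```

Either way, the fix is the same: make sure each warning flag reaches the compiler as its own argument.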
There may well be other issues behind that compile error, and it's unclear whether it needs other dependencies that kitten doesn't rely on directly, like torch or torchaudio. So it's not a 5-minute job, but it looks like the issues could probably be worked through...
For reference, this is basically all I was trying:
to get the above error.