Indeed. I tried to run this locally but couldn't get it running on my measly gaming-spec workstation.
It seems you need lots of RAM and VRAM. Reading the issues on GitHub[1], it doesn't seem that many others have had success using it effectively:
- someone with a 96 GB VRAM RTX 6000 Pro had CUDA OOM issues
- someone somehow made it work on an RTX 4090, but with an RTF (real-time factor) of 12...
- someone with an RTX 5090 managed to use it, but only with clips no longer than 20s
It seems the model's utility for hobbyists with consumer-grade cards will be low.
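
To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes RTF means processing time divided by audio duration (the usual definition, but that is an assumption on my part), and that longer audio would have to be split into chunks of at most 20s. The function names are made up for illustration and have nothing to do with the sam-audio code.

```python
import math

def processing_seconds(audio_seconds, rtf):
    # Assumed definition: RTF = processing time / audio duration,
    # so processing time = audio duration * RTF.
    return audio_seconds * rtf

def chunk_bounds(total_seconds, max_chunk=20.0):
    # Split [0, total_seconds] into consecutive (start, end) windows
    # no longer than max_chunk seconds (hypothetical workaround for
    # the 20s clip limit reported above).
    n = math.ceil(total_seconds / max_chunk)
    return [(i * max_chunk, min((i + 1) * max_chunk, total_seconds))
            for i in range(n)]

# A 3-minute track at RTF 12 would take 36 minutes to process.
print(processing_seconds(180, 12) / 60)
# The same track would need 9 chunks of 20s each.
print(len(chunk_bounds(180)))
```

Under those assumptions, a single 3-minute song means 9 separate chunks and roughly half an hour of compute on a 4090.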
[1]: https://github.com/facebookresearch/sam-audio/issues/24