For future ML developers: A post like this should include system requirements.
It's not clear from the blog post, the GitHub page, or most other places whether this will run, even in big-O terms, on:
* CPU
* 16GB GPU
* 240GB server (of the type most businesses can afford)
* Meta/Google/OpenAI/Anthropic-style data center
Indeed. I've tried to run it locally but couldn't get it working on my measly gaming-spec workstation.
It seems you need lots of RAM and VRAM. Reading the issues on GitHub[1], it does not seem many others have had much success using it effectively:
- someone with a 96 GB VRAM RTX 6000 Pro had CUDA OOM issues
- someone made it work on an RTX 4090 somehow, but the RTF (real-time factor) was 12...
- someone with an RTX 5090 managed to use it, but only with clips no longer than 20s
It seems the utility of the model for hobbyists with consumer-grade cards will be low.
[1]: https://github.com/facebookresearch/sam-audio/issues/24
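For anyone else attempting it: before loading the checkpoint, it's worth checking how much VRAM you actually have free. A minimal sketch, assuming PyTorch is installed (it doesn't touch the model itself):

    import torch

    def report_gpu_memory() -> None:
        """Print free/total VRAM per CUDA device, so you can tell up
        front whether a checkpoint of a given size can possibly fit."""
        if not torch.cuda.is_available():
            print("No CUDA device found; CPU inference draws on system RAM instead.")
            return
        for i in range(torch.cuda.device_count()):
            free, total = torch.cuda.mem_get_info(i)  # both in bytes
            name = torch.cuda.get_device_name(i)
            print(f"GPU {i} ({name}): {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")

    report_gpu_memory()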
It really depends on your runtime environment, but I agree it would be nice to have some reference numbers for commonly used setups.
It does, but my comment was "even in big-O."
Environments might mean the difference between, say, 16GB and 24GB, but not between 16GB and 160GB.
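To make the big-O point concrete: weight memory is roughly parameter count times bytes per parameter, and environment choices (precision, quantization, framework overhead) move that by small constant factors. A back-of-the-envelope sketch; the 7B parameter count is hypothetical, not SAM Audio's actual size:

    # Rough weight-memory estimate: params * bytes-per-param. Activations
    # and framework overhead add more, but by a constant factor, not 10x.
    BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

    def weight_gb(num_params: float, precision: str) -> float:
        return num_params * BYTES_PER_PARAM[precision] / 1e9

    num_params = 7e9  # hypothetical 7B-parameter model, NOT SAM Audio's real size
    for precision in BYTES_PER_PARAM:
        print(f"{precision:>9}: ~{weight_gb(num_params, precision):.1f} GB of weights")
    # Precision spans ~2-8x; per-environment overhead on top is typically
    # well under 2x -- the 16GB-vs-24GB kind of gap, not 16GB vs 160GB.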