Hacker News

I was wondering the same, but llama.cpp was written to offload to system ram. If this really works, then the advantage could be that one could run transformers / sglang, etc or other tools that don't offload to system ram. However, I want to see the numbers. Perhaps I'll give this a try, but I need a throw away box I could trash if something goes wrong, but have none at the moment.