I wonder if you could use something like vLLM and have these subagents max out your local GPU. I've been looking into using a local model because I'm tired of cloud rate limits, and I'd really like to make use of my local GPU while I'm working (a 5090, even if it's not in the computer I'm typing on).
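In principle this should work, since vLLM exposes an OpenAI-compatible API that any agent framework with a configurable base URL could point at. A rough sketch of what I have in mind (the hostname, port, and model name are just placeholders, not anything specific):

```python
# On the GPU box, start vLLM's OpenAI-compatible server (listens on :8000 by default):
#   vllm serve Qwen/Qwen2.5-7B-Instruct

from openai import OpenAI

# Point the client at the GPU machine over the LAN instead of a cloud endpoint.
# vLLM doesn't require a real API key, but the client wants a non-empty string.
client = OpenAI(base_url="http://gpu-box.local:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Summarize this diff..."}],
)
print(resp.choices[0].message.content)
```

And the subagent angle seems like a good fit: vLLM does continuous batching, so a bunch of subagents firing concurrent requests is exactly the kind of workload that drives GPU utilization up instead of just queueing.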