> it technically requires less GPU processing to run
Not when you have to scale. There's a reason why every LLM SaaS aggressively rate limits and even then still experiences regular outages.
> it technically requires less GPU processing to run
Not when you have to scale. There's a reason why every LLM SaaS aggressively rate limits and even then still experiences regular outages.