Sure, but integrated graphics usually lacks dedicated VRAM for LLM inference; it shares the same system RAM the CPU uses.

Which means inference would run at roughly the same speed as the suggested CPU inference engine, since both are limited by the same system-memory bandwidth; you'd just be offloading the compute to the iGPU.
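To see why, here's a minimal back-of-envelope sketch: decode speed for a memory-bound LLM is roughly memory bandwidth divided by bytes streamed per token. The numbers (80 GB/s dual-channel DDR5, a ~4 GB quantized model) are illustrative assumptions, not measurements:

```python
def approx_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on decode tokens/sec for a memory-bound LLM:
    each generated token requires streaming all model weights once."""
    return bandwidth_gb_s / model_size_gb

# Assumed: dual-channel DDR5 at ~80 GB/s, a ~4 GB (4-bit quantized 7B) model.
# CPU and iGPU share the same memory controller, so both hit the same ceiling.
shared_bandwidth = 80.0  # GB/s (assumption)
model_bytes = 4.0        # GB (assumption)

print(f"~{approx_tokens_per_sec(shared_bandwidth, model_bytes):.0f} tokens/s upper bound")
# -> ~20 tokens/s, whether the matmuls run on the CPU or the iGPU
```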
