To have GPU inference, you need a GPU. I have a demo that runs an 8B Llama on any computer with 4 GB of RAM:
https://galqiwi.github.io/aqlm-rs/about.html
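For a rough sense of how an 8B model fits in 4 GB, here's a back-of-the-envelope sketch. The ~2 bits/weight figure and the overhead allowance are assumptions in the spirit of AQLM-style extreme quantization, not numbers taken from the demo itself:

```python
# Back-of-the-envelope memory estimate for a heavily quantized 8B-parameter model.
# Assumptions (not from the demo): ~2 bits per weight, plus a rough allowance
# for codebooks, embeddings, KV cache and runtime overhead.

params = 8e9                    # 8B parameters
bits_per_weight = 2             # assumed AQLM-style 2-bit quantization
weights_gb = params * bits_per_weight / 8 / 1e9   # -> ~2.0 GB of weights
overhead_gb = 1.0               # assumed allowance for everything else

print(f"weights: ~{weights_gb:.1f} GB, total: ~{weights_gb + overhead_gb:.1f} GB")
# weights: ~2.0 GB, total: ~3.0 GB  -- which is why 4 GB of RAM is enough
```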
Any computer with a display has a GPU.
Sure, but integrated graphics usually lacks the VRAM for LLM inference.
Which means inference would run at roughly the same speed as the suggested CPU inference engine (just with the compute offloaded), since the integrated GPU reads the weights out of the same system RAM.
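A quick way to see why the speeds end up similar: token generation is largely memory-bandwidth bound, and an integrated GPU shares the CPU's DRAM bandwidth. A rough sketch, where the bandwidth and model-size numbers are illustrative assumptions rather than measurements:

```python
# Rough upper bound on decode speed when generation is memory-bandwidth bound:
# each generated token requires streaming (roughly) all model weights once.
# The numbers below are illustrative assumptions, not measurements.

model_size_gb = 2.0        # ~8B params at ~2 bits/weight
dram_bandwidth_gbs = 40.0  # assumed dual-channel system memory bandwidth

tokens_per_s = dram_bandwidth_gbs / model_size_gb
print(f"~{tokens_per_s:.0f} tokens/s upper bound")
# The CPU and the integrated GPU share this same DRAM bandwidth, so the
# ceiling is roughly identical either way -- only the compute moves.
```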