Doesn't Windows already do this by default? I can already run models bigger than my GPU VRAM and it will start using up to 50% of my system RAM as "shared memory". This is on a Desktop PC without a shared memory architecture.
Yep I had a GeForce 750 Ti (2 GB) and I was able to run a ton of things on Windows without any issues at all.
As soon as I switched to Linux I had all sorts of problems on Wayland where as soon as that 2 GB was reached, apps would segfault or act in their own unique ways (opening empty windows) when no GPU memory was available to allocate.
Turns out this is a problem with NVIDIA on Wayland. On X, NVIDIA's drivers act more like Windows. AMD's Linux drivers act more like Windows out of the box on both Wayland and X. System memory gets used when VRAM is full. I know this because I got tired of being unable to use my system after opening 3 browser tabs and a few terminals on Wayland so I bought an AMD RX 480 with 8 GB on eBay. You could say my cost of running Linux on the desktop was $80 + shipping.
A few months ago I wrote a long post going over some of these details at https://nickjanetakis.com/blog/gpu-memory-allocation-bugs-wi.... It even includes videos showing what it's like opening apps both on Wayland and X with that NVIDIA card.
The NVIDIA Windows driver enables RAM swapping by default.
Great way to backstab you if you prefer inference speed.
I don't think Windows does this, but Ollama does.
It's the drivers, but it was a relatively recent addition. I think it was added when either the 30xx or 40xx series shipped: the lower-end cards had pitiful VRAM, so they enabled it by default so they'd work with all games.
Most people who know it does this turn it off because it kicks in too early: if you have 24GB, it'll offload to RAM and tank your inference speed when you hit around 22GB of use.
https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/s...
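For anyone who wants to see how close they are to that point, here is a rough sketch that reads used and total VRAM from `nvidia-smi` and flags a card that is nearly full. The 0.9 threshold is only my assumption, taken from the "around 22GB of 24GB" figure above; the driver does not document an exact cutoff that I know of. The helper names (`parse_memory_csv`, `near_fallback`, `query_gpus`) are made up for illustration.

```python
import subprocess


def parse_memory_csv(text):
    """Parse nvidia-smi CSV output: one 'used, total' pair (MiB) per GPU line."""
    gpus = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append((used, total))
    return gpus


def near_fallback(used_mib, total_mib, threshold=0.9):
    """Return True when usage is at or above the assumed fallback threshold."""
    return used_mib / total_mib >= threshold


def query_gpus():
    """Ask nvidia-smi for memory.used and memory.total, in MiB, without units."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_csv(result.stdout)
```

Calling `query_gpus()` on a machine with the NVIDIA driver installed returns a list of `(used, total)` tuples, and passing each pair to `near_fallback` tells you whether you are likely in the range where offloading to system RAM starts.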
Nicely linked!
The Nvidia driver has used system memory fallback for a couple of years now.
https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/s...
NVIDIA's GPU drivers on Windows 100% do this.
https://i.imgur.com/c0a3vUy.png