It's not a fixed split. I don't know whether it can be changed live or requires a reboot, but it's not hardwired.

I want to know if it's possible. 4GB for Linux, a bit of room for the calculations, and then you can load a 122GB model entirely into VRAM.
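To make the arithmetic explicit, here is a rough sketch of that budget. It assumes a machine with 128 GB of unified memory and about 2 GB of headroom for the calculations; neither figure is stated above, they are just the values that make 4 GB for Linux plus a 122 GB model add up. The function name is made up for illustration.

```python
def model_budget_gb(total_gb, os_reserve_gb, compute_headroom_gb):
    """Return how many GB are left for model weights after the reserves."""
    budget = total_gb - os_reserve_gb - compute_headroom_gb
    if budget < 0:
        raise ValueError("reserves exceed total memory")
    return budget


# Assumed values: 128 GB total and 2 GB headroom are guesses, not from the thread.
TOTAL_GB = 128
OS_RESERVE_GB = 4
COMPUTE_HEADROOM_GB = 2

print(model_budget_gb(TOTAL_GB, OS_RESERVE_GB, COMPUTE_HEADROOM_GB))  # prints 122
```

Whether the driver actually lets the GPU address that much is the open question; the sketch only shows that the numbers are consistent.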

How would that perform in real life? Someone please benchmark it!

You're still thinking of the old-school setup, where you set the split in the firmware and it's fixed for that boot. These days there's dynamic allocation on top of that.

I have that split set to the minimum, 2 GB, and I'm giving the GPU a 20 GB model to process.
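If you want to check this on your own box, here is a minimal sketch. It assumes Linux with the amdgpu driver, which nobody above actually named, and that the GPU shows up as card0; adjust the path if yours differs. On amdgpu, the `mem_info_vram_*` files report the firmware carve-out and the `mem_info_gtt_*` files report the system memory the GPU can map dynamically, as far as I understand it. The function name is my own.

```python
from pathlib import Path

# sysfs counters exposed by the amdgpu driver (values are in bytes).
COUNTERS = (
    "mem_info_vram_total",
    "mem_info_vram_used",
    "mem_info_gtt_total",
    "mem_info_gtt_used",
)


def read_gpu_memory(device_dir="/sys/class/drm/card0/device"):
    """Return {counter: bytes or None}; None means the file was missing or unreadable."""
    result = {}
    for name in COUNTERS:
        path = Path(device_dir) / name
        try:
            result[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            result[name] = None
    return result


if __name__ == "__main__":
    for name, value in read_gpu_memory().items():
        shown = "unavailable" if value is None else f"{value / 2**30:.1f} GiB"
        print(f"{name}: {shown}")
```

With the split at its minimum, you would expect the VRAM total to be small and the GTT numbers to grow as the model loads, but I have not verified that on every setup.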