The best use is actually for a layer that "almost fits" into VRAM, such that automated offloading to system RAM will be rare enough that it doesn't impact performance.
The best use is actually for a layer that "almost fits" into VRAM, such that automated offloading to system RAM will be rare enough that it doesn't impact performance.
As in when your secondary memory is fast enough, after the first 10% of the model are processed you can swap their memory with the part for 50% to 60% and when that is done you swap back to have the 0-10% ready in time for the next iteration?
Sounds ambitious, for the small improvement in effective capacity. In particular when I start wondering if real life speed differences would be small enough for that 10% increase, or if it would be even smaller. And that's before factoring in power/cooling cost for saturating another interface.