What if one is okay with walking away and letting it run (crawl)? Then context can be 128 or 256GB system RAM while the model is jammed into what precious VRAM there is?
What if one is okay with walking away and letting it run (crawl)? Then context can be 128 or 256GB system RAM while the model is jammed into what precious VRAM there is?