Just this morning I tweaked my single 3090 setup too:
OLLAMA_FLASH_ATTENTION=1
OLLAMA_KV_CACHE_TYPE=q8_0
OLLAMA_CONTEXT_LENGTH=180000
and that fits in 23 GB. [edited for format]
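For anyone wondering why the q8_0 cache type matters at that context length, here is a rough back-of-the-envelope sketch of KV cache size. The layer count, KV head count, and head dimension below are hypothetical placeholders, not the model I'm running, and the estimate ignores the model weights and compute buffers, so treat it as an illustration only. The q8_0 figure assumes the usual GGML layout of 34 bytes per block of 32 values.

```python
# Rough KV cache size estimate. All architecture numbers are hypothetical.

F16_BYTES_PER_ELEMENT = 2.0
# GGML q8_0: blocks of 32 int8 values plus one fp16 scale = 34 bytes per 32 values.
Q8_0_BYTES_PER_ELEMENT = 34 / 32


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_length, bytes_per_element):
    """Bytes needed to hold keys and values for every layer at full context."""
    # Factor of 2 covers one K tensor and one V tensor per layer.
    elements = 2 * n_layers * n_kv_heads * head_dim * context_length
    return elements * bytes_per_element


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_length=180_000)
    for name, width in [("f16", F16_BYTES_PER_ELEMENT), ("q8_0", Q8_0_BYTES_PER_ELEMENT)]:
        gib = kv_cache_bytes(**shape, bytes_per_element=width) / 2**30
        print(f"{name}: {gib:.2f} GiB")
```

Under these assumptions q8_0 needs a bit more than half the cache memory of f16, which is the headroom that lets a long context sit next to the weights on a 24 GB card.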