Wow, a 5060 Ti. 16 GB VRAM + I'm guessing >=32 GB of RAM. And here I am spinning Ye Olde RX 570 4 GB + 32 GB.

I'd especially like to know how many tokens/s you can get out of the larger models (using Ollama + Open WebUI on Docker Desktop, LM Studio, whatever). I'm probably not upgrading my GPU this year, but I'd appreciate an anecdotal benchmark; a quick way to measure this is sketched below the list.

  - gemma3:12b
  - phi4:latest (14b)
  - qwen2.5:14b [I get ~3 t/s on all these small models, acceptably slow]

  - qwen2.5:32b [this is about my machine's limit; verrry slow, ~1 t/s]
  - qwen2.5:72b [beyond my machine's limit, but maybe not yours]
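To make the numbers comparable, here's a minimal sketch of how I'd probe generation speed against Ollama's local HTTP API. It assumes the default localhost:11434 endpoint; the prompt and the model list are just placeholders, so swap in whatever you've pulled:

```python
# Rough tokens/s probe against a local Ollama server (default port 11434).
import json
import urllib.request

def eval_rate(model: str, prompt: str = "Explain quicksort in one paragraph.") -> float:
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # eval_count = tokens generated, eval_duration = nanoseconds spent generating
    return body["eval_count"] / (body["eval_duration"] / 1e9)

for m in ["gemma3:12b", "phi4:latest", "qwen2.5:14b", "qwen2.5:32b"]:
    print(f"{m}: {eval_rate(m):.1f} t/s")
```

(`ollama run <model> --verbose` prints the same eval rate after each reply, if you'd rather not script it.)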

You'll probably also want to include the quantization levels you're using, as otherwise there'll be huge variance in your comparisons with others :)
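For the record, a quick way to confirm what quantization a pulled model actually uses is Ollama's /api/show endpoint (a sketch assuming the `details.quantization_level` field per the current API docs; `ollama show <model>` at the CLI reports the same info):

```python
# Look up a pulled model's quantization level via Ollama's /api/show.
# Note: older Ollama builds may expect "name" instead of "model" in the payload.
import json
import urllib.request

def quant_level(model: str) -> str:
    req = urllib.request.Request(
        "http://localhost:11434/api/show",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["details"]["quantization_level"]

print(quant_level("qwen2.5:14b"))  # e.g. "Q4_K_M"
```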

True, true. All Q4_K_M unless I'm mistaken. Thanks!