I only have 8 GB of VRAM to work with currently, but I'm running OpenWebUI as a frontend to Ollama, and I have a very easy time loading up multiple models and letting them duke it out, either at the same time or in a round robin.
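
(For a sense of scale: quantized ~7-8B models are typically around 4-5 GB each, so they load comfortably in 8 GB of VRAM. The tags below are just illustrative examples, not a recommendation:)

    ollama pull llama3.1:8b
    ollama pull mistral:7b
    # then select both models in an OpenWebUI chat to compare their answers side by side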

You can even keep track of the quality of the answers over time to help guide your choice.

https://openwebui.com/

Be aware of the recent license change of "Open"WebUI. It is no longer open source.

Thanks, somehow I missed that.

https://docs.openwebui.com/license/

AMD 6700 XT owner here (12 GB VRAM) - can confirm.

Once I figured out my local ROCm setup, Ollama was able to run with GPU acceleration no problem. Connecting an OpenWebUI Docker instance to my local Ollama server is as easy as a single docker run command where you set the OLLAMA_BASE_URL env var. This isn't a production setup, but it works nicely for local usage like what the immediate parent is describing.
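
For anyone who wants to try it, the command looks roughly like this (a sketch assuming Ollama is on its default port 11434 on the host; the published port and volume name are just what I happened to pick):

    docker run -d -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
      -v open-webui:/app/backend/data \
      --name open-webui \
      ghcr.io/open-webui/open-webui:main

Then the OpenWebUI interface shows up at http://localhost:3000.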