I only have 8GB of VRAM to work with currently, but I'm running OpenWebUI as a frontend to Ollama, and I have a very easy time loading up multiple models and letting them duke it out, either at the same time or in a round robin.
You can even keep track of the quality of the answers over time to help guide your choice.
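If it helps, the whole setup is just a few small models pulled into Ollama and then picked from the model dropdown in OpenWebUI. The tags below are only examples of quantized ~7-8B models that fit in 8GB, not recommendations:

    # Default quantized builds of 7-8B models fit comfortably in 8GB of VRAM.
    ollama pull llama3.1:8b
    ollama pull mistral:7b
    ollama list    # confirm they're available, then select one or more in OpenWebUI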
Be aware of the recent license change of "Open"WebUI. It is no longer open source.
Thanks, somehow I missed that.
https://docs.openwebui.com/license/
AMD 6700XT owner here (12GB VRAM) - Can confirm.
Once I figured out my local ROCm setup, Ollama ran with GPU acceleration no problem. Connecting an OpenWebUI docker instance to my local Ollama server is as easy as a docker run command where you specify the OLLAMA_BASE_URL env var (rough sketch below). This isn't a production setup, but it works nicely for local usage like what the immediate parent is describing.
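Roughly something like this; the port mapping, volume name, and host-gateway bit are from my setup and may need tweaking for yours, OLLAMA_BASE_URL is the part that matters:

    # Ollama is listening on the host at its default port (11434).
    # host.docker.internal + host-gateway lets the container reach the host on Linux.
    docker run -d \
      -p 3000:8080 \
      --add-host=host.docker.internal:host-gateway \
      -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
      -v open-webui:/app/backend/data \
      --name open-webui \
      ghcr.io/open-webui/open-webui:main

After that, OpenWebUI is up on http://localhost:3000 and picks up whatever models the local Ollama has pulled.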