I find Ollama + TypingMind (or a similar interface) works well for me. As for which models to use, the best choice keeps changing, maybe not quite month to month, but close. We're in that kind of period. Whatever you pick, make sure the model's layers fit in your GPU's VRAM; any layers that don't fit end up running on the CPU, which is much slower.
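
For a rough idea of whether a model will fit, here is a back-of-the-envelope sketch in Python. It is not anything Ollama provides; the function names are made up for illustration, and the `overhead_gib` default (room for the KV cache, context, and runtime buffers) is an assumed placeholder that varies a lot with context length and backend. It only estimates weight size as parameters times bits per weight.

```python
def estimate_weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB.

    params_billions: parameter count in billions (e.g. 7 for a 7B model).
    bits_per_weight: e.g. 16 for fp16, about 4 for a 4-bit quantization.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(
    params_billions: float,
    bits_per_weight: float,
    vram_gib: float,
    overhead_gib: float = 1.5,  # assumed allowance, not a measured value
) -> bool:
    """Return True if the estimated weights plus the assumed overhead fit in VRAM."""
    needed = estimate_weights_gib(params_billions, bits_per_weight) + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Hypothetical examples: a 7B model on an 8 GiB card, a 70B model on a 24 GiB card.
    for params, bits, vram in [(7, 4, 8), (70, 4, 24)]:
        size = estimate_weights_gib(params, bits)
        verdict = "fits" if fits_in_vram(params, bits, vram) else "does not fit"
        print(f"{params}B @ {bits}-bit: ~{size:.1f} GiB of weights, {verdict} in {vram} GiB")
```

By this estimate a 7B model at 4 bits per weight needs roughly 3.3 GiB for the weights alone. Treat the result as a first filter only; actual usage depends on the quantization format and how much context you run with.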