Can anyone give tips for getting something that runs fairly fast under Ollama? It doesn't have to be very intelligent.
When I tried gpt-oss and Qwen via Ollama on an M2 Mac, the main problem was that they were extremely slow, but I do need a free local model.
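For reference, here's roughly how I've been comparing speeds, by hitting Ollama's local /api/generate endpoint (rough sketch; the qwen3:4b tag is just an example of whatever model you've pulled):

    import requests

    # Ask the local Ollama server for a short completion and report tokens/sec.
    # Assumes Ollama is running on its default port 11434.
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "qwen3:4b", "prompt": "Say hello.", "stream": False},
    )
    data = r.json()
    # eval_count is the number of generated tokens; eval_duration is nanoseconds.
    tps = data["eval_count"] / (data["eval_duration"] / 1e9)
    print(f"{tps:.1f} tokens/sec")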
How much RAM are you running with? Qwen3 and gpt-oss:20b punch a good bit above their weight. I personally use them for small agents.
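The agent side is just a plain chat call, e.g. with the official ollama Python client (sketch; again the model tag is whatever you've pulled):

    import ollama

    # Minimal single-turn call; assumes `ollama pull qwen3:4b`
    # (or similar) has already been run.
    resp = ollama.chat(
        model="qwen3:4b",
        messages=[{"role": "user", "content": "Summarize: Ollama runs LLMs locally."}],
    )
    print(resp["message"]["content"])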
Have you tried llama.cpp? I get 250 tokens/sec on gpt-oss using a 4090; not sure about Mac speeds.
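If you'd rather drive it from Python, llama-cpp-python wraps the same engine. Something like this (sketch; the GGUF path is a placeholder, and the Metal build handles GPU offload on Apple Silicon):

    from llama_cpp import Llama

    # n_gpu_layers=-1 offloads every layer to the GPU (Metal on a Mac,
    # CUDA on a 4090). Path is a placeholder for whatever GGUF you downloaded.
    llm = Llama(model_path="./gpt-oss-20b.gguf", n_gpu_layers=-1, n_ctx=4096)
    out = llm("Say hello.", max_tokens=64)
    print(out["choices"][0]["text"])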