One of the things I mentioned in the post:
> Local models can quickly read and explain codebases, even if they can't write them - this is a superpower
Might have been buried lower down.
And yes latency of local on a fast card with MTP enabled can be blistering 130-200 tokens per second sustained at full context on Q5. About 100+ on Q8.
On tool calling
> Agent Skills can help immensely - we had a local agent set up Slicer completely from scratch on a new mini PC. It even gave feedback on the usability of slicer CLI which we integrated
There's a link to a post showing some examples.
Occasionally, we'll also have the local model _review_ the changes of GPT/Opus - and it can return duds, but also insights the larger model overlooked, or was too intelligent to pick out.
So yes - absolutely blazing fast at understanding a codebase, very good at running skills "cheaply" and could be used with larger models as a "helper" / sub-agent.