vision runs just fine locally for most use cases, so it's really just a skill to call that Ollama instance
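for what it's worth, a rough sketch of what such a skill could boil down to, assuming Ollama is listening on its default local port (11434) and a vision-capable model is already pulled. The `llava` model name and the helper names `build_payload` / `describe_image` are placeholders for illustration, not anything from an actual setup:

```python
import base64
import json
import urllib.request

# Assumptions: default local Ollama address and a vision-capable model
# that has already been pulled. Swap in whatever you actually run.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llava"


def build_payload(image_bytes: bytes, prompt: str, model: str = MODEL) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON object rather than a stream of chunks.
        "stream": False,
    }


def describe_image(image_bytes: bytes,
                   prompt: str = "Describe this image.",
                   url: str = OLLAMA_URL,
                   timeout: float = 120.0) -> str:
    """POST the image to the local Ollama instance and return its text reply."""
    body = json.dumps(build_payload(image_bytes, prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

with the server running, the skill itself would be roughly `describe_image(open("photo.jpg", "rb").read())`; only the standard library is used, so there is nothing extra to install.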