A recent local model I tried is Ministral 3 from a month ago. https://mistral.ai/news/mistral-3

    Vision: Enables the model to analyze images and provide insights based on visual content, in addition to text.
    Multilingual: Supports dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Chinese, Japanese, Korean, Arabic.
    ...
    Agentic: Offers best-in-class agentic capabilities with native function calling and JSON outputting.
    Edge-Optimized: Delivers best-in-class performance at a small scale, deployable anywhere.
    Apache 2.0 License: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
    Large Context Window: Supports a 256k context window.