I'm building a local medical AI app for Mac, recently published on the App Store. https://apple.co/4mlYANu
It uses MedGemma 4B for analyzing medical images and generating diagnostic insights and reports. Of course it must be used with caution; it's not for real diagnostics, more of a way to get a second perspective.
Currently it supports chat and report generation, but I'm stuck on what other features to add beyond these. I'm also experimenting with integrating the 27B model; even at 4-bit quantization, its output looks better than the 4B's.
Definitely experiment with quantization. In my experience, a rough rule of thumb is to run the biggest model you can fit at the lowest quantization bit-width that still works well.
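To make that rule concrete, here's a quick back-of-the-envelope sketch of weight memory at different bit-widths (my own rough estimate; it ignores KV cache, activations, and runtime overhead, so real usage will be higher):

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB: params * bits / 8, ignoring overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(model_size_gb(4, 16))   # 4B at fp16   -> 8.0 GB
print(model_size_gb(27, 4))   # 27B at 4-bit -> 13.5 GB
print(model_size_gb(4, 4))    # 4B at 4-bit  -> 2.0 GB
```

So on a Mac with enough unified memory, the 27B at 4-bit (~13.5 GB of weights) is often the better trade than the 4B at full precision, which matches what you're seeing.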