I see that you're using gemma3n, which is a 4B parameter model and uses around 3GB of RAM. How do you handle loading the model into RAM and offloading it? Or does it stay in memory for as long as the app is running?
I can see this being a major issue. If you start using this for grammar checking, you're basically taking 3GB of RAM away from the rest of your system.