1.54GB model? You can run this on a raspberry pi.

Performance of LLM inference consists of two independent metrics: prompt processing (compute intensive) and token generation (bandwidth intensive). For autocomplete with a 1.5B model you can get away with abysmal 10 t/s token generation, but you'd want prompt processing to be as fast as possible, which the Pi is incapable of.
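To make the bandwidth-bound part concrete: every generated token requires streaming roughly all model weights from memory, so memory bandwidth divided by model size gives a ceiling on token generation speed. A minimal sketch, where the ~17 GB/s Pi 5 bandwidth figure is an assumption for illustration, not a benchmark:

```python
def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on token generation rate for a bandwidth-bound decoder:
    each token reads (approximately) the full set of weights once."""
    return bandwidth_gb_s / model_size_gb

# Assumed Raspberry Pi 5 memory bandwidth (~17 GB/s LPDDR4X), 1.54 GB model:
print(tokens_per_second(17.0, 1.54))  # ~11 t/s ceiling; real-world will be lower
```

This says nothing about prompt processing, which is compute bound and scales with prompt length, so a long autocomplete context stays slow regardless of bandwidth.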

If you mean on the new AI HAT with an NPU and integrated 8GB memory, maybe.