Do you also like what it costs to browse the web via an LLM, potentially swallowing millions of tokens per minute?

This seems like a suitable job for a small language model. I'm a bit biased, since I just read this paper [0].

[0] https://research.nvidia.com/labs/lpr/slm-agents/