> Won’t he eventually run out of context window?

The "memories" table has a date column which is used to record the data when the information is relevant. The prompt can then be fed just information for today and the next few days - which will always be tiny.

It's possible to save "memories" that are always included in the prompt, but even those shouldn't add up to many tokens over time.
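Concretely, that design can be sketched in a few lines. This is just an illustration with assumed table and column names (`memories`, `relevant_date`), not Geoffrey's actual schema:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect("stevens.db")
# Hypothetical date-stamped memories table - the real schema may differ.
conn.execute("""
    CREATE TABLE IF NOT EXISTS memories (
        id INTEGER PRIMARY KEY,
        text TEXT NOT NULL,
        relevant_date TEXT  -- ISO date the memory applies to; NULL = always include
    )
""")

def prompt_context(days_ahead: int = 7) -> str:
    """Collect the handful of memories worth feeding into today's prompt."""
    today = date.today().isoformat()
    horizon = (date.today() + timedelta(days=days_ahead)).isoformat()
    rows = conn.execute(
        """
        SELECT text FROM memories
        WHERE relevant_date IS NULL                 -- always-included memories
           OR relevant_date BETWEEN ? AND ?         -- today and the next few days
        ORDER BY relevant_date
        """,
        (today, horizon),
    ).fetchall()
    return "\n".join(text for (text,) in rows)
```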

> Won’t this be expensive when using hosted solutions?

You may be underestimating how absurdly cheap hosted LLMs are these days. Most prompts against most models cost a fraction of a single cent, even for tens of thousands of tokens. Play around with my LLM pricing calculator for an illustration of that: https://tools.simonwillison.net/llm-prices
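For a rough sense of scale, here's the back-of-the-envelope arithmetic. The prices are assumed for illustration (roughly in line with budget-tier models); plug current numbers into the calculator above:

```python
# Back-of-the-envelope cost for a daily-brief style prompt.
INPUT_PER_MTOK = 0.15    # dollars per million input tokens (assumed rate)
OUTPUT_PER_MTOK = 0.60   # dollars per million output tokens (assumed rate)

input_tokens = 20_000    # a generous daily context
output_tokens = 1_000

cost = (input_tokens / 1e6) * INPUT_PER_MTOK + (output_tokens / 1e6) * OUTPUT_PER_MTOK
print(f"${cost:.4f} per prompt")   # -> $0.0036, about a third of a cent
```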

> If one were to build this locally, can Vector DB similarity search or a hybrid combined with fulltext search be used to achieve this?

Geoffrey's design is so simple it doesn't even need search - all it does is dump date-stamped context into the prompt, and there are so few tokens that there's no need for FTS or vector search. If you wanted to build something more sophisticated you could absolutely use those. SQLite has surprisingly capable FTS built in, and there are extensions like https://github.com/asg017/sqlite-vec for doing things with vectors.
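If you do go the more sophisticated route, the built-in FTS5 gets you a long way before you need vectors. A quick sketch (sqlite-vec works along similar lines but with embedding columns; see its README for the exact API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 ships with most SQLite builds, including the one bundled with Python.
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")
conn.executemany(
    "INSERT INTO notes (body) VALUES (?)",
    [("Dentist appointment on Tuesday",),
     ("Sarah's flight lands at 6pm",),
     ("Pick up dry cleaning before the trip",)],
)
# Ranked full-text search with BM25 scoring, no extensions required.
for (body,) in conn.execute(
    "SELECT body FROM notes WHERE notes MATCH ? ORDER BY rank", ("flight",)
):
    print(body)
```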

SQLite + sqlite-vec/DuckDB for small agents is going to be a very powerful combination.

Do we even need to think of these as agents, or will the agentic frameworks move towards being a call_llm() SQL function?
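You can already get close to that today: SQLite lets you register arbitrary host-language callables as SQL functions. A minimal sketch, with the LLM call stubbed out rather than wired to any real API:

```python
import sqlite3

def call_llm(prompt: str) -> str:
    # Stub - a real version would call an LLM API of your choice here.
    return f"(llm response to: {prompt})"

conn = sqlite3.connect(":memory:")
# Register the Python callable as a one-argument SQL function.
conn.create_function("call_llm", 1, call_llm)

(result,) = conn.execute(
    "SELECT call_llm('Summarize my memories for today')"
).fetchone()
print(result)
```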

Just want to say I appreciate your posts here on HN and on your blog about AI/LLMs.