Thanks for the great questions! Here's how we're tackling these:
1. Context growth management:
We avoid full context rewrites entirely; they cause context collapse, where the LLM compresses away important details. Instead, we use delta updates as the foundation and are exploring:
- Semantic de-duplication to remove redundancy
- Keeping deltas as the source of truth, with optional summarization layers on top
- Pre-filtering the playbook to feed the model a more focused version, with tooling that lets it explore further when needed
Delta updates remain our core principle, but we're actively working on preventing context bloat as playbooks scale.
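For concreteness, here's a minimal sketch of the delta idea (names and structure are illustrative, not our actual API): each learning step adds or amends individual playbook entries, so nothing gets compressed away in a wholesale rewrite, and rendering can pre-filter to a focused subset.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Playbook:
    """Store of strategy entries, updated via incremental deltas."""
    entries: dict[str, str] = field(default_factory=dict)

    def apply_delta(self, delta: dict) -> None:
        """Apply one delta instead of rewriting the whole context."""
        op, key = delta["op"], delta["key"]
        if op == "add" and key not in self.entries:
            self.entries[key] = delta["text"]
        elif op == "amend" and key in self.entries:
            # Amend appends a refinement; the original detail is never
            # compressed away by a full rewrite.
            self.entries[key] += " | " + delta["text"]
        elif op == "remove":
            self.entries.pop(key, None)

    def render(self, keys: list[str] | None = None) -> str:
        """Optionally pre-filter to a focused subset before prompting."""
        selected = keys if keys is not None else list(self.entries)
        return "\n".join(f"- {k}: {self.entries[k]}" for k in selected if k in self.entries)


# Two deltas from separate reflection rounds accumulate rather than overwrite:
pb = Playbook()
pb.apply_delta({"op": "add", "key": "retry-on-429", "text": "back off and retry rate-limited API calls"})
pb.apply_delta({"op": "amend", "key": "retry-on-429", "text": "cap retries at 3"})
print(pb.render())
```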
2. Role separation:
Our library lets you select different models for each role, with prompts specifically tailored to each function. So far we've mostly used the same model for all three roles, but we're actively exploring model mixing as a promising direction.
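As a rough sketch of what that configuration looks like (placeholder role names, model IDs, and helper functions, not the library's real API), each role gets its own model slot and a prompt tailored to its function:

```python
from dataclasses import dataclass


@dataclass
class RoleConfig:
    """Which model a role uses and the prompt tailored to its function."""
    model: str          # any chat-model identifier; values below are placeholders
    system_prompt: str


# Today all three roles point at the same model, but swapping one field
# is all it takes to experiment with model mixing.
ROLES = {
    "generator": RoleConfig("model-a", "Solve the task using the playbook provided."),
    "reflector": RoleConfig("model-a", "Critique the trajectory and extract concrete lessons."),
    "curator":   RoleConfig("model-a", "Turn lessons into delta updates for the playbook."),
}


def build_messages(role: str, user_content: str) -> list[dict]:
    """Assemble the role-specific chat messages for whichever model is configured."""
    cfg = ROLES[role]
    return [
        {"role": "system", "content": cfg.system_prompt},
        {"role": "user", "content": user_content},
    ]

# e.g. build_messages("reflector", trajectory_text) is sent to ROLES["reflector"].model
```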
3. Success signals:
The system shows strong self-assessment capabilities using execution feedback (code pass/fail, API responses, and model interactions with the environment). However, you're right that ambiguous domains are trickier; this is still an open challenge for us. Our vision is to pre-seed domain knowledge through curated playbooks or training samples, then let models self-explore and discover their own success patterns over time.
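To make the "execution feedback" part concrete, here's a minimal sketch (illustrative only, not our exact implementation) of the kind of hard signals we lean on:

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ExecutionFeedback:
    success: bool
    signal: str   # e.g. "tests", "api"
    detail: str   # stderr excerpt, status code, etc.


def run_tests(cmd: list[str]) -> ExecutionFeedback:
    """Hard signal: the test suite either passes or it doesn't."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return ExecutionFeedback(
        success=proc.returncode == 0,
        signal="tests",
        detail=proc.stderr[-500:],  # keep only the tail for reflection
    )


def check_api_response(status_code: int, body: str) -> ExecutionFeedback:
    """Hard signal: HTTP status plus a snippet of the response body."""
    return ExecutionFeedback(
        success=200 <= status_code < 300,
        signal="api",
        detail=f"{status_code}: {body[:200]}",
    )

# These feedback objects are what the reflection step consumes; ambiguous
# domains (no tests, no status codes) are exactly where this breaks down.
```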
What I'm curious about:
- What feedback signals work for your Django agent?
- How do you handle planner-executor coordination overhead?
- Have you hit similar brevity bias issues?
Would love to continue this conversation on Discord if you're interested: https://discord.com/invite/mqCqH7sTyK