Post author here.
Yes, it works really well.
1) The latest models are radically better at this. We noticed a massive improvement in quality starting with Sonnet 4.5.
2) The context issue is real. We solve this by using sub-agents that read through the logs and return only the relevant bits to the parent agent’s context.
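A minimal sketch of the sub-agent idea from point 2, not our actual implementation. Every name here (`sub_agent`, `parent_context`, `ERROR_MARKERS`, the chunk size and budget) is hypothetical, and the sub-agent is a plain keyword filter standing in for a model call so the example runs on its own:

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical relevance markers. A real sub-agent would be a model call,
# not a keyword match; this stand-in only keeps the sketch runnable.
ERROR_MARKERS = ("ERROR", "FATAL", "Traceback")


def chunk_lines(lines: Sequence[str], chunk_size: int) -> Iterator[Sequence[str]]:
    """Split the raw log into fixed-size chunks, one per sub-agent run."""
    for start in range(0, len(lines), chunk_size):
        yield lines[start:start + chunk_size]


def sub_agent(chunk: Sequence[str]) -> List[str]:
    """Read a whole chunk, return only the lines that look relevant."""
    return [line for line in chunk if any(m in line for m in ERROR_MARKERS)]


def parent_context(
    log_lines: Sequence[str],
    chunk_size: int = 1000,
    budget: int = 50,
    agent: Callable[[Sequence[str]], List[str]] = sub_agent,
) -> List[str]:
    """Collect sub-agent findings; the parent never sees the raw log.

    `budget` caps how many lines reach the parent's context.
    """
    relevant: List[str] = []
    for chunk in chunk_lines(log_lines, chunk_size):
        relevant.extend(agent(chunk))
    return relevant[:budget]


if __name__ == "__main__":
    log = ["INFO ok"] * 5 + ["ERROR db timeout"] + ["INFO ok"] * 5 + ["FATAL oom"]
    for line in parent_context(log, chunk_size=4):
        print(line)
```

The point of the shape is that each chunk is read in full by something cheap and disposable, and only the filtered lines count against the parent's context. Swapping `agent` for a model-backed function would not change the parent side.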
So you’re not getting alerts at 2 am from hallucinations?
I would be more interested in reading about this kind of orchestration and filtering than about data acquisition, if you have the energy for another post :)
We started writing very recently: https://www.mendral.com/blog - there is another post we made yesterday about the overall architecture. And we have a long list of things we're planning to write about in more detail.
Taking good note of your comment :)
We've actually started to gather metrics this week to write that exact post :) Coming soon!