Interesting, but why not debug the actual code that is invoking the API? Set a breakpoint at the right place, edit state, step over, resume. That toolchain seems a lot more mature, and it fits right into whatever programming environment is being targeted.
Because this is way easier. It's effectively a printf debugger and editor you can just slot in the middle of the data stream.
You can still use normal debuggers for the code path, but we found it really valuable to isolate and inspect the agent data stream itself: the exact prompts, model outputs, tool inputs/outputs, and how those affect cost, time, and behavior over long runs. That visibility has been a big lever for improving overall product quality for some of the deeper agentic experiences we are building. The ability to modify the stream and change models has been useful too.
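
To make the "slot it in the middle of the data stream" idea concrete, here is a rough sketch of the general pattern in Python. This is not the actual tool's code or API; `StreamTap`, `fake_backend`, `swap_model`, and the model names are all made up for illustration, and a real setup would presumably sit in front of an HTTP endpoint rather than a plain function.

```python
import json
import time
from typing import Callable, Optional

Payload = dict
Backend = Callable[[Payload], Payload]
Hook = Callable[[Payload], Payload]


class StreamTap:
    """Hypothetical tap that sits between agent code and a model backend.

    It records every request/response pair with its latency, and can
    optionally rewrite either side before passing it along.
    """

    def __init__(
        self,
        backend: Backend,
        edit_request: Optional[Hook] = None,
        edit_response: Optional[Hook] = None,
    ) -> None:
        self.backend = backend
        self.edit_request = edit_request
        self.edit_response = edit_response
        self.log: list = []

    def __call__(self, request: Payload) -> Payload:
        # Work on a copy so the caller's dict is never mutated.
        request = dict(request)
        if self.edit_request is not None:
            request = self.edit_request(request)

        start = time.perf_counter()
        response = dict(self.backend(request))
        elapsed = time.perf_counter() - start

        if self.edit_response is not None:
            response = self.edit_response(response)

        # The "printf" part: keep exactly what crossed the wire.
        self.log.append(
            {"request": request, "response": response, "seconds": elapsed}
        )
        return response


def fake_backend(request: Payload) -> Payload:
    """Stand-in for a real model API; it just echoes the prompt back."""
    return {"model": request["model"], "text": "echo: " + request["prompt"]}


def swap_model(request: Payload) -> Payload:
    """Example edit hook: route every call to a different (made-up) model."""
    request["model"] = "small-model"
    return request


tap = StreamTap(fake_backend, edit_request=swap_model)
reply = tap({"model": "big-model", "prompt": "hi"})
print(json.dumps(tap.log[-1], indent=2))
```

The agent code keeps calling what it thinks is the backend, so nothing in the code path needs a breakpoint; the log is where you would read off prompts, outputs, and per-call timing across a long run.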