One thing I keep wondering about with agents is what happens once they start running real workflows.
Right now a lot of the focus is on getting them to execute tasks reliably. But the harder problem might be reconstructing what actually happened later: why a decision was made, what context the agent had, and how you audit or override it if something goes wrong.
Feels like autonomy is moving faster than the accountability layer around it.
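A minimal sketch of what that accountability layer could look like: an append-only log where every agent decision is recorded with its rationale and a snapshot of the context it had, so the run can be reconstructed and audited afterwards. All names here are hypothetical, not from any real agent framework.

```python
import json
import time
from dataclasses import dataclass, field, asdict


@dataclass
class DecisionRecord:
    # One entry per agent decision: what it did, why, and with what context.
    step: int
    action: str
    rationale: str
    context: dict
    timestamp: float = field(default_factory=time.time)


class AuditTrail:
    """Append-only log of agent decisions, queryable after the fact."""

    def __init__(self):
        self._records: list[DecisionRecord] = []

    def record(self, action: str, rationale: str, context: dict) -> DecisionRecord:
        # Snapshot the context at decision time so later audits see what the agent saw.
        rec = DecisionRecord(step=len(self._records), action=action,
                             rationale=rationale, context=dict(context))
        self._records.append(rec)
        return rec

    def explain(self, step: int) -> str:
        # Reconstruct why a given decision was made.
        rec = self._records[step]
        return f"step {rec.step}: chose {rec.action!r} because {rec.rationale}"

    def export(self) -> str:
        # Serialize the whole trail for an external auditor or override review.
        return json.dumps([asdict(r) for r in self._records], indent=2)


trail = AuditTrail()
trail.record("refund_order", "policy allows refunds under $50",
             {"order_id": "A123", "amount": 42.0})
print(trail.explain(0))
```

This only covers the logging half; the override half would need the log to feed back into something that can pause or reverse the agent's actions, which is where most of the open design questions live.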