I find this interesting:
…no model performed better with an Agent, a couple performed worse, and time/tokens/costs were consistently much higher with the agent in the loop, for some reason.
Someone should build a harness where features are only added if they are proven net positive to outcomes.