> If you take an LLM that makes 10 tool calls in a row for an evaluation, any reduction in unpredictable drift is welcome
Why use ambiguous natural language for a specific technical task? I get that it's a cool trick, but surely they can come up with another input method by now?