> Tool calling? The model emits JSON as it autocompletes the prompt, and the JSON is then parsed out and transformed into an HTTP call.

No. Code assistants determine which tool they can execute to meet a specific goal. They pick the tool, then execute it (meaning they build command-line arguments, run the command-line app, analyze the output, and assess the outcome) as subtasks.

And they do it as part of ReAct loops. If the tool fails to run, code assistants can troubleshoot problems on the fly and adapt how they call the tool until they reach the goal.
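
To make that concrete, here is a rough sketch of the loop I mean. It is not how any particular assistant is implemented: `scripted_policy` is a hypothetical stand-in for the model, and `run_tool` and `react_loop` are names I made up for illustration. The first command is deliberately broken so the loop has something to recover from.

```python
import subprocess
import sys


def run_tool(argv):
    """Run a command-line tool and return (exit_code, combined output)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, (proc.stdout + proc.stderr).strip()


def scripted_policy(goal, history):
    """Hypothetical stand-in for the model: choose the next command from past observations."""
    if not history:
        # First attempt contains a deliberate typo ('pritn'), so it will fail.
        return [sys.executable, "-c", "pritn(6 * 7)"]
    _, last_code, last_output = history[-1]
    if last_code != 0 and "NameError" in last_output:
        # "Troubleshooting": the observation shows a NameError, so adapt the call.
        return [sys.executable, "-c", "print(6 * 7)"]
    return None  # nothing left to try


def react_loop(goal, policy, max_steps=5):
    """Act, observe, adapt; stop once a command succeeds or the policy gives up."""
    history = []
    for _ in range(max_steps):
        argv = policy(goal, history)
        if argv is None:
            break
        code, output = run_tool(argv)
        history.append((argv, code, output))
        if code == 0:
            return output, history
    return None, history


result, history = react_loop("compute 6 * 7", scripted_policy)
print(result)        # output of the command that finally succeeded
print(len(history))  # number of attempts it took
```

In a real assistant the policy would be a model call rather than a hard-coded script, but the control flow (pick a command, run it, read the output, retry differently) has the same shape.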

> And they do it as part of ReAct loops. If the tool fails to run, code assistants can troubleshoot problems on the fly and adapt how they call the tool until they reach the goal.

Yeah, but fundamentally all of this is implemented as next-token prediction over the context (which includes the tool results).

Honestly, it's pretty amazing how much we can do with next token prediction, but that's essentially all that's happening here.
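
A toy sketch of what I mean, with loud caveats: `next_token` here is a hard-coded script standing in for a real model, and the `<result>` / `<end>` markers and the `add` tool are invented for the example. The point is only that the tool call and the final answer both come out of the same "predict one token given the context" function, and the tool result is just more context.

```python
import json

# Hypothetical tool registry; 'add' exists only for this illustration.
TOOLS = {"add": lambda a, b: a + b}

CALL_SCRIPT = ['{"tool":', '"add",', '"args":', '[2,', '3]}', "<end>"]


def next_token(context):
    """Toy stand-in for an LLM: return one token given everything in the context."""
    if "<result>" in context:
        # Find the most recent tool result and "predict" an answer that uses it.
        i = len(context) - 1 - context[::-1].index("<result>")
        script = ["The", "sum", "is", context[i + 1], "<end>"]
        generated = context[i + 2:]
    else:
        # No tool result yet: "predict" a JSON tool call, token by token.
        script = CALL_SCRIPT
        generated = context[1:]  # everything after the prompt
    return script[len(generated)]


def generate(context):
    """Call next_token repeatedly, appending to the context, until <end>."""
    out = []
    while True:
        tok = next_token(context)
        if tok == "<end>":
            return out
        context.append(tok)
        out.append(tok)


context = ["add 2 and 3"]
call = json.loads(" ".join(generate(context)))    # the model "emits JSON"
result = TOOLS[call["tool"]](*call["args"])       # the harness runs the tool
context += ["<result>", str(result)]              # the result becomes context
answer = " ".join(generate(context))              # same predictor, new context
print(answer)
```

Everything outside `next_token` is plain harness code: parse, dispatch, append, and call the predictor again.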