> Not trivial, you would need to do lots of evals and prompt tuning when you switch models.

Couldn’t we just train smaller models to “translate” what the harness user wants to what the worker model expects? I mean, if models understand caveman, it seems like just a small stretch