Hacker News

if you’re building on LLMs you gotta have an eval and prompt iteration pipeline, and you ought to be evaling every model release. your competitors will do this, and your users will want both the latest and greatest (for frontier tasks) and the cheapest/fastest. so you should already be paying this cost anyway. i guess it depends on your team size and scale, but not building this muscle seems like not having continuous delivery for regular code, or even like merging to main without tests and CI.