This advice will be very dated once inference gets an order of magnitude faster. And it will happen; it's classic tech. It'll probably even follow something like Moore's law.
Wait until that 8-minute inference takes only a handful of seconds; that's when things get really wild. Because if inference time isn't a bottleneck… then iteration is cheap.
Yeah, I think at that speed it will be pointless to switch away while you wait, and instead it will be about reviewing the work more effectively. I think I'd believe in the more "autonomous Claude" systems in that world.
It will be crazy. Because the cost of "failure" will be dramatically lower, these systems can sometimes just throw educated darts at the wall until a solution sticks. It's way too slow to do that kind of thing now.
(Presumably cost per token will be dramatically lower as well)
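Just to put toy numbers on the darts idea: here's a minimal simulation of a retry-until-something-sticks loop. Everything in it is an assumption, not a measurement: SUCCESS_PROB, the throw_dart stand-in for a model call plus a test run, and the 8-minute vs 5-second per-call timings. The point is only that the same loop flips from uneconomical to routine when each call takes seconds.

```python
import random

# Assumed odds that a single educated dart passes the tests (made up).
SUCCESS_PROB = 0.1

def throw_dart() -> bool:
    """Toy stand-in for one inference call followed by a test run."""
    return random.random() < SUCCESS_PROB

def attempts_until_success(max_attempts: int = 100) -> int | None:
    """Retry until a dart sticks, or give up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        if throw_dart():
            return attempt
    return None  # No dart stuck; hand it back to a human.

if __name__ == "__main__":
    random.seed(0)
    n = attempts_until_success()
    if n is not None:
        # Same loop, same number of darts; only the per-call
        # latency changes the wall-clock economics.
        print(f"solved on attempt {n}")
        print(f"at 8 min/call: {n * 8:.1f} minutes")
        print(f"at 5 s/call:  {n * 5 / 60:.1f} minutes")
```

With ~10 attempts, that's over an hour of waiting at today's speeds versus under a minute at the faster ones, which is the whole "iteration is cheap" argument in miniature.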