The other part of the question is exactly when "build for the capabilities of future models" turns into building for the present.
Looking at the Mythos benchmarks, it doesn't seem like the models are that close to being truly reliable for agentic tasks.
Is it a year away, or five? That's a big difference in deciding what to build today.