It bears repeating that modern LLMs are incredibly capable, and relentless, at solving problems that have a verification test suite. It seems like this problem did (at least for some finite subset of n)!

This result does not, by itself, generalize to open-ended problems, though, whether in business or in research more broadly. Discovering the specification to build is often the majority of the battle. LLMs aren't bad at this, per se, but they're nowhere near as reliably groundbreaking here as they are on verifiable problems.

Yes, this is where I just cannot imagine completely AI-driven software development of anything novel and complicated without extensive human input. I'm currently working in a space where none of our data models are particularly complex, but the trick is all in defining the rules for how things should work.

Our actual software implementation is usually pretty simple; writing up the design spec often takes significantly longer than building the software, because the software isn't the hard part: the requirements are. I suspect the same folks who are terrible at describing their problems are going to need help from experts who sit somewhere between SWE, product manager, and interaction designer.