What you are describing is the role of a manager, not a software engineer. Software engineering has very little to do with writing code and much more to do with architecting, at a higher level, what needs to be done. The code is just the executional part. LLMs can code? Ok, good. Without a clear architectural pathway / direction, that code is just useless. It's not tech debt. It's just a bunch of random strings. You can argue that Claude Code and others do create a plan of attack - but still, that's at the executional level, not the architectural level.
To me, architecture starts all the way from the top - even before you write a single line of code, you do the DDD (Domain-Driven Design), then create a set of rules (e.g. use the domain name as the table prefix) and contexts, and then define the functionality w.r.t. that architecture. LLMs can do all this - but only if you explicitly ask them to. So they are pretty useful to brainstorm with, but you can't have them design autonomously, push it to production with your eyes closed, and reliably support a 100,000 user base. It's a far cry from that.
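To make the "ruleset" idea concrete, here is a minimal sketch of how the table-prefix rule could be checked mechanically. The context names and the `violations` helper are hypothetical, made up purely for illustration; a real project would derive the contexts from its own domain model.

```python
# Hypothetical bounded contexts (domains) identified during DDD.
# These names are assumptions for illustration only.
BOUNDED_CONTEXTS = {"billing", "catalog", "shipping"}


def violations(table_names):
    """Return the table names that do not follow '<context>_<name>'."""
    bad = []
    for name in table_names:
        # Split on the first underscore: prefix is the would-be context.
        prefix, sep, rest = name.partition("_")
        # Valid only if there is an underscore, something after it,
        # and the prefix is a known bounded context.
        if not (sep and rest and prefix in BOUNDED_CONTEXTS):
            bad.append(name)
    return bad


if __name__ == "__main__":
    tables = ["billing_invoice", "catalog_product", "users", "shipping_"]
    print(violations(tables))  # prints ['users', 'shipping_']
```

A check like this can run in CI, so the rule holds no matter who (or what) wrote the migration.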
But sure, you can upsell management on vanity metrics like lines of code and get that promotion with LLMs. But it's still not software engineering.
That is why we have SWE-Bench Pro: it tests architecture design too. Turns out 1000 dollars of tokens outperforms 10k dollars of labor in meta design.
That's just not accurate. I haven't studied SWE-Bench Pro in detail, so I can't tell you exactly what the flaw is, but SOTA models routinely make bad architectural choices that I have to intervene to fix.
You can read the paper here: https://labs.scale.com/papers/swe_bench_pro
TL;DR: it's very effective, as it directly tests models on REAL codebases: "The benchmark is constructed from GPL-style copyleft repositories and private proprietary codebases". The use case is very real.
It doesn't sound to me like this benchmark is attempting to measure architecture design. As far as I can see in the paper, they do not evaluate the architectural quality of a task completion, only whether the model is capable of completing the task at all.
1000 dollars of subsidized tokens.
Eh.
It's "not software engineering" but neither was what most people writing code did before LLMs.
> Without a clear architectural pathway / direction, that code is just useless. It's not tech debt. It's just a bunch of random strings
This is pretty clearly false. It's a bunch of random strings that you can compile and run to do what you want. It's more akin to a black box: a compiled closed-source dependency.
Agreed. I never considered myself an "engineer". Honestly just a regular code monkey. Software Engineer was just my job title. Folks higher up the ladder did engineer software. You know what? It sucked. Was always broken, we were always patching, we never saw around corners. But hey - they software engineered it.