I would not call PA-RISC boring. Already at launch there was no doubt that it was a better ISA than SPARC or MIPS, and it was improved later. When PA-RISC 2.0 was replaced by Itanium, it was not at all clear which of the two ISAs was better. The later failures to design high-performance Itanium CPUs make it plausible that, had HP kept PA-RISC 2.0, it might have had more competitive CPUs than it did with Itanium.
SPARC (derived from Berkeley RISC) and MIPS were pioneers that experimented with adding or omitting various features, but they were inferior from many points of view to the earlier IBM 801.
The RISC ISAs developed later, including ARM, HP PA-RISC and IBM POWER, avoided some of the mistakes of SPARC and MIPS while also taking some features from the IBM 801 (e.g. its addressing modes), so they were better.
ISAs fail to gain traction when the sufficiently smart compilers don't eventuate.
x86-64 is a dog's breakfast of features. But because it is so widely used, compiler writers make the effort to create compilers that optimize for its quirks.
Itanium's hardware designers expected compiler writers to cater to its unique design. Intel is a semiconductor company. As good as some of its compilers are, internally it invested more in its biggest seller, and Itanium never got the level of support that was anticipated at the outset.
I am a firm believer that if AMD had not been in a position to come up with the AMD64 architecture, those Itanium issues would eventually have been sorted out. Windows XP already ran on it, and there was no other path to 64-bit going forward.
No compiler has ever managed static scheduling of general-purpose instructions that holds up over the long term.
Every new CPU generation changes the cycle counts of many instructions, adds new instructions, and so on.
Out-of-order execution is a huge dividing line in performance for a reason. The CPU itself needs to figure these things out at run time to hide memory and cache latency and to handle pipelining, prefetching and all that stuff.
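To make the point about static schedules going stale concrete, here is a minimal toy sketch in Python. Everything in it is an assumption for illustration: the four-instruction program, the two latency tables (`GEN1`, `GEN2`) and the single-issue in-order core model are hypothetical and do not describe Itanium or any real CPU. It picks the best instruction order for one generation's latencies, then replays that same order on a second generation where loads got slower and multiplies got faster.

```python
from itertools import permutations

# Hypothetical program: name -> (operation, names of results it depends on).
PROGRAM = {
    "a": ("load", []),
    "b": ("mul", []),
    "c": ("add", ["a"]),
    "d": ("add", ["b"]),
}

# Hypothetical result latencies (in cycles) for two CPU generations.
GEN1 = {"load": 2, "mul": 4, "add": 1}
GEN2 = {"load": 6, "mul": 2, "add": 1}

def cycles(order, latency):
    """Cycles needed by a single-issue, in-order core that stalls
    until every operand of the next instruction is ready."""
    ready = {}       # name -> cycle at which its result is available
    next_issue = 0   # earliest cycle the next instruction may issue
    for name in order:
        op, deps = PROGRAM[name]
        issue = max([next_issue] + [ready[d] for d in deps])
        ready[name] = issue + latency[op]
        next_issue = issue + 1
    return max(ready.values())

def valid(order):
    """True if every instruction appears after the ones it depends on."""
    seen = set()
    for name in order:
        if any(d not in seen for d in PROGRAM[name][1]):
            return False
        seen.add(name)
    return True

def best_order(latency):
    """Brute-force stand-in for a compiler's static scheduler."""
    candidates = [o for o in permutations(PROGRAM) if valid(o)]
    return min(candidates, key=lambda o: cycles(o, latency))

static = best_order(GEN1)                   # schedule baked into the binary
print(static, cycles(static, GEN1))         # ('b', 'a', 'c', 'd') 5
print(cycles(static, GEN2))                 # 9: same binary on the newer CPU
print(cycles(best_order(GEN2), GEN2))       # 7: what a reschedule would get
```

In this toy model the order tuned for the first latency table costs 9 cycles on the second, while an order chosen with the new latencies in hand needs 7. An out-of-order core in effect redoes that reordering in hardware at run time, which is why it does not depend on the binary being recompiled for each generation.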
I don't know anything about Itanium in particular, but AMD's NPU uses a VLIW architecture and they had to break backwards compatibility in the ISA for the second generation NPU (XDNA2) to get better performance.
I mean "boring" in the sense that its ISA was relatively straightforward: no performance-entangling kinks like delay slots, a good set of typical non-windowed GPRs, and no wild or exotic operations. And POWER/PowerPC and PA-RISC weren't a lot later than SPARC or MIPS, either.