ARMv9 also has read-modify-write memory instructions, and so does any usable RISC-V implementation. It turns out that LL-SC (which would make them unnecessary) does not permit efficient implementations. (LL-SC does look like a rather desperate attempt to preserve a pure RISC register-register architecture.)
I get the impression people believe that instruction density does not matter much in practice (at least for large cores). For example, x86-64 compilers generally prefer the longer VEX encoding (even in contexts where it does not help to avoid moves or transition penalties), or do not implement passes to avoid redundant REX prefixes.
LL/SC is performant; it just doesn't scale to high core counts.
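For readers following along, the two styles being compared look roughly like this at the C++ level. This is only a sketch: the function names add_rmw and add_retry are made up for illustration, and what the compiler actually emits depends on the target and flags. On AArch64 with the LSE atomics enabled, fetch_add typically becomes a single LDADD, and without them it becomes a load-exclusive/store-exclusive loop; on RISC-V with the A extension, fetch_add typically becomes an AMOADD, while a compare-exchange loop typically becomes LR/SC.

```cpp
#include <atomic>
#include <cstdint>

// One atomic read-modify-write operation. Where the ISA has such an
// instruction, this can be a single instruction with no retry.
inline std::uint64_t add_rmw(std::atomic<std::uint64_t>& counter,
                             std::uint64_t delta) {
    return counter.fetch_add(delta, std::memory_order_relaxed);
}

// The same update written as a retry loop, which is the shape an
// LL/SC sequence has: read, compute, attempt the store, and retry if
// another core got in between.
inline std::uint64_t add_retry(std::atomic<std::uint64_t>& counter,
                               std::uint64_t delta) {
    std::uint64_t old = counter.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `old` with the current value.
    while (!counter.compare_exchange_weak(old, old + delta,
                                          std::memory_order_relaxed)) {
    }
    return old;  // value before the addition, like fetch_add
}
```

Both functions return the previous value. The difference under discussion is that the retry form can fail and loop when many cores hit the same location, whereas a single RMW instruction leaves the hardware free to resolve the contention itself.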
The VEX encoding is actually only rarely longer than the legacy one, and frequently it is shorter.
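A few concrete encodings make the length comparison easier to judge. The byte sequences below are hand-assembled for register-register forms and are meant as a sketch, so check them against an assembler before relying on them; the helper vex_minus_legacy is a made-up name for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Length of the VEX form minus length of the legacy form, in bytes.
template <std::size_t V, std::size_t L>
constexpr int vex_minus_legacy(const std::uint8_t (&)[V],
                               const std::uint8_t (&)[L]) {
    return static_cast<int>(V) - static_cast<int>(L);
}

// addps xmm0, xmm1  vs  vaddps xmm0, xmm0, xmm1
constexpr std::uint8_t addps_legacy[] = {0x0F, 0x58, 0xC1};
constexpr std::uint8_t addps_vex[]    = {0xC5, 0xF8, 0x58, 0xC1};

// addpd xmm0, xmm1  vs  vaddpd xmm0, xmm0, xmm1
constexpr std::uint8_t addpd_legacy[] = {0x66, 0x0F, 0x58, 0xC1};
constexpr std::uint8_t addpd_vex[]    = {0xC5, 0xF9, 0x58, 0xC1};

// addpd xmm8, xmm1  vs  vaddpd xmm8, xmm8, xmm1 (legacy form needs REX)
constexpr std::uint8_t addpd_hi_legacy[] = {0x66, 0x44, 0x0F, 0x58, 0xC1};
constexpr std::uint8_t addpd_hi_vex[]    = {0xC5, 0x39, 0x58, 0xC1};

// pshufb xmm8, xmm9  vs  vpshufb xmm8, xmm8, xmm9 (0F 38 opcode map)
constexpr std::uint8_t pshufb_legacy[] = {0x66, 0x45, 0x0F, 0x38, 0x00, 0xC1};
constexpr std::uint8_t pshufb_vex[]    = {0xC4, 0x42, 0x39, 0x00, 0xC1};

static_assert(vex_minus_legacy(addps_vex, addps_legacy) == 1, "one byte longer");
static_assert(vex_minus_legacy(addpd_vex, addpd_legacy) == 0, "same length");
static_assert(vex_minus_legacy(addpd_hi_vex, addpd_hi_legacy) == -1, "one byte shorter");
static_assert(vex_minus_legacy(pshufb_vex, pshufb_legacy) == -1, "one byte shorter");
```

In these samples the VEX form is one byte longer only where the legacy instruction needs neither a mandatory prefix nor REX. It ties once a 66 prefix is involved, and it comes out shorter when the legacy form also needs a REX prefix, since VEX folds the prefix, the register-extension bits, and the opcode-map escape into its own bytes.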