The cursed thing is that RVA23 does basically guarantees that `vle8.v` + `vmv.x.s` on misaligned addresses is fast.

Yeah, that is quite funky; and indeed gcc does that. Relatedly, super-annoying is that `vle64.v` & co could then also make use of that same hardware, but that's not guaranteed. (I suppose there could be awful hardware that does vle8.v via single-byte loads, which wouldn't translate to vle64.v?)