Regarding misaligned reads: IIRC only x86 hides unaligned memory access, and even there it can still be slower than an aligned read. Other processors just fault, so it would make sense to do the same on RISC-V.
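A minimal C sketch of the portable way to express such a read, assuming a C99 compiler; `load_u32` and `store_u32` are hypothetical helper names. Dereferencing a misaligned `uint32_t *` is undefined behaviour in C no matter what the hardware does, whereas `memcpy` into a local is well defined. On x86, optimizing compilers typically turn the fixed-size `memcpy` into a single load; on a strict-alignment target they have to emit something safe instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a native-endian 32-bit value from any address.
   memcpy is well defined for misaligned pointers; a cast-and-dereference
   such as *(const uint32_t *)p is not. */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Matching hypothetical store helper, same reasoning. */
static inline void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```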
The problem is decades of software written for a chip that, from the outside, appears not to care.
ARM Cortex-A cores also allow unaligned access (MCU cores often don't, though, and older ARM is weird). Perhaps it's a hint that the two most popular CPU architectures have both ended up with the forgiving approach to unaligned access rather than the penalising approach of raising a trap.
Yes, unaligned loads/stores are a niche feature with huge implications for processor design: a single access can span two cache lines with different residency, or two pages where only one of them faults, etc.
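To make the cache-line and page cases concrete, here is a small C sketch that checks whether an access of `size` bytes starting at `addr` straddles a block boundary. The 64-byte line and 4096-byte page sizes are assumptions (common on current parts, not universal), and `crosses_boundary` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes for illustration only; real values are implementation-specific. */
#define CACHE_LINE_SIZE 64u
#define PAGE_SIZE_BYTES 4096u

/* True if the access [addr, addr + size) touches two blocks of `block` bytes.
   Requires size >= 1 and block > 0.
   A naturally aligned access no larger than the block never straddles it;
   a misaligned one can straddle a line and a page at the same time. */
static bool crosses_boundary(uintptr_t addr, size_t size, uintptr_t block)
{
    return addr / block != (addr + size - 1) / block;
}
```

For example, a 4-byte access at address 4094 crosses both a 64-byte line and a 4096-byte page, so the core has to split it into two accesses, either of which may miss or fault independently.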
This is the classic conundrum of legacy-system redesign: if customers keep demanding that every feature of the old system be present and work exactly the same way, the new system takes on the very baggage it was designed to get rid of.
Judged by that standard, the new implementation will look slow and buggy, and nobody will use it.
Unaligned load/store is crucial for zero-copy handling of mmap'ed data, network streams, and all kinds of other space-optimized data structures.
If the CPU doesn't support it, software has to make many tiny conditional copies, which is bad for branch prediction.
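Roughly what such a software fallback looks like: a C sketch that assembles a little-endian 32-bit field from four single-byte loads, which is always alignment-safe. `load_u32_le_bytes` is a hypothetical name. Where the core handles unaligned access in hardware, the same field is one load; here it is four loads plus shifts and ORs, and code that tries to take a fast path when the pointer happens to be aligned adds a branch on top.

```c
#include <stdint.h>

/* Hypothetical helper: decode a little-endian u32 at any byte address using
   only byte loads, so it never issues a misaligned word access. */
static inline uint32_t load_u32_le_bytes(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}
```

Optimizing compilers can often recognize this pattern and merge it back into a single load on targets that permit unaligned access.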
This sucks doubly when you have variable-length vector operations... IMO fast unaligned memory access should have been mandatory, without exceptions, for all application-level profiles and for everything with vector.
I think you can do this fairly efficiently with SSE on x86, since SSE/AVX has shift and shuffle instructions. Encoding/decoding packed data might even be faster this way.
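A minimal sketch of the shift idea, assuming x86-64, where SSE2 is baseline so no extra compiler flags are needed: one unaligned 16-byte load (`_mm_loadu_si128`) fetches a window of a packed record, a whole-register byte shift (`_mm_srli_si128`) moves the wanted field to the bottom, and `_mm_cvtsi128_si32` extracts it. `field_at_offset3` and its fixed offset of 3 are hypothetical; the scalar `memcpy` branch keeps it compiling on other targets. The SSE path reads 16 bytes starting at `p`, so the buffer must extend that far.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: return the native-endian 32-bit field at byte offset 3
   of a packed record starting at an arbitrary (possibly misaligned) address. */
static uint32_t field_at_offset3(const uint8_t *p)
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)p); /* unaligned 16-byte load */
    v = _mm_srli_si128(v, 3);                         /* drop the first 3 bytes */
    return (uint32_t)_mm_cvtsi128_si32(v);            /* low 32 bits = p[3..6] */
#else
    uint32_t x;                                       /* scalar fallback */
    memcpy(&x, p + 3, sizeof x);
    return x;
#endif
}
```

The byte-shift count has to be a compile-time constant, which is why the offset is fixed here; arbitrary per-byte rearrangement is what `pshufb` (`_mm_shuffle_epi8`, SSSE3) is for.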
I'm not familiar with RISC-V, but from what I've seen here, they're trying to solve this in a similar way, with vector or bit-extraction instructions.
Yes, because unaligned loads are no problem with SSE/AVX. On my RISC-V Orange Pi, unaligned vector loads with elements wider than a byte fault, so you have to take extra care.
Unfortunately, for historical reasons AVX shifts and shuffles mostly operate within 128-bit lanes (even the 256-bit instructions), and hardware support for AVX-512/AVX10, where that was fixed, is a complete mess. That makes it hard to rely on when you care about backwards compatibility across consumer devices, e.g. in game development.
RISC-V vector has excellent mask/shuffle/permute support, but performance on real silicon can be... questionable. See, for example, the timings for vrgather here: https://camel-cdr.github.io/rvv-bench-results/spacemit_a100/...
For working with packed data structures whose fields are irregular, unpredictable, or dependent on previous fields, unaligned load/store is a godsend. The last time I worked on a custom DB engine that used these patterns, the generated x86 code was much nicer than the code generated for our embedded ARM cores.
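A sketch of that pattern under a made-up record format (the layout and the names `read_u16_le` and `count_records` are hypothetical): because each record's length decides where the next one starts, field offsets are arbitrary and every field read is potentially unaligned. Written with `memcpy`, x86 compilers can emit plain loads; on a strict-alignment core the same source typically has to become byte-wise loads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed layout, assumed for illustration:
     [u8 tag][u16 payload length, little-endian][payload bytes]
   Records are packed back to back with no padding, so each length field
   sits at an offset that depends on every record before it. */

/* Read a little-endian u16 from any address with one (possibly unaligned) load.
   Assumes a little-endian host unless the GCC/Clang byte-order macros say otherwise. */
static uint16_t read_u16_le(const uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    v = (uint16_t)((v >> 8) | (v << 8));   /* swap to host order */
#endif
    return v;
}

/* Count complete records in buf[0..n); return -1 if the last one is truncated. */
static long count_records(const uint8_t *buf, size_t n)
{
    size_t off = 0;
    long count = 0;
    while (off < n) {
        if (n - off < 3)
            return -1;                     /* header does not fit */
        size_t len = read_u16_le(buf + off + 1);
        off += 3;
        if (n - off < len)
            return -1;                     /* payload does not fit */
        off += len;
        count++;
    }
    return count;
}
```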
On modern CPUs, maybe, but in the past, across the 8-, 16- and 32-bit generations, it wasn't something to care about outside RISC.
The PDP-11 and m68k, to name a few, did not allow misaligned access to anything wider than a byte.
Neither is RISC or modern.
As for the 68000, I don't remember; I only used it at demoscene coding parties, when my friends let me touch their Amigas.
I have only seen PDP-11 assembly snippets in UNIX-related books and wasn't aware of its alignment requirements.
The PDP-11 was a major source of inspiration for the m68k architecture's designers. The influence shows in many places, from the orthogonal ISA design down to the instruction mnemonics.
It is quite likely that disallowing misaligned access was also influenced by the PDP-11.