>1 might be ok but introduces a bunch of opcode space waste.
I wouldn't call it "waste". Moreover, it's fine for misaligned instructions to use a wider encoding or be less rich than their aligned counterparts. For example, they may not have the immediate offset or have a shorter one. One fun potential possibility is to encode the misaligned variant into aligned instructions using the immediate offset with all bits set to one, as a side effect it also would make the offset fully symmetric.
Of course that'd result in entirely-avoidable slowdown for the potentially-misaligned ops. Perhaps fine for a program that doesn't use them frequently, but quite bad for ones that need misaligned ops everywhere.
In terms of correctness, there's also the possibility of partially-misaligned ops (e.g. an 8B load with 4B alignment, loading two adjacent int32_t fields) so you're not handling everything with correct faults anyways.