It wasn't possible on the 386. Ken Shirriff discusses how the Intel 80386's register file was built at https://www.righto.com/2025/05/intel-386-register-circuitry..... Only four of the registers are built to allow 32-, 16- or 8-bit writes. Reads output the entire register onto the bus and the ALU does the appropriate masking. The twist is for the legacy 16-bit upper half-registers - themselves really a legacy of the 8080, and the requirement to be able to directly translate 8080 code opcode-for-opcode. The output of these has to be shifted down 8 bits to be in the right place for the ALU, then these bits have to be selected.

AMD seem to have decided to regularise the instruction set for 64-bit long mode, making all the registers consistently able to operate as 64-bit, 32-bit, 16-bit, and 8-bit, using the lowest bits of each register. This only occurs if using a REX prefix, usually to select one of the 8 additional architectural registers added for 64-bit mode. To achieve this, the bits that are used to select the 'high' part of the legacy 8086 registers in 32- or 16-bit code (and when not using the REX prefix) are used instead to select the lowest 8 bits of the index and pointer registers.

From the "Intel 64 and IA-32 Architectures Software Developer's Manual":

"In 64-bit mode, there are limitations on accessing byte registers. An instruction cannot reference legacy high-bytes (for example: AH, BH, CH, DH) and one of the new byte registers at the same time (for example: the low byte of the RAX register). However, instructions may reference legacy low-bytes (for example: AL, BL, CL, or DL) and new byte registers at the same time (for example: the low byte of the R8 register, or RBP). The architecture enforces this limitation by changing high-byte references (AH, BH, CH, DH) to low byte references (BPL, SPL, DIL, SIL: the low 8 bits for RBP, RSP, RDI, and RSI) for instructions using a REX prefix."

In 64-bit code there is very little reason at all to be using bits 15:8 of a longer register.

This possibly puts another spin on Intel's desire to remove legacy 16- and 32-bit support (termed 'X86S'). It would remove the need to support AH, BH, CH and DH - and therefore some of the complex wiring from the register file to support the shifting. If that's what it currently does.

Actually, looking at Agner Fog's optimisation tables (https://www.agner.org/optimize/instruction_tables.pdf) it appears there is significant extra latency in using AH/BH/CH/DH, which suggests to me that the processor actually implements shifting into and out of the high byte using extra micro-ops.

> In 64-bit code there is very little reason at all to be using bits 15:8 of a longer register.

I disagree: there only exists BSWAP r32 (and by 64 extension BSWAP r64): https://www.felixcloutier.com/x86/bswap

No BSWAP r16 exists. Why? in 32 bit mode, it was not needed, because you could simply use

XCHG r/m8, r8

with, say, cl and ch (to swap the endianness of cx).

In 64 bit mode, you can thus only the endianness of a 16 bit value for the "old" registers ax, cx, dx, bx using one instruction. If you want to swap the 16 bit part of one of the "new" registers, you add least have to do a 32 bit (logical) right shift (SHL) after a BSWAP r32 (EDIT: jstarks pointed out that you could also use ROL r/m16, 8 to do this in one instruction on x86-64). By the way: this solution has a pitfall over BSWAP: BSWAP preserves the flags register, while SHL does not.

What about ROL r/m16, 8?

This would indeed work (and is likely the better solution), but in opposite to BSWAP and XCHG, it also changes flags.