The best 8-bitter video memory layout (for pixel data) I have seen is in the little-known KC85/4:

The display is 320x256 pixels, organized as 40x256 bytes of pixel data (8 pixels per byte) plus another 40x256 bytes of Speccy-like color attribute bytes (except that each color block covers only 8x1 pixels instead of 8x8). Video memory starts at address 0x8000, with the pixels and the colors in separate memory banks.

Now the twist: the video memory layout is vertical, i.e. writing consecutive bytes into video memory fills an 8-pixel-wide column from top to bottom.
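To make the vertical layout concrete, here is a small Python model of the pixel bank as described above (40 byte-columns of 256 bytes each, starting at 0x8000). The function names are made up for illustration, and the bit order (bit 7 = leftmost pixel) is my assumption, not something stated above:

```python
# Hypothetical model of the KC85/4 pixel bank layout described above:
# 40 byte-columns, each 256 bytes tall, starting at 0x8000.
VIDEO_BASE = 0x8000
COLUMNS, ROWS = 40, 256


def pixel_address(x: int, y: int) -> int:
    """Address of the byte holding pixel (x, y), with 0 <= x < 320 and 0 <= y < 256."""
    if not (0 <= x < COLUMNS * 8 and 0 <= y < ROWS):
        raise ValueError("pixel out of range")
    # Each byte-column is 256 bytes tall, so y is simply the low byte.
    return VIDEO_BASE + (x // 8) * ROWS + y


def pixel_mask(x: int) -> int:
    """Bit for pixel x inside its byte (ASSUMPTION: bit 7 is the leftmost pixel)."""
    return 0x80 >> (x % 8)


# Consecutive addresses walk down a column, not across a row:
print([hex(pixel_address(0, y)) for y in range(3)])  # ['0x8000', '0x8001', '0x8002']
print(hex(pixel_address(8, 0)))                       # next byte-column: 0x8100
print(hex(pixel_address(319, 255)))                   # last pixel byte: 0xa7ff
```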

This layout is perfect for the Z80 with its 16-bit register pairs. To 'compute' a video memory location:

    LD H, 0x80 + column    ; column = 0..39
    LD L, row              ; row = 0..255
...and now you have the address of a pixel- or color-byte in HL.
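The same two loads modelled in Python, just to show that the register pair HL is the linear address and that moving around only needs 8-bit increments. `hl_for` is a hypothetical helper, not anything from the KC85/4 ROM:

```python
# Sketch: LD H, 0x80 + column / LD L, row builds HL = (H << 8) | L.
def hl_for(column: int, row: int) -> int:
    h = 0x80 + column  # LD H, 0x80 + column  (column = 0..39)
    l = row            # LD L, row            (row = 0..255)
    return (h << 8) | l


hl = hl_for(5, 100)
assert hl == 0x8000 + 5 * 256 + 100  # same as the linear formula

# INC L: next pixel row in the same column (only the low byte changes).
down = (hl & 0xFF00) | ((hl + 1) & 0x00FF)
# INC H: same row, next byte-column (8 pixels to the right).
right = hl + 0x100

print(hex(hl), hex(down), hex(right))  # 0x8564 0x8565 0x8664
```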

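To blit an 8x8 character, compute the video address into DE instead (LD D, 0x80 + column / LD E, row), since LDI copies from (HL) to (DE), then load the start of the font pixels into HL and do 8x unrolled LDI.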
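A rough Python emulation of that blit, modelling only LDI's data movement ((DE) <- (HL), HL++, DE++, BC--; flags ignored). The font address 0x4000 and the glyph bytes are made up for the example:

```python
# Minimal emulation of Z80 LDI (data movement only, flags ignored).
def ldi(mem: bytearray, hl: int, de: int, bc: int) -> tuple[int, int, int]:
    mem[de] = mem[hl]
    return (hl + 1) & 0xFFFF, (de + 1) & 0xFFFF, (bc - 1) & 0xFFFF


def blit_char(mem: bytearray, font_addr: int, column: int, row: int) -> None:
    de = ((0x80 + column) << 8) | row  # LD D, 0x80 + column / LD E, row
    hl = font_addr                     # LD HL, font pixels
    bc = 8                             # LDI also decrements BC; harmless here
    for _ in range(8):                 # stands in for 8 unrolled LDIs
        hl, de, bc = ldi(mem, hl, de, bc)


mem = bytearray(0x10000)  # 64 KB address space
FONT = 0x4000             # hypothetical location of one 8-byte glyph
glyph = bytes([0x3C, 0x42, 0x81, 0xFF, 0x81, 0x81, 0x81, 0x00])
mem[FONT:FONT + 8] = glyph
blit_char(mem, FONT, column=3, row=16)
# Because the layout is vertical, the 8 glyph rows land in 8 consecutive bytes.
```

This only stays inside one column for rows 0..248; past that, DE would carry into the next byte-column.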

Unfortunately, the KC85/4 had a slow CPU clock (at 1.77 MHz, only about half as fast as a Speccy), but it's good enough for stuff like this:

https://floooh.github.io/kcide-sample/kc854.html?file=demo.k...

Vertical layout is awesome for 8-bitters. We tended to use it a lot on the C64, too.

The C64 had a very awkward native memory layout for bitmaps: 8 consecutive bytes run vertically through one 8x8 (or, in multicolor mode, 4x8) pixel block, then the address jumps back up to the top row, and the next 8 bytes again run vertically, but through the block to the right of the first one! Super annoying, and the worst of all worlds for coordinate-to-memory-address calculations.
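For comparison, here is the native C64 hires bitmap offset (320x200, 40x25 cells of 8 bytes, 320 bytes per cell row) next to a purely vertical layout of the same size. The function names are just for illustration:

```python
# Native C64 hires bitmap: 8 bytes per 8x8 cell, cells left to right,
# 320 bytes per row of cells.
def c64_bitmap_offset(x: int, y: int) -> int:
    return (y // 8) * 320 + (x // 8) * 8 + (y % 8)


# A purely vertical layout of the same 320x200 bitmap, for contrast.
def vertical_offset(x: int, y: int, height: int = 200) -> int:
    return (x // 8) * height + y


# Moving down one pixel: +1 inside a cell, but +313 when crossing a cell row.
print(c64_bitmap_offset(0, 7), c64_bitmap_offset(0, 8))  # 7 320
print(vertical_offset(0, 7), vertical_offset(0, 8))      # 7 8
```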

So for demo effects we often used a purely vertical layout by abusing custom character sets, which can hold 256 fully custom 8x8-pixel characters. Arrange the characters in, for example, a 16x16 character grid (= a 128x128 pixel area), numbered down each column, and the character set memory effectively becomes a vertically oriented mini bitmap.
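A sketch of that trick, assuming the characters are numbered down the columns of the 16x16 grid (character n at grid column n // 16, grid row n % 16). With that arrangement, pixel (x, y) of the 128x128 area lands at offset (x // 8) * 128 + y in the 2 KB character set. The helper names are hypothetical:

```python
GRID = 16  # 16x16 characters = 128x128 pixels = all 256 characters


def screen_codes() -> list[list[int]]:
    """Screen codes for the 16x16 area, indexed [cy][cx], numbered column-first."""
    return [[cx * GRID + cy for cx in range(GRID)] for cy in range(GRID)]


def charset_offset(x: int, y: int) -> int:
    """Offset into the character set of the byte holding pixel (x, y)."""
    code = (x // 8) * GRID + (y // 8)  # which character covers (x, y)
    return code * 8 + (y % 8)          # == (x // 8) * 128 + y


# Each 8-pixel-wide column of the area is 128 consecutive bytes:
print([charset_offset(0, y) for y in (0, 7, 8, 127)])  # [0, 7, 8, 127]
print(charset_offset(8, 0))                             # 128
```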

This also has nice advantages for fast filling, for example: with a fully unrolled EOR $address; STA $address; EOR $address+1; STA $address+1; ... sequence running down each column, you had a pretty fast, almost constant-time filler for a bitmap in which you only painted the top and bottom edges of the area to be filled: the top line switches filling on, and the bottom line switches it off again.
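What the unrolled EOR/STA run computes, modelled in Python for a single byte-column: the accumulator carries a running XOR of every byte above, so bits turn on at the top edge and off again at the bottom edge. Starting the accumulator at 0 (LDA #$00) is my assumption; the column data is made up:

```python
def eor_fill_column(column: bytearray) -> None:
    a = 0                    # LDA #$00 (assumed starting value)
    for i in range(len(column)):
        a ^= column[i]       # EOR $address+i
        column[i] = a        # STA $address+i


col = bytearray(8)
col[2] = 0b00111100  # top edge of the shape in this column
col[6] = 0b00111100  # bottom edge: XORs the same bits off again
eor_fill_column(col)
print([f"{b:08b}" for b in col])
```

Note that the "off" edge row itself ends up cleared, so the bottom line has to be drawn one row below the last row you want filled.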

I like it when the hardware designers work in close concert with the software side to get good graphics performance.