Claiming "minimalist" without giving any information about code size is a big omission. Nonetheless, as someone who has written an H.261 through H.263 decoder as a learning exercise, it's good to see more people writing video codecs. Getting high performance may not be straightforward, but the algorithms themselves are well-defined by the standard.
Access to left/top macroblock values is done with direct offsets in memory instead of copying their values to a buffer beforehand.
I used this technique too, so I don't think it's particularly novel or non-obvious. Video decoding is performance-sensitive enough that you end up avoiding extraneous data movement wherever possible.
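For anyone unfamiliar with the idea, here is a rough sketch in C of what "direct offsets" can look like. This is not the project's code or mine verbatim: the `MbInfo` struct, its fields, the tiny 4x3 frame size, and the helper names are all made up for illustration. It assumes per-macroblock records are stored row-major with one padding entry on the left of each row and one padding row on top, so the left and top neighbours are always at fixed offsets and never need copying into a scratch buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame size in macroblocks, kept tiny for illustration. */
#define MB_W 4
#define MB_H 3
/* One extra entry per row serves as the left border. */
#define STRIDE (MB_W + 1)

/* Hypothetical per-macroblock record; real decoders keep far more state. */
typedef struct {
    uint8_t available; /* 0 for border padding or not-yet-decoded blocks */
    uint8_t nnz;       /* stand-in for any value used as prediction context */
} MbInfo;

/* One extra row on top serves as the top border. */
static MbInfo mb_store[(MB_H + 1) * STRIDE];

/* Mark every entry, including the borders, as unavailable. */
static void mb_reset(void)
{
    memset(mb_store, 0, sizeof mb_store);
}

/* Pointer to the record of macroblock (x, y), skipping the borders. */
static MbInfo *mb_at(int x, int y)
{
    assert(x >= 0 && x < MB_W && y >= 0 && y < MB_H);
    return &mb_store[(y + 1) * STRIDE + (x + 1)];
}

/* Read the left and top neighbours in place via fixed offsets.
 * Edge macroblocks land on padding entries, so no bounds checks
 * or neighbour copies are needed here. */
static int neighbour_context(const MbInfo *mb)
{
    const MbInfo *left = mb - 1;
    const MbInfo *top  = mb - STRIDE;
    int ctx = 0;
    if (left->available) ctx += left->nnz;
    if (top->available)  ctx += top->nnz;
    return ctx;
}
```

The padding is the only real design choice here: it trades a little memory for removing the edge-of-frame special cases from the hot path.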
Also worth noting: H.264 patents have already expired in most of the world: https://meta.wikimedia.org/wiki/Have_the_patents_for_H.264_M...