Option<U8ButNotThreeForSomeReason> would have a size of 2 bytes (1 for the discriminant, 1 for the value) whereas Option<NonZeroU8> has a size of only 1 byte, thanks to some special sauce in the compiler that you can't use for your own types. This is the only "magic" around NonZero<T> that I know of, though.
You can make an enum, with all 255 values spelled out, and then write lots of boilerplate, whereupon Option<U8ButNotThreeForSomeReason> is also a single byte in stable Rust today, no problem.
That's kind of silly for 255 values, and while I suspect it would work, it's clearly not a reasonable design for 16 bits, let alone 32 bits, where I suspect the compiler will reject this wholesale.
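To make the enum approach concrete without spelling out 255 variants, here is a scaled-down sketch. The type SmallNotThree and its new/get methods are made up for illustration, and it only covers 0..=5 minus 3; the real thing would be the same pattern with far more boilerplate. Note that the one-byte size of the Option is what current rustc does in practice, not something I'd call a formal layout guarantee.

```rust
/// Hypothetical scaled-down stand-in: holds 0..=5, except 3.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum SmallNotThree {
    V0 = 0,
    V1 = 1,
    V2 = 2,
    // No variant for 3: that bit pattern is never a valid SmallNotThree,
    // so the compiler is free to use an unused pattern for Option's None.
    V4 = 4,
    V5 = 5,
}

impl SmallNotThree {
    /// The boilerplate part: map each allowed integer to its variant.
    fn new(value: u8) -> Option<Self> {
        match value {
            0 => Some(Self::V0),
            1 => Some(Self::V1),
            2 => Some(Self::V2),
            4 => Some(Self::V4),
            5 => Some(Self::V5),
            _ => None,
        }
    }

    /// Going back is just a cast, because of #[repr(u8)].
    fn get(self) -> u8 {
        self as u8
    }
}
```

With this, SmallNotThree::new(3) is None, SmallNotThree::new(4) round-trips through get() to 4, and std::mem::size_of::<Option<SmallNotThree>>() comes out as 1 on current compilers.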
Another trick you can do, which also works just fine for bigger types, is called the "XOR trick". You store a NonZero<T>, but all your adaptor code XORs with your single not-allowed value, in this case 3. This is fairly cheap on a modern CPU because it's an ALU operation with no memory fetches beyond the XOR instruction itself, so often there's no change to bulk instruction throughput. It works because only 3 XOR 3 == 0; every other value has its bits jiggled but stays non-zero, and so remains valid.
Because your type's storage is a NonZero<T>, you get all the same optimisations, and so once again Option<U8ButNotThreeForSomeReason> is a single byte.
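A minimal sketch of the XOR trick, assuming a wrapper around std::num::NonZeroU8. The name U8ButNotThree, the FORBIDDEN constant and the new/get methods are my own choices for illustration, not anything from the standard library.

```rust
use std::num::NonZeroU8;

/// Hypothetical wrapper: a u8 that can be anything except 3.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct U8ButNotThree(NonZeroU8);

impl U8ButNotThree {
    /// The single not-allowed value.
    const FORBIDDEN: u8 = 3;

    /// value ^ 3 is zero only when value == 3, so NonZeroU8::new
    /// rejects exactly the forbidden value and accepts everything else.
    fn new(value: u8) -> Option<Self> {
        NonZeroU8::new(value ^ Self::FORBIDDEN).map(Self)
    }

    /// XOR is its own inverse, so applying it again recovers the original.
    fn get(self) -> u8 {
        self.0.get() ^ Self::FORBIDDEN
    }
}
```

Here U8ButNotThree::new(3) is None, every other u8 (including 0) round-trips through new and get, and std::mem::size_of::<Option<U8ButNotThree>>() is 1 because the wrapper inherits the niche from NonZeroU8. The same shape should carry over to NonZeroU16, NonZeroU32 and so on by swapping the integer type.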