I really don't understand this so I hope you won't mind explaining it. If I had the type U8ButNotThreeForSomeReason, wouldn't that need a check at runtime to make sure you are not assigning 3 to it?

At runtime, it depends. If we're taking arbitrary integers from outside, which might be three, then yes, we're obliged to check; nothing is for free. But perhaps we're mostly or entirely working with numbers we know a priori are never three.

NonZero<T> has a "constructor" named new() which returns Option<NonZero<T>>, so that None means "nope, this value isn't allowed because it's zero". But unwrap() and expect() on an Option are const fns in recent Rust, so NonZeroI8::new(9).expect("Nine is not zero") will compile and, used in a const context, produce a constant that the type system knows isn't zero.
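To make the two cases concrete, here is a minimal sketch. The names NINE and from_outside are made up for illustration, and it assumes a compiler new enough for Option::expect to be usable in a const (Rust 1.83 or later, as far as I know).

```rust
use std::num::NonZeroI8;

// Evaluated at compile time: if the literal were 0, the build would fail
// rather than the program panicking at runtime.
const NINE: NonZeroI8 = NonZeroI8::new(9).expect("Nine is not zero");

// Hypothetical helper for values that arrive from outside. Here the check
// has to happen at runtime, and zero comes back as None.
fn from_outside(raw: i8) -> Option<NonZeroI8> {
    NonZeroI8::new(raw)
}

fn main() {
    println!("{}", NINE.get());
    println!("{:?}", from_outside(0));
    println!("{:?}", from_outside(-5));
}
```

So the runtime check only shows up on the from_outside path; the constant costs nothing once the program is built.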

Three in particular does seem like a weird choice. What I want is Balanced<signed integer> types such as BalancedI8, which is the 8-bit integers including zero, -100 and +100 but crucially not including -128, which is annoying and often not needed. A more general system is envisioned in "Pattern Types". How much more general? Well, I think proponents who want lots of generality need to help deliver that.

Option<U8ButNotThreeForSomeReason> would have a size of 2 bytes (1 for the discriminant, 1 for the value) whereas Option<NonZeroU8> has a size of only 1 byte, thanks to some special sauce in the compiler that you can't use for your own types. This is the only "magic" around NonZero<T> that I know of, though.
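A quick way to see the size difference being described, assuming the custom type is just a plain wrapper around a u8 (PlainWrapper is a hypothetical name, not anything from the standard library):

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

// Hypothetical plain wrapper: every u8 bit pattern is valid for the field,
// so there is no spare value left over to represent None.
#[allow(dead_code)]
struct PlainWrapper(u8);

fn main() {
    // 256 valid values plus None cannot fit in one byte.
    println!("{}", size_of::<Option<PlainWrapper>>());
    // Zero is never a valid NonZeroU8, so None can reuse that bit pattern.
    println!("{}", size_of::<Option<NonZeroU8>>());
}
```

Note that nothing in PlainWrapper actually forbids 3; the sketch only shows what happens to the size when the compiler is not told about any unused value.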

You can make an enum, with all 255 values spelled out, and then write lots of boilerplate, whereupon Option<U8ButNotThreeForSomeReason> is also a single byte in stable Rust today, no problem.

That's kind of silly for 255 values, and while I suspect it would work, it's clearly not a reasonable design for 16 bits, let alone 32 bits, where I suspect the compiler will reject this wholesale.
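For a feel of what the enum approach looks like, here is a scaled-down sketch with only four allowed values instead of 255. TinyButNotThree and its new() are invented for this example, and the one-byte Option is what rustc does in practice for an enum like this rather than something I'd claim the language promises for every enum.

```rust
use std::mem::size_of;

// Scaled-down stand-in: the real thing would spell out all 255 allowed values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum TinyButNotThree {
    Zero = 0,
    One = 1,
    Two = 2,
    Four = 4,
}

impl TinyButNotThree {
    // Part of the boilerplate: a checked conversion from a raw integer.
    fn new(raw: u8) -> Option<Self> {
        match raw {
            0 => Some(Self::Zero),
            1 => Some(Self::One),
            2 => Some(Self::Two),
            4 => Some(Self::Four),
            _ => None,
        }
    }
}

fn main() {
    println!("{:?}", TinyButNotThree::new(3));
    println!("{:?}", TinyButNotThree::new(4));
    // The unused discriminant values leave room for None in the same byte.
    println!("{}", size_of::<Option<TinyButNotThree>>());
}
```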

Another trick you can do, which will also work just fine for bigger types, is called the "XOR trick". You store a NonZero<T>, but all your adaptor code XORs with your single not-allowed value, in this case 3. This is fairly cheap on a modern CPU because it's an ALU operation with no memory fetches beyond the XOR instruction itself, so often there's no change to bulk instruction throughput. This works because only 3 XOR 3 == 0; other values will all have bits jiggled but remain valid.

Because your type's storage is the same size, you get all the same optimisations and so once again Option<U8ButNotThreeForSomeReason> is a single byte.
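A minimal sketch of the XOR trick for the 8-bit case. The type name U8ButNotThree, the FORBIDDEN constant and the new()/get() methods are my own choices for illustration, modelled loosely on how NonZero<T> is used.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

// Stores value ^ 3, which is zero only when value == 3.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct U8ButNotThree(NonZeroU8);

impl U8ButNotThree {
    const FORBIDDEN: u8 = 3;

    // Returns None for 3, because 3 ^ 3 == 0 and NonZeroU8::new rejects zero.
    fn new(value: u8) -> Option<Self> {
        NonZeroU8::new(value ^ Self::FORBIDDEN).map(Self)
    }

    // XOR again with the same value to recover the original number.
    fn get(self) -> u8 {
        self.0.get() ^ Self::FORBIDDEN
    }
}

fn main() {
    println!("{:?}", U8ButNotThree::new(3));
    println!("{:?}", U8ButNotThree::new(0).map(|v| v.get()));
    println!("{}", size_of::<Option<U8ButNotThree>>());
}
```

Zero is a perfectly good U8ButNotThree here: it is stored as 3, and only the real 3 maps onto the forbidden zero pattern.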