This is a nice idea. A tiny implementation can be way more useful for learning than yet another wrapper around a big model, especially if it keeps the training loop and inference path small enough to read end to end.