I... wow, you made an LLM that can actually tell jokes?

With 9M params, it just repeats the joke from its training dataset.