I... wow, you made an LLM that can actually tell jokes?
With 9M params, it just repeats a joke from its training dataset.