If your definition of "competitive" is loose enough, you can write your own Markov chain in an evening. Transformer models rely on a lot of prior art that has to be learned incrementally.
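For a sense of scale, here's roughly what the "evening" version looks like: a minimal sketch of a word-level, order-1 Markov chain text generator (the corpus string and seed word are made-up placeholders):

    import random
    from collections import defaultdict

    def build_chain(text):
        # Map each word to the list of words that follow it in the corpus.
        chain = defaultdict(list)
        words = text.split()
        for current, nxt in zip(words, words[1:]):
            chain[current].append(nxt)
        return chain

    def generate(chain, seed, length=20):
        # Walk the chain: from the current word, pick a random successor.
        word = seed
        out = [word]
        for _ in range(length - 1):
            successors = chain.get(word)
            if not successors:
                break  # dead end: word never appeared mid-corpus
            word = random.choice(successors)
            out.append(word)
        return " ".join(out)

    corpus = "the cat sat on the mat and the cat slept on the rug"
    print(generate(build_chain(corpus), "the"))

That's the whole trick: count successors, sample. Everything a transformer adds on top of this is what takes the incremental learning.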
Not that loose lol.
I’m thinking it’s still Llama-style: a dense, decoder-only transformer.