If your definition of "competitive" is loose enough, you can write your own Markov chain in an evening. Transformer models rely on a lot of prior art that has to be learned incrementally.
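For a sense of scale, here's roughly what the "evening" version looks like: a minimal sketch of a word-level, order-1 Markov chain text generator (the corpus string and seed word are made-up placeholders):

    import random
    from collections import defaultdict

    def build_chain(text):
        # Map each word to the list of words that follow it in the corpus.
        chain = defaultdict(list)
        words = text.split()
        for current, nxt in zip(words, words[1:]):
            chain[current].append(nxt)
        return chain

    def generate(chain, seed, length=20):
        # Walk the chain: from the current word, pick a random successor.
        word = seed
        out = [word]
        for _ in range(length - 1):
            successors = chain.get(word)
            if not successors:
                break  # dead end: word never appeared mid-corpus
            word = random.choice(successors)
            out.append(word)
        return " ".join(out)

    corpus = "the cat sat on the mat and the cat slept on the rug"
    print(generate(build_chain(corpus), "the"))

That's the whole trick: count successors, sample. Everything a transformer adds on top of this is what takes the incremental learning.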
Not that loose lol.
I’m thinking it’s still Llama-style: a dense, decoder-only transformer.