Hacker News

getnormality 15 hours ago

> I think most ML people now think of neural-network architectures as being, essentially, choices of tradeoffs that facilitate learning in one context or another when data and compute are in short supply, but not as being fundamental to learning.

Is this a practical viewpoint? Can you remove any of the specific architectural tricks used in Transformers and expect the model to work about equally well?

musebox35 15 hours ago

I think this question is one of the more concrete and practical ways to attack the problem of understanding transformers. Empirically, the current architecture is the one that best lets training converge under gradient-descent dynamics. Potentially, a different form might be possible, and even beneficial, once the core learning task is completed. Also, the requirements of iterated and continuous learning might lead to a completely different approach.

etiam 15 hours ago

Did you see this one?

https://news.ycombinator.com/item?id=41732853