For those interested, Wired ran a backstory about the Attention is All You Need paper 2 years ago: https://www.wired.com/story/eight-google-employees-invented-...
It gives some context on the contributions of each of the authors. About Shazeer, from the article:
Shazeer’s joining the group was critical. “These theoretical or intuitive mechanisms, like self-attention, always require very careful implementation, often by a small number of experienced ‘magicians,’ to even show any signs of life,” says Uszkoreit. Shazeer began to work his sorcery right away. He decided to write his own version of the transformer team’s code. “I took the basic idea and made the thing up myself,” he says. Occasionally he asked Kaiser questions, but mostly, he says, he “just acted on it for a while and came back and said, ‘Look, it works.’” Using what team members would later describe with words like “magic” and “alchemy” and “bells and whistles,” he had taken the system to a new level.
> Using what team members would later describe with words like “magic” and “alchemy” and “bells and whistles,”
Ok, these people have all gotten extensive training on how to hype for the non-technical crowd without saying anything of substance.
As a hacker, I kinda like Noam's code. I once had to implement a TC MoE kernel and stumbled upon his code from [tensor2tensor](https://github.com/tensorflow/tensor2tensor/blob/master/tens...), and I think "alchemy" is justified. Dude writes some beautiful kernels.
He also saw that LLMs would replace search before anyone else did, and it is really something to look at LaMDA's or GPT-1's output and think: yeah, this will answer all of our questions one day.
There's no doubt about Noam's abilities. But I read through that code, and struggle to see its 'magic' or 'alchemy'. Can you elaborate what you find especially good about that code? (You may assume GPU kernel programming knowledge on my end.)
To me the magic Noam moment was when he came to my team and said "that cluster has a bad node in it, but this other one doesn't" and we had to spend like a week tracking down a single bad processor out of thousands.
Unrelated to the particular code above. There's a difference between writing code about or adjacent to a proven idea vs writing code in uncharted territory. I suspect that is what happened here. It's the same thing with say music and art. A lot of people today can play Chuck Berry.
It's a good point. Though I do wonder if the magic he cast was more at the conceptual level (an intense belief in a set of primitives that ought to work) than in the code itself. Even by 2018's standards, the TensorFlow code above doesn't really look that impressive. It's hard to judge based on those past standards, though. But I wonder if somebody who knows more than I do can elaborate.
Are you saying that with today's hindsight, or would you be saying that at the time of its creation?
Also, evaluating complicated functions with numerical stability and automatic differentiation is hard.
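To make the numerical stability point concrete, here is a minimal sketch in plain Python. It is not taken from the tensor2tensor code; the function names are made up for illustration. It shows why even something as small as log-sum-exp (the building block of softmax) needs care:

```python
import math


def logsumexp_naive(xs):
    # Direct translation of log(sum(exp(x))).
    # math.exp overflows once an input gets large (around 710 for doubles).
    return math.log(sum(math.exp(x) for x in xs))


def logsumexp_stable(xs):
    # Subtract the maximum first, so the largest exponent is exp(0) = 1
    # and nothing overflows. Mathematically the result is unchanged:
    # log(sum(exp(x))) = m + log(sum(exp(x - m))).
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


if __name__ == "__main__":
    big = [1000.0, 1000.0]
    try:
        logsumexp_naive(big)
    except OverflowError:
        print("naive version overflowed")
    print(logsumexp_stable(big))
```

Both functions compute the same quantity on paper, but only the second survives large inputs. This sketch ignores edge cases such as empty input or all inputs being negative infinity, and getting gradients right on top of tricks like this is a separate problem.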
"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."
https://news.ycombinator.com/newsguidelines.html
Does that apply to quotes from an article? They seemed to be criticizing a second or third degree source for being PR, which feels fair.
Yes, in the sense that if there's nothing interesting to say about a quote then there's no reason to copy it into the thread.
[flagged]
It's not a question of painful - I'm happy to "admit" what's true, as best I can, and not what's not true. Let's see if we can sort that out a bit in the present case.
HN is certainly curated - I've been "admitting" that since the day I got outed as a mod here:
https://news.ycombinator.com/item?id=7494621 (March 2014)
https://news.ycombinator.com/item?id=7507229 (April 2014)
https://news.ycombinator.com/item?id=7962942 (June 2014)
https://news.ycombinator.com/item?id=8569117 (Nov 2014)
https://news.ycombinator.com/item?id=15556105 (Oct 2017)
But we try hard to do the curation by principle, not by personal whim. What principles? Really there's just one: intellectual curiosity—we try to feature what enhances that and dampen what degrades it [1]. From that starting point, though, you can derive lots of other principles. Probably the most important is that snark and indignation are bad for HN (especially in combination!) because they drown out curious conversation. That's all that you need to see why I posted that reply to the GP; no personal preference required.
[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...
The current case still seems very heavy on personal preference. Applying principles is subjective, as we are all human. I found the comment as interesting as the quote it is answering.
It does seem more of a borderline case to me when I reread it, too.
> snark and indignation
These are preference-based but you're pretending they're objective. I find _your_ comments to be full of snark and indignation more than any you respond to, but of course you won't agree. (But because you don't agree, that makes me objectively wrong, I know.)
"Tonal arguments are ways of, frankly, policing working class ways of communication, and covering them in elite preferences." - someone smarter than the average HN commenter.
My view is not that these things are objective—they're subject to interpretation and different people interpret differently. It's not like temperature or length where one has a thermometer or a measuring tape to determine the value.
But that doesn't mean they're arbitrary. There's such a thing as fuzzy quasi-consensus, and I can demonstrate it easily: if the moderation calls we make were not fuzzy quasi-consensuses, the community would storm the barricades and rip us a new one. There's no pastime that internet communities love better than piling on when the mods are wrong.
That doesn't mean we're always right—not by a long shot—but we're usually (maybe 70%?) in the ballpark of the median reader's interpretation—not because we're geniuses at reading the hivemind but because we've been trained to get it mostly-quasi-somewhat-ok through the sheer pain of community backlash. Operant conditioning is a hell of a drug.
So while the personal views and preferences of the mods have some effect (how could they not?), it's not the high order bit. The high order bit is the prospective community response, because we fear what happens when we get that wrong, and I assure you we have reason to fear it.
As for whether my comments are also rife with snark and indignation: I think you have a point there! And I'd be happy to discuss it further, if you want to.
I’ve never seen any of the moderators here be snarky or indignant to anyone.
Do you have any specific examples of where dang or another moderator posted in that way?
Literally every sentence dang has ever written on this site, that I've seen, is snarky.
> Yes, in the sense that if there's nothing interesting to say about a quote then there's no reason to copy it into the thread.
This one was both snarky and indignant. Indignant that anyone would post something dang doesn't like on his site, and snarky that the original commenter hadn't conformed to norms of what positions are acceptable to utter here.
Snark definition:
“an attitude or expression of mocking irreverence and sarcasm”
Source:
https://www.merriam-webster.com/dictionary/snark
Indignant definition:
“feeling or showing anger because of something unjust or unworthy”
Source:
https://www.merriam-webster.com/dictionary/indignant
So just based on these definitions that comment seems neither snarky nor indignant? There’s no anger because he’s a moderator calmly stating one of the rules of the site. And there doesn’t seem to be anything mocking, irreverent or sarcastic about him calmly stating the site guidelines?
Such a good quote, defending probably the last person on this Earth anyone would call "working class", including himself.
Bourdieu?
The "bells and whistles" label sounds more dismissive / pejorative to me. An odd, and not a particularly nice, thing to say. Makes me wonder how the "magic" and "alchemy" terms were intended in this case, also.
If I use the words alchemy and magic about a piece of code, those are not flattering words.