Ok, I'll bite. Let's assume a modern cutting-edge model, even one with fairly standard GQA attention, and obviously something richer than just one monosemantic feature per neuron.
Based on any reasonable mechanistic interpretability understanding of this model, what's preventing a circuit/feature with polysemanticity from representing a specific error in your code?
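(To put the jargon in concrete terms, here's a rough numpy-only toy of superposition/polysemanticity, loosely in the spirit of Anthropic's "Toy Models of Superposition" writeup. All the sizes, thresholds, and names are made up for illustration, not taken from any real model.)

```python
# Toy sketch of superposition/polysemanticity, numpy only.
# Assumptions (all made up): 512 sparse "concepts" embedded as random unit
# directions in a 64-dim activation space.
import numpy as np

rng = np.random.default_rng(0)
n_concepts, d = 512, 64
W = rng.normal(size=(n_concepts, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)      # one unit direction per concept

# A sparse input: only a few concepts are "on" at once.
x = np.zeros(n_concepts)
active = rng.choice(n_concepts, size=3, replace=False)
x[active] = 1.0
acts = x @ W                                       # superposed 64-dim activation

# Any single basis "neuron" overlaps with many concept directions -> polysemantic.
print("concepts with |weight| > 0.1 on neuron 0:", int(np.sum(np.abs(W[:, 0]) > 0.1)))

# But projecting the activation back onto each concept's direction still
# recovers the active ones up to interference noise, which is the sense in
# which a specific feature (say, "there's a bug here") can live inside a
# polysemantic, superposed representation.
scores = acts @ W.T
top = np.argsort(-scores)[:3]
print("active concepts found in top-3 readout:", int(np.intersect1d(top, active).size))
```

The point being: "polysemantic" means concepts share neurons, not that specific concepts are impossible to represent or read out.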
---
Do you actually understand ML? Or are you just parroting things you don't quite understand?
Polysemantic features in modern transformer architectures (e.g., with grouped-query attention) are not discretely addressable, semantically stable units but superposed, context-dependent activation patterns distributed across layers and attention heads, so there is no principled mechanism by which a single circuit or feature can reliably and specifically encode “a particular code error” in a way that is isolable, causally attributable, and consistently retrievable across inputs.
---
Way to go in showing you want a discussion, good job.
Nice LLM generated text.
Now go read https://transformer-circuits.pub/2024/scaling-monosemanticit... or https://arxiv.org/abs/2506.19382 to see why that text is outdated. Or read any paper in the entire field of mechanistic interpretability (from the past year or two), really.
Hint: the first paper is titled "Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet" and you can ctrl-f for "We find three different safety-relevant code features: an unsafe code feature 1M/570621 which activates on security vulnerabilities, a code error feature 1M/1013764 which activates on bugs and exceptions"
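If it helps, the core recipe behind that paper is training a sparse autoencoder (SAE) on the model's internal activations and then looking at what makes each learned feature fire. Here's a deliberately minimal PyTorch sketch of the idea; the sizes, the plain L1 penalty, and the fake activations are placeholders for illustration, not the paper's actual setup:

```python
# Minimal sketch of the sparse-autoencoder recipe behind papers like
# "Scaling Monosemanticity": learn an overcomplete dictionary of features
# from model activations so each feature fires on one interpretable concept.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)   # activation -> feature coefficients
        self.decoder = nn.Linear(n_features, d_model)   # feature coefficients -> reconstruction

    def forward(self, acts: torch.Tensor):
        feats = torch.relu(self.encoder(acts))          # non-negative, pushed to be sparse
        return self.decoder(feats), feats

d_model, n_features = 256, 4096                         # made-up sizes; real SAEs are far wider
sae = SparseAutoencoder(d_model, n_features)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3                                         # sparsity pressure on feature activations

for step in range(100):
    acts = torch.randn(64, d_model)                     # stand-in for real residual-stream activations
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# With real activations, you then inspect which inputs make each feature fire;
# that's where labels like "code error feature 1M/1013764" come from.
```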
Who said I want a discussion? I want ignorant people to STOP talking, instead of talking as if they knew everything.
Your entire argument is derived from a pseudoscientific field without any peer-reviewed research. Mechanistic interpretability is a joke invented by AI firms to sell chatbots.
Lol that's a stupid-ass response, especially when half the papers are from Chinese universities. You think Chinese universities are trying to sell ChatGPT subscriptions? Ridiculous. You're just falling behind in tech knowledge.
And apparently you consider peer-reviewed papers presented at NeurIPS and other conferences to be pseudoscience. (For people not versed in ML: NeurIPS is where the 2017 paper "Attention Is All You Need", which kicked off the modern ML revolution, was presented.)
https://neurips.cc/virtual/2023/poster/72666
https://jmlr.org/beta/papers/v26/23-0058.html
https://proceedings.mlr.press/v267/palumbo25a.html
https://iclr.cc/virtual/2026/poster/10011755
Ok, let's chew on that. "reasonable mechanistic interpretability understanding" and "semantic" are carrying a lot of weight. I think nobody understands what's happening in these models, irrespective of the narratives built from the pieces. On the macro level, everyone can see simple logical flaws.
> I think nobody understands what's happening in these models
Quick question, do you know what "Mechanistic Interpretability Researcher" means? Because that would be a fairly bold statement if you were aware of that. Try skimming through this first: https://www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-ex...
> On the macro level, everyone can see simple logical flaws.
Your argument applies to humans as well. Or are you saying humans can't possibly understand bugs in code because they too make simple logical errors? Does that mean the existence of the Monty Hall Problem shows that humans cannot actually do math or logical reasoning?
> do you know what "Mechanistic Interpretability Researcher" means? Because that would be a fairly bold statement if you were aware of that.
The mere existence of a research field is not proof of anything except "some people are interested in this". It certainly doesn't imply that anyone truly understands how LLMs process information, "think", or "reason".
As with all research, people have questions, ideas, and theories; some of them will be right, but most are bound to be wrong.
That's a lame, typical anti-intellectual argument. You might as well say all of physics is worthless because nobody truly understands gravity.
Notice I didn't use vague terms like "think" or "reason" and instead used specific terms like "feature/circuit internal representation". You're trying to draw a false equivalence: "the hard problem of gravity/reasoning/etc. is not solved, therefore nobody understands anything", and that's an obvious leap of logic, as you'd know if you've talked to any physicist or ML researcher.
That type of response is more typical of a GED holder who wants to feel intellectually superior, so they pull out a "well, you don't know anything either" on a scientist.