Of course, tampering with individual nodes or whole chunks of a NN is one way to study the configuration that was "spawned" (through gradient descent etc.), to "reverse-engineer the black box" and get "AI transparency".
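To make the "tampering" idea concrete, here is a minimal sketch of a node ablation on a toy network. Everything in it is an assumption for illustration: the 2-2-1 ReLU network is wired by hand to compute XOR (not trained by gradient descent), and `forward`, `outputs` and the `ablate` parameter are hypothetical names. It is not the method used in any published interpretability work, just the simplest version of "knock out a node and see what changes".

```python
# Toy ablation sketch: a hand-wired 2-2-1 ReLU network that computes XOR.
# The weights are picked by hand for illustration, not learned.

def relu(v):
    return max(0.0, v)

def forward(x1, x2, ablate=None):
    """Run the network; `ablate` is the index of a hidden unit to zero out."""
    hidden = [
        relu(x1 + x2),        # unit 0: counts how many inputs are on
        relu(x1 + x2 - 1.0),  # unit 1: fires only when both inputs are on
    ]
    if ablate is not None:
        hidden[ablate] = 0.0  # the "tampering": knock out one node
    return hidden[0] - 2.0 * hidden[1]

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def outputs(ablate=None):
    """Network output for every input pair, optionally with one unit ablated."""
    return [forward(a, b, ablate) for a, b in INPUTS]

if __name__ == "__main__":
    print("intact     :", outputs())
    print("unit 0 off :", outputs(ablate=0))
    print("unit 1 off :", outputs(ablate=1))
```

With unit 1 knocked out the network stops computing XOR and just sums its inputs, which suggests unit 1 is the part that handles the "both inputs on" case. In a real model the units are rarely this clean, so take this only as an illustration of the principle.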

Anthropic published an important piece of work on this around a year and a half ago:

> #Tracing the thoughts of a large language model#

https://www.anthropic.com/research/tracing-thoughts-language...

https://news.ycombinator.com/item?id=43495617 (27 March 2025)