Of course, tampering with chunks or nodes of a neural network is one way to study the configuration "spawned" by training (gradient descent etc.) and to "reverse-engineer the black box" in pursuit of "AI transparency".
Anthropic published an important piece of work on this about a year and a half ago:
> Tracing the thoughts of a large language model
https://www.anthropic.com/research/tracing-thoughts-language...
https://news.ycombinator.com/item?id=43495617 (27 March 2025)