I would argue that they are not the same, but there is a symmetry between them.
The central problem of cryptology is to prevent inference about either the key or the plaintext, despite the requirement to be able to reconstruct the plaintext from the ciphertext+key. So ciphers have to almost perfectly mix information.
Machine learning is possible because, in the absence of perfect mixing, inference is possible (given many input-output pairs), even if the information is many decibels below the noise. So the information about which parameters need changing is present in the output despite many subsequent layers of processing. This means that a lot of mixing can be tolerated, and it's needed because you don't know in advance what the data flow should look like in detail, so the NN has to provide as many options as possible.
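As a toy illustration of that point (my own construction, not from the parent): gradient descent can recover a parameter from input-output pairs even when each individual output is dominated by noise, because the signal accumulates over many samples while the noise averages out. Here the per-sample SNR is roughly -15 dB:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = 1.7

# Many input/output pairs; each output is buried in noise
# (per-sample SNR well below 0 dB).
x = rng.normal(size=100_000)
y = true_w * x + rng.normal(scale=10.0, size=x.shape)  # noise >> signal

# Plain gradient descent on mean squared error.
w = 0.0
lr = 0.1
for _ in range(200):
    grad = np.mean((w * x - y) * x)  # d/dw of 0.5 * mean((w*x - y)^2)
    w -= lr * grad

print(f"recovered w ~ {w:.3f} (true value {true_w})")
```

The recovered value lands within a few percent of 1.7 even though any single (x, y) pair tells you almost nothing.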
> So ciphers have to almost perfectly mix information.
yesn't
most modern stream ciphers basically use XOR for encryption with one-time-use keys per chunk (like AES-CTR, AES-GCM, AEGIS, ChaCha20, etc.)
no mixing of bits is needed there, just high-entropy, uniformly distributed one-time-use keys generated per block, i.e. you need a "good enough" PRNG
practically, the easiest way to get them is by doing something similar to a hash over the state (key, nonce, index) in some form, which is likely done by mixing up information, hence the yes in yesn't.
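A minimal sketch of that shape (illustrative only; hashing with SHA-256 here is a stand-in for a real keystream PRF like the ones inside AES-CTR or ChaCha20, not an actual cipher):

```python
import hashlib

def keystream_block(key: bytes, nonce: bytes, index: int) -> bytes:
    """Derive a 32-byte one-time keystream block by hashing the state
    (key, nonce, block index). Toy stand-in for a real PRF."""
    return hashlib.sha256(key + nonce + index.to_bytes(8, "little")).digest()

def xor_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Encryption is just XOR with the keystream; the plaintext bits
    themselves are never mixed. Decryption is the identical operation."""
    out = bytearray()
    for i in range(0, len(plaintext), 32):
        block = keystream_block(key, nonce, i // 32)
        chunk = plaintext[i:i + 32]
        out.extend(c ^ k for c, k in zip(chunk, block))
    return bytes(out)

key, nonce = b"\x01" * 32, b"\x02" * 12
ct = xor_encrypt(key, nonce, b"attack at dawn")
assert xor_encrypt(key, nonce, ct) == b"attack at dawn"  # XOR is its own inverse
```

All the "mixing" lives inside the keystream derivation; the encryption step itself is pure XOR.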
but any PRNG with sufficient properties would do, and there are probably some which use clever math that you wouldn't describe as "mixing information".
It's just "shuffling bits" + "bad one way function" is often "sufficient" secure and faster then alternatives.
And historically, many ciphers (e.g. the AES block cipher) come from a time when we didn't yet have great frameworks/know-how for assessing security properties and writing cryptography. Hence why they used all kinds of ways of mixing information and chaining which sometimes are quite... arbitrary.
It might be easy to assume AES stuck around because it's "just great", but that is plain wrong. It stuck around because it spread everywhere (including standards/requirements) before we knew how best to do things, and because of that it ended up with hardware acceleration support on most chips. But no one would design it that way anymore (it is prone to side-channel attacks unless you have HW acceleration or use bitslicing trickery, which makes it slow). Because everything has AES hardware acceleration, though, it became a very fast building block. Hence why many modern ciphers still use (part of) it, and even some hashes and other algorithms use it... It's another example of how a "good enough" and widespread technology often wins, not the best one.
Mmm. It's true that stream ciphers do not need to mix information (of the plaintext) and block ciphers do. I'm not sure I fully agree with your comment, but I'm also not quite sure what you intend to say, and it's late at night here. I'd suggest that anyone reading the above make sure they fully understand the different security properties of stream ciphers vs. block ciphers before dismissing the latter.
ChaCha20 was discovered using a computer search testing out resistance to certain attacks. Hence, the architecture came first and the parameters came next. Any link with NN gradient descent? It would likely be an abstract one.
I don't know how true this is? Salsa20 seems like a pretty standard ARX design that builds a hash function and runs it in counter mode; there's a detailed paper explaining Bernstein's decisions.
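For concreteness, "ARX" means the round function uses only modular addition, rotation, and XOR. A minimal Python sketch of the Salsa20 quarter-round, with the rotation constants from Bernstein's spec:

```python
def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def quarterround(y0: int, y1: int, y2: int, y3: int):
    """Salsa20 quarter-round: only 32-bit addition, rotation, and XOR (ARX)."""
    y1 ^= rotl32((y0 + y3) & 0xFFFFFFFF, 7)
    y2 ^= rotl32((y1 + y0) & 0xFFFFFFFF, 9)
    y3 ^= rotl32((y2 + y1) & 0xFFFFFFFF, 13)
    y0 ^= rotl32((y3 + y2) & 0xFFFFFFFF, 18)
    return y0, y1, y2, y3

# Test vector from the Salsa20 spec:
assert quarterround(0x00000001, 0, 0, 0) == \
    (0x08008145, 0x00000080, 0x00010200, 0x20500000)
```

No S-boxes or table lookups anywhere, which is also why ARX designs sidestep the cache-timing side channels that plague software AES.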