Another big reason is that we use familiar terms and models to describe things, which often leads to very different things being described in very similar ways. The simplified models used for high-level/very abstract explanations also have the tendency that extrapolations you make based on them often end up wrong, in a subtle but fundamental way (* 1).
E.g. graphs, matrices, linear algebra systems, digital photos (and much more) can all be perfectly transformed into each other; put another way, these are all different ways to look at the same data. (This is also not just hypothetical, * 2)
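As a toy sketch of what I mean (the variable names are mine, nothing standard):

```python
# The same small directed graph viewed as an adjacency list and as a
# matrix -- one piece of data, two "shapes", round-trippable losslessly.
edges = {0: [1, 2], 1: [2], 2: [0]}  # node -> list of neighbors
n = 3

# graph -> matrix
matrix = [[0] * n for _ in range(n)]
for src, dsts in edges.items():
    for dst in dsts:
        matrix[src][dst] = 1

# matrix -> graph (round trip back to the original representation)
graph = {i: [j for j in range(n) if matrix[i][j]] for i in range(n)}
assert graph == edges  # same data, different representation
```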
As a side effect, saying things are similar "because they are just matrix algorithms" is meaningless, because most things are "just a matrix algorithm". (It's also meaningful for the same reason: it means you can transform many problems into ones with well-understood solutions.)
And the high-level abstraction of the "encoder -> state -> decoder" structure is another of these "too generic/meaningless" things. As state can be anything, encoding is just "process input to generate state", and decoding is just "generate output from state", we can model most algorithms with that. Like the identity function is now `encode(input): state=input; decode(state): output=state` (and indeed, as long as the state is large enough wrt. the input (or unbounded), you can train an encoder-decoder network to do exactly that, as meaningless as that seems).
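Spelled out, the identity-as-encoder-decoder above is just:

```python
# The identity function forced into "encoder -> state -> decoder" shape.
# The "state" is simply the input itself, which shows how little the
# pattern constrains: any algorithm can be split at an arbitrary point
# into a "produce state" half and a "consume state" half.
def encode(inp):
    state = inp
    return state

def decode(state):
    output = state
    return output

assert decode(encode("hello")) == "hello"
```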
Similarly, you can treat everything as everything within cryptography itself: hashes, PRNGs and stream ciphers can all be easily (kinda, * 3) built from one another, and like most algorithms in existence they can be formulated as "consume data, then produce output", i.e. as the encoder-decoder pattern ;)
So IMHO it's mostly a combination of observer bias from how we like to model things (at a very high-level POV), and a construction bias stemming from that plus what computers can compute well (at a slightly less high-level POV, when looking at the somewhat "arbitrary choices").
I know some of the examples here might sound a bit ridiculous, but they touch on some of the most important insights in CS:
- a lot of algorithms can be transformed into (or modeled as) a lot of other algorithms; take advantage of it
- just because two things look alike, it neither means they are alike nor that their being alike has any deeper meaning (e.g. two graphs might look alike, but only for the subset of sample data you happen to use to plot them)
---
(* 1): Anything related to quantum physics seems to be especially badly affected by this.
(* 2): Your navigation system might navigate by mapping a navigation graph to a matrix and then computing on the matrix; some of the steps involved might treat it as a linear algebra system solved using some fast approximation.
(* 3): At least some of the conversion directions are easy; some are a bit less intuitive. This also assumes you either have perfect properties on the building block used, or don't have to estimate how its imperfect properties map to the properties of the resulting construct... Oh, and naturally you can use some simple operations in the transformation (add, xor) as long as they are used in a straightforward way. E.g. PRNG(seed, offset) = HASH(encode(seed, offset)), with encode being a bijective pairing function (e.g. `bytes_128bit_le(seed)+bytes_64bit_le(offset)`). E.g. for encryption you chunk the data, xor every chunk with a one-time key, and generate that key using HASH(encode(key, nonce, chunk_idx)), or PRNG(encode(key, nonce), chunk_idx). That is how AES-CTR/AES-GCM do encryption: they (roughly) use XOR(AES(key, encode(nonce, offset)), chunk). (Yes, AES is used more like a hash than a block cipher in most modern AES modes...)
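A rough, decidedly non-production sketch of that footnote's constructions, with SHA-256 standing in for an idealized hash (function names and parameter widths are my own choices, not any standard):

```python
import hashlib

def prng_block(seed: bytes, offset: int) -> bytes:
    # PRNG(seed, offset) = HASH(encode(seed, offset)): encode here is a
    # simple injective pairing -- fixed-width seed || fixed-width offset.
    paired = seed.ljust(16, b"\0") + offset.to_bytes(8, "little")
    return hashlib.sha256(paired).digest()

def stream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # CTR-style stream cipher from a hash: one keystream block per chunk
    # index, XORed into the data. Encrypt and decrypt are the same call,
    # since XOR is its own inverse.
    block = hashlib.sha256().digest_size  # 32 bytes
    out = bytearray()
    for idx in range((len(data) + block - 1) // block):
        keystream = hashlib.sha256(
            key + nonce + idx.to_bytes(8, "little")
        ).digest()
        chunk = data[idx * block:(idx + 1) * block]
        out.extend(c ^ k for c, k in zip(chunk, keystream))
    return bytes(out)

msg = b"attack at dawn, or maybe brunch"
ct = stream_xor(b"k" * 16, b"n" * 8, msg)
assert stream_xor(b"k" * 16, b"n" * 8, ct) == msg
```

Real CTR/GCM modes additionally pin down exact counter encodings, nonce handling and (for GCM) authentication; this only shows the "keystream from hash(key, nonce, index)" skeleton.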