>these types of adaptations make it easier for the species to go extinct since it's so highly specialized.

Super interesting. I'm guessing that's because being highly dependent on a single tactic can make it difficult to adapt or change course?

As a general rule of dynamical systems, specialization is essentially the exploitation of regularity within your environment.

A highly regular environment can allow for extreme specialization because a system can predict and "expect" certain situations. This leads to much less energy expended, since maintaining stability requires energy that scales with the turbulence of a system.

If gravity is always down, you don't need to spend energy on organs that overcome gravity in other directions. Your circulatory system can use gravity to its advantage, which is why you can't just remain upside down: there's no effective mechanism to pump blood out of your brain.

If a car passes a street exactly every 2 minutes, you don't need to spend lots of time and energy figuring out when to cross. You know once a car crosses, you're good for 2 minutes. If you know the sun comes up around the same time every day, you can allow yourself a deep sleep during the night if you're in a safe place, or bloom only during the day.

Nature exploits such regularities in order to reduce the energy needed to maintain an organism or group, which creates specialization, whether on the scale of the group, on the scale of the individual, etc. This is hierarchical; your cells specialize, your organs specialize, you get training or education to specialize your skills, etc. For example, you might specialize as a software engineer, depending on the regularity of people willing to pay money for you to solve their problems, but AI comes around and suddenly you're out of a job.

The danger is that the more regularities you depend on, the less free energy your body needs to keep around and the less free energy you have to suddenly react and adapt to a new environment. If tomorrow, gravity started being up and not down, most of us would have a bad time. If those regularities are interdependent, as geological/biological cycles tend to be, a few bad conditions could unravel the entire ecosystem.

I should add that another way to consider this constraint is the good regulator theorem: every good regulator of a system must be a model of that system.

Through this lens, you can view an organism as a hierarchical collection of various models of its environment at different scales. As the organism specializes further, its model of the world can simplify, and it is free to simplify some internal structures so that others can become more optimized for the more dynamic parts of the model which may need real-time updates, such as recognizing and tracking fast-moving prey. Various positive feedback loops result, and drive evolution.

It all comes back to the amount of free energy needed to predict the next state. In the information-theoretic sense, regularity lowers the uncertainty of the next relevant state given the system’s model. Specialization is what occurs when the system transforms that lower uncertainty into structure. This saves energy because the system has less ambiguity to resolve in realtime. But in the case of most biological organisms, it takes tens to millions of years for an organism's structure to react to its environment, so the cost of these "cheap reads" (metabolically speaking) on the model is an extraordinarily long, expensive and nondeterministic write process to update the model.
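To make the "regularity lowers uncertainty" part concrete, here's a rough sketch (my own toy setup, not a model of any real organism) that estimates the conditional entropy of the next symbol given the current one. The two sequences and the `conditional_entropy` helper are assumptions made up for illustration: one "environment" repeats a fixed cycle, the other is drawn at random.

```python
import math
import random
from collections import Counter

def conditional_entropy(seq):
    """Empirical H(next | current) in bits, estimated from adjacent pairs."""
    pair_counts = Counter(zip(seq, seq[1:]))   # counts of (current, next)
    ctx_counts = Counter(seq[:-1])             # counts of current symbol
    total = len(seq) - 1
    h = 0.0
    for (cur, _nxt), n in pair_counts.items():
        p_pair = n / total                     # P(current, next)
        p_next_given_cur = n / ctx_counts[cur] # P(next | current)
        h -= p_pair * math.log2(p_next_given_cur)
    return h

# Hypothetical "environments": a perfectly regular cycle vs. random draws.
regular = "ABCD" * 2500
rng = random.Random(0)
irregular = "".join(rng.choice("ABCD") for _ in range(10_000))

h_regular = conditional_entropy(regular)
h_irregular = conditional_entropy(irregular)

print(f"regular environment:   {h_regular:.3f} bits per step")
print(f"irregular environment: {h_irregular:.3f} bits per step")
```

In the regular sequence the next state is fully determined by the current one, so there is nothing left to resolve at runtime. In the random one you need close to the full 2 bits (log2 of 4 symbols) at every step.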

If an organism allocates enough free energy to allow rapid mutation while still constraining bad evolutionary paths, then although this is metabolically expensive, it can lead to organic evolution on the span of literally just decades:

https://www.nationalgeographic.com/animals/article/lizard-ev...

> The new habitat once had its own healthy population of lizards, which were less aggressive than the new implants, Irschick said. The new species wiped out the indigenous lizard populations

> Researchers found that the lizards developed cecal valves—muscles between the large and small intestine—that slowed down food digestion in fermenting chambers, which allowed their bodies to process the vegetation's cellulose into volatile fatty acids.

> The rapid physical evolution also sparked changes in the lizard's social and behavioral structure, he said. For one, the plentiful food sources allowed for easier reproduction and a denser population.

> The lizard also dropped some of its territorial defenses

The lizard not only developed a new organ to help it eat the local vegetation, but it also exploited the regularity of its dominance to cut the metabolic energy previously allocated to modelling an environment with territorial pressure from competing species.

Fascinating, and thank you for the great explanation. I was actually going to follow up and ask about AI, but your response covered it as well :)

I just expanded more in a reply to my own comment if you're interested!

To technically expand a bit on AI:

Any regularity in an environment which an embedded system can detect but fails to exploit represents an amount of excess free energy in the organism, distributed over itself, its group, its species, etc. depending on what types of systems and scales you choose to model.

There are parallels in information theory: any recognizable patterns/relationships within a compressed message represent redundancy, that is, bits spent beyond the message's entropy (the average uncertainty of future states), since that regularity was not exploited during compression and remained in the compressed structure. This means that a perfectly compressed message is functionally indistinguishable from random noise.
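You can see a rough version of this with an ordinary compressor. The sketch below is only an illustration under my own assumptions: the word list and the `byte_entropy` helper are made up, zlib is far from a perfect compressor, and a flat byte histogram is a weak test of randomness. Still, it shows the direction: the compressed bytes look much more uniform than the input, and compressing them a second time gains almost nothing because the easy regularities are already gone.

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Empirical entropy of the byte histogram, in bits per byte (max 8)."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical input: random words from a tiny vocabulary, so the text is
# highly redundant at the character level.
rng = random.Random(0)
vocab = ["gravity", "energy", "model", "lizard",
         "regular", "noise", "entropy", "signal"]
text = " ".join(rng.choices(vocab, k=20_000)).encode("ascii")

packed = zlib.compress(text, 9)      # first pass exploits the regularity
repacked = zlib.compress(packed, 9)  # second pass has little left to exploit

for label, blob in [("original", text), ("packed", packed), ("repacked", repacked)]:
    print(f"{label:>8}: {len(blob):>7} bytes, {byte_entropy(blob):.2f} bits/byte")
```

The original should sit well below 8 bits per byte while the packed output should be close to it, and the repacked size should be roughly the same as the packed size.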

You can view weights in an AI model through the same lens: The weights represent "knowledge" of the environment the model has been exposed to. The model is designed to correctly predict future states, and thus "learning" is effectively the compression of a full model of the environment, which is more efficient to traverse than the uncompressed model. A perfectly learned environment minimizes uncertainty and should translate to weights that have no discernible patterns and thus are also functionally indistinguishable from random noise, void of any regularity.

Some level of "compression" of the local environment is required for any stable embedded system, or else the energy required to continually stabilize the system would equal that of the entire universe, because the system would become a perfect copy of the very environment it is embedded within. This is obviously thermodynamically prohibitive.

Hopefully this helps make the relationship between structure, knowledge, information and uncertainty a lot more intuitive.

As a bonus, consider Fabrice Bellard's ts_zip, a great showcase on how knowledge and compression are related.

https://bellard.org/ts_zip/

ts_zip compresses text at record efficiency (at the cost of orders of magnitude more memory and compute; nothing is free)

Previous attempts at text compression all relied purely on character-level patterns and semantics, syntactical structure, etc., maybe with some heuristic tweaking here and there.

That got us far, but LLMs do something never achieved before, which is to incorporate relationships beyond the surface: not just placement of characters, n-grams or words, but the actual meaning behind them, and large-scale correlations with other words or tokens across vast context windows.

The LLM actually becomes a world model with enough size and training, and thus we are able to use every fact we know about everything to compress text. If we're speaking about biology for example, that constrains the probabilities of what the most likely word might be after a given prefix. Or if the context is constrained to a specific historical period.
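Here's a toy sketch of why a better predictive model means a smaller output. It is not how ts_zip is implemented; I'm only assuming the standard result that an ideal entropy coder spends about -log2(p) bits on a symbol the model assigned probability p. The two probability tables are numbers I made up: `generic` stands in for a model with no context, `biology` for one that knows the topic.

```python
import math

def code_length_bits(words, model):
    """Ideal coding cost: sum of -log2 p(word) under the given model."""
    return sum(-math.log2(model[w]) for w in words)

# Hypothetical next-word probabilities (invented for illustration).
# Each table sums to 1; "other" lumps together every remaining word.
generic = {"the": 0.50, "cell": 0.05, "membrane": 0.01,
           "protein": 0.02, "other": 0.42}
biology = {"the": 0.30, "cell": 0.30, "membrane": 0.15,
           "protein": 0.20, "other": 0.05}

message = ["the", "cell", "membrane", "protein"]

generic_bits = code_length_bits(message, generic)
biology_bits = code_length_bits(message, biology)

print(f"no context:      {generic_bits:.1f} bits")
print(f"biology context: {biology_bits:.1f} bits")
```

Same message, but the model that "knows" we're talking about biology puts more probability on the words that actually show up, so the encoder needs fewer bits. An LLM does this at every token with a far richer notion of context.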

All of these regularities can be leveraged, at the cost of a lot of energy, in order to create compressed text that gets arbitrarily close to looking like completely random noise (actually verifying this is impossible in general though, since Kolmogorov complexity is uncomputable).

The catch is, such systems are specialized and depend on the regularity of cheap, widely-available energy networks and consumer access to cheap compute. Take that away, and it becomes ill-suited vs just using bzip. I mean, even now, bzip is a better choice when considering energy tradeoffs. And ts_zip in particular is specialized to the point of only working with text and not arbitrary byte streams.