This is about folds, not amino acids - even if you used a larger alphabet of residues, I somehow doubt that you would get many more folds.

Thinking more about the question of protein _length_ - I'm also not convinced that longer proteins (more than, say, 750 aa) would produce more novel folds. Larger proteins tend to be multi-domain; that is, a longer chain will fold into multiple compact domains, each one a separate fold.

I suppose there could be 'megafolds' out there in fold space, beyond 1000 aa - like a 12-bladed beta propeller, or a beta-helix with alpha helices on the outside, or some other wacky thing. Whether that would substantially increase the total number of folds, I doubt, but that is of course a guess.

(ref - https://pmc.ncbi.nlm.nih.gov/articles/PMC10251718/ for protein lengths)

The amino acid sequence defines the fold.

And really? Just any random sequence gets you a new fold. I mean, it won't be very useful if you pick a random one, but it'll work and be a new one.

I think this is just an artifact of natural selection basing new proteins on existing ones, not an actual useful ("rational" if you can call natural selection rational) selection limit. I don't think that if you designed proteins from first principles you'd see this limitation in your results.

A random sequence may not fold at all! I seem to remember a paper that tried this, creating a bunch of random proteins, and checking how much structure they had - I think they were helical bundles, but don't quote me.

The nice thing about stable folds is that 'nearby' sequences in sequence space - as in, point mutations - are the same fold. If each sequence had a completely different fold, then mutation would be much more destructive. Surprisingly, however, sequences that are far apart in sequence space can also adopt the same fold (for example, through convergent evolution).
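
To make 'nearby in sequence space' concrete, here is a minimal Python sketch. It assumes the 20 standard amino acids and an arbitrary made-up 10-residue peptide (the sequence and the helper names `point_mutants` and `hamming` are purely illustrative, not from any library). It only counts neighbours; it says nothing about which of them actually keep the fold.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def point_mutants(seq):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield seq[:i] + aa + seq[i + 1:]


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


seq = "MKTAYIAKQR"  # hypothetical peptide, chosen only for illustration
neighbours = list(point_mutants(seq))

print(len(neighbours))               # 19 alternatives at each of 10 positions
print(len(AMINO_ACIDS) ** len(seq))  # size of the whole length-10 sequence space
```

The point of the comparison: the one-mutation neighbourhood grows linearly with length (19 x N), while the full space grows as 20^N, so a fold that tolerates point mutations still occupies a tiny local patch of sequence space.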

This reminds me of structural studies of proteins encoded by de novo genes in eukaryotes. They are usually either intrinsically disordered or adopt a molten-globule-like state.

Yes, I was watching a video about that the other day - the 'dark proteome' or the 'ghost proteome' or similar.

But if you look at actual proteins where the function is pretty direct, you see ... a total mess. For example, in the actual light catcher for photosynthesis, chlorophyll, you see a rather suboptimal architecture. There is a central magnesium ion, and the entire rest of the protein is just there to keep it where it is. The only function, in other words, is to create an ion trap at a specific voltage. That's what that massive structure is there for. That's the only reason it's there.

Note: the rest of the protein being so massive has the huge problem that it makes the chlorophyll protein toxic (even to plants). Several angles of the protein reflect the light ... away from the energy collector (it has sections that are like putting a mirror above a solar panel). Also: it's extremely INefficient. Inefficiency gets solved "the DNA way" (or should I say the Zapp Brannigan way): its efficiency sucks, but if I just use extremely large armies of chloroplasts I can compensate for the inefficiency by stacking them ... This sounds totally insane but yes, it works. Oh, and the exact right amount of inefficiency can warm up the plant, protecting it (a little bit) from ice ages.

Now I have my suspicions on why chlorophyll + chloroplasts won (it's not actually the only photosynthesis protein or system): it's because by tuning a few amino acids you can change the depth of the ion trap, and so switch to different metals to capture, changing the color (which plants do, even just to have a particular color). It's pretty easy to accidentally adapt to either different metals or different solar frequencies (i.e. using natural selection). Plus there was no need to design chlorophyll: plants "stole" the design from bacteria. So it was incredibly cheap in terms of how much computation (i.e. generations of plants) had to die to make it. Of course, for the place it was stolen from, the length of the protein was a very important factor, so the biggest of chlorophyll's advantages (1 big protein, 10 functions that would have required 5x more space in DNA with small proteins) doesn't actually matter to plants. So why did it win? It was on sale!

So it works. But there has got to be a simpler/better/non-toxic way to create an ion trap using proteins and make plants work better ... I get that part of the problem is that I'm an engineer, a scientist. If one needs a design to catch energy and warm up a plant, I'd expect to create one thing for catching energy, and one plant warmer, both efficient. So there's an expectation problem. But a single mechanism to mostly randomly warm plants and catch energy at the cost of absurd inefficiency (both in warming and in energy production) ... is just not a sane way to go about this problem.

A small nitpick: chlorophyll is the pigment, photosystems are the protein complexes containing it.

> So it works. But there has got to be a simpler/better/non-toxic way to create an ion trap using proteins and make plants work better ...

I remember reading about designed minimal photosystem-like systems. I cannot find the actual paper now, though.