GRAM is another one of those "stupid specific architectures" - same as HRMs, etc. It can sort of compete with LLMs on specific puzzles. It demonstrated that much. It's not a general contender against LLMs at LLM tasks.
If you subscribe to things like "there are tasks LLMs are innately bad at due to insufficient depth and lack of recurrent capability", then GRAM might be another signal towards that.
But keep in mind: even the ARC-AGI benchmarks have their frontiers dominated by LLMs. Even if "innately bad" is true, it clearly doesn't go all the way to "innately incapable".
A 10M param GRAM model beat o3-mini - a model 2000x its size - on ARC-AGI...
And then that 10M param GRAM went and got its shit kicked in by Grok 4.20 Blaze It Edition - on the same ARC-AGI battery. I know how that story goes.
It's the pattern with those "stupid specific architectures". Very good at this one thing. But only ever "good for their size", and only to a point.
They don't scale up and they don't generalize. Go far enough on task complexity and LLMs just kill them.
Does that make them useless? As an LLM replacement, yes. In general? Maybe not, I can think of things. But I've yet to find any paper demonstrating a real-world use.