If you listen to the interview with Francois, it'll be clear to you that "vision", in the way you refer to it, has very little to do with solving ARC.

And more to do with "fluid, adaptable intelligence, that learns on the fly".

That's fair. I care about the end result.

The task is about taking information in 2D/3D space and using it to solve a problem. Humans solve these things through vision. LLMs or AI can do it using another algorithm and internal representation that's way better.

I spent a long time thinking about how to solve the ARC AGI 2 puzzles "if I were an LLM" and I just couldn't think of a non-hacky way.

People who're blind use braille or touch to extract 2D/3D information. I don't know how blind people represent 2D/3D info once it's in their brain.

>AI can do it using another algorithm and internal representation that's way better

AI famously needs a boatload of energy and computation to work. How would you describe that as "way better" than a human brain that can solve them faster, with practically zero energy expenditure?