Don't take this the wrong way, but your opinion is also vibes.

Let's ground that a bit.

Have a look at the ARC AGI 1 challenge/benchmark. Solve a problem or two yourself. Note that ARC AGI 1 is practically solved by a few LLMs as of Q1 2025.

Then have a look at the ARC AGI 2 challenge. Solve a problem or two yourself. Note that as of today, it is unsolved by LLMs.

Then observe that the "difficulty" of ARC AGI 1 and 2 is roughly the same for a human, yet challenge 2 is much harder for LLMs than challenge 1.

ARC AGI 2 is going to be solved *within* 12 months (my bet is on 6 months). If it's not, I'll never post about AI on HN again.

There's only one problem to solve, i.e. "how to make LLMs truly see like humans do". Right now, any vision-based features that the models exhibit come from maximizing the use of engineering (i.e. applying CNNs on image slices and chunks, maybe zooming and applying OCR, vector search, etc.). It isn't vision like ours and isn't a native feature for these models.
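To make the "slices and chunks" point concrete, here's a minimal sketch of the kind of preprocessing such pipelines do: the model never sees the whole image at once, it gets a bag of fixed-size tiles. The function name and the patch size are my own illustrative choices, not any specific model's implementation.

```python
import numpy as np

def tile_image(img: np.ndarray, patch: int = 4) -> np.ndarray:
    """Split an H x W image into non-overlapping patch x patch tiles.

    Illustrative stand-in for the engineered vision pipelines described
    above; real systems pad, overlap tiles, and add positional metadata.
    """
    h, w = img.shape[:2]
    # Crop so the dimensions divide evenly (real pipelines pad instead).
    h, w = h - h % patch, w - w % patch
    img = img[:h, :w]
    tiles = img.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return tiles.reshape(-1, patch, patch)

grid = np.arange(64).reshape(8, 8)  # stand-in for an ARC-style grid
tiles = tile_image(grid, patch=4)
print(tiles.shape)  # (4, 4, 4): four 4x4 slices, no global view
```

Each downstream component then reasons over isolated tiles, which is exactly why this feels "hacky" compared to human vision: the global structure of the grid has to be re-stitched by engineering rather than perceived directly.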

Once that's solved, LLMs or some new algorithm will be able to use a computer perfectly by being fed screen captures. End of white collar jobs (as we know it) 2-5 years after.

Edit - added "(as we know it)". And fixed missing word.

Speaking of vibes.

As long as AI is guessing answers based on what it has seen before, it's not happening.

I'm sorry. It doesn't matter how many bazillions you would cash in if it did; it's still not happening.

It's all wishful thinking.

I thought to myself: imagine something you've never imagined before. My first thought was, what if there is a universe inside of every vegetable that is vegetable themed, with anthropomorphic vegetable characters, where all the atoms and molecules are somehow veggified and everything is a vegetable? And then I wondered if an AI could ever come up with that, given infinite time and resources and no prompt, and then I thought about monkeys and typewriters.

If you listen to an interview with Francois, it'll be clear to you that "vision" in the way you refer to it has very little to do with solving ARC.

And much more to do with "fluid, adaptable intelligence that learns on the fly".

That's fair. I care about the end result.

The problem is about taking information in 2D/3D space and solving the problem. Humans solve these things through vision. LLMs or AI can do it using another algorithm and internal representation that's way better.

I spent a long time thinking about how to solve the ARC AGI 2 puzzles "if I were an LLM" and I just couldn't think of a non-hacky way.

People who are blind use braille or touch to extract 2D/3D information. I don't know how blind people represent 2D/3D info once it's in their brain.

>AI can do it using another algorithm and internal representation that's way better

AI famously needs a boatload of energy and computation to work. How would you describe that as "way better" than a human brain that can solve them faster, with practically zero energy expenditure?

>I'll never post about AI on HN again

Saving this. One fewer overconfident AI zealot, the better.