If I tasked you with finding a novel hallucination in a leading LLM, how long would it take you? I used to find these easily and ran into them often, but these days I can't generate new failure modes on demand; I just have my list of known failures and run into one of them organically every couple of weeks.

I don't think anyone at this stage believes that they don't make mistakes; we just prefer to use them for the times when they are useful.

It can do very difficult things and fail at very basic things. If you look at either of those and try to extrapolate, you can generate a hot take that it's super smart or super dumb, sure. But either way it's a reductionist take that misses the bigger picture.

I agree with you here, especially regarding the reductionist viewpoint.

My only gripe was that single sentence, and we might just mean slightly different things there.

Also, I'm out of my depth here, but I believe these sorts of issues are addressed in a post-training step, which may look more like applying a band-aid. I'm not convinced they can actually be fully fixed (due to the way these models work) - but of course that tradeoff doesn't make LLMs useless, and the problem can be limited or even eliminated via clever applications.