Amazing that they are trying to solve this with hardware rather than with a new software architecture. But I suppose the current technology underlying LLM software must be far and away the best theoretically, or the most established, or the time it would take to seek out a new model isn't worth it for the big companies.

I know Yann LeCun is trying to do a completely different architecture, and I think that's expected to take 2-3 years before showing commercial results, right? Is that why they're finding it quicker to change the hardware?

It is both a software and hardware problem. Software because you can train LLMs that get better at very large contexts. Hardware because no matter what you do in software, you still need faster and bigger chips.
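To put a rough number on the hardware side: even ignoring compute, the KV cache that a transformer keeps during inference grows linearly with context length. A back-of-the-envelope sketch (the model shape below is an illustrative example I made up, not any specific model):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_el=2):
    """Memory for the KV cache of one sequence.

    Per layer we store a K and a V tensor of shape
    [n_kv_heads, seq_len, head_dim], hence the factor of 2.
    bytes_per_el=2 assumes fp16/bf16 elements.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_el

# Hypothetical 70B-class shape: 80 layers, 8 KV heads (GQA), head dim 128.
per_token = kv_cache_bytes(80, 8, 128, seq_len=1)   # 327,680 bytes ≈ 320 KiB/token
at_1m_ctx = kv_cache_bytes(80, 8, 128, seq_len=1_000_000)
print(f"{at_1m_ctx / 2**30:.0f} GiB")               # ≈ 305 GiB for one request
```

At a 1M-token context that single request's cache already exceeds any one GPU's memory, which is why no software trick alone removes the need for bigger, faster chips.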

Yann LeCun has been very wrong in the past about LLMs.[0] The approach he wants to take is to train on sensor data from the physical world. I think it's going to fail because there's a near-infinite amount of physical data, down to Schrodinger's equation governing how particles behave. The signal-to-noise ratio is too low. My guess is that they'll need orders of magnitude more compute to get anything useful, but they do not have more compute than OpenAI and Anthropic. In other words, I think LLMs will generate revenue as a stepping stone for OpenAI and Anthropic, such that they will be the ones who ultimately train the AI that LeCun dreams of.

[0]https://old.reddit.com/r/LovingAI/comments/1qvgc98/yann_lecu...

I don't know. Some of those statements still look correct for the time they were made, and then people found ways to work around them. I don't think anyone has really shown his general assumption is wrong. The issue is we don't know what the ceiling for these things is yet, because we haven't hit it. But that doesn't mean there is no ceiling.

His assumptions about autoregressive generation were wrong. That's the point.

I haven't seen any indication that they are. Can you point me at some?

Nvidia has so much money, it would be a waste if they didn't attack the current problems at multiple points at once.

People, researchers, investors, etc. probably also want to see what's possible, and someone has to do it.

I can also imagine that an inference-optimized system like this could split the context across different requests when a request doesn't need the full context.

Could also be that they have internal use cases which require this amount of context.
