Honest question, how do you know this is a big improvement? Are there any benchmarks anywhere?
There will be a video from Fireship if it's a big one. /s
Ah, Fireship, I forgot that channel existed at all. I asked YouTube not to recommend that channel after every vaguely AI-related news item became "BIG NEWS!!!"; the videos were also thin on actual content, and there were repeated factual errors across multiple videos. At that point, the only thing it's good for is making yourself (falsely) feel like you're keeping up.
Fireship consistently makes some of the most entertaining tech content out there
Hey, if you enjoy it, go for it. I used to like it a couple of years ago too, but lately I've found more and more that it was neither entertaining nor reliably informative. The jokes/memes were lazy and heavily recycled, the tech content was often poorly researched, and it started feeling like content produced for the sake of having content.
Much preferable to what OpenAI has always done and Anthropic recently started doing: just write some complicated narrative about how scary the new model is and how it tried to escape, deceive, and hack the mainframe while telling the alignment operators bedtime stories.
Really? I missed this. The new hype trick is implying the new LLM releases are almost AGI? Love it.
Anthropic "warned" Claude 4 is so smart that it will try to use the terminal (if using Claude Code) or any other tools available (depending on where you're invoking it from) to contact local authorities if you're doing something very immoral.
I think they did make an announcement on WeChat.
I like it too, but some benchmark numbers would be nice at least.
On the day Nvidia report earnings too. Pretty sure it's just a coincidence, bro.
Yeah, the timing seems strange. Considering how much money will change hands based on those results, this might be some kind of play to manipulate the market, at least a bit.
I believe that they are funded by a hedge fund. So, there are no coincidences here.
Is releasing a better product really "market manipulation"? It seems to me like regular, good competition.
It's "manipulating the market" only when your geopolitical adversary brings the competition.
How does releasing it today affect the market compared to releasing it last week?
Hard to say exactly how it will affect the market, but IIRC when DeepSeek was first released, Nvidia stock took a big hit as people realized that you could develop high-performing LLMs without access to Nvidia hardware.
I thought the reaction was more so that you can train SOTA models without an extremely large quantity of hyper-expensive GPU clusters?
But I would say that the reaction was probably vastly overblown, as what DeepSeek really showed was that there are much more efficient ways of doing things (which can also be applied to even larger clusters).
If this checkpoint was trained on non-Nvidia GPUs, that would definitely be a much bigger deal, but there don't seem to have been any associated announcements.
Plans take time to adjust; I imagine a big part of the impact was companies realizing that they need to buy/rent much less of the expensive GPU compute to realize the plans they've already committed to for the next couple of years. Being able to spend less to get the same results is an immediate win; expanding the plan to make use of the suddenly available surplus money/compute takes some time.
And then part of the impact was just "woah, if some no-name team from China can casually leapfrog major western players on a tiny budget and kill one of their moats in the same move, what other surprises like this are possible?". The event definitely invalidated a lot of assumptions investors had about what is or isn't possible near-term; the stock market reacted to the suddenly increased uncertainty.
Except that all DeepSeek models so far have been trained on Nvidia hardware. For DeepSeek-V3, they literally mention that they used 2,048 NVIDIA H800 GPUs right in the abstract: https://arxiv.org/html/2505.09343v1
Actually, the "narrative" crashed Nvidia for no reason.
Not only does DeepSeek use a lot of Nvidia hardware for training.
But even more so, by releasing an open-weight frontier model, they've made it so that people around the world need more Nvidia chips than ever for inference.
I know of enterprises in APAC that are now spending millions of dollars on Huawei GPUs; while they might not be as efficient, they are seen as geopolitically more stable (especially given the region).
DeepSeek helped "prove" to a lot of execs that "Good" is "Good enough" and that there are viable alternatives with less perceived risk of supply chain disruption, even if the facts may differ from this narrative.
Yes, I know them too, I live there!
The hardware is great, CANN is not CUDA yet.
Someone hasn't heard of Huawei GPUs.
Plenty of manipulation to go around...
"Tech Chip software stocks sink on report Trump ordered halt to China sales" - https://www.cnbc.com/2025/05/28/chip-software-trump-china.ht...
What big improvements?
Anyone got benchmarks?