At what point do the OEMs begin to realize they don’t have to follow the current mindset of attaching a GPU to a PC and instead sell what looks like a GPU with a PC built into it?
The vast majority of computers sold today have a CPU and GPU integrated on a single chip. Most ordinary home users don't care about GPU or local AI performance that much.
In this video Jeff is interested in GPU-accelerated tasks like AI and Jellyfin. His last video used a stack of 4 Mac Studios connected by Thunderbolt for AI stuff.
https://www.youtube.com/watch?v=x4_RsUxRjKU
The Apple chips have powerful CPU and GPU cores, but also a huge amount of memory (up to 512 GB) directly connected, unlike most Nvidia consumer-level GPUs, which have far less memory.
> Most ordinary home users don't care about GPU or local AI performance that much.
Right now, sure. There's a reason why chip manufacturers are adding AI pipelines, tensor processors, and 'neural cores' though. They believe that running small local models is going to be a popular feature in the future. They might be right.
It's mostly a marketing gimmick though - they aren't adding anywhere near enough compute for that future. The tensor cores in an "AI ready" laptop from a year ago are already pretty much irrelevant as far as inferencing current-generation models goes.
NPU/tensor cores are actually very useful for prompt pre-processing, or really for any ML inference task that isn't strictly bandwidth-limited (because you end up wasting a lot of bandwidth padding/dequantizing data into a format the NPU can natively work with, whereas a GPU can just do that in registers/local memory). The main issue is the limited support in current ML/AI inference frameworks.
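To put rough numbers on that bandwidth-vs-compute split, here's a back-of-the-envelope sketch in Python; the model size, weight precision, and prompt length are illustrative assumptions, not measurements:

    # Why prefill (prompt pre-processing) is compute-heavy while
    # token-by-token decode is bandwidth-heavy. Numbers are assumptions.
    params = 8e9            # assumed 8B-parameter model
    bytes_per_param = 2     # assumed fp16/bf16 weights
    prompt_tokens = 2048    # assumed prompt length

    # Prefill: every prompt token does ~2 FLOPs per weight, but the weights
    # only need to be streamed from memory once for the whole prompt.
    prefill_flops = 2 * params * prompt_tokens
    prefill_bytes = params * bytes_per_param
    print(f"prefill: ~{prefill_flops / prefill_bytes:.0f} FLOPs per byte read")  # ~2048

    # Decode: each new token re-reads the full weight stream for ~2 FLOPs/param.
    decode_flops = 2 * params
    decode_bytes = params * bytes_per_param
    print(f"decode:  ~{decode_flops / decode_bytes:.0f} FLOP per byte read")     # ~1

At thousands of useful FLOPs per byte, extra matrix hardware like an NPU pays off; at roughly one FLOP per byte, memory bandwidth is the ceiling no matter how much compute you bolt on.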
Exactly. With the Intel-Nvidia partnership signed this September, I expect to see some high-performance single-board computers released very soon. I don't think the ATX form factor will survive another 30 years.
One should also remember that Nvidia does have organisational experience in designing and building CPUs [0].
They were a pretty big deal back in ~2010, and I have to admit I didn't know that Tegra was powering the Nintendo Switch.
0: https://en.wikipedia.org/wiki/Tegra
I had a Xolo Tegra Note 7 tablet (marketed in the US as the EVGA Tegra Note 7) around 2013. I preordered it, as far as I remember. It had a Tegra 4 SoC with a quad-core Cortex-A15 CPU and a 72-core GeForce GPU. Nvidia claimed at the time that it was the fastest SoC for mobile devices.
To this day, it's the best mobile/Android device I ever owned. I don't know if it was the fastest, but it certainly was the best-performing one I ever had. UI interactions were smooth, apps were fast on it, the screen was bright, touch was perfect, and it still had long enough battery life. The device felt very thin and light, but sturdy at the same time. It had a pleasant matte finish and a magnetic cover that lasted as long as the device did. It spoiled the feel of later tablets for me.
It had only 1 GB of RAM. We have much more powerful SoCs today, but nothing ever felt that smooth (not counting iPhones). I don't know why that was. Perhaps Android was light enough for it back then. Or it may have had a very good selection and integration of subcomponents. I was very disappointed when Nvidia discontinued the Tegra SoC family and tablets.
I'd argue their current CPUs aren't to be discounted either. Much as people love to crown Apple's M-series chips as the poster child of what Arm can do, Nvidia's Grace CPUs trade blows with the best of the best.
It leaves one wondering what could be if they had any appetite for devices more in the consumer realm.
At this point what you really need is an incredibly powerful heatsink with some relatively small chips pressed against it.
The trashcan Mac Pro was this idea: a triangular heatsink core with CPU+GPU+GPU, one on each side.
If you disassemble a modern GPU, that's what you'll find. 95% by weight of a GPU card is cooling related.
So basically going back to the old days of Amiga and Atari, in a certain sense, when PCs could only display text.
I'm not familiar with that history. Could you elaborate?
In the home computer universe, such computers were the first to have a programmable graphics unit that did more than paste the framebuffer onto the screen.
While the PCs were still displaying text, or, if you were lucky enough to own a Hercules card, gray text, or maybe a CGA one with 4 colours.
While the Amigas, which I am more comfortable with, were doing this in the mid-80s:
https://www.youtube.com/watch?v=x7Px-ZkObTo
https://www.youtube.com/watch?v=-ga41edXw3A
The original Amiga 1000 had on its motherboard (later reduced to fit into an Amiga 500) a Motorola 68000 CPU, a programmable sound chip with DMA channels (Paula), and a programmable blitter chip (Agnus, essentially an early GPU).
You would build the audio or graphics instructions in RAM for the respective chip, set the DMA parameters, and let them loose.
Thanks! Early computing history is very interesting (I know this wasn't the earliest). It also sometimes explains certain odd design decisions that are still followed today.
Hey! I had an Amiga 1000 back in the day - it was simply awesome.
In the olden days we didn't have GPUs, we had "CRT controllers".
What it offered you was a page of memory where each byte value mapped to a character in ROM. You feed in your text and the controller fetches the character pixels and puts them on the display. Later we got ASCII box drawing characters. Then we got sprite systems like the NES, where the Picture Processing Unit handles loading pixels and moving sprites around the screen.
Eventually we moved on to raw framebuffers. You get a big chunk of memory and you draw the pixels yourself. The hardware was responsible for swapping the framebuffers and doing the rendering on the physical display.
Along the way we slowly got more features like defining a triangle, its texture, and how to move it, instead of doing it all in software.
Up until the 90s when the modern concept of a GPU coalesced, we were mainly pushing pixels by hand onto the screen. Wild times.
The history of display processing is obviously a lot more nuanced than that; it's pretty interesting if that's your kind of thing.
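To make the raw-framebuffer step concrete, here's a tiny sketch; the 320x200 resolution and one-byte-per-pixel palette format are just assumptions for illustration:

    import numpy as np

    WIDTH, HEIGHT = 320, 200
    framebuffer = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)  # one byte per pixel (palette index)

    def put_pixel(x, y, color):
        framebuffer[y, x] = color  # "drawing" is just writing to memory

    # The CPU pushes every pixel by hand, the way early software renderers did.
    for x in range(50, 270):
        put_pixel(x, 100, 15)  # a horizontal line in palette colour 15

The hardware's only job was to scan that block of memory out to the display, row by row; everything you saw got there one write at a time.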
Small addendum: there was already stuff like the TMS34010 in the 1980s, just not at home.
Those machines multiplexed the bus to split access to memory, because RAM speeds were competitive with or faster than the CPU bus speed. The CPU and VDP "shared" the memory, but only because CPUs were slow enough to make that possible.
We have had the opposite problem for 35+ years at this point. Newer-architecture machines like the Apple machines, the GB10, and the Ryzen AI Max+ 395 do share memory between GPU and CPU, but in a different way, I believe.
I'd argue that with memory suddenly becoming much more expensive, we'll probably see the opposite trend. I'm going to get one of these GB10 or Strix Halo machines ASAP, because with RAM prices skyrocketing I don't think we'll be seeing more of this kind of thing in the consumer market for a long time. Or at least, prices will not be dropping any time soon.
You are right, hence my "in a certain sense", because I was too lazy to point out the differences between a motherboard having everything there without a pluggable graphics unit [0], and having everything now inside a single chip.
[0] - Not fully correct, as there are/were expansion cards that override the bus, thus replacing one of the said chips, in the Amiga's case.
It's funny how ideas come and go. I made this very comment here on Hacker News probably 4-5 years ago and received a few downvotes for it at the time (though I was thinking of computers in general).
It would take a lot of work to make a GPU do current CPU type tasks, but it would be interesting to see how it changes parallelism and our approach to logic in code.
> I made this very comment here on Hacker News probably 4-5 years ago and received a few down votes for it at the time
HN isn't always very rational about voting. It would be a loss to judge any idea on that basis.
> It would take a lot of work to make a GPU do current CPU type tasks
In my opinion, that would be counterproductive. The advantage of GPUs is that they have a large number of very simple GPU cores. Instead, just put a few separate CPU cores on the same die, or on a separate die. Or you could even have a forest of GPU cores with a few CPU cores interspersed among them - sort of like how modern FPGAs have logic tiles, memory tiles, and CPU tiles spread out across the fabric. I doubt it would be called a GPU at that point.
GPU compute units are not that simple; the main difference from a CPU is that they generally use a combination of wide SIMD and wide SMT to hide latency, as opposed to the power-intensive out-of-order processing used by CPUs. Performing tasks that can't take advantage of either SIMD or SMT on GPU compute units might be a bit wasteful.
Also, you'd need to add extra hardware for various OS support functions (privilege levels, address space translation/MMU) that are currently missing from the GPU. But the idea is otherwise sound; you can think of the proposed 'Mill' CPU architecture as one variety of it.
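For a sense of scale on the latency-hiding point above, a rough sketch; the cycle counts are assumptions, not the specs of any particular GPU:

    memory_latency_cycles = 400   # assumed global-memory load latency
    issue_interval_cycles = 4     # assumed cycles between issues from one thread

    # Roughly how many threads a scheduler needs resident so that some thread
    # always has work ready while the others wait on memory:
    threads_needed = memory_latency_cycles // issue_interval_cycles
    print(threads_needed)   # ~100

That is why a GPU compute unit keeps dozens of warps/wavefronts in flight rather than reordering instructions within one thread the way a big out-of-order CPU core does.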
> GPU compute units are not that simple
Perhaps I should have phrased it differently. CPU and GPU cores are designed for different types of loads. The rest of your comment seems similar to what I was imagining.
Still, I don't think that enhancing the GPU cores with CPU capabilities (out-of-order execution, rings, MMU, etc. from your examples) is the best idea. You may end up with the advantages of neither and the disadvantages of both. I was suggesting that you could instead have a few dedicated CPU cores distributed among the numerous GPU cores. Finding the right balance of GPU to CPU cores may be the key to achieving the best performance on such a system.
As I recall, Gartner made the outrageous claim that upwards of 70% of all computing would be "AI" within some number of years - nearly the end of CPU workloads.
I'd say over 70% of all computing has already been non-CPU for years. If you look at your typical phone or laptop SoC, the CPU is only a small part. The GPU takes up the majority of the area, with other accelerators also taking significant space. Manufacturers would not spend that money on silicon if it were not already being used.
> I'd say over 70% of all computing has already been non-CPU for years.
> If you look at your typical phone or laptop SoC, the CPU is only a small part.
Keep in mind that the die area doesn't always correspond to the throughput (average rate) of the computations done on it. That area may be allocated for higher computational bandwidth (peak rate) and lower latency. In other words, to get the results of a large number of computations faster, even if the circuits then idle for the rest of the cycles. I don't know the situation on mobile SoCs with regard to those quantities.
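A toy illustration of that peak-vs-average distinction, with entirely made-up numbers:

    gpu_peak_tflops, gpu_duty_cycle = 20.0, 0.05   # huge peak rate, idle most of the time
    cpu_peak_tflops, cpu_duty_cycle = 1.0, 0.50    # small peak rate, busy half the time

    gpu_sustained = gpu_peak_tflops * gpu_duty_cycle   # 1.0 TFLOPS average
    cpu_sustained = cpu_peak_tflops * cpu_duty_cycle   # 0.5 TFLOPS average

The die area buys the 20 TFLOPS burst (frames or inference calls finish quickly), but the averaged share of "all computing" depends on duty cycles you can't read off a die shot.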
This is true, and my example was a very rough metric. But the computation density per area is actually way, way higher on GPUs compared to CPUs. CPUs only spend a tiny fraction of their area doing actual computation.
> If you look at your typical phone or laptop SoC, the CPU is only a small part
In mobile SoCs a good chunk of this is power efficiency. On a battery-powered device, there's always going to be a tradeoff between spending die area on making something like 4K video playback more power efficient versus general-purpose compute.
Desktop-focussed SKUs are more liable to spend a metric ton of die area on bigger caches close to your compute.
Going by raw operations done: if the given workload uses 3D rendering for the UI, that's probably true for computers/laptops. Watching a YouTube video is essentially the CPU pushing data between the internet and the GPU's video decoder, and to the GPU-accelerated UI.
Looking at home computers, most "computing", when counted as FLOPS, is done by GPUs anyway, just to show more and more frames. Processors are only used to organise all that data to be crunched by the GPUs. The rest is browsing webpages and running Word or Excel a few times a month.
Is there any need for that? Just have a few good CPUs there and you’re good to go.
As for what the HW looks like, we already know. Look at Strix Halo as an example. We are just getting bigger and bigger integrated GPUs. Most of the FLOPS on that chip are in the GPU part.
I still would like to see a general GPU back end for LLVM just for fun.
It would just make everything worse. Some (if not most) tasks are far less parallelisable than typical GPU loads.
HN in general is quite clueless about topics like hardware, high-performance computing, graphics, and AI performance. So you probably shouldn't care if you are downvoted, especially if you honestly know you are correct.
Also, I'd say that if you buy, for example, a MacBook with an M4 Pro chip, it already is a big GPU attached to a small CPU.
People on here tend to act as if 20% of all computers sold were laptops, when it’s the reverse.
Maybe at the point where you can run Python directly on the GPU. At which point the GPU becomes the new CPU.
Anyway, we're still stuck with "G" for "graphics" so it all doesn't make much sense and I'm actually looking for a vendor that takes its mission more seriously.
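On the "run Python directly on the GPU" point: compilers like Numba already get part of the way there by compiling Python-syntax kernels for the GPU. A minimal sketch, assuming an NVIDIA card with the CUDA toolkit and the numba package installed:

    from numba import cuda
    import numpy as np

    @cuda.jit
    def scale(out, x, factor):
        i = cuda.grid(1)          # absolute index of this GPU thread
        if i < x.size:
            out[i] = x[i] * factor

    x = np.arange(1_000_000, dtype=np.float32)
    out = np.zeros_like(x)
    threads = 256
    blocks = (x.size + threads - 1) // threads
    scale[blocks, threads](out, x, 2.0)   # kernel body is Python, compiled for the GPU

It's still the CPU orchestrating the launches, though; for the GPU to become "the new CPU" it would need to own general control flow, I/O, and the OS side of things as well.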
I mean, that's kind of what's going on at a certain level with the AMD Strix Halo, the NVIDIA GB10, and the newer Apple machines.
In the sense that the RAM is fully integrated, anyways.