Crazy to think that my first personal computer's entire storage (160MB, IIRC) could fit into the L3 of a single consumer CPU!

It's probably not possible architecturally, but it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.

https://github.com/coreboot/coreboot/blob/main/src/soc/intel...

Context: Early in the firmware boot process the memory controller isn't configured yet so the firmware uses the cache as RAM. In this mode cache lines are never evicted since there's no memory to evict them to.

There may be server workloads for which the L3 cache is sufficient; it would be interesting if it made sense at scale to build boards with just the CPU and no memory.

I imagine for such a workload you could always just solder on a small memory chip, which avoids both wasting L3 on unused memory and a non-standard boot process, so probably not.

Most definitely, I work in finance and optimizing workloads to fit entirely in cache (and not use any memory allocations after initialization) is the de-facto standard of writing high perf / low latency code.

Lots of optimizations happening to make a trading model as small as possible.
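A minimal sketch of that pattern (a toy example, not a real trading system; all names and sizes here are made up): allocate every buffer the hot path will ever touch at startup, so steady-state updates only mutate existing storage and never trigger the allocator.

```python
# "No allocations after init" sketch: a toy order book where all
# storage is preallocated, so the update path only writes in place.
from array import array

N_LEVELS = 1024  # assumed book depth

class Book:
    def __init__(self):
        # fixed-size arrays allocated once at startup
        self.price = array("d", [0.0] * N_LEVELS)
        self.qty = array("d", [0.0] * N_LEVELS)

    def update(self, level, price, qty):
        # hot path: mutates preallocated storage, creates no new objects
        self.price[level] = price
        self.qty[level] = qty

book = Book()
book.update(3, 101.25, 500.0)
print(book.price[3], book.qty[3])
```

The same idea in C++ would use fixed-capacity arrays or arena allocators; the point is just that the working set is bounded and stable, which is what lets it stay resident in cache.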

I remember the talk about Wii/Wii U hacking: they intentionally kept the early boot code in cache so that it couldn't be sniffed or modified on the RAM bus, which was external to the CPU and thus glitchable.

In my case it began with 16K (yes, 16×1024 bytes) and 90K (yes, 90×1024 bytes) 5.25" floppy disks (although the floppies were a few months after the computer). Eventually upgraded to 48K RAM and 180K double density floppy disks. The computer: Atari 800.

I'll see your Atari 800 and raise you my Atari 2600 with its whopping 128 bytes of RAM. Bytes with a B. I can kinda sorta call it a computer because you could buy a BASIC cartridge for it (I didn't and stand by that decision - it was pretty bad).

I thought the Timex Sinclair 1000 with 2 Kbytes of RAM was bad.

The membrane keyboard wasn’t great (the lack of a space bar was a weird choice) but it did work. We had programs on cassette and did get the 16Kbyte memory expansion.

https://en.wikipedia.org/wiki/Timex_Sinclair_1000

I didn’t realize the Atari 2600 had BASIC, always thought of it as a game console.

You can buy this bad boy [ATtiny11] with no RAM, only registers.

https://ww1.microchip.com/downloads/en/DeviceDoc/1006S.pdf

My first PC had a 20MB HDD with 512KB of RAM. So yeah, that could fit into cache 10 times now.

Maybe in 50 years the cache of CPUs and GPUs will be 1TB, enough to run multiple LLMs (a dedicated model running entirely from cache for each task). Robots like in the movies would need LLMs much, much faster than what we see today.

doubtful that we will still have this computer architecture by then

KolibriOS would fit in there, even with the data in memory. You cannot load it into the cache directly, but when the cache capacity is larger than all the data you read there should be no cache eviction and the OS and all data should end up in the cache more or less entirely. In other words it should be really, really fast, which KolibriOS already is to begin with.

Unless you lay everything out contiguously in memory, you’ll still get cache eviction due to associativity, depending on the eviction strategy of the CPU. But certainly DOS or even early Windows 95 could conceivably just run out of the cache.
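To make the associativity point concrete, here's a toy model (the geometry is illustrative, not any real CPU): in a set-associative cache, a line's set is fixed by its address, so more than WAYS lines landing in the same set force an eviction even when total data is far below capacity.

```python
# Toy model: why data smaller than the cache can still be evicted.
# Assumed geometry (illustrative only): 64 B lines, 8-way
# set-associative, 512 sets -> 256 KiB total capacity.
LINE = 64
WAYS = 8
SETS = 512

def set_index(addr):
    # the set an address maps to is fixed by these bits of the address
    return (addr // LINE) % SETS

# Nine lines whose addresses are exactly SETS*LINE (32 KiB) apart all
# map to set 0 -- one more than the associativity, so one of them must
# be evicted even though 9 lines is a tiny fraction of 256 KiB.
stride = SETS * LINE
addrs = [i * stride for i in range(WAYS + 1)]
print(all(set_index(a) == 0 for a in addrs))  # every address, same set
print(len(addrs), "lines competing for", WAYS, "ways in one set")
```

A contiguous layout sidesteps this because consecutive lines walk through consecutive sets instead of piling into one.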

Windows 95 only needed 4MB RAM and 50 MB disk, so that's certainly doable. The trick is to have a hypervisor spread that allocation across cache lines.
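One way a hypervisor could do that spreading is page coloring: pick guest page frames so they distribute evenly across cache sets rather than colliding. A toy sketch (the L3 geometry and the round-robin policy are assumptions for illustration):

```python
# Toy page-coloring sketch: allocate guest page frames round-robin
# over cache "colors" so a 4 MB working set spreads across L3 sets.
PAGE = 4096
LINE = 64
SETS = 8192                     # assumed L3 geometry
COLORS = SETS * LINE // PAGE    # distinct page colors (128 here)

def color(frame):
    # a page's color is determined by its frame number mod COLORS
    return frame % COLORS

frames = []
next_free = {c: c for c in range(COLORS)}  # next free frame per color
for i in range(1024):           # 4 MB / 4 KB = 1024 guest pages
    c = i % COLORS              # round-robin over colors
    frames.append(next_free[c])
    next_free[c] += COLORS      # frames of one color are COLORS apart

# every color gets used, so no subset of sets is overloaded
print(len(set(color(f) for f in frames)) == COLORS)
```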

Yeah, cache eviction is the reason I was assuming it is "probably not possible architecturally", but I also figured there could be features beyond my knowledge that might make it possible.

Edit: Also this 192MB of L3 is spread across two Zen CCDs, so it's not as simple as "throw it all in L3" either, because any given core would only have access to half of that.

Well, yeah, reality strikes again. All you need is an exploit in the microcode to gain access to AMD's equivalent to the ME and now you can just map the cache as memory directly. Maybe. Can microcode do this or is there still hardware that cannot be overcome by the black magic of CPU microcode?

That assumes KolibriOS or any major component is pinned to one core and one cache slice instead of getting dragged between CCDs or losing memory affinity. Throw actual users, IO, and interrupts at it and you get traffic across chiplets, or at least across L3 groups, so the nice 'everything lives in cache' story falls apart fast.

Nice demo, bad model. The funny part is that an entire OS can fit in cache now, the hard part is making the rest of the system act like that matters.

You had ~160,000 times more storage than I did for my first personal computer.

Commodore PET for me - 8 KB of RAM and all the data you could store and read back from a TDK 120 cassette tape . . .

* https://en.wikipedia.org/wiki/Commodore_PET

Same time as the Trash-80 and BBC micro were making inroads.

IIRC some relatively strange CPUs could run with unbacked cache.

Intel's platforms, at the very least, use cache-as-RAM during the boot phase before the DDR interface can be trained and started up. https://github.com/coreboot/coreboot/blob/main/src/soc/intel...

> it would be amusing to see an entire early 90's OS running entirely in the CPU's cache.

There are actually already two running (MINIX and UEFI), and it’s the opposite of amusing - https://www.zdnet.com/article/minix-intels-hidden-in-chip-op...

I wonder how much faster DOS would boot, especially without floppy seek times...

Instantly.

If you run a VM on a CPU like this, using a baremetal hypervisor, you can get very close to "everything in cache".

You can get close with a VM, but there's overhead in device emulation that slows things down.

Consider a VM where that kind of stuff has been removed, like the firecracker hypervisor used for AWS Lambda. You're talking milliseconds.

My first PC had a 40MB HDD and 8MB RAM :D

640K ought to be enough for anybody.

My first computer's whole RAM could fit in the L1 of a single core (128K).