> The important thing to remember is that each of these cannot be split into separate instructions.
Nitpick, but they absolutely can be split into several instructions; that's in fact the most common way it's implemented on RISC-like processors. And single instructions aren't necessarily atomic either.
The actual guarantee is that the entire operation (load, store, RMW, whatever) occurs in one “go” and no other thread can perform an operation on that variable during this atomic operation (it can’t write into the low byte of your variable as you read it).
It’s probably best analogized by imagining that every atomic operation is just the normal operation wrapped in a mutex, but implemented in a much more efficient manner. Of course, with large enough types, atomic variables may well be implemented via a mutex.
> It’s probably best analogized by imagining that every atomic operation is just the normal operation wrapped in a mutex
Sort-of, but that's not quite right: if you imagine it as "taking a mutex on the memory", there's a possibility of starvation. Imagine two threads repeatedly "locking" the memory location to update it. With a mutex, it's possible that one of them gets starved, never getting to update the location and stalling indefinitely.
At least x86 (and I'm sure ARM and RISC-V as well) makes a much stronger progress guarantee than a mutex would: the operation is effectively wait-free. All threads are guaranteed to make progress in some finite amount of time; no one will be starved. At least, that's my understanding from reading much smarter people talking about the cache synchronization protocols of modern ISAs.
Given that, I think a better mental model is roughly something like the article describes: the operation might be slower under high contention, but not "blocking" slow, it is guaranteed to finish in a finite amount of time and atomically ("in one combined operation").
x86 and modern ARM64 have instructions for this, but the original ARM and RISC approaches are literally a hardware-assisted polling loop. Unsure what guarantees they make; I shall look.
Definitely a good clarification though, yeah; it's important.
I got curious about this, because really the most important guarantee is in the C++ memory model, not any of the ISAs, and conforming compilers are required to fulfill it (and it's generally more restrictive than any of the platform guarantees). It's a little bit hard to parse, but if I'm reading section 6.9.2.3 of the C++23 standard (from here [1]) correctly, operations on atomics are only required to be lock-free, not wait-free, and even that might be a high bar on certain platforms.
I'm guessing that note is for platforms like you mention, where the underlying ISA makes this (more or less) impossible. I would assume that in the modern versions of these ISAs, though, essentially everything in std::atomic is wait-free in practice.

[1] https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.p...
My understanding is that the C++ memory model essentially pulled its concurrency semantics from its own imagination, and as a result the only architecture that actually maps to it cleanly is RISC-V, which explicitly decided to support it.
Really finicky ones, and the initial ones made none. I think for RISC-V it's something like a maximum of 16 instructions, with no other memory accesses, for an ll/sc sequence to be assured progress.
That's to enable very minimal hardware implementations that can only track one line at a time.
Early Alpha CPUs (no idea about later ones) essentially had a single special register that asserted a mutex-style lock on a word-sized (64-bit) location for any read/write operation on the Sysbus.
https://en.wikipedia.org/wiki/Load-link/store-conditional