If ARM starts dominating the desktop and laptop spaces, with their quite different set of applications, might we start seeing more software bugs around race conditions, caused by developers writing software with x86 in mind, with its different constraints on memory ordering?
That's a possibility. Some code still assumes (without realizing!) x86-style ordered loads and stores. This is called a strong memory model, specifically TSO, Total Store Order. If you tell x86 to execute "a=1; b=2;", the store to 'a' will always become visible to other cores first. Of course compilers might reorder stores and loads, but that's another matter.
ARM is free to reorder stores and loads. This is called a weak memory model. So unless you ask for ordering explicitly, e.g. with C++ memory_order::acquire and memory_order::release, you might get invalid behavior. Heisenbugs in the worst case.
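To make the acquire/release point concrete, here is a minimal C++ sketch of the usual "publish some data, then set a flag" pattern. The names (payload, ready, run_once) are made up for illustration. With plain or relaxed accesses to the flag this can happen to work on x86 and break on ARM; with release/acquire it is correct on both.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the payload

void producer() {
    payload = 42;                                  // (1) write the data
    ready.store(true, std::memory_order_release);  // (2) publish: (1) may not sink below this
}

int consumer() {
    // Spin until the flag is set. The acquire load pairs with the release
    // store, so once we see `true` we are guaranteed to see payload == 42.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Hypothetical driver: runs one producer and one consumer and returns
// the value the consumer observed.
int run_once() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Swapping both orderings for memory_order_relaxed keeps the flag itself atomic but drops the guarantee about payload, which is exactly the kind of bug that tends to stay hidden on x86.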
The major issue is that these days most software is Electron-based or a webapp. I miss the days of 98/XP, where you'd find tons of desktop software. A PC actually felt like something that had a purpose. Even if you spin up an XP/98 VM now (especially 98/2000), you'd see the entire OS feels like something that you can spend some time on. Nowadays most PCs feel like a random terminal where I open the browser and do some basic work (except for gaming, of course). I really hate the UX of Win 11, and even 10 isn't much better compared to XP. I really hope we go back to that old era.
> Nowadays most PCs feel like a random terminal
It's a fun perception. For the longest time, all the "serious" computers were used through networks and terminals and didn't even come with any ability to connect a monitor or a keyboard (although a serial terminal would work as the system console). I used to joke (usually looking at Unisys Windows-based big servers), if the computer had VGA and PS/2 ports, it wasn't a computer, but a toy. Those Unisys servers weren't toys, but you could run Pinball and Minesweeper directly on them, which kind of said otherwise.
I think we got used to such levels of platform bloat that we don't care if the UI toolkit these days is bigger than the entire operating system that runs 95% of the world's payment transactions.
I think that's less likely than you'd expect, because the memory ordering model used by C++ and others essentially requires you to write code that works even without x86's total store order. If you don't, then you can get bugs even on x86, because the compiler will violate the ordering you thought you had in your program, even if the CPU doesn't.
Also most software runs on ARM now and I don't think that has actually happened in practice.
> Also most software runs on ARM now and I don't think that has actually happened in practice.
At least in my house, ARM cores outnumber x86 cores by at least four to one. And I'm not even counting the 32-bit ARM cores in embedded devices.
There is a lot of space for memory ordering bugs to manifest in all those devices.
It's definitely a real issue in real code, since the CPU isn't bound by things like function boundaries or alias analysis or pointer validity. For example:
The compiler cannot reorder the load of b before the load of a, because it may not be a valid pointer if x is false. But the CPU is free to speculate long ahead, and if the pointer in b isn't valid, that's fine, the CPU can attempt a speculative load and fail.
It's not particularly common and code that has this issue will probably crash only rarely, but it's not too hard to do.
This is actually one reason I feel like developing my systems-level stuff on ARM64 instead of x86 (I have a DGX Spark box) is not a bad idea. When building lower-level concurrent data structures and the like, it just seems wiser to have to deal with this more immediately.
That said, I've never actually run into one of these issues.
If it is programmed in assembly. This kind of nasty detail should be handled by the compilers.
If it's programmed in assembly, it just won't compile for a different architecture.
Wouldn't the compiler take care of producing the correct machine code?
The issue is that the C memory model allows more behaviours than the memory model of x86-64 processors. You can thus write code which is incorrect according to the C language specification but will happen to work on x86-64 processors. Moving to arm64 (with its weaker memory model than x86-64) will then reveal the latent bug in your program.
This architecture trick was often used for precisely this: finding bugs in a program that would work on one architecture and fail on another. A very common class of issues like these was about endianness, and PowerPC was very handy because it could boot in both big- and little-endian modes (I think I remember different versions of Linux for each mode, but I'm no longer sure).
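For anyone who hasn't been bitten by the endianness class of bug, here is a small C++ sketch of how it usually looks. The function names are hypothetical. The first reader copies raw bytes into an integer and so depends on the host byte order; the second one assembles the value byte by byte and gives the same answer everywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Endian-dependent: reinterprets four bytes in whatever order the host uses.
// Code like this passes every test on x86 and misbehaves on a big-endian box.
std::uint32_t read_u32_host(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Portable: always treats the input as big-endian ("network order"),
// regardless of the machine it runs on.
std::uint32_t read_u32_be(const unsigned char* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8)  |  std::uint32_t{p[3]};
}

// Runtime check of the host byte order: look at the first byte of the value 1.
bool host_is_little_endian() {
    const std::uint32_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```

Running the same test suite on a machine of each byte order is what flushes out uses of the first function that should have been the second.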
Starting with POWER8, the Linux kernel and some of the BSDs support 64-bit PowerPC in both big- and little-endian modes. Older PowerPC chips had more limited support for little-endian, and all the commercial desktop/server PowerPC OSes that come immediately to mind (classic Mac OS, Mac OS X, NEXTSTEP / OpenStep, OS/400 / IBM i, AIX, BeOS) are big-endian only.
As you'd expect, Linux distribution support for big- and little-endian varies.
And “happen to work on x86-64 processors” also will depend on the compiler. If you write
both the compiler and the CPU can freely pick the order in which those two happen (or even execute them in parallel, or do half of one first, then the other, then the other half of the first, but I think those are hypothetical cases).
x86-64 will never do such a swap, but x86-64 compilers might.
If you write
, things might be different for the C compiler because a and b can alias. The hardware still is free to change that order, though.
OpenBSD famously keeps a lot of esoteric platforms around, because running the same code on multiple architectures reveals a lot of bugs. At least that was one of the arguments previously.
Which is why Windows NT was multiplatform in 1993.
Developed on Intel i860, then MIPS, and only then on x86, alongside Alpha.
Big endian MIPS, no less! At least initially.
I don't think the i860 port lasted very long. IIRC, the performance in context switches was atrocious.
What is "correct"? If you write code that stores two values and the compiler emits two stores, that's correct. If the programmer has judged that the order of those stores is important, the compiler may not have any obligation to agree with the programmer. And even if it does, the compiler is likely only obligated to ensure the ordering as seen by the current thread, so two plain store instructions in the proper order would be enough to be "correct." But if the programmer is relying on those stores being seen in a particular order by other threads, then there's a problem.
Compilers can only be relied on to emit code that's correct in terms of the language spec, not the programmer's intent.
The compiler relies on the language to define a memory consistency model and on the programmer to follow it.
If you go around your OS, yes, that could be the case, but you can already have issues using the application from machine to machine with the same OS, with different amounts of RAM and different CPUs. But I am not an expert in these matters.
Only for the hand-written assembly parts of the source code. The rest will be handled by the compilers.
You don't need to be writing assembly. Anything sharing memory between multiple threads could have bugs with ARM's memory model, even if written in C, C++, etc.
Not even close. Except maybe in Rust /s
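For rustaceans missing that /s: if you just use Relaxed ordering everywhere and you aren't sure why, but hey, tests pass on x86, then yeah, on ARM it may have a problem. On x86 the hardware gives ordinary loads and stores roughly acquire/release behavior even if you specify Relaxed (not quite SeqCst, since a store can still be reordered after a later load), so the bug tends to stay hidden there, although the compiler is still free to reorder Relaxed operations.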
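To be fair to Relaxed, there are cases where it is exactly right. A sketch in C++ (Rust's atomics follow the same memory model; the names here are made up): a shared counter only needs each increment to be atomic, not ordered against any other memory, so relaxed is correct on x86 and ARM alike.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<long> hits{0};

// Hypothetical helper: spawns `threads` workers that each bump the counter
// `per_thread` times, then returns the final total.
long count_hits(int threads, int per_thread) {
    hits.store(0, std::memory_order_relaxed);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Relaxed is enough: we need atomicity of the increment,
                // not any ordering with respect to other data.
                hits.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    // join() synchronizes with each worker, so the load below sees every increment.
    for (auto& th : pool) th.join();
    return hits.load(std::memory_order_relaxed);
}
```

The trouble starts when a relaxed flag is used to guard other, non-atomic data. That is the case that needs Release on the writer and Acquire on the reader.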