The problem is that it's relatively easy to add "supports both endiannesses" to the hardware and the architecture spec, but the ongoing effect on the software stack is massive. You need a separate toolchain for it; you need support in the kernel for it; you need distros to build all their stuff two different ways; everybody has to add a load of extra test cases and setups. That's a lot of ongoing maintenance work for a very niche use case. The other problem is that typically almost nobody actually uses the nonstandard endianness config, so it's very prone to bitrotting, because nobody has the hardware to run it.
Architectures with only one supported endianness are less painful. "Supports both and both are widely used" would also be OK (I think MIPS was here for a while?), but I think that has a tendency to collapse into "one is popular and the other is niche" over time.
Relatedly, "x32" style "32 bit pointers on a 64 bit architecture" ABIs are not difficult to define but they also add a lot of extra complexity in the software stack for something niche. And they demonstrate how hard it is to get rid of something once it's nominally supported: x32 is still in Linux because last time they tried to dump it a handful of people said they still used it. Luckily the Arm ILP32 handling never got accepted upstream in the first place, or it would probably also still be there sucking up maintenance effort for almost no users.
There's a major difference between "we won't support big endian" and calling RISC-V out as stupid for adding optional support.
The academic argument Linus himself made is alone reason enough that big-endian support SHOULD be included in the ISA. When you are trying to grasp the fundamentals in class, adding little endian's "partially backward, but partially forward" byte ordering increases complexity and mistakes without meaningfully increasing knowledge of the course fundamentals.
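For what it's worth, the thing that trips students up is easy to show in a few lines of C (just a sketch; the output depends on the machine it runs on):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t value = 0x11223344;
        const unsigned char *bytes = (const unsigned char *)&value;

        /* On a little-endian machine this prints "44 33 22 11" -- the
         * bytes read "backward" relative to the value as written in hex,
         * even though the bits within each byte still read "forward".
         * On a big-endian machine it prints "11 22 33 44". */
        for (size_t i = 0; i < sizeof value; i++)
            printf("%02x ", bytes[i]);
        printf("\n");
        return 0;
    }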
No Zbb support is also a valid use case. Very small implementations may want to avoid adding Zbb but still maximize performance. These implementations almost certainly won't be large enough to run Linux and wouldn't be Linus' problem anyway.
While I've found myself almost always agreeing with Linus (even on most of his notably controversial rants), he's simply not correct about this one, and has no reason to go past a polite but firm "Linux has no plans to support a second endianness on RISC-V".
> Relatedly, "x32" style "32 bit pointers on a 64 bit architecture" ABIs are not difficult to define but they also add a lot of extra complexity in the software stack for something niche.
I'm not sure that there's much undue complexity, at least on the kernel side. You just need to ensure that a process running with 32-bit pointers never has to deal with addresses outside the bottom 32-bit address space. That looks potentially doable. You need to do this anyway for other restricted virtual address ranges that arise from memory paging schemes: for example, on newer x86-64 hardware with 5-level paging the kernel still keeps allocations below the 48-bit boundary by default, because software may be playing tricks with the upper pointer bits and thus be unable to handle virtual addresses outside the bottom 48-bit range.
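As a concrete (and entirely illustrative) example of the kind of pointer trick I mean, here's the classic "stash a tag in the top 16 bits" pattern, which silently assumes user addresses fit in 48 bits; the helper names are made up:

    #include <stdint.h>
    #include <assert.h>

    #define TAG_SHIFT 48
    #define ADDR_MASK (((uintptr_t)1 << TAG_SHIFT) - 1)

    /* Pack a small tag into the top 16 bits of a pointer. This only
     * works while the process is never handed an address above the
     * 48-bit boundary -- exactly the kind of restriction a "limited
     * virtual address range" ABI would have to guarantee. */
    static inline void *tag_ptr(void *p, uint16_t tag) {
        uintptr_t bits = (uintptr_t)p;
        assert((bits & ~ADDR_MASK) == 0);  /* breaks on >48-bit addresses */
        return (void *)(bits | ((uintptr_t)tag << TAG_SHIFT));
    }

    static inline void *untag_ptr(void *p) {
        return (void *)((uintptr_t)p & ADDR_MASK);
    }

    static inline uint16_t ptr_tag(void *p) {
        return (uint16_t)((uintptr_t)p >> TAG_SHIFT);
    }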
In practice it seems like it's not as simple as that; see this LKML post from a few years back pointing out some of the weird x32-specific syscall stuff they ended up with: https://lkml.org/lkml/2018/12/10/1145
But my main point is that the complexity is not in the one-off "here's a patch to the kernel/compiler to add this", but in the way you now have an entire extra config that needs to be maintained and tested all the way through the software stack: by the kernel, the toolchain, distros, and potentially random other software with inline asm or target-specific ifdefs. That's ongoing work, for decades, for many groups of people.
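To make "target-specific ifdefs" concrete: x32 identifies itself as x86-64 plus ILP32, so any project that keys behaviour off __x86_64__ tends to pick up an extra case to carry forever. Rough sketch only; reg_word is a made-up typedef:

    #include <stdint.h>

    /* x32 defines both __x86_64__ and __ILP32__, so a plain
     * "#ifdef __x86_64__" check no longer implies 64-bit pointers. */
    #if defined(__x86_64__) && defined(__ILP32__)
        /* x32: 64-bit instructions, 32-bit pointers and long */
        typedef uint32_t reg_word;
    #elif defined(__x86_64__)
        /* conventional LP64 x86-64 */
        typedef uint64_t reg_word;
    #else
        /* other targets */
        typedef unsigned long reg_word;
    #endif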
Then the real question is whether this bespoke syscall mechanism will be needed going forward, especially as things like time_t adopt 64-bit values anyway. Can't we just define a new "almost 32-bit" ABI that has 64-bit-clean struct layouts throughout for all communication with the kernel (and potentially with system-wide daemons, binary data written to disk, etc., so there's no real gratuitous breakage there either), but sticks with 32-bit pointers at the systems level otherwise? Wouldn't this still be a massive performance gain for most code?
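Something like the following is roughly what I have in mind; the struct names and fields are invented purely to illustrate the layout split:

    #include <stdint.h>

    /* Anything that crosses the kernel (or daemon/file) boundary uses
     * fixed 64-bit fields, so the layout is identical whether userspace
     * has 32-bit or 64-bit pointers. */
    struct sample_kernel_msg {
        int64_t  timestamp_ns;  /* 64-bit time, no 2038 problem */
        uint64_t buffer_addr;   /* pointer passed as a 64-bit integer */
        uint64_t buffer_len;
    };

    /* Purely in-process data structures keep 32-bit pointers and stay
     * compact -- which is where the claimed performance gain comes from. */
    struct sample_node {
        struct sample_node *next;  /* 4 bytes under a 32-bit-pointer ABI */
        uint32_t value;
    };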
You could definitely do better than x32 did (IIRC it is a bit of an outlier even among "32-bit compat ABI" setups). But even if the kernel changes were done more cleanly that still leaves the whole software stack with the ongoing maintenance burden. The fact that approximately nobody has taken up x32 suggests that the performance gain is not worth it in practice for most people and codebases.
Defining yet another 32-bit-on-64-bit x86 ABI would be even worse, because now everybody would have to support x32 for the niche users who are still using that, plus your new 32-bit ABI as well.
But that maintenance burden has already been paid off for things like 64-bit time_t on 32-bit ABIs. One could argue that this changes the calculus: it may be worth deprecating the old x32 (as has been proposed already) and instead proposing more general "ABI-like" ways of letting a process only deal with a limited range of virtual address space, be that 32-bit, 48-bit or whatever - which is, arguably, where most of the gain in "x32" is.
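For reference, this is roughly what that already-paid-for path looks like with a recent glibc: a 32-bit build opts into 64-bit time_t via feature-test macros rather than a whole new ABI (sketch; the exact behaviour depends on the libc version):

    /* Build with something like:
     *   gcc -m32 -D_FILE_OFFSET_BITS=64 -D_TIME_BITS=64 check.c
     * On glibc 2.34+ this switches a 32-bit target to 64-bit time_t. */
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        /* Prints 8 when 64-bit time_t is in effect, 4 otherwise. */
        printf("sizeof(time_t) = %zu\n", sizeof(time_t));
        return 0;
    }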