OK, if it has been addressed or it is irrelevant:
How do I do an "add with carry" (or subtract with carry/borrow) on RISC-V? For this, of course, the addition has to set a carry flag, and the subtraction has to set either a carry or a borrow flag.
This feature is very important for a high-performance implementation of arbitrary-precision arithmetic.
Yes yes that's the other widely quoted criticism of RISC-V, from a GNU MP maintainer.
At the time there was no widely-available RISC-V hardware. There is now, and earlier this year I tested it.
It turns out that the GNU MP project's own benchmark runs better on SiFive RISC-V cores than on Arm cores of comparable µarch: better on U74 than on A53, and better on P550 than on A72, despite (obviously) the Arm cores having carry flags and ADC instructions and the RISC-V cores having neither.
The ensuing discussion also came to a consensus that once you get to very wide cores (8-wide or more, like Apple's M1 and several upcoming RISC-V cores), carry flags are ACTIVELY BAD because they serialise the computation, even with OoO, renaming of the flags register etc. Meanwhile, only one 64 bit A + 64 bit B + 1 bit carry-in limb computation in 18,446,744,073,709,551,616 (that is, 2^64) has an actual carry-out dependency on the carry-in, so you can almost always simply add all the limbs in parallel. The carry-in only affects the carry-out when A+B is exactly 0xFFFFFFFFFFFFFFFF.
Full thread here:
https://www.reddit.com/r/RISCV/comments/1jsnbdr/gnu_mp_bignu...
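For anyone wondering what the flagless version looks like, here is a minimal sketch in C of a multi-limb add where each carry is recovered with an unsigned compare, which is roughly what maps onto `add` + `sltu` on RISC-V. The function name `limbs_add` is made up for illustration; this is not GMP's actual code, just the general technique under the assumption of 64-bit limbs stored least significant first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GMP's mpn_add_n): r = a + b over n 64-bit limbs,
   least significant limb first. Returns the final carry-out (0 or 1). */
uint64_t limbs_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];   /* plain add, wraps modulo 2^64 */
        uint64_t c1 = s < a[i];      /* unsigned compare: did a+b wrap? */
        uint64_t t  = s + carry;     /* fold in the carry-in */
        uint64_t c2 = t < s;         /* wraps only if s was all ones and carry was 1 */
        r[i] = t;
        carry = c1 | c2;             /* c1 and c2 can never both be set */
    }
    return carry;
}
```

Note that `s` and `c1` for every limb depend only on the inputs, so they can all be computed in parallel. Only `c2` depends on the previous limb's carry, and it is set only in the all-ones case described above.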
Well, maybe. I feel like he still has a point with array indexing and JAL.
Stand-alone array indexing, not in a loop, is quite rare and seldom affects code speed. When it is in a loop, the calculation is usually strength-reduced to a bump of a pointer by a constant, so things such as scaled register indexing aren't actually used a lot.
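As a rough illustration of the strength reduction described above, here are the two forms side by side in C. The function names are made up for this sketch, and whether a particular compiler performs exactly this transformation depends on the compiler and flags; the second function is just a hand-written version of what the optimiser typically aims for.

```c
#include <stddef.h>
#include <stdint.h>

/* As written in the source: each access is conceptually v + i*8,
   which is where scaled register indexing would help. */
int64_t sum_indexed(const int64_t *v, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Hand strength-reduced form: the index multiply is gone and the loop
   just bumps a pointer by a constant (8 bytes) each iteration. */
int64_t sum_bumped(const int64_t *v, size_t n)
{
    int64_t total = 0;
    for (const int64_t *p = v, *end = v + n; p != end; p++)
        total += *p;
    return total;
}
```

Both functions return the same result; the second needs only a load with a zero offset and an add-immediate per element, with no scaled addressing mode.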
It is true that there is a range of offsets between ±1 MB and ±128 MB in which arm64 can use a single BL instruction while RISC-V's JAL won't reach. Both need two instructions (8 bytes) for a ±2 GB subroutine call. You'd have to analyse program traces to find out how important this is in practice.
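The ±1 MB and ±128 MB figures fall out of the encodings: JAL carries a 20-bit signed immediate counted in 2-byte units, and BL carries a 26-bit signed immediate counted in 4-byte units. A small sketch in C that checks whether a given byte offset is reachable by each (the helper and function names here are hypothetical, purely for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if byte offset `off` fits a signed immediate of
   `bits` bits that is scaled by `scale` bytes. */
static bool reachable(int64_t off, unsigned bits, int64_t scale)
{
    int64_t lim = ((int64_t)1 << (bits - 1)) * scale;
    return off % scale == 0 && off >= -lim && off < lim;
}

/* RISC-V JAL: 20-bit immediate in 2-byte units, i.e. -1 MiB .. +1 MiB - 2. */
bool riscv_jal_reaches(int64_t off) { return reachable(off, 20, 2); }

/* arm64 BL: 26-bit immediate in 4-byte units, i.e. -128 MiB .. +128 MiB - 4. */
bool arm64_bl_reaches(int64_t off) { return reachable(off, 26, 4); }
```

An offset of exactly 1 MiB, for example, is out of range for JAL but comfortably in range for BL, which is the gap being discussed.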
I haven't done that study myself, but I expect it would be a losing bet that one of the authors of the famous "Computer Architecture: A Quantitative Approach" textbook (and his students / colleagues) didn't think to do this while making ISA design tradeoffs.
RISC-V does get genuine extra utility from being able to use other registers as the link register e.g. using the `-msave-restore` compiler flag to call runtime functions to save and restore registers instead of having load/store multiple, and being able to call that function before saving the Link Register.
I mean, that goes for the array indexing thing too, and all the other criticisms erincandescent makes. It's very easy to say that constants and offsets should always be bigger and instruction sequences shorter, but you then have to ask what else should be left out to make room, especially in an ISA with fixed-length instructions.