OK, look.
Since my previous attempt to measure the impact of trapping on signed overflow didn't seem to have moved your position one bit, I thought I'd give it a go in the most representative way I could think of:
I built the same version of clang with clang on an x86, an aarch64, and a RISC-V system. Then I built another version with the `-ftrapv` flag enabled, and compared the compile times of programs compiled by these clang builds running on real hardware:
runtime:        x86          |           aarch64           |        RISC-V (RVA23)
                Zen1         |    A78            A55*      |    X100           A100
clang A:        3.609±0.078  |  4.209±0.050   9.390±0.029  |  5.465±0.070   11.559±0.020
clang-ftrapv A: 3.613±0.118  |  4.290±0.050   9.418±0.056  |  5.448±0.060   11.579±0.030
clang B:        8.948±0.100  | 10.983±0.188  22.827±0.016  | 13.556±0.016   28.682±0.023
clang-ftrapv B: 8.960±0.125  | 11.099±0.294  22.802±0.039  | 13.511±0.018   28.741±0.050

(All cores were clocked to about 2.2 GHz; note that the Zen1 can reach almost 4 GHz.)
As you can see, once again the overhead of -ftrapv is quite low. Surprisingly, the overhead seems highest on the Cortex-A78. My guess is that this is because on aarch64 clang generates a separate `brk` with a unique immediate for every overflow check, while on RISC-V every check branches to a single `unimp` per function.
Please tell me if you have a better suggestion for measuring the real world impact.
Or heck, give me some artificial worst case code. That would also be an interesting data point.
Notes:
* Results are given as mean±variance.
* The Spacemit X100 is an out-of-order RISC-V core (roughly Cortex-A76 class) and the A100 is an in-order RISC-V core.
* I tried to clock all of the cores to the same frequency of about 2.2 GHz. The exception is the A55 (hence the asterisk), which ran at 1.8 GHz; I linearly scaled its results to compensate.
* Program A was the chibicc compiler (~8K LOC) and program B was microjs (~30K LOC).
binary size:
                x86         aarch64     RISC-V
clang:          212807768   216633784   195231816
clang-ftrapv:   212859280   216737608   195419512
increase:       0.024%      0.048%      0.096%