> Other than just the niceness of the interface, a key one is that the M4 generation added profiling of CPU branching, and afaik Instruments is the only thing that supports it right now
With the M4, Apple mostly added counters for the SME engine only. The full list of supported counters can be found in the official guide: https://developer.apple.com/documentation/apple-silicon/cpu-...
Regarding branch profiling, all arm64 (M1+) CPUs support these counters:
- BRANCH_CALL_INDIR_MISPRED_NONSPEC
- BRANCH_COND_MISPRED_NONSPEC
- BRANCH_INDIR_MISPRED_NONSPEC
- BRANCH_MISPRED_NONSPEC
- BRANCH_RET_INDIR_MISPRED_NONSPEC
- INST_BRANCH
- INST_BRANCH_CALL
- INST_BRANCH_COND
- INST_BRANCH_INDIR
- INST_BRANCH_RET
- INST_BRANCH_TAKEN
afaik nothing prevents fetching all of these counters by building on ibireme’s research on kperf. btw, a fork of "poop" can already fetch BRANCH_MISPRED_NONSPEC.
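fwiw, once the raw counts are fetched by whatever tool, turning the counters listed above into something readable is just division. A minimal sketch in Python, with some assumptions: `branch_stats` is a hypothetical helper (not part of kperf, poop, or Instruments), the pairing of each `*_MISPRED_NONSPEC` counter with an `INST_BRANCH*` counter is my reading of the names, and the sample numbers are made up, not measured.

```python
def branch_stats(counters: dict) -> dict:
    """Derive branch ratios from raw counter deltas (hypothetical helper).

    `counters` maps counter names to event counts collected over one run.
    Missing counters or a zero denominator yield 0.0 instead of raising.
    """
    def ratio(numerator: str, denominator: str) -> float:
        den = counters.get(denominator, 0)
        return counters.get(numerator, 0) / den if den else 0.0

    # Assumed pairings: each mispredict counter over the matching
    # retired-branch counter of the same kind.
    return {
        "mispredict_rate": ratio("BRANCH_MISPRED_NONSPEC", "INST_BRANCH"),
        "cond_mispredict_rate": ratio("BRANCH_COND_MISPRED_NONSPEC", "INST_BRANCH_COND"),
        "indir_mispredict_rate": ratio("BRANCH_INDIR_MISPRED_NONSPEC", "INST_BRANCH_INDIR"),
        "ret_mispredict_rate": ratio("BRANCH_RET_INDIR_MISPRED_NONSPEC", "INST_BRANCH_RET"),
        "taken_fraction": ratio("INST_BRANCH_TAKEN", "INST_BRANCH"),
    }


# Made-up readings, for illustration only.
sample = {
    "INST_BRANCH": 1_000_000,
    "INST_BRANCH_TAKEN": 600_000,
    "INST_BRANCH_COND": 800_000,
    "BRANCH_MISPRED_NONSPEC": 20_000,
    "BRANCH_COND_MISPRED_NONSPEC": 16_000,
}

if __name__ == "__main__":
    for name, value in branch_stats(sample).items():
        print(f"{name}: {value:.2%}")
```

Counters that were not collected (here the indirect and return ones) simply come out as 0.0, so the same helper works if a tool only exposes BRANCH_MISPRED_NONSPEC.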
An interface designed around finding real bottlenecks instead of "here is some data, do you think it is good?".