In the M4, the counters Apple added are mostly for the SME engine. The full list of supported counters can be found in the official guide: https://developer.apple.com/documentation/apple-silicon/cpu-...
Regarding branch profiling, all arm64 (M1+) CPUs support these counters:

- BRANCH_CALL_INDIR_MISPRED_NONSPEC
- BRANCH_COND_MISPRED_NONSPEC
- BRANCH_INDIR_MISPRED_NONSPEC
- BRANCH_MISPRED_NONSPEC
- BRANCH_RET_INDIR_MISPRED_NONSPEC
- INST_BRANCH
- INST_BRANCH_CALL
- INST_BRANCH_COND
- INST_BRANCH_INDIR
- INST_BRANCH_RET
- INST_BRANCH_TAKEN
AFAIK, based on ibireme’s research on kperf, nothing prevents fetching all of these counters. BTW, a fork of "poop" can already fetch BRANCH_MISPRED_NONSPEC.
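
For what it's worth, once two of these counters can be read, turning them into a misprediction rate is just a ratio. Below is a rough C sketch of that arithmetic only. It assumes INST_BRANCH and BRANCH_MISPRED_NONSPEC both count retired (non-speculative) events and that the raw values were already sampled before and after the measured region. The `branch_counts` struct and both helper functions are hypothetical names made up for illustration; they are not part of kperf or "poop".

```c
#include <stdint.h>

/* Hypothetical container for two raw counter readings.
 * How the values are obtained (e.g. via kperf) is out of scope here. */
typedef struct {
    uint64_t inst_branch;            /* INST_BRANCH reading */
    uint64_t branch_mispred_nonspec; /* BRANCH_MISPRED_NONSPEC reading */
} branch_counts;

/* Difference between two snapshots taken around the measured region.
 * Assumes the counters only increase and did not wrap in between. */
static branch_counts branch_counts_delta(branch_counts begin, branch_counts end)
{
    branch_counts d;
    d.inst_branch = end.inst_branch - begin.inst_branch;
    d.branch_mispred_nonspec =
        end.branch_mispred_nonspec - begin.branch_mispred_nonspec;
    return d;
}

/* Fraction of branches that were mispredicted, in [0, 1] for sane input.
 * Returns 0.0 when no branches were counted, to avoid dividing by zero. */
static double branch_mispredict_rate(branch_counts c)
{
    if (c.inst_branch == 0) {
        return 0.0;
    }
    return (double)c.branch_mispred_nonspec / (double)c.inst_branch;
}
```

The same pattern would apply to the more specific pairs in the list (for example BRANCH_COND_MISPRED_NONSPEC over INST_BRANCH_COND), assuming each pair counts the same class of branches.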