Great question! I actually just touched on this in another thread that went up right around the same time you asked this. It is clearly the next big frontier!
The short answer is: It's something I'm actively thinking about, but instrumenting micro-level events (like ZGC's load barriers or G1's write barriers) directly inside application threads without destroying throughput (or creating observer effects invalidating the measurements) is incredibly difficult.
Do you think it can be done by adjusting GC aggressiveness (or even disabling it for short periods of time) and correlating it with execution time?