You're right. O_DIRECT is the endgame, but that's a full engine rewrite for us.

We're trying to stabilize the current architecture first. The complexity of hidden page fault blocking is definitely what's killing us, but we have to live with mmap for now.

I am curious -- what is the application and the language it's written in?

There are insanely dirty hacks you could use to start controlling the fallout of the page faults (like playing games with userfaultfd), but they're unmaintainable in the long term: they introduce fragility that surfaces as bugs at the worst possible times. Rewriting / refactoring is not that hard once one understands the pattern, and I've done it quite a few times. Depending on the language, there may be other options. Calling mlock() on the memory being used could help, since locked pages stay resident and stop taking major faults, but then it's absolutely necessary to carefully limit how much memory is pinned by such mappings, because pinned pages can't be reclaimed under memory pressure.
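
To make the mlock() point concrete, here is a rough sketch in C of pinning with a hard cap. I don't know your language yet, so treat this as illustration only: pin_region(), unpin_region(), and the pinned_bytes counter are names I made up, and the budget is something you'd have to choose for your own workload. The only real calls are mlock() and munlock().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping: bytes this process has pinned via pin_region().
 * Not thread-safe; a real engine would guard this with a lock or atomics. */
static size_t pinned_bytes = 0;

/* Pin [addr, addr + len) only if the running total stays within `budget`.
 * Returns 0 on success, -1 with errno set on failure. */
int pin_region(void *addr, size_t len, size_t budget)
{
    assert(addr != NULL);

    /* Refuse before touching the kernel if the cap would be exceeded. */
    if (len > budget || pinned_bytes > budget - len) {
        errno = ENOMEM;
        return -1;
    }
    /* mlock() can still fail, e.g. when RLIMIT_MEMLOCK is lower than
     * the application's own budget. */
    if (mlock(addr, len) != 0)
        return -1;

    pinned_bytes += len;
    return 0;
}

/* Release a region pinned earlier and give its bytes back to the budget. */
int unpin_region(void *addr, size_t len)
{
    assert(addr != NULL);

    if (munlock(addr, len) != 0)
        return -1;

    pinned_bytes = (len > pinned_bytes) ? 0 : pinned_bytes - len;
    return 0;
}
```

The idea is that whatever maps a hot range calls pin_region(ptr, len, budget) right after mmap() and falls back to ordinary faulting behaviour when it returns -1, so the cap is enforced in one place instead of being scattered across call sites.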

Having been a kernel developer for a long time makes it a lot easier to spot what will work well for applications versus what can be considered glass jaws.