Oops, I actually replied to the wrong comment. I replied here: https://news.ycombinator.com/item?id=48617774
But it's a relevant reply to both comments, so copied here:
Yes, my understanding is that I should be able to emulate sendfile via splice. The problem with that is that splice requires one end to be a pipe. So I think this means two extra file descriptors per connection (one per side of the pipe). And per connection this adds 5 slots in the submit/completion queue, with a LINK dependency. Maybe the trade off is worth it. I've not done concrete experiments with it, but I'm guessing it would be if the saved copy_from_user is large enough.
So for optimal performance this may mean using write() for short files, and a pipe(), a pair of splice() calls, and a pair of close() calls, for larger files.