I developed for Windows before moving to Linux. I was surprised to find that there was no system call similar to Windows' WaitForMultipleObjects. Sure, you can implement something similar using poll() or condition variables, but WaitForMultipleObjects seems so much simpler and more versatile.

The article mentions this: "A few years back, Linux added a way for software to wait on several events at once, which is something Windows had built in for decades, but Linux didn't."

This is not really my area, but from a quick web search, I think they mean io_uring. Here's a blog post about it: https://mazzo.li/posts/uring-multiplex.html

I thought that's what NTSync was all about?

https://docs.kernel.org/next/userspace-api/ntsync.html

No, I believe it's futex_waitv

It's both: futex_waitv can also be dispatched via io_uring, so you can wait on file descriptors and futexes simultaneously.

epoll / select? Since everything is a file, you can wait on everything.

The last time I asked the same question here, user dwattttt finally pointed out[1][2] to me that there is a significant difference: WFMO can actually acquire semaphores in addition to waiting on them, which poll can't do in a non-racy and efficient way. It can also do rendezvous synchronization (i.e. signal-and-wait).

[1] https://news.ycombinator.com/item?id=47513667 [2] https://lore.kernel.org/lkml/f4cc1a38-1441-62f8-47e4-0c67f5a...

A lot of that flexibility is what makes it hard to efficiently emulate (especially without kernel level support), but some of it seems too flexible to make sense as the default choice. How often does a video game really need a lock that can be shared between processes, and why should that lock type be the one that a game engine uses for almost all of its locks?

> How often does a video game really need a lock that can be shared between processes,

What do you mean? SRWLock (or the older CRITICAL_SECTION) cannot be shared between processes. A (Win32) Mutex does work across processes, but that's its entire purpose. So Windows does have different tools for different jobs.

In fact, it's really the other way round: on Linux, a futex also works across processes, but there is no equivalent in Windows. (Sadly, WaitOnAddress can only be used in a single process.)

It's very often used for thread management inside a single process, etc. Very convenient. Nobody says it has to be the default.

> How often does a video game really need a lock that can be shared between processes,

That seems hugely useful for interprocess communication and I can immediately think of reasons to use IPC in a game. Having a separate voice process for one.

But that goes back to "how often". Not how many games use it, but how many times per second they use it. You might touch your voice process lock once per frame? That's negligible in terms of CPU time. Any half-reasonable overhead makes no difference in that lock, but might have a big impact in a more common lock.

It absolutely can make a difference, because if you have locks that are supposed to sync or wake up other processes, you care about latency, not CPU usage.

What specifically are you saying can make a difference?

I'm saying that extra overhead from making your lock work across processes should be very tiny. That overhead shouldn't add much more than a microsecond in either latency or CPU usage, compared to an in-process lock.

You were saying "reasonable overhead" makes no difference because something "isn't called much". That's not only ambiguous but also untrue, because latency matters.

What calls specifically are you talking about between Windows and Linux? This was started by someone talking about WaitForMultipleObjects.

I wasn't excusing all overhead, I was excusing the difference in overhead caused by making the lock more flexible. Because that's what the discussion is about, a lock that can be shared between processes versus a lock that can't be. The penalty for being "too flexible".

But assuming reasonable implementations, the difference between those two lock styles shouldn't be more than about a microsecond, should it? So that's fine for a lock that's only used 100 times a second.

I'm not comparing Windows and Linux anywhere.

> I was excusing the difference in overhead caused by making the lock more flexible

What are the two functions you're comparing and what is the actual difference in overhead that you're talking about?

> a lock that can be shared between processes versus a lock that can't be.

This is a dramatic black-and-white difference; these would be used for two different things. In that case it's apples and oranges: one would be for interprocess communication and one wouldn't.

> the difference between those two lock styles shouldn't be more than about a microsecond,

What are you basing this on? Do you have an examples or benchmarks of the actual calls and their timings?

> fine for a lock that's only used 100 times a second.

Again, latency isn't about how many times something is called per second. That would matter for throughput.

It's IO completion ports I miss.