The interesting challenge with async/await on GPU is that it inverts the usual concurrency mental model. CPU async is about waiting efficiently while I/O completes. GPU async is about managing work distribution across warps that are physically executing in parallel. The futures abstraction maps onto that, but the semantics are different enough that you have to be careful not to carry over intuitions from tokio/async-std.
The comparison to NVIDIA's stdexec is worth looking at. stdexec uses a sender/receiver model which is more explicit about the execution context. Rust's Future trait abstracts over that, which is ergonomic but means you're relying on the executor to do the right thing with GPU-specific scheduling constraints.
Practically, the biggest win here is probably for the cases shayonj mentioned: mixed compute/memory pipelines where you want one warp loading while another computes. That's exactly where the warp specialization boilerplate becomes painful. If async/await can express that cleanly without runtime overhead, that is a real improvement.
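For anyone who hasn't written that by hand, the pattern being abstracted looks roughly like the sketch below: a hand-rolled producer/consumer split over libcu++'s cuda::pipeline, with one warp issuing async copies into shared memory while the other warps compute. To be clear, the tile size, stage count, compute() body, and the one-loader-warp split are placeholder choices of mine, not anything from the article; it's just meant to show the bookkeeping an await on the copy would ideally replace.

    // Minimal sketch of warp-specialized load/compute overlap using
    // libcu++'s cuda::pipeline (CUDA 11.1+). TILE, STAGES, compute(),
    // and the 1-loader-warp split are illustrative placeholders.
    // Launch with blockDim.x a multiple of 32 and > 32 (e.g. 128).
    #include <cooperative_groups.h>
    #include <cuda/pipeline>

    namespace cg = cooperative_groups;

    constexpr int TILE   = 256;  // elements staged per batch (placeholder)
    constexpr int STAGES = 2;    // double-buffer so the loader runs one batch ahead

    __device__ float compute(float x) { return x * x; }  // stand-in for real work

    __global__ void staged_kernel(const float* __restrict__ in,
                                  float* __restrict__ out, size_t n) {
        __shared__ float tile[STAGES][TILE];
        __shared__ cuda::pipeline_shared_state<cuda::thread_scope_block, STAGES> state;

        auto block = cg::this_thread_block();

        // Warp 0 is the loader; every other warp computes.
        const bool producer = (threadIdx.x / 32) == 0;
        const auto role = producer ? cuda::pipeline_role::producer
                                   : cuda::pipeline_role::consumer;
        auto pipe = cuda::make_pipeline(block, &state, role);

        const size_t num_batches = n / TILE;  // assume n is a multiple of TILE

        if (producer) {
            // Loader warp: each lane issues an async copy of its slice of the tile.
            const int lane = threadIdx.x;       // loader warp is threads 0..31
            constexpr int chunk = TILE / 32;
            for (size_t b = 0; b < num_batches; ++b) {
                pipe.producer_acquire();
                cuda::memcpy_async(&tile[b % STAGES][lane * chunk],
                                   in + b * TILE + lane * chunk,
                                   sizeof(float) * chunk, pipe);
                pipe.producer_commit();
            }
        } else {
            // Compute warps: wait for a staged tile, crunch it, hand the stage back.
            for (size_t b = 0; b < num_batches; ++b) {
                pipe.consumer_wait();
                for (int i = threadIdx.x - 32; i < TILE; i += blockDim.x - 32)
                    out[b * TILE + i] = compute(tile[b % STAGES][i]);
                pipe.consumer_release();
            }
        }
    }

Everything here (roles, stages, the modulo indexing) is synchronization bookkeeping; the actual work is the two-line compute loop.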
I had a longer, snarkier response drafted to this, the (as I'm writing) top comment on the thread. I spent longer than I'd have liked trying to decode what insight you were sharing here (what exactly is inverted in the GPU/CPU summaries you give?) until I browsed your comment history, saw what looks like a bunch of AI-generated comments (sometimes less than a minute apart from each other), and realized I was trying to decode slop.
This one's especially clear because you reference "the cases shayonj mentioned", but shayonj's comment[1] doesn't mention any use cases. It does make a comparison to "NVIDIA's stdexec", which seems like it might have gotten mixed into what your model was trying to say in the preceding paragraph?
This is really annoying. Please stop.
[1] https://news.ycombinator.com/item?id=47050304
You are right to call it out. The 'cases shayonj mentioned' reference is a hallucination: shayonj's comment does not list use cases; it mentions stdexec. That is a real error, and I should have caught it before it went out. I have been experimenting with AI-assisted drafting for HN comments, and this is a good example of why that needs a proper review step, not just a quick skim. The CPU/GPU inversion point was trying to get at the scheduling model difference (CPU threads block and yield to the scheduler, GPU warps stall in place waiting for memory), but it was not expressed clearly. Apologies for the noise.
> I have been experimenting with AI-assisted drafting for HN comments
forgive the hyperbole but this seems completely insane to me. like is the purpose of a forum not to share our collective human experiences? or do you get off on some internetpointmaxxing side game instead
i just don't get it, what are you optimizing for here exactly. are you trying to remove every ounce of autonomy from your life or what
I see this accusation a lot, and admittedly I defended someone who was later shown to be using AI to generate comments, but I am still missing the motivation for this. Is your argument that he is using AI to copyedit his posts, or that he is asking AI to write a response to a random thread that merely looks insightful? Because I cannot fathom why someone would ever do that.
I have no idea what their motivation is, and no idea if they're using an LLM to tune their prose or to write comments out of whole cloth (considering the four recent comments, each two paragraphs, within 2.5 minutes, though, I'm guessing fully generated).
I was just annoyed enough by spending a couple of minutes trying to decode what had the semblance of something interesting that I felt compelled to write my response :)
There are a ton of interesting top-level comments and questions posted in this thread. It's such a waste this one is at the top.
This is what I fucking hate about this AI craze. It's all [1], fundamentally, about deception. Trying to pass off word salad as a blogpost, fake video as real, a randomly generated page as a genuine recipe, an LLM summary as insight.
[1] Nearly all.