How is this better than the equivalent coroutine code? I don't see any upsides from a user's perspective.
> The main thing is that obviously, you have to worry about the capture lifetime yourself
This is a big deal! The fact that the coroutine frame is kept alive and your state can just stay in local variables is one of the main selling points. I experienced this first-hand when I rewrote callback-style C++ Asio code in the new coroutine style. No more `[self = shared_from_this()]` and other shenanigans!
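For anyone who hasn't run into the idiom, here is a minimal sketch of what the `self = shared_from_this()` capture is doing. It deliberately avoids Asio: `EventLoop`, `Session` and `run_callback_demo` are hypothetical names made up for this illustration, and the "loop" is just a queue of `std::function` objects.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for an async executor: runs queued handlers later.
struct EventLoop {
    std::queue<std::function<void()>> pending;
    void post(std::function<void()> fn) { pending.push(std::move(fn)); }
    void run() {
        while (!pending.empty()) {
            auto fn = std::move(pending.front());
            pending.pop();
            fn();
        }
    }
};

// Callback style: the handler has to keep the session alive by itself.
class Session : public std::enable_shared_from_this<Session> {
public:
    Session(EventLoop& loop, std::string& log) : loop_(loop), log_(log) {}

    void start() {
        // Capturing self extends the Session's lifetime until the handler is destroyed.
        loop_.post([self = shared_from_this()] { self->on_read(); });
    }

private:
    void on_read() { log_ += "read:" + buffer_; }

    EventLoop& loop_;
    std::string& log_;
    std::string buffer_ = "hello";  // state that must survive until the handler runs
};

// Returns what the handler logged after the caller's own shared_ptr is gone.
std::string run_callback_demo() {
    EventLoop loop;
    std::string log;
    std::weak_ptr<Session> watcher;
    {
        auto s = std::make_shared<Session>(loop, log);
        watcher = s;
        s->start();
    }  // the caller's reference is dropped here
    assert(!watcher.expired());  // still alive: the queued handler owns a reference
    loop.run();
    assert(watcher.expired());   // handler ran and was destroyed, so the session is gone
    return log;
}
```

Without the captured `self`, the session would be destroyed at the closing brace and the handler would touch a dead object.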
Using shared_ptr everywhere is an antipattern.
The whole point of controlling the capture is controlling the memory layout, which is what C++ is all about.
Even with Asio, you don't really have to do this. It's just the style the examples follow, and Asio itself isn't necessarily the best design.
With callbacks you have to make sure that your data persists across the function calls. This necessarily requires more heap allocations (or copies) than in a coroutine where most data can just live on the stack.
A coroutine doesn't do anything more than a callback does -- it's just syntactic sugar.
The default behaviour of many asynchronous systems is to extend the lifetime of context data until all the asynchronous handlers have run. You can also bind the data to the resource instead, which is arguably more elegant, but whether that works depends on how cancellation is implemented.