Hacker News

I would guess this is just to make the explanation of the bug easier.

In real world, the futurelock could occur even with very short locks, it just wouldn't be so deterministic. Having a minimal reproducer that you have to run a thousand times and it will maybe futurelock doesn't really make for a good example :)

imtringued 2 days ago [ - ]

>In real world, the futurelock could occur even with very short locks, it just wouldn't be so deterministic.

You have to explain the problem properly then. The problem here has nothing to do with duration whatsoever so don't bring that up. The problem here is that if you acquire a lock, you're inside a critical section. Critical sections have a programming paradigm that is equivalent to writing unsafe Rust. You're not allowed to panic inside unsafe Rust or inside critical sections. It's simply not allowed.

You're also not allowed to interrupt the critical section by something that does not have a hard guarantee that it will finish. This rules out await inside the critical section. You're not allowed to do await. It's simply not allowed. The only thing you're allowed to do is execute an instruction that guarantees that N-1 instructions are left to be executed, where N is a finite number. Alternatively you do the logical equivalent. You have a process that has a known finite bound on how long it will take to execute and you are waiting for that external process.

After that process has finished, you release the lock. Then you return to the scheduler and execute the next future. The next future cannot be blocked because the lock has already been released. It's simply impossible.

You now have to explain how the impossible happened. After all, by using the lock you've declared that you took all possible precautions to avoid interrupting the critical section. If you did not, then you deserve any bugs coming your way. That's just how locks are.

dap a day ago [ - ]

I think you misunderstand the problem. The only purpose of the sleep in this example is to control interleaving of execution to ensure the problem happens. Here's a version where the background task (the initial lock holder) only runs a bounded number of instructions with the lock held, just as you suggest:

https://play.rust-lang.org/?version=stable&mode=debug&editio...

It still futurelocks.

> After that process has finished, you release the lock. Then you return to the scheduler and execute the next future. The next future cannot be blocked because the lock has already been released. It's simply impossible.

This is true with threads and with tasks that only ever poll futures sequentially. It is not true in the various cases mentioned in this RFD (notably `tokio::select!`, but also others). Intuitively: when you have one task polling on multiple futures concurrently, you're essentially adding another layer to the scheduler (kernel thread scheduler, tokio task scheduler, now some task is acting as its own future scheduler). The problem is it's surprisingly easy to (1) not realize that and (2) accidentally have that "scheduler" not poll the next runnable future and then get stuck, just like if the kernel scheduler didn't wake up a runnable thread.