.NET has had task-local vars for about a decade now: https://learn.microsoft.com/en-us/dotnet/api/system.threadin...
Python added them in 3.7: https://docs.python.org/3/library/contextvars.html
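For anyone who hasn't used the Python version, here is a minimal sketch of what "task-local" means in practice. The variable name `request_id` and the `handler` coroutine are made up for illustration; only `contextvars.ContextVar` and `asyncio` come from the standard library.

```python
import asyncio
import contextvars

# Hypothetical task-local variable; the name is illustrative only.
request_id = contextvars.ContextVar("request_id", default=None)

async def handler(name, delay):
    # Each task created by gather() runs in its own copy of the context,
    # so this set() is invisible to the other task and to the caller.
    request_id.set(name)
    await asyncio.sleep(delay)  # yield so the two tasks interleave
    return request_id.get()

async def main():
    return await asyncio.gather(handler("a", 0.02), handler("b", 0.01))

results = asyncio.run(main())
```

Each task reads back the value it set itself even though the two interleave, and the default seen outside the tasks is left untouched.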
I'll admit to unfamiliarity with the .NET version, but for Python even `threading.local` is a useless implementation if you care at all about performance.
Performant thread-local variables require ahead-of-time mapping to a 1-or-2-level integer sequence, with a register to quickly locate the base array, and some kind of trap to handle the "not allocated" case. Task-local variables are worse than thread-locals since they are swapped out much more frequently.
This requires special compiler support; a mere library can't provide it.
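To make the layout concrete, here is a rough sketch of the data structure only, assuming a single-level slot array. `SlotVar` and `TaskFrame` are hypothetical names, not any real runtime's API. In a real implementation the slot index would be a compile-time constant and the base array would sit behind a dedicated register; Python can express neither, so this shows the shape rather than the speed.

```python
_MISSING = object()   # sentinel standing in for the "not allocated" trap
_next_slot = 0        # slots are handed out once, when a variable is declared

class SlotVar:
    """Hypothetical variable that is mapped to a fixed integer slot up front."""
    def __init__(self, default):
        global _next_slot
        self.slot = _next_slot
        _next_slot += 1
        self.default = default

class TaskFrame:
    """Hypothetical per-task (or per-thread) base array of slots."""
    def __init__(self):
        self.slots = []

    def get(self, var):
        slots = self.slots
        # Fast path: one bounds check and one indexed load.
        if var.slot < len(slots):
            value = slots[var.slot]
            if value is not _MISSING:
                return value
        # Slow path: the slot was never allocated for this task.
        return var.default

    def set(self, var, value):
        slots = self.slots
        if var.slot >= len(slots):
            # Grow the array, marking the new slots as unallocated.
            slots.extend([_MISSING] * (var.slot + 1 - len(slots)))
        slots[var.slot] = value

counter = SlotVar(0)
label = SlotVar("x")
task_a, task_b = TaskFrame(), TaskFrame()
task_a.set(label, "y")
```

Switching tasks then amounts to swapping which `TaskFrame` is current, which is why the swap frequency matters so much for task-locals.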
I would argue that if you're using Python, you already don't care about performance (unless it's just a little glue between other things).
In .NET they do virtual dispatch via a very basic map-like interface with a bunch of micro-optimized implementations that are swapped in and out as items are added. For each N up to 4 variables, there is a dedicated implementation that stores them as fields and does simple branching to access the right one. Beyond that it becomes an array, and at some point, a proper Dictionary. I don't know the exact perf characteristics, but FWIW I don't recall that ever being a source of an actual, non-hypothetical perf problem. Usually you'll have one local that is an object with a bunch of fields, so you only need one lookup to fetch that, and from there it's as fast as field access.