Sorry, I should have elaborated. I believe that copy-on-write with virtual memory (VM) can be used to achieve a runtime that appears to use copy-by-value everywhere with near-zero overhead when the VM block size is small, like 4k.
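
A minimal sketch of what I mean (my own toy, assuming a POSIX system with 4k pages): the kernel's page-level copy-on-write across fork() already gives the child something that behaves like a private copy of a buffer, but physical pages only get duplicated for the pages the child actually writes.

    /* Toy demo: fork() shares the 1 MiB buffer copy-on-write; the child's
     * single-byte write dirties exactly one 4k page, and the parent's view
     * is untouched. */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        size_t len = 1 << 20;                      /* 1 MiB "value" */
        char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        memset(buf, 'A', len);

        pid_t pid = fork();                        /* pages now shared, copy-on-write */
        if (pid == 0) {
            buf[0] = 'B';                          /* copies one 4k page, not 1 MiB */
            printf("child sees:  %c\n", buf[0]);   /* B */
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        printf("parent sees: %c\n", buf[0]);       /* still A */
        munmap(buf, len);
        return 0;
    }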

If we imagine a function passing a block of memory to subfunctions that may write bytes to it at random offsets, then each of those writes may force a copy-on-write allocation of another block. Since those allocations are the size of a VM block, each invocation can potentially double the amount of memory used.

A do-one-thing-and-do-it-well (DOTADIW?) program works in a one-shot fashion: the main process fires off child processes that do their work, return, and free the memory that was passed by value. The processes are connected by pipes, so data is transformed by each one and sent on to the next. VM usage may grow large temporarily within each process, but overall we can think of each concurrent process as roughly doubling the amount of memory.
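
Something like this toy shape (my illustration, not from any real system): the parent writes a buffer into a pipe, a child stage transforms it and writes the result onward, then exits and frees everything it copied.

    /* One stage of a DOTADIW pipeline: read, transform, write to the next stage. */
    #include <ctype.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int fd[2];
        pipe(fd);
        if (fork() == 0) {                       /* child: one pipeline stage */
            close(fd[1]);
            char buf[64];
            ssize_t n = read(fd[0], buf, sizeof buf);
            if (n < 0) n = 0;
            for (ssize_t i = 0; i < n; i++)
                buf[i] = toupper((unsigned char)buf[i]);   /* the "transform" step */
            write(STDOUT_FILENO, buf, n);        /* stdout would feed the next stage */
            _exit(0);
        }
        close(fd[0]);
        const char *msg = "passed by value\n";
        write(fd[1], msg, strlen(msg));          /* parent hands off the data */
        close(fd[1]);
        wait(NULL);
        return 0;
    }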

Writing this out, I realized that the worst case might be more like every byte changing in a 4k block: if each single-byte write produces its own copy of the block, that's a 4096-times increase in memory. That still might be reasonable, since we accept roughly a 200x speed decrease for scripting languages. It might be worth profiling PHP to see how much memory grows when every byte in a passed array is modified. Maybe it uses a clever tree or refcount strategy to reduce the amount of storage needed when arrays are modified, or maybe it just copies the entire array?

Another avenue of research might be determining whether a smarter runtime could work with "virtual" VMs (VVMs?) that use a really small block size, maybe 4 or 8 bytes to match the memory bus width. I'd be willing to live with a 4x or 8x increase in memory to avoid borrow checkers, refcounts, or garbage collection.
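
I have no idea whether this could be made fast, but here's a hypothetical software-only sketch of that word-granularity idea (all names are mine): a view shares the caller's memory read-only and records just the 8-byte words it writes, so an almost-unmodified "copy" costs a few overlay entries instead of whole 4k pages.

    /* Software copy-on-write at 8-byte granularity: reads fall through to the
     * shared base, writes go into a private overlay of (index, value) patches. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { size_t index; long value; } Patch;

    typedef struct {
        const long *base;        /* shared original words, never modified */
        size_t nwords;
        Patch *patches;          /* private overlay: only the words we wrote */
        size_t npatches, cap;
    } View;

    static long view_get(const View *v, size_t i) {
        for (size_t p = 0; p < v->npatches; p++)   /* a real runtime would index this */
            if (v->patches[p].index == i) return v->patches[p].value;
        return v->base[i];
    }

    static void view_set(View *v, size_t i, long x) {
        for (size_t p = 0; p < v->npatches; p++)
            if (v->patches[p].index == i) { v->patches[p].value = x; return; }
        if (v->npatches == v->cap) {
            v->cap = v->cap ? v->cap * 2 : 8;
            v->patches = realloc(v->patches, v->cap * sizeof *v->patches);
        }
        v->patches[v->npatches++] = (Patch){ i, x };   /* ~2 words of overhead per dirty word */
    }

    int main(void) {
        long base[1024] = {0};
        View v = { base, 1024, NULL, 0, 0 };
        view_set(&v, 42, 7);                             /* dirties 8 bytes, not a 4k page */
        printf("%ld %ld\n", view_get(&v, 42), base[42]); /* prints: 7 0 */
        free(v.patches);
        return 0;
    }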

-

Edit: after all these years, I finally looked up how PHP handles copy-on-write, and unfortunately it does copy the whole array on write:

http://hengrui-li.blogspot.com/2011/08/php-copy-on-write-how...

If I were to write something like this today, I'd maybe use "smart" associative arrays of some kind instead of contiguous arrays, so that only the modified section would get copied. Internally that might be a B-tree with perhaps 8 bytes per leaf, enough to hold a few primitives (1 double, 2 floats, etc.). In practice, a larger leaf size like 16-256 bytes might improve performance at the cost of memory.
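
A sketch of that idea with plain path copying (hypothetical code; I used 8 doubles per leaf rather than 8 bytes just to keep it readable): setting one element copies only the nodes on the path from the root to that leaf and shares the rest with the old version, so both versions stay usable.

    /* Persistent array-as-tree with path copying: writes copy O(log n) nodes. */
    #include <stdio.h>
    #include <stdlib.h>

    #define LEAF_ELEMS 8                       /* small leaves, tunable as noted above */

    typedef struct Node {
        int is_leaf;
        union {
            double elems[LEAF_ELEMS];          /* leaf payload */
            struct { struct Node *left, *right; size_t split; } inner;
        } u;
    } Node;

    static Node *copy_node(const Node *n) {
        Node *c = malloc(sizeof *c);
        *c = *n;
        return c;
    }

    /* Return a new version with element i set to v; the old version stays valid. */
    static Node *tree_set(const Node *n, size_t i, double v) {
        Node *c = copy_node(n);                /* copy only the root-to-leaf path */
        if (n->is_leaf)
            c->u.elems[i] = v;
        else if (i < n->u.inner.split)
            c->u.inner.left = tree_set(n->u.inner.left, i, v);
        else
            c->u.inner.right = tree_set(n->u.inner.right, i - n->u.inner.split, v);
        return c;
    }

    static double tree_get(const Node *n, size_t i) {
        if (n->is_leaf) return n->u.elems[i];
        return i < n->u.inner.split ? tree_get(n->u.inner.left, i)
                                    : tree_get(n->u.inner.right, i - n->u.inner.split);
    }

    int main(void) {
        Node left = { .is_leaf = 1 }, right = { .is_leaf = 1 };   /* 16-element "array" */
        Node root = { .is_leaf = 0,
                      .u.inner = { .left = &left, .right = &right, .split = LEAF_ELEMS } };
        Node *v2 = tree_set(&root, 9, 3.14);   /* copies the root and one leaf; leaks for brevity */
        printf("old: %g  new: %g\n", tree_get(&root, 9), tree_get(v2, 9));  /* old: 0  new: 3.14 */
        return 0;
    }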

Looks like ZFS copy-on-write only copies the blocks within a file that changed, not the entire file. That strategy could be used for a VM so that copy-on-write between processes only copies the 4k blocks that change. Then, if it were a realtime Unix, functions could be synchronous blocking processes that could be called with little or no overhead.
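
As a sketch of that block-pointer idea in user space (hypothetical names; a real VM would do this in the page tables): a value is a small table of pointers to 4k blocks, and a write produces a new table plus one new block, leaving every other block shared with the old version.

    /* ZFS-style block sharing: writing one byte copies one 4k block, not the value. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define BLOCK 4096

    typedef struct {
        size_t nblocks;
        unsigned char **blocks;    /* block pointers; blocks are shared, never mutated */
    } Value;

    /* Return a new version of the value with one byte changed. */
    static Value value_set(Value v, size_t off, unsigned char byte) {
        Value w;
        w.nblocks = v.nblocks;
        w.blocks = malloc(v.nblocks * sizeof *w.blocks);      /* copy the pointer table */
        memcpy(w.blocks, v.blocks, v.nblocks * sizeof *w.blocks);
        size_t b = off / BLOCK;
        w.blocks[b] = malloc(BLOCK);                          /* copy only the touched block */
        memcpy(w.blocks[b], v.blocks[b], BLOCK);
        w.blocks[b][off % BLOCK] = byte;
        return w;
    }

    int main(void) {
        Value v = { 2, malloc(2 * sizeof(unsigned char *)) }; /* 8 KiB value, two blocks */
        for (size_t i = 0; i < v.nblocks; i++)
            v.blocks[i] = calloc(1, BLOCK);
        Value w = value_set(v, 5000, 42);                     /* touches block 1 only */
        printf("old=%d new=%d  block 0 shared: %s\n",
               v.blocks[1][5000 % BLOCK], w.blocks[1][5000 % BLOCK],
               v.blocks[0] == w.blocks[0] ? "yes" : "no");    /* old=0 new=42  shared: yes */
        return 0;
    }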

This is the level of work that would be required to replace Rust with simpler metaphors, and why it hasn't happened yet.