Thanks for the tips! I'll take a look at perf on Linux. If it supports freestanding programs I'll definitely start using it.
> On 32-bit this was a genuine problem but on 64-bit with large virtual address spaces, the chance of an arbitrary integer being a valid heap pointer drops a lot. Especially if your allocator isn't using the low addresses. So the false retention problem is probably less bad than you'd expect.
False retention could be a problem in spite of that. Lone has moved to an index-based tagged value representation. Heap values are now indexes into a large contiguous array of objects. Pointers are calculated on the fly from the heap base pointer.
I did this in order to support easy zero-copy heap reallocation with mremap. Uncoupling the values from their addresses also paved the way for heap compaction. It did turn the pointers into small integers, though, and small integers are probably extremely common everywhere. I could devise a scheme to XOR some constant into the value in order to randomize it a bit, then undo the operation before computing and dereferencing the actual pointer. Not sure if it's worth the trouble though. I haven't measured anything so far.
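To make the XOR idea concrete, here's a rough sketch of what I have in mind. None of these names or layouts are lone's actual definitions: `struct heap`, `INDEX_MASK`, `encode_index`, `decode_index` and `heap_deref` are all made up for illustration, the constant is arbitrary, and the tag bits are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object and heap layout, for illustration only. */
struct heap_object { uint64_t payload; };

struct heap {
    struct heap_object *base; /* may move when the heap is remapped */
    uint64_t count;           /* number of objects in the array */
};

/* Arbitrary constant; any value with enough high bits set would do. */
#define INDEX_MASK UINT64_C(0x9E3779B97F4A7C15)

/* Scramble an index so that it no longer looks like a small integer. */
static uint64_t encode_index(uint64_t index)
{
    return index ^ INDEX_MASK;
}

/* XOR is its own inverse, so decoding applies the same mask again. */
static uint64_t decode_index(uint64_t encoded)
{
    return encoded ^ INDEX_MASK;
}

/* Compute the pointer on the fly from the current heap base.
 * Returns NULL when the decoded index is outside the heap. */
static struct heap_object *heap_deref(const struct heap *heap, uint64_t encoded)
{
    uint64_t index = decode_index(encoded);
    assert(heap->base != NULL);
    if (index >= heap->count) { return NULL; }
    return &heap->base[index];
}
```

With something like this, a stray small integer such as 2 on the stack decodes to a huge index and gets rejected, while a genuine encoded value still resolves against whatever the heap base currently is. The cost is one extra XOR per dereference.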
> One thing I'd be curious about is how your stack depth reduction after removing the recursive evaluator affects pause time. Conservative GC pause time is often dominated by how much stack there is to scan, so getting rid of recursive eval might have already improved your worst-case pauses more than you realize.
Indeed! I expect the C stack to remain shallow throughout the entire program. It was one of the major wins of the new evaluator design.
I just used a debugger to trace garbage collection during lone's execution of my recursive (fibonacci 10) test program. The garbage collector fired four times. The conservative stack scanner worked up from the bottom of the stack to the top. The difference between top and bottom was 1008 bytes in all four cases. Accesses are 8-byte aligned, so that means 126 iterations. That's just under 16 cache lines of 64 bytes each. I added a counter for conservatively discovered values and the results were: 40, 38, 37, 33. So roughly 26% to 32% of the values scanned were hits. Not too bad I guess?
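For anyone who wants to check the arithmetic, the scan loop boils down to something like the sketch below. It's a simplification and not lone's actual scanner: `scan_range` and `struct scan_result` are names I made up, and the hit test here is just "the word is smaller than the object count", which ignores tags entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: words visited and words that looked like heap values. */
struct scan_result {
    size_t words;
    size_t hits;
};

/* Walk every 8-byte word in [low, high) and count the ones that
 * could plausibly be an index into a heap of heap_count objects. */
static struct scan_result scan_range(const uint64_t *low, const uint64_t *high,
                                     uint64_t heap_count)
{
    struct scan_result result = { 0, 0 };
    assert(low <= high);
    for (const uint64_t *word = low; word < high; ++word) {
        ++result.words;
        if (*word < heap_count) { ++result.hits; } /* simplified hit test */
    }
    return result;
}
```

A 1008-byte range is 1008 / 8 = 126 words, and 1008 / 64 = 15.75 cache lines. The four hit counts work out to 40 / 126 ≈ 31.7%, 38 / 126 ≈ 30.2%, 37 / 126 ≈ 29.4% and 33 / 126 ≈ 26.2%.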