Very cool, thanks. I hadn’t seen Graft before, but that sounds pretty adjacent in a lot of interesting ways. I'll look at the repo and see what I can apply.
I've tried out all sorts of optimizations. For free pages, I've considered leaving empty space in each S3 object and serving those slots as free pages, to get efficient writes without shuffling pages around too much. My current bias has been to over-store a little if it keeps the read path simpler, since the main goal so far has been making cold reads plausible rather than maximizing space efficiency, especially since free pages compress well.
I have two related roadmap items: hole-punching and LSM-like writing. For local, non-HDD storage, we can evict empty pages automatically by punching holes, which releases the empty page space back to the OS. For writes, LSM is best because it groups related things together, which is what we need, but that would mean doing a lot of rewriting on checkpoint. So both of these feel a little premature to optimize for vs. other things.
Both of those roadmap items make sense! Excited to see how you evolve this project!
Thanks! I think hole punching is the key one, since it covers the case of "this delete has happened but we need to free the space without a vacuum".
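For anyone following along, here is a rough sketch of what hole punching looks like on Linux. It is not code from the project: `punch_page` and `page_is_zero` are hypothetical helpers, and the fixed page size and page numbering are assumptions for illustration. It relies on `fallocate(2)` with `FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE`, which only works on filesystems that support it (ext4, XFS, Btrfs, and tmpfs do), so the caller has to handle `EOPNOTSUPP`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes fallocate() and the FALLOC_FL_* flags */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: release the disk blocks behind one freed page.
 * The file keeps its length, and the punched range reads back as zeros.
 * Returns 0 on success, or an errno value such as EOPNOTSUPP. */
int punch_page(int fd, off_t page_no, off_t page_size) {
    assert(page_no >= 0 && page_size > 0);
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  page_no * page_size, page_size) == -1)
        return errno;
    return 0;
}

/* Hypothetical helper: returns 1 if the page reads back as all zeros,
 * 0 if it contains any non-zero byte, and -1 on allocation or I/O error. */
int page_is_zero(int fd, off_t page_no, size_t page_size) {
    unsigned char *buf = malloc(page_size);
    if (buf == NULL)
        return -1;
    ssize_t n = pread(fd, buf, page_size, page_no * (off_t)page_size);
    int result = (n == (ssize_t)page_size) ? 1 : -1;
    for (size_t i = 0; result == 1 && i < page_size; i++)
        if (buf[i] != 0)
            result = 0;
    free(buf);
    return result;
}
```

The point of `FALLOC_FL_KEEP_SIZE` here is that page offsets after the hole stay valid, so a delete can give space back to the OS immediately without rewriting or compacting the file the way a vacuum would. Holes are freed in units of filesystem blocks, so this works best when the page size is a multiple of the block size.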