I’m curious how this technique works, or not, with unified memory architectures such as Apple’s M series. It seems like it relies on overlapping processes to speed things up, but I would assume that having everything unified in main memory, so you don’t have to transfer everything back and forth to the GPU, would also have some advantages. Can someone wiser explain this to me?

For FP16-native training of 100B+ models, you will probably still be offloading to swap unless you've got a $150,000 RDMA Mac Studio cluster: the weights, gradients, and optimizer state alone run into the terabytes, well past the unified memory on any single machine. And even if you could fit it all in memory, the workload would be deeply compute-constrained anyway.
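
To make that concrete, here's a rough back-of-envelope sketch (my own assumptions, not anything from the article): with a fairly standard mixed-precision Adam setup you carry something like 16 bytes of training state per parameter, so 100B parameters is on the order of 1.6 TB before activations, versus ~192 GB of unified memory on the largest Mac Studio.

```python
# Back-of-envelope estimate of training-state memory for mixed-precision Adam.
# Assumed per-parameter breakdown (illustrative, not a universal rule):
#   FP16 weights (2 B) + FP16 gradients (2 B)
#   + FP32 master weights (4 B) + FP32 Adam momentum (4 B) + FP32 Adam variance (4 B)
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # = 16 bytes; activations/KV caches excluded

def training_memory_tb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Estimate training-state memory (weights + grads + optimizer) in terabytes."""
    return num_params * bytes_per_param / 1e12

if __name__ == "__main__":
    for billions in (100, 175, 400):
        tb = training_memory_tb(billions * 1e9)
        # A 192 GB Mac Studio holds ~0.19 TB of unified memory, so each of
        # these would need many machines or aggressive offloading to swap.
        print(f"{billions:>4}B params -> ~{tb:.1f} TB of training state")
```

Swap the byte counts for whatever optimizer and precision scheme you actually use; the point is just that the totals dwarf any single box's memory, unified or not.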