These days you could probably build a pretty performant numpy like using shared memory with Arrow format and IPC for control. Though it would be considerably more complex and not at all easier than FFI...