is this an attempt at nerd sniping? ;-)
on GPU databases sometimes we go up to the GB range per "item of work" (input permitting) as it's very efficient.
I need to add it to my TODO list to have a look at your github code...
It definitely worked on me :)
Do have a look, I've tried to roughly keep it small and readable. It's ~250 LOC effectively.
Also, this is CPU only. I'm not super sure what a good GPU version of my benchmark would be, though ... Maybe measuring a "map" more than a "reduction" like I do on the CPU? We should probably take a look at common chunking patterns there.
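To make the "map" vs "reduction" distinction concrete, here is a rough CPU-side sketch in Python. It is not taken from the benchmark code; the names `chunks`, `chunked_reduce`, and `chunked_map` are made up for illustration, and the doubling and summing operations are placeholders for whatever the real workload would be.

```python
from typing import Iterator, List, Sequence


def chunks(data: Sequence[int], chunk_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most chunk_size items (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def chunked_reduce(data: Sequence[int], chunk_size: int) -> int:
    """Reduction: each chunk collapses to one partial result, then partials are combined."""
    return sum(sum(chunk) for chunk in chunks(data, chunk_size))


def chunked_map(data: Sequence[int], chunk_size: int) -> List[int]:
    """Map: each chunk produces as many outputs as inputs, so output size grows with the data."""
    out: List[int] = []
    for chunk in chunks(data, chunk_size):
        out.extend(x * 2 for x in chunk)  # placeholder element-wise operation
    return out


if __name__ == "__main__":
    values = list(range(10))
    print(chunked_reduce(values, 3))
    print(chunked_map(values, 3))
```

The point of the contrast: a reduction only writes one value per chunk, while a map writes a full chunk of output, so the two would likely stress memory traffic differently when the chunk size changes.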