I use DuckDB HEAVILY at work and it's been a game changer. I'm sifting through terabytes of data multiple times a day: mixing, matching, updating, filtering. DuckDB is second to none. For anyone who hasn't used it: you are missing out.
This may be useful for somebody: we also use DuckDB heavily at my workplace (we do tax analytics for very large companies with huge amounts of data). Some of our DuckDB processes run on AWS infrastructure, where the data is stored on gp3 EBS volumes.
We didn't know that for gp3 volumes you can increase not only IOPS but also read/write throughput [1], which defaults to 125 MB/s. So by default we were not seeing the performance we expected.
Once we increased the throughput of the EBS volume, it was amazing. So if you are not seeing the performance you read about online when using DuckDB, it may be something like that.
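To put rough numbers on why the default hurts, here is a back-of-the-envelope sketch. It assumes a purely sequential, throughput-bound scan with no caching or compression effects; the 1000 MB/s figure is only a hypothetical raised setting for illustration, not what we actually provisioned, and `scan_seconds` is a made-up helper name.

```python
def scan_seconds(data_bytes: float, throughput_mb_s: float) -> float:
    """Lower-bound time to read data_bytes sequentially at throughput_mb_s (MB/s)."""
    return data_bytes / (throughput_mb_s * 1_000_000)

ONE_TB = 1_000_000_000_000  # 1 TB in bytes (decimal units)

baseline = scan_seconds(ONE_TB, 125)   # gp3 default throughput mentioned above
raised = scan_seconds(ONE_TB, 1000)    # hypothetical raised throughput (assumption)

print(f"125 MB/s:  {baseline / 3600:.1f} hours per TB scanned")
print(f"1000 MB/s: {raised / 60:.1f} minutes per TB scanned")
```

So at the default, every full pass over a terabyte costs hours of pure disk time before DuckDB does any work, which is why the engine looked slow when the volume was the bottleneck.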
[1] https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-p...
This seems crazy low to me. AWS defaults to 3K IOPS and 125 MB/s throughput, while my MacBook Pro does 700K IOPS and 14.5 GB/s.
Is Amazon running on super outdated legacy networking?
SAN vs. local. Local NVMe ("instance storage") on AWS is wicked fast too, but it lives and dies with the instance.