Also, I learned that Hive-partitioned Parquet on S3 is much slower than on disk.
S3 is high latency unless you use S3 Express One Zone (the low-latency version).
We used EFS (not EBS) and it was much faster.
Test out the NVMe drives though. They’re blazing.
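
If you want to check this on your own data, here is a rough sketch of how the comparison could be timed. It assumes pyarrow is installed and that the same Hive-partitioned dataset is reachable at each location. The helper names (`time_read`, `read_hive_dataset`, `compare_backends`) are made up for illustration, not something we actually ran.

```python
import statistics
import time


def time_read(read_fn, repeats=5):
    """Call read_fn() `repeats` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def read_hive_dataset(path):
    """Read a Hive-partitioned Parquet dataset fully into memory."""
    # Imported lazily so the timing helper above works without pyarrow.
    import pyarrow.dataset as ds

    return ds.dataset(path, format="parquet", partitioning="hive").to_table()


def compare_backends(paths, repeats=5):
    """Return {label: median seconds} for each location in `paths`.

    `paths` maps a label to a dataset location, e.g. an s3:// URI or a
    local mount point. All labels and paths are supplied by the caller.
    """
    return {
        label: time_read(lambda p=path: read_hive_dataset(p), repeats)
        for label, path in paths.items()
    }
```

You would call it with something like `compare_backends({"s3": "s3://my-bucket/events/", "efs": "/mnt/efs/events/", "nvme": "/mnt/nvme/events/"})`, where those paths are placeholders. Keep in mind that repeated reads from a local mount can be served from the OS page cache, so the first run is usually the most honest number for disk.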