When DuckDB queries across multiple sources (say, Postgres and a CSV), does it first load all the data into DuckDB, or is it smart enough to pull only the minimal data needed for the query on the fly?
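For context, a cross-source query of the kind being asked about might look like the sketch below in DuckDB SQL. The connection string, schema, table, and file names are all made-up placeholders; it assumes the postgres extension is available.

    -- Hypothetical example: join a Postgres table against a local CSV.
    INSTALL postgres;
    LOAD postgres;
    ATTACH 'dbname=shop host=localhost user=alice' AS pg (TYPE postgres);

    SELECT o.order_id, o.total, c.segment
    FROM pg.public.orders AS o
    JOIN read_csv('customers.csv') AS c
      ON o.customer_id = c.customer_id
    WHERE o.order_date >= DATE '2024-01-01';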

It seems possible; this appears to be done in other modes.

Quote from Google AI Mode:

"DuckDB offers robust capabilities for querying data stored partially on S3, particularly when dealing with Parquet files. This is achieved through several optimization techniques:

Predicate Pushdown: When you apply a WHERE clause to filter data, DuckDB can "push down" this filter directly into the Parquet file scan. If the Parquet file contains zonemaps (metadata about value ranges within columns), DuckDB can use this information to skip reading entire sections of the file that do not contain relevant data, significantly reducing the amount of data transferred from S3.

Projection Pushdown: When you select only specific columns in your SELECT statement, DuckDB automatically reads only those required columns from the Parquet file. This means you avoid downloading and processing unnecessary data, leading to faster queries and reduced S3 transfer costs.

HTTP Range Reads: DuckDB leverages HTTP range headers when interacting with S3 (or other object storage supporting range reads). This allows it to fetch only the necessary parts of the Parquet file, such as metadata or specific column chunks, rather than downloading the entire file."
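To make the quoted points concrete, here is a small sketch of a partial read from S3 in DuckDB SQL. The bucket path and column names are invented, and it assumes S3 credentials are already configured (for example via CREATE SECRET).

    -- The httpfs extension provides s3:// support; read_parquet scans Parquet directly.
    INSTALL httpfs;
    LOAD httpfs;

    -- Projection pushdown: only the listed columns are fetched.
    -- Predicate pushdown: row groups whose min/max statistics exclude the
    -- filter can be skipped, so only the matching byte ranges are downloaded.
    SELECT trip_id, fare_amount
    FROM read_parquet('s3://my-bucket/trips/*.parquet')
    WHERE pickup_date = DATE '2024-06-01';

    -- EXPLAIN should show the Parquet scan with its pushed-down filters
    -- and projected columns, without executing the query.
    EXPLAIN
    SELECT trip_id, fare_amount
    FROM read_parquet('s3://my-bucket/trips/*.parquet')
    WHERE pickup_date = DATE '2024-06-01';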

"How does it handle [multi-source] joins?" is the obvious next question.

In memory; if the data is larger than memory, it spills to temporary (.duckdbtmp) files on disk and works from those.
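As an illustration of that spill-to-disk behaviour, here is a hedged sketch: the table names are placeholders, and duckdb_temporary_files() is assumed to be available in your DuckDB version.

    -- Cap DuckDB's memory so a large join is forced to spill to disk.
    SET memory_limit = '2GB';

    -- Run a join that exceeds the limit; the out-of-core hash join
    -- writes intermediate partitions to DuckDB's temp directory.
    EXPLAIN ANALYZE
    SELECT e.region, count(*)
    FROM big_events AS e
    JOIN big_users  AS u ON e.user_id = u.user_id
    GROUP BY e.region;

    -- Inspect the temporary files DuckDB has written while spilling.
    SELECT * FROM duckdb_temporary_files();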

So when you start doing crazy joins, an NVMe drive plus a big amount of RAM will really help DuckDB in its work. Right?
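For what it's worth, the knobs for that are ordinary settings; the path and sizes below are made-up examples.

    -- Give DuckDB more RAM to work with before it starts spilling.
    SET memory_limit = '32GB';

    -- Point spill files at a fast NVMe mount (path is hypothetical).
    SET temp_directory = '/mnt/nvme/duckdb_tmp';

    -- Optionally pin the number of worker threads.
    SET threads = 16;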