I have a deep appreciation for DuckDB, but I am afraid the confluence of brilliant ideas makes it ever more complicated to adopt —- and DuckLake is another example for this trend.
When I look at SQLite I see a clear message: a database in a file. I think DuckDb is that, too. But it’s also an analytics engine like Polars, works with other DB engines, supports Parquet, comes with a UI, has two separate warehouse ideas which both deviate from DuckDB‘s core ideas.
Yes, DuckLake and Motherduck are separate entities, but they are still part of the ecosystem.
that's a valid concern!
However I'd like to point out that that is exactly the reason why DuckDB relies so heavily on its extension mechanism, even for features that some may consider to be "essential" for an analytical system. Take for example the parquet, json, and httpfs extensions. Also features like the UI you mention are isolated from core DuckDB by living in an extension.
I'd argue that core DuckDB is still very much the same lightweight, portable, no-dependency system that it started out as (and which was very much inspired by how effective SQLite is by being so).
Maybe some interesting behind-the-scenes: to further solidify core DuckDB and guard it from the complexity of its ever growing extension ecosystem, one of the big items currently on our roadmap (see https://duckdb.org/roadmap) is to make significant improvements to DuckDB's stable C extension API.
disclaimer: I work at DuckDB Labs ;)
Thank you for your thoughtful reply. The extension system makes great sense.
But it's also stuff like `"SELECT * FROM my_df"` – It's super cool but why is my database connecting to an in-memory pandas data frame? On the other hand, DuckDB can connect to remote Parquet files and interact with them without (explicitly) importing them.
In these examples, DuckDB feels more like an ephemeral SQL-esque Pandas/Polars alternative rather than a database.
Probably it's just me losing track of what a database is and we've evolved from "a monolithic and permanent thing that you store data on and read data from via queries".
i think "SELECT * FROM my_df" is a convenience from the python module and how tightly integrated it is, but i can’t get this to replicate using the cli or dbeaver or datagrip.
and yes, being able to layer analytical sql on top of your csv/json/parquet/gpx/arrow (but not xml?) is the massive appeal of duckdb for a variety of reasons. it’s a paradigm shift for me as an old timer but it’s also suited my needs quite well over the past few years