Their product looks promising. It seems the Postgres schema and writes have to be "Iceberg-aware": special work to get around the fact that every small write produces a new, small Parquet file. That's not the end of the world - but ideally, you wouldn't be aware of Iceberg much at all when using Postgres. That might be a dream, though.
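To make the small-file point concrete, here's a minimal sketch of the failure mode using pyiceberg (the catalog and table names are hypothetical, and I'm assuming a catalog is already configured): every tiny append is its own commit backed by its own small Parquet file, which is exactly what an "Iceberg-aware" write path has to batch or compact away.

```python
# Minimal sketch of the small-file problem, using pyiceberg.
# The catalog name ("default") and table name ("app.events") are hypothetical.
import pyarrow as pa
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
table = catalog.load_table("app.events")

# Each single-row append is a separate commit, producing its own
# small Parquet data file plus new metadata.
for i in range(100):
    table.append(pa.Table.from_pylist([{"id": i, "payload": "x"}]))

# The table is now backed by ~100 tiny files; without periodic
# compaction, every reader pays to open all of them.
```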
Fully using Postgres without awareness of Iceberg would require full decoupling, with a translation layer in between (Debezium or similar CDC tooling - sketched below). That comes with its own problems.
So perhaps some intimacy between the Postgres and Iceberg schemas is a good thing - especially to support transparent schema evolution.
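For a sense of what that decoupled translation layer looks like, here's a minimal sketch consuming Debezium change events from Kafka and applying them to an Iceberg table (the topic, catalog, and table names are all hypothetical; I'm assuming Debezium's JSON converter with schema envelopes, plus the kafka-python and pyiceberg libraries):

```python
# Minimal sketch: Postgres -> Debezium -> Kafka -> Iceberg.
# Topic, catalog, and table names are hypothetical.
import json
from kafka import KafkaConsumer
import pyarrow as pa
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
table = catalog.load_table("analytics.orders")

consumer = KafkaConsumer(
    "pgserver.public.orders",  # Debezium topics are <server>.<schema>.<table>
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    payload = message.value["payload"]
    op = payload["op"]  # "c"=create, "u"=update, "d"=delete, "r"=initial snapshot
    if op in ("c", "r"):
        # Inserts are the easy case: append a one-row Arrow table.
        table.append(pa.Table.from_pylist([payload["after"]]))
    else:
        # Updates and deletes are where "its own problems" live: they
        # need delete files or rewrites, not plain appends.
        ...
```

Even this toy version shows the trade-off: every event is another small commit, and row-level updates/deletes need real merge machinery.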
DuckLake and CrunchyBridge both support SQL queries on the backing Iceberg tables. That's a good option. But a big part of Iceberg's value is being able to read the same tables from Spark, Flink, etc.
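For example, once the table is registered in a shared catalog, a Spark job can query it directly - a sketch assuming a REST catalog at a hypothetical URI and the iceberg-spark-runtime package on the classpath:

```python
# Minimal sketch: reading the same Iceberg table from Spark.
# The catalog name, REST URI, and table name are hypothetical;
# assumes the iceberg-spark-runtime jar is on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "rest")
    .config("spark.sql.catalog.lake.uri", "http://localhost:8181")
    .getOrCreate()
)

# The same Parquet files the Postgres side wrote are now queryable here.
spark.table("lake.app.events").groupBy("id").count().show()
```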
I'd argue the bigger value is keeping the data in one storage place and bringing the compute to it. That works especially well for big-corp use cases where entire divisions of the company go their own way. Throw in M&A activity and it's a good hedge against the unknown (i.e. you might be a Databricks and Azure shop that just bought a Snowflake and AWS company). Keep the data in an open table format, and let everyone query with their preferred engine to their heart's content.
There are two problems being discussed in this article and thread:
1) Combining OLTP and OLAP databases into one system
2) Using an open data format to be able to read/write from many systems (OLTP/Postgres, analytics engines/Spark)
> I'd argue the bigger value is keeping the data in one storage place and bringing the compute to it.
Yes, I agree with you. This observation is the idea behind #2, and why Iceberg has so much momentum now.
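As a concrete illustration of #2: a third engine can scan the very same table straight off object storage, e.g. via DuckDB's iceberg extension (the S3 path is hypothetical, and I'm assuming object-store credentials are already configured):

```python
# Minimal sketch: yet another engine reading the same Iceberg table,
# via DuckDB's iceberg extension. The S3 path is hypothetical.
import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg;")
con.execute("LOAD iceberg;")
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")

# Point iceberg_scan at the table location (or a metadata.json path).
print(con.execute(
    "SELECT count(*) FROM iceberg_scan('s3://lake/app/events')"
).fetchall())
```

No engine "owns" the data; each one just reads the open format.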