Hacker News

> There's so many moving parts here.

Yep. At the scope of a single table, append-only history is nice but you're often after a clone of your source table within Iceberg, materialized from insert/update/delete events with bounded latency.
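To make the history-vs-clone distinction concrete, here's a minimal sketch. The event shape (`op`, `key`, `row`) is made up for illustration and isn't any particular connector's format; it also assumes every insert/update carries a full row image.

```python
from typing import Any


def apply_event(table: dict[Any, dict], event: dict) -> None:
    """Apply one CDC event to an in-memory clone keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        # Upsert: the full row image replaces whatever was there.
        table[key] = dict(event["row"])
    elif op == "delete":
        table.pop(key, None)
    else:
        raise ValueError(f"unknown op: {op!r}")


# Hypothetical event stream for a single source table.
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "a"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "b"}},
    {"op": "update", "key": 1, "row": {"id": 1, "name": "a2"}},
    {"op": "delete", "key": 2},
]

history = list(events)  # append-only: every event is kept as-is

clone: dict[Any, dict] = {}  # materialized: only the current rows survive
for e in events:
    apply_event(clone, e)
```

The append-only `history` keeps all four events, while `clone` ends up holding just the current version of row 1.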

There are also nuances like Postgres REPLICA IDENTITY and TOAST columns. Enabling REPLICA IDENTITY FULL amplifies your source DB's WAL volume, but not having it means your CDC updates will clobber your unchanged TOAST values.
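A rough sketch of the clobbering problem, assuming the CDC stream marks an omitted TOAST column with some placeholder. The `UNCHANGED` sentinel and both function names here are hypothetical, not a real connector API.

```python
# Hypothetical marker for "this TOASTed column wasn't modified, so its value
# was left out of the update event". Real connectors use their own placeholder.
UNCHANGED = object()


def naive_update(current: dict, update: dict) -> dict:
    """Treat the event as a full row image; an omitted value becomes NULL."""
    return {col: (None if val is UNCHANGED else val) for col, val in update.items()}


def merge_update(current: dict, update: dict) -> dict:
    """Keep the stored value for any column the event marks as unchanged."""
    merged = dict(current)
    for col, val in update.items():
        if val is not UNCHANGED:
            merged[col] = val
    return merged


row = {"id": 1, "status": "new", "body": "some large TOASTed text"}
update = {"id": 1, "status": "done", "body": UNCHANGED}

clobbered = naive_update(row, update)
preserved = merge_update(row, update)
```

`clobbered["body"]` comes out as `None`, while `preserved["body"]` still holds the original text. The merge approach only works if the destination can read back the current row, which is part of why this gets awkward for append-heavy targets.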

If you're moving multiple tables, ideally your multi-table source transactions map into corresponding Iceberg transactions.
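One way to picture that mapping, as a sketch only: group the change stream by source transaction before committing anything downstream. The `txid`/`table` fields are assumed names, and this assumes each transaction's events arrive contiguously in commit order. How the grouped batch is then committed atomically on the Iceberg side isn't covered here.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator


def transactions(events: Iterable[dict]) -> Iterator[tuple[int, dict[str, list[dict]]]]:
    """Yield (txid, {table: [events]}) for each run of events sharing a txid."""
    for txid, group in groupby(events, key=itemgetter("txid")):
        per_table: dict[str, list[dict]] = {}
        for e in group:
            per_table.setdefault(e["table"], []).append(e)
        yield txid, per_table


# Hypothetical stream: transaction 100 touches two tables, 101 touches one.
events = [
    {"txid": 100, "table": "orders", "op": "insert", "key": 1},
    {"txid": 100, "table": "order_items", "op": "insert", "key": 10},
    {"txid": 101, "table": "orders", "op": "update", "key": 1},
]

batches = list(transactions(events))
```

Each element of `batches` is then the unit you'd want to land as a single destination commit, so readers never see `orders` without its matching `order_items`.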

Zooming out, there's the orchestration concern of propagating changes to table schema over time, or handling tables that come and go at the source DB, or adding new data sources, or handling sources without trivially mapped schema (legacy lakes / NoSQL / SaaS).

As an on-topic plug, my company tackles this problem [0]. Postgres => Iceberg is a common use case.

[0] https://docs.estuary.dev/reference/Connectors/materializatio...

gjvc 3 days ago

can you explain this please "not having it means your CDC updates will clobber your unchanged TOAST values" ?

sgarland 3 days ago

They’re referring to this: https://debezium.io/blog/2019/10/08/handling-unchanged-postg...

gunnarmorling 3 days ago

Funny timing, just took a fresh look at this topic in this new post earlier this week: https://www.morling.dev/blog/backfilling-postgres-toast-colu....

lmz 3 days ago

This may be helpful for you https://clickhouse.com/docs/integrations/clickpipes/postgres...