I am not an expert on Spark RDDs, but AFAIK they are a lower-level data structure that offers resilience and a lower-level map-reduce API.
Polars Cloud maps the Polars API/DSL to distributed compute. This is more akin to Spark's high-level DataFrame API.
With regard to implementation, we create stages that run parts of Polars IR (internal representation) on our OSS streaming engine. Those stages run on one or many workers and produce data that is shuffled between stages. The scheduler is responsible for creating the distributed query plan and for work distribution.
Can you tell us a little about the status of Iceberg write support? Partitioning, maintenance, etc.
We have full Iceberg read support. We have done some preliminary work on Iceberg write support. I think we will ship that once we have decided which catalog we will add, as the Iceberg write API is intertwined with that.