Love it!

Still don't get why one of the biggest players in the space, Databricks, is overinvesting in Spark. For startups, Polars or DuckDB are completely sufficient. Other companies like Palantir already support bring-your-own-compute.

That's a good question! Especially after Frank McSherry's COST paper [1], it's hard to imagine where the sweet spot for Spark is. I guess for Databricks it makes sense to push Spark, since they are the ones who created it. In a way, it's their competitive advantage.

[1]: https://www.usenix.org/system/files/conference/hotos15/hotos...

Databricks is targeting large enterprises, which have a variety of users. Having both Python and SQL as first-class languages is a selling point.