Cool, but I'd abstract away the infra knowledge, down to the actual instance types. I'd expect the Polars Cloud abstraction to find me the most cost-effective (spot) instance that meets my CPU, memory, and disk requirements. Looking at the example, why do I have to give it the AWS instance type?
You don't have to. Passing cpu and memory works as well.
We are working on a minimal cluster and auto-scaling based on the query.
Nice!
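For readers wondering what "pass cpu and memory instead of an instance type" amounts to, here is a minimal sketch of the selection step: filter a catalog by the minimum requirements, then take the cheapest match. This is not how Polars Cloud does it internally. The `InstanceOffer` type, the `cheapest_matching` helper, the instance names, and the prices are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceOffer:
    name: str            # hypothetical name, not a real AWS instance type
    cpus: int
    memory_gb: int
    hourly_price: float  # made-up price, for illustration only


def cheapest_matching(
    offers: Sequence[InstanceOffer], min_cpus: int, min_memory_gb: int
) -> Optional[InstanceOffer]:
    """Return the lowest-priced offer that meets both minimums, or None."""
    candidates = [
        o for o in offers
        if o.cpus >= min_cpus and o.memory_gb >= min_memory_gb
    ]
    return min(candidates, key=lambda o: o.hourly_price, default=None)


# Illustrative catalog; a real service would query current (spot) pricing.
CATALOG = [
    InstanceOffer("small-a", cpus=4, memory_gb=16, hourly_price=0.10),
    InstanceOffer("medium-a", cpus=8, memory_gb=32, hourly_price=0.20),
    InstanceOffer("medium-b", cpus=8, memory_gb=64, hourly_price=0.26),
    InstanceOffer("large-a", cpus=16, memory_gb=64, hourly_price=0.40),
]

choice = cheapest_matching(CATALOG, min_cpus=8, min_memory_gb=48)
print(choice.name if choice else "no match")
```

With these made-up numbers, asking for at least 8 CPUs and 48 GB skips the cheaper offers that lack memory and picks the cheapest one that satisfies both constraints.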
Ritchie, I'm curious: you mentioned in other responses that the SQL context stuff is out of scope for now. But I thought the SQL things were basically syntactic sugar over the dataframes; in other words, they both "compile" down to the same thing. If that's true, shouldn't running arbitrary SQL queries be doable out of the box?
Not right now. Our current SQLContext inspects schemas locally to convert the SQL to Polars LazyFrames (DSL).
However, this should happen during IR resolution. That is, the SQL should translate directly to Polars IR, not to LazyFrames. That way we can inspect and resolve all schemas server-side.
It requires a rewrite of our SQL translation in OSS. This should not be too hard, but it is quite some work. It's work we will eventually get to.
Thanks for the context.