Just started this weekend on https://gitlab.com/get-otter/otter-sdk
It's a dbt-inspired streaming ETL tool (or maybe just the TL?). Right now it only has a dev mode that reads from RabbitMQ and writes local Parquet files while I get the core to a place I'm happy with.
It runs SQL models against the incoming messages and writes the results to one or more output tables. It has a local WAL so you can tune it to produce sensibly sized output files (or not, if you need regular updates at the expense of query perf).
Planning on adding Protobuf message support, Kafka as a source, and S3 and Iceberg tables as sinks this week.
Lightly inspired by some projects at work where a lot of time and effort went into doing this and the result wasn't very reusable without a lot of refactoring. I feel like the stream -> data lake pattern should be just SQL + config, the same way dbt is for transformations within a data warehouse.
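To make the "just SQL + config" idea concrete, a model might look something like this (completely hypothetical syntax, made-up names, not the SDK's actual DSL):

```sql
-- models/page_views.sql (hypothetical example)
-- Runs against each batch of incoming messages; results land in the
-- page_views output table, with the source queue declared in config.
SELECT
    payload ->> 'user_id'            AS user_id,
    payload ->> 'url'                AS url,
    (payload ->> 'ts')::timestamp    AS event_time
FROM source('rabbitmq', 'events')   -- source() is a placeholder, dbt-style
WHERE payload ->> 'event_type' = 'page_view'
```

The config side would just map sources to queues/topics and outputs to Parquet/Iceberg locations, plus WAL/file-size tuning. No code in the happy path.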
No plans on adding any cross-message joins or aggregations, as that would require cross-worker communication and I explicitly want to keep the workers stateless (minus the WAL, of course).
Would really appreciate any feedback on the core concept, esp. whether this is something you'd actually use in prod (if it were finished!). Not sure if there's something that already does this that I don't know about, or if this genuinely fills a hole in the existing tooling.