You can do row-by-row appends to a Feather file (Arrow IPC; the naming is confusing). It works fine. The main problem is that the per-append overhead is kind of silly: it costs over 300 bytes (IIRC) per append.
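For anyone curious what that looks like in practice, here's a minimal sketch with arrow-rs (the two-column schema and file name are made up): each append is its own one-row RecordBatch, and every write carries its own IPC message framing and padding, which is where the fixed per-append overhead comes from.

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::ipc::writer::StreamWriter;
use arrow::record_batch::RecordBatch;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // One fixed schema for the whole stream.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("event", DataType::Utf8, false),
    ]));

    let file = File::create("events.arrows")?;
    let mut writer = StreamWriter::try_new(file, &schema)?;

    // Each "append" is a one-row RecordBatch wrapped in its own IPC message.
    for i in 0..3i64 {
        let batch = RecordBatch::try_new(
            schema.clone(),
            vec![
                Arc::new(Int64Array::from(vec![i])) as ArrayRef,
                Arc::new(StringArray::from(vec![format!("event-{i}")])),
            ],
        )?;
        writer.write(&batch)?;
    }
    writer.finish()?;
    Ok(())
}
```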

I wish there was an industry standard format, schema-compatible with Parquet, that was actually optimized for this use case.

Creating a new record batch for a single row is also a huge kludge that leads to a lot of write amplification. At that point, you're better off storing rows than pretending it's columnar.

I actually wrote a row storage format reusing Arrow data types (not Feather), just laying them out row-wise instead of columnar. The validity bits of the different columns are collected into a shared per-row bitmap, and fixed offsets within a record allow extracting any field in a zero-copy fashion. I store those in RocksDB, for now.

https://git.kantodb.com/kantodb/kantodb/src/branch/main/crat...

https://git.kantodb.com/kantodb/kantodb/src/branch/main/crat...

https://git.kantodb.com/kantodb/kantodb/src/branch/main/crat...
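To illustrate the idea (this is a toy sketch with a made-up two-field record, not the real encoding): one shared validity byte up front, then fields at fixed offsets, so any single field can be read straight out of the buffer the KV store hands back.

```rust
/// Toy row layout for an (id: i64, score: f64) record, illustration only:
///   byte 0       validity bitmap (bit 0 = id present, bit 1 = score present)
///   bytes 1..9   id, little-endian i64
///   bytes 9..17  score, little-endian f64
const ROW_LEN: usize = 17;
const ID_PRESENT: u8 = 1 << 0;
const SCORE_PRESENT: u8 = 1 << 1;

fn encode_row(id: Option<i64>, score: Option<f64>) -> [u8; ROW_LEN] {
    let mut row = [0u8; ROW_LEN];
    if let Some(id) = id {
        row[0] |= ID_PRESENT;
        row[1..9].copy_from_slice(&id.to_le_bytes());
    }
    if let Some(score) = score {
        row[0] |= SCORE_PRESENT;
        row[9..17].copy_from_slice(&score.to_le_bytes());
    }
    row
}

/// Fixed offsets mean one field can be pulled out of a stored row without
/// decoding anything else, reading straight from the returned buffer.
fn read_score(row: &[u8]) -> Option<f64> {
    if (row[0] & SCORE_PRESENT) == 0 {
        return None;
    }
    Some(f64::from_le_bytes(row[9..17].try_into().unwrap()))
}

fn main() {
    assert_eq!(read_score(&encode_row(Some(42), None)), None);
    assert_eq!(read_score(&encode_row(Some(42), Some(0.5))), Some(0.5));
}
```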

> Creating a new record batch for a single row is also a huge kludge that leads to a lot of write amplification.

Sure, except insofar as I didn’t want to pretend to be columnar. There just doesn’t seem to be anything out there that meets my (experimental) needs better. I wanted to stream out rows, event-sourcing style, and snarf them up in batches in a separate process into Parquet. Using Feather like it’s a row store can do this.
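Concretely, the consumer side of that looks something like this with arrow-rs plus the parquet crate (file names are placeholders; this is a sketch of the shape, not the actual pipeline):

```rust
use std::fs::File;

use arrow::ipc::reader::StreamReader;
use parquet::arrow::ArrowWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The producer appended one-row batches to this IPC stream.
    let input = File::open("events.arrows")?;
    let reader = StreamReader::try_new(input, None)?;

    // ArrowWriter buffers the tiny batches and flushes them as proper
    // Parquet row groups, so the re-batching comes for free.
    let output = File::create("events.parquet")?;
    let mut writer = ArrowWriter::try_new(output, reader.schema(), None)?;

    for batch in reader {
        writer.write(&batch?)?;
    }
    writer.close()?;
    Ok(())
}
```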

> kantodb

Neat project. I would seriously consider using that in a project of mine, especially now that LLMs can help out with the exceedingly tedious parts. (The current stack is regrettable, but a prompt like “keep exactly the same queries but change the API from X to Y” is well within current capabilities.)

Frankly, RocksDB, SQLite or Postgres would be easy choices for that. (Fast) durable writes are actually a nasty problem with lots of little details to get just right, or you end up with corrupted data on restart. For example, blocks may be written out of order, so after a crash you may end up with blocks 1, 2, <old data>, 4 on disk; if you trust all the content you see in the file, or even a footer in block 4, you're screwed.

Speaking as a Rustafarian, there are some libraries out there that "just" implement a WAL, which is all you need, but they're nowhere near as battle-tested as the above.
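The core of what such a WAL has to get right is framing: every record carries a length and a checksum, and recovery refuses to trust anything past the first record that doesn't check out. A toy sketch (made-up record layout, using the crc32fast crate):

```rust
use std::io::{self, Read, Write};

/// Toy WAL framing: [len: u32 LE][crc32 of payload: u32 LE][payload].
fn append_record<W: Write>(wal: &mut W, payload: &[u8]) -> io::Result<()> {
    wal.write_all(&(payload.len() as u32).to_le_bytes())?;
    wal.write_all(&crc32fast::hash(payload).to_le_bytes())?;
    wal.write_all(payload)?;
    // A real WAL would fsync here (and worry about what fsync really promises).
    wal.flush()
}

/// On recovery, replay stops at the first record whose length or checksum
/// doesn't hold up, instead of trusting whatever bytes happen to be in the file.
fn replay<R: Read>(wal: &mut R) -> io::Result<Vec<Vec<u8>>> {
    let mut records = Vec::new();
    loop {
        let mut header = [0u8; 8];
        match wal.read_exact(&mut header) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let len = u32::from_le_bytes(header[0..4].try_into().unwrap()) as usize;
        let crc = u32::from_le_bytes(header[4..8].try_into().unwrap());
        let mut payload = vec![0u8; len];
        if wal.read_exact(&mut payload).is_err() || crc32fast::hash(&payload) != crc {
            // Torn, reordered, or stale tail: discard everything from here on.
            break;
        }
        records.push(payload);
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    let mut wal = Vec::new();
    append_record(&mut wal, b"insert k1 v1")?;
    append_record(&mut wal, b"insert k2 v2")?;
    // Simulate a torn tail: chop the last record mid-payload.
    wal.truncate(wal.len() - 4);
    let recovered = replay(&mut io::Cursor::new(&wal))?;
    assert_eq!(recovered, vec![b"insert k1 v1".to_vec()]);
    Ok(())
}
```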

Also, if KantoDB is not compatible with Postgres in some way that isn't utterly stupid, that's automatically considered a bug or a missing feature (and I have plenty of those!). I refuse to be bug-for-bug compatible, and there's some stuff that's just better left unimplemented in this millennium, but the intent is to make it I Can't Believe It's Not Postgres, and to run integration tests against actual everyday software.

Also, definitely don't use KantoDB for anything real yet. It's very early days.

> Frankly, RocksDB, SQLite or Postgres would be easy choices for that. (Fast) durable writes are actually a nasty problem with lots of little details to get just right, or you end up with corrupted data on restart. For example, blocks may be written out of order, so after a crash you may end up with blocks 1, 2, <old data>, 4 on disk; if you trust all the content you see in the file, or even a footer in block 4, you're screwed.

I have a WAL that works nicely. It surely has some issues on a crash if blocks are written out of order, but this doesn’t matter for my use case.

But none of those other choices actually do what I wanted without quite a bit of pain. First, unless I wire up some kind of CDC system or add extra schema complexity, I can stream in but I can’t stream out, whereas a byte or record stream streams out natively. Second, I kind of like the Parquet schema system, and I wanted something compatible. (This was all an experiment. The production version is just a plain database. Insert is INSERT and queries go straight to the database. Performance and disk space management are not amazing, but it works.)

P.S. The KantoDB website says “I’ve wanted to … have meaningful tests that don’t have multi-gigabyte dependencies and runtime assumptions”. I have a very nice system using a ~100-line Python script that fires up a MySQL database using the distro mysqld, backed by a Unix socket, requiring zero setup or other complication. It’s mildly offensive that it takes mysqld multiple seconds to do this, but it works. I can run a whole bunch of copies in parallel, in the same Python process even, for a nice, parallelized, reproducible testing environment. Every now and then I get in a small fight with AppArmor, but I invariably win the fight quickly without requiring any changes that need any privileges. This all predates Docker, too :). I’m sure I could rig up some snapshot system to get startup time down, but that would defeat some of the simplicity of the scheme.