Hacker News

We use Apache Arrow at my company and it's fantastic. The performance is so good. We have terabytes of time-series financial data and use Arrow to store and process it.

kccqzy 15 hours ago

We use Apache Arrow at my company too. It is part of a migration from an old in-house format. When it works it’s good, but there are just way too many bugs in Arrow. For example, a basic Arrow computation on strings segfaults because the result does not fit in Arrow’s string type, only the large string type. Instead of casting it or asking the user to cast it, it just segfaults. Another example: a different basic operation throws an exception complaining about negative buffer sizes when using the variable-length binary type.
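For anyone wondering why a result can fit in large string but not in string: Arrow's string type indexes its data buffer with 32-bit signed offsets, while large string uses 64-bit offsets, so a single string array tops out at 2**31 - 1 bytes of character data. Here is a rough sketch of that limit in plain Python. It does not call pyarrow, and `build_offsets` and `pick_string_type` are hypothetical helpers made up for illustration, not part of any Arrow API.

```python
INT32_MAX = 2**31 - 1  # largest value a 32-bit signed offset can hold


def build_offsets(lengths):
    """Return cumulative byte offsets for values with the given byte lengths."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def pick_string_type(lengths):
    """Hypothetical helper: name the narrowest type whose offsets can hold the data."""
    total_bytes = build_offsets(lengths)[-1]
    return "string" if total_bytes <= INT32_MAX else "large_string"


small = [5, 3, 12]
big = [2**30, 2**30, 1]  # byte lengths only; nothing this large is allocated

print(build_offsets(small))
print(pick_string_type(small))
print(pick_string_type(big))
```

A kernel that checked this before writing its output could cast up or raise a clear error instead of crashing, which is roughly the behavior asked for above.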

thinkharderdev 14 hours ago

This will obviously depend on which implementation you use. Using the Rust arrow-rs crate you at least get panics when you overflow max buffer sizes. But one of my enduring annoyances with Arrow is that it uses signed integer types for buffer offsets and the like. I understand why it has to be that way, since it's intended to be cross-language and not all languages have unsigned integer types. But it does lead to lots of very weird bugs when you are working in a native language and casting back and forth between signed and unsigned types. I spent a very frustrating day tracking down this one in particular: https://github.com/apache/datafusion/issues/15967
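To make the signed/unsigned hazard concrete, here is a small sketch of the general failure mode using Python's ctypes to mimic fixed-width native integers. It is not a reproduction of the linked issue, and `as_unsigned_32`, `wrap_signed_32` and `checked_offset` are hypothetical names used only for this example.

```python
import ctypes


def as_unsigned_32(value):
    """Reinterpret a 32-bit signed value as unsigned, like an unchecked native cast."""
    return ctypes.c_uint32(value).value


def wrap_signed_32(value):
    """Store a value in a 32-bit signed slot; ctypes wraps silently on overflow."""
    return ctypes.c_int32(value).value


def checked_offset(value):
    """Hypothetical guard: reject negative offsets before using them as sizes."""
    if value < 0:
        raise ValueError(f"negative offset: {value}")
    return value


# A negative signed offset turns into a huge unsigned length after the cast.
print(as_unsigned_32(-1))

# Two 1 GiB lengths summed into a 32-bit signed slot wrap to a negative number.
print(wrap_signed_32(2**30 + 2**30))
```

The second case is one plausible way to end up with a "negative buffer size" style error: the arithmetic wraps quietly and the bad value only surfaces later. Validating at the boundary, as `checked_offset` does, at least moves the failure next to its cause.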

dionian 15 hours ago

Stumbled upon it recently while optimizing Parquet writes. It worked flawlessly and 10-20x'd my throughput.