[flagged]

> high-performance data processing tools in JS

I may be naive in asking this, but what leads someone to building high perf data tools in JS? JS doesn't seem to me like it would be the tool of choice for such things

I have a SaaS project where the backend is in JS. I also have some data processing to do with large file (several TB). Doing it is in JS is more convenient as I can reuse code from the backend, and it is also the language I know best.

Performance-wise, I get about half the throughput I had with the same processsing done it rust, which doesn't change anything for my use-case.

However that's not really relevant to the context of the post as I'm using node.js streams which are both saner and fast. I'm guessing that the post is relevant to people using server-side runtimes that only implement web streams.

Browsers are now able to stream files from disk so you can create a high performance tool that'll read locally, do [x] with it and present the results, all without any network overhead.

You don't always have a choice on where you deliver your software. It'd be nice to have good tools wherever you are forced to work.

Browsers

Since when are browsers themselves built in JavaScript? Mainstream, fast ones?

Clarification - in the past when I've written high performance data tools in JS, it was almost entirely to support the use case of needing it to run in a browser. Otherwise, there are indeed more suitable environments available.

To your question, I was about to point out Firefox[1], but realized you clarified 'mainstream'[2]...

[1] https://briangrinstead.com/blog/firefox-webcomponents

[2] https://gs.statcounter.com/browser-market-share

[deleted]

but instead of trying to solve that, this api is just like “too hard no one uses it let’s forget about it”.

right now when i need to wrangle bytes, i switch languages to Golang. it’s easy gc language, and all its IO is built around BYOB api:

interface Reader { read(b: Uint8Array): [number, Error?] }

you pass in your own Uint8Array allocation (in go terms, []byte), the reader fills at most the entire thing, and returns (bytes filled, error). it’s a fully pull stream API with one method at its core. now, the api gets to be that simple because it’s always sync, and blocks until the reader can fill data into the buffer or returns an error indicating no data available right now.

go has a TeeReader with no buffering - it too just blocks until it can write to the forked stream.

https://pkg.go.dev/io#TeeReader

we can’t do the same api in JS, because go gets to insert `await` wherever it wants with its coroutine/goroutine runtime. but we can dream of such simplicity combined with zero allocation performance.