JSON encoding is a huge impediment to interprocess communication in NodeJS.
Sooner or later it seems like everyone gets the idea of reducing event loop stalls in their NodeJS code by trying to offload it to another thread, only to discover they’ve tripled the CPU load in the main thread.
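A sketch of the trap being described, assuming a worker_threads setup (the worker file and the helper name are made up for illustration):

```ts
import { Worker } from "node:worker_threads";

// Hypothetical worker that does the JSON.stringify "elsewhere".
const worker = new Worker(new URL("./stringify-worker.js", import.meta.url));

function offloadStringify(payload: object): Promise<string> {
  return new Promise((resolve) => {
    worker.once("message", resolve);
    // The catch: postMessage structured-clones `payload` right here, on the
    // main thread, before the worker ever sees it -- a serialization cost in
    // the same ballpark as the stringify you were trying to offload.
    worker.postMessage(payload);
  });
}
```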
I’ve seen people stringify arrays one entry at a time. Sounds like maybe they are doing that internally now.
If anything I would encourage the V8 team to go farther with this. Can you avoid bailing out for subsets of data? What about the CString issue? Does this bring faststr back from the dead?
Based off of my first ever forays into node performance analysis last year, JSON.stringify was one of the biggest impediments to just about everything around performant node services. The fact that everyone uses stringify for dict keys, and the fact that apollo/express just serializes the entire response into a string instead of incrementally streaming it back (I think there are some possible workarounds for this, but they seemed very hacky).
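One shape such a workaround can take, as a hedged sketch (`fetchRows` is a made-up stand-in for any async row source, and a production version would honor the backpressure signal from `res.write`):

```ts
import express from "express";

const app = express();

// Stand-in for a DB cursor or any other async source of rows.
async function* fetchRows() {
  for (let i = 0; i < 1_000_000; i++) yield { id: i };
}

app.get("/rows", async (_req, res) => {
  res.setHeader("Content-Type", "application/json");
  res.write("[");
  let first = true;
  for await (const row of fetchRows()) {
    // Many tiny stringify calls that each return quickly, instead of one
    // giant call that stalls the event loop.
    res.write((first ? "" : ",") + JSON.stringify(row));
    first = false;
  }
  res.end("]");
});

app.listen(3000);
```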
As someone who has come from a JVM/go background, I was kinda shocked how amateur hour it felt tbh.
> JSON.stringify was one of the biggest impediments to just about everything around performant node services
That's what I experienced too. But I think the deeper problem is Node's cooperative multitasking model. A preemptive multitasking model (like Go's) wouldn't block the whole event loop (i.e. other concurrent tasks) while serializing a large response (often the case with GraphQL, but possible with any other API too). Yeah, it does kinda feel like amateur hour.
That's not really a Node problem but a JavaScript problem. Nothing about it was built to support parallel execution like Go and other languages. That's why they use web workers, separate processes, etc. to make use of more than a single core. But then you'll probably be dependent on JSON serialization to send data between those event loops.
I'm not a Node dev, but I've done some work with JavaScript on the web. Why are they using JSON to send data between v8 isolates when postMessage allows sending whole objects (using an implementation-defined binary serialization protocol under the hood that is 5-10x faster)?
The biggest reason is most likely that they don't understand serialization, and simply think that objects are somehow being "sent" from one worker to another, which sounds like a cheap reference operation.
> That's not really a Node problem but a JavaScript problem.
Nowadays Node is JavaScript. They have been the ones driving JavaScript standards and new features for a decade or so. Nothing prevents them from incrementally starting to add proper parallelism, multithreading, ...
I disagree - JavaScript is the most used programming language in the world, and were I a betting man I'd happily wager client JS is still a much bigger share of that than server.
Yes, there have been a lot of advances in modern JS for things like atomics and other fun memory stuff, but that's just the natural progression for a language as popular as JS. The 3 main JS engines are still developed primarily for web browsers, and web developers are the primary audience considered in ES language discussions (although I'll concede that in recent years server runtimes have been considered more and more)
>I disagree - JavaScript is the most used programming language in the world, and were I a betting man I'd happily wager client JS is still a much bigger share of that than server.
Clients are 90% still v8, so hardly different than Node.
"Nothing prevents them from incrementally starting to add proper parallelism, multithreading, ..."
In principle perhaps not. In practice it is abundantly clear now from repeated experience that trying to retrofit such things on to a scripting language that has been single-threaded for decades is an extremely difficult and error-prone process that can easily take a decade to reach production quality, if indeed it ever does, and then take another decade or more to become something you can just expect to work, expect to find libraries that use properly, etc.
I don't think it's intrinsic to scripting languages. I think someone could greenfield one and have no more problems with multithreading than any other language. It's trying to put it into something that has been single-threaded for a decade or two already that is very, very hard. And to be honest, given what we've seen from the other languages that have done this, I'd have a very, very, very serious discussion with the dev team as to whether it's actually worth it. Other scripting languages have put a lot of work into this and it is not my perception that the result has been worth the effort.
> Yeah, it does kinda feel like amateur hour.
NodeJS is intended for IO-heavy workloads. Specifically, it's intended for workloads that don't benefit from parallel processing in the CPU.
This is because Javascript is strictly a single-threaded language; i.e., it doesn't support shared access to memory from multiple threads. (And this is because Javascript was written for controlling a UI, and historically UI is all handled on a single thread.)
If you need true multithreading, there are plenty of languages that support it. Either you picked the wrong language, or you might want to consider creating a library in another language and calling into it from NodeJS.
> Based off of my first ever forays into node performance analysis last year, JSON.stringify was one of the biggest impediments to just about everything around performant node services
Just so. It is, or at least can be, the plurality of the sequential part of any Amdahl's Law calculation for Nodejs.
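To make the Amdahl's Law point concrete (p is the parallelizable fraction of the work, n the number of workers):

```latex
S(n) = \frac{1}{(1-p) + \frac{p}{n}}, \qquad \lim_{n\to\infty} S(n) = \frac{1}{1-p}
```

So if JSON.stringify dominates the serial fraction 1 - p, no amount of parallelism elsewhere pushes throughput past 1/(1 - p).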
I'm curious if any of the 'side effect free' commentary in this post is about moving parts of the JSON calculation off of the event loop. That would certainly be very interesting if true.
However for concurrency reasons I suspect it could never be fully off. The best you could likely do is have multiple threads converting the object while the event loop remains blocked. Not entirely unlike concurrent marking in the JVM.
Node is the biggest impediment to performant Node services. The entire value proposition is "What if you could hire people who write code in the most popular programming language in the world?" Well, guess what
I'd say the value prop is you can share code (and with TS, types as well) between your web front end and back end.
That is useful, but you can achieve a similar benefit if you manage to spec out your api with openapi, and then generate the typescript api client. A lot of web frameworks make it easy to generate openapi spec from code.
The maintenance burden shifts from hand syncing types, to setting up and maintaining the often quite complex codegen steps. Once you have it configured and working smoothly, it is a nice system and often worth it in my experience.
The biggest benefit is not the productivity increase when creating new types, but the overall reliability and ease of changing stuff around that already exists.
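For concreteness, one hedged version of that codegen step (openapi-typescript is an illustrative tool choice, not the only one, and the /users route is made up):

```ts
// Regenerate whenever the spec changes, e.g. in a package.json script:
//   npx openapi-typescript ./openapi.yaml -o ./src/api-types.ts
// The generated file exposes a `paths` interface the client indexes into:
import type { paths } from "./api-types";

type ListUsersResponse =
  paths["/users"]["get"]["responses"]["200"]["content"]["application/json"];
```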
100% this.
I’ve been doing this for a long time and have never once “shared code between front end and back end” but sharing types between languages is the sweet spot.
In my other comment in this tree I mentioned that with TypeScript you can do even better. You don't need codegen if you can import types from the back-end. OpenAPI is fine, but I really hate having an intermediary like that.
Just define your API interface as a collection of types that pull from your API route function definitions. Have the API functions pull types from your model layer. Transform those types into their post-JSON deserialization form and now you're trickling up schema from the database right into the client. No client to compile. No watcher to run. It's always in sync and fast to evaluate.
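A minimal two-file sketch of that pattern, with made-up route and type names:

```ts
// server/routes.ts -- the handlers are the single source of truth.
export const routes = {
  "GET /users": async (): Promise<{ id: string; name: string }[]> => {
    return []; // really: query the model layer
  },
};

// client/api.ts -- derive the client's types from the handlers themselves.
// (type-only import, so no server code ends up in the bundle)
import type { routes } from "../server/routes";

type Routes = typeof routes;
type ResponseOf<K extends keyof Routes> = Awaited<ReturnType<Routes[K]>>;

type Users = ResponseOf<"GET /users">; // { id: string; name: string }[]
```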
You're right of course, it is better without an intermediary. But only if you already use, or can and want to use, TypeScript on the backend. If you have good reasons not to, then those usually outweigh the cost of having to go through an intermediary codegen step. The tooling is often good enough.
Plus, openapi can be useful for other things as well: generating api documentation for example, mock servers or clients in multiple programming languages.
I'm not disagreeing with you, what is best always depends on context and also on the professional judgement of the one who is making the trade-offs. A certain perspective or even taste always slips into these judgement calls as well, which isn't invalid.
I haven't found this to pay off in reality as much as I'd hoped… have you?
Me neither, for multiple reasons, especially if you are bundling your frontend. It's very easy to accidentally include browser-incompatible code; it becomes a bit of a cat-and-mouse game.
On types, I think the real value proposition is having a single source of truth for all domain types, but because there's a serialisation layer in the way (HTTP) it's rarely that simple. I've fallen back to typing my frontend explicitly where I need to; way simpler and not that much work.
(basically as soon as you have any kind of context-specific serialisation, maybe excluding or transforming a field, maybe you have "populated" options in your API for relations, etc - you end up writing type mapping code between the BE and FE that tends to become brittle fast)
It's been very useful for specific things: unified, complicated domain logic that benefits from running faster than it would take to do a round trip to the server and back.
I've only rarely needed to do this. The two examples that stick in my mind are firstly event and calendar logic, and secondly implementing protocols that wrap webrtc.
Yes, incredible productivity gains from using a single language in frontend and backend.
That's Java's story.
A single language to rule them all: on the server, on the client, in the browser, in appliances. It truly was everywhere at some point.
Then people massively wished for something better and moved to dedicated languages.
Put another way, for most shops the productivity gains of having a single language are far from incredible, and in the most typical settings can even be negative.
Java applets were never ubiquitous the same way JS is on the web though - there's literally a JS environment always available on every page unless the user explicitly disables it, which very few people do.
JS is here to stay as the main scripting language for the web, which means there will probably be a place for node as a back-end scripting language. A lot of back ends are relatively simple CRUD API stuff where using node is completely feasible, and there are real benefits to being able to share type definitions etc across front end and back end
> there are real benefits to being able to share type definitions etc across front end and back end
There are benefits, but cons as well. As you point out, if the backend is only straight proxying the DB, any language will do, so you might as well use the same one as the frontend.
I think very few companies running for a few years still have backends that simple. At some point you'll want to hide or abstract things from the frontend. Your backend will do more and more processing, more validation, it will handle more and more domain-specific logic (tax/money, auditing, scheduling etc). It becomes more and more of a beast on its own, and you won't stay stuck with a language whose only real benefit is partially sharing types with the frontend.
Java never ran well on the desktop or in the browser (arguably it never truly ran in the browser at all), and it was an extremely low-productivity language in general in that era.
There is a significant gain from running a single language everywhere. Not enough to completely overwhelm everything else - using two good languages will still beat one bad language - but all else being equal, using a single language will do a lot better.
Yes, Java was never really good (I'd argue on any platform. Server side is fine, but not "really good" IMHO)
It made me think about the amount of work that went into JS to make it the powerhouse it is today.
Even in the browser, we're only able to do all these crazy things because of herculean efforts from Google, Apple and Mozilla to optimize every corner and build runtimes that have basically the same complexity as the OS they run on, to the point we got Chrome OS as a side product.
From that POV, we could probably take any language, pour that much effort into it and make it a more than decently performing platform. Java could have been that, if we had wanted it badly enough. There just was no incentive for any of the bigger players outside of Sun and Oracle to do so.
> all else being equal, using a single language will do a lot better.
Yes, there will be specific cases where a dedicated server stack is more of a liability. I still haven't found many, tbh. In the most extreme cases, people will turn to platforms like Firebase and throw money at the problem to completely abstract the server side.
It's useful for running things like Zod validators on both the client and server, since you can have realtime validation to changes that doesn't require pinging the server.
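A small sketch of that, with a made-up schema:

```ts
// shared/signup-schema.ts -- imported by both the client and the server.
import { z } from "zod";

export const SignupForm = z.object({
  email: z.string().email(),
  password: z.string().min(12),
});

// Client: validate on every keystroke, no round trip to the server.
const draft = { email: "a@example.com", password: "short" };
console.log(SignupForm.safeParse(draft).success); // false

// Server: the exact same schema guards the endpoint, so client-side and
// server-side validation can never drift apart.
```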
I’ve used the ability to import back end types into the front end to get a zero-cost no-file-watcher API validator.
My blog post here isn’t as good as it should be, but hopefully it gets the point across
https://danangell.com/blog/posts/type-level-api-client/
But that's only really relevant in the last layer, the backend-for-frontend pattern; as an organization or domain expands, more layers can be added. e.g. in my current job, the back-end is a SAP system that has been around for a long time. The Typescript API layer has a heap of annotations and whatnots on each parameter, which means it's less useful to be directly used in the front-end. What happens instead is that an OpenAPI spec is generated based on the TS / annotations, and that is used to generate an API client or front-end/simplified TS types.
TL;DR, this value prop is limited.
Nodejs will never be as bad as VB was.
I know Javascript is fast but...
I cannot go with such a messy ecosystem. I find Python highly preferable for my backend code for more or less low- and middle-traffic stuff.
I know Python is not that good deployment-wise, but the language is really understandable, I have tools for every use case, I can easily provide bindings from C++ code and it is a joy to work with.
If on top of that they keep increasing its performance, I think I will stick to it for lots of backend tasks (except for high-performance work, where I have lately been using C++ and Capnproto RPC for distributed stuff).
It comes down to language preferences. I find Python to be the worst thing computer science has to offer. No nested scoping in functions, variables leak through branches and loops due to lack of scopes, no classic for loops, but worst of all, installing python packages and frameworks never ever goes smoothly.
I would like to love Jupyter notebooks because Notebooks are great for prototyping, but Jupyter and Python plotting libs are so clunky and slow, I always have to fall back to Node or writing a web page with JS and svg for plotting and prototyping.
That depends entirely on what you measure:
- Rapid application development
VB was easier and quicker
- GUI development
At least on Windows, in my opinion, VB is still the best language ever created for that. Borland had a good stab at it with their IDEs but nothing really came close to VB6 in terms of speed and ease of development.
Granted this isn't JS's fault, but CSS et al is just a mess in comparison.
- Cross-platform development
You have a point there. VB6 was a different era though.
- Type safety
VB6 wins here again
- Readability
This is admittedly subjective, but I personally don't find idiomatic node.js code all that readable. VB's ALGOL-inspired roots aren't for everyone, but I personally don't mind Begin/End blocks.
- Consistency
JS has so many weird edge cases. That's not to say that VB didn't have its own quirks. However they were less numerous in my experience.
Then you have inconsistencies between different JS implementations too.
- Concurrency
Both languages fail badly here. Yeah, node has async/await, but I personally hate that design and, ultimately, node.js is still single-threaded at its core. So while JS is technically better, it's still so bad that I cannot justify giving it the win here.
- Developer familiarity
JS is used by more people.
- Code longevity
Does this metric even deserve a rebuttal, given the known problem of Javascript framework churn? You can't even recompile a sizable 2-year-old Javascript project without running into problems. Literally every other popular language trumps Javascript in that regard.
- Developer tooling
VB6 came with everything you needed and worked from the moment you finished the VB Visual Studio install.
With node.js you have a plethora of different moving parts you need to manually configure just to get started.
---
I'm not suggesting people should write new software in VB. But it was unironically a good language for what it was designed for.
Node/JS isn't even a good language for its intended purpose. It's just a clusterfuck of an ecosystem. Even people who maintain core JS components know this -- which is why tooling is constantly being migrated to other languages like Rust and Go. And why so many people are creating businesses around their bespoke JS runtimes aiming to solve the issues that node.js creates (and thus creating more problems due to ever-increasing numbers of "standards").
Literally the only redeemable factor of node.js is the network effect of everyone using it. But to me that feels more like Stockholm Syndrome than a ringing endorsement.
And if the best compliment you can give node.js is "it's better than this other ecosystem that died 2 decades ago" then you must realise yourself just how bad things really are.
> But to me that feels more like Stockholm Syndrome
Just an FYI, but Stockholm Syndrome isn't real. In general I agree with the intended point though, people just like what they are familiar with and probably have a bias for what they learned first or used longest.
More or less accidentally I turned a simple Excel spreadsheet into a sizable data management system. Once you learn where the bottlenecks are, it is surprising how fast VB is nowadays.
It already is worse. VB was, for all its shortcomings as a language, an insanely productive development environment for what it was intended for.
Low bar!
> If anything I would encourage the V8 team to go farther with this.
That feels like the wrong way to go. I would encourage the people that have this problem to look elsewhere. Node/V8 isn't well suited to back-end work or heavier computational problems. Javascript is shaped by web usage, and it will stay like that for some time. You can't expect the V8 team to bail them out.
The Typescript team switched to Go because it's similar enough to TS/JS to do part of the translation automatically. I'm no AI enthusiast, but they are quite good at doing idiomatic translations too.
> Node/V8 isn't well suited to backend
Node was literally designed to be good for one thing - backend web service development.
It is exceptionally good at it. The runtime overhead is tiny compared to the JVM, the async model is simple as hell to wrap your head around and has a fraction of the complexity of what other languages are doing in this space, and Node running on a potato of a CPU can handle thousands of requests per second w/o breaking a sweat using the most naively written code.
Also the compactness of the language is incredible, you can get a full ExpressJS service up and running, including auth, in less than a dozen lines of code. The amount of magic that happens is almost zero, especially compared to other languages and frameworks. I know some people like their magic BS (and some of the stuff FastAPI does is nifty), but Express is "what you see is what you get" by default.
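For the record, a minimal sketch of such a service (the bearer-token check is a toy stand-in for real auth middleware):

```ts
import express from "express";

const app = express();
app.use(express.json());

// Toy auth check -- substitute a real scheme in practice.
app.use((req, res, next) => {
  if (req.headers.authorization !== "Bearer secret") {
    return res.status(401).json({ error: "unauthorized" });
  }
  next();
});

app.get("/ping", (_req, res) => res.json({ pong: true }));

app.listen(3000);
```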
> The Typescript team switched to Go, because it's similar enough to TS/JS to do part of the translation automatically.
The TS team switched to Go because JS is horrible at anything that isn't strings or doubles. The lack of an int type hinders the language, so runtimes do a lot of work to try and determine when a number can be treated like an int.
JS's type system is both absurdly flexible and also limiting. Because JS basically allows you to do anything with types, Typescript ends up being one of the most powerful type systems that has seen mass adoption. (Yes, other languages have more powerful type systems, but none of them have the widespread adoption TS does).
If I need to model a problem domain, TS is an excellent tool for doing so. If I need to respond to thousands of small requests, Node is an excellent tool for doing so. If I need to do some actual computation on those incoming requests, eh, maybe pick another tech stack.
But for the majority of service endpoints that consist of "get message from user, query DB, reformat DB response, send to user"? Node is incredible at solving that problem.
> Node was literally designed to be good for one thing - backend web service development.
I don't think it was, at least not originally. But even if it was, that doesn't mean it actually is good, and certainly not for all cases.
> Node running on a potato of a CPU can handle thousands of requests per second w/o breaking a sweat using the most naively written code.
The parent comment is specifically about this. It breaks down at a certain point.
> you can get a full ExpressJS service up and running, including auth, in less than a dozen lines of code
Ease of use is nice for a start, but usually becomes technical debt. E.g., you can write a pretty small search algorithm, but it will perform terribly. Not a problem at the start. You can set up a service with just a little bit of code in any major language using some framework. Heck, there are code-free servers. But you will have to add more and more workarounds as the application grows. There's no free lunch.
> The TS team switched to go because JS is horrible at anything that isn't strings or doubles.
They switched because V8 is too slow and uses quite a bit of memory. At least, that's what they wrote. But that was not what I wanted to address. I was trying to say that if you have to switch, Go is a decent option, because it's so close to JS/TS.
> But for the majority of service endpoints ...
Because they are simple, as you say. But when you run into problems, asking the V8 team to bail you out with a few more hacks doesn't seem right.
> Ease of use is nice for a start, but usually becomes technical debt.
The difference with the Express ecosystem is that you aren't getting any less power than with FastAPI or Spring Boot, you just get less overhead. Spring Boot has 10x the config to get the same endpoint up and running as Express, and FastAPI has at least 3x the magic. Now some of FastAPI's magic is really useful (auto converting pydantic types to JSON Schemas on endpoints, auto generating API docs, etc), but it is still magic compared to what Express gets you.
The scaling story of Node is also really easy to think about and do capacity planning for. You aren't worried about contention or IPC (as this thread has pointed out, if you are doing IPC in Node you are in for a bad time, so just don't!), your unit of scaling is the Node process itself. Throw it in a docker image, throw that in a k8s cluster, assign .25 CPU to each instance. Scale up and down as needed.
Sometimes having one really damn simple and easy to understand building block is more powerful than having 500 blocks that can be misconfigured in ten thousand different ways.
Same problem in Python. It'd be nice to have good/efficient IPC primitives with higher level APIs on top for common patterns
It's a major problem for JVM performance as well. JSON encoding is fundamentally an expensive thing to do.
One thing that improves performance for the JVM, and that'd be nice to see in the node realm, is that JSON serialization libraries can stream out the serialization. One of the major costs of JSON is the memory footprint. Strings take up a LOT more space in memory than a regular object does.
Since the JVM typically only uses JSON as a communication protocol, streaming it out makes a lot of sense. The IO (usually) takes long enough to give a CPU reprieve while simultaneously saving memory.
Yeah. I think I've only ever found one situation where offloading work to a worker saved more time than was lost through serializing/deserializing. Doing heavy work often means working with a huge set of data- which means the cost of passing that data via messages scales with the benefits of parallelizing the work.
I think the clues are all there in the MDN docs for web workers. Having a worker act as a forward proxy for services; you send it a URL, it decides if it needs to make a network request, it cooks down the response for you and sends you the condensed result.
Most tasks take more memory in the middle than at the beginning and end. And if you're sharing memory between processes that can only communicate by setting bytes, then the memory at the beginning and end represents the communication overhead. The latency.
But this is also why things like p-limit work - they pause an array of arbitrary tasks during the induction phase, before the data expands into a complex state that has to be retained in memory concurrently with all of its peers. By partially linearizing, you put a clamp on peak memory usage that Promise.all(arr.map(...)) does not - it's not just a thundering-herd fix.
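For anyone who hasn't used it, the p-limit pattern being described (the concurrency of 4 and the work function are arbitrary; assumes an ESM context for top-level await):

```ts
import pLimit from "p-limit";

const items = Array.from({ length: 1000 }, (_, i) => i);
const work = async (n: number) => n * 2; // stand-in for a memory-hungry task

const limit = pLimit(4);

// Unclamped: Promise.all(items.map(work)) starts all 1000 tasks at once,
// so every task's peak-memory state is live concurrently.
// Clamped: at most 4 tasks run at a time; the rest stay paused before
// they allocate anything.
const results = await Promise.all(items.map((n) => limit(() => work(n))));
```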
You mean:
> JSON encoding is a huge impediment to communication
I wonder how much computational overhead JSON'ing adds to communications at a global scale, in contrast to just sending the bytes directly in a fixed format or something far more efficient to parse like ASN.1.
No. Because painful code never gets optimized as much as less painful code. People convince themselves to look elsewhere, and an incomplete picture leads to local optima.
BTW, I often shorthand this as, "I'm not so much better [at optimization] as I am more stubborn."
The number of times I've gone in looking for 10%, backed out a bit and rearranged the code first to find 25%, better maintainability, and space for a feature that marketing has been bitching about for three years and development keeps insisting we cannot do in any reasonable amount of time? Probably averages out to at least three times per employer, which is a good number of miracles to perform.
> Sooner or later it seems like everyone gets the idea of reducing event loop stalls in their NodeJS code by trying to offload it to another thread, only to discover they’ve tripled the CPU load in the main thread.
Why not use structuredClone to communicate with the worker? So long as your object upholds all the rules you can pass it into postMessage directly.
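A minimal worker_threads sketch of exactly that (single-file, with the main/worker split on isMainThread):

```ts
import { Worker, isMainThread, parentPort } from "node:worker_threads";

if (isMainThread) {
  const worker = new Worker(new URL(import.meta.url));
  worker.once("message", (msg) => {
    console.log(msg); // the clone arrives intact, Date and all -- no JSON
    worker.terminate();
  });
  // postMessage structured-clones the object: Dates, Maps, typed arrays
  // and even cycles survive, none of which JSON.stringify handles.
  worker.postMessage({ hello: "world", when: new Date() });
} else {
  parentPort!.once("message", (msg) => parentPort!.postMessage(msg));
}
```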
Structured clone only gets you a new object in this heap, not a new object in the other isolate. You’re still doing stringify/parse under the hood every time you postMessage.
postMessage uses it transparently: https://developer.mozilla.org/en-US/docs/Web/API/Worker/post...
So I wonder if any of these improvements could be applied to structuredClone.
Benchmarks aren’t showing structuredClone favorably versus JSON round tripping. Likely because JSON-compatible data structures are less complex than clonable data. I suspect with this change JSON will now be faster than structuredClone.
> Benchmarks aren’t showing structuredClone favorably versus JSON round tripping.
Which ones?
Structured clone seems to be approximately the same speed as JSON round-tripping according to this (structuredClone is slightly faster on my machine, probably margin of error): https://measurethat.net/Benchmarks/Show/23052/0/structuredcl... (Linux: FF 141, Chromium 138).
So think about how that graph is going to look once the stringify part is twice as fast. Instead of being slightly faster it'll be statistically slower.
Now to just write the processing code in something that compiles to WebAssembly, and you can start copying and sending ArrayBuffers to your workers!
Or I guess you can do it without the WebAssembly step.
A JSON.toBuffer would be another good addition to V8. There are a couple code paths that look like they might do this, but from all accounts it goes Object->String->Buffer, and for speed you want to skip the intermediate.
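For clarity, the two-step path as it exists today; the JSON.toBuffer above is the hypothetical one-step version:

```ts
const payload = { id: 42, tags: ["a", "b"] };

// Object -> String -> Buffer: the intermediate UTF-16 string exists only
// to be UTF-8 encoded into the Buffer and then discarded.
const buf = Buffer.from(JSON.stringify(payload));
console.log(buf.byteLength); // bytes ready for a socket, no string retained
```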
I was actually imagining skipping the Object step; if you go from wire -> buffer, and only ever work on it in buffer form (i.e. in WebAssembly, in a language more amenable to working on buffers/bytes), you skip needing the Object -> JSON step. Notwithstanding whatever you need to do in the wire -> buffer step.
Working with bytes in JS is still gross and I wouldn’t wish it on my worst enemy. Like I said, if you had a way to automate it, it wouldn’t be so bad.
When it comes time to do serious work in Node, that's when you start using TypedArrays and SharedArrayBuffers and work with straight binary data. stringifying is mostly for toy apps and smaller projects.
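A tiny sketch of that style (the one-counter memory layout is made up for illustration):

```ts
// One region of memory visible to every thread it's posted to --
// no clone, no stringify, no per-message serialization.
const sab = new SharedArrayBuffer(4);
const counter = new Int32Array(sab);

// Send `sab` to a worker once via postMessage; afterwards both sides
// mutate the same bytes. Atomics keeps concurrent access race-free.
Atomics.add(counter, 0, 1);
console.log(Atomics.load(counter, 0)); // 1
```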
If I took all the coworkers I’ve had in 30 years of coding whom I could really trust with the sort of bit fiddling ArrayBuffers require, I could populate maybe two companies, and everyone else would be fucked.
TypedArray is a toy. Very few domains actually work well with this sort of data. Statistics, yes. Games? They use it all the time, but games are also full of glitches exploited by speed runners, precisely because games play fast and loose to maintain the illusion that they are doing much more per second than they should be able to.
DataView is a bit better. I am endlessly amazed at how many times I managed to read people talking about TypedArrays and SharedArrayBuffers before I discovered that DataView exists and has existed basically forever. Somebody should have mentioned it a lot, lot sooner.
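For anyone else who missed it: DataView reads and writes scalars at arbitrary byte offsets with explicit endianness, which a plain TypedArray can't express:

```ts
const buf = new ArrayBuffer(8);
const view = new DataView(buf);

view.setUint32(0, 0xdeadbeef, true); // explicit little-endian write
view.setUint16(5, 0xcafe, false);    // big-endian, at an unaligned offset
console.log(view.getUint32(0, true).toString(16)); // "deadbeef"
```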
DataView has a lot of overhead. Just do it raw.