JSON parser libraries in general are a black hole of suffering imo.
They're either written with a different use case in mind or are a complex mess of abstractions; often both.
It's not a very difficult problem to solve if you only write exactly what you need for your specific use case.
It's astonishing how fucking involved a modern JSON library becomes.
The once "very simple" C++ single-header JSON library by nlohmann is now
* 13 years old
* still actively merging PRs (last one 5 hours ago)
* covered by 122 __million__ unit tests
Despite all this, it's self-admittedly still not the fastest possible way to parse JSON in C++. For that you might want to look into simdjson.
Don't start your own JSON parser library. Just don't. Yes, you can whiteboard one that's 90% good enough in 45 minutes, but that last 10% takes ten thousand man-hours.
I did write one, but only because I had to: it lives in a crash reporter, so the already-written data must be recoverable after a crash (to salvage partially written files), and the encoder needs to be async-safe.
https://github.com/kstenerud/KSCrash/blob/master/Sources/KSC...
And yeah, writing a JSON codec sucks.
So I'm in the process of replacing it with a BONJSON codec, which has the same capabilities, is still async-safe and crash resilient, and is 35x faster with less code.
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
Yeah, but as long as I'm not releasing it publicly, I don't need to support 20 different ways of parsing.
That's the thing with reinventing wheels: a wheel that fits every possible vehicle and runs well on any possible terrain is very difficult to build. But when you know exactly what you need, it's a different story.
I am very surprised to hear the unit testing statistic. What kind of unholy edge cases would JSON parsing require to make it necessary to cover 122 million variations?
The more speed optimizations you put in, the gnarlier the new edge cases that pop up.
This may say more about C++ than JSON
The best language to handle unusual JSON correctly would probably be Python. It has arbitrary size integers, mpmath for arbitrary precision floats and good Unicode support.
Many of the problems disappear when performance is not critical, because that opens up much nicer, safer, and simpler languages than C/C++ to write a correct parser in.
122 million unit tests? What?
Most people don't need the remaining 10% but do value a small and easy-to-maintain codebase (which nlohmann definitely isn't).
Yeah, I use this and I think most of my friends do too :)
yeah it seems like every other C++ project uses it
holy shit
Parsing JSON is a Minefield (2016)
https://seriot.ch/projects/parsing_json.html
Not if I'm also the producer.
Finally, I have found someone who understands the purpose of using someone else's tiny header-only C library; someone who sincerely thought about it before looking for an excuse to bitch and complain.
You can't get much more 'opinion-less' than this library though. Iterate over keys and array items, identify the value type and return string-slices.
It also feels like only half the job to me. Reminds me of SAX "parsers" that were barely more than lexers.
I mean, what else is there to do when iterating over a JSON file? Delegating number parsing and Unicode handling to the user can be considered a feature (since I can decide on my own how expensive/robust I want this to be).
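Roughly, the whole surface of a library like that boils down to something like this sketch (in Rust rather than C, with all names invented and error handling omitted; this is not the library in question, just the shape of the idea). Strings and numbers come back as raw slices, and decoding them is the caller's problem:

```rust
// Sketch of a slice-returning pull scanner. Escapes and numbers are handed
// back raw; the caller decides how carefully (and expensively) to decode them.
#[derive(Debug)]
enum Token<'a> {
    ObjectStart,
    ObjectEnd,
    ArrayStart,
    ArrayEnd,
    String(&'a str), // raw bytes between the quotes, escapes NOT decoded
    Number(&'a str), // raw text; caller picks i64 / f64 / bignum
    Bool(bool),
    Null,
}

struct Scanner<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Scanner<'a> {
    fn new(src: &'a str) -> Self {
        Scanner { src: src.as_bytes(), pos: 0 }
    }

    fn next_token(&mut self) -> Option<Token<'a>> {
        let src = self.src;
        // skip whitespace and the separators we don't report
        while matches!(*src.get(self.pos)?, b' ' | b'\t' | b'\n' | b'\r' | b',' | b':') {
            self.pos += 1;
        }
        let b = src[self.pos];
        self.pos += 1;
        Some(match b {
            b'{' => Token::ObjectStart,
            b'}' => Token::ObjectEnd,
            b'[' => Token::ArrayStart,
            b']' => Token::ArrayEnd,
            b'"' => {
                let start = self.pos;
                while src[self.pos] != b'"' {
                    // step over escape pairs without decoding them
                    self.pos += if src[self.pos] == b'\\' { 2 } else { 1 };
                }
                let s = std::str::from_utf8(&src[start..self.pos]).unwrap();
                self.pos += 1; // closing quote
                Token::String(s)
            }
            b't' => { self.pos += 3; Token::Bool(true) }  // "true"
            b'f' => { self.pos += 4; Token::Bool(false) } // "false"
            b'n' => { self.pos += 3; Token::Null }        // "null"
            _ => {
                // number: hand back the raw text up to the next delimiter
                let start = self.pos - 1;
                while self.pos < src.len()
                    && !matches!(src[self.pos], b',' | b':' | b'}' | b']' | b' ' | b'\n')
                {
                    self.pos += 1;
                }
                Token::Number(std::str::from_utf8(&src[start..self.pos]).unwrap())
            }
        })
    }
}

fn main() {
    let mut sc = Scanner::new(r#"{"id": 42, "name": "caf\u00e9", "tags": [1.5, null]}"#);
    while let Some(tok) = sc.next_token() {
        println!("{:?}", tok);
    }
}
```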
That is what I like about Common Lisp libraries. They are mostly about the algorithms, leaving data structures up to the user. So you make sure you get those right before calling the function.
Extracting the data into objects. Libraries like Serde and Pydantic do this for you. Hell, the original eval() JSON loading method did that too.
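A minimal sketch of what that looks like with Serde (the struct and data here are made up); you go straight from JSON text to your own typed model with no DOM to walk by hand:

```rust
use serde::Deserialize;

// Hypothetical model type; serde_json parses directly into it.
#[derive(Deserialize, Debug)]
struct User {
    id: u64,
    name: String,
    tags: Vec<String>,
}

fn main() -> Result<(), serde_json::Error> {
    let user: User = serde_json::from_str(r#"{"id": 1, "name": "kate", "tags": ["admin"]}"#)?;
    println!("{:?}", user);
    Ok(())
}
```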
Then you lose the ability to do streaming.
True, but usually you only need that if your data is so large it can't fit in memory and in that case you shouldn't be using JSON anyway. (I was in this situation once where our JSON files grew to gigabytes and we switched to SQLite which worked extremely well.)
Actually, you'll hit the limits of DOM-style JSON parsers as soon as your data is larger than about half the available memory, since you'd most likely want to build your own model objects from the JSON, so at some point both of them must be present in memory (unless you're able to incrementally destroy those parts of the DOM that you're done with).
Anyhow, IMO a proper JSON library should offer both, in a layered approach. That is, a lower level SAX-style parser, on top of which a DOM-style API is provided as a convenience.
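Something like this shape, say (a purely hypothetical API, not any existing library): the event layer is the primitive, and the DOM is just one consumer of it.

```rust
// Hypothetical two-layer design: a SAX-style event interface underneath,
// and a DOM built on top as just one possible consumer of the events.
#[derive(Debug)]
enum Value {
    Null,
    Num(f64),
    Str(String),
    Array(Vec<Value>), // objects omitted for brevity
}

// Layer 1: the parser pushes events into whatever handler the caller supplies.
trait Handler {
    fn begin_array(&mut self);
    fn end_array(&mut self);
    fn scalar(&mut self, v: Value);
}

// Layer 2: one particular handler that materializes a DOM from the events.
#[derive(Default)]
struct TreeBuilder {
    stack: Vec<Vec<Value>>, // currently open arrays
    result: Option<Value>,
}

impl Handler for TreeBuilder {
    fn begin_array(&mut self) {
        self.stack.push(Vec::new());
    }
    fn end_array(&mut self) {
        let arr = Value::Array(self.stack.pop().expect("unbalanced events"));
        self.scalar(arr);
    }
    fn scalar(&mut self, v: Value) {
        match self.stack.last_mut() {
            Some(open) => open.push(v),
            None => self.result = Some(v),
        }
    }
}

fn main() {
    // The actual tokenizer is elided; the events for `[1, "two", null]`
    // are fed by hand here just to show the layering.
    let mut dom = TreeBuilder::default();
    dom.begin_array();
    dom.scalar(Value::Num(1.0));
    dom.scalar(Value::Str("two".into()));
    dom.scalar(Value::Null);
    dom.end_array();
    println!("{:?}", dom.result.unwrap());
}
```

A streaming caller implements Handler directly and never pays for the tree; everyone else gets the convenient DOM on top.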
> since you'd most likely want to build your own model objects from the JSON, so at some point both of them must be present in memory
Not really, because the JSON library itself can stream the input. For example, if you use `serde_json::from_reader()` it won't load the whole file into memory before parsing it into your objects:
https://docs.rs/serde_json/latest/serde_json/fn.from_reader....
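Something like this (the file name and struct are invented); the raw JSON text never has to sit in memory as one big string, only your deserialized objects do:

```rust
use std::fs::File;
use std::io::BufReader;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Record {
    id: u64,
    payload: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // from_reader consumes the reader incrementally rather than
    // slurping the whole file into a string first.
    let file = BufReader::new(File::open("records.json")?);
    let record: Record = serde_json::from_reader(file)?;
    println!("{:?}", record);
    Ok(())
}
```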
But that's kind of academic; half of all memory and all memory are in the same league.
That's only true if your model objects are serde structs, which is not desirable for a variety of reasons, most importantly because you don't want to tie your models to a particular on-disk format.
In the vast majority of cases you can and should just load directly into Serde structs and use those directly. That's kind of the point.
In some minority of cases you might not want to do that (e.g. because you need to support multiple versions of a format), but that is rare and can also be handled in various ways directly in Serde.
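For example, one pattern (all names invented here) is to deserialize into an untagged enum of wire formats and convert to the internal model afterwards, which keeps the on-disk shape out of the rest of the code:

```rust
use serde::Deserialize;

// Wire formats: old files use "name", newer ones use "display_name".
// With #[serde(untagged)], the variants are tried in order until one fits.
#[derive(Deserialize)]
#[serde(untagged)]
enum UserWire {
    V2 { id: u64, display_name: String },
    V1 { id: u64, name: String },
}

// The internal model the rest of the code actually uses.
struct User {
    id: u64,
    display_name: String,
}

impl From<UserWire> for User {
    fn from(w: UserWire) -> Self {
        match w {
            UserWire::V2 { id, display_name } => User { id, display_name },
            UserWire::V1 { id, name } => User { id, display_name: name },
        }
    }
}

fn main() -> Result<(), serde_json::Error> {
    let old: User = serde_json::from_str::<UserWire>(r#"{"id": 7, "name": "ada"}"#)?.into();
    let new: User = serde_json::from_str::<UserWire>(r#"{"id": 8, "display_name": "grace"}"#)?.into();
    println!("{} {} / {} {}", old.id, old.display_name, new.id, new.display_name);
    Ok(())
}
```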
Anyone who claims "it's not a very difficult problem" hasn't actually had to solve that problem.
Except I have, several times, with good results.
So in this case you're wrong.
General purpose is a different can of worms compared to solving a specific case.
> JSON parser libraries in general are a black hole of suffering imo.
Sexprs sitting over here, hoping for some love.
I still mourn the timeline where we got a real Lisp in the browser instead of the current abomination.
The project advertises zero allocations with minimal state. Either that claim isn't fair or our problems are very different: a single string (the most used type) already needs an allocation.