JSON parser libraries in general are a black hole of suffering imo.
They're either written with a different use case in mind or are a complex mess of abstractions; often both.
It's not a very difficult problem to solve if you only write exactly what you need for your specific use case.
It's astonishing how fucking involved a modern JSON library becomes.
The once "very simple" C++ single-header JSON library by nlohmann is now
* 13 years old
* still actively merging PRs (last one 5 hours ago)
* covered by 122 __million__ unit tests
Despite all this, it's self-admittedly still not the fastest possible way to parse JSON in C++. For that you might want to look into simdjson.
Don't start your own JSON parser library. Just don't. Yes, you can whiteboard one that's 90% good enough in 45 minutes, but that last 10% takes ten thousand man-hours.
I did write one, but only because I had to: it lives in a crash reporter, so the already-written data must be recoverable after a crash (to salvage partially written files), and the encoder needs to be async-safe.
https://github.com/kstenerud/KSCrash/blob/master/Sources/KSC...
And yeah, writing a JSON codec sucks.
So I'm in the process of replacing it with a BONJSON codec, which has the same capabilities, is still async-safe and crash resilient, and is 35x faster with less code.
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
https://github.com/kstenerud/ksbonjson/blob/main/library/src...
Yeah, but as long as I'm not releasing it publicly, I don't need to support 20 different ways of parsing.
That's the thing with reinventing wheels: a wheel that fits every possible vehicle and runs well on any possible terrain is very difficult to build. But when you know exactly what you need, it's a different story.
I am very surprised to hear the unit testing statistic. What kind of unholy edge cases would JSON parsing require to make it necessary to cover 122 million variations?
The more speed optimizations you put in, the gnarlier the new edge cases that pop up.
This may say more about C++ than JSON
The best language to handle unusual JSON correctly would probably be Python. It has arbitrary size integers, mpmath for arbitrary precision floats and good Unicode support.
Many of the problems disappear when performance is not critical, because that opens up much nicer, safer, and simpler languages than C/C++ to write a correct parser in.
122 million unit tests? What?
Most people don't need the remaining 10% but do value a small and easy-to-maintain codebase (which nlohmann definitely isn't).
Yeah, I use this and I think most of my friends do too :)
yeah it seems like every other C++ project uses it
holy shit
Parsing JSON is a Minefield (2016)
https://seriot.ch/projects/parsing_json.html
Not if I'm also the producer.
Finally, I have found someone who understands the purpose of using someone else's tiny header-only C library; someone who sincerely thought about it before looking for an excuse to bitch and complain.
You can't get much more 'opinion-less' than this library though. Iterate over keys and array items, identify the value type and return string-slices.
It also feels like only half the job to me. Reminds me of SAX "parsers" that were barely more than lexers.
I mean, what else is there to do when iterating over a JSON file? Delegating number parsing and Unicode handling to the user can be considered a feature (since I can decide on my own how expensive/robust I want this to be).
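Roughly, the whole surface of a library like that boils down to something like this sketch (in Rust rather than C, with all names invented and error handling omitted; this is not the library in question, just the shape of the idea). Strings and numbers come back as raw slices, and decoding them is the caller's problem:

```rust
// Sketch of a slice-returning pull scanner. Escapes and numbers are handed
// back raw; the caller decides how carefully (and expensively) to decode them.
#[derive(Debug)]
enum Token<'a> {
    ObjectStart,
    ObjectEnd,
    ArrayStart,
    ArrayEnd,
    String(&'a str), // raw bytes between the quotes, escapes NOT decoded
    Number(&'a str), // raw text; caller picks i64 / f64 / bignum
    Bool(bool),
    Null,
}

struct Scanner<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Scanner<'a> {
    fn new(src: &'a str) -> Self {
        Scanner { src: src.as_bytes(), pos: 0 }
    }

    fn next_token(&mut self) -> Option<Token<'a>> {
        let src = self.src;
        // skip whitespace and the separators we don't report
        while matches!(*src.get(self.pos)?, b' ' | b'\t' | b'\n' | b'\r' | b',' | b':') {
            self.pos += 1;
        }
        let b = src[self.pos];
        self.pos += 1;
        Some(match b {
            b'{' => Token::ObjectStart,
            b'}' => Token::ObjectEnd,
            b'[' => Token::ArrayStart,
            b']' => Token::ArrayEnd,
            b'"' => {
                let start = self.pos;
                while src[self.pos] != b'"' {
                    // step over escape pairs without decoding them
                    self.pos += if src[self.pos] == b'\\' { 2 } else { 1 };
                }
                let s = std::str::from_utf8(&src[start..self.pos]).unwrap();
                self.pos += 1; // closing quote
                Token::String(s)
            }
            b't' => { self.pos += 3; Token::Bool(true) }  // "true"
            b'f' => { self.pos += 4; Token::Bool(false) } // "false"
            b'n' => { self.pos += 3; Token::Null }        // "null"
            _ => {
                // number: hand back the raw text up to the next delimiter
                let start = self.pos - 1;
                while self.pos < src.len()
                    && !matches!(src[self.pos], b',' | b':' | b'}' | b']' | b' ' | b'\n')
                {
                    self.pos += 1;
                }
                Token::Number(std::str::from_utf8(&src[start..self.pos]).unwrap())
            }
        })
    }
}

fn main() {
    let mut sc = Scanner::new(r#"{"id": 42, "name": "caf\u00e9", "tags": [1.5, null]}"#);
    while let Some(tok) = sc.next_token() {
        println!("{:?}", tok);
    }
}
```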
That is what I like about Common Lisp libraries. They are mostly about the algorithms, leaving data structures up to the user. So you make sure you get those right before calling the function.
Extracting the data into objects. Libraries like Serde and Pydantic do this for you. Hell, the original eval() JSON loading method did that too.
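A minimal sketch of what that looks like with Serde (the struct and data here are made up); you go straight from JSON text to your own typed model with no DOM to walk by hand:

```rust
use serde::Deserialize;

// Hypothetical model type; serde_json parses directly into it.
#[derive(Deserialize, Debug)]
struct User {
    id: u64,
    name: String,
    tags: Vec<String>,
}

fn main() -> Result<(), serde_json::Error> {
    let user: User = serde_json::from_str(r#"{"id": 1, "name": "kate", "tags": ["admin"]}"#)?;
    println!("{:?}", user);
    Ok(())
}
```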
Then you lose the ability to do streaming.
True, but usually you only need that if your data is so large it can't fit in memory and in that case you shouldn't be using JSON anyway. (I was in this situation once where our JSON files grew to gigabytes and we switched to SQLite which worked extremely well.)
Actually, you'll hit the limits of DOM-style JSON parsers as soon as your data is larger than about half the available memory, since you'd most likely want to build your own model objects from the JSON, so at some point both of them must be present in memory (unless you're able to incrementally destroy those parts of the DOM that you're done with).
Anyhow, IMO a proper JSON library should offer both, in a layered approach. That is, a lower level SAX-style parser, on top of which a DOM-style API is provided as a convenience.
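Something like this shape, say (a purely hypothetical API, not any existing library): the event layer is the primitive, and the DOM is just one consumer of it.

```rust
// Hypothetical two-layer design: a SAX-style event interface underneath,
// and a DOM built on top as just one possible consumer of the events.
#[derive(Debug)]
enum Value {
    Null,
    Num(f64),
    Str(String),
    Array(Vec<Value>), // objects omitted for brevity
}

// Layer 1: the parser pushes events into whatever handler the caller supplies.
trait Handler {
    fn begin_array(&mut self);
    fn end_array(&mut self);
    fn scalar(&mut self, v: Value);
}

// Layer 2: one particular handler that materializes a DOM from the events.
#[derive(Default)]
struct TreeBuilder {
    stack: Vec<Vec<Value>>, // currently open arrays
    result: Option<Value>,
}

impl Handler for TreeBuilder {
    fn begin_array(&mut self) {
        self.stack.push(Vec::new());
    }
    fn end_array(&mut self) {
        let arr = Value::Array(self.stack.pop().expect("unbalanced events"));
        self.scalar(arr);
    }
    fn scalar(&mut self, v: Value) {
        match self.stack.last_mut() {
            Some(open) => open.push(v),
            None => self.result = Some(v),
        }
    }
}

fn main() {
    // The actual tokenizer is elided; the events for `[1, "two", null]`
    // are fed by hand here just to show the layering.
    let mut dom = TreeBuilder::default();
    dom.begin_array();
    dom.scalar(Value::Num(1.0));
    dom.scalar(Value::Str("two".into()));
    dom.scalar(Value::Null);
    dom.end_array();
    println!("{:?}", dom.result.unwrap());
}
```

A streaming caller implements Handler directly and never pays for the tree; everyone else gets the convenient DOM on top.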
> since you'd most likely want to build your own model objects from the JSON, so at some point both of them must be present in memory
Not really, because the JSON library itself can stream the input. For example, if you use `serde_json::from_reader()` it won't load the whole file into memory before parsing it into your objects:
https://docs.rs/serde_json/latest/serde_json/fn.from_reader....
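Something like this (the file name and struct are invented); the raw JSON text never has to sit in memory as one big string, only your deserialized objects do:

```rust
use std::fs::File;
use std::io::BufReader;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Record {
    id: u64,
    payload: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // from_reader consumes the reader incrementally rather than
    // slurping the whole file into a string first.
    let file = BufReader::new(File::open("records.json")?);
    let record: Record = serde_json::from_reader(file)?;
    println!("{:?}", record);
    Ok(())
}
```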
But that's kind of academic; half of all memory and all memory are in the same league.
That's only true if your model objects are serde structs, which is not desirable for a variety of reasons, most importantly because you don't want to tie your models to a particular on-disk format.
In the vast majority of cases you can and should just load directly into Serde structs and use those directly. That's kind of the point.
In some minority of cases you might not want to do that (e.g. because you need to support multiple versions of a format), but that is rare and can also be handled in various ways directly in Serde.
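For example, one pattern (all names invented here) is to deserialize into an untagged enum of wire formats and convert to the internal model afterwards, which keeps the on-disk shape out of the rest of the code:

```rust
use serde::Deserialize;

// Wire formats: old files use "name", newer ones use "display_name".
// With #[serde(untagged)], the variants are tried in order until one fits.
#[derive(Deserialize)]
#[serde(untagged)]
enum UserWire {
    V2 { id: u64, display_name: String },
    V1 { id: u64, name: String },
}

// The internal model the rest of the code actually uses.
struct User {
    id: u64,
    display_name: String,
}

impl From<UserWire> for User {
    fn from(w: UserWire) -> Self {
        match w {
            UserWire::V2 { id, display_name } => User { id, display_name },
            UserWire::V1 { id, name } => User { id, display_name: name },
        }
    }
}

fn main() -> Result<(), serde_json::Error> {
    let old: User = serde_json::from_str::<UserWire>(r#"{"id": 7, "name": "ada"}"#)?.into();
    let new: User = serde_json::from_str::<UserWire>(r#"{"id": 8, "display_name": "grace"}"#)?.into();
    println!("{} {} / {} {}", old.id, old.display_name, new.id, new.display_name);
    Ok(())
}
```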
Anyone who claims "it's not a very difficult problem" hasn't actually had to solve that problem.
Except I have, several times, with good results.
So in this case you're wrong.
General purpose is a different can of worms compared to solving a specific case.
> JSON parser libraries in general are a black hole of suffering imo.
Sexprs sitting over here, hoping for some love.
I still mourn the timeline where we got a real Lisp in the browser instead of the current abomination.
The project advertises zero allocations with minimal state. Either that claim isn't fair or our problems are very different: a single string (the most used type) already needs an allocation.