An int will be 32 bits on any non-ancient platform, so for each of those lines this means:

- a JSON file with values nested more than 2 billion levels deep

- a file with more than 2 billion lines

- a line with more than 2 billion characters

The depth counter is a 32-bit int; it is not an index into the file.
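
To make the three cases concrete, here is a hypothetical sketch of what such counters look like as plain ints (the field names are mine, not the library's):

    /* Hypothetical parser state, assuming each counter is a plain
     * int. Each field runs toward INT_MAX (2,147,483,647), and
     * incrementing past that is signed overflow, i.e. UB. */
    struct parse_pos {
        int depth; /* current nesting level of {} and [] */
        int line;  /* current line number in the input */
        int col;   /* current column within the line */
    };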

If you are nesting 2 billion levels deep (at minimum, that means repeating { 2 billion times, followed by a value, followed by } another 2 billion times), you have messed up.

You have 4GB of "padding"...at minimum.

Your file is going to be petabytes in size for this to make any sense.

You are using a terrible format for whatever you are doing.

You are going to need a completely custom parser because nothing will fit in memory. I don't care how much RAM you have.

Simply accessing an element means traversing a nested object 2 billion levels deep; in probably any parser in the world, that is going to take somewhere between minutes and weeks per access.

All that is going to happen in this program is a crash.

I appreciate that people want to have some pointless if(depth > 0) check everywhere, but if your depth is anywhere north of a million in any real-world program, something messed up a long long time ago, never mind waiting until it hits 2 billion.
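
For what it's worth, such a cap doesn't need to live everywhere, just at the point where nesting increases. A minimal sketch with a made-up standalone checker (note it naively counts braces inside strings too):

    #include <stdio.h>

    #define MAX_DEPTH 512 /* arbitrary sane cap, far below INT_MAX */

    static int check_depth(const char *s)
    {
        int depth = 0;
        for (; *s; s++) {
            if (*s == '{' || *s == '[') {
                if (++depth > MAX_DEPTH)
                    return -1; /* pathological nesting: reject */
            } else if (*s == '}' || *s == ']') {
                if (depth > 0)
                    depth--;
            }
        }
        return 0;
    }

    int main(void)
    {
        printf("%d\n", check_depth("{\"a\":[1,2,{}]}")); /* 0: fine */
        return 0;
    }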

> I appreciate that people want to have some pointless if(depth > 0) check everywhere

An after-the-fact check would be the wrong way to deal with UB; you'd need to check for < INT_MAX before the increment in order to avoid it.
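
Something like this, as a minimal sketch (enter_scope is a made-up helper, not anything from the library):

    #include <limits.h>
    #include <stdio.h>

    /* Once depth == INT_MAX, depth++ is signed overflow (UB),
     * so the test has to come before the increment. */
    static int enter_scope(int *depth)
    {
        if (*depth >= INT_MAX)
            return -1; /* refuse instead of invoking UB */
        ++*depth;
        return 0;
    }

    int main(void)
    {
        int depth = INT_MAX - 1;
        printf("%d\n", enter_scope(&depth)); /* 0: now depth == INT_MAX */
        printf("%d\n", enter_scope(&depth)); /* -1: would overflow */
        return 0;
    }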

What is your definition of non-ancient? There are still embedded systems being produced today where int is not 32 bits.

And those will need careful review of any code you want to run on them, because no one cares about your weird architecture, nor should they have to.

I wouldn't call 8- or 16-bit microcontrollers (with no concept of a 32-bit int) that are in billions of devices "weird". But ok.

2 billion characters seems fairly plausible to hit in the real world

In a single line. Still not impossible, but people handling that amount of data will likely not have “header only and <150 lines” as a strong criterion for choosing their JSON parsing library.

2GB in a single JSON file is definitely an outlier. A simple caveat when using this header could suffice: ensure inputs are less than 2GB.

Less than INT_MAX, more accurately. But since the library contains a check when decreasing the counter, it might as well have a check when increasing the counter (and line/column numbers).
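
At the call site that could be as simple as this (a sketch with made-up names; the real entry point will differ):

    #include <limits.h>
    #include <stddef.h>

    /* Keep the buffer strictly below INT_MAX so the int counters
     * walking it can never overflow. */
    static int input_size_ok(size_t len)
    {
        return len < (size_t)INT_MAX;
    }

    int main(void)
    {
        size_t body_len = (size_t)3u << 30; /* e.g. a 3 GiB request body */
        return input_size_ok(body_len) ? 0 : 1; /* exits 1: reject it */
    }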

Or fork it and make a few modifications to handle it? I have to admit I haven't looked at the code to see whether it would allow for that.

I've seen much bigger, though technically that wasn't valid JSON but structured logging with one JSON document per line. On the other hand, I've seen exported JSON files that could grow to such sizes without anything weird going on; mine just never exceeded a couple hundred megabytes because I didn't use the software for long enough.

Restricting the input to a reasonable size is an easy workaround for sure, but this limitation isn't indicated anywhere, so anyone deciding to pull this random project into their important code wouldn't know to defend against such a situation.

In a web server scenario, 2GiB of { (which would trigger two of the overflows) would fit in a compressed request of a couple hundred kilobytes to two megabytes, depending on how old your server software is.
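
You can ballpark that yourself, assuming zlib is available (the buffer sizes and chunk count here are arbitrary): deflate 2 GiB of '{' and look at the output size.

    /* Build with: cc demo.c -lz */
    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    #define CHUNK  (1 << 20) /* 1 MiB */
    #define CHUNKS 2048      /* 2048 * 1 MiB = 2 GiB of '{' */

    int main(void)
    {
        static unsigned char in[CHUNK], out[CHUNK];
        memset(in, '{', sizeof in);

        z_stream zs;
        memset(&zs, 0, sizeof zs); /* zalloc/zfree/opaque = Z_NULL */
        if (deflateInit(&zs, Z_BEST_COMPRESSION) != Z_OK)
            return 1;

        unsigned long long total = 0;
        for (int i = 0; i < CHUNKS; i++) {
            int flush = (i == CHUNKS - 1) ? Z_FINISH : Z_NO_FLUSH;
            zs.next_in = in;
            zs.avail_in = sizeof in;
            do { /* standard zlib drain loop */
                zs.next_out = out;
                zs.avail_out = sizeof out;
                deflate(&zs, flush);
                total += sizeof out - zs.avail_out;
            } while (zs.avail_out == 0);
        }
        deflateEnd(&zs);
        printf("2 GiB of '{' deflates to %llu bytes\n", total);
        return 0;
    }

Plain deflate tops out around a 1000:1 ratio, so this prints roughly two megabytes; Brotli or zstd, which newer servers accept, get the same payload far smaller.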

To be fair, anyone who uses a 150-line library without bothering to read it deserves what they get.

And in the spirit of your profile text, I'm quite glad such landmines are out there to trip up those who blindly ingest all the code they can find.

Not really. I deal with this every day. If the library has a limit on the input size, it should mention this.

It is ~150 lines of code. Submit a PR, add your own checks when you git clone it, or stop complaining; the author does not owe you anything.

If you deal with this every day, you're an outlier.

For such big data, you should definitely be using an efficient format, not JSON.

I agree, but 2GB JSON files absolutely exist. They fit in RAM easily.

All very possible on modern platforms.

Maybe more importantly, I won’t trust the rest of the code if the author doesn’t seem to have the finite range of integer types in mind.

Personally, all my C code is written with the SEI CERT C Coding Standard in mind.