2 billion characters seems fairly plausible to hit in the real world

In a single line. Still not impossible, but people handling that amount of data will likely not have “header only and <150 lines” as a strong criterion for choosing their JSON parsing library.

2GB in a single JSON file is definitely an outlier. A simple caveat when using this header could suffice: ensure inputs are less than 2GB.

Less than INT_MAX, more accurately. But since the library already contains a check when decreasing the counter, it might as well have one when increasing the counter (and the line/column numbers).
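
Something like the sketch below is all it would take. This is a rough sketch in C; checked_inc, depth, line and col are hypothetical names for illustration, not anything the library actually defines:

    #include <limits.h>

    /* Hypothetical parser counters; the real library's fields may differ. */
    struct state {
        int depth;  /* nesting depth, bumped on every '{' or '[' */
        int line;   /* current line number, for error messages */
        int col;    /* current column number */
    };

    /* Increment a counter only if it will not overflow.
     * Returns 0 on success, -1 if the input is too large to track in an int,
     * so the parse loop can reject the document instead of wrapping around. */
    static int checked_inc(int *n) {
        if (*n == INT_MAX)
            return -1;
        ++*n;
        return 0;
    }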

Or fork it and make a few modifications to handle it? I have to admit I haven't looked at the code to see whether it would allow for that.

I've seen much bigger, though technically that wasn't valid JSON but rather structured logging with one JSON object per line. On the other hand, I've seen exported JSON files that could grow to such sizes without doing anything weird; mine just never exceeded a couple hundred megabytes because I didn't use the software for long enough.

Restricting the input to a reasonable size is an easy workaround for sure, but this limitation isn't indicated anywhere, so anyone deciding to pull this random project into their important code wouldn't know to defend against such a situation.

In a web server scenario, 2 GiB of { (which would trigger two overflows) fits in a compressed request body of just a couple hundred kilobytes to two megabytes, depending on how old your server software is.
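
If you want to sanity-check that figure, here's a sketch (assuming zlib is available; build with cc bomb.c -lz) that streams 2 GiB of { through deflate with a gzip wrapper and prints the compressed size:

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    int main(void) {
        unsigned char in[1 << 16], out[1 << 16];   /* 64 KiB chunks */
        memset(in, '{', sizeof in);

        z_stream zs;
        memset(&zs, 0, sizeof zs);
        /* windowBits 15 + 16 selects a gzip wrapper, as in Content-Encoding: gzip */
        if (deflateInit2(&zs, Z_BEST_COMPRESSION, Z_DEFLATED,
                         15 + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
            return 1;

        unsigned long long remaining = 2ULL << 30;  /* 2 GiB of '{' */
        unsigned long long compressed = 0;

        while (remaining > 0) {
            unsigned chunk = remaining > sizeof in ? (unsigned)sizeof in
                                                   : (unsigned)remaining;
            zs.next_in  = in;
            zs.avail_in = chunk;
            remaining  -= chunk;
            int flush = remaining == 0 ? Z_FINISH : Z_NO_FLUSH;
            do {    /* drain the output buffer until deflate is done with this chunk */
                zs.next_out  = out;
                zs.avail_out = sizeof out;
                deflate(&zs, flush);
                compressed += sizeof out - zs.avail_out;
            } while (zs.avail_out == 0);
        }
        deflateEnd(&zs);
        printf("2 GiB of '{' compressed to %llu bytes\n", compressed);
        return 0;
    }

Deflate tops out at roughly a 1000:1 ratio, so the gzip case lands right around two megabytes; newer codecs like brotli or zstd compress input this repetitive far better, which is presumably where the lower end of the range comes from.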

To be fair, anyone who uses a 150-line library without bothering to read it deserves what they get.

And in the spirit of your profile text, I'm quite glad such landmines are out there to trip up those who blindly ingest all the code they can find.

Not really. I deal with this every day. If the library has a limit on the input size, it should mention it.

It is ~150 lines of code. Submit a PR, add your own checks when you git clone it, or stop complaining, because the author does not owe you anything.

If you deal with this every day, you're an outlier.

For such big data, you should definitely be using an efficient format, not JSON.

I agree, but 2GB JSON files absolutely exist, and they fit in RAM easily.