Roundtripping IEEE floating point values through decimal UTF-8 strings and back isn't just slow, it's a ridiculously fragile process.

The difference between which values are precisely representable in binary and which are precisely representable in decimal means small errors can creep in.

A way to achieve perfect round-tripping was proposed back in 1990 by Steele and White (and they likely weren't the first to come up with a similar idea). I guess their proposal wasn't especially popular until at least the 2000s, compared to more classical `printf`-like rounding methods, but many languages and platforms these days do provide such round-tripping formatting algorithms as the default option. So I guess nowadays roundtripping isn't that hard, unless people do something sophisticated without really understanding what they're doing.

"How to print floating-point numbers accurately" by Steele & White https://dl.acm.org/doi/10.1145/93548.93559

I do think the OP was worrying about such people. Performant and correctly rounding JSON libraries are reasonably common now, but that wasn't the case a decade ago (I think).

Interesting! I didn't know about Steele and White's 1990 method. I did however remember Burger and Dybvig's method from 1996.

You don't have to precisely represent the float in decimal. You just need each float to have a unique decimal representation, which you can guarantee if you include enough digits: 9 for 32-bit floats, and 17 for 64-bit floats.

https://randomascii.wordpress.com/2012/02/11/they-sure-look-...
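A quick way to convince yourself of the 17-digit claim (a Python sketch; the same check works in any language with a correctly rounded parser):

```python
x = 0.1 + 0.2   # 0.30000000000000004

# 17 significant digits always identify a double uniquely, so it round-trips.
assert float(format(x, ".17g")) == x

# 16 digits are not always enough: this value collapses to "0.3",
# which parses back to a different double.
assert float(format(x, ".16g")) != x
print(format(x, ".16g"), format(x, ".17g"))
```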

And you need to trust that whoever is generating the JSON you’re consuming, or will consume the JSON you generate, is using a library which agrees about what those representations round to.

Note that the consumer side doesn't really have a lot of ambiguity. You just read the number, compute its precise value as written, and round it to the closest binary representation with banker's rounding (ties to even). You do anything other than this only under very special circumstances. Virtually all the ambiguity lies on the producer side, and it can be cleared up by using any of the formatting algorithms with the roundtripping guarantee.

EDIT: If you're talking about decimal->binary->decimal round-tripping, it's a completely different story though.
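To illustrate the consumer side (a Python sketch, assuming a correctly rounded `float()` parser, which CPython has):

```python
# A correctly rounded parser maps every decimal literal to the nearest
# double, so digits beyond what's needed to pick that double change nothing.
a = float("0.1")
b = float("0.10000000000000000001")   # more precision than a double can hold
c = float("1e-1")
assert a == b == c
# Banker's rounding (ties to even) only matters for literals that land
# exactly halfway between two adjacent doubles.
```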

JSON itself doesn't mandate that IEEE754 numbers be used.

This is one of those really common misunderstandings in my experience. Indeed JSON doesn’t encode any specific precision at all. It’s just a decimal number of any length you possibly want, knowing that parsing libraries will likely decode it into something like IEEE754. This is why libraries like Python’s json will let you give it a custom parser, if, say, you wanted a Decimal object for numbers.
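For example, with Python's standard library:

```python
import json
from decimal import Decimal

doc = '{"price": 0.1, "qty": 3}'

# Default behaviour: fractional numbers become IEEE 754 doubles.
parsed = json.loads(doc)
print(parsed["price"], type(parsed["price"]))     # 0.1 <class 'float'>

# Opt in to exact decimals instead via the parse_float hook.
exact = json.loads(doc, parse_float=Decimal)
print(exact["price"], type(exact["price"]))       # 0.1 <class 'decimal.Decimal'>
```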

Like it or not, JSON data types are inherently linked to the primitives available in JavaScript. You can, of course, write JSON that can't be handled with the native types available in JavaScript, but the native parser will always deserialize to a native type. Until very recently all numbers were IEEE 754 doubles in JavaScript, although arbitrary-precision bignums (BigInt) do exist now. So the de facto precision limit of a number in JSON that needs to be compatible is an IEEE 754 double. If you control your clients you can do whatever you want, though.

The standard definitely limits what precision you should expect to be handled.

But how JSON numbers are handled by different parsers might surprise you. This blog post actually does a good job of detailing the subtleties and the choices made in a few standard languages and libraries: https://github.com/bterlson/blog/blob/main/content/blog/what...

I think one particular surprise is that C# and Java standard parsers both use OpenAPI schema hints that a piece of data is of type ‘number’ to map the value to a decimal floating point type, not a binary one.

> C# and Java standard parsers

Not sure which parser you consider standard, as Java doesn't have one at all (in the standard libraries). Other than that, the existing ones just take the target type (not JSON) to deserialize into, e.g. int, long, etc.

That blog post treats Jackson as the de facto standard Java JSON parser/formatter which seems reasonable.

That's a bit much - (unfortunately) the codebase uses at least 4 different JSON libraries (perhaps 5 if I count one non-general-purpose, personally written one). Gson is generally very popular as well. The blog post mentions BigDecimal, and at that point I wouldn't dare trust it much.

The de facto standard is similar to the expectation that everyone uses Spring Boot.

> Like it or not, JSON data types are inherently linked to the primitives available in JavaScript.

Presumably this is dependent on the runtime. You certainly don't need to respect JavaScript's (or any runtime's) parsers if you don't use JavaScript.

For instance, my current position explicitly uses arbitrary-precision decimals to deserialize JSON numbers.

Indeed - you could be serializing to or from JSON where the in-memory representation you're aiming for is actually a floating point decimal. JSON doesn't care.

Most languages in use (such as Python) solved this problem ages ago. Take any floating point value other than NaN, convert it to a string, and convert the string back. It will compare exactly equal. Not only that, they are able to produce the shortest such string representation.
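A quick property test of that claim (a sketch, assuming CPython 3.9+ for `random.randbytes`):

```python
import random
import struct

# Draw random 64-bit patterns, skip NaNs, and check that string -> float
# round-trips bit-for-bit; repr() emits the shortest round-tripping form.
for _ in range(100_000):
    (x,) = struct.unpack("<d", random.randbytes(8))
    if x != x:          # NaN: the one value excluded above
        continue
    assert float(repr(x)) == x
```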

Maybe 'ridiculously fragile' is the wrong phrase. Perhaps 'needlessly fragile' would be better.

The point is that it takes algorithms that need to be provably correctly implemented on both ends of any JSON serialization/deserialization. And if one implementation can roundtrip its own floating point values, that's great - but JSON is an interop format, so does it roundtrip if you send it to another system and back?

It's just an unnecessary layer of complexity that binary floating point serializers do not have to worry about.
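For contrast, here's roughly what the binary path looks like (a Python sketch using struct; a real binary serializer would just copy the 8 bytes):

```python
import math
import struct

# Binary serialization copies the bit pattern: no parsing, no rounding,
# and even -0.0, infinities and NaN payloads survive untouched.
for x in (0.1, -0.0, math.inf, math.nan):
    blob = struct.pack("<d", x)
    (y,) = struct.unpack("<d", blob)
    assert struct.pack("<d", y) == blob   # bit-for-bit identical
```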

True, but many of them have had bugs in printing or parsing such numbers, and once those creep in they can cause real long term problems. I remember having to maintain alternative datums and projections in GIS software because of a parser error that had been introduced in the late 80s.