This may be hairsplitting, but the emphasis belongs on the word "handling" in that sentence. I would bet it's not so much the feature gap as that the specification is very, very YOLO. And when any number of byte sequences happen to render as a PDF[1], you end up with a very, very diverse set of inputs that coincidentally work on some systems, leading to "works on my machine" type outcomes. When your project is in the business of rendering things which happen to work in ${other system}, it leads to a lot of very angry users.
1: it's like a million monkeys with a million typewriters decided to go work at Adobe, and they had heard great things about mixed text and binary file formats because they are outstanding RCE and stack smashing opportunities
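To make the "works on my machine" point concrete, here is a minimal sketch of two hypothetical header checks. Nothing here is taken from a real viewer: the function names are made up, and the 1024-byte window is an assumption modeled on the commonly described behavior of tolerating junk before the `%PDF-` marker.

```python
PDF_MARKER = b"%PDF-"


def strict_header(data: bytes) -> bool:
    """Hypothetical strict reader: the marker must be the very first bytes."""
    return data.startswith(PDF_MARKER)


def lenient_header(data: bytes, window: int = 1024) -> bool:
    """Hypothetical lenient reader: the marker may appear anywhere in the
    first `window` bytes (the window size is an assumption for illustration)."""
    return PDF_MARKER in data[:window]


# A file with a few junk bytes in front of an otherwise plausible header.
junk_prefixed = b"\x00\x00garbage\n%PDF-1.7\n"

print(strict_header(junk_prefixed))   # the strict reader rejects it
print(lenient_header(junk_prefixed))  # the lenient reader accepts it
```

Once one popular reader accepts the second kind of file, people produce them, and every other renderer gets the bug report.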
On the one hand, a lot of those are feature requests, not necessarily bugs. They also have more users, so they catch more edge-case bugs like "Hebrew is rendered backwards" and so on.
On the other hand, PDF.js has been around for more than a decade. As it is a core component of Firefox, and viewing PDFs is an important part of many business applications, you'd think they wouldn't have nearly so many issues.
I know PostScript and PDFs are a nightmare, but no small part of me feels like this is yet another case of Mozilla underfunding development.
In any case, there are more native speakers of right-to-left languages than there are native English speakers.
I found this and dropped their table into a sheet to get a total of just over 2.3 billion.
Languages using right-to-left scripts: https://share.google/lN5lmfjCoIW7ENvdZ
When you're not a Hebrew speaker and you don't interact with Hebrew script, and you're developing a PDF library by yourself, it's easy to not even realize that there is something about it that you've overlooked.
In any case, Hebrew is handled differently from other RTL languages, hence this bug report in PDF.js: https://github.com/mozilla/pdf.js/issues/20097
Almost certainly something a smaller team would have never caught, given that a team with a massive name behind them didn't.
I don’t see dotancohene say anything about how surprising it is that that bug exists. They only say something about the judgment of it being an edge case.
I don't think there's an indication that this affects all RTL languages, just Hebrew word order in selections.
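For anyone wondering how "word order in selections" can go wrong on its own, here is a toy sketch. It is not PDF.js code and makes no claim about the actual cause of that issue; the function names are hypothetical, and it assumes a pure RTL line whose glyphs are stored left to right, so the stored order is the mirror of reading order.

```python
# "hello world" in Hebrew, in logical (reading) order.
logical = "שלום עולם"

# Assumption: glyphs are stored in left-to-right page order, which for a
# pure RTL line is the exact reverse of the logical order.
visual = logical[::-1]


def fix_chars_only(visual_text: str) -> str:
    """Partial fix: un-mirror the letters inside each word, but keep the
    words in left-to-right page order. Words come out swapped."""
    return " ".join(word[::-1] for word in visual_text.split(" "))


def fix_whole_run(visual_text: str) -> str:
    """Full fix for this toy case: reverse the entire run at once."""
    return visual_text[::-1]


print(fix_chars_only(visual) == logical)  # False: each word is readable, order is wrong
print(fix_whole_run(visual) == logical)   # True
```

The partial fix is the sneaky one: every word looks right to someone who can't read the script, and only a reader notices the sentence is in the wrong order.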
Specifically a right edge case
Nice!