Hacker News

weinzierl 15 hours ago [ - ]

"For nearly thirty years, notepad.exe was the gold standard for a "dumb" utility which was a simple, win32-backed buffer for strings that did exactly one thing...display text."

Well, except that this did not prevent it from having embarrassing bugs. Google "Bush hid the facts" for an example. I'm serious, you won't be disappointed.

I think complexity is relative. At the time of the "Bush hid the facts" bug, nailing down Unicode and text encodings was still considered rocket science. Now this is a solved problem and we have other battles we fight.

usrbinbash 13 hours ago [ - ]

As funny as the "Bush hid the facts" bug may be, there is a world of difference between an embarassing mistake by a function that guesses the text encoding wrong, and a goddamn remote code execution with an 8.8 score

> and we have other battles we fight.

Except no, we don't. notepad.exe was DONE SOFTWARE. It was feature complete. It didn't have to change. This is not a battle that needed fighting, this was hitting a brick wall with ones fist for no good reason, and then complaining about the resulting pain.

MarleTangible 12 hours ago [ - ]

They also wanted to use the popularity of Notepad, so they replaced it with an AI bloatware version instead of creating a new app with extra features.

delecti 11 hours ago [ - ]

They didn't need to create a new app. At the same time that they started adding LLM garbage to Notepad, they discontinued WordPad.

https://en.wikipedia.org/wiki/Windows_Notepad#Change_in_deve... https://en.wikipedia.org/wiki/WordPad#Discontinuation

They likely knew nobody would be drawn to WordPad by the additions, so they had to scavenge their rapidly diminishing list of actually useful software for sacrifices on the altar to their outrageous AI investments.

Ntrails 9 hours ago [ - ]

How long were they threatening to kill snipping tool despite it being a perfectly serviceable piece of kit so we could switch to some shitty alternative?

d3Xt3r 2 hours ago [ - ]

They did ultimately kill it though - and then they re-created it as a bloated UWP version that is an insane 449 MEGABYTES in size! The old win32 Snipping Tool used to be only a few kilobytes...

Aachen 9 hours ago [ - ]

I would agree if it were RCE

This definition in the first paragraph on Wikipedia matches my understanding of it as a security consultant:

> The ability to trigger arbitrary code execution over a network (especially via a wide-area network such as the Internet) is often referred to as remote code execution (RCE or RCX). --https://en.wikipedia.org/wiki/Arbitrary_code_execution

Issues in handling local files, whether they require user interaction or not, are just that

Doesn't take away from the absurdity that notepad isn't a notepad but does extensive file contents parsing

mghackerlady 10 hours ago [ - ]

For a good built in "done" text editor, theres apples textedit. It's barely changed since NeXTSTEP and works flawlessly and is FOSS. As much as I hate apple there's a reason I have GNUstep installed on most of my *nix boxes

breppp 11 hours ago [ - ]

> Except no, we don't. notepad.exe was DONE SOFTWARE

While 8.8 score is embarrassing, by no measure notepad was done software. It couldn't load a large text file for one, its search was barely functional, had funky issues with encoding, etc.

Notepad++ is closer to what should be expected from an OS basic text editor

bsza 10 hours ago [ - ]

What counts as "large"? I'm pretty sure at some point in my life I'd opened the entirety of Moby Dick in Notepad. Unless you want to look for text in a binary file (which Notepad definitely isn't for) I doubt you'll run into that problem too often.

Also, I hope the irony of you citing Notepad++ [1] as what Notepad should aim to be isn't lost on you. My point being, these kinds of vulnerabilities shouldn't exist in a fucking text editor.

[1] https://notepad-plus-plus.org/news/hijacked-incident-info-up...

breppp 8 hours ago [ - ]

I know about the vulnerabilities in notepad++, however I was referring to the feature set.

Regarding large, I am referring to log files for example. I think the issue was lack of use of memory mapped files, which meant the entire file was loaded to RAM always, often giving the frozen window experience

vel0city 10 hours ago [ - ]

> What counts as "large"?

Remote into a machine that you're not allowed to copy data out of. You only have the utilities baked into Windows and whatever the validated CI/CD process put there. You need to open a log file that has ballooned to at least several hundred megabytes, maybe more.

Moby Dick is about 1MB of text. That's really not much compared to a lot of log files on pretty hot servers.

I do agree though, if we're going to be complaining about how a text editor could have security issues and pointing to Notepad++ as an example otherwise, its had its own share of notable vulnerabilities even before this update hijacking. CVE-2017-8803 had a code execution vulnerability on just opening a malicious file, this at least requires you to click the rendered link in a markdown file.

bsza 9 hours ago [ - ]

Oh right, generated files exist. Though logging systems usually have a rollover file size you can configure, should this happen to you in real life.

Honestly I'm okay with having to resort to power tools for these edge cases. Notepad is more for the average user who is less likely to run into 100 MB text files and more likely to run into a 2 kB text file someone shared on Discord.

vel0city 7 hours ago [ - ]

> Though logging systems usually have a rollover file size you can configure, should this happen to you in real life

I get what you're saying. But if things were done right I probably wouldn't have to be remoting into this box to hunt for a log file that wasn't properly being shipped to some other centralized logging platform.

Romario77 10 hours ago [ - ]

Notepad++ might be too much for a simple utility.

Plus for many years Word was one of the main cash cows for MS, so they didn't want to make an editor that would take away from Word.

And you could see how adding new things adds vulnerabilities. In this case they added ability to see/render markdown and with markdown they render links, which in this case allowed executing remote code when user clicks on a link.

breppp 7 hours ago [ - ]

> Plus for many years Word was one of the main cash cows for MS, so they didn't want to make an editor that would take away from Word.

Wordpad was the bundled rich text editor and was also a mess

I don't think an improved notepad could have cannibalized Word

vbezhenar 10 hours ago [ - ]

notepad.exe worked just fine.

Notepad++ is a monster software.

dspillett 13 hours ago [ - ]

> nailing down Unicode and text encodings was still considered rocket science. Now this is a solved problem

I wish…

Detecting text encoding is only easy if all you need to contend with is UTF16-with-BOM, UTF8-with-BOM, UTF8-without-BOM, and plain ASCII (which is effectively also UTF8). As soon as you might see UTF16 or UCS without a BOM, or 8-bit codepages other than plain ASCII (many apps/libs assume that these are always CP1252, a superset of the printable characters of ISO-8859-1, which may not be the case), things are not fully deterministic.

Thankfully UTF8 has largely won out over the many 8-bit encodings, but that leaves the interesting case of UTF8-with-BOM. The standard recommends against using it, that plain UTF8 is the way to go, but to get Excel to correctly load a UTF8 encoded CSV or similar you must include the BOM (otherwise it assumes CP 1252 and characters above 127 are corrupted). But… some apps/libs are completely unaware that UTF8-with-BOM is a thing at all so they load such files with the first column header corrupted.

Source: we have clients pushing & pulling (or having us push/pull) data back & forth in various CSV formats, and we see some oddities in what we receive and what we are expected to send more regularly than you might think. The real fun comes when something at the client's end processes text badly (multiple steps with more than one of them incorrectly reading UTF8 as CP1252, for example) before we get hold of it, and we have to convince them that what they have sent is non-deterministically corrupt and we can't reliably fix it on the receiving end…

josephg 13 hours ago [ - ]

> to get Excel to correctly load a UTF8 encoded CSV or similar you must include the BOM

Ah so that’s the trick! I’ve run into this problem a bunch of times in the wild, where some script emits csv which works on the developers machine but fails strangely with real world data.

Good to know there’s a simple solution. I hope I remember your comment next time I see this!

silon42 12 hours ago [ - ]

Excel CSV is broken anyway, since in some (EU, ...) countries it needs ; as separator.

OptionOfT 11 hours ago [ - ]

That's not an excel issue. That's a locale issue.

Due to (parts of?) the EU using then comma as the decimal separator, you have to use another symbol to separate your values.

dspillett 11 hours ago [ - ]

Comma for decimal separator, and point (or sometimes 'postraphy) for thousands separator if there is one, is very common. IIRC more European countries use that than don't, officially, and a bunch of countries outside Europe do too.

It wouldn't normally necessitate not using comma as the field separator in CSV files though, wrapping those values is quotes is how that would usually be handled in my experience.

Though many people end up switching to “our way”, despite their normal locale preferences, because of compatibility issues they encounter otherwise with US/UK software written naively.

anthk 10 hours ago [ - ]

Locales should have died long ago. You use plain data, stop parsing it depdending on wen your live. Plan9/9front uses where right long ago. Just use Unicode everywhere, use context-free units for money.

dspillett 9 hours ago [ - ]

Locales are fine for display, but yes they should not affect what goes into files for transfer. There have always been appropriate control characters in the common character sets, in ASCII and most 8-bit codepages there are non-printing control characters that have suitable meanings to be used in place of commas and EOL so they could be used unescaped in data fields. Numbers could be plain, perhaps with the dot still as a standard decimal point or we could store non-integers as a pair of ints (value and scale), dates in an unambiguous format (something like one of the options from ISO8601), etc.

Unfortunately people like CSV to be at least part way human-readable, which means readable delimiters, end-or-record markers being EOLs that a text editor would understand, and the decimal/thousand/currency symbols & date formatting that they are used to.

dspillett 11 hours ago [ - ]

A lot of the time when people say CSV they mean “character separated values” rather than specifically “comma separated values”.

In the text files we get from clients we sometimes see tab used instead of comma, or pipe. I don't think we've seen semicolon yet, though our standard file interpreter would quietly cope¹ as long as there is nothing really odd in the header row.

--------

[1] it uses the heuristic “the most common non-alpha-numeric non-space non-quote character found in the header row” to detect the separator used if it isn't explicitly told what to expect

7bit 13 hours ago [ - ]

The very fact that UTF-8 itself discouraged from using the BOM is just so alien to me. I understand they want it to be the last encoding and therefore not in need of a explicit indicator, but as it currently IS NOT the only encoding that is used, it makes is just so difficult to understand if I'm reading any of the weird ASCII derivatives or actual Unicode.

It's maddening and it's frustrating. The US doesn't have any of these issues, but in Europe, that's a complete mess!

dspillett 11 hours ago [ - ]

> The US doesn't have any of these issues

I think you mean “the US chooses to completely ignore these issues and gets away with it because they defined the basic standard that is used, ASCII, way-back-when, and didn't foresee it becoming an international thing so didn't think about anyone else” :)

capitainenemo 12 hours ago [ - ]

From wikipedia...

    UTF-8 always has the same byte order,[5] so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8...
    Not using a BOM allows text to be backwards-compatible with software designed for extended ASCII. For instance many programming languages permit non-ASCII bytes in string literals but not at the start of the file. ...
   A BOM is unnecessary for detecting UTF-8 encoding. UTF-8 is a sparse encoding: a large fraction of possible byte combinations do not result in valid UTF-8 text.

That last one is a weaker point but it is true that with CSV a BOM is more likely to do harm, than good.

g-b-r 12 hours ago [ - ]

Indeed, I've been using the BOM in all my text files for maybe decades now, those who wrote the recommendation are clearly from an English country

dspillett 11 hours ago [ - ]

> are clearly from an English country

One particular English-speaking country… The UK has issues with ASCII too, as our currently symbol (£) is not included. Not nearly as much trouble as non-English languages due to the lack of accents & such that they need, but we are still affected.

bsza 13 hours ago [ - ]

There is a difference between a bug you laugh at and walk away and a bug a scammer laughs at as he walks away with your money.

When I open something in Notepad, I don't expect it to be a possible attack vector for installing ransomware on my machine. I expect it to be text. It being displayed incorrectly is supposed to be the worst thing that could happen. There should be no reason to make Notepad capable of recognizing links, let alone opening them. Save that crap for VS Code or some other app I already know not to trust.

reyqn 14 hours ago [ - ]

Embarrassing bugs are not RCEs. Also the industry should be more mature now, not less. But move fast and break things, I guess...

sph 14 hours ago [ - ]

We have reached peak software stability, it's all gonna be downhill from here.

cookiengineer 12 hours ago [ - ]

Peak software stability was Windows 7, that's why it's still used in industrial environments.

trinix912 11 hours ago [ - ]

Funny how back then people claimed peak stability was Windows 2000. 10 years from now people will look at Windows 10 and claim that was peak stability.

fwgijcqywqeo 13 hours ago [ - ]

We are living in the future!

nuancebydefault 14 hours ago [ - ]

To be honest, the 'bush hid the facts' bug was funny and was not really a vulnerability that could be exploited, unless... you understood Chinese and the alternative text would manage to pursuade you to do something harmful.

In fact, those were the good days, when a mere affair with your secretary would be enough to jeopardize your career. The pendulum couldn't have swung more since.

egeozcan 13 hours ago [ - ]

> unless... you understood Chinese and the alternative text would manage to persuade you to do something harmful

Oh, here is the file I just saved... I see that it now tells me to rob a bank and donate the money to some random cult I'm just learning about.

Let me make a web search to understand how to contact the cult leader and proceed with my plan!

(luckily LLMs were not a thing back then :) )

Vinnl 15 hours ago [ - ]

https://en.wikipedia.org/wiki/Bush_hid_the_facts

13 hours ago [ - ]

[deleted]

g947o 15 hours ago [ - ]

I am pretty sure it's possible to fix that entire category of bugs without introducing RCE vulnerabilities.

14 hours ago [ - ]

[deleted]

jama211 15 hours ago [ - ]

Fascinating reading about that bug, thanks for sharing

croes 14 hours ago [ - ]

> Now this is a solved problem

Is that so? I ran pretty often in problems with programs having trouble with non-ANSI characters

direwolf20 15 hours ago [ - ]

It's not solved, we just don't have to guess the encoding any more because it's always UTF-8.