This has come up multiple times before [1], and more broadly it has come up hundreds of times with Unix-style tools. It's always been a stupid idea for every tool to have its own barely documented file format.

This wouldn't be an issue if patches were XML or JSON with a well defined schema, but everything must be a boutique undocumented format in the world of Unix tools.

Maybe the worst part about this is that it can come entirely from a patch being exported by git and then imported straight back into git. If you can't even handle your own undocumented format, what hope do other tools that want to work with it have?

[1]: https://mas.to/@zekjur/116022397626943871

While patch [0] has problems, the issue here is not that it is undocumented.

Git recently added this doc on round-tripping [1], and the problem is with git.

     Any line that is of the form:
     * three-dashes and end-of-line, or
     * a line that begins with "diff -", or
     * a line that begins with "Index: "

     is taken as the beginning of a patch, and the commit log message is terminated before the first occurrence of such a line.
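A minimal Python sketch of the three rules exactly as quoted above. It is not git's actual mailinfo code, and git's real implementation may differ in details; `is_patch_start` and `split_message` are hypothetical names used only for illustration.

```python
def is_patch_start(line: str) -> bool:
    """True if `line` matches one of the three quoted patch-start rules."""
    return (
        line == "---"                  # three-dashes and end-of-line
        or line.startswith("diff -")   # e.g. "diff --git a/x b/x"
        or line.startswith("Index: ")  # CVS/SVN-style header
    )


def split_message(body: str) -> tuple[str, str]:
    """Split a mailed patch body into (commit message, everything after).

    The commit message is terminated before the first matching line.
    """
    lines = body.split("\n")
    for i, line in enumerate(lines):
        if is_patch_start(line):
            return "\n".join(lines[:i]), "\n".join(lines[i:])
    return body, ""


msg, rest = split_message("Fix parser\n\nExample output:\ndiff -u old new\n")
print(repr(msg))
print(repr(rest))
```

With these rules, a message that merely quotes a `diff -u` command at the start of a line is silently cut short at that line, which is the round-trip loss being discussed.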

The patch isn't even using the complicated forms with RCS, ClearCase, Perforce, or SCCS support; it is just doing what the pre-POSIX spec says.

The argument is whether git should do input sanitization, etc.

But `patch -p1` is doing exactly what was documented, even in Larry Wall's original Usenet post of the program.
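For comparison, the GNU patch manual says patch tries to skip leading garbage before the diff, and that `-pN` strips the smallest prefix containing N leading slashes from each file name. Here is a rough Python sketch of just those two behaviors; `strip_prefix` and `first_target` are hypothetical names, and real patch handles many more formats and edge cases (including how it picks between the old and new file names).

```python
from typing import Optional


def strip_prefix(path: str, n: int) -> str:
    """Approximate `-pN`: remove the smallest prefix containing N slashes.

    If the path has fewer than N slashes this sketch simply stops early;
    real patch has its own rules for that case.
    """
    for _ in range(n):
        slash = path.find("/")
        if slash == -1:
            break
        path = path[slash + 1:]
    return path


def first_target(text: str, strip: int = 1) -> Optional[str]:
    """Skip leading text ("garbage") and return the file named by the first
    unified-diff header pair: a '--- old' line followed by '+++ new'.
    This sketch just uses the new ('+++') name."""
    lines = text.splitlines()
    for prev, cur in zip(lines, lines[1:]):
        if prev.startswith("--- ") and cur.startswith("+++ "):
            name = cur[4:].split("\t", 1)[0]  # drop an optional timestamp
            return strip_prefix(name, strip)
    return None


mail = (
    "Subject: [PATCH] fix\n"
    "\n"
    "Some prose that patch will ignore.\n"
    "--- a/src/main.c\n"
    "+++ b/src/main.c\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
)
print(first_target(mail))  # src/main.c
```

To patch, everything before the header, commit message included, is just leading garbage, so a diff quoted inside a message is still found.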

[0] https://pubs.opengroup.org/onlinepubs/9799919799/utilities/p...

[1] https://github.com/git/git/blob/94f057755b7941b321fd11fec1b2...

> This wouldn't be an issue if patches were XML or JSON with a well defined schema, but everything must be a boutique undocumented format in the world of Unix tools.

Patch files are readable by humans. Replacing them with XML or JSON would fix this problem, but at the expense of removing a core feature.

If, by "readable by humans", you mean "it would reliably fool humans as well", I'd say it's an ambiguity bug regardless of whether it's "a core feature" or not. A patch format, human-readable or not, should clearly indicate which part is the commit message and which part is the actual diff; that's not the case here.
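To make the JSON suggestion concrete, a minimal Python sketch of a structured envelope. The `message`/`diff` field names are invented for illustration; no existing tool mentioned in this thread uses this schema.

```python
import json


def encode_patch(message: str, diff: str) -> str:
    # Hypothetical schema: the two parts live in separate fields, so no
    # line in either one can be mistaken for a separator.
    return json.dumps({"message": message, "diff": diff}, indent=2)


def decode_patch(payload: str) -> tuple[str, str]:
    obj = json.loads(payload)
    return obj["message"], obj["diff"]


# A message that would confuse a line-based splitter.
message = "Explain the bug\n\n---\ndiff -u is how I found it\n"
diff = "--- a/f.txt\n+++ b/f.txt\n@@ -1 +1 @@\n-old\n+new\n"

payload = encode_patch(message, diff)
print(payload)
print(decode_patch(payload) == (message, diff))  # True
```

The printed payload also shows the readability cost raised in this thread: every newline in the diff becomes a `\n` escape inside one long string, so the diff is no longer pleasant to read or hand-edit.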

Alright, allow me to disambiguate in your preferred format.

  <?xml version="1.0" encoding="UTF-8"?> <claims> <claims_I_did_not_make description='Claims that I did not make or defend.'> <claim>Patch is perfect.</claim> <claim>Ambiguity is good.</claim> <claim>There are no better formats for conveying patches.</claim> </claims_I_did_not_make> <claims_I_did_make description='What I actually said.'> <claim>Patch files are readable by humans.</claim> <claim>Being readable by humans is useful.</claim> <claim>XML is painful for humans to read and write.</claim> <claim>JSON is painful for humans to read and write.</claim> <claim caveat='Actually this would require all parties to handle JSON or XML correctly which on further reflection I am not sure about. Still, it is a claim I initially made.'>JSON or XML would actually fix this problem in the format.</claim> </claims_I_did_make> <claims_I_did_not_make_but_am_open_to description='Things that were never specified but that I do not actually disagree with.'> <claim>The patch format could be improved.</claim> <claim>Formats should be unambiguous.</claim> <claim>Separating sections is good.</claim> </claims_I_did_not_make_but_am_open_to> </claims>

that's not the preferred format for writing XML; this is:

    <?xml version="1.0" encoding="UTF-8"?>
    <claims>
      <claims_I_did_not_make description='Claims that I did not make or defend.'>
        <claim>Patch is perfect.</claim>
        <claim>Ambiguity is good.</claim>
        <claim>There are no better formats for conveying patches.</claim>
      </claims_I_did_not_make>
      <claims_I_did_make description='What I actually said.'>
        <claim>Patch files are readable by humans.</claim>
        <claim>Being readable by humans is useful.</claim>
        <claim>XML is painful for humans to read and write.</claim>
        <claim>JSON is painful for humans to read and write.</claim>
        <claim caveat='Actually this would require all parties to handle JSON or XML correctly which on further reflection I am not sure about. Still, it is a claim I initially made.'>JSON or XML would actually fix this problem in the format.</claim>
      </claims_I_did_make>
      <claims_I_did_not_make_but_am_open_to description='Things that were never specified but that I do not actually disagree with.'>
        <claim>The patch format could be improved.</claim>
        <claim>Formats should be unambiguous.</claim>
        <claim>Separating sections is good.</claim>
      </claims_I_did_not_make_but_am_open_to>
    </claims>

What I posted is valid XML. And even prettified, it's a pain to read.
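Both properties (well-formedness and prettifying) can be checked mechanically. A small Python sketch using the standard library's `xml.etree.ElementTree` on an abbreviated excerpt of the one-line document (two claims only); `ET.indent` assumes Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt of the one-line document, still on one logical line.
one_line = (
    "<claims> <claims_I_did_make description='What I actually said.'> "
    "<claim>Patch files are readable by humans.</claim> "
    "<claim>Being readable by humans is useful.</claim> "
    "</claims_I_did_make> </claims>"
)

root = ET.fromstring(one_line)  # raises ET.ParseError if not well-formed
ET.indent(root, space="  ")     # replace whitespace-only text with indentation
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Because the structure is explicit in the markup, the same reformatting works on any well-formed input regardless of how it was originally laid out.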

it was valid, but not the way XML is written or read by humans, which is what we are discussing. how much of a pain it is to read is a matter of taste; i won't deny that. but XML can always be reformatted to be more readable because it is a structured format. i would not have been able to reformat a patch text the way i reformatted this XML example. XML is also more expressive: it could represent word-based changes, whereas patch can only do line-based changes. the same goes for JSON. patch could potentially be improved, but i don't see how it could handle word-based changes without extra syntax to mark line breaks.
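For a sense of what word-based changes look like, git already has a display mode for them (`git diff --word-diff`), though that output is meant for reading, not applying. A small Python sketch of the same idea with `difflib`, marking removals as `[-...-]` and additions as `{+...+}` in the style of `--word-diff=plain`; `word_diff` is a hypothetical helper, and splitting on whitespace loses the original spacing.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Diff two strings word by word (whitespace-split, spacing not kept)."""
    a, b = old.split(), new.split()
    out = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
        if op in ("delete", "replace"):
            out.extend(f"[-{w}-]" for w in a[i1:i2])
        if op in ("insert", "replace"):
            out.extend(f"{{+{w}+}}" for w in b[j1:j2])
    return " ".join(out)


print(word_diff("XML is painful to read", "XML is sometimes painful to read"))
# XML is {+sometimes+} painful to read
```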

That's really not that bad, especially with indentation and color coding. You're kind of cheating by putting it into HN, which is terrible for code.

> XML is painful for humans to read and write.

Speaking of claims no-one made: no-one's talking about writing patch files by hand.

If that's good enough to be human-readable, then patch is even better.

People do write patch files by hand.

More commonly, edit them.

Haha, good one. Much like Makefiles, the patch format predates a lot of more modern things (by decades!) and is good enough to stick around. Unlike with Makefiles, I've never seen a tool gain any acceptance at all as a replacement for patch.

And a lot of these older tools are not meant to be fed untrusted, unvetted input. The patch shown there confused me for quite a while.

Or, more snarkily: tee is also a huge security problem if you pipe untrusted input into `tee -a /etc/passwd`, such as `curl | tee -a /etc/passwd`. Not many things are safe with a `curl |` in front of them. I think `yes` might be?

This is where I kind of like the idea of PowerShell; it's just that I dislike almost every other aspect of it and of its ecosystem.

Same - psh has one good idea and it’s this. The next evolution of shells needs to include it.

can either of you elaborate on what you mean? are you talking about support for passing structured data between scripts/programs?

Yes - https://devblogs.microsoft.com/scripting/working-with-json-d...

Tons of bugs in Unix scripting come from the fact that data and metadata are interspersed in the same stream (you can mitigate this somewhat with stderr vs stdout, but hardly anyone does). Examples include trying to handle arbitrary filenames from `*` expansions.

It’s a bit more annoying to deal with sometimes, but for actual scripts it’s much more foolproof.
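A quick Python illustration of the data/metadata problem, using filenames as the data. A newline is legal in a Unix filename, so a newline-delimited listing is ambiguous, while NUL-delimited output (the `find -print0 | xargs -0` convention) or a structured encoding such as a JSON array is not. The sample names are made up.

```python
import json

# Filenames may legally contain spaces and even newlines on Unix.
names = ["report.txt", "my notes.txt", "evil\nname.txt"]

# Line-delimited text: the newline inside a name is indistinguishable
# from the record separator.
parsed_lines = "\n".join(names).split("\n")

# NUL-delimited text: NUL cannot appear in a Unix path, so the split
# is unambiguous.
parsed_nul = "\0".join(names).split("\0")

# Structured data (a JSON array): the encoder escapes the newline, so
# record boundaries are explicit.
parsed_json = json.loads(json.dumps(names))

print(len(parsed_lines), len(parsed_nul), len(parsed_json))  # 4 3 3
```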

xargs is one of the programs designed to work around the original issue.
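Roughly, xargs reads items from its input and runs the command as many times as needed so that no single invocation exceeds the system's argument-size limit (exceeding it is what produces the E2BIG "argument list too long" error). A simplified Python sketch of just the batching step; `batch_args` and the 7-byte limit are hypothetical, and real xargs also accounts for the environment and its own defaults.

```python
def batch_args(args: list[str], max_bytes: int) -> list[list[str]]:
    """Split args into batches whose size (each argument plus a terminating
    NUL byte, a simplification of how argv is measured) stays <= max_bytes.

    An argument that is too big on its own still gets a batch of its own
    here; real xargs would typically report an error instead.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for arg in args:
        cost = len(arg.encode()) + 1
        if current and size + cost > max_bytes:
            batches.append(current)  # this batch would become one command run
            current, size = [], 0
        current.append(arg)
        size += cost
    if current:
        batches.append(current)
    return batches


print(batch_args(["aaa", "bb", "c", "dddd"], max_bytes=7))
# [['aaa', 'bb'], ['c', 'dddd']]
```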

Yes, structured data between scripts and programs. No xargs, tee, awk, sed, grep mangling. No "argument list too long" errors.

So many problems are avoided, but at the same time the Windows ecosystem is just so far from providing a properly usable terminal experience. Things are still really not designed to be used from PowerShell.

right, see my response to the sibling comment.

> Maybe the worst part about this is that it can entirely come from a patch being exported by git and then imported straight back in to git.

No one wants to apply diffs in commit messages. But some people use this technique via email:

    Finally fix it

    ---

    Changes in v2:

    - Proper formatting
    - Remove irrelevant typo fix

They’ve used the `---` commit message delimiter in the commit message itself so that everything after it is dropped by git-am(1) instead of ending up in the applied commit message. So that’s an intentional loss of round-tripping.

I would personally use Git notes instead, though.

    Finally fix it

    ---

    Notes:
        Changes in v2: ...

  Patch: 1985
  SGML: 1986
  XML: 1996

what are you suggesting? XML is a simplified form of SGML. an SGML parser can parse XML, so it was already possible to write an XML-like document before XML was defined.

> This wouldn't be an issue if patches were XML or JSON

Or MIME, even.
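A sketch of the MIME variant using Python's standard `email` package: the commit message goes in a text/plain body and the diff in a separate text/x-diff attachment, so the split comes from the MIME structure rather than from a `---` line. The subject, filename, and content type are illustrative choices; for what it's worth, `git format-patch --attach` produces a multipart message along broadly similar lines.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def build_patch_mail(message: str, diff: str) -> bytes:
    msg = EmailMessage()
    msg["Subject"] = "[PATCH] Finally fix it"  # illustrative subject
    msg.set_content(message)                    # text/plain body part
    # Separate attachment part; the generator picks a boundary string that
    # does not collide with any line of the content.
    msg.add_attachment(diff, subtype="x-diff", filename="fix.patch")
    return msg.as_bytes()


def parse_patch_mail(raw: bytes) -> tuple[str, str]:
    msg = message_from_bytes(raw, policy=default)
    body = msg.get_body(preferencelist=("plain",)).get_content()
    patch = next(
        part for part in msg.iter_attachments()
        if part.get_content_type() == "text/x-diff"
    ).get_content()
    return body, patch


message = "Finally fix it\n\n---\n\nChanges in v2:\n- Proper formatting\n"
diff = "--- a/f.txt\n+++ b/f.txt\n@@ -1 +1 @@\n-old\n+new\n"
body, patch = parse_patch_mail(build_patch_mail(message, diff))
print(body == message, patch == diff)
```

The `---` line inside the message survives here, because nothing in the body is scanned for separators.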