Hacker News

In case "bases with optional newlines" wasn't obvious to anyone else, a specific example (from Wikipedia) is:

    ;LCBO - Prolactin precursor - Bovine
    MDSKGSSQKGSRLLLLLVVSNLLLCQGVVSTPVCPNGPGNCQVSLRDLFDRAVMVSHYIHDLSS
    EMFNEFDKRYAQGKGFITMALNSCHTSSLPTPEDKEQAQQTHHEVLMSLILGLLRSWNDPLYHL
    VTEVRGMKGAPDAILSRAIEIEEENKRLLEGMEMIFGQVIPGAKETEPYPVWSGLPSLQTKDED
    ARYSAFYNLLHCLRRDSSKIDTYLKLLNCRIIYNNNC*

where "SS...EM", "HL...VT", or "ED...AR" may be common subsequences that happen to straddle a line break, because the plaintext file arbitrarily wraps at column 65 so that it renders nicely on a DEC VT100 terminal from the 70s.

Or, for an even simpler example:

    ; plaintext
    GATTAC
    AGATTA
    CAGATT
    ACAGAT
    TACAGA
    TTACAG
    ATTACA

becomes, on disk, something like

    ; plaintext\r\nGATTAC\r\nAGATTA\r\nCAGATT\r\nACAGAT\r\nTACAGA\r\nTTACAG\r\nATTACA\r\n

which is hard to compress because the line breaks cut each repeat at a different offset, while

    ; plaintext\r\nGATTACAGATTACAGATTACAGATTACAGATTACAGATTACA

is just

    "; plaintext\r\n" + "GATTACA" * 6

and then, if you want, you can reflow the text when it's time to render to the screen.
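If anyone wants to poke at this themselves, here is a rough sketch using Python's standard `zlib` module. The helper names `wrap` and `unwrap`, the repeat count, and the 6-column width are my own choices for illustration, and a toy input this small only hints at the effect; real sequence files and other compressors will behave differently.

```python
import zlib

def wrap(seq: str, width: int, newline: str = "\r\n") -> str:
    """Break seq into newline-terminated lines of at most `width` characters."""
    return "".join(seq[i:i + width] + newline for i in range(0, len(seq), width))

def unwrap(text: str) -> str:
    """Drop the cosmetic line breaks so the sequence is contiguous again."""
    return text.replace("\r", "").replace("\n", "")

header = "; plaintext\r\n"
sequence = "GATTACA" * 6  # repeat count is arbitrary, chosen for illustration

wrapped = header + wrap(sequence, 6)   # what the wrapped file looks like on disk
unwrapped = header + sequence          # same data without the cosmetic breaks

wrapped_size = len(zlib.compress(wrapped.encode("ascii"), 9))
unwrapped_size = len(zlib.compress(unwrapped.encode("ascii"), 9))

print(f"wrapped:   {len(wrapped)} bytes raw, {wrapped_size} bytes compressed")
print(f"unwrapped: {len(unwrapped)} bytes raw, {unwrapped_size} bytes compressed")

# Reflow only at display time: the wrapped view can always be regenerated.
reflowed = wrap(unwrap(wrap(sequence, 6)), 6)
print(reflowed, end="")
```

The point of the last few lines is that wrapping is a pure function of the sequence and the display width, so nothing is lost by leaving it out of the stored file.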

tgtweak a day ago

Feels like this could be an extension to the compression lib (which would then preserve the newlines) rather than requiring the document to be tailored externally. It also feels like a very specific use case, but the optimization might have wider applications outside this narrow field/format.
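To make that idea concrete, a minimal sketch of such a wrapper, assuming fixed-width lines with a single newline style. `pack` and `unpack` are hypothetical names, the one-line metadata prefix is an invented format, and none of this is an existing feature of zlib or any other compression library; it just layers on top of `zlib.compress`.

```python
import zlib

def pack(text: str) -> bytes:
    """Strip regular line wrapping before compressing, remembering how to restore it."""
    newline = "\r\n" if "\r\n" in text else "\n"
    lines = text.split(newline)
    trailing = lines[-1] == ""          # text ended with a newline
    if trailing:
        lines.pop()
    width = len(lines[0]) if lines else 0
    regular = (
        width > 0
        and all(len(line) == width for line in lines[:-1])
        and 0 < len(lines[-1]) <= width
        and not any("\n" in line or "\r" in line for line in lines)
    )
    if not regular:
        # Fallback: width 0 means "stored verbatim, nothing to restore".
        return zlib.compress(b"0,0,0\n" + text.encode("utf-8"))
    meta = f"{width},{len(newline)},{int(trailing)}\n"
    return zlib.compress((meta + "".join(lines)).encode("utf-8"))

def unpack(blob: bytes) -> str:
    """Decompress and re-insert the line breaks recorded by pack()."""
    meta, _, body = zlib.decompress(blob).decode("utf-8").partition("\n")
    width, newline_len, trailing = (int(x) for x in meta.split(","))
    if width == 0:
        return body
    newline = "\r\n" if newline_len == 2 else "\n"
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return newline.join(lines) + (newline if trailing else "")

doc = "GATTAC\r\nAGATTA\r\nCAGATT\r\nACA\r\n"
assert unpack(pack(doc)) == doc          # newlines come back byte for byte
```

Anything that is not cleanly wrapped (mixed newline styles, ragged lines) goes through the verbatim fallback, so the round trip stays lossless either way. A real format with header lines, such as the `;` comment above, would need to handle those separately.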

Terr_ a day ago

Huh, so in other words: "If you don't arbitrarily interrupt continuous sequences of data with cosmetic noise, they compress better."

spatoa 12 hours ago

When working with data, I definitely prefer the UI to adapt to the data. I never save anything back to the file just for the sake of display.