Okay, so I know back in the day you could choke scanning software (i.e. email attachment scanners) by throwing a zip bomb at it. I believe the software has gotten smarter these days, so it won't simply crash when that happens. But how is this done? How does one detect a zip bomb?
I don't understand the code itself, but here's Debian's patch to detect overlapping zip bombs in `unzip`:
https://sources.debian.org/patches/unzip/6.0-29/23-cve-2019-...
So effectively it seems as though it just keeps track of which parts of the zip file have already been 'used', and if a new entry in the zip file starts in a 'used' section, it fails.
I wonder if this has actually been used for backups in real use cases (think of how LVM or ZFS do snapshotting)?
I.e. an advanced compressor could abuse the zip file format to share base data for files which only incrementally change (get appended to, for instance).
And then this patch would disallow such a practice.
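Here's a minimal sketch of the overlap bookkeeping such a check might use (hypothetical Python, not the actual `unzip` patch, which is in C). Note that it rejects any sharing of bytes between entries, so the incremental-sharing idea above would indeed be refused as well:

```python
# Hypothetical sketch of overlap detection for zip entries (not the real
# unzip patch): each entry claims a byte range [start, end) in the archive,
# and any entry that overlaps an already-claimed range is rejected.

def check_no_overlap(entries):
    """entries: list of (name, start_offset, compressed_size) tuples."""
    claimed = []  # (start, end) ranges already used by earlier entries
    for name, start, size in entries:
        end = start + size
        for (s, e) in claimed:
            if start < e and s < end:  # the two half-open ranges intersect
                raise ValueError(f"entry {name!r} overlaps previously used data")
        claimed.append((start, end))
    return True

# Example: two entries pointing at the same compressed stream (bomb-style)
# check_no_overlap([("a", 100, 5000), ("b", 100, 5000)])  # raises ValueError
```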
For any compression algorithm in general, you keep track of A = {uncompressed bytes processed} and B = {compressed bytes processed} while decompressing, and bail out when either of the following occurs (rough sketch below):
1. A exceeds some unreasonable threshold
2. A/B exceeds some unreasonable threshold
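A codec-agnostic version of that bookkeeping might look something like this in Python; the class name and thresholds are made up for illustration:

```python
# Hypothetical, codec-agnostic bookkeeping for the two checks above.
class BombGuard:
    def __init__(self, max_output=100 * 1024 * 1024, max_ratio=1000.0):
        self.max_output = max_output   # threshold for A
        self.max_ratio = max_ratio     # threshold for A / B
        self.a = 0                     # A: uncompressed bytes produced so far
        self.b = 0                     # B: compressed bytes consumed so far

    def update(self, produced, consumed):
        """Call once per decompression step; raises if a threshold is exceeded."""
        self.a += produced
        self.b += consumed
        if self.a > self.max_output:
            raise ValueError(f"decompressed output exceeds {self.max_output} bytes")
        if self.b and self.a / self.b > self.max_ratio:
            raise ValueError(f"compression ratio exceeds {self.max_ratio}:1")
```

Each decompression step then calls `update()` with the bytes it just produced and consumed.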
In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.
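For a rough sense of scale, a zlib round-trip on a megabyte of zeros looks something like this (exact numbers depend on the zlib version and compression level):

```python
import zlib

data = b"\x00" * 1_000_000            # a megabyte of null bytes
compressed = zlib.compress(data)      # default compression level
print(len(compressed))                # around a kilobyte on typical zlib builds
print(len(data) / len(compressed))    # expansion ratio on the order of 1000:1
```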
On the other hand, the zip bomb described in this blog post relies on decompressing the same data multiple times, so it wouldn't necessarily trigger your A/B heuristic.
Finally, A just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If the DEFLATE authors had had this idea when they designed the algorithm, I bet files larger than an "unreasonable" 16 MB would be forbidden.
> In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.
Sure, if you expect to decompress files with high compression ratios, then you'll want to adjust your knobs accordingly.
> On the other hand, the zip bomb described in this blog post relies on decompressing the same data multiple times, so it wouldn't necessarily trigger your A/B heuristic.
If you decompress the same data multiple times, then you increment A multiple times. The accounting still works regardless of whether the data is the same or different. Perhaps a better description of A and B in my post would be {number of decompressed bytes written} and {number of compressed bytes read}, respectively.
> Finally, A just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If the DEFLATE authors had had this idea when they designed the algorithm, I bet files larger than an "unreasonable" 16 MB would be forbidden.
The limitation is imposed by the application, not by the codec itself. The application doing the decompression is supposed to process the input incrementally (in the case of DEFLATE, reading one block at a time and inflating it), updating A and B on each iteration, and aborting if a threshold is violated.
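A hedged sketch of that kind of incremental loop using Python's `zlib` module (the function name, chunk sizes, and thresholds are my own, and this is not `unzip`'s actual code):

```python
import zlib

def inflate_guarded(src, dst, max_out=100 * 1024 * 1024, max_ratio=1000.0):
    """Inflate a zlib stream from file object src into file object dst,
    updating A (bytes written) and B (bytes read) on every iteration and
    aborting as soon as a threshold is violated. Illustrative sketch only."""
    d = zlib.decompressobj()
    a_out = 0              # A: decompressed bytes written so far
    b_in = 0               # B: compressed bytes read so far
    buf = b""
    while not d.eof:
        if not buf:
            buf = src.read(16 * 1024)          # read the input incrementally
            if not buf:
                raise ValueError("truncated stream")
            b_in += len(buf)
        piece = d.decompress(buf, 64 * 1024)   # emit at most 64 KiB per step
        buf = d.unconsumed_tail                # input the codec hasn't used yet
        a_out += len(piece)
        dst.write(piece)
        if a_out > max_out or (b_in and a_out / b_in > max_ratio):
            raise ValueError(f"aborting: {a_out} bytes out from {b_in} bytes in")
```

The same accounting carries over to any codec that exposes a streaming interface; only the decompress call changes.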
Embarrassingly simple for a scanner too: you just mark the file as suspicious when this happens. You can be wrong sometimes, and that's expected.