bug-gzip

From: Mark Adler
Subject: bug#30935: gzip -l reports wrong size for decompressed files larger than 4GB
Date: Sun, 25 Mar 2018 14:05:52 -0700

Wolfgang,

The gzip format stores only the low 32 bits of the uncompressed length as the
last four bytes of the stream, so it is not possible to show the correct
number, at least not without decompressing the whole thing.
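To make the trailer layout concrete, here is a minimal Python sketch of roughly what a length listing has to work with: it reads only the last four bytes and interprets them as a little-endian 32-bit integer (the field RFC 1952 calls ISIZE). The function name isize_from_trailer is made up for illustration, and the in-memory sample stands in for a real .gz file opened in binary mode.

```python
import gzip
import io
import os
import struct


def isize_from_trailer(fileobj) -> int:
    """Return the last four bytes of the stream as a little-endian uint32."""
    fileobj.seek(-4, os.SEEK_END)
    (isize,) = struct.unpack("<I", fileobj.read(4))
    return isize


# In-memory stand-in for open("file.gz", "rb"); 300 bytes of input.
sample = io.BytesIO(gzip.compress(b"x" * 300))
print(isize_from_trailer(sample))
```

Nothing here looks at the compressed data itself, which is why the answer cannot exceed 2^32 - 1.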

There are two other ways that the displayed uncompressed size can be incorrect,
even for small files: a) if there is more than one gzip member in the gzip
stream, in which case only the uncompressed size of the last member will be
shown, or b) if there are junk bytes after the end of the gzip stream, in
which case the last four bytes of the junk will be shown as the length.
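Both cases are easy to reproduce with small inputs. The following Python sketch builds a two-member stream and a stream with trailing junk, then compares what the last four bytes claim against the real decompressed length. The helper trailer_isize and the sample sizes are illustrative choices, not anything gzip itself provides.

```python
import gzip
import struct


def trailer_isize(gz_bytes: bytes) -> int:
    """Interpret the final four bytes as a little-endian uint32."""
    return struct.unpack("<I", gz_bytes[-4:])[0]


first = gzip.compress(b"a" * 1000)
second = gzip.compress(b"b" * 10)

# Case a) two members concatenated: the trailer belongs to the last one only.
two_members = first + second
actual_length = len(gzip.decompress(two_members))  # decompresses both members
claimed_length = trailer_isize(two_members)

# Case b) four junk bytes after a valid member are read as if they were ISIZE.
with_junk = first + b"\x2a\x00\x00\x00"
junk_length = trailer_isize(with_junk)

print(actual_length, claimed_length, junk_length)
```

In the first case the claimed length covers only the 10-byte second member, and in the second it is whatever number the junk bytes happen to encode.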

In short, the reported length is informational at best, and should not be
trusted if the information is important. The purpose of the length modulo 2^32
being in the trailer is as an additional integrity check along with the CRC.
However, it was also used for gzip -l, which was perhaps a mistake.
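As a rough illustration of that integrity check, the Python sketch below compares a member's eight-byte trailer (CRC-32 followed by the length modulo 2^32, both little-endian per RFC 1952) against values computed from the decompressed data. The function trailer_matches is a hypothetical helper, not gzip's actual code. The last line applies the same modulo to the sizes reported in this thread.

```python
import gzip
import struct
import zlib


def trailer_matches(gz_member: bytes, data: bytes) -> bool:
    """Check a single member's trailer against its decompressed data."""
    stored_crc, stored_isize = struct.unpack("<II", gz_member[-8:])
    crc_ok = stored_crc == (zlib.crc32(data) & 0xFFFFFFFF)
    length_ok = stored_isize == len(data) % 2**32
    return crc_ok and length_ok


payload = b"example payload"
member = gzip.compress(payload)
intact = trailer_matches(member, payload)
damaged = trailer_matches(member, payload + b"!")

# Sizes from the report: 19465374298 real bytes, 2285505114 shown by gzip -l.
wrapped = 19465374298 % 2**32
print(intact, damaged, wrapped)
```

For this purpose the wrap-around is harmless: a decompressor knows the true length it produced and only needs the low 32 bits to agree.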

You can get the actual decompressed length only by decompressing; if you only
want the length, discard the uncompressed data. You can use either:

    gzip -dc file.gz | wc -c

or:

    pigz -lt file.gz

The latter will report the members of the gzip stream separately.
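If neither tool is at hand, the same idea can be sketched in Python with zlib: decompress in chunks, throw the output away, and count bytes per member. This is only an illustration of the approach, not how gzip or pigz implement it. The name member_lengths and the 64 KiB chunk size are arbitrary, and each compressed chunk is expanded in memory before being discarded.

```python
import gzip
import io
import zlib

GZIP_WBITS = 16 + zlib.MAX_WBITS  # tells zlib to expect a gzip header and trailer


def member_lengths(fileobj, chunk_size: int = 1 << 16) -> list:
    """Return the uncompressed length of each gzip member, in order."""
    lengths = []
    decomp = zlib.decompressobj(GZIP_WBITS)
    count = 0
    in_member = False  # True once the current member has been fed any data
    buf = fileobj.read(chunk_size)
    while buf:
        in_member = True
        count += len(decomp.decompress(buf))  # output is counted, then dropped
        if decomp.eof:
            # Member finished; leftover input belongs to the next member.
            lengths.append(count)
            buf = decomp.unused_data
            decomp = zlib.decompressobj(GZIP_WBITS)
            count = 0
            in_member = False
            if not buf:
                buf = fileobj.read(chunk_size)
        else:
            buf = fileobj.read(chunk_size)
    if in_member:
        raise EOFError("gzip stream ended in the middle of a member")
    return lengths


stream = io.BytesIO(gzip.compress(b"a" * 10) + gzip.compress(b"b" * 7))
per_member = member_lengths(stream)
print(per_member, sum(per_member))
```

The sum corresponds to what gzip -dc file.gz | wc -c prints, and the per-member list is similar in spirit to the separate entries from pigz -lt. Junk after the last member makes zlib raise an error here rather than being reported as a length.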

Mark


> On Mar 25, 2018, at 1:42 AM, Wolfgang Formann <address@hidden> wrote:
> 
> Hello!
> 
> I am using gzip 1.6 from openSUSE Leap 42.3 with latest patches
> 
> $ file /usr/bin/gzip
> /usr/bin/gzip: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
> dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 
> 3.0.0, BuildID[sha1]=7103d56e17e6f81a52db927e393dce601c3af0e1, stripped
> 
> There is a compressed file available at 
> https://data.dnb.de/opendata/GND.rdf.gz which has a size of 1.232.465.678 
> bytes. Uncompressed it will have a size of 19.465.374.298
> 
> The problem is:
> $ gzip -l GND.rdf.gz
>         compressed        uncompressed  ratio uncompressed_name
>         1232465678          2285505114  46.1% GND.rdf
> 
> This number 2285505114 is actually the lower 32 bits of the real size 19GB.
> $ echo "19465374298-16*1024*1024*1024" | bc
> 2285505114
> 
> Such behaviour is okay for 32-bit software, but 64-bit software should show
> correct numbers.
> 
> Thanks
> Wolfgang
> 
> 
> 
> 





