Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip


From: Damir
Subject: Re: [Lzip-bug] Selection of CRC32 Polynomial for lzip
Date: Thu, 18 May 2017 07:31:06 +0000

Hello Antonio! 




>> Some recent CPUs (x86_64 SSE4.2, PowerPC ISA 2.07, ARM v8.1) offer
>> hardware accelerated calculation of CRC32 with a different polynomial
>> (crc32c) than used in lzip (ethernet crc32).

> Maybe hardware accelerated calculation of ethernet CRC32 also exists.
> After all it is the same polynomial used by gzip and zlib.

Not in the CPUs I mentioned. And it won't be implemented in new hardware because of the inferiority of the ethernet polynomial.
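
To make this concrete, here is a minimal sketch (my illustration, not lzip code) of how the SSE4.2 crc32 instruction is typically driven from C through the _mm_crc32_* intrinsics. The polynomial it implements is crc32c; there is no equivalent instruction for the ethernet polynomial.

  /* Minimal sketch, x86_64 with SSE4.2 (compile with -msse4.2).
     Not lzip code; it just shows the hardware crc32c instruction
     exposed through the _mm_crc32_* intrinsics. */
  #include <nmmintrin.h>
  #include <stddef.h>
  #include <stdint.h>
  #include <string.h>

  uint32_t crc32c_hw( uint32_t crc, const uint8_t * buf, size_t len )
    {
    uint64_t c = crc ^ 0xFFFFFFFFu;
    while( len >= 8 )
      {
      uint64_t word;
      memcpy( &word, buf, 8 );          /* unaligned-safe 64-bit load */
      c = _mm_crc32_u64( c, word );     /* one 8-byte step in hardware */
      buf += 8; len -= 8;
      }
    while( len-- )
      c = _mm_crc32_u8( (uint32_t)c, *buf++ );
    return (uint32_t)c ^ 0xFFFFFFFFu;   /* usual CRC pre/post inversion */
    }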


>> So, picking crc32c poly instead has two benefits:
>> 1) hardware accelerated integrity checking

> Hardware acceleration of CRC calculation makes sense for storage devices
> because the data is just moved; there is no time spent in processing it.
> Calculating the CRC is the only calculation involved.
>
> But calculating the CRC is just a small part of the total decompression
> time. So, even if you accelerate it, the total speed gain is small.
> (Probably smaller than 5%). For compression the speed gain is even smaller.

I can cite your own lzip benchmark: when comparing decompression performance of lunzip vs busybox unxz, enabling the CRC in unxz (by using xz with crc32) gives a performance penalty of 16.7% (9.723s vs 8.331s). That's a more convincing number than 5%.
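
For reference, the work being measured there is essentially the classic byte-at-a-time, table-driven ethernet CRC32 (reflected polynomial 0xEDB88320, the same one gzip and zlib use). A rough sketch, not lzip's actual implementation:

  /* Rough sketch of a byte-at-a-time, table-driven ethernet CRC32
     (reflected polynomial 0xEDB88320). Not lzip's actual code. */
  #include <stddef.h>
  #include <stdint.h>

  static uint32_t crc32_table[256];

  void crc32_init( void )
    {
    for( unsigned n = 0; n < 256; ++n )
      {
      uint32_t c = n;
      for( int k = 0; k < 8; ++k )
        c = ( c & 1 ) ? 0xEDB88320u ^ ( c >> 1 ) : c >> 1;
      crc32_table[n] = c;
      }
    }

  uint32_t crc32_update( uint32_t crc, const uint8_t * buf, size_t len )
    {
    crc ^= 0xFFFFFFFFu;                 /* standard pre-inversion */
    while( len-- )
      crc = crc32_table[(crc ^ *buf++) & 0xFF] ^ ( crc >> 8 );
    return crc ^ 0xFFFFFFFFu;           /* standard post-inversion */
    }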



>> 2) better protection against undetected errors

> You will need to prove this one.

> CRC32C has a slightly larger Hamming distance than ethernet CRC32 for
> "small" packet sizes (see pages 3-4 of [1]). But beyond some size, perhaps
> not much larger than 128 KiB, both have the same HD of 2. For files
> larger than that (uncompressed) size, there is little difference between
> the two CRCs.

That's not accurate at all. According to Koopman's CRC32 zoo, crc32c keeps HD=4 up to 2 gigabits, while ethernet CRC32 only keeps HD=4 up to 92 kilobits.
http://users.ece.cmu.edu/~koopman/crc/crc32.html 

2 gigabits (268 MB) comfortably covers typical lzip usage, while 92 Kbit (11 KB) covers practically nothing.


> Even more important, we are talking about the interaction between
> compression and integrity checking. The difference between a Hamming
> distance of 2 or 3 is probably immaterial here. Maybe you would like to
> read section 2.10 of [2]. I quote:

That's a valid point. There is no good error model describing the corruption in the uncompressed file that results from typical errors in the compressed file (bit errors, burst errors, NAND page errors, HDD sector errors). Still, a better HD at typical sizes is preferable.


>> The downside is the compatibility problem, but changing version byte in
>> file header can help with that.

> This is a very large downside, most probably to gain almost nothing.
> IMO, one of the big problems of today's software development is that too
> many people are willing to complicate the code without the slightest
> proof that the proposed change is indeed an improvement.

That's an argument that everyone uses against implementing lzip itself. But the relatively narrow spread of lzip is actually a plus here, because it gives much more flexibility.

A new decompressor can decompress both old files and new ones, and an old decompressor can still decompress new files, but cannot check their integrity. A large downside, but not a very large one.
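
As a sketch of what I mean, the decompressor could select the CRC variant from the version byte that is already in the header. The "LZIP" magic and the version byte at offset 4 are part of the real format; version 2 and its mapping to crc32c are purely hypothetical, just this proposal:

  /* Sketch only.  Version 2 and its mapping to crc32c are hypothetical,
     not anything lzip defines today. */
  #include <stdint.h>
  #include <string.h>

  enum Crc_kind { CRC32_ETHERNET, CRC32C, BAD_HEADER };

  enum Crc_kind crc_kind_from_header( const uint8_t header[6] )
    {
    if( memcmp( header, "LZIP", 4 ) != 0 ) return BAD_HEADER;
    switch( header[4] )                 /* version byte */
      {
      case 1:  return CRC32_ETHERNET;   /* current format */
      case 2:  return CRC32C;           /* hypothetical new version */
      default: return BAD_HEADER;
      }
    }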



> Best regards,
> Antonio.

