bug-gzip
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#16472: PD: gzip - tree overflow protection algorithm doubt/suggestio


From: Krzysztof Rybak
Subject: bug#16472: PD: gzip - tree overflow protection algorithm doubt/suggestion
Date: Thu, 16 Jan 2014 22:55:25 +0100

Hi,
I sent a message to the author of gzip but without response. Maybe You can help 
me with my doubts.

best regards,
Krzysztof

Dnia Czwartek, 2 Stycznia 2014 22:41 Krzysztof Rybak <address@hidden> 
napisaƂ(a) 
> Hello Jean-loup,

> I'm preparing my own algorithm based on Huffman compression method and 
> deflate format. 

> 

> I have doubts if method implemented in gzip program is 100% correct. 

> I mean tree overflow protection:

> just to remind: in deflate (RFC1951) and therefore in gzip there is creating 
> tree for lit/len, dist (maximum length of codes is 15 bits) and code table 
> (maximum length is 7 bits). In case of too long code word, there is tree 
> modification implemented in gen_bitlen(desc) function (trees.c):

> 

>     do {

>         bits = max_length-1;

>         while (bl_count[bits] == 0) bits--;

>         bl_count[bits]--;      /* move one leaf down the tree */

>         bl_count[bits+1] += 2; /* move one overflow item as its brother */

>         bl_count[max_length]--;

>         overflow -= 2;

>     } while (overflow > 0);

> 

> I'm not sure if this method is correct for original code length greater than 
> 8 ( for maximum 7 ) and than 16 ( for maximum 15): even not every time. I 
> couldn't obtain such symbol distribution to generate so long codes on real 
> files indeed as such situation is extremely rare but generated it by 
> modifying software:

> 

> just before mentioned above overflow protection I changed bl_count histogram 
> to one as for Fibonacci numbers (which is the worst for Huffman overflow - 
> the longest codes, please refer to attached image): in my case I modified 
> tree when maximum length is 7 bits (max_length = 7 is for code book 
> compression):

>     bl_count[0] = 1;

>     bl_count[1] = 1;

>     bl_count[2] = 1;

>     bl_count[3] = 1;

>     bl_count[4] = 1;

>     bl_count[5] = 1;

>     bl_count[6] = 1;

>     bl_count[7] = 1;

>     bl_count[8] = 1;

>     bl_count[9] = 1;

>     bl_count[10] = 2;

>     overflow = 4;

> 

> in fact in gzip there is setting length = 7 when it's longer than 7:

> if (bits > max_length){

>         bits = max_length;

>         overflow++;

> }

> 

> ,so my histogram looks like:

>     bl_count[0] = 1;

>     bl_count[1] = 1;

>     bl_count[2] = 1;

>     bl_count[3] = 1;

>     bl_count[4] = 1;

>     bl_count[5] = 1;

>     bl_count[6] = 1;

>     bl_count[7] = 5;

>     bl_count[8] = 0;

>     bl_count[9] = 0;

>     overflow = 4;

> and after overflow protection method from gzip:

>     bl_count[0] = 1

>     bl_count[1] = 1

>     bl_count[2] = 1

>     bl_count[3] = 1

>     bl_count[4] = 1

>     bl_count[5] = 0

>     bl_count[6] = 2

>     bl_count[7] = 5

> , which is incorrect Huffman tree: too many elements. 

> 

> trees.c with my modification and printf() function is attached. I compress 
> obj2 from Calgary corpus.

> I know my description may look unclear, but I hope You will help me/clarify 
> my doubts.  

> 

> best regards,

> Krzysztof Rybak

Attachment: Fibonacci.png
Description: PNG image

Attachment: trees.c
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]