[Bug-gnubg] Downloads from gnubg.org appear to be compressed twice
From: Michael Petch
Subject: [Bug-gnubg] Downloads from gnubg.org appear to be compressed twice
Date: Tue, 08 Feb 2011 07:21:11 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7
Okay,
This is interesting. I have confirmed that the issue is not a matter of
CRLF processing. What IS occurring with Firefox is that when the files
are downloaded from www.gnubg.org/media/sources they appear to be
compressed a second time, and the doubly compressed file is stored by
Firefox.
Here is the Firefox HTTP request and response:
GET /media/sources/gnubg-source-SNAPSHOT-20110207.tar.gz HTTP/1.1
Host: www.gnubg.org
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0E)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://www.gnubg.org/media/sources/

HTTP/1.1 200 OK
Date: Tue, 08 Feb 2011 13:21:14 GMT
Server: Apache
Last-Modified: Mon, 07 Feb 2011 03:50:09 GMT
ETag: "10f3-d8a97b-49ba920ce1e40"
Accept-Ranges: bytes
Cache-Control: max-age=2419200
Expires: Tue, 08 Mar 2011 13:21:14 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Keep-Alive: timeout=5, max=200
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: application/x-gzip
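One way to spot the problem from the headers alone: in HTTP/1.1, Transfer-Encoding: chunked only frames the body on the wire, while Content-Encoding: gzip announces that the server gzipped the body it sent. Here that body is a file already served as application/x-gzip, so a client that saves it without removing the content coding stores it compressed twice. The check_headers function below is a hypothetical sketch, not part of any existing tool; it only compares those two headers.

```shell
# check_headers is a hypothetical helper. It reads HTTP response headers
# on stdin and reports whether a gzip Content-Encoding was applied to a
# resource whose Content-Type says it is already gzip data.
check_headers() {
    # Strip CRs from captured CRLF headers; match names case-insensitively.
    tr -d '\r' | awk -F': *' '
        tolower($1) == "content-encoding" { ce = tolower($2) }
        tolower($1) == "content-type"     { ct = tolower($2) }
        END {
            if (ce ~ /gzip/ && ct ~ /gzip/) print "double-compressed"
            else print "ok"
        }'
}

# Headers copied from the response above; prints "double-compressed".
check_headers <<'EOF'
Vary: Accept-Encoding
Content-Encoding: gzip
Transfer-Encoding: chunked
Content-Type: application/x-gzip
EOF
```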
I'm beginning to think that chunked encoding combined with gzip is adding the
extra compression. I took the Firefox download (which tar can't process
directly) and asked gunzip to list the contents of the file. It said:
gunzip -ltv ~mpetch/Desktop/gnubg-source-SNAPSHOT-20110207.tar.gz
method  crc      date   time   compressed  uncompressed  ratio  uncompressed_name
defla   54b9e403 Feb  8 04:54    14194658      14199163   0.0%  /home/mpetch/Desktop/gnubg-source-SNAPSHOT-20110207.tar
Note that the compression ratio is near 0% and that gunzip says the
archive contains a tarball. A 0% ratio tells me that the Firefox download
is not a gzipped-tarball, but a gzipped-gzipped-tarball.
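The near-0% ratio can be reproduced locally without the web server. This sketch (file names are hypothetical) builds a small tarball, gzips it once, gzips the result again, and lists both with gzip -l. The second listing should show a ratio near 0% (possibly slightly negative), much like the Firefox download, because gzip data barely compresses a second time.

```shell
# Local reproduction with hypothetical file names.
work=$(mktemp -d)
cd "$work"

mkdir src
seq 1 50000 > src/numbers.txt        # compressible sample content
tar -cf sample.tar src

gzip -c < sample.tar > once.tar.gz   # an ordinary .tar.gz
gzip -c < once.tar.gz > twice.tar.gz # the .tar.gz compressed a second time

gzip -l once.tar.gz    # healthy ratio: tar data compresses well
gzip -l twice.tar.gz   # ratio near 0%: already-gzipped data barely shrinks
```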
So I decided to gunzip the Firefox tarball, rename the resulting .tar file
to .tar.gz, and then ask gunzip to tell me the contents. This is what I see:
(rename the file from tar to tar.gz)
mv gnubg-source-SNAPSHOT-20110207.tar gnubg-source-SNAPSHOT-20110207.tar.gz
(ask gunzip to tell me what is in the file)
gunzip -l gnubg-source-SNAPSHOT-20110207.tar.gz
compressed  uncompressed  ratio  uncompressed_name
  14199163      23439360  39.4%  gnubg-source-SNAPSHOT-20110207.tar
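Instead of renaming, the number of gzip layers can be counted directly by checking for the gzip magic bytes (1f 8b) and peeling off one layer at a time. gzip_layers is a hypothetical helper written for this sketch, and the sample files are made up for the demonstration.

```shell
# gzip_layers is a hypothetical helper, not an existing command.
# It counts gzip layers by checking for the magic bytes 1f 8b and
# decompressing one layer at a time until they no longer appear.
gzip_layers() {
    tmp=$(mktemp)
    cp "$1" "$tmp"
    n=0
    while [ "$(head -c 2 "$tmp" | od -A n -t x1 | tr -d ' \n')" = "1f8b" ]; do
        gzip -dc < "$tmp" > "$tmp.next" || break
        mv "$tmp.next" "$tmp"
        n=$((n + 1))
    done
    rm -f "$tmp" "$tmp.next"
    echo "$n"
}

# Build hypothetical sample files with one and two gzip layers.
work=$(mktemp -d)
cd "$work"
printf 'hello\n' > hello.txt
tar -cf sample.tar hello.txt
gzip -c < sample.tar > once.tar.gz
gzip -c < once.tar.gz > twice.tar.gz

gzip_layers once.tar.gz    # prints 1
gzip_layers twice.tar.gz   # prints 2
```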
Sure enough, it now says the contents are a tarball and there was 39.4%
compression. So this file has clearly been compressed twice! When you
download with wget, you get a tar.gz that has been compressed only once.
wget, though, DOES NOT use chunked encoding. My guess is that chunked
encoding plus Apache's on-the-fly gzip is somehow causing this.
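The two ratios can be checked by hand from the sizes gunzip printed, taking the ratio as (uncompressed - compressed) / uncompressed, which is approximately what gzip -l reports. The ratio function is a hypothetical one-liner around awk.

```shell
# ratio COMPRESSED UNCOMPRESSED: hypothetical helper that prints the
# space saved as a percentage with one decimal place.
ratio() {
    awk -v c="$1" -v u="$2" 'BEGIN { printf "%.1f%%\n", (u - c) / u * 100 }'
}

ratio 14194658 14199163   # outer layer (Firefox file): prints 0.0%
ratio 14199163 23439360   # inner layer (real .tar.gz): prints 39.4%
```

A layer that saves essentially nothing is what you would expect when gzip is run over data that is already gzip-compressed.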
Is it possible to configure Apache on gnubg.org not to compress files that
are already compressed (i.e. don't compress anything ending in .gz)?
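If the on-the-fly compression comes from Apache's mod_deflate, its documentation describes excluding requests by setting the no-gzip environment variable with mod_setenvif. A sketch for the server configuration, assuming both modules are loaded and that the extension list below (an assumption) covers the files under /media/sources:

```apache
# Assumes mod_deflate and mod_setenvif are loaded (Apache 2.x).
# Requests for already-compressed archives get the no-gzip variable,
# which tells mod_deflate not to compress the response again.
SetEnvIfNoCase Request_URI \.(?:gz|tgz|bz2|zip)$ no-gzip
```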
The reason tar can't decompress the file (with the 'z' option) while
gunzip can appears to be simple. gunzip does one decompression, resulting
in a true tar.gz file (but with a .tar extension). It appears that tar is
smart enough to figure out that the input stream is a compressed tarball,
so even without the 'z' parameter on the command line it is still able to
decompress it. What tar can't do is decompress the file twice by itself.
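A sketch of the tar behavior described above, assuming GNU tar on Linux (bsdtar and other implementations may handle nested compression differently) and using hypothetical file names: tar with z removes only one gzip layer, so the doubly compressed file needs one extra gzip -dc in front of tar, and no rename is needed.

```shell
# Assumes GNU tar; file names are hypothetical.
work=$(mktemp -d)
cd "$work"
printf 'hello\n' > hello.txt
tar -cf sample.tar hello.txt
gzip -c < sample.tar > once.tar.gz   # ordinary .tar.gz
gzip -c < once.tar.gz > twice.tar.gz # compressed a second time

tar -tzf once.tar.gz                 # one layer: tar -z lists it
tar -tf once.tar.gz                  # GNU tar also detects gzip without -z

# Two layers: tar -z removes only the outer one, then finds gzip data
# where it expects a tar header and fails.
tar -tzf twice.tar.gz || echo "tar -z could not read twice.tar.gz"

# Peeling the outer layer first works, with no rename needed.
gzip -dc < twice.tar.gz | tar -tzf -
```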
Mike