gnu-arch-users

Re: [Gnu-arch-users] Free space wasting when handling binary files


From: John Arbash Meinel
Subject: Re: [Gnu-arch-users] Free space wasting when handling binary files
Date: Thu, 24 Mar 2005 18:15:59 -0600
User-agent: Mozilla Thunderbird 1.0 (Macintosh/20041206)

Adrian Irving-Beer wrote:

> On Thu, Mar 24, 2005 at 12:42:46PM -0600, John Arbash Meinel wrote:
>
>> Remember, the tarball is compressed, so you do get a little bit of delta
>> compression even though there are 2 copies in there.
>
> Negative on that, AFAICT...
>
> % dd if=/dev/urandom of=file1 bs=1k count=1k
> 1024+0 records in
> 1024+0 records out
> 1048576 bytes transferred in 0.338545 seconds (3097302 bytes/sec)
> % cp file1 file2
> % ls -l file1 file2
> -rw-r--r--  1 wisq wisq 1048576 2005-03-24 18:50 file1
> -rw-r--r--  1 wisq wisq 1048576 2005-03-24 18:51 file2
> % tar -zcf files.tar.gz file1 file2
> % ls -l files.tar.gz
> -rw-r--r--  1 wisq wisq 2097810 2005-03-24 18:51 files.tar.gz

It does depend on the size of the file versus the size of the
compression window:

$ dd if=/dev/random of=file1 bs=1k count=10
10+0 records in
10+0 records out
$ cp file1 file2
$ tar czf files.tar.gz file1 file2
$ ls -l file1 file2 files.tar.gz
-rw-r--r--    1 jameinel jameinel    10240 Mar 24 18:07 file1
-rw-r--r--    1 jameinel jameinel    10240 Mar 24 18:07 file2
-rw-r--r--    1 jameinel jameinel    10586 Mar 24 18:07 files.tar.gz

$ dd if=/dev/random of=file1 bs=1k count=100
100+0 records in
100+0 records out
$ cp file1 file2
$ tar czf files.tar.gz file1 file2
$ ls -l file1 file2 files.tar.gz
-rw-r--r--    1 jameinel jameinel   102400 Mar 24 18:09 file1
-rw-r--r--    1 jameinel jameinel   102400 Mar 24 18:09 file2
-rw-r--r--    1 jameinel jameinel   205198 Mar 24 18:09 files.tar.gz
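
To narrow down where gzip stops finding the second copy, the experiment above can be repeated over a range of sizes. This is a minimal sketch, assuming dd, cp, mktemp, /dev/urandom, and a tar that accepts the z (gzip) and j (bzip2) flags; the function name dup_sweep is hypothetical. It reports each archive as a multiple of one input file, so about 1 means the duplicate was nearly free and about 2 means both copies were stored in full. If the cutoff is gzip's 32k window, the ratio should jump between the 30 and 40 KiB runs.

```shell
#!/bin/sh
# Hypothetical helper: repeat the duplicate-file experiment for each size
# and print the archive size as a multiple of one input file.
# Assumes dd, cp, mktemp, /dev/urandom, and tar with z/j support.
dup_sweep() {
    flag=$1                    # z for gzip, j for bzip2
    shift                      # remaining arguments: file sizes in KiB
    dir=$(mktemp -d) || return 1
    for kb in "$@"; do
        # One random file plus an identical copy, as in the transcripts.
        dd if=/dev/urandom of="$dir/file1" bs=1024 count="$kb" 2>/dev/null
        cp "$dir/file1" "$dir/file2"
        (cd "$dir" && tar "c${flag}f" archive file1 file2)
        size=$(wc -c < "$dir/archive")
        awk -v kb="$kb" -v s="$size" 'BEGIN {
            printf "%6d KiB  %9d bytes  %.2f copies\n", kb, s, s / (kb * 1024)
        }'
    done
    rm -rf "$dir"
}

# Sizes straddling gzip's 32 KiB window.
dup_sweep z 10 20 30 40 100
```

Calling it as dup_sweep j 100 400 500 1024 would probe bzip2's 900k blocks instead, provided bzip2 is installed.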

The gzip window is only 32k, while bzip2 compresses in blocks of up to 900k.
(With bzip2 instead of gzip, the duplicate copy still compresses well for
files up to somewhere around 500k each.)

$ dd if=/dev/random of=file1 bs=1k count=100
100+0 records in
100+0 records out
$ cp file1 file2
$ tar cjf files.tar.bz2 file1 file2
$ ls -l file1 file2 files.tar.bz2
-rw-r--r--    1 jameinel jameinel   102400 Mar 24 18:09 file1
-rw-r--r--    1 jameinel jameinel   102400 Mar 24 18:09 file2
-rw-r--r--    1 jameinel jameinel   128478 Mar 24 18:11 files.tar.bz2

$ dd if=/dev/random of=file1 bs=1k count=500
500+0 records in
500+0 records out
$ cp file1 file2
$ tar cjf files.tar.bz2 file1 file2
$ ls -l file1 file2 files.tar.bz2
-rw-r--r--    1 jameinel jameinel   512000 Mar 24 18:13 file1
-rw-r--r--    1 jameinel jameinel   512000 Mar 24 18:13 file2
-rw-r--r--    1 jameinel jameinel   750903 Mar 24 18:13 files.tar.bz2

$ dd if=/dev/random of=file1 bs=1k count=1k
1024+0 records in
1024+0 records out
$ cp file1 file2
$ tar cjf files.tar.bz2 file1 file2
$ ls -l file1 file2 files.tar.bz2
-rw-r--r--    1 jameinel jameinel  1048576 Mar 24 18:12 file1
-rw-r--r--    1 jameinel jameinel  1048576 Mar 24 18:12 file2
-rw-r--r--    1 jameinel jameinel  2106766 Mar 24 18:12 files.tar.bz2
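
To put all of these runs on one scale, each archive size can be divided by the size of a single input file. The sketch below does that with awk over the numbers copied from the transcripts in this message, including Adrian's 1M gzip run; the function name ratios is only an illustrative choice.

```shell
#!/bin/sh
# Archive sizes copied from the transcripts in this message:
# compressor, size of one input file, size of the archive (bytes).
# Prints the archive size as a multiple of one input file.
ratios() {
    awk '{ printf "%-6s %8d %9d  %.2f copies\n", $1, $2, $3, $3 / $2 }' <<'EOF'
gzip    10240    10586
gzip   102400   205198
gzip  1048576  2097810
bzip2  102400   128478
bzip2  512000   750903
bzip2 1048576  2106766
EOF
}

ratios
```

By this measure gzip stores the second copy almost for free at 10k (1.03) but in full at 100k (2.00), while bzip2 still gets 1.25 at 100k and 1.47 at 500k before reaching 2.01 at 1M.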

So I would say you're mostly right. Only when the files are small compared
to the compression window, so that the second copy is still within reach of
the first, do you get pretty good delta compression.

John
=:->
