gnu-arch-users

Re: [Gnu-arch-users] Free space wasting when handling binary files


From: John Arbash Meinel
Subject: Re: [Gnu-arch-users] Free space wasting when handling binary files
Date: Thu, 24 Mar 2005 12:42:46 -0600
User-agent: Mozilla Thunderbird 1.0 (Macintosh/20041206)

Simon Geusebroek wrote:

> Hello!
>
> I'm working on a project which will handle big binary files (.deb
> files). I've noticed that Arch stores each file twice (in order to
> check data integrity, if I've understood correctly, cf.


It's not so much for data integrity. In arch each patch is basically
its own entity, meaning that if I just sent you the tarball of one patch,
you could apply it to your tree without any other information (provided
your tree was somewhat similar).
Patches are also designed to be reversible. So again, if I just send you
one patch, you can apply it in reverse and undo those changes.

In the case of binary files, you have to store both the pre-image and the
post-image of the file, so that you can check that you are applying the
"patch" to the right pre-image to get the post-image. This could be done
with a checksum of the pre-image, but then the patch would be
irreversible, because the old contents would no longer be stored anywhere
in the patch.
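To make the trade-off concrete, here is a toy model in Python. It is not arch's actual changeset format: the class and method names (`FullImagePatch`, `ChecksumPatch`, `apply`, `reverse`, `stored_bytes`) are hypothetical, and SHA-256 is an arbitrary choice of checksum. The first kind of patch keeps both images, so it can verify and undo; the second keeps only a digest of the old file, so it can still verify but has nothing to restore.

```python
import hashlib
from dataclasses import dataclass


class PatchConflict(Exception):
    """The file in the tree is not the image the patch expects."""


@dataclass(frozen=True)
class FullImagePatch:
    """Toy binary 'patch' (hypothetical) that keeps complete old and new contents."""
    pre: bytes
    post: bytes

    def apply(self, current: bytes) -> bytes:
        # Verify we are patching the right pre-image, then hand back the post-image.
        if current != self.pre:
            raise PatchConflict("file does not match the pre-image")
        return self.post

    def reverse(self, current: bytes) -> bytes:
        # Undoing is symmetric: check the post-image, restore the pre-image.
        if current != self.post:
            raise PatchConflict("file does not match the post-image")
        return self.pre

    def stored_bytes(self) -> int:
        # Both copies live in the patch, which is the space cost in question.
        return len(self.pre) + len(self.post)


@dataclass(frozen=True)
class ChecksumPatch:
    """Toy alternative (hypothetical): SHA-256 of the old contents plus the new contents."""
    pre_sha256: str
    post: bytes

    @classmethod
    def from_images(cls, pre: bytes, post: bytes) -> "ChecksumPatch":
        return cls(hashlib.sha256(pre).hexdigest(), post)

    def apply(self, current: bytes) -> bytes:
        # Verification still works: compare digests instead of full contents.
        if hashlib.sha256(current).hexdigest() != self.pre_sha256:
            raise PatchConflict("file does not match the pre-image checksum")
        return self.post

    def reverse(self, current: bytes) -> bytes:
        # A digest cannot be turned back into the bytes it was computed from,
        # so there is nothing to restore.
        raise NotImplementedError("a checksum-only patch cannot be reversed")

    def stored_bytes(self) -> int:
        # 64 hex characters for the digest plus one copy of the new file.
        return len(self.pre_sha256) + len(self.post)


if __name__ == "__main__":
    old = bytes(100_000)        # stand-ins for two versions of a large binary file
    new = b"\x01" * 100_000
    full = FullImagePatch(old, new)
    chk = ChecksumPatch.from_images(old, new)
    assert full.reverse(full.apply(old)) == old
    assert chk.apply(old) == new
    print("full-image patch stores", full.stored_bytes(), "bytes")
    print("checksum patch stores", chk.stored_bytes(), "bytes")
```

For a large file the first patch stores roughly twice the file size and the second roughly one copy, which is exactly the saving being asked about; the price is that `ChecksumPatch.reverse` has no old contents to give back.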

There have been several proposals for alternatives to this approach, but
nobody has been sufficiently motivated to implement them. For most people
the space is not much of a concern.
Remember, the tarball is compressed, so you do get a little bit of delta
compression even though there are two copies in there.

I don't know much about the .deb format, but isn't it generated from
some other sources? Is there a reason you need to save the final output,
rather than the stuff leading up to it?

> http://wiki.gnuarch.org/Ask_20Arch_20questions?action=refresh&arena=Page.py&key=Ask_20Arch_20questions.text_html#head-9fbf906184b7b40ad6cc10c85c68d8a77afa250e).
> This is doubling the size of my archive, wasting a lot of space :(.
>
> I'm wondering if it's possible to deactivate this system (that is,
> store each file only once…) and, moreover, why checksums are not used:
> it seems to me that they could be sufficient to ensure data integrity
> and could free a lot of space (I'm surely wrong for one reason or
> another, but I would like to understand why…).
>
> Thank you in advance,
>
> Simon Geusebroek.


Sorry I don't have a better answer. If it is truly important to you,
and you are unable to do it yourself, you could probably hire someone
from this list to work on it.

John
=:->
