Re: [Bug-xorriso] bug with handling symbolic links?


From: Joerg Meyer
Subject: Re: [Bug-xorriso] bug with handling symbolic links?
Date: Fri, 27 Feb 2015 23:09:12 +0100

Hi Thomas,

>> Identical files have different dev_t but the same ino_t (physically stored
>> in the same location) -

> This is the base for the incremental backup decisions by
> -disk_dev_ino "ino_only".
OK - at least I got that right ;-) This works perfectly...

> How much deduplication is provided by hardlinks in your file tree ?
... such that the identical files in different snapshots are "deduplicated"
in the image.
Keep in mind (according to the first paragraph) that they do not(!) appear
as (conventional) hardlinks in the BTRFS.
The latter are still supported within individual subvolumes, which appear
as "special" directories in the filesystem, since their contents all have
the same dev_t.
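
For illustration (made-up paths): the same file in two snapshots shows the
same inode number but different device numbers:

  # %d = dev_t, %i = ino_t, %n = file name
  stat -c '%d %i %n' /mnt/snap-A/etc/fstab /mnt/snap-B/etc/fstab
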
In fact, I think there are hardly any conventional hardlinks in any of the
snapshots (which are simply subvolumes that have been cloned at a
particular time).
I could get e.g. two(!) BTRFS snapshot directories, each with an apparent
size (du -hs) of ~7 GB, onto a single DVD+DL
(and du -hs shows reasonably similar sizes in the isofs for both of them
as well).
If I have not overlooked any (secret) magic of mkisofs (whichever
incarnation), I think xorriso is the only tool that allows one to arrive
at such an isofs "on the fly",
i.e. without having to transform the "BTRFS snapshots construct" into
"regular directories" with traditional hardlinks -
which is an annoyingly time-consuming operation on my machinery.
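By that transformation I mean something along these lines (made-up paths,
not my actual setup):

  # flatten a snapshot into a plain directory tree, hardlinking files
  # that are unchanged relative to the previously flattened one:
  rsync -a --link-dest=/tmp/plain/snap-A /mnt/btrfs/snap-B/ /tmp/plain/snap-B/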

> Can you show me the arguments of your xorriso run ?
It's essentially 
[...]
-add [snapshot-name] current -- -clone current [snapshot-name] --
-update [snapshot-name] current -- -clone current [snapshot-name] --
[...]
[optionally] -rm_r current --
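
Spelled out, a single such run is roughly this (the device and exact
option set are placeholders for illustration, not the verbatim script):

  xorriso -for_backup -disk_dev_ino ino_only \
    -dev /dev/sr0 \
    -update [snapshot-name] current -- \
    -clone current [snapshot-name] --

(the pending changes then get committed automatically at the end of the
run).
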
Will send you my script next week.

> But on the other hand, ten million file entries are a lot.
> 220 MB of extra consumption would mean only 22 bytes per
> file.
> 
> I will investigate. But it might be the natural cost of checking 10 million 
> files for hard links.
It's actually ~0.27E6 files per snapshot (typo in my previous mail) in
that large case above,
so the total number for two of them is still less than one million.
Also, ~220 bytes more per file would not be that much,
but please do not take those numbers to be very accurate -
I have not been looking at the "watch -n60 free" screen tab the entire time!
But apart from the memory consumption, there was a clear difference in
time for the -update_r step
(which might of course have been related to different swapping):
-for_backup: >2000s
-acl on -xattr on -md5 on: ~900s
(presumably due to the -hardlinks on which -for_backup implies)

> Would it be worth to look for hardlinks, so that at restore
> time the data will not grow over the original size and the
> proper link relations are maintained ?
No, I don't think it would be in my use case:
Typically, I envision restoring one snapshot at a time.
And when looking for different versions of a config file among different
snapshots,
that can be done quickly and easily from the DVD.
Is it possible to extract only those files from an ISO which differ
(based on ino_t or md5?)
from the files already on hard disk (i.e. the reverse operation of
-update_r)?
If so, then at least on a BTRFS the original multiple-snapshot structure
could be restored very closely, should that ever be desired...
(restoring to a BTRFS subvolume and cloning the latter between individual
such incremental extract operations - exploiting copy-on-write).
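
To make that concrete, the restore side might look roughly like this
(made-up paths; the "extract only what differs" step between clones is
exactly the open question above):

  # restore one snapshot from the DVD into a fresh subvolume
  btrfs subvolume create /mnt/restore/current
  xorriso -osirrox on -indev /dev/sr0 \
    -extract /[snapshot-name] /mnt/restore/current
  # clone it copy-on-write before bringing "current" to the next
  # snapshot's state
  btrfs subvolume snapshot /mnt/restore/current /mnt/restore/[snapshot-name]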

Best wishes and have a nice weekend,
Jörg.


On 27 Feb 2015, at 14:03, Thomas Schmitt <address@hidden> wrote:

> Hi,
> 
>> Identical files have different dev_t but the same ino_t (physically stored
>> in the same location) -
> 
> This is the base for the incremental backup decisions by
> -disk_dev_ino "ino_only".
> 
> 
>> which led me to conclude that they are subject to hardlink processing with
>> -hardlinks on. Is that correct?
> 
> Rather not.
> 
> -hardlinks looks inside the composed ISO directory tree
> for families of file paths of which the input files on
> "hard disk" share the same dev_t and ino_t. These families
> will share a Rock Ridge PX inode number in the ISO. This
> Rock Ridge inode number is ignored at least by Linux, FreeBSD,
> and NetBSD. But xorriso may restore the families to hardlinks
> which share dev_t, ino_t and file content.
> 
> Since dev_t on hard disk differs, the files in a snapshot
> are not linked with those in another snapshot, even if both
> snapshots would go into the same backup.
> 
> 
>> For the image generation described above, -for_backup consumes >250MB of
>> swap (in addition to the 512MB RAM) quite early (i.e. during first -add), 
>> whereas -acl on -xattr on -md5 on only swaps later (i.e. during the subsequent
>> -update_r) and much more moderately (~30 MB),
> 
> Wow. That's unexpected.
> 
> But on the other hand, ten million file entries are a lot.
> 220 MB of extra consumption would mean only 22 bytes per
> file.
> 
> I will investigate. But it might be the natural cost of
> checking 10 million files for hard links.
> 
> How much deduplication is provided by hardlinks in your
> file tree ?
> Would it be worth to look for hardlinks, so that at restore
> time the data will not grow over the original size and the
> proper link relations are maintained ?
> 
> 
> Before I make my own experiments on a smaller scale:
> 
> Why/how do you combine -add and -update_r in the same run ?
> 
> -add causes files to be copied into the ISO unconditionally,
> whereas -update_r takes only those which have no matching
> counterpart in the loaded ISO.
> 
> Can you show me the arguments of your xorriso run ?
> 
> 
> Have a nice day :)
> 
> Thomas
> 