bug-xorriso

RFC: devmapping cutouts?


From: Ivan Shmakov
Subject: RFC: devmapping cutouts?
Date: Thu, 26 May 2022 03:12:24 +0000

        The -cut-out option, as recently amended [1], allows one to
        store one or more fragments of the data from a block device
        on an ECMA 119 [2] filesystem.  The obvious use case is
        storing data that exceeds the capacity of a single optical
        disk across several of them.

[1] http://bugs.debian.org/1010098
[2] http://ecma-international.org/publications-and-standards/standards/ecma-119/

        A somewhat rarer use case is to avoid storing large spans of
        zero-valued bytes present in the source data.

        This can occur when said data is a regularly “trimmed”
        (e. g., with fstrim(8) [3]) filesystem residing on a
        flash-based drive, or when the allocated storage was simply
        never fully written over (such as when much more space was
        allocated for a filesystem than ever got used).  In such
        cases, it’s possible to store on the target filesystem only
        those ranges that aren’t entirely filled with zeros.

[3] http://manpages.debian.org/sid/fstrim.8
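        The post does not say how the zero-filled ranges were found.
        The following is a minimal sketch, assuming the source can
        simply be read sequentially in 512-byte sectors; the
        zero_ranges () name and the command-line usage are
        hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Hypothetical helper: scan a filehandle in 512-byte sectors and
## return [ START, COUNT ] pairs (both in sectors) for every maximal
## run of sectors consisting entirely of zero bytes.
sub zero_ranges {
    my ($fh) = @_;
    local $/ = \512;                # fixed-size record reads
    binmode ($fh);
    my (@ranges, $run_start);
    my $sector = 0;
    while (defined (my $buf = <$fh>)) {
        if ($buf =~ /\A\0*\z/) {
            ## All-zero sector: start a run unless one is in progress.
            $run_start //= $sector;
        } elsif (defined ($run_start)) {
            ## A sector with data ends the current run.
            push (@ranges, [ $run_start, $sector - $run_start ]);
            undef ($run_start);
        }
        $sector++;
    }
    ## Close a run that extends to the end of the input.
    push (@ranges, [ $run_start, $sector - $run_start ])
        if (defined ($run_start));
    return @ranges;
}

## Hypothetical usage: zero-ranges.pl /dev/SOURCE
if (@ARGV) {
    open (my $in, "<", $ARGV[0]) or die ("$ARGV[0]: $!\n");
    printf ("%d %d\n", @$_) foreach (zero_ranges ($in));
}
```

        Each reported pair is a start and a length in 512-byte
        sectors, the same units devmapper uses.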

        Suppose that such a filesystem was created.  What would be
        an efficient way to access its contents as if it were the
        original block device?

        Arguably, it’d be by creating a devmapper block device with a
        table mapping block ranges of the new device either to file
        data on the given ECMA 119 filesystem or to the ‘zero’
        target, as appropriate.

        Note that it’s possible to bypass the filesystem layer here
        (assuming that the ECMA 119 filesystem resides on a block
        device, such as an optical disk) by consulting the output of
        the xorriso(1) -find -exec report_lba command, like so:

$ xorriso \
      -indev stdio:/dev/BACKUP \
      -find / -exec report_lba -- \
      -rollback-end 
xorriso 1.5.0 : RockRidge filesystem manipulator, libburnia project.

xorriso : NOTE : Loading ISO image tree from LBA 0
xorriso : UPDATE :    1218 nodes read in 1 seconds
Drive current: -indev 'stdio:/dev/BACKUP'
Media current: stdio file, overwriteable
Media status : is written , is appendable
Media summary: 1 session, 2293183 data blocks, 4479m data,     0 free
Volume id    : '3FD2B6132D3543A196F288D2A82A7A42'
Report layout: xt , Startlba ,   Blocks , Filesize , ISO image path
File data lba:  0 ,  2293070 ,       84 ,   171690 , '/private/backups/.mtree/2022-05-21'
File data lba:  0 ,  2293154 ,       61 ,   123688 , '/private/backups/.sha256/2022-05-21'
File data lba:  0 ,      135 ,      195 ,   399360 , '/private/backups/lvfoo-z598d01-x6288bb2c/+0'
File data lba:  0 ,      330 ,       36 ,    73728 , '/private/backups/lvfoo-z598d01-x6288bb2c/+10000000'
File data lba:  0 ,      366 ,     5979 , 12244992 , '/private/backups/lvfoo-z598d01-x6288bb2c/+100202000'
File data lba:  0 ,     6345 ,     1466 ,  3002368 , '/private/backups/lvfoo-z598d01-x6288bb2c/+100dd0000'
File data lba:  0 ,     7811 ,      101 ,   206848 , '/private/backups/lvfoo-z598d01-x6288bb2c/+1010d0000'
…

        Here, the filenames record the byte offset of each fragment
        in hexadecimal, so that, e. g., +10000000 is 256 MiB from the
        start of the image, +100dd0000 is 4208448 KiB, and so on.
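        To double-check that arithmetic, the filename suffixes can be
        decoded with a few lines of Perl.  This is a minimal sketch;
        fragment_offset () is a hypothetical helper, and offsets
        beyond 4 GiB need a perl built with 64-bit integers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Hypothetical helper: decode a fragment filename of the form
## "+HEX" into its byte offset within the original block device.
sub fragment_offset {
    my ($name) = @_;
    my ($hex) = ($name =~ /\A[+]([0-9a-f]+)\z/i)
        or die ("not a fragment name: $name\n");
    ## hex () warns about values above 0xffffffff being non-portable.
    no warnings qw(portable);
    return hex ($hex);
}

## Show each offset in bytes, KiB and 512-byte sectors.
foreach my $name (qw(+0 +10000000 +100dd0000)) {
    my $bytes = fragment_offset ($name);
    printf ("%-11s %11d bytes = %8d KiB = %8d sectors\n",
            $name, $bytes, $bytes / 1024, $bytes >> 9);
}
```

        This confirms +10000000 as 256 MiB and +100dd0000 as 4208448
        KiB (8416896 sectors).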

        (Arguably, it makes even more sense to encode the offsets in
        Crockford Base32, so that +10000000 and +100dd0000 would
        become the rather more concise 800000 and 40DT000,
        respectively; only 8 Base32 digits are needed to encode any
        offset within 1 TiB.)
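        The Base32 variant is easy to sketch as well.  This assumes
        Crockford’s alphabet (digits plus letters, minus I, L, O and
        U) without the optional check symbol; crockford32 () is a
        hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Crockford's Base32 alphabet: 0-9 and A-Z without I, L, O and U.
my @CROCKFORD = split (//, "0123456789ABCDEFGHJKMNPQRSTVWXYZ");

## Hypothetical helper: encode a non-negative integer in Crockford
## Base32, 5 bits per digit, most significant digit first.
sub crockford32 {
    my ($n) = @_;
    return "0" if ($n == 0);
    my $s = "";
    while ($n > 0) {
        $s = $CROCKFORD[$n & 31] . $s;
        $n >>= 5;
    }
    return $s;
}

foreach my $offset (268435456, 4309450752) {   # +10000000, +100dd0000
    printf ("+%x => %s\n", $offset, crockford32 ($offset));
}
```

        This yields 800000 and 40DT000 for the two example offsets.
        Since each digit carries 5 bits, 8 digits cover offsets up
        to 2^40 - 1, i. e., anything within 1 TiB.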

        It’s easy to transform the output above into a table
        suitable for passing to the # dmsetup create command (note
        that devmapper operates on 512-byte sectors rather than
        bytes, and that each ISO 9660 LBA is 2048 bytes, or 4
        sectors):

0 780 linear /dev/BACKUP 540
780 3356 zero
4136 23820 linear /dev/BACKUP 2562196
27956 260 zero
28216 101100 linear /dev/BACKUP 7999556
129316 348 zero
…
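        For the plain dmsetup route, a sketch along the following
        lines could produce such a table directly.  It is not the
        code used for this post (which targets dmmontage), and it
        assumes unwrapped xorriso(1) report lines, fragment files
        named +HEX, and only one image’s fragments in the input
        (e. g., by restricting -find to that image’s directory);
        dm_table () is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Hypothetical helper: build dmsetup table lines from xorriso
## "-exec report_lba" output.  All units are 512-byte sectors; ISO 9660
## LBAs are 2048-byte blocks, hence the factor of 4.
sub dm_table {
    my ($dev, @lines) = @_;
    my %frag;                   # sector offset => [ dev sector, length ]
    foreach (@lines) {
        ## Fields: xt, start LBA, blocks, file size, quoted path.
        my ($lba, $size, $name) = m{
            ^ File\sdata\slba: \s+ [0-9]+ \s*,\s* ([0-9]+) \s*,\s* [0-9]+
            \s*,\s* ([0-9]+) \s*,\s* '.*/ [+] ([0-9a-f]+) ' \s* $
        }x or next;
        no warnings qw(portable);   # hex () of values above 0xffffffff
        $frag{hex ($name) >> 9} = [ $lba * 4, $size >> 9 ];
    }
    my ($pos, @table) = (0);
    foreach my $off (sort { $a <=> $b } (keys (%frag))) {
        my ($src, $len) = @{ $frag{$off} };
        ## Fill any gap before this fragment with the zero target.
        push (@table, sprintf ("%d %d zero", $pos, $off - $pos))
            if ($off > $pos);
        push (@table, sprintf ("%d %d linear %s %d",
                               $off, $len, $dev, $src));
        $pos = $off + $len;
    }
    return @table;
}

## Hypothetical usage: xorriso ... | dm-table.pl /dev/BACKUP
if (@ARGV) {
    my $dev = shift (@ARGV);
    print ("$_\n") foreach (dm_table ($dev, <STDIN>));
}
```

        The result can be piped to dmsetup create NAME, which reads
        the table from standard input when no table file is given.
        Nothing in this naming convention records the size of the
        original device, so a trailing zero-filled span, if any,
        would have to be appended by hand.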

        Alternatively (and this is what I’ve actually used), the
        output can be turned into a list of arguments for my
        dmmontage [4] convenience wrapper (albeit one largely
        superfluous in this case):

--target=lvfoo-z598d01-x6288bb2c --
 @0 /dev/BACKUP:540-1320
 @780 /dev/zero:0-3356
 @4136 /dev/BACKUP:2562196-2586016
 @27956 /dev/zero:0-260
 @28216 /dev/BACKUP:7999556-8100656
 @129316 /dev/zero:0-348
…

        (Line broken for readability.)

[4] http://am-1.org/~ivan/src/blkutils-2022/dmmontage.sh

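        The conversion code is as follows.  It reads the xorriso(1)
        report on standard input and takes the name of the block
        device holding the ECMA 119 filesystem as its only argument.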

#!/usr/bin/perl

### Ivan Shmakov, 2022

## To the extent possible under law, the author(s) have dedicated
## all copyright and related and neighboring rights to this software
## to the public domain worldwide.  This software is distributed
## without any warranty.

## You should have received a copy of the CC0 Public Domain Dedication
## along with this software.  If not, see
## <http://creativecommons.org/publicdomain/zero/1.0/>.

### Code:

use common::sense;

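## The block device holding the ECMA 119 filesystem, e. g. /dev/BACKUP.
my $orig
    = shift (@ARGV);
## The image (fragment directory) being collected, and its fragments:
## sector offset => [ "DEVICE:FROM-TO" source spec, end sector ].
my ($prev, %acc)
    = ();

sub print_out {
    ## Emit the dmmontage arguments for the image collected so far,
    ## filling the gaps between fragments from /dev/zero.  A trailing
    ## "-DECIMAL" suffix of the image name is rewritten as "-xHEX".
    return
        unless (defined ($prev));
    print  ("--target=",
            $prev =~ s/-([0-9]+)$/${ \sprintf ("-x%x", $1); }/r,
            ".thin --");
    my $pos
        = 0;
    foreach my $o (sort { $a <=> $b; } (keys (%acc))) {
        printf (" @%d /dev/zero:0-%d", $pos, $o - $pos)
            if ($o != $pos);
        print (" @", $o, ($o != $acc{$o}->[1] ? (" ", $acc{$o}->[0]) : ()));
        $pos
            = $acc{$o}->[1];
    }
    print ("\n");
}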

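while (<STDIN>) {
    ## Extract the start LBA ($so), the file size in bytes ($z), the
    ## image name, i. e. the fragment directory ($ta), and the hex
    ## byte offset from the "+HEX" filename ($to).
    my ($so, $z, $ta, $to) = m {
        ^ File\sdata\slba:
          \s+ [0-9]+ [ ,]+ ([0-9]+) .*\b ([0-9]+)
          \s* , .*? / ([^/]+-x?[0-9a-f]+)
          / [0-9a-f]* [+] ([0-9a-f]+) \b
    }x or next;
    ## A new image name: flush the fragments collected so far.
    if (! defined ($prev) || $prev ne $ta) {
        print_out ();
        ($prev, %acc)
            = ($ta);
    }
    ## 2048-byte ISO 9660 blocks to 512-byte sectors.
    $so <<= 2;
    ## File size, bytes to 512-byte sectors.
    $z >>= 9;
    ## Hex byte offset to 512-byte sectors; hex () warns about values
    ## above 0xffffffff being non-portable, hence the "no warnings".
    $to = do {
        no warnings;
        (hex ("0x" . $to) >> 9);
    };
    # warn ("D: ", join (" ", $so, $z, $ta, $to), "\n") if (0);
    $acc{$to}
        = [ sprintf ("%s:%d-%d", $orig, $so, $z + $so), $z + $to ];
}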
print_out ()
    if (defined ($prev));

        Two more issues to consider are: a. locating the zero-filled
        ranges in the source, and balancing the amount of data /not/
        stored against the number of files (ranges) needed; and
        b. extending this simple ‘bunch of files named by their
        offsets’ convention so as to allow for incremental backups
        of filesystem snapshots (or of other large datasets, where
        it might make sense).
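        Regarding (a), one simple policy, offered purely as an
        illustration (not necessarily how the figures below were
        arrived at), is to cut out only those zero runs that reach
        some minimum length.  The select_cutouts () name, the
        threshold and the sample runs are all hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Hypothetical policy: given zero runs as [ START, COUNT ] pairs (in
## 512-byte sectors), keep as cut-outs only those of at least $min
## sectors; shorter runs are stored in the archive as ordinary data.
## Returns the selected cut-outs and the number of sectors they save.
sub select_cutouts {
    my ($min, @runs) = @_;
    my @cut = grep { $_->[1] >= $min } @runs;
    my $saved = 0;
    $saved += $_->[1] foreach (@cut);
    return (\@cut, $saved);
}

## Toy data: raising $min trades stored zeros for fewer files
## and fewer devmapper table entries.
my @runs = ([ 0, 2 ], [ 3, 3 ], [ 10, 100 ], [ 200, 4096 ]);
foreach my $min (1, 8, 1024) {
    my ($cut, $saved) = select_cutouts ($min, @runs);
    printf ("min %4d: %d cut-outs, %4d sectors saved\n",
            $min, scalar (@$cut), $saved);
}
```

        A finer policy might weigh each extra file’s directory entry
        and block padding against the sectors it saves.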

        For instance, in this particular case, 105568 individual
        ranges of zero-filled 512-byte blocks (2411608 blocks in
        total) were identified on the filesystem.  To save on the
        filesystem and dmsetup overhead, 1097588 of these blocks
        (across 104359 ranges) were disregarded (as in: kept in the
        resulting archive), while 1314020 blocks across 1209 ranges
        were “cut out” and not archived.  Even so, this allowed me
        to fit a 5 GiB FS snapshot on a single 4482 MiB DVD+R blank.
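        The figures above are mutually consistent, and a quick check
        shows why the result fits.  This assumes the snapshot is
        exactly 5 GiB and ignores ISO 9660 metadata and padding:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sector = 512;                            # bytes per devmapper sector
my $mib    = 1024 * 1024;
my $total  = 5 * 1024 * $mib / $sector;      # assumed: exactly 5 GiB
my ($cut,  $cut_ranges)  = (1314020, 1209);     # cut out (from the text)
my ($kept, $kept_ranges) = (1097588, 104359);   # left in (from the text)

printf ("zero sectors found: %d in %d ranges\n",
        $cut + $kept, $cut_ranges + $kept_ranges);
printf ("cut out:  %.1f MiB\n", $cut * $sector / $mib);
printf ("to store: %.1f MiB (of 4482 MiB)\n",
        ($total - $cut) * $sector / $mib);
```

        This reports 2411608 zero sectors in 105568 ranges, about
        641.6 MiB cut out, and about 4478.4 MiB left to store, which
        is roughly in line with the 4479m data figure in the media
        summary above.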

        Thoughts?

-- 
FSF associate member #7257  http://am-1.org/~ivan/


