Re: explicit extraction of files behind (sym)links

help-tar

From:	Reuti
Subject:	Re: explicit extraction of files behind (sym)links
Date:	Sat, 23 Jul 2022 18:53:21 +0200

Hi,

> On 23.07.2022 at 15:04, Aiyion.Prime <help-tar@aiyionpri.me> wrote:
> 
> Hey Reuti,
> 
> Thanks for your input. I was afraid I'd need to use a workaround like this.
> I like the idea of explicitly packing relative symlinks and using them to
> determine the real target.
> What bugs me is having to extract the files just to determine whether each
> one is the real deal or just a symlink I need to follow first.
> 
> If I understand tar correctly, each member's header records whether it is a
> regular file or a symlink.

This might be a misunderstanding. The tar archive is sequential and can even
hold more than one file with the same name (in the same directory), e.g. one
added later with `tar --append …`. A symlink is stored as a symlink (unless
-h is used), even in the middle of an archive.
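A minimal sketch of the point about duplicate names, assuming GNU tar and a hypothetical file `notes.txt`: after `tar --append`, the archive lists the same name twice, and `--occurrence=N` selects which copy is processed.

```shell
#!/bin/sh
# Sketch (assumes GNU tar): an archive may hold two members with the same
# name; --occurrence selects which one is processed.
# notes.txt is a hypothetical file used only for illustration.
set -eu

work=$(mktemp -d)
cd "$work"

echo 'first version' > notes.txt
tar --create --file=dup.tar notes.txt

echo 'second version' > notes.txt
tar --append --file=dup.tar notes.txt

# The listing shows the same name twice, in archive order.
tar --list --file=dup.tar

# Print the contents of the first and of the second occurrence.
tar --extract --to-stdout --occurrence=1 --file=dup.tar notes.txt
tar --extract --to-stdout --occurrence=2 --file=dup.tar notes.txt
```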

Therefore the idea is to use a list of files with all symlinks first and all
regular files after them, created e.g. by
`find <directory> -type l -print0 > filelist` followed by
`find <directory> -type f -print0 >> filelist`, and to give this list to
`tar --create --null --files-from=filelist …`.
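A minimal sketch of building such a list and archive, assuming GNU tar and a hypothetical tree in which `anotherpath/binaryblob13` is a relative symlink to `../onepath/binaryblob13`. The directory names are passed to `find` instead of `.` so that the member names carry no leading `./`.

```shell
#!/bin/sh
# Sketch (assumes GNU tar): list all symlinks first, then all regular
# files, and archive exactly that list. The tree is a hypothetical
# stand-in for the layout in this thread.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir -p onepath anotherpath
echo 'readme 1' > onepath/readme
echo 'readme 2' > anotherpath/readme
echo 'large blob' > onepath/binaryblob13
# Relative, file-level symlink for the duplicate blob.
ln -s ../onepath/binaryblob13 anotherpath/binaryblob13

# Symlinks first, then regular files, NUL-separated.
find onepath anotherpath -type l -print0 > filelist
find onepath anotherpath -type f -print0 >> filelist

# --null must come before --files-from so the list is read as
# NUL-separated. Without -h, symlinks are archived as symlinks.
tar --create --null --files-from=filelist --file=archive.tar

tar --list --verbose --file=archive.tar
```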

Even if the archive is compressed and hence not seekable, combining this with
"--occurrence" lets tar stop at the first match and avoid decompressing the
remainder of the archive (or even restoring the same file twice). Because the
symlinks come first, they are extracted (or listed) quite fast, and even once
we have the real file, tar doesn't have to scan the following members to see
whether the same file occurs again.
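The extraction side could look roughly like this. It is a sketch under several assumptions: GNU tar and GNU coreutils `realpath`, relative symlinks that point directly at a regular file stored in the same archive, and a hypothetical helper `extract_member`. It follows the steps from the earlier mail: extract with `--occurrence=1`, and if the result is a symlink, remove it and write the real file under the symlink's name.

```shell
#!/bin/sh
# Sketch (assumes GNU tar and GNU coreutils realpath). extract_member is
# a hypothetical helper, not part of tar.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a small sample archive: symlinks first, then regular files.
mkdir -p src/onepath src/anotherpath
echo 'large blob' > src/onepath/binaryblob13
ln -s ../onepath/binaryblob13 src/anotherpath/binaryblob13
(
  cd src
  find onepath anotherpath -type l -print0 > ../filelist
  find onepath anotherpath -type f -print0 >> ../filelist
  tar --create --null --files-from=../filelist --file=../archive.tar
)

extract_member() {
  archive=$1
  member=$2
  # Stop at the first matching member instead of reading the whole archive.
  tar --extract --occurrence=1 --file="$archive" "$member"
  if [ -L "$member" ]; then
    # Assumes a relative link pointing directly at a regular file stored
    # in the same archive (no link chains, no absolute targets).
    target=$(readlink "$member")
    # Lexically turn "dir/../other/file" into an archive member name
    # (-m: path may be missing, -s: do not expand symlinks).
    real=$(realpath -m -s --relative-to=. "$(dirname "$member")/$target")
    rm "$member"
    # Write the real file's contents under the symlink's name.
    tar --extract --to-stdout --occurrence=1 --file="$archive" "$real" > "$member"
  fi
}

mkdir out
cd out
extract_member ../archive.tar anotherpath/binaryblob13
cat anotherpath/binaryblob13
```

Writing through `--to-stdout` keeps the sketch short, but the real file's mode and modification time are not restored that way.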

I don't know whether the archive is compressed in your case, or whether you
would like to use "--wildcards" or always provide the complete path of the file
to be extracted. These are just some ideas on how I would approach such a
problem.
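To probe a member's type without extracting it (the `tar tvf` idea in the quoted mail), a rough sketch assuming GNU tar is to read the first character of the verbose listing, which mirrors `ls -l` (`l` for a symlink, `h` for a hard link, `-` for a regular file). `member_type` is a hypothetical helper, and whether this listing is meant to be parsed by scripts is exactly the open question, so treat the parsing as an assumption.

```shell
#!/bin/sh
# Sketch (assumes GNU tar): look up a member's type without extracting it.
# member_type is a hypothetical helper, not a tar option; it relies on the
# human-oriented verbose listing format.
set -eu

work=$(mktemp -d)
cd "$work"
mkdir -p onepath anotherpath
echo 'large blob' > onepath/binaryblob13
ln -s ../onepath/binaryblob13 anotherpath/binaryblob13
tar --create --file=archive.tar anotherpath/binaryblob13 onepath/binaryblob13

member_type() {
  # Print the type character of the first matching member.
  tar --list --verbose --occurrence=1 --file="$1" "$2" | cut -c 1
}

member_type archive.tar anotherpath/binaryblob13
member_type archive.tar onepath/binaryblob13
```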

-- Reuti


> I think I'll extend your recommendation with a variant of "tar tvf archive.tar
> targetfile".
> Gotta find out whether that output is intended to be machine-readable or
> whether there's a cleaner approach to probing the files.
> 
> Thanks for your help
> Aiyion
> 
> 
> On 7/22/22 17:29, Reuti wrote:
>> Hi Aiyion,
>>> On 22.07.2022 at 10:13, Aiyion.Prime <help-tar@aiyionpri.me> wrote:
>>> 
>>> 
>>> Good morning everyone,
>>> 
>>> For a few years now I thought I knew my way around tar, but yesterday
>>> evening I learned I was wrong about that:
>>> 
>>> I'm archiving a directory structure that contains large redundant
>>> files.
>>> 
>>> onepath/readme
>>> onepath/binaryblob13
>>> anotherpath/readme
>>> anotherpath/binaryblob13
>> I don't know your complete workflow, hence I can give only a vague idea:
>> Assuming you are using symlinks in the above structure:
>> • instead of archiving the complete directories recursively, create a list 
>> of files to be saved for `tar`: first all symlinks (as symlinks), then all 
>> real files
>> • on extraction --occurrence=1 will stop at the first encounter
>> • in case it's a symlink, remove the extracted symlink file and extract the 
>> real file it points to with the name of the symlink file
>> This should speed up the processing.
>> -- Reuti
>>> I cannot change the paths, as this is to be fed to a package manager that
>>> requires them.
>>> 
>>> To avoid an archive twice the size of `binaryblob13`, I thought I could
>>> use symlinks or hardlinks and the `-h` flag on creation.
>>> 
>>> So archiving this:
>>> 
>>> onepath/
>>> secondpath -> onepath/
>>> 
>>> using
>>> 
>>> tar --sort=name --owner=0 --group=0 --numeric-owner -chvf normal_sized.tar
>>> secondpath onepath ${mtime}
>>> 
>>> That would work like a charm if said package manager extracted the whole
>>> tarfile.
>>> 
>>> This is what it does though:
>>> 
>>> tar xf $tar_file secondpath/binaryblob13
>>> 
>>> That works fine if I extract files from the directory referenced first
>>> in the creation command (secondpath in the case above), but fails for
>>> the directory archived second: tar tries to create a hardlink on disk
>>> pointing to what would have been the previously extracted file. Since
>>> that file does not exist, I've got a problem.
>>> 
>>> I'd like to avoid extracting all binaryblob13 references beforehand just so
>>> that the link I extract points to something valid.
>>> 
>>> Is there a flag to tell tar "I don't care if you have to search the archive
>>> twice, just extract the original file instead of creating an (invalid)
>>> hardlink"?
>>> 
>>> 
>>> I realize that's unusable for actual tape records, but maybe someone has a
>>> hint for me here.
>>> 
>>> Thanks in advance and have a nice morning,
>>> Aiyion
>>>

Current Thread

explicit extraction of files behind (sym)links, Aiyion.Prime, 2022/07/22
- Re: explicit extraction of files behind (sym)links, Paul Eggert, 2022/07/22
- Re: explicit extraction of files behind (sym)links, Reuti, 2022/07/22
  - Re: explicit extraction of files behind (sym)links, Aiyion.Prime, 2022/07/23
    - Re: explicit extraction of files behind (sym)links, Reuti <=
