help-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Output of guix build --check foo is not part of store deduplication


From: Björn Höfling
Subject: Output of guix build --check foo is not part of store deduplication
Date: Thu, 9 Aug 2018 11:45:55 +0200

Is there any reason why the output of 'guix build --check ...' is not
part of deduplication? I will explain my problem:

When checking for (un)reproducibility, we use something like:

guix build --check -K foo

That will build the package foo again and produce a store output

/gnu/store/hash..-foo-1.0.0-check

You can then use diffoscope to view the difference between the old and
the new '-check' output.

Usually, the store gets deduplicated, i.e. if files bar and baz have
the same content, they will hard-link to the same thing on disk. That's
cool for saving space if for example some package get's updated because
of a changed dependency but really there is no or little change to the
output files.

But the '-check' files are somehow not part of that deduplication. Even
if you enforce deduplication with guix gc --optimize. You can see it
like this:

ls -l  
/gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
 
/gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz
 
-r--r--r--  1 root root 624 Jan  1  1970 
/gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
-r--r--r-- 11 root root 624 Jan  1  1970 
/gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz

ls -i  
/gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
 
/gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz
 
46161304 
/gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
45141642 
/gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz

The '-check' output has only one link count and the actual output has
11 links, because I have already so many store items/generations of
that package around. The inode differs.

If you now diffoscope them, diffoscope will call stat and then we get
diffs like:

│ │   --- 
/gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2/share/info/libchop.info.gz
│ │ ├── +++ 
/gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2-check/share/info/libchop.info.gz
│ │ │ ├── /gnu/store/as7vb5xx7vqdwmmqj9543470r49b4c0c-coreutils-8.28/bin/stat {}
│ │ │ │ @@ -1,8 +1,8 @@
│ │ │ │  
│ │ │ │    Size: 29524          Blocks: 64         IO Block: 4096   regular file
│ │ │ │ -Links: 3
│ │ │ │ +Links: 1


This is annoying because it hides the actual unreproducibility-problem. 
Is there any reason for that?


At least, I found a very guixy way around it:

There's a patch by Eelco to filter those Links out:

https://github.com/edolstra/diffoscope/commit/367f77bba8df0dbc89e63c9f66f05736adf5ec59

(with copy/paste errors):

 diffoscope/comparators/directory.py
@@ -47,14 +47,18 @@ def cmdline(self):
    FILE_RE = re.compile(r'^\s*File:.*$')
    DEVICE_RE = re.compile(r'Device: [0-9a-f]+h/[0-9]+d')
+   LINKS_RE = re.compile(r'Links: [0-9]+')
    ACCESS_TIME_RE = re.compile(r'^Access: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
    CHANGE_TIME_RE = re.compile(r'^Change: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
    def filter(self, line):
        line = line.decode('utf-8')
        line = Stat.FILE_RE.sub('', line)
        line = Stat.DEVICE_RE.sub('', line)
        line = Stat.INODE_RE.sub('', line)
+       line = Stat.LINKS_RE.sub('', line)
        line = Stat.ACCESS_TIME_RE.sub('', line)
        line = Stat.CHANGE_TIME_RE.sub('', line)
        return line.encode('utf-8')


So, I did:

guix build -S diffoscope

to get the source tarball, unpacked the sources. Patched. Packed. Then:

guix package -i diffoscope --with-source=diffoscope-96.tar.gz

and have a Links-free version of diffoscope in my profile (If I would
have thought about that earlier, I would have done it in a separate
profile and not in my main one)!

Björn




Attachment: pgpU4oPxAx_c4.pgp
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]