
Reproducible Builds Status Summary for Guix


From: Vagrant Cascadian
Subject: Reproducible Builds Status Summary for Guix
Date: Sun, 12 Jun 2022 20:55:38 -0700

I've been working on Reproducible Builds in guix a fair amount this
month.

data.guix.gnu.org has proven invaluable for this work, big thanks for
that!

  
  https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility


I have cataloged many of the packages that are identified by
downloading a .json file:

  
  https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-derivation-outputs.json?output_consistency=not-matching&system=x86_64-linux&target=none&field=no-additional-fields&limit_results=10000
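
A rough sketch of that download step, in case it is useful. The URL
contains "&" characters, so it has to be quoted in a shell. The function
names and the default output file name here are made up for
illustration; only the URL and its query parameters come from above.

```shell
# Sketch only: build the query URL used above and download the result.
# Function names and the default output file name are hypothetical.
base='https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision'

not_matching_url() {
    # $1: system (defaults to x86_64-linux), $2: result limit (defaults to 10000)
    system=${1:-x86_64-linux}
    limit=${2:-10000}
    printf '%s/package-derivation-outputs.json?output_consistency=not-matching&system=%s&target=none&field=no-additional-fields&limit_results=%s\n' \
        "$base" "$system" "$limit"
}

fetch_not_matching() {
    # $1: system, $2: output file.
    # --fail: exit non-zero on HTTP errors; --location: follow redirects.
    curl --fail --silent --show-error --location \
        --output "${2:-not-matching.json}" "$(not_matching_url "$1")"
}
```

Running "fetch_not_matching x86_64-linux" should leave the data in
not-matching.json; the layout of that file is best inspected by hand
before scripting against it.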

And then running those packages in a guix challenge for loop...

  for a in "$@" ; do
    diffoscope_out="${a}.diffoscope"
    diffoscope_out_comp="${diffoscope_out}.zst"
    # Skip packages that already have a compressed diffoscope log.
    if [ -s "${diffoscope_out_comp}" ] ; then
        echo "${diffoscope_out_comp} already present, skipping..."
    else
        guix challenge --verbose --diff=diffoscope "${a}" 2>&1 | tee "${diffoscope_out}"
        # Compress non-empty logs; --rm deletes the uncompressed file.
        test -s "${diffoscope_out}" && zstd --rm --threads=0 "${diffoscope_out}"
    fi
  done

A few times I ran into disk space issues, due to:

  guix challenge with diffoscope fails to clean up temporary directory
  https://issues.guix.gnu.org/55809

So I had to manually clean up some files and re-run it a few times, and
probably missed a few packages...


I've looked at each of these diffoscope outputs and tried to quickly
categorize them. Attached a .yaml file (we cannot possibly have enough
different file formats!) that includes a rough identifier for each
issue. It was a rough and quick best-effort pass through, so there may
be some discrepancies...


I've already pushed fixes for a handful of packages, and tried to
remember to mark them as fixed. I've probably left many of the fixed
ones out of this list, but not terribly worried about that.

Some rough summaries about the types of issues:

  * ecl-* packages account for nearly half of the issues (~500 out of
    ~1000 packages)

  * ~850 packages categorized (ecl-* accounting for most of them)

  * 19 packages embed kernel version

  * 63 packages embed timestamps

  * 52 packages embed dates (harder to reproduce than full timestamps)

  * 5 timestamps in python .pyc files

  * 12 timestamps in .jar files

  * 66 ordering issues

  * 3 ordering issues in .pyc files

  * 9 ordering in .jar files

  * 16 ordering in guile .go files

  * ~160 largely unidentified and inscrutable issues

That's unfortunately a lot of "unidentified" issues, but I figured I'd
at least mark the ones I looked at.

This does reveal that there are some opportunities for toolchain fixes,
fixing multiple packages at a time (and future packages too!), such as
ecl, sbcl, python, java, guile, clojure, texlive (see FORCE_SOURCE_DATE
proposal
https://lists.gnu.org/archive/html/guix-devel/2022-06/msg00171.html ).

I haven't done extensive cross-referencing with other distros, but
suspect there may be patches to fix some of these toolchain issues... If
you're savvy with any of the above languages, help fixing toolchain
issues would be amazing!


I'm not sure where to collaborate on this stuff, I've just got a local
git repository and it's a bit rough. I could also push a branch to
guix.git with something like this in it.

There is a rough proposal for using a multi-project "notes" format that
debian uses:

  https://salsa.debian.org/reproducible-builds/reproducible-notes/-/tree/master
  https://salsa.debian.org/reproducible-builds/reproducible-notes/-/blob/multi-project-syntax/ideas_on_sharing_notes_between_distros

... back in 2016, and touched on at later Reproducible Builds summits,
but not really adopted as far as I know. But I know some of the issues
are essentially the same across distros; yet some are surprisingly
different even with the same source code!


If you're looking to get your hands dirty with some reproducibility
fixes in guix, a fair number of the timestamp, date and kernel version
fixes are likely fairly easy, but you generally have to manually verify
that the date or kernel version aren't embedded, as "guix build
--rounds=2" will likely happen with the same kernel version and date.
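
One crude way to do that manual check is to search the build output for
the running kernel version and today's date. This is only a sketch: the
helper name is made up, and it only catches a YYYY-MM-DD date, while
packages embed dates in many other formats.

```shell
# Hypothetical helper: list files under a directory that contain the
# given kernel version or date string (defaults: the running kernel
# and today's UTC date in YYYY-MM-DD form).
scan_embedded() {
    dir=$1
    kernel=${2:-$(uname -r)}
    today=${3:-$(date -u +%Y-%m-%d)}
    # -r: recurse, -l: print matching file names only,
    # -a: treat binary files as text, -F: match fixed strings.
    grep -rlaF -e "$kernel" -e "$today" "$dir"
}
```

For example, scan_embedded "$(guix build hello)" prints any files in the
output that contain either string, and exits non-zero when nothing
matches. An empty result does not prove the package is clean.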


Will be curious to see any new and exciting issues after the staging
merge!


live well,
  vagrant

Attachment: guix-rb-notes.yml
Description: Binary data

Attachment: signature.asc
Description: PGP signature

