bug-gnulib

Re: RFC: git-commit based mtime-reproducible tarballs


From: Paul Eggert
Subject: Re: RFC: git-commit based mtime-reproducible tarballs
Date: Sun, 15 Jan 2023 08:03:35 -0800

On 2023-01-15 05:21, Bruno Haible wrote:
> Reproducibility is about verifying that an artifact A was generated
> from a source S.

Quite true. However, there's something else going on: when I run 'ls -l' on a source directory that I got from a distribution tarball, it's useful to see when the contents of each source file were last changed upstream. When sources are in a Git repository, I've found the commit timestamp to be a good representation of that.

For TZDB, where users have long wanted reproducibility, I use something like this in a Makefile recipe for each source file $$file:

              time=`git log -1 --format='tformat:%ct' $$file` && \
              touch -cmd @$$time $$file
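
As a rough illustration of the recipe above outside of a Makefile, the following shell sketch applies the same two commands to every tracked file. The function names set_mtime, commit_time and stamp_tracked_files are made up for this example and are not TZDB's. It assumes GNU touch (for -d @SECONDS) and ordinary file names, with no newlines or characters that git would quote in its output.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not taken from TZDB.

# set_mtime EPOCH FILE: set FILE's mtime to EPOCH seconds since 1970.
# -c: do not create FILE; -m: change only mtime; -d @N: GNU touch syntax.
set_mtime() {
  touch -cmd "@$1" "$2"
}

# commit_time FILE: print the committer time (epoch seconds) of the most
# recent commit that touched FILE; prints nothing if there is no such commit.
commit_time() {
  git log -1 --format='tformat:%ct' -- "$1"
}

# stamp_tracked_files: give each tracked file the mtime of its last commit.
# Files with no commit time are left alone.
stamp_tracked_files() {
  git ls-files | while IFS= read -r file; do
    time=$(commit_time "$file")
    if [ -n "$time" ]; then
      set_mtime "$time" "$file"
    fi
  done
}
```

Calling stamp_tracked_files from the top of a work tree would then stamp every tracked file; because %ct is a whole number of seconds, the resulting mtimes have no fractional part.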

Here are three problems I ran into with this approach, and the solutions that TZDB uses:

1. As you mentioned, what if you're building a release from sources that have not yet been committed? In this case TZDB's Makefile recipe warns but goes ahead with the timestamp that the working file already has.

2. What about platform-independent files that are automatically created from source files from the repository, and that are shipped in the release tarball? In this case, the TZDB Makefile arranges for each such file to have a timestamp one second later than the maximum of timestamps of files that the file depends on. This step is the biggest hassle, since it means I need to repeat in the Makefile the logic that 'make' already uses when calculating dependencies.

3. What about tarball metadata other than last-modified time? Here, TZDB uses the following GNU Tar options:

  GNUTARFLAGS= --format=pax --pax-option='delete=atime,delete=ctime' \
  --numeric-owner --owner=0 --group=0 \
  --mode=go+u,go-w --sort=name

The need for most of this should be obvious, if one wants the tarball to be reproducible. However, some details are less obvious. GNUTARFLAGS specifies pax format because the default GNU Tar format becomes unportable after 2242-03-16 12:56:31 UTC, the largest timestamp that fits in ustar's 33-bit mtime field. GNUTARFLAGS also uses delete=atime,delete=ctime so that atime and ctime do not leak into the tarball and make it less reproducible. Since mtime values are always a multiple of 1 second (given steps 1 and 2), the resulting tarball will be ustar-compatible until 2242, giving users *plenty* of time to prepare for pax format timestamps.
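
Step 2 above can be sketched as a small shell function. This is only an illustration under assumptions, not TZDB's actual Makefile logic: the names newest_mtime and stamp_generated are hypothetical, the dependency list is passed in by hand, and 'stat -c %Y' and 'touch -d @SECONDS' are GNU coreutils syntax.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not taken from TZDB.

# newest_mtime FILE...: print the largest mtime (whole epoch seconds)
# found among the given files.
newest_mtime() {
  newest=0
  for dep in "$@"; do
    t=$(stat -c %Y "$dep") || return 1
    if [ "$t" -gt "$newest" ]; then
      newest=$t
    fi
  done
  echo "$newest"
}

# stamp_generated TARGET DEP...: set TARGET's mtime to one second after
# the newest mtime among its dependencies.
stamp_generated() {
  target=$1
  shift
  newest=$(newest_mtime "$@") || return 1
  touch -cmd "@$((newest + 1))" "$target"
}
```

For example, 'stamp_generated out.txt in1 in2' (placeholder names) leaves out.txt one second newer than whichever of in1 and in2 changed last, so the result is still a whole number of seconds.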

There is an argument that we need not have a fancy GNUTARFLAGS like this, because I'm signing the tarballs and users have to trust me anyway. Still, some users want to "trust but verify" and a reproducible tarball is easier to audit than a non-reproducible one, so for these users it can be a win to omit the irrelevant data from the tarball.
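
For anyone who wants to try the "trust but verify" step, here is a minimal sketch that builds an archive twice with the flags quoted above and checks that the two results are byte-for-byte identical. It assumes a GNU Tar recent enough to support --sort=name and input files whose mtimes are already whole seconds. The function names make_tarball and is_reproducible are hypothetical, and this is not TZDB's release procedure.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not taken from TZDB.

# make_tarball OUTPUT FILE...: archive the FILEs with the flags from the
# message, so that only names, contents, modes and mtimes are recorded.
make_tarball() {
  out=$1
  shift
  tar --format=pax --pax-option='delete=atime,delete=ctime' \
      --numeric-owner --owner=0 --group=0 \
      --mode=go+u,go-w --sort=name \
      -cf "$out" "$@"
}

# is_reproducible FILE...: build the archive twice and succeed only if
# the two archives are identical.
is_reproducible() {
  first=$(mktemp) && second=$(mktemp) || return 1
  make_tarball "$first" "$@" && make_tarball "$second" "$@" &&
    cmp -s "$first" "$second"
  status=$?
  rm -f "$first" "$second"
  return "$status"
}
```

Note that --sort=name orders the entries tar finds while walking directories; files named on the command line are archived in the order given, so an auditor needs to pass them in the same order as the release did.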


