cons-discuss

Signature configurability: documentation review


From: Steven Knight
Subject: Signature configurability: documentation review
Date: Thu, 29 Mar 2001 15:57:53 -0600 (CST)

Conspirators--

I have the signature configurability feature, based on the mailing-list
exchange last month, working on my in-house copy.  While I'm working
on test cases and cleaning up code prior to checkin, I could use some
review of the documentation.

The initial goal here was to preserve existing behavior as the default
while allowing for a very easy-to-configure equivalent to Wayne
Scott's patch for content-based signatures for derived files.  It got
generalized and now provides a lot of flexibility to accommodate how
different build tools might or might not insert time stamps or other
gunk into derived files.

One last interface question:  As currently coded, this feature will
allow you to specify file-glob patterns to match file path names, but
the patterns can currently match across directory separators.  That
is, "foo/*.c" will match any *.c file anywhere underneath the "foo"
subdirectory: not just "foo/bar.c" but also "foo/subdir/bar.c".  I
can't decide if this is a bug or a feature (although cpio does the same
thing).  If it's a bug, it could be "fixed" by making the patterns Perl
regular expressions, not file globs.  This would at least avoid having
patterns that look like globs but behave slightly differently from what
people are used to in shells.
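
A minimal Perl sketch of the two behaviors in question, not the actual
Cons matching code: `glob_to_regex' and its second argument are names
invented here for illustration.  It translates a glob into an anchored
regular expression either with `*' allowed to cross `/' (the behavior
described above) or with `*' stopping at `/' (what shells do).

```perl
use strict;
use warnings;

# Hypothetical helper: translate a file-glob pattern into an anchored regex.
# With $cross_dirs true, `*' may match `/' (the behavior described above);
# with it false, `*' stops at directory separators, as in a shell.
sub glob_to_regex {
    my ($glob, $cross_dirs) = @_;
    my $star  = $cross_dirs ? '.*' : '[^/]*';
    my $quest = $cross_dirs ? '.'  : '[^/]';
    my $re = join '', map {
        $_ eq '*' ? $star
      : $_ eq '?' ? $quest
      :             quotemeta($_)
    } split //, $glob;
    return qr/\A$re\z/;
}

my $pattern = 'foo/*.c';
for my $path ('foo/bar.c', 'foo/subdir/bar.c') {
    printf "%-18s cross-dir: %-3s shell-like: %s\n", $path,
        ($path =~ glob_to_regex($pattern, 1) ? 'yes' : 'no'),
        ($path =~ glob_to_regex($pattern, 0) ? 'yes' : 'no');
}
```

Under these assumptions "foo/subdir/bar.c" matches only when `*' is
allowed to cross directory separators, while "foo/bar.c" matches either
way.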

I'm anticipating releasing this (and all the other 2.3.0 stuff that's
piled up) Real Soon Now.

New draft 2.3.0 signature documentation is appended below; any and all
feedback is welcome before I check it in.

        --SK



Signatures
    Cons uses file signatures to decide if a derived file is out-of-date and
    needs rebuilding. In essence, if the contents of a file change, or the
    manner in which the file is built changes, the file's signature changes
    as well. This allows Cons to decide with certainty when a file needs
    rebuilding, because Cons can detect, quickly and reliably, whether any
    of its dependency files have been changed.

  MD5 content and build signatures

    Cons uses the Message Digest 5 (MD5) algorithm to compute file
    signatures. The MD5 algorithm computes a strong cryptographic checksum
    for any given input string. Cons can, based on configuration, use two
    different MD5 signatures for a given file:

    The content signature of a file is an MD5 checksum of the file's
    contents. Consequently, when the contents of a file change, its content
    signature changes as well.

    The build signature of a file is an MD5 checksum of the combined
    signatures of all the file's dependencies (that is, all the input files
    used to build the file, plus all dependency files discovered by source
    scanners or specified explicitly via the `Depends' method), plus the
    command-line string used to build the file. The build signature is, in
    effect, a digest of all the dependency information for the specified
    file. Consequently, a file's build signature changes whenever any part
    of its dependency information changes: a new file is added, the contents
    of a dependent file change, the command line used to build the file
    changes, etc.
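
    The two kinds of signature can be sketched in a few lines of Perl
    using the Digest::MD5 module. This is an illustration only:
    `content_signature' and `build_signature' are hypothetical names, and
    joining the parts with newlines is an assumption of the sketch, not
    necessarily how Cons combines them.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Content signature: MD5 checksum of the file's bytes (passed in here as a
# string so the sketch needs no files on disk).
sub content_signature {
    my ($contents) = @_;
    return md5_hex($contents);
}

# Build signature: MD5 checksum over the dependency signatures plus the
# command line.  The newline join is an assumption of this sketch.
sub build_signature {
    my ($command_line, @dependency_sigs) = @_;
    return md5_hex(join "\n", @dependency_sigs, $command_line);
}

my $c_sig = content_signature("int world(void) { return 0; }\n");
my $h_sig = content_signature("int world(void);\n");

my $plain = build_signature('cc -c world.c -o world.o',    $c_sig, $h_sig);
my $again = build_signature('cc -c world.c -o world.o',    $c_sig, $h_sig);
my $debug = build_signature('cc -g -c world.c -o world.o', $c_sig, $h_sig);

print "same inputs, same command: ",
    ($plain eq $again ? "unchanged" : "changed"), "\n";
print "same inputs, new command:  ",
    ($plain eq $debug ? "unchanged" : "changed"), "\n";
```

    Changing either a dependency's signature or the command line yields
    a different digest, which is what lets one value stand in for all of
    the file's dependency information.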

    For example, in the previous section, the build signature of the world.o
    file will include:

        the signature of the world.c file

        the signatures of any header files that Cons detects are included,
        directly or indirectly, by world.c

        the text of the actual command line that was used to generate
        world.o

    Similarly, the build signature of the libworld.a file will include all
    the signatures of its constituents (and hence, transitively, the
    signatures of their constituents), as well as the command line that
    created the file.

    Note that there is no need for a derived file to depend upon any
    particular Construct or Conscript file. If changes to these files affect
    a file, then this will be automatically reflected in its build
    signature, since relevant parts of the command line are included in the
    signature. Unrelated changes will have no effect.

  Storing signatures in .consign files

    Before Cons exits, it stores the calculated signatures for all of the
    files it built or examined in separate .consign files, one per
    directory. Cons uses this stored information on later invocations to
    decide if derived files need to be rebuilt.

    After the previous example was compiled, the .consign file in the
    build/peach/world directory looked like this:

      world.h:985533370 - d181712f2fdc07c1f05d97b16bfad904
      world.o:985533372 2a0f71e0766927c0532977b0d2158981
      world.c:985533370 - c712f77189307907f4189b5a7ab62ff3
      libworld.a:985533374 69e568fc5241d7d25be86d581e1fb6aa

    After the file name and colon, the first number is a timestamp of the
    file's modification time (on UNIX systems, this is typically the number
    of seconds since January 1st, 1970). The second value is the build
    signature of the file (or ``-'' in the case of files with no build
    signature--that is, source files). The third value, if any, is the
    content signature of the file.
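
    As an illustration of that layout, the following Perl sketch splits
    such lines into their fields. `parse_consign_line' is a hypothetical
    helper, not a Cons function, and it assumes that file names contain no
    colon and that fields are separated by whitespace.

```perl
use strict;
use warnings;

# Parse one line of the .consign layout shown above into a hash reference.
# `parse_consign_line' is a hypothetical helper, not a Cons function.
sub parse_consign_line {
    my ($line) = @_;
    chomp $line;
    my ($name, $rest) = split /:/, $line, 2;
    my ($timestamp, $build, $content) = split ' ', $rest;
    my %entry = (
        name      => $name,
        timestamp => $timestamp,
        build     => $build,
        content   => $content,    # undef when no third field is present
    );
    # A `-' in the second field means "no build signature" (a source file).
    $entry{build} = undef if defined $build && $build eq '-';
    return \%entry;
}

my @lines = (
    'world.h:985533370 - d181712f2fdc07c1f05d97b16bfad904',
    'world.o:985533372 2a0f71e0766927c0532977b0d2158981',
);
for my $entry (map { parse_consign_line($_) } @lines) {
    printf "%s: mtime=%s build=%s content=%s\n",
        $entry->{name}, $entry->{timestamp},
        defined $entry->{build}   ? $entry->{build}   : '(none)',
        defined $entry->{content} ? $entry->{content} : '(none)';
}
```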

  Using signatures to decide when to rebuild files

    When Cons is deciding whether to build or rebuild a derived file, it
    first computes the file's build signature. If the file doesn't exist, it
    must obviously be built.

    If, however, the file already exists, Cons next compares the
    modification timestamp of the file against the timestamp value in the
    .consign file. If the timestamps match, Cons compares the newly-computed
    build signature against the build signature in the .consign file. If the
    timestamps do not match or the build signatures do not match, the
    derived file is rebuilt.

    After the file is built or rebuilt, Cons arranges to store the
    newly-computed build signature in the .consign file when it exits.
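
    The decision steps above can be sketched as a small Perl function.
    `needs_rebuild' and its data layout are hypothetical, and the sketch
    treats a missing .consign entry and a timestamp mismatch as reasons to
    rebuild, which is its own reading of the steps rather than a
    statement of what the Cons code does.

```perl
use strict;
use warnings;

# Hypothetical decision helper.  $file describes the derived file on disk
# (undef if it does not exist); $stored is its .consign entry (undef if
# there is none); $new_build_sig is the freshly computed build signature.
sub needs_rebuild {
    my ($file, $stored, $new_build_sig) = @_;
    return 1 unless defined $file;                       # file is missing
    return 1 unless defined $stored;                     # assumed: nothing recorded
    return 1 if $file->{mtime} != $stored->{timestamp};  # assumed: entry is stale
    return 1 if $new_build_sig ne $stored->{build};      # dependency info changed
    return 0;                                            # up-to-date
}

my $stored = { timestamp => 985533372,
               build     => '2a0f71e0766927c0532977b0d2158981' };

my @cases = (
    [ 'missing file',      undef,                  $stored->{build} ],
    [ 'nothing changed',   { mtime => 985533372 }, $stored->{build} ],
    [ 'timestamp differs', { mtime => 985533999 }, $stored->{build} ],
    [ 'signature differs', { mtime => 985533372 },
      'f7128da6c3fe3c377dc22ade70647b39' ],
);
for my $case (@cases) {
    my ($label, $file, $sig) = @$case;
    my $verdict = needs_rebuild($file, $stored, $sig) ? 'rebuild' : 'up-to-date';
    print "$label: $verdict\n";
}
```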

    Note that a file will be rebuilt whenever anything about a dependent
    file changes. In particular, because Cons looks for exact timestamp
    matches in the .consign file, *any* change to the modification time of a
    dependency, forward or backwards in time, will force a rebuild of the
    derived file.

  Signature example

    The use of these signatures is an extremely simple, efficient, and
    effective method of improving--dramatically--the reproducibility of a
    system.

    We'll demonstrate this with a simple example:

      # Simple "Hello, World!" Construct file
      $CFLAGS = '-g' if $ARG{DEBUG} eq 'on';
      $CONS = new cons(CFLAGS => $CFLAGS);
      Program $CONS 'hello', 'hello.c';

    Notice how Cons recompiles at the appropriate times:

      % cons hello
      cc -c hello.c -o hello.o
      cc -o hello hello.o
      % cons hello
      cons: "hello" is up-to-date.
      % cons DEBUG=on hello
      cc -g -c hello.c -o hello.o
      cc -o hello hello.o
      % cons DEBUG=on hello
      cons: "hello" is up-to-date.
      % cons hello
      cc -c hello.c -o hello.o
      cc -o hello hello.o

  Derived-file signature configuration

    Cons allows you to configure how derived file signatures are used to
    calculate further dependencies via the `SIGNATURE' construction
    environment variable. The value of the `SIGNATURE' construction variable
    is a Perl array reference that holds one or more pairs of strings.

    The first string in each pair is a pattern to match against derived file
    path names. The pattern is a file-globbing pattern, not a Perl regular
    expression; the pattern `*.obj' will match all (Win32) object files.
    Patterns will also match files across directories; the pattern
    `foo/*.la' would match all (UNIX) library archives in any subdirectory
    underneath the foo subdirectory.

    The second string in each pair contains one or more of the following
    keywords to specify how signatures should be calculated for derived
    files that match the pattern. The available keywords are:

    build
        Use the build signature of the derived file when calculating
        signatures of dependent files.

    content
        Use the content signature of the derived file when calculating
        signatures of dependent files.

    consign
        Use the derived file's build or content signature as stored in the
        `.consign' file, provided the derived file's timestamp matches the
        cached timestamp value in the `.consign' file.

    The Cons default behavior (as previously described) for using
    derived-file signatures is equivalent to:

      $env = new cons(SIGNATURE => ['*' => 'consign build']);

    The `'*'' will match all derived files. The `build' keyword specifies
    that build signatures are to be used for these files, and the `consign'
    keyword specifies that Cons may use build signatures found in .consign
    files, provided the timestamps match.

    Note that the order of the keywords in the string does not matter.
    Specifying `'build consign'' is equivalent to specifying `'consign
    build''.

    A useful alternative default `SIGNATURE' configuration for many sites
    is:

      $env = new cons(SIGNATURE => ['*' => 'consign content']);

    In this configuration, derived files have their signatures calculated
    from the file contents. This has the useful effect of "stopping" further
    rebuilds if a derived file is rebuilt to exactly the same file contents
    as before.

    For example, changing a comment in a C file and recompiling should
    generate the exact same object file (assuming the compiler doesn't
    insert a timestamp in the object file's header). In that case,
    specifying `'content'' for the signature calculation will cause Cons to
    recognize that the object file did not actually change as a result of
    being rebuilt, and libraries or programs that include the object file
    will not be rebuilt. When `'build'' is specified, however, Cons will
    only "know" that the object file was rebuilt, and proceed to rebuild any
    additional files that include the object file.
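
    The difference can be simulated in Perl without a real compiler. In
    this sketch `fake_compile' is a stand-in that merely strips C
    comments, so a comment-only edit yields identical "object" output; the
    way the build signature is assembled is likewise an assumption made
    for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Stand-in "compiler" for the sketch: strips /* ... */ comments, so a
# comment-only edit produces byte-identical "object" output.
sub fake_compile {
    my ($source) = @_;
    (my $object = $source) =~ s{/\*.*?\*/}{}gs;
    return $object;
}

my $before = "int main(void) { return 0; }\n";
my $after  = "/* say hello later */int main(void) { return 0; }\n";

# Build signatures digest the source signature plus the command line, so
# they differ as soon as the source changes at all.
my $cmd = 'cc -c hello.c -o hello.o';
my ($build_before, $build_after) =
    map { md5_hex(md5_hex($_) . "\n" . $cmd) } $before, $after;

# Content signatures digest the derived file itself.
my ($content_before, $content_after) =
    map { md5_hex(fake_compile($_)) } $before, $after;

print "build signature changed:   ",
    ($build_before ne $build_after ? "yes" : "no"), "\n";
print "content signature changed: ",
    ($content_before ne $content_after ? "yes" : "no"), "\n";
```

    With `'build'' the change propagates to everything that includes the
    object file; with `'content'' the unchanged digest lets the rebuild
    stop there.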

    Note that Cons tries to match derived file path names against the
    patterns in the order they are specified in the `SIGNATURE' array
    reference:

      $env = new cons(SIGNATURE => ['foo/*.o' => 'build',
                                    '*.o' => 'consign content',
                                    '*.a' => 'consign build',
                                    '*' => 'content']);

    In this example, all object files underneath the foo subdirectory will
    use build signatures, all other object files (including object files
    underneath other subdirectories!) will use .consign file content
    signatures, libraries will use .consign file build signatures, and all
    other derived files will use content signatures.
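
    A minimal Perl sketch of this first-match-wins lookup follows.
    `pattern_to_regex' and `signature_keywords' are hypothetical helpers;
    the sketch assumes patterns are anchored at both ends and that `*'
    may match `/', as described earlier.

```perl
use strict;
use warnings;

# Hypothetical glob translation in which `*' also matches `/'; anchoring
# at both ends is an assumption of this sketch.
sub pattern_to_regex {
    my ($glob) = @_;
    my $re = join '', map {
        $_ eq '*' ? '.*' : $_ eq '?' ? '.' : quotemeta($_)
    } split //, $glob;
    return qr/\A$re\z/;
}

# Return the keywords paired with the first pattern that matches $path,
# or an empty list if no pattern matches.
sub signature_keywords {
    my ($path, @pairs) = @_;
    while (my ($pattern, $keywords) = splice @pairs, 0, 2) {
        return $keywords if $path =~ pattern_to_regex($pattern);
    }
    return;
}

my @signature = ('foo/*.o' => 'build',
                 '*.o'     => 'consign content',
                 '*.a'     => 'consign build',
                 '*'       => 'content');

for my $path ('foo/sub/a.o', 'bar/b.o', 'lib/libworld.a', 'hello') {
    printf "%-16s %s\n", $path, signature_keywords($path, @signature);
}
```

    Because the list is searched in order, moving `'*'' to the front
    would make it shadow every later pattern.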

  Source-file signature configuration

    Cons provides a `SourceSignature' method that allows you to specify how
    source file signatures should be calculated. The arguments to the
    `SourceSignature' method are the same pairs of strings (pattern and
    keywords) as previously described for the `SIGNATURE' construction
    variable, except that you don't need to enclose them in square brackets
    to specify an array reference.

    Because source files are not built by Cons, the `build' keyword has no
    effect when used in the `SourceSignature' method.

    The Cons default behavior of always calculating a source file's
    signature from the file's contents is equivalent to specifying:

      SourceSignature '*' => 'content';

    This specifies that Cons always reads the contents of source files to
    generate a signature on each invocation. A useful performance
    optimization is:

      SourceSignature '*' => 'consign content';

    This specifies that Cons will use pre-computed content signatures from
    .consign files, when available, rather than re-calculating a signature
    from the source file's contents each time Cons is run.

    (Note, however, that using the 'consign' keyword for source files opens
    up the very slight possibility of an incorrect build when a source
    file's contents have been changed so quickly after a Cons build that its
    modification timestamp still matches the timestamp in the .consign
    file.)
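
    That window can be illustrated with a small Perl simulation.
    `source_signature' and the hash layout are hypothetical, and the
    sketch assumes one-second timestamp resolution, as in the .consign
    example shown earlier.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical lookup: with `consign', reuse the stored content signature
# when the timestamps match; otherwise (or without `consign') hash the
# contents.
sub source_signature {
    my ($source, $stored, $use_consign) = @_;
    if ($use_consign && $stored && $source->{mtime} == $stored->{timestamp}) {
        return $stored->{content};
    }
    return md5_hex($source->{contents});
}

# Cons runs and records the signature of the original contents ...
my $stored = { timestamp => 985533370, content => md5_hex("int x = 1;\n") };

# ... then the file is edited within the same second, so its modification
# time (at one-second resolution) does not change.
my $edited = { mtime => 985533370, contents => "int x = 2;\n" };

my $cached = source_signature($edited, $stored, 1);
my $fresh  = source_signature($edited, $stored, 0);
print "consign content: ",
    ($cached eq $stored->{content} ? "edit missed" : "edit seen"), "\n";
print "content:         ",
    ($fresh eq $stored->{content} ? "edit missed" : "edit seen"), "\n";
```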

    Cons tries to match source file path names against the patterns in the
    order they are specified in the `SourceSignature' arguments:

      SourceSignature '/usr/repository/objects/*' => 'consign content',
                      '/usr/repository/*' => 'content',
                      '*.y' => 'content',
                      '*' => 'consign content';

    In this example, all source files under the /usr/repository/objects
    directory will use .consign file content signatures, source files
    anywhere else underneath /usr/repository will not use .consign signature
    values, all Yacc source files (`*.y') anywhere else will not use
    .consign signature values, and any other source file will use .consign
    signature values.

  Debugging signature calculation

    Cons provides a `-S' option that can be used to specify what internal
    Perl package Cons should use to calculate signatures. The default Cons
    behavior is equivalent to specifying `-S md5' on the command line.

    The only other package (currently) available is an `md5::debug' package
    that prints out detailed information about the MD5 signature
    calculations performed by Cons:

      % cons -S md5::debug hello
      sig::md5::srcsig(hello.c)
              => |52d891204c62fe93ecb95281e1571938|
      sig::md5::collect(52d891204c62fe93ecb95281e1571938)
              => |fb0660af4002c40461a2f01fbb5ffd03|
      sig::md5::collect(52d891204c62fe93ecb95281e1571938,
                        fb0660af4002c40461a2f01fbb5ffd03,
                        cc   -c %< -o %>)
              => |f7128da6c3fe3c377dc22ade70647b39|
      sig::md5::current(||
                     eq |f7128da6c3fe3c377dc22ade70647b39|)
      cc -c hello.c -o hello.o
      sig::md5::collect()
              => |d41d8cd98f00b204e9800998ecf8427e|
      sig::md5::collect(f7128da6c3fe3c377dc22ade70647b39,
                        d41d8cd98f00b204e9800998ecf8427e,
                        cc  -o %> %<  )
              => |a0bdce7fd09e0350e7efbbdb043a00b0|
      sig::md5::current(||
                     eq |a0bdce7fd09e0350e7efbbdb043a00b0|)
      cc -o hello hello.o
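
    One value in this trace can be checked independently of Cons: the
    result printed for `sig::md5::collect()' with no arguments is the MD5
    digest of zero bytes of input. The Perl sketch below confirms this
    with Digest::MD5. It suggests, though the trace alone does not prove
    it, that collecting no signatures amounts to digesting an empty
    string.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# MD5 of the empty string: the digest of zero bytes of input.
my $empty = md5_hex('');
print "$empty\n";
print "matches the empty collect() line in the trace\n"
    if $empty eq 'd41d8cd98f00b204e9800998ecf8427e';
```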



