RFC: doc for `Handling Tools that Produce Many Outputs' (2nd round)

From:	Alexandre Duret-Lutz
Subject:	RFC: doc for `Handling Tools that Produce Many Outputs' (2nd round)
Date:	Sat, 14 Feb 2004 21:38:19 +0100
User-agent:	Gnus/5.1003 (Gnus v5.10.3) Emacs/21.3.50 (gnu/linux)

Sorry for the delay, I've been busy.  Here is an update of that
section.  I think I've included all remarks so far, most notably
Eric's simpler solution and Tim's touch trick.

I didn't feel the inclination to discuss the
weird-corner-case-with-messed-timestamps because 
  1. there was no agreement about it
  2. I think it would be confusing
  3. I'm lazy

How does that look?

Handling Tools that Produce Many Outputs
========================================

This section describes a `make' idiom that can be used when a tool
produces multiple output files.  It is not specific to Automake and can
be used in ordinary `Makefile's.

   Suppose we have a program called `foo' that will read one file
called `data.foo' and produce two files named `data.c' and `data.h'.
We want to write a `Makefile' rule that captures this one-to-two
dependency.

   The naive rule is incorrect:

     # This is incorrect.
     data.c data.h: data.foo
             foo data.foo

What the above rule really says is that `data.c' and `data.h' each
depend on `data.foo', and can each be built by running `foo data.foo'.
In other words it is equivalent to:

     # We do not want this.
     data.c: data.foo
             foo data.foo
     data.h: data.foo
             foo data.foo

which means that `foo' can be run twice.  Usually it will not be run
twice, because `make' implementations are smart enough to check for the
existence of the second file after the first one has been built; they
will therefore detect that it already exists.  However there are a few
situations where it can run twice anyway:

   * The most worrying case is when running a parallel `make'.  If
     `data.c' and `data.h' are built in parallel, two `foo data.foo'
     commands will run concurrently.  This is harmful: both commands
     write the same output files at the same time.

   * Another case is when the dependency (here `data.foo') is (or
     depends upon) a phony target.
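
   The second case can be sketched as follows.  This is only an
illustration: the `refresh' target is hypothetical and was made up for
this example; any phony target that `data.foo' depends upon would have
the same effect.

```make
# `refresh' is a hypothetical phony target, made up for this sketch.
.PHONY: refresh
refresh:
	@echo "refreshing inputs"

# `data.foo' depends upon a phony target, so `make' considers it
# updated on every run.
data.foo: refresh

# The naive (incorrect) rule discussed above.
data.c data.h: data.foo
	foo data.foo
```

With such a setup, even a sequential `make data.c data.h' can run
`foo data.foo' twice: once `data.c' has been rebuilt, `data.h' still
looks out of date with respect to `data.foo', whether or not it
already exists.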

   A solution that works with parallel `make' but not with phony
dependencies is the following:

     data.c data.h: data.foo
             foo data.foo
     data.h: data.c
  
The above rules are equivalent to

     data.c: data.foo
             foo data.foo
     data.h: data.foo data.c
             foo data.foo

therefore a parallel `make' will have to serialize the builds of
`data.c' and `data.h', and will detect that the second is no longer
needed once the first is over.
   
   Using this pattern is probably enough for most cases.  However it
does not scale easily to more output files (in this scheme all output
files must be totally ordered by the dependency relation), so we will
explore a more complicated solution.
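
   To illustrate the scaling issue, here is a sketch of the same
pattern with four outputs, assuming (hypothetically) that `foo' also
produced `data.w' and `data.x'.  Each output has to be chained to the
previous one by hand:

```make
data.c data.h data.w data.x: data.foo
	foo data.foo
# Chain the outputs: data.c, then data.h, then data.w, then data.x.
data.h: data.c
data.w: data.h
data.x: data.w
```

The chain forces a parallel `make' to consider the outputs one at a
time, but every additional output needs another link, and the links
must be kept in a consistent order.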

   Another idea is to write the following:

     # There is still a problem with this one.
     data.c: data.foo
             foo data.foo
     data.h: data.c

The idea is that `foo data.foo' is run only when `data.c' needs to be
updated, but we further state that `data.h' depends upon `data.c'.
That way, if `data.h' is required and `data.foo' is out of date, the
dependency on `data.c' will trigger the build.

   This is almost perfect, but suppose we have built `data.h' and
`data.c', and then we erase `data.h'.  Then, running `make data.h' will
not rebuild `data.h'.  The above rules just state that `data.c' must be
up-to-date with respect to `data.foo', and this is already the case.
   
   What we need is a rule that forces a rebuild when `data.h' is
missing.  Here it is:
      
     data.c: data.foo
             foo data.foo
     data.h: data.c
             @if test -f $@; then :; else \
               rm -f data.c; \
               $(MAKE) $(AM_MAKEFLAGS) data.c; \
             fi

   The above scales easily to more outputs and more inputs.  One of the
outputs is picked to serve as a witness to the run of the command; it
depends upon all inputs, and all other outputs depend upon it.  For
instance, if `foo' should additionally read `data.bar' and also produce
`data.w' and `data.x', we would write:
     
     data.c: data.foo data.bar
             foo data.foo data.bar
     data.h data.w data.x: data.c
             @if test -f $@; then :; else \
               rm -f data.c; \
               $(MAKE) $(AM_MAKEFLAGS) data.c; \
             fi

   There is still a minor problem with this setup.  `foo' outputs four
files, but we do not know in which order these files are created.
Suppose that `data.h' is created before `data.c'.  Then we have a weird
situation.  The next time `make' is run, `data.h' will appear older
than `data.c', the second rule will be triggered, a shell will be
started to execute the `if...fi' command, but actually it will just
execute the `then' branch, that is: nothing.  In other words, because
the witness we selected is not the first file created by `foo', `make'
will start a shell to do nothing each time it is run.

   A simple riposte is to fix the timestamps when this happens.

     data.c: data.foo data.bar
             foo data.foo data.bar
     data.h data.w data.x: data.c
             @if test -f $@; then \
               touch $@; \
             else \
               rm -f data.c; \
               $(MAKE) $(AM_MAKEFLAGS) data.c; \
             fi

   Another solution, not incompatible with the previous one, is to use a
different and dedicated file as witness, rather than using any of
`foo''s outputs.
     
     data.stamp: data.foo data.bar
             @rm -f data.tmp
             @touch data.tmp
             foo data.foo data.bar
             @mv -f data.tmp $@
     data.c data.h data.w data.x: data.stamp
             @if test -f $@; then \
               touch $@; \
             else \
               rm -f data.stamp; \
               $(MAKE) $(AM_MAKEFLAGS) data.stamp; \
             fi
   
   `data.tmp' is created before `foo' is run, so its timestamp is
older than those of the files output by `foo'.  It is then renamed to
`data.stamp' after `foo' has run, because we do not want to update
`data.stamp' if `foo' fails.

   Using a dedicated witness like this is very handy when the list of
output files is not known beforehand.  As an illustration, consider the
following rules to compile many `*.el' files into `*.elc' files in a
single command.  It does not matter how `ELFILES' is defined (as long
as it is not empty: empty targets are not accepted by POSIX).

     ELFILES = one.el two.el three.el ...
     ELCFILES = $(ELFILES:=c)
     
     elc-stamp: $(ELFILES)
             @rm -f elc-temp
             @touch elc-temp
             $(elisp_comp) $(ELFILES)
             @mv -f elc-temp $@
     
     $(ELCFILES): elc-stamp
             @if test -f $@; then \
               touch $@; \
             else \
               rm -f elc-stamp; \
               $(MAKE) $(AM_MAKEFLAGS) elc-stamp; \
             fi

-- 
Alexandre Duret-Lutz
