bug-make



From: Reinier Post
Subject: Re: [bug #46242] Race condition when input file is updated while compiling
Date: Tue, 20 Oct 2015 23:22:18 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue Oct 20 20:36:13 2015, address@hidden (Egmont Koblinger) wrote:

[...]

> > I don't think make should be worried about this potential race; it's an
> > obvious case of user error with an easy discipline for the user to apply
> > to avoid any problems.
> 
> Is it really obvious?  Is it documented somewhere?  Do you really think
> pretty much all users of make think about it?  I, for one, first used make
> (not just typed the command, but understood its basics, wrote a tiny
> Makefile for a tiny project) about 18-20 years ago, yet this race condition
> never occurred to me until recently.

I don't think it's obvious to new users, but it should be obvious to
anyone who has been working with concurrent or failure-prone systems
that it is essential to distinguish between atomic state changes
(which either completely fail or completely succeed and never allow
anything else to see any intermediate state) and any other sort of
change (which may partly fail, which may proceed in intermediate stages
with ill-defined intermediate states, etcetera).  Once you do this
and you read that make simply executes the arbitrary commands that
the user specifies as recipes, it should be obvious that neither
these recipes nor make's way of using them provide any guarantee
of atomic execution.

One way of dealing with this is just accepting the resulting risk
of malfunction.  By default, everything will be executed sequentially
and most recipes have a very slim chance of leaving a result that
make interprets as success when it shouldn't.  If you must exclude
that chance 100%, I think the best approach is to protect your
non-atomic recipes by adding a final step that atomically updates
the target(s), and is the only step to touch the target(s).
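
A minimal sketch of such a final step, in Python rather than in a
Makefile recipe.  The helper name atomic_write and the ".tmp-" prefix
are made up for illustration; the sketch assumes the temporary file
and the target live on the same filesystem, where os.replace is an
atomic rename on POSIX systems.

```python
import os
import tempfile


def atomic_write(target, data):
    """Publish data at target so readers see the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(target))
    # Create the temporary file next to the target so that the final
    # rename never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The only step that touches the target.
        os.replace(tmp_path, target)
    except BaseException:
        # The build step failed: discard the partial result and leave
        # any existing target untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

In a recipe the same idea is usually spelled "write to a temporary
name, then mv it onto the target as the last command".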

> Many years ago I was wondering what "make" would do if the source and
> destination files had the exact timestamp.  I had better things to do than
> actually try it out or look it up, I just assumed that developers had put
> proper thought to it and found the solution that Just Works™.  Nowadays
> with nanosecond timestamps I just don't care anymore.

You should, e.g. if you mount files from remote filesystems.

Relying on timestamps has some problems, but they are not exactly
the same as problems with assuming atomicity of recipe execution.
 
> Had I thought about it, I probably would have just guessed that I'm not the
> first one with this race condition problem, would have assumed that there's
> already a clever solution for this in place which, again, Just Works™.
> Seeing that there isn't any, I'm kindly asking you to think about it and
> come up with one.

Dan Bernstein has written about this and wrote some utilities
to combat the problem:

  http://cr.yp.to/redo/atomic.html
 
> > It's a rough-and-ready tool for developers, with
> > no pretenses of being intended for use by those who don't think about
> > how their tools work.
> 
> Do you really claim I should deeply think about all corner cases of the
> tools before using them?

No, and in 99% of cases of using 'make', you shouldn't have to.
But indeed, as soon as you try to do anything more advanced or
non-standard you'll run into all sorts of limitations.

> Given that "make" is just a tiny little fraction
> of all the tools I'm using, it's practically impossible.  In fact, I think
> it's quite the opposite!  The creators of such tools should do their best
> to make sure that the tools work reliably as expected.

I think 'make' is a perfect example of 'Worse is Better':

  https://www.dreamsongs.com/RiseOfWorseIsBetter.html

it tries to remain simple at the expense of trying to be perfect.

> I didn't think about the internals of make -- why should have I?  I knew
> the goal it was serving: to create an up-to-date build in as little overall
> time as possible.  With my particular development pattern it fails to
> achieve this goal.  I don't think it's me who should've thought of this,
> nor that it's me who should change my workflow (touch the file after make
> completes, or not save while make is running, or start drinking coffee, or
> accept that I can not parallelize my time with the CPU's time, effectively
> taking away long minutes of my life many times for no good reason).

But solving that problem perfectly is very hard (impossible, to be
precise) when build steps can be arbitrary commands, as is the
case for 'make'; it cannot do magic, and it doesn't try to.

I agree it would be nice if its users didn't fall into the trap
of relying on false assumptions of what make can do.
 
> In my firm opinion, it should be make's job to do whatever it can to
> produce an up-to-date build under any circumstance.  Hey, that's what make
> is all about!

True.  But at what expense?
 
> > Of course, if a file-system had a separate "up to date at" file-stamp,
> > that make can set (to the newest dependency's time-stamp) and later
> > read, or if make were to store this same datum in a separate database,
> > it would solve your problem.  It would also make it possible to have
> > make rules that tentatively build the new file, compare to existing,
> > replace if (meaningfully) different but only revise "up to date at"
> > otherwise.
> 
> Such a tiny "database" could indeed solve this problem without race
> condition.  I would be more than happy to see this implemented!  It
> wouldn't have to be a "database", just a simple ephemeral text file that's
> removed on clean exit of make, listing all the currently running rules
> that'd have to be rerun if this file exists on startup (meaning that the
> previous run was interrupted or crashed).

This is just one of many limitations you're going to run into.
For instance, you might run into cases where the same file can
have several stages of uptodateness (e.g. in LaTeX builds) or
where you rely on a file *not* being present, or where your whole
build process structure subtly depends on the setting of certain
environment variables or properties of the system you're on, etc.
Plenty of problems require a lot of subtle Makefile hacking to get
right, when that is possible at all.  This is just one of them.
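
For reference, the ephemeral file described in the quoted paragraph
could be sketched roughly as follows.  This is not something make
does; the file name and both function names are hypothetical, and the
sketch assumes a serial build (a parallel one would need locking
around the journal).

```python
import os

JOURNAL = ".build-in-progress"  # hypothetical file name


def interrupted_targets(journal=JOURNAL):
    """Targets whose recipes were running when the previous run stopped."""
    if not os.path.exists(journal):
        return []
    with open(journal) as f:
        return [line.strip() for line in f if line.strip()]


def run_rule(target, recipe, journal=JOURNAL):
    """Keep target listed in the journal for as long as its recipe runs."""
    with open(journal, "a") as f:
        f.write(target + "\n")
        f.flush()
        os.fsync(f.fileno())
    recipe()  # if this raises or the process dies, the entry stays behind
    remaining = [t for t in interrupted_targets(journal) if t != target]
    if remaining:
        # Rewrite the journal via a rename so it is never half-written.
        tmp = journal + ".tmp"
        with open(tmp, "w") as f:
            f.write("\n".join(remaining) + "\n")
        os.replace(tmp, journal)
    else:
        # Clean exit with nothing pending: the journal disappears.
        os.remove(journal)
```

On startup, anything returned by interrupted_targets() would be
treated as out of date regardless of its timestamp.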

> > As you note, every command that might be used in a make rule would need
> > to do this "right", which is a prohibitive requirement.
> >
> > Requiring every command that could be used as a make rule to abide by
> > this is, again, prohibitive.  [...]
> 
> Some of these ideas I've thrown in might not work in practice, I agree.
> 
> Please note that I had a simple proposal: re-check the timestamps of input
> files when a rule completes and re-run (moving away or dating back the
> previous output file first) if any of them changed.  This would not eliminate
> the race condition completely, but would make the window orders of magnitude smaller
> (e.g. fraction of a millisecond instead of 15 seconds), and as such, would
> already be a great improvement!

It would help, but more can be done.  However, that will take
you away from standard 'make', which is a concern in itself.
See e.g. makepp:

  http://makepp.sourceforge.net/2.0/makepp_build_algorithm.html
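
The re-check proposed in the quoted text can be sketched outside of
make as follows.  This is an illustration only, not how make or makepp
work; the function names and the max_attempts limit are assumptions,
and the sketch leaves out the "move away or date back the previous
output" step.

```python
import os


def snapshot(inputs):
    """Return {path: mtime in nanoseconds} for each input file."""
    return {path: os.stat(path).st_mtime_ns for path in inputs}


def build_with_recheck(inputs, recipe, max_attempts=3):
    """Run recipe(); run it again if an input changed while it ran.

    Returns the number of times the recipe was executed.
    """
    for attempt in range(1, max_attempts + 1):
        before = snapshot(inputs)
        recipe()
        if snapshot(inputs) == before:
            return attempt
    raise RuntimeError("inputs kept changing during the build")
```

As the quoted text says, this narrows the window rather than closing
it: an input can still change between the final snapshot and the
moment the result is used, and an edit that leaves the timestamp
unchanged goes unnoticed.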

> cheers,
> egmont

-- 
Reinier Post
TU Eindhoven


