From: Frank Heckenbach
Subject: Re: Make run in parallel mode with output redirected to a regular file can randomly drop output lines
Date: Wed, 29 May 2013 09:32:55 +0200

Eli Zaretskii wrote:

> > Date: Mon, 27 May 2013 00:42:34 +0200
> > From: Frank Heckenbach <address@hidden>
> > Cc: address@hidden
> > 
> > One issue, though it might seem strange that I'm the one to mention
> > it, is that it might be POSIX specific. How do other systems behave,
> > can they set O_APPEND via fcntl or otherwise?
> This can be done on Windows by creating a new file descriptor that has
> the O_APPEND bit set, and then using dup2 to force stdout/stderr to refer
> to that file descriptor.  (This is theory; I should try that and see
> if it actually works.)

I don't think this would work, at least on systems I know (mostly
POSIX), since we're talking about altering the flags of the
stdout/stderr given to us. We don't usually have its filename to
open it again; it may not even have a filename (e.g., it might be a
file created and deleted; or it might be a pipe, a socket, etc.), or
it might not be possible to reopen it (maybe we don't have
permissions anymore; or again sockets) ...
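
For illustration, here is a minimal sketch of the fcntl approach on a POSIX
system, written in Python because its os and fcntl modules map directly onto
the underlying calls. The helper names set_append and has_append are
hypothetical, and the unlinked temporary file is only a stand-in for an
inherited stdout that has no usable filename. The fcntl module is not
available on Windows, so this says nothing about that platform.

```python
import fcntl
import os
import tempfile


def set_append(fd):
    """Turn on O_APPEND for an already-open descriptor without reopening it."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_APPEND)


def has_append(fd):
    """Report whether O_APPEND is currently set on the descriptor."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_APPEND)


# Stand-in for a stdout we were handed and cannot reopen by name:
# the file is unlinked while the descriptor stays open.
fd, path = tempfile.mkstemp()
os.unlink(path)

# A duplicate shares the same open file description, much like the
# stdout that a forked child inherits from its parent.
inherited = os.dup(fd)

set_append(fd)
print(has_append(fd), has_append(inherited))

os.close(fd)
os.close(inherited)
```

Since the file status flags belong to the open file description rather than
to the individual descriptor, setting the flag once should also cover every
descriptor that was duplicated or inherited from it.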

If Windows has a function to make a copy of a FD, whatever it is,
with new flags, this plus dup2 would be mostly equivalent to fcntl
for our purposes indeed. (Though I doubt it has one, since from what
I've seen, it generally doesn't seem to treat files, pipes, etc.

Paul Smith wrote:

> POSIX guarantees that if you open a file in O_APPEND mode, the above
> race can never happen because the kernel updates the file offset as the
> file is being written.
> Frank's question is whether other, non-POSIX systems have the same
> behavior with O_APPEND.  Of course if they don't I don't see how it
> would make things worse than they are now.

I don't think it would make things worse, it might just cause
package authors to ignore the issue on their end, so if what we do
works only on POSIX and they test only on POSIX, they might not
notice that there is a possible problem elsewhere.

Of course, this doesn't apply if other systems serialize writes even
without O_APPEND, and the whole discussion is moot for those

Eli Zaretskii wrote:

> > From: Paul Smith <address@hidden>
> >
> > The original issue reported is that if you do something like this:
> > 
> >     make -j >make.out
> > 
> > and your make environment is recursive so you invoke one or more
> > sub-makes, your output may not just be interspersed (that is, output
> > from multiple jobs is mixed together) but you will actually lose
> > some output: it will never appear at all.
> > 
> > The reason is that when you have multiple processes trying to update the
> > same file at the same time using standard output file mode, there is a
> > race condition between when the output is written to the file and when
> > the "current offset" value is updated, where multiple processes could be
> > overwriting the same part of the file.
> It sounds strange to me that the filesystem doesn't serialize the
> writes.  Maybe I'm naive.

I don't know the exact reasons. Perhaps it's just for efficiency, to
avoid synchronization by the OS for a rather special case, i.e.
different processes writing to the *same* file concurrently. If you
look at it this way, it smells like trouble because the question is,
how to merge the various writes. There are basically two answers:
Either the programs care about it themselves (in which case they
must cooperate, so they can also synchronize themselves), or it's
done automatically in the only sane way I can think of, i.e.
appending. Therefore POSIX makes an explicit guarantee for O_APPEND.
That's how I understand it.
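
To make the difference concrete, here is a small Python sketch. It is a
deterministic stand-in, not a reproduction of the race itself: in the make
scenario the processes share one inherited descriptor and the loss depends on
timing, whereas this sketch opens the same file twice so that each descriptor
has its own offset and the overwrite happens every time. The function name
write_via_two_descriptors is made up for this example.

```python
import os
import tempfile


def write_via_two_descriptors(extra_flags):
    """Write through two independent descriptors on one file; return its content."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Two separate opens: each descriptor starts with its own offset of 0.
        a = os.open(path, os.O_WRONLY | os.O_TRUNC | extra_flags)
        b = os.open(path, os.O_WRONLY | extra_flags)
        try:
            os.write(a, b"first\n")
            os.write(b, b"second\n")
        finally:
            os.close(a)
            os.close(b)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)


# Without O_APPEND the second write starts at offset 0 and lands on top of
# the first; with O_APPEND every write goes to the current end of the file.
print(write_via_two_descriptors(0))
print(write_via_two_descriptors(os.O_APPEND))
```

In the first call the earlier line is lost, which is the same kind of damage
as in the original report; in the second call both lines survive.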

In other words, you might be in trouble as soon as you duplicate a
writable FD without O_APPEND set. "make -j" of course may do just
that, if its stdout/stderr is a regular file without O_APPEND. But
it's not particular to make. Any simple program that forks another
one (perhaps just a shell script starting a background job) is in
the same situation if both programs write to stdout/stderr. So if I
understand it correctly:

% cat foo
echo foo &
echo bar
% ./foo > bar

Whoops, undefined behaviour. (Though it seems unlikely for the
problem to actually occur in such a simple case.)

It seems the real culprit in all of these cases is redirecting
stdout/stderr to a log file with ">". Of course, people do this all
the time because they usually only think of the open-time effects of
">" vs. ">>" (i.e., truncation or not) and not about the effects
further down. Unfortunately, the shell has no easy way to open a
file with truncation and appending (which is what one really wants
here), and most people are too lazy (or not aware of the need) to do
the two-step procedure (remove and ">>").
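
A program that sets up the redirection itself can get both effects in a
single open call by combining O_TRUNC with O_APPEND. The following Python
sketch shows the idea under some assumptions: open_fresh_log and
run_jobs_logged are hypothetical names, and the child processes are trivial
stand-ins for real jobs that write to an inherited stdout/stderr.

```python
import os
import subprocess
import sys
import tempfile


def open_fresh_log(path):
    """Open a log file truncated at open time and append-only afterwards."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_APPEND
    return os.open(path, flags, 0o644)


def run_jobs_logged(path, messages):
    """Start one child per message, all writing to the same log descriptor."""
    child_code = "import sys; print(sys.argv[1])"
    fd = open_fresh_log(path)
    try:
        procs = [
            subprocess.Popen([sys.executable, "-c", child_code, msg],
                             stdout=fd, stderr=fd)
            for msg in messages
        ]
        for proc in procs:
            proc.wait()
    finally:
        os.close(fd)
    with open(path) as f:
        # The order of the lines depends on scheduling, so sort them.
        return sorted(f.read().splitlines())


# Usage: old content is discarded, and both children's lines are kept.
log_fd, log_path = tempfile.mkstemp()
os.write(log_fd, b"stale output from an earlier run\n")
os.close(log_fd)
print(run_jobs_logged(log_path, ["foo", "bar"]))
os.unlink(log_path)
```

The children inherit the descriptor together with its O_APPEND flag, so they
need no changes of their own.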

So on an abstract level I still think make has no business messing
with the FD flags, since make is just one example in a large class
of affected programs. In practice, though, it may be a very
important example, and since we're not gonna convince everyone to
use ">>", it may indeed be the pragmatically best thing to set
O_APPEND (unless we discover actual problems with it).

That said, I'm now going back to my own programs which redirect
stdout in forked child processes and add O_APPEND to O_TRUNC ...
