Re: "wait" bug, now or ever?
Sat, 11 Oct 2008 19:29:05 -0600
Larry Clapp wrote:
> A coworker of mine scatters "wait" around his code a lot. I've seen
> "wait" after "echo" (and no, he's not waiting on some previously
> backgrounded process):
> echo something
> wait
Hmm... I am surprised I am the first person to weigh in on this.
> He asserts "The wait command waits until a program has *completely*
> finished", or words to that effect.
That is specious reasoning at best. The Unix process model has
defined behavior and the shell wait command has defined behavior.
That assertion does not match the defined behavior. Read the
documentation on wait for details. Start here:
  $ help wait
  wait: wait [n]
      Wait for the specified process and report its termination status. If
      N is not given, all currently active child processes are waited for,
      and the return code is zero. N may be a process ID or a job
      specification; if a job spec is given, all processes in the job's
      pipeline are waited for.
That is what it does.
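As a minimal sketch of the help text above: a bare "wait" after a foreground command has no child to wait for and returns zero, while "wait" on a background process reports that process's exit status. The variable names are illustrative and not from the original thread.

```shell
#!/bin/sh
# Sketch: what "wait" does with and without background children.
# status_no_children, child_pid and status_child are illustrative names.

# A foreground command has already exited when the next line runs,
# so a bare "wait" here has nothing to wait for.
echo something
wait
status_no_children=$?

# "wait" only matters for a command started in the background with "&".
( sleep 1; exit 3 ) &
child_pid=$!
wait "$child_pid"
status_child=$?

echo "wait with no children: $status_no_children"
echo "wait on background child: $status_child"
```

The subshell exits with status 3 only so that the value reported by "wait" is easy to tell apart from the zero returned in the first case.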
> Basically he says he's seen where a program does some redirection,
> exits, and the file isn't done yet:
> some_program > some_file
> # some_file isn't finished yet!
> wait
> # Now it's done!
I rarely state things in absolutes and it makes me uncomfortable to do
so but let me say that this is just plain wrong. That isn't what is
happening. There is no basis for it. This might as well be fear of
stepping on a sidewalk crack or skipping the number 13 or knocking on
wood. This reads more as superstition than as any actual causality.
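A small sketch of the point, assuming a local filesystem on a single host: the shell does not run the next line until the foreground command has exited, so the redirected file is already complete without any "wait". The file and variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes a local filesystem and a single host.
# out_file and line_count are illustrative names.
out_file=$(mktemp)

# Foreground command with redirection. The shell continues only after
# awk has exited, at which point its output descriptor is closed.
awk 'BEGIN { for (i = 1; i <= 10000; i++) print i }' > "$out_file"

# No "wait" here: every line is already visible to the next command.
line_count=$(wc -l < "$out_file" | tr -d ' ')
echo "lines written: $line_count"

rm -f "$out_file"
```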
> I have not cornered him on the exact circumstances in which he's seen
> this behavior, but I certainly never have --
I think I can shed some light on this behavior. I am sure that he is
misinterpreting NFS filesystem buffering and its lack of cache
coherency. This type of problem is common in NFS environments. But
it has nothing to do with the above example. It occurs when accessing
files from different hosts using different filesystem buffer caches.
People accidentally trip into this problem very often when using
ssh/rsh or job queue systems or anything that coordinates processes on
different machines that access the same files. This is a very common
problem. I feel confident that this is the root of the superstition
and that workarounds for it are being applied inappropriately elsewhere.
Search the web for NFS cache coherency and specifically close-to-open
cache consistency and you should find much discussion of the problems.
See specifically the Linux NFS FAQ.
> but on the other hand, I've used ksh and zsh my entire career; bash,
> as such, is new to me.
It isn't going to be shell specific. All of the shells will be the
same in this regard, and all operate within the operating system's
process model.
> Here are my thoughts on this behavior, most likely first:
> - I tend to think he had some code at one point that he didn't
> completely understand (or had forgotten some details of) that was
> running stuff in background and he didn't realize it, and he
> experienced this problem, and the "wait" fixed it, so he's
> scattered "waits" around his scripts ever since.
I am sure you are correct here. I am sure that this person used to
work in an NFS environment across multiple machines and ran into NFS
cache coherency issues. I think this guess is very likely correct.
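The first hypothesis in the quoted list, a command running in the background without the author noticing, is the one case where "wait" does change the result. A hedged sketch, with illustrative names and a one-second sleep standing in for real work:

```shell
#!/bin/sh
# Sketch: a writer that was sent to the background by an overlooked "&".
# out_file, size_before and size_after are illustrative names.
out_file=$(mktemp)

# The trailing "&" means the script does not stop here.
( sleep 1; echo finished > "$out_file" ) &

# The script races ahead: the file exists but nothing has been written yet.
size_before=$(wc -c < "$out_file" | tr -d ' ')

# Now "wait" has a child to wait for, and the file is complete afterwards.
wait
size_after=$(wc -c < "$out_file" | tr -d ' ')

echo "before wait: $size_before bytes, after wait: $size_after bytes"
rm -f "$out_file"
```

Here the "wait" is doing real work because of the "&", not because of the redirection.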
I much prefer it when people's superstitions are restricted to
avoiding walking under ladders or wearing a certain same pair of socks
on game days. Those things don't adversely affect the code they leave
behind.
> (I also plan to corner him on the exact circumstances of this bug, but
> wanted to explore this avenue in parallel.)