bug-make
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [bug #33138] .PARLLELSYNC enhancement with patch


From: Eli Zaretskii
Subject: Re: [bug #33138] .PARLLELSYNC enhancement with patch
Date: Wed, 24 Apr 2013 22:39:09 +0300

> Date: Wed, 24 Apr 2013 20:46:14 +0200
> Cc: address@hidden, address@hidden
> From: Frank Heckenbach <address@hidden>
> 
> > > I don't see why it would terminate prematurely
> > 
> > It was long ago, but I presume I thought about the ^Z character that
> > some programs write or interpret to signal end of file.  Writing that
> > to the console stops output, AFAIR.
> 
> Oh, that's still done? When I last used Dos in the 1990s it was
> already obsolete and few programs did it. OK, so you'll have to
> strip them. Fortunately, that's not too hard, I assume you have
> already implemented it.

No need: if the file is read and written in binary mode, ^Z is just a
character like any other.

> > I just mentioned some caveats that people might find surprising,
> > that's all.  Perhaps those caveats should be mentioned in the
> > documentation.  No other intentions.
> 
> OK, we can add a few sentences in the docs. Have you already written
> something (just to avoid further dupliation of work)?

No.  Go ahead, if you have time.

> Really, I just tried to be helpful.

I realize, and I am grateful.  The discussion was very helpful.

> > There's one issue that perhaps needs discussing.  A mutex is
> > identified by a handle, which on Windows is actually a pointer to an
> > opaque object (maintained by the kernel).
> 
> And the pointer is the same between different processes?

I arrange for it to be inherited, therefore yes.  This way, I don't
have to invent a unique name and don't risk a slim chance that some
stale process left a mutex by the same name around.  (Still need to
make sure that nameless mutexes will do the job in this case, though.)

> > As such, using just 'int'
> > for sync_handle is not wide enough, certainly not in 64-bit builds.
> > Is it OK to use intptr_t instead?  Doing this cleanly might require to
> > have a macro (I called it FD_FROM_HANDLE) to extract a file descriptor
> > that is passed to a Posix fcntl; on Windows that macro will be a
> > no-op, and on Posix platforms it's a simple cast.
> > 
> > Is this OK?
> 
> I can't follow you here. On POSIX, we don't need to pass a fd
> because it's always stdout/stderr. Or do mean something else?

I meant the file descriptor passed to fcntl to put a lock on it.  The
value of that descriptor is stored in sync_handle, whose type is
'int'.

> > > +/* Test whether a file contains any data. */
> > > +static int
> > > +fd_not_empty (int fd)
> > > +{
> > > +  return fd >= 0 && lseek (fd, 0, SEEK_END) > 0;
> > > +}
> > 
> > Isn't this unnecessarily expensive (with large output volumes)?  Why
> > not use fstat?
> 
> On POSIX both lseek and fstat are not very expensive.

But fstat should be cheaper, as it already tracks the size of the
file, which is what you want here.

> On Windows, you said fstat was very expensive, didn't you? Or is
> lseek even worse?

I think anything that potentially moves the file pointer can be
sometimes expensive and is best avoided.  (On Windows, I'd use
GetFileInformationByHandle.)

> > There are 2 separate things intermixed here: determination of
> > combined_output and determination of the resource to use as a mutex.
> > They are mixed because they both deal with file descriptors, and it
> > was just convenient, by sheer luck (or maybe because of something
> > else) to save a few if's and do them together.
> 
> I talked about this yesterday or so. I still don't see why you can't
> keep this code as it.

Because the Windows implementation doesn't care whether stdout or
stderr (or both) are "OK".  It doesn't use their file descriptor for
locking, so it doesn't care.  It just needs to know, for the purposes
of combined_output, whether they point to the same file/device.  But
for determining what to use for sync_handle, this is irrelevant.  So
the logic looks different.

> You can use sync_handle as a flag to make sure the code is executed
> only once and ignore it for the locking purpose.

I need to store the handle for the mutex somewhere, and sync_handle
sounds just like the right place.

> Nothing is actually read by lseek (and even if it were, it would
> only need to look at the first and last part of the file, not read
> all the content, if that was the worry).

Are you sure?  How can lseek "jump" to the last byte of the file, if
the file is not contiguous on disk, except by reading some of it?

Anyway, we should time this instead of arguing.

Or maybe we should abandon this optimization and take the lock
regardless.  How bad can that be?



reply via email to

[Prev in Thread] Current Thread [Next in Thread]