Re: [bug #33138] .PARLLELSYNC enhancement with patch

From: Frank Heckenbach
Subject: Re: [bug #33138] .PARLLELSYNC enhancement with patch
Date: Sat, 27 Apr 2013 03:13:01 +0200

Eli Zaretskii wrote:

> You underestimate me ;-)
> What I have is actually this:
> [...]
> and I wrote 'fcntl' emulation for Windows that uses a mutex.

Indeed, I had not expected this. :-)

> That said, I'm not wedded to the above approach, and if people like a
> completely disjoint code for Windows, that would be a trivial change.
> I just wanted to comply with what Paul said, viz.:
> > > Also, where is the best place to put the emulated Posix functions?
> > > Some new file in w32/compat/? 
> > 
> > I'd like to see it there.  I'm thinking I want to move the new stuff out
> > of job.c even for POSIX systems.  The ifdefs are really getting to me.
> So I now have w32/compat/posixfcn.c with the emulation of fcntl (also
> used for CLOSE_ON_EXEC), and a few support functions it needs.

I see. If Paul agrees, I'm fine with it.

> > > > Nothing is actually read by lseek (and even if it were, it would
> > > > only need to look at the first and last part of the file, not read
> > > > all the content, if that was the worry).
> > > 
> > > Are you sure?  How can lseek "jump" to the last byte of the file, if
> > > the file is not contiguous on disk, except by reading some of it?
> > 
> > lseek doesn't need to read any data. It just sets the current offset
> > of the FD to the given position, so the next read (which in this
> > case never happens before seeking to the beginning) knows where to
> > read. Even in the case of SEEK_END, all it has to do is add the
> > given offset (here: 0) to the current file size.
> What I meant is that lseek doesn't just return the byte position, it
> also makes sure the next read or write happens at that position.  So
> at some point, some piece of software needs to tell the disk to move
> its reading head to the right point.  Whether this happens as part of
> lseek or the subsequent read/write, and whether this requires reading
> some of the data on the disk, is a matter of how this is implemented
> and what data structures does the filesystem maintain in memory at all
> times.

Sure, in theory it's up to the filesystem, but I'd be surprised if
any fs actually worked this way. There are several layers between an
lseek and the actual moving of disk heads (virtual fs, concrete fs,
block layer, caching, drive hardware etc.). In particular, if you
consider that other processes can read different files
simultaneously, it would be premature to position disk heads after
an lseek already. (If this is even possible; normally the OS sends a
request to read a block to the disk, and the disk controller does
the rest.) And as I said, "lseek (fd, 0, SEEK_END)" is a rather
well-known idiom, so I don't think any fs implementor would consider
it an optimization to do anything like this. In short, all that's
needed for lseek is the current offset and the file size. Both
values really should be in memory for any open file.
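
To make the idiom concrete, here is a minimal sketch of the kind of
emptiness check under discussion, assuming a POSIX system. The helper
name fd_has_content is made up for illustration and is not taken from
the patch; it only shows that the check needs nothing beyond two lseek
calls and never reads file data.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not from the patch: returns 1 if the file behind
   fd has content, 0 if it is empty, -1 on error.  On success the file
   offset is left at the beginning so a later read starts there. */
static int fd_has_content(int fd)
{
    off_t end;

    assert(fd >= 0);

    /* SEEK_END with offset 0 yields the current file size. */
    end = lseek(fd, 0, SEEK_END);
    if (end == (off_t) -1)
        return -1;

    /* Rewind; no data has been read at any point. */
    if (lseek(fd, 0, SEEK_SET) == (off_t) -1)
        return -1;

    return end > 0;
}
```

Called on the descriptor of a fresh tmpfile() this should report 0, and
1 once something has been written and flushed to it.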

> > Instead of testing, I just looked at the implementation (Linux
> > 3.2.2). The following is really the whole relevant code. As you see,
> > nothing's read from the disk, it only handles in-memory data. (Also
> > the file size is in memory for open files; even if it were not, it
> > would be a constant-time access to the inode and wouldn't need to
> > touch any data blocks.)
> I timed lseek on Windows on very large files (hundreds of MBs), and
> found that a single lseek takes less than 1 usec, at least with NTFS
> volumes and in my core i7 box.  So, while more efficient ways of
> revealing whether the file is empty are possible, I don't think such a
> small penalty justifies yet another set of ifdef's.

As I expected.

> The only situation where lseek could be really expensive is if the
> volume is compressed, because lseek on Windows returns the
> uncompressed offsets in that case.  I don't have access to any machine
> which has such volumes, so I cannot test that.  (Does Unix support
> such filesystems?  If so, what does lseek do there?)

At least I don't have access to one right now. But I still wouldn't
expect it to take long. They surely store the uncompressed size
somewhere, and as I said, that's all that's needed. The compressed
position only needs to be computed when actually reading from the new
position. Furthermore, we do this check only on our temp
files. Though the user could put them (i.e., the directory where
tmpfile() will create them, usually /tmp on Unix) on a compressed
fs, that's not exactly the wisest thing to do, and we might not
really have to optimize for this case. :-)
