Re: [bug #33138] .PARLLELSYNC enhancement with patch

From: Eli Zaretskii
Subject: Re: [bug #33138] .PARLLELSYNC enhancement with patch
Date: Fri, 26 Apr 2013 11:40:56 +0300

> Date: Thu, 25 Apr 2013 02:16:33 +0200
> Cc: address@hidden, address@hidden
> From: Frank Heckenbach <address@hidden>
> > > On Windows, you said fstat was very expensive, didn't you? Or is
> > > lseek even worse?
> > 
> > I think anything that potentially moves the file pointer can be
> > sometimes expensive and is best avoided.  (On Windows, I'd use
> > GetFileInformationByHandle.)
> OK, if that's so, do that. But I don't think that's true on POSIX.

I don't think it's worth doing on Windows either, see below.

> > > Nothing is actually read by lseek (and even if it were, it would
> > > only need to look at the first and last part of the file, not read
> > > all the content, if that was the worry).
> > 
> > Are you sure?  How can lseek "jump" to the last byte of the file, if
> > the file is not contiguous on disk, except by reading some of it?
> lseek doesn't need to read any data. It just sets the current offset
> of the FD to the given position, so the next read (which in this
> case never happens before seeking to the beginning) knows where to
> read. Even in the case of SEEK_END, all it has to do is add the
> given offset (here: 0) to the current file size.

What I meant is that lseek doesn't just return the byte position, it
also makes sure the next read or write happens at that position.  So
at some point, some piece of software needs to tell the disk to move
its reading head to the right point.  Whether this happens as part of
lseek or the subsequent read/write, and whether this requires reading
some of the data on the disk, is a matter of how this is implemented
and what data structures the filesystem maintains in memory at all
times.
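
For reference, the emptiness check under discussion can be sketched in C
roughly as follows.  This is only an illustration for a POSIX system, not
the code from the patch: the helper names fd_is_empty_lseek and
fd_is_empty_fstat are hypothetical.  The first variant moves the file
offset and then restores it to the beginning; the second asks for the size
without touching the offset.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): returns 1 if FD refers to
   an empty file, 0 if it has content, -1 on error.  Seeks to the end
   to learn the size, then rewinds to the beginning. */
int fd_is_empty_lseek(int fd)
{
    assert(fd >= 0);
    off_t end = lseek(fd, 0, SEEK_END);   /* offset 0 from end == size */
    if (end == (off_t)-1)
        return -1;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return -1;
    return end == 0;
}

/* Hypothetical alternative: same answer via fstat, which does not
   change the file offset at all. */
int fd_is_empty_fstat(int fd)
{
    struct stat st;
    assert(fd >= 0);
    if (fstat(fd, &st) != 0)
        return -1;
    return st.st_size == 0;
}
```

Both helpers give the same answer for a regular file; which one is
cheaper depends on the platform, which is the question in this thread.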

> Instead of testing, I just looked at the implementation (Linux
> 3.2.2). The following is really the whole relevant code. As you see,
> nothing's read from the disk, it only handles in-memory data. (Also
> the file size is in memory for open files; even if it were not, it
> would be a constant-time access to the inode and wouldn't need to
> touch any data blocks.)

I timed lseek on Windows on very large files (hundreds of MBs), and
found that a single lseek takes less than 1 usec, at least with NTFS
volumes and on my Core i7 box.  So, while more efficient ways of
determining whether the file is empty are possible, I don't think such
a small penalty justifies yet another set of #ifdefs.

The only situation where lseek could be really expensive is if the
volume is compressed, because lseek on Windows returns the
uncompressed offsets in that case.  I don't have access to any machine
which has such volumes, so I cannot test that.  (Does Unix support
such filesystems?  If so, what does lseek do there?)
