bug-gnulib

Re: Why is `find -name '*.txt'` much slower than '*.txt' on glusterfs?


From: James Youngman
Subject: Re: Why is `find -name '*.txt'` much slower than '*.txt' on glusterfs?
Date: Sun, 23 Sep 2018 23:04:32 +0100

On Sun, Jan 28, 2018 at 6:57 PM Bernhard Voelker <address@hidden> wrote:
On 01/27/2018 06:45 PM, Peng Yu wrote:
>> glusterfs doesn't provide D_TYPE information:
>>
>> getdents(4, {{d_ino=10054722685526780333, ..., d_type=DT_UNKNOWN} ...
>>
>> Nevertheless, it is strange that find calls newfstatat() also
>> in the case of "-maxdepth 1" - it shouldn't need to.
>
>
> Should this be considered as a performance bug of 'find'?

Well, maybe.

I could reproduce this case with sshfs where getdents also returns DT_UNKNOWN.

   $ mkdir -p ~/tmp/d1 \
       && seq 10000 | xargs env -C ~/tmp/d1 touch

   $ mkdir -p ~/tmp/mnt \
       && sshfs localhost:tmp/d1 ~/tmp/mnt

   $ strace -ve getdents,newfstatat find ~/tmp/mnt -maxdepth 1

   $ strace -ve getdents,newfstatat find -D search ~/tmp/mnt -maxdepth 1 -name doesntmatter

The problem seems to be that gnulib's fts_read() already tries to determine
whether the current item is a directory [1]:

   [...]
   getdents(4, [], 32768)                  = 0
   newfstatat(5, "8793", {st_dev=makedev(0, 46), st_ino=2, st_mode=S_IFREG|0644, ...}, AT_SYMLINK_NOFOLLOW) = 0

before find() sees it [2]:

   consider_visiting (early): ‘/home/berny/tmp/mnt/8793’: fts_info=FTS_F , [...]
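
To put numbers on traces like the ones above, the strace output can be saved with -o and counted. This is only a minimal sketch: summarise_trace is a hypothetical helper name, not an existing tool, and it assumes a log produced without -f, so that lines carry no PID prefix.

```shell
# Hypothetical helper: summarise a log written by
#   strace -v -e trace=getdents,newfstatat -o LOG find ...
# Prints how many newfstatat calls were made and how many directory
# entries were returned with d_type=DT_UNKNOWN.
summarise_trace() {
    log=$1
    # In an "strace -o" log (no -f) each line starts with the syscall name.
    # grep -c still prints 0 when nothing matches, but exits non-zero.
    stats=$(grep -c '^newfstatat(' "$log" || true)
    # With -v every directory entry is printed, so count every occurrence.
    unknown=$(grep -o 'd_type=DT_UNKNOWN' "$log" | wc -l) || true
    echo "newfstatat=$stats dt_unknown=$((unknown))"
}
```

For example, after `strace -v -e trace=getdents,newfstatat -o /tmp/find.log find ~/tmp/mnt -maxdepth 1 >/dev/null`, running `summarise_trace /tmp/find.log` shows whether the number of newfstatat calls tracks the number of DT_UNKNOWN entries. On some systems the directory-reading syscall may be named getdents64 instead, in which case the -e expression needs adjusting.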

@James: do you have an idea how to work around this?

In short, no. I believe the interface between find and fts doesn't allow find to give fts the information that would let fts understand it doesn't need to stat the directories. But it's possible that fts doesn't need to perform the stat anyway; I don't know that code well enough to know why it does.

[1]
https://git.sv.gnu.org/cgit/gnulib.git/tree/lib/fts.c?id=d4f6a210f44a#n1054
[2]
https://git.sv.gnu.org/cgit/findutils.git/tree/find/ftsfind.c?id=040f20b91e#n559



findutils already passes the FTS_NOSTAT flag to fts. However, this part of the fts code (i.e. [1]) only skips the stat if the leaf optimisation tells us there are no subdirectories.

I don't know why, offhand, the fts code works this way; gnulib folks, any suggestions as to why?  Is it because fts needs to figure out whether to set fts_info to FTS_D (and provide stat data) or to FTS_NSOK (and maybe skip it)?

I'd guess that the leaf optimisation is not enabled for glusterfs. That is, I assume that st_nlink is not a predictor of subdirectory count on that file system. However, the strace output provided earlier in the thread doesn't include the st_nlink value obtained for ".", so I can't tell for sure whether the leaf optimisation would work on glusterfs.
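
One way to check that assumption on a given mount would be to compare the st_nlink of a directory with the number of subdirectories it actually contains. The following is only a sketch: check_leaf_hint is a hypothetical helper (not part of findutils or gnulib), it relies on GNU stat and GNU find, and it assumes the traditional Unix convention that a directory's link count is 2 plus its number of subdirectories.

```shell
# Hypothetical helper: does st_nlink predict the subdirectory count of DIR?
# Prints both numbers; returns 0 if nlink - 2 equals the real count,
# 1 if it does not, 2 if DIR could not be examined.
check_leaf_hint() {
    dir=$1
    # %h is the hard link count (st_nlink) in GNU stat.
    nlink=$(stat -c %h -- "$dir") || return 2
    # Print one character per immediate subdirectory and count the characters,
    # so that file names containing newlines are not miscounted.
    subdirs=$(find "$dir" -mindepth 1 -maxdepth 1 -type d -printf x | wc -c)
    subdirs=$((subdirs))
    echo "nlink=$nlink subdirs=$subdirs"
    [ "$((nlink - 2))" -eq "$subdirs" ]
}
```

Running `check_leaf_hint ~/tmp/mnt` (or the glusterfs mount point) on a directory with a known number of subdirectories would show whether the link count is usable there. A mismatch only suggests that st_nlink is not a reliable predictor on that file system; it does not show what fts itself decides.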

James.
