[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Add-rm-rf-as-examples-loadables-rm.c

From: Eric Blake
Subject: Re: [PATCH] Add-rm-rf-as-examples-loadables-rm.c
Date: Mon, 31 Oct 2016 16:44:21 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0

On 10/31/2016 04:14 PM, Chet Ramey wrote:

>> > Nice, thanks for the modifications.
> Here's the modified version.
> Chet
> -- ``The lyf so short, the craft so long to lerne.'' - Chaucer ``Ars
> longa, vita brevis'' - Hippocrates Chet Ramey, UTech, CWRU address@hidden
> http://cnswww.cns.cwru.edu/~chet/

>   if ((dir = opendir(dirname)))
>     {
>       while ((dp = readdir(dir)))

Ouch.  This version has O(n^2) complexity when dealing with deep
hierarchies.  GNU Coreutils intentionally prefers using
openat()/fdopendir() and so on in order to recurse through the tree with
O(n) complexity instead of O(n^2).

>         {
>           if (*dp->d_name == '.' && (dp->d_name[1] == 0 || (dp->d_name[1] == 
> '.' && dp->d_name[2] == 0)))
>             continue;
>          fnsize = dirlen + 1 + strlen(dp->d_name) + 1;
>            fname = xmalloc (fnsize);
>            snprintf(fname, fnsize, "%s/%s", dirname, dp->d_name);

And here is part of the HUGE speed penalty that you get in large
directories.  You are mallocing a LOT of memory, and waste a LOT of work
since the majority of the time (while within a single directory) the
prefix is unchanged from the previous iteration, while in the deep
hierarchy; while doing everything directory-relative completely avoids
the need to construct ever-deeper names.

>            if (rm_file (fname) && force == 0)
>              err = 1;
>            free (fname);
>         }

Another speed sink - your implementation removes names in readdir()
order; GNU Coreutils has proven benchmark numbers that unlink()ing files
that are sorted by inode number is measurably faster than unlink()ing in
readdir() order for directories with a large number of files.  Yes, it
is more complex to set up the tracking structures to sort things, but it
pays off in the long run.

If you are going to make this a builtin, you REALLY ought to consider
having the builtin fall back to the NORMAL rm utility (which has already
taken care of speed issues that you have so naively regressed on) if
heuristics ever detect that you are descending more than a few levels
deep or if you have encountered more than a certain number of files in a
given directory.

>   while ((opt = internal_getopt (list, "rf")) != -1)
>     {
>       switch (opt)
>       {
>       case 'r':
>         recursive = 1;
>         break;

Per POSIX, 'rm -R' and 'rm -r' are identical; a couple tweaks to this
patch will make support for 'rm -R' work out of the box.

>       case 'f':
>         force = 1;
>         break;
>       default:
>         builtin_usage ();
>         return (EX_USAGE);
>       }

You do NOT support the POSIX-mandated 'rm -i'.  I don't necessarily
blame you (since it is complex), but what you should REALLY do is that
any time you don't recognize an option, your builtin should silently
call the real rm utility, rather than failing.  That way, your builtin
can be a faster drop-in when it recognizes the simple cases, but remains
POSIX-compliant (assuming the fallback app is POSIX compliant), as well
as offer ALL extensions that the normal fallback understands, rather
than being a crippled version that causes failures to code expecting
POSIX or particular extension behaviors.

>     }
>   list = loptend;
>   if (list == 0)
>     return (EXECUTION_SUCCESS);

This does not match the behavior of GNU coreutils.  'rm' without
arguments prints a usage error and exits with nonzero status.  But note
that POSIX requires 'rm -f' without arguments to silently exit with
success.  POSIX only specifies the behavior of 'rm -f' without
arguments; with 'rm', you can do what you want (therefore you comply
with POSIX), but it's still nice to match what other implementations do.

While I think that it is neat that this builtin can be used to speed up
configure scripts, I don't think it is ready for prime time yet.  It's
tough to recommend that autoconf be taught to use this sub-standard

Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]