help-bash

Re: Speeding up a find and subsequent grep


From: Koichi Murase
Subject: Re: Speeding up a find and subsequent grep
Date: Sat, 19 Dec 2020 23:57:10 +0800

On Sat, 19 Dec 2020 at 21:56, Chris Elvidge <celvidge001@gmail.com> wrote:
> On 19/12/2020 12:04 pm, Jeffrey Walton wrote:
> > I'm working on CentOS 7, x86_64, fully patched. The server has a
> > manual wiki installation. Ownership and permissions need to be set
> > after an update. I've got a script that does it. The slowest part of
> > the script is this:
>
> [...]
> I don't know if it will help but: make an executable script like this
> [...]
> Gets rid of the while read. Works with filenames with spaces.

The bottleneck here is not `while read`.  Although `while read` is one
of the slower Bash builtins, it is still faster than spawning a
process.  Using a separate Bash script actually increases the number
of spawns, so it will slow down the whole job rather than speed it up.
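
To see the difference on your own machine, a rough micro-benchmark
along the following lines may help.  It is only a sketch: the helper
names (gen, loop_builtin, loop_spawn) and the iteration count are made
up for illustration, and the timings will vary from system to system.

```shell
#!/bin/bash
# Rough micro-benchmark: the same `while read` loop, with and without
# one external command per input line.
true_bin=$(type -P true)   # path of the external `true` (assumed installed)

gen() {                    # print $1 lines using builtins only
  local i
  for ((i = 0; i < $1; i++)); do echo "$i"; done
}

loop_builtin() {           # loop body uses only builtins: no fork/exec
  local line count=0
  while IFS= read -r line; do
    : "$line"
    count=$((count + 1))
  done
  echo "$count"
}

loop_spawn() {             # loop body runs one external command per line
  local line count=0
  while IFS= read -r line; do
    "$true_bin" "$line"    # 1 fork, 1 exec per iteration
    count=$((count + 1))
  done
  echo "$count"
}

n=1000                     # arbitrary number of iterations
time loop_builtin < <(gen "$n")
time loop_spawn   < <(gen "$n")
```

Both loops read the same input with `while read`; only the second one
spawns a process per line, so any gap between the two timings is the
cost of the spawns.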

To improve performance, one should first reduce the number of spawns
(i.e., the number of calls to external commands).  For example, if you
can use Bash-specific features in your script, you could write
something like the following (see the Note below):

  #!/bin/bash
  shopt -s lastpipe
  find "$WIKI_DIR" -type f -print0 | # 1 fork, 1 exec
    mapfile -d '' -t filenames
  printf '%s\0' "${filenames[@]}" |  # 1 fork
    xargs -0 -P 1 file -b |          # 1 fork, 1 exec
    mapfile -t filetypes
  executables=()
  normalfiles=()
  for ((i=0;i<${#filenames[@]};i++)); do
    case ${filetypes[i]} in
    (*executable*|*script*) executables+=("${filenames[i]}") ;;
    (*)                     normalfiles+=("${filenames[i]}") ;;
    esac
  done
  ((${#executables[@]})) && echo chmod u=rwx,g=rx,o= "${executables[@]}" # 1 fork
  ((${#normalfiles[@]})) && echo chmod u=rw,g=r,o= "${normalfiles[@]}" # 1 fork

Note: I haven't tested the above code thoroughly.  It also uses
"mapfile -d ''" (split stdin on \0), which requires Bash 4.4.  Since
the Bash version in CentOS 7 seems to be Bash 4.2, the script needs to
be adjusted anyway.  If the filenames can be assumed not to contain
newlines, one can simply split the output of "find" on newlines
instead.
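
As a sketch of that newline-separated adjustment for Bash 4.2, something
like the following might work.  It is untested on CentOS 7 itself; the
function name classify_files is made up for illustration, it assumes
that no filename contains a newline, and it uses process substitution
instead of lastpipe so that "mapfile" runs in the current shell.

```shell
#!/bin/bash
# Bash 4.2 sketch: read names one per line (filenames must not contain
# newlines).  classify_files is a hypothetical helper, not part of the
# original script.  It fills the global arrays below.
classify_files() {
  local dir=$1 i
  filenames=() filetypes=() executables=() normalfiles=()

  mapfile -t filenames < <(find "$dir" -type f)        # 1 fork, 1 exec
  ((${#filenames[@]})) || return 0                     # nothing to classify

  # Names are still passed to xargs NUL-separated, so spaces are safe.
  mapfile -t filetypes < <(
    printf '%s\0' "${filenames[@]}" | xargs -0 file -b --)

  for ((i = 0; i < ${#filenames[@]}; i++)); do
    case ${filetypes[i]} in
    (*executable*|*script*) executables+=("${filenames[i]}") ;;
    (*)                     normalfiles+=("${filenames[i]}") ;;
    esac
  done
}
```

It could then be used like this (remove "echo" to actually apply the
permissions):

  classify_files "$WIKI_DIR"
  ((${#executables[@]})) && echo chmod u=rwx,g=rx,o= "${executables[@]}"
  ((${#normalfiles[@]})) && echo chmod u=rw,g=r,o= "${normalfiles[@]}"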

--
Koichi


