Re: type errors, command length limits, and Awk
From: Jacob Bachmeyer
Subject: Re: type errors, command length limits, and Awk
Date: Wed, 16 Feb 2022 00:04:40 -0600
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.22) Gecko/20090807 MultiZilla/1.8.3.4e SeaMonkey/1.1.17 Mnenhy/0.7.6.0
Mike Frysinger wrote:
On 15 Feb 2022 21:17, Jacob Bachmeyer wrote:
Mike Frysinger wrote:
context: https://bugs.gnu.org/53340
Looking at the highlighted line in the context:
thanks for getting into the weeds with me
You are welcome.
echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \
It seems that the problem is that am__base_list expects ListOf/File (and
produces ChunkedListOf/File) but am__pep3147_tweak emits ListOf/Glob.
This works in the usual case because the shell implicitly converts Glob
-> ListOf/File and implicitly flattens argument lists, but results in
the overall command line being longer than expected if the globs expand
to more filenames than expected, as described there.
It seems that the proper solution to the problem at hand is to have
am__pep3147_tweak expand globs itself somehow and thus provide
ListOf/File as am__base_list expects.
Do I misunderstand? Is there some other use for xargs?
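The implicit Glob -> ListOf/File conversion described above can be seen in isolation with a minimal shell sketch (the `__pycache__` layout and file names here are hypothetical, chosen only to mirror the pipeline's situation):

```shell
# One glob word becomes several arguments at expansion time.
demo=$(mktemp -d)
mkdir "$demo/__pycache__"
for i in 1 2 3 4 5; do : > "$demo/__pycache__/bar$i.pyc"; done
cd "$demo"
set -- '__pycache__/bar*.pyc'   # quoted: stays a single argument
echo "before expansion: $# argument(s)"
set -- __pycache__/bar*.pyc     # unquoted: the shell expands the glob
echo "after expansion: $# argument(s)"
```

The command line that the later stages see is as long as the expansion, not as long as the glob, which is exactly how the length estimate goes wrong.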
if i did not care about double expansion, this might work. the pipeline
quoted here handles the arguments correctly (other than whitespace splitting
on the initial input, but that's a much bigger task) before passing them to
the rest of the pipeline. so the full context:
echo "$$py_files" | $(am__pep3147_tweak) | $(am__base_list) | \
while read files; do \
$(am__uninstall_files_from_dir) || st=$$?; \
done || exit $$?; \
...
am__uninstall_files_from_dir = { \
test -z "$$files" \
|| { test ! -d "$$dir" && test ! -f "$$dir" && test ! -r "$$dir"; } \
|| { echo " ( cd '$$dir' && rm -f" $$files ")"; \
$(am__cd) "$$dir" && rm -f $$files; }; \
}
leveraging xargs would allow me to maintain a single shell expansion.
the pathological situation being:
bar.py
__pycache__/
bar.pyc
bar*.pyc
bar**.pyc
py_files="bar.py" which turns into "__pycache__/bar*.pyc" by the pipeline,
and then am__uninstall_files_from_dir will expand it when calling `rm -f`.
if the pipeline expanded the glob, it would be:
__pycache__/bar.pyc __pycache__/bar*.pyc __pycache__/bar**.pyc
and then when calling rm, those would expand a 2nd time.
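The double expansion can be reproduced directly; this sketch builds the pathological layout above (literal `*` characters in file names) and counts the arguments after each expansion:

```shell
# Build the pathological __pycache__ layout from the message.
demo=$(mktemp -d)
mkdir "$demo/__pycache__"
cd "$demo"
: > '__pycache__/bar.pyc'
: > '__pycache__/bar*.pyc'
: > '__pycache__/bar**.pyc'
# First expansion, as if the pipeline expanded the glob itself:
set -- __pycache__/bar*.pyc
first=$#
echo "first expansion: $first names"
# Second, unwanted expansion when those names are later used unquoted
# (as in `rm -f $files`): bar*.pyc and bar**.pyc each match again.
set -- $(printf '%s\n' "$@")
second=$#
echo "second expansion: $second names"
```

Three names after the first expansion become seven after the second, because the names that contain `*` match themselves and their siblings again.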
If we know that there will be _exactly_ one additional shell expansion,
why not simply filter the glob results through `sed 's/[?*]/\\&/g'` to
escape potential glob metacharacters before emitting them from
am__pep3147_tweak? (Or is that not portable sed?)
Back to the pseudo-type model I used earlier, the difference between
File and Glob is that Glob contains unescaped glob metacharacters, so
escaping them should solve the problem, no? (Or is there another thorn
nearby?)
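The escaping idea can also be checked in isolation. This uses exactly the sed expression proposed above (a fuller version might escape `[` as well, which also begins a glob construct):

```shell
# Escape glob metacharacters so the one later expansion is a no-op.
name='__pycache__/bar*.pyc'
escaped=$(printf '%s\n' "$name" | sed 's/[?*]/\\&/g')
echo "$escaped"
```

The output is `__pycache__/bar\*.pyc`, which the second expansion turns back into the literal file name rather than a pattern.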
[...]
which at this point i've written `xargs -n40`, but not as fast :p.
Not as fast, yes, but certainly portable! :p
The real question would be if it is faster than simply running rm once
per file. I would guess probably _so_ on MinGW (bash on Windows, where
that logic would use shell builtins but running a new process is
extremely slow) and probably _not_ on an archaic Unix system where
"test" is not a shell builtin so saving the overhead and just running rm
once per file would be faster.
automake jumps through some hoops to try and limit the length of generated
command lines, like deleting output objects in a non-recursive build. it's
not perfect -- it breaks arguments up into 40 at a time (akin to xargs -n40)
and assumes that it won't have 40 paths with long enough names to exceed the
command line length. it also has some logic where it's deleting paths by
globs, but the process to partition the file list into groups of 40 happens
before the glob is expanded, so there are cases where it's 40 globs that can
expand into many many more files and then exceed the command line length.
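The groups-of-40 partitioning Mike describes behaves like `xargs -n`; a small sketch with groups of 3 (so the output stays short) shows the shape of it:

```shell
# Partition a word list into groups of at most 3 (Automake uses 40).
# With no utility named, xargs runs echo, one line per group.
printf '%s\n' a b c d e f g | xargs -n 3
groups=$(printf '%s\n' a b c d e f g | xargs -n 3 | wc -l | tr -d ' ')
echo "groups: $groups"
```

Seven words become three groups; the problem described above is that when a "word" is an unexpanded glob, a group of 40 words can still expand to arbitrarily many file names.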
First, I thought that GNU-ish systems were not supposed to have such
arbitrary limits,
one person's "arbitrary limits" is another person's "too small limit" :).
i'm most familiar with Linux, so i'll focus on that.
[...]
plus, backing up, Automake can't assume Linux. so i think we have to
proceed as if there is a command line limit we need to respect.
So then the answer to my next question is that it is still an issue,
even if the GNU system were to allow arguments up to available memory.
and this issue (the context) originated from Gentoo
GNU/Linux. Is this a more fundamental bug in Gentoo or still an issue
because Automake build scripts are supposed to be portable to foreign
systems that do have those limits?
to be clear, what's failing is an Automake test. it sets the `rm` limit to
an artificially low one. [...]
Gentoo happened to find this error before Automake because Gentoo also found
and fixed a Python 3.5+ problem -- https://bugs.gnu.org/38043. once we fix
that in Automake too, we see this same problem. i'll remove "Gentoo" from
the bug title to avoid further confusion.
The bug still originated on Gentoo, but the fact that the test sets an
artificially low limit is new information to me. In other words,
eliminating the limit is not a solution here: this is specifically
about a feature for working around those limits.
I note that the current version of standards.texi also allows configure
and make rules to use awk(1); could that be useful here instead? (see below)
[...]
i noticed that autoconf uses awk. i haven't dug deeper though to see what
language restrictions are there. GNU awk is obviously out, and POSIX awk
isn't so bad, but do autotools target lower?
In my experience thus far, "lower" than POSIX Awk is almost unusable or
at least very tedious to use. There are some significant convolutions
in that script in DejaGnu to work around the limitations of the
non-POSIX "awk" on Solaris 10 at a point before POSIX awk has been
found. I would recommend directly targeting POSIX Awk, since GNU Awk
has a POSIX mode that inhibits the GNU extensions (`gawk --posix`) to
ease testing, and by the time Automake rules are running, configure
should have already located a POSIX Awk on the system.
If you still want to support very old systems without POSIX Awk at all,
I would consider the simple, slow, but safe approach of running rm once
per file appropriate, unless you simply keep the current version of
am__base_list for that case.
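That slow-but-safe fallback amounts to nothing more than a loop (file names here are hypothetical):

```shell
# One rm invocation per file: slow, but immune to command-line
# length limits and to glob re-expansion (note the quoted "$f").
demo=$(mktemp -d)
cd "$demo"
files='a.pyc b.pyc c.pyc'
for f in $files; do : > "$f"; done
for f in $files; do rm -f "$f"; done
echo "remaining: $(ls | wc -l | tr -d ' ')"
```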
it doesn't quite solve the
problem though as the biggest issue is interacting with the filesystem via
globs and quoting. awk doesn't have a glob(). it has a system() which is
just arbitrary shell code which is what i already have :(.
The main advantage I see awk providing for am__base_list is the
"length()" builtin, so you could both count how many entries have been
placed on the list (using an ordinary awk variable) and keep track of
the overall length of the list itself (using length() on the variable
where you build up the list); something like:
8<------
awk 'BEGIN { maxlen = @maxlen@ ; maxarg = @maxarg@ }
{ for ( i = 1; i <= NF; i++ )
    if ( out != "" && ( length(out) + 1 + length($i) > maxlen ||
                        args >= maxarg ) ) {
      print out ; out = $i ; args = 1
    } else { out = ( out == "" ) ? $i : out " " $i ; args++ }
} END { if ( out != "" ) print out }'
8<------
where @maxlen@ and @maxarg@ would be determined and substituted by
configure. (Feel free to use or adapt that code under GPL3+, by the
way, if it is enough to not simply be inherently public domain.)
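To make the chunking idea concrete, here is a run of an awk program along those lines with maxlen=10 and maxarg=3 substituted by hand in place of the configure-provided @maxlen@/@maxarg@ (a word that would overflow a chunk is held over to start the next one):

```shell
# Chunk a whitespace-separated word list by total length and word count.
chunks=$(printf '%s\n' 'a b c d e' 'ffff gggg' |
  awk 'BEGIN { maxlen = 10 ; maxarg = 3 }
       { for ( i = 1; i <= NF; i++ )
           if ( out != "" && ( length(out) + 1 + length($i) > maxlen ||
                               args >= maxarg ) ) {
             print out ; out = $i ; args = 1
           } else { out = ( out == "" ) ? $i : out " " $i ; args++ }
       } END { if ( out != "" ) print out }')
echo "$chunks"
```

Seven words come out as three chunks, each at most 10 characters and 3 words: `a b c`, `d e ffff`, `gggg`.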
i think if we're at the point where we have to probe the functionality of
tools, i think probing for xargs (or find) is simpler. we can leverage it
if available, otherwise fallback to doing one `rm` per file. i think that
will make it perform well on the vast majority of systems while not breaking
anyone anywhere.
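The probe-and-fallback Mike proposes might look something like this sketch (the probe method and variable names are hypothetical, and `echo` stands in for `rm -f` so the commands are visible rather than executed):

```shell
# Use xargs when a probe found it, otherwise one command per file.
list=$(mktemp)
printf '%s\n' a.pyc b.pyc > "$list"
if command -v xargs >/dev/null 2>&1; then
  result=$(xargs echo rm -f < "$list")     # batched
else
  result=$(while read -r f; do echo rm -f "$f"; done < "$list")  # per-file
fi
echo "$result"
```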
Probing the functionality of tools is why configure exists in the first
place, is it not? :-)
-- Jacob