bug-gawk

Re: Can "gawk -i extension" be made safer?


From: Andrew J. Schorr
Subject: Re: Can "gawk -i extension" be made safer?
Date: Mon, 26 Jun 2023 21:40:55 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

On Mon, Jun 26, 2023 at 07:36:50PM +0100, Stephane CHAZELAS wrote:
> 2023-06-26 09:43:07 -0400, Andrew J. Schorr:
> [...]
> > Thanks for raising this issue; it's an interesting question.
> > But I think Arnold is correct that it would be problematic to change
> > gawk's established default behavior.
> 
> Thanks,
> 
> Note that the new -I I was suggesting would not change gawk's
> established behaviour. That would not fix scripts that currently
> use -i or @include, but at least would allow script writers to
> switch to a safer API going forward.

There is actually already an -I option used for tracing:

bash-5.1$ ./gawk --help | fgrep -e -I
        -I                      --trace

> Maybe a @use (a la perl) or @import (a la python) or @require...
> could be the corresponding directive.

Is it really worth adding a new directive when the problem here is actually
that the path is not set appropriately?
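
To illustrate what setting the path appropriately could look like for a
script author, here is a minimal sketch. The names TRUSTED_AWK_DIR and
with_trusted_awkpath are hypothetical, and /usr/share/awk is only an assumed
location for the installed library files; it varies between installations.

```shell
# Assumption: directory holding the trusted awk library files.
# Adjust this to match the local gawk installation.
TRUSTED_AWK_DIR=${TRUSTED_AWK_DIR:-/usr/share/awk}

# Hypothetical helper: run a command with AWKPATH pinned to the trusted
# directory, so that -i and @include do not search relative directories.
with_trusted_awkpath () {
    env AWKPATH="$TRUSTED_AWK_DIR" "$@"
}
```

A script could then call, for example, with_trusted_awkpath gawk -i inplace
followed by its usual program and file arguments.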

> My understanding is that the -Wposix mode is intended for users
> that don't care about gawk extensions and want an awk that
> behaves the standard way. The current behaviour of -f that looks
> for files in $AWKPATH or looks for them with ".awk" added breaks
> POSIX compliance, so changing the behaviour as I suggested in
> POSIX mode would restore compliance and would be unlikely to
> break any script.

I'm going to leave the language lawyering to Arnold. I don't know whether
gawk --posix mode should care much about how to find source code.
My sense is that the focus is on how to interpret the programs themselves,
not on how to find them.

> > Here are a couple of thoughts/questions pertaining to this:
> > 
> > 1. Should we consider patching extras/gawk.{sh,csh} to add
> > gawkpath_sanitize and gawklibpath_sanitize functions that remove any
> > directories from the path that are relative and not absolute?
> 
> I don't think AWKLIBPATH has a problem. Its default value
> doesn't include "." or the empty string AFAICT.

I agree that the default value is fine, but presumably somebody could
change it.

> > You already provided the code for gawkpath_sanitize:
> > 
> > gawkpath_sanitize () {
> >     export AWKPATH="$(LC_ALL=C gawk '
> > BEGIN {
> >     n = split(ENVIRON["AWKPATH"], dirs, ":")
> >     for (i = 1; i <= n; i++)
> >     if (substr(dirs[i], 1, 1) == "/") {
> >             newawkpath = (newawkpath sep dirs[i])
> >             sep = ":"
> >     }
> >     print newawkpath
> > }')"
> > }
> 
> The case where AWKPATH ends up being empty in the end would need
> to be handled specially. Currently, if it's empty, gawk seems to
> consider it the same as if it was unset. That's different from
> $PATH handling where an empty PATH means: look in the current
> working directory, and in the current working directory only.
> 
> Either way, we'd likely need AWKPATH=/dev/null in that case to
> force extensions not to be found, as there's no safe place
> to look for them.

Good point.

> That's what the sanitising code I suggested at
> https://unix.stackexchange.com/questions/749645/how-to-safely-use-gawks-i-option-or-include-directive/749646#749646
> does.

Fine.
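
For concreteness, here is a minimal sketch of a sanitizer with that
/dev/null fallback, written in plain POSIX shell rather than gawk. The name
sanitize_path_value is hypothetical and is not part of extras/gawk.sh; it
takes the path list as an argument instead of reading the environment.

```shell
# Hypothetical helper: print only the absolute entries of a colon-separated
# path list. Relative and empty entries are dropped. If nothing survives,
# print /dev/null so that no directory is searched.
# The body runs in a subshell, so the IFS and globbing changes stay local.
sanitize_path_value () (
    set -f      # no pathname expansion while splitting
    IFS=:
    result=
    sep=
    for dir in $1; do
        case $dir in
            /*) result=$result$sep$dir; sep=: ;;
        esac
    done
    printf '%s\n' "${result:-/dev/null}"
)
```

It could be applied as AWKPATH=$(sanitize_path_value "$AWKPATH"). Note that
an unset AWKPATH then becomes /dev/null rather than gawk's built-in default.
The gawk-based version quoted above avoids that by reading
ENVIRON["AWKPATH"], which, as I understand it, gawk fills in with its default
search path when the variable is not set.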

> In any case, I understand those functions are intended for *users* of gawk,
> to be used in their ~/.profile to customise their terminal login
> sessions.

Agreed.

> The setting of the environment variable affects their own usage
> of gawk and that of all the software written in gawk that they
> may use in that environment (introducing for them the
> non-backward compatible change that Arnold is objecting to).
> 
> That doesn't help *authors* of software written in gawk.
> 
> Also searching in the current working directory is desirable for
> -f or -E (where IMO arguments should only be interpreted as
> paths) and some usages of -i/@include while it is unwelcome for
> usages of -i extension.

I'm confused by that point. You seem to say that it's desirable for
some usages of -i, and then say that it is unwelcome. And why is it desirable
for -E?

> [...]
> > 2. Would a "safegawk" wrapper script that sanitizes the paths prior
> > to invoking gawk be useful? If so, should such a script be part of the
> > distribution or something that users should craft for themselves?
> > 
> > safegawk:
> > 
> > #!/bin/sh
> > 
> > . /etc/profile.d/gawk.sh
> > 
> > gawkpath_sanitize
> > gawklibpath_sanitize
> > 
> > exec gawk "$@"
> [...]
> 
> A caveat is that on some systems, scripts can't be used in
> shebangs.

OK, yes, I didn't realize that the focus here was on shebangs.
The safegawk script could replace gawk when invoked from inside another
script or from the command line, but I agree it's problematic as a shebang
interpreter, which ought to be a binary.

> Not to mention that in:
> 
> #! /usr/bin/safegawk -f/-E
> 
> We do *not* want the argument of -f/-E (which here is filled-in
> by the system from the first (file path) argument to execve())
> to be looked-up in $AWKPATH, only be interpreted as a file path.

That is true. I'm not sure how best to handle the gawk shebang case.
As you suggest, it may be inadvisable for -E to do any searching at all.
And perhaps -E should have a side-effect of disabling relative paths
in AWKPATH and in AWKLIBPATH, on the theory that -E is used only 
in shebang situations.

Regards,
Andy


