Re: Filename completion broken on single quote


From: lolilolicon
Subject: Re: Filename completion broken on single quote
Date: Sun, 16 Oct 2011 22:52:10 +0800

On Sun, Oct 16, 2011 at 9:52 AM, Chet Ramey <chet.ramey@case.edu> wrote:
> On 10/15/11 10:34 AM, lolilolicon wrote:
>>
>> OK, some more strange test results.
>>
>> In the interactive bash shell, I did this (in an empty directory):
>>
>>   $ mkdir 1\'1          $ mkdir 2@2
>>   $ touch 1\'1/one      $ touch 2@2/two
>>   $ compgen -f "1'1"    $ compgen -f "2@2"
>>   1'1                   2@2
>>   $ compgen -f "1\'1"   $ compgen -f "2\@2"
>>   $ compgen -f "1'1/"   $ compgen -f "2@2/"
>>                         2@2/two
>>   $ compgen -f "1\'1/"  $ compgen -f "2\@2/"
>>   1\'1/one              2\@2/two
>>
>> (Note: Put side by side for comparison.)
>>
>> A bit of inconsistency here. Then I wrote a bash script to do the same thing,
>>
>>   #!/bin/bash
>>
>>   shopt -s progcomp
>>
>>   mkdir -p 1\'1 2@2
>>   touch 1\'1/one 2@2/two
>>
>>   for i in \
>>     "1'1" "1\'1" "1'1/" "1\'1/" \
>>     "2@2" "2\@2" "2@2/" "2\@2/"; do
>>     printf -- '%5s => %s\n' "$i" "$(compgen -f "$i")"
>>   done
>>   # (Same results using `compgen -o default')
>>
>> yet with different results:
>>
>>     1'1 => 1'1
>>    1\'1 =>
>>    1'1/ => 1'1/one
>>   1\'1/ =>
>>     2@2 => 2@2
>>    2\@2 =>
>>    2@2/ => 2@2/two
>>   2\@2/ =>
>>
>> What's going on with all this inconsistency?
>
> Maybe that a single quote is a quote character, but an at sign has no
> special meaning.  compgen expects its arguments to come in quoted as
> they would be when entered from the command line, when using readline
> to try and complete them.  "1'1" gives compgen an unquoted single quote,
> since the double quotes are removed before compgen sees it.
>
> The real problem is that readline runs the filename dequoting function
> based on whether or not it found a quote character, and compgen (and the
> underlying programmable completion functions) has to either check, guess,
> or dequote unconditionally when run from the command line.
>

What is a "quote character"? Single quote, double quote, and backslash?
I don't know, but it sounds like compgen should be more stupid, as in KISS.
I'd like compgen to behave consistently no matter how it's called.  The way
it is now is a bit crazy.  I will explain later.

> Three-plus years ago, when this came up, I wrote:
>
> ==========
> For historical reasons, complete/compgen dequote the filename they're
> passed, removing backslash escapes and interpreting embedded quoted
> substrings.  (One of the things bash should do when it does that is to
> be better about obeying the shell rules about backslash-escaped characters
> and double quotes, but that doesn't matter for this example -- though that
> function has several problems.)
>
> When it's called as the result of readline dispatching on a particular
> character, this is appropriate -- the shell hasn't done any expansion
> or quote removal, so the filenames still have embedded quoting.  When
> called from the command line, as in these examples, it's not appropriate,
> since the shell has already expanded the argument and stripped the
> quotes.
> ==========
>
> Some of the details have changed since then, but the essentials are
> still the same.
>

"since the shell has already expanded the argument and stripped the quotes"
Were you referring to these[1]:

    $ touch "1'1 1" "1'2 2"
    $ compgen -f 1\'    # 1' is passed to compgen, obviously
    1'2 2
    1'1 1
    $ compgen -f 1\'1   # 1'1 is passed to compgen
    $ echo $?
    1

(Note: comments added by me.)

because if you were, my examples are not subject to this mistake.
But if you were not, there must be another "layer" where the shell comes
into play that I'm not aware of.  Please elaborate if this is the case :)
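
To make the "already expanded" point concrete, here is a minimal sketch of
what a command actually receives after the shell's own quote removal.  The
helper name show_args is made up for illustration; it uses nothing beyond
printf.  Whatever compgen does with its argument happens after this step.

```bash
#!/bin/bash
# show_args: hypothetical helper, not part of bash.  It prints each
# argument exactly as a command receives it, i.e. after the shell has
# performed quote removal on the command line.
show_args() {
    local arg
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Inside double quotes a backslash stays literal unless it precedes
# one of: $ ` " \ or a newline.
show_args "1'1" "1\'1" 1\'1 "2\@2"
# prints:
#   [1'1]
#   [1\'1]
#   [1'1]
#   [2\@2]
```

So "1\'1" hands compgen a backslash followed by a single quote, while an
unquoted 1\'1 hands it a bare single quote.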

[1] https://bugs.launchpad.net/ubuntu/+source/bash-completion/+bug/123665/comments/1

> There are three cases to consider: "straight" programmable completion
> (complete -f), `compgen' run from a programmable completion function
> (foo=( $( compgen -f "$word") ) ), and `compgen -f' run from the command
> line.  They're not all the same, but the programmable completion code's
> filename completion function has to somehow accommodate them (and,
> frankly, the running-from-the-command-line case is the least important).
>

Since I do not really understand why they have to be handled differently,
I'm doing my part by producing more observational results.

Firstly, the three cases as you point out all behave differently.

Inside an empty directory:

    $ touch 1\'2 125 13 24

    #-- "straight" programmable completion
    $ complete -f foo
    $ foo 1'<TAB><TAB>  # (no completion on first <TAB>)
    1'2
    125
    13
    24
    $ foo 1\'<TAB>    # foo 1
    $ foo 1\\\'<TAB>  # foo 1\'2

    #-- `compgen -f' run from a programmable completion function
    $ complete -F bar bar
    $ type bar
    bar is a function
    bar ()
    {
        printf '\n[%s]\n' "${COMP_WORDS[COMP_CWORD]}";
        compgen -f "${COMP_WORDS[COMP_CWORD]}"
    }
    $ bar 1'<TAB>
    [1']
    1'2
    125
    13
    $ bar 1\'<TAB>
    [1\']
    1'2
    125
    13
    $ bar 1\\\'<TAB>
    [1\\\']
    1'2

    #-- `compgen -f' run from command line
    $ compgen -f "1'"
    1'2
    $ compgen -f "1\'"
    $ compgen -f "1\\\\\'"  # (1\\\' is passed to compgen)

I hope the above will help clarify what bash currently does, and how it's
a bit crazy.  I hope to discuss the correct behavior in each case later.
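
For anyone who wants to repeat the second case with a more conventional
setup, here is a sketch of a completion function that stores compgen's
output in COMPREPLY instead of printing it.  The name _bar_files is my own
invention, and I have not checked whether going through COMPREPLY changes
any of the results above; it only shows the usual wiring.

```bash
#!/bin/bash
# _bar_files: hypothetical completion function.  It hands the current
# word to compgen unchanged and stores the matches in COMPREPLY.
_bar_files() {
    local cur=${COMP_WORDS[COMP_CWORD]}
    local match
    COMPREPLY=()
    # Read one match per line so filenames with spaces stay intact.
    while IFS= read -r match; do
        COMPREPLY+=("$match")
    done < <(compgen -f -- "$cur")
}

# Register it for a command named "bar"; the command itself does not
# need to exist for completion to be attempted.
complete -F _bar_files bar
```

How "$cur" is dequoted inside compgen is exactly the open question of this
thread, so the matches for words containing quotes or backslashes may differ
between bash versions.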

Secondly, I did some more experiments...

I'm having a hard time understanding how exactly readline/bash splits the
command line into words, and how it completes the current word.

The bash(1) man page states:

    COMP_WORDBREAKS
        The set of characters that the readline library treats as word
        separators when performing word completion.  If COMP_WORDBREAKS
        is unset, it loses its special properties, even if it is
        subsequently reset.
    COMP_WORDS
        An array variable consisting of the individual words in the
        current command line.  The line is split into words as readline
        would split it, using COMP_WORDBREAKS as described above.

But as far as I can tell, the man page is not telling the whole story.
Different characters in COMP_WORDBREAKS can have different effects.
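
To look at the set one character at a time, something like the following
sketch can help.  The function name show_chars is made up; it relies only on
bash's printf %q, which prints each character in a form that can be reused
as shell input.

```bash
#!/bin/bash
# show_chars: hypothetical helper.  It prints every character of its
# argument on a separate line, in shell-quoted form (printf %q).
show_chars() {
    local s=$1 i
    for ((i = 0; i < ${#s}; i++)); do
        printf '%q\n' "${s:i:1}"
    done
}

# List the current word-break characters.  Nothing is printed if the
# variable is unset or empty.
show_chars "${COMP_WORDBREAKS-}"
```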

The default set of characters in COMP_WORDBREAKS is:

    $ printf %s "$COMP_WORDBREAKS" | cat -A
     ^I$
    "'@><=;|&(:

(Note: continuing from the previous experiment.)

Whitespace is stripped:

    $ bar <TAB>
    []
    1'2
    24
    125
    13

Single and double quotes do not split words:

    $ bar 1'2<TAB>
    [1'2]
    125

and it's unclear how quotes are interpreted,

    $ bar 12"<TAB>
    [12"]
    125
    $ bar 1'2''<TAB>
    [1'2'']
    125
    $ bar 1'2'"<TAB>
    [1'2'"]

The characters `@><=:' split words:

    $ bar 1@<TAB>
    [@]
    $ bar 1@2<TAB>
    [2]
    24

Any of these characters behaves the same way as above, but there is a
difference between `@' and `><=:' with `complete -f':

    $ complete -f foo
    $ foo @<TAB><TAB>  # (nothing)
    $ foo =<TAB><TAB>  # (lists filenames in directory)
    $ touch @6 =7
    $ foo @<TAB>       # foo \@6
    $ foo =<TAB><TAB>  # (lists filenames in directory)

Presumably, `@' is treated specially because it *is* (or _was_) special:

    $ complete -r foo
    $ foo @<TAB><TAB>  # (lists hostnames)
    $ foo =<TAB><TAB>  # (lists filenames in directory)

After `complete -f', `@' loses its special meaning (hostname completion),
but is somehow still not treated the same as `><='.  Is this a bug?

Back to wordbreaks, any of the remaining characters `;|&(' marks the start
of a new command, so bash attempts command completion (or not, depending
on `no_empty_cmd_completion' shopt).  This is understandable, but the
man page seems to oversimplify the word splitting procedure.

Whitespace and `@><=:' can be escaped to lose their role as wordbreaks:

    $ touch 8=9
    $ bar 8=<TAB>
    [=]
    =7
    $ bar 8\=<TAB>
    [8\=]
    8=9

alternatively we can quote it with single or double quotes, but as stated
before, it's unclear how the quotes are interpreted:

    $ bar 8'=<TAB>
    [8'=]
    8=9
    $ bar 8''=<TAB>   # (unsurprisingly)
    [=]
    =7
    $ bar 8'''=<TAB>  # (unsurprisingly, considering the previous two)
    [8'''=]
    8=9
    $ bar 8''"=<TAB>  # (surprise!)
    [8''"=]

I guess the "surprise!" part is where bash should "be better" as you said
three-plus years ago.

> I will work out some improved heuristics and implement them for the next
> version, and things will get better for both straight `complete -f' and
> when running compgen from the command line.
>

Happy to hear that.  I wish I could help, but I'm incapable right now.
I'm willing to test.

>> Also, dumb question, but is `compgen' supposed to accept quoted arguments or
>> dequoted ones? e.g., "1\'1/" or "1'1/"?
>
> It should accept arguments quoted as they would be during completion,
> but you have to take into account the word expansions performed on its
> arguments.
>

Again, by "word expansions" you mean the shell expansions?  "1\'1/", with
double quotes, is already taking into account word expansions, correct?
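
If the argument really should arrive in "completion-quoted" form, one way I
can think of to produce that form from a plain filename is printf %q.  This
is only a sketch under that assumption; quote_for_completion is a name I made
up, and whether compgen then matches the file is the very thing in question.

```bash
#!/bin/bash
# quote_for_completion: hypothetical helper.  It prints its argument
# with shell-special characters backslash-escaped, using printf %q.
quote_for_completion() {
    printf '%q' "$1"
}

quoted=$(quote_for_completion "1'1/")
printf '%s\n' "$quoted"   # prints: 1\'1/

# Pass the backslash-quoted form to compgen.  Double quotes around the
# variable keep the backslash intact through the shell's expansions.
compgen -f -- "$quoted" || true
```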

Thanks


