Re: Filename Expansion bug
From: Mickael KENIKSSI
Subject: Re: Filename Expansion bug
Date: Thu, 9 Jan 2020 12:09:22 +0100
Thanks for your comment.
I understand this may not seem of primary importance to you, since the two
forms are canonically equivalent, but sometimes what we really care about is
the path as a literal string (be it well- or ill-formed), and not the
filesystem object it points to.
Normalization upon filename expansion is not the default Bash behavior, so
I see no reason why it should be considered acceptable to have it –
partially – happen on what is no more than an edge case in the end.
zsh (and ksh) provide the expected result:
$ mkdir -p a/b/c d/e/f g/h/e; zsh -c 'printf %s\\n .////a//../*///////*'
.////a//../a///////b
.////a//../d///////e
.////a//../g///////h
I suppose it all comes down to an implementation question.
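Until then, a script that only needs to know whether two path strings differ in nothing but repeated slashes could squeeze the slashes on both sides before comparing. This is a minimal sketch of that workaround, not a fix for the expansion itself: `squeeze_slashes` and `same_path_string` are hypothetical helper names, and the approach gives up exactly the literal form discussed above.

```shell
#!/bin/sh
# squeeze_slashes: hypothetical helper, not part of any shell.
# Collapses every run of slashes into a single slash. It does not
# resolve "." or ".." components and does not touch the filesystem.
squeeze_slashes() {
    printf '%s\n' "$1" | tr -s /
}

# same_path_string: hypothetical helper. Succeeds when the two strings
# are identical once runs of slashes have been squeezed.
same_path_string() {
    [ "$(squeeze_slashes "$1")" = "$(squeeze_slashes "$2")" ]
}

squeeze_slashes './/a//b///////c'   # prints ./a/b/c
```

For example, `same_path_string './/a//b' './a/b'` succeeds, while `same_path_string './a/b' './a/c'` fails.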
Best,
Mickaël
On Wed, Jan 8, 2020 at 4:09 PM Chet Ramey <chet.ramey@case.edu> wrote:
> On 1/8/20 2:34 AM, Mickael KENIKSSI wrote:
> > Hello,
> >
> > I found a bug regarding how pathnames are processed during filename
> > expansion. The result for non-normalized path-patterns may get mangled in
> > such a way that it becomes inconsistent and unpredictable, making it
> > useless for string comparison or any kind of string manipulation where
> > having it in the exact same form as the pattern is required.
> >
> > How to reproduce:
> >
> > $ mkdir -p a/b/c d/e/f g/h/e; printf '%s\n' .////*//*///////*
> > .////a/b/c
> > .////d/e/f
> > .////g/h/e
> >
> >
> > This is correct from a filesystem perspective but not from a string
> > perspective, where you'd need each of the computed paths as-is:
> >
> > .////a//b///////c
> > .////d//e///////f
> > .////g//h///////e
>
> You're not going to get the path with multiple slashes preceding
> pattern characters, because the pathname has single slashes, those
> slashes are, as POSIX says, "explicitly matched by using one or
> more <slash> characters in the pattern," and the matched pathnames
> that replace the pattern don't have multiple slashes.
>
> The reason that the four leading slashes aren't removed is that those
> directory names don't have any pattern characters and are left
> unchanged. Since the kernel's filename resolution treats multiple
> slashes the same as a single slash, the constructed pathname matches
> what's in the file system.
>
> That means, for instance, you have a directory `.////' and a pattern `*'.
> You opendir `.////' and read it for every filename matching `*' (a, d, g),
> construct the pathnames, and go on with the rest of the pattern.
>
> The intermediate runs of multiple slashes get removed as part of the
> matching algorithm, as described above. They're essentially null pathname
> components.
>
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
> ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU chet@case.edu http://tiswww.cwru.edu/~chet/
>
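The explanation quoted above can be sketched as a small shell function. This is only an illustration of the described behavior, not Bash's actual implementation: `glob_shape` is a hypothetical name, it assumes `*`, `?` and `[` are the only pattern characters, and it ignores quoting and trailing slashes in the pattern. It keeps the leading directory part that contains no pattern characters verbatim and squeezes every later run of slashes to one, which is the shape of the reported Bash results.

```shell
#!/bin/sh
# glob_shape: hypothetical helper that predicts the *shape* of the
# expanded paths. It never touches the filesystem.
# The body runs in a subshell ( ... ) so its variables stay local.
glob_shape() (
    pattern=$1

    # Text before the first pattern character (*, ? or [).
    # If there is none, the word is not a pattern and stays unchanged.
    prefix=${pattern%%[*?[]*}
    if [ "$prefix" = "$pattern" ]; then
        printf '%s\n' "$pattern"
        exit 0
    fi

    # Cut the prefix back to its last slash: that is the directory part
    # with no pattern characters, which is kept exactly as written.
    case $prefix in
        */*) dir=${prefix%/*}/ ;;
        *)   dir= ;;
    esac

    # Everything after it goes through matching, where runs of slashes
    # act as null components, so squeeze them to a single slash.
    rest=${pattern#"$dir"}
    rest=$(printf '%s' "$rest" | tr -s /)

    printf '%s\n' "$dir$rest"
)

glob_shape './/*//*'             # prints .//*/*
glob_shape './/a'                # prints .//a (no pattern characters)
```

For the pattern from the original report, `.////*//*///////*`, the predicted shape is `.////*/*/*`, which agrees with the reported results such as `.////a/b/c`.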