bug-bash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bash core dumps doing glob pattern on long string


From: Martin D Kealey
Subject: Re: bash core dumps doing glob pattern on long string
Date: Tue, 11 Oct 2022 17:02:06 +1000

Broadly I meant translating into a "preferably" Deterministic
(stackless/non-backtracking) Finite State Automaton.

However I note that it's not always possible to make a Deterministic FSA
when you have repeatable groups which themselves don't have fixed lengths
(like a+(a|abbba|aabb*aba)b); either the extglob compiler would need to
start over and compile to a Non Deterministic (stacking) FSA, or just give
up and go back to the recursive approach.

Personally I would favour the addition of «shopt -s dfa_extglob» that would
block the fall-back, causing extglobs that would need a stack to be treated
as magic match-never tokens.

I say "extglob", but this would also speed up silly ordinary globs like
[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]*[a-z]

-Martin

On Tue, 11 Oct 2022 at 15:57, Phi Debian <phi.debian@gmail.com> wrote:

>
>
> On Tue, Oct 11, 2022 at 7:23 AM Martin D Kealey <martin@kurahaupo.gen.nz>
> wrote:
>
>>
>> 4) compile globs into state machines, the same way that regexes get
>> compiled, so that they can be matched without needing any recursion.
>>
>> -Martin
>>
>
> Do you mean the NFA colud trade recursion for malloc()'d backtracking
> point record? this would trade stacksize vs bss size. Or may be you mean
> converting the NFA into DFA, that I could not handle myself :)
>
> That's why at first I thought limiting the input string length was may be
> an option, but Chet demonstrated it was not.
>
> Then returning an error instead of core dumps on recursion too deep on the
> NFA walk seems to be the thing to do, and then I thought a stack
> availabitly check for a given recursive level could be done, this is pretty
> easy to do and almost costless regarding perf, specially because bash (and
> other shells) have access to alloca(), well we could admit that os/arch
> sans alloca() could still core dump.
>
> A stack check with alloca() ressemble something along this lines
>
> int stack_too_small(size_t s)
> { return alloca(s)!=NULL;
> }
>
> This non inlinable function do allocate the stack provision, return true
> if not possible, false if possible and return the provision to the stack,
> the caller is then assured there is enough space in the stack for its
> recursion frame.
>
> and the shell code would look like
> ... recursive_func(...)
> {...
>   if(stack_too_small(recursion_frame_size)
>   { handle nicely; // longjmp, return....
> }
>
> What Chet call the right size to define is possibly something like the
> recursion frame size, that is the stack needed by the current recursion
> level instance (its locals) + all the frame size sub function would need
> until the next recurence, i.e if the call path look like
>
> a()-->b()-->c()-->a()...
> The recursion frame size  is sum(framesize(a)+framesize(b)+framesize(c))
>
> Some empiric constant can also be used, for instance on linux a max
> stacksize of 0x800000 is pretty common deciding we big constant like
> 32K could be simple and good enough.
>
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]