emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Pattern matching on match-string groups #elisp #question


From: Stefan Monnier
Subject: Re: Pattern matching on match-string groups #elisp #question
Date: Sat, 27 Feb 2021 09:39:27 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

>> BTW, I was thinking about making the optimization more conservative, so
>> it only throws away the actual `if` but keeps the computation of the test:
> [...]
>> and it does fix the `pcase-let` problem with your original code.
> Given the trouble I think we can defend not respecting side-effects in
> something as functional as pcase!

Nevertheless, I went ahead with this change (after remembering that
wrapping the code in `ignore` should eliminate the extra warnings).

>> It should macroexpand to something morally equivalent to:
>> 
>>    (cond ((not (stringp STR)) nil)
>>          ((not (string-match "\\(?1:a*\\)" STR)) nil)
>>          ((looking-at "^"")
>>           (let* ((x1464 (match-string 1 STR)))
>>             (let ((FOO x1464)) FOO))))
>
> Oh dear... perhaps we should just go with the intermediate list (or vector)
> and suffer the small allocation penalty? (At least we should treat the case
> of a single variable specially, since no consing would then be necessary.)

It's clearly The Right Thing™.

> My guess is that a vector may be faster than a list if there are more than N 
> elements, for some N.

I'll let you benchmark it to determine the N.

> Should we use string-match-p when there are no variables bound in the rx 
> clause?

We could.  Tho, IIRC currently `string-match-p` is ever so slightly
slower than `string-match` and since we clobber the match data in other
cases, we might as well clobber the match data in this case as well: any
code which presumes the match data isn't affected by some other code
which uses regular expressions is quite confused.

>>> Of course a sufficiently optimising compiler would eliminate the consing!
>> Indeed, and it's not a difficult optimization (at least if you can
>> presume that this data is immutable).
> Right, although we would need some more serious data-flow infrastructure
> first. It would be useful for pattern-matching two or more values at the
> same time.

I don't think it's much more complicated than your current constant
folding: when you see a let-binding of a variable to a *constructor*,
stash that expression in your context as a "partially known constant"
and then do the constant folding when you see a matching *destructor*.
The problems I see are:

- you need to detect side effects between the constructor and the
  destructor.  You could just consider any *read* of a variable holding
  such partial-constant as a (potential) side-effect (except for those
  reads recognized as part of destructors, of course).  It should be
  good enough for the case under discussion.

- more importantly in order not to duplicate code and its side-effects
  (and suffer risks linked to moving code into a different scope), you
  need to convert your constructor so all its arguments are trivial and
  "scope safe" (e.g. gensym'd variables, integer constants, symbols,
  ...).

>>>> It's linked to the special undocumented pcase pattern `pcase--dontcare`
>>>> (whose name is not well chosen, suggestions for better names are
>>>> welcome)
>>> 
>>> pcase--give-up
>> 
>> Hmm... probably not much more explanatory than "dontcare".
>
> Well, 'dontcare' suggests that anything would do and the value not being
> used, like '_', but that's quite misleading.

Right, that's part of the problem with this naming: it doesn't want to
mean that anything matches, like `_`, but that any behavior is
acceptable when pcase has to try and match against this pattern.
The other part is that it's not true: we wouldn't settle for "any"
behavior, it still has to be sane-ish.

>> I was thinking of `pcase--impossible` as well.
> Yes, that looks acceptable. In any case, it isn't really a user-facing
> symbol, is it? Otherwise we'd need crystal-clear semantics (and lose the
> double dashes).

I would settle for something a bit less than "crystal-clear", but yes to
drop the "--" we'd need a clear enough semantics.

Here's another idea for its name: `pcase--go-back` since it makes pcase
go back to the last option it tried and accept it even though it failed
to match.  It still sucks, but maybe it'll give someone else a better idea?


        Stefan




reply via email to

[Prev in Thread] Current Thread [Next in Thread]