Re: What's missing in ELisp that makes people want to use cl-lib?

[Replying to both of your mails here]

On Tue, Nov 14, 2023 Dmitry Gutov <dmitry@gutov.dev> wrote:

> And if the list has just 1 element (or zero?), the gap would be even
> larger.
>
> Only seeing a 1.6x difference here is a nice surprise, actually.

These two statements together seem contradictory. Why would
you be happy to see a particular factor for a given
list length if you know the factor is unbounded in general
for small lists?

And it's only in the small lists case. If I pass my own predicate to

both cl-set-difference and your best seq-difference-3

(defun myequal (a b)
(equal a b))

(setq cc (make-list 12 "blabla"))
(setq list2 (make-list 12 "shooveedoowaa"))

(when nil
;; (0.106080265 4 0.03849865399999963)
(benchmark-run 10000 (cl-set-difference cc list2 :test #'myequal))
;; (0.5504704210000001 39 0.394207416)
(benchmark-run 10000 (seq-difference-3 cc list2 #'myequal))

)

I get the 5.5x slowdown again (in one experiment, not shown,

I got a whopping 200x slowdown, but I can't reproduce it right now

from Emacs -Q, maybe I had some method on some crucial seq.el generic)

> > This is all interesting, until one ponders what happens if an existing

> > seq.el user somewhere has:
> >
> > (cl-defmethod seq-contains-p ((seq my-voodoo-seq)
> > (elt (eql :secret-voodoo)) &optional _tesfn)
> > (invoke-voodoo-priests seq))
> >
> > making use of seq.el's support for abstract polymorphic sequences.
> >
> > With seq.el 2.24 a seq-difference operation would consider this user's
> > method, with seq.el 2.24.dmitry (i.e. your fast seq-difference-3) it
> > simply won't. This user's code is clearly broken.
> >
> > But was the user allowed to do that in the first place? If not,
> > why is seq-contains-p a public generic function?
>
> It was allowed, and yes, indeed, it's a breaking change. So we should at
> least examine the existing public code out there and see whether such
> overrides exist.

Not so easy, I'm afraid. seq-contains-p is not the only such generic,

it's just one I happened to spot first. seq-do is a generic and in that

group mentioned in the sparse seq.el documentation about mandatory

customization, meaning presumably it is essential for custom sequences.

See https://github.com/search?q=%22%28cl-defmethod+seq-do%22&type=code for

a list of custom sequences that use it.

Your seq-difference-3 doesn't call seq-do like the original seq-difference

does, so the set difference operations for those custom sequences might

well be broken. What's even more problematic is that it skips seq-do

entirely for certain predicates and not entirely for certain other

predicates.

And this is the type of headache that make-everything-a-gf designs

like seq.el's brings to developers trying to optimize it.

> > Generic function protocols aren't just a bunch of generic functions
> > thrown together, they're also precise documentation as to how/when
> > the generics are called by the framework and what the user may
> > add to them, desirably making common things easy, difficult things
> > possible, and mistakes hard. Normally the framework that calls
> > into generics isn't exposed to the user customizations, i.e.
> > it is made of regular defuns. Of course, you seem to know this
> > as xref.el has such a practice. But in seq.el, almost everything
> > is a generic function, even the entry points, somewhat bizarrely.
>
> I'm not sure that is a problem in itself, except in the cases like I've
> described, where the type dispatch ends up happening N times instead of
> just once.

It's still bizarre. Here's how I think: if an application is customizing

the entry point directly, why even call the entry point? OK, so what if

a library is customizing the entry point? It's presumably because that

library provides a data type for applications to instantiate and use. But

if the library does that, all calling guarantees of other generic

functions are lost for, say, user method for subtypes of that data type

(or :around, or :after, etc). So it's also nonsensical. The only think

where it _could_ make a shred of sense is in a kind of "private library" where

the application controls both the data type and knows exactly the shortcuts

taken. But then "private library" is an oxymoron.

> > What I meant before is that these are non-trivial questions that

> > must be answered when embarking on these optimizations of seq.el.
> >
> > So when I say it's non-trivial to optimize, it's not because of
> > figuring out if seq-filter or seq-reduce is better, or which shortcut
> > is faster, or if dispatch takes time, it's because if you have made
> > most everything public and customizable, you have offered an
> > extremely rich contract to users, so taking any shortcuts is
> > much more likely break it.
> >
> > cl-lib.el is easy to optimize because the contract it offers
> > is mostly clear (it's straight out of the CL standard).
>
> cl-lib is pretty complex and alien from somebody with zero experience in
> it (and CL), and it's a lot more code, with different macros. So I'm not
> sure I'd call cl-lib easier overall.

Oh, don't get me wrong: cl-lib.el's implementation details are not pretty

and not easy, likely typical library code (though i've seen much much

worse). What's fundamentally easier about optimizing cl-lib is that the

contract it offers is much, much more well specified.

Possibly because it comes as a result of careful design in committees of

very capable programmers in the 90's, before the CL winter, that were

already on to this class of difficulties and wanted to make a common

interface.

The interface they made for sequences is not fully generic, but there

_are_ a lot of different implementations for that interface, each CL

compiler has one. Most of the tricks, like what I did to optimize

cl-{n}set-difference are standard stuff in the CL world. There are even

reference LOOP implementations (likely much more performant than ours,

though possibly not compatible with some non-standard edge cases our

cl-loop has of which I know a couple).

> > The case in the CL world against generic functions' performance
> > is often not that the dispatch is slow, but that
> > employing them too liberally makes framework optimization
> > hard, because you restrict yourself from optimization
> > opportunities.
>
> I don't know if that's a major issue: just like cl-some has different
> branches, seq-some could have three different definitions. The
> approaches to optimization seem transferable, more or less.

Seq-some also calls into a lot of user customizable code, so it'll

suffer from the same class of problems. Say, are you going to skip

the seq-do generic sometimes there as well? Maybe. Depends on what

contract we want to uphold, and we have no idea. At least I don't.

> I believe the actual value it provides is a list of implementations that
> are quite compact and as such easy to fix/extend/modify. And if cl-lib
> is handy for someone with CL experience, I'll guess that seq.el rings
> more familiar to the Clojure users.

Maybe in the superficial user interface, because of names etc. Or

are you saying seq.el interface is extracted from Clojure's standard much

like cl-lib is from CL standard? (Do Clojure generic functions work like

ours?) Then by all means we should rush to study that contract closely,

to know what we can and can't do as optimizers.

João

From:	João Távora
Subject:	Re: What's missing in ELisp that makes people want to use cl-lib?
Date:	Tue, 14 Nov 2023 10:34:57 +0000