[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: new-flex-completion-style

From: Drew Adams
Subject: RE: new-flex-completion-style
Date: Thu, 14 Feb 2019 16:34:38 +0000 (UTC)

> The Levenshtein distance and close variants generally do
> a reasonable job of approximating human expectations, yes.
> Close variants (eg: only delete/insert, no mutate) also
> work well.
> The cost of building the Levenshtein edit distance matrix
> for picking or ordering candidates is quite high, though; ...
> gives a good summary, and you really don't want to use the
> most naive option if you can avoid it.  Computation costs
> go through the roof very fast; the cost is roughly
> proportional to the length of the candidate cubed, using the
> brute force approach.
> The other shortfall of using only edit distance is that most
> of the time people expect to front-weight the search: you
> may have a closer edit distance match later in the string,
> but people usually want a less exact match, closer to the
> start, intuitively.
> > I've been experimenting with a simpler function that just
> > counts the number of "holes" and the length of those holes
> > separately in the denominator. The numerator is the same
> > and a perfect match is still a 1. It seems to fare better
> > for your cases. For foo
> I'd very, very strongly encourage y'all to take a look at
> the existing work on the problem, not least because of the
> algorithmic complexity costs when you apply this to a large
> corpus of candidates.  Consider, for example, completion
> over the set of all defined symbols in Emacs, which is
> ~ 55,000 candidates, and IIRC I calculated the average
> length to 9 characters, so brute force matching would be
> be on the order of 40 **million** comparisons for the first
> pattern character, though thankfully dropping fast as you
> eliminate non-matching candidates entirely.  (Raw comparisons,
> accounting for all possible matches in all possible candidate,
> of that pattern, so each candidate gets
> $length / $pattern_length comparisons performed.)
> Anyway, the point is: this is an area where serious study
> has been invested, and I strongly urge you to take advantage
> of that, rather than trying to reinvent this surprisingly
> complex wheel.  This isn't simple to do, and it definitely
> isn't simple to do in a high performance way.
> PS: whatever else you do, make it trivial to configure what
> function is used for ordering the candidate results.  That
> matters far more that the details of the default match /
> order choice, because it allows easy innovation over time.
> PPS: the best modern tools seem to be derived from the
> problem of genome sequence matching, but heuristics are
> definitely helpful too.  Again, fzf is the best version of
> that I have found.

All good info; thanks.  +1 for (1) emphasizing the cost
in general/naive cases and for (2) saying "make it trivial
to configure" the sort order (function).

João's (first) reply to you, saying that an interactive
Emacs context (e.g. completion) has needs that can be a
bit different, is also relevant.

FWIW, here is how Icicles uses (2 kinds of) Levenshtein


And here is how it uses a Jaro-Winkler matching:


reply via email to

[Prev in Thread] Current Thread [Next in Thread]