emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new-flex-completion-style


From: João Távora
Subject: Re: new-flex-completion-style
Date: Wed, 13 Feb 2019 17:29:30 +0000
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (windows-nt)

João Távora <address@hidden> writes:

> I'll take a look at this new batch, do the obvious fixes.  But before
> submitting anything sorting-related, I'd like Dmitry to show a patch for
> his sorting idea on top of scratch/new-flex-completion-style, try it out
> a bit and then report back here.

>> +          (setq completions
>> +                (sort completions
>> +                      (lambda (a b)
>> +                        (let ((va (get-text-property 0 
>> 'completion-style-sort-order a))
>> +                              (vb (get-text-property 0 
>> 'completion-style-sort-order b)))
>> +                          (if (and va vb) (< va vb) va)))))
>
> This will signal an error when one of the completions is the empty
> string :-(

I've removed this for now, I.e. there's absolutely no sorting yet before
we agree on who should be doing it and where.  But why would the empty
string be there anyway?

>> @@ -3056,22 +3067,41 @@ PATTERN is as returned by 
>> `completion-pcm--string->pattern'."
>>           (let* ((pos (if point-idx (match-beginning point-idx) (match-end 
>> 0)))
>>                  (md (match-data))
>>                  (start (pop md))
>> -                (end (pop md)))
>> +                (end (pop md))
>> +                (len (length str))
>> +                (score-numerator 0)
>> +                (score-denominator 0)
>> +                (aux 0)
>> +                (update-score
>> +                 (lambda (a b)
>> +                   "Update score variables given match range (A B)."
>> +                   (setq
>> +                    score-numerator   (+ score-numerator (- b a))
>> +                    score-denominator (+ score-denominator (expt (- a aux) 
>> 1.5))
>> +                    aux              b))))
>
> I don't understand this scoring algorithm: please add a comment
> explaining it

I don't "understand" it either :-) i.e. it was developed empirically,
more or less.  But let's try to:

Consider these bad ascii(tm) schematics matching the pattern "foo" to
the strings "barfobazoquux" and "barfoobazquux":

          bar         fo             baz         o             quux
          ---         ++             ---         +             ---- 
   <start>   <match_1>  <match_1_end>   <match_n> <match_n_end>    <end>
    
    
          bar         foo             bazquux
          ---         +++             ------- 
   <start>   <match_1>   <match_1_end>       <end>
   
+++ indicates parts where the pattern matched
--- indicates parts where the pattern didn't match

The formula assigns higher scores to "better" matches. It's bound by
[0..1] and in the form of a quotient.  For the numerator, it counts the
+++.  For the denominator, it takes counts the number of --- in each
such group, exponentiates that number to a "falloff constant", adds it
to the total, adds one to the final sum, and then multiples by the
length of the string.

If no --- you get

   (length/(length * (1 + 0)) == 1

If no +++, you get 0 (but this never happens cause we pre-filtered the
matches with pcm).

The falloff constant (FC) used in the denominator is important.  A
negative falloff will value tighter matches, a positive value will value
more evenly spread out matches.  I set the constant to 1.5 some time
ago, and I haven't gone back, but perhaps it could be -1.5.

Anyway with FC=1.5 the examples above would be:

   first string  1.507%
   second string 1.197%

with FC=-1.5 it would be

   first string  15.849%
   second string 19,834%

If haven't mangled the numbers, this seems to point that -1.5 is a
better default for the constant, which shouldn't be hardcoded anyway.
On the other hand, I've been working with -1.5 for a long time now and
never had any obvious problems.  It's probably what you said elsewhere,
"good" scoring is a question of getting used to whatever the system is
doing, IOW humans learn, too.

Also note that it doesn't make any sense to compare scores between
different match lengths, only between same match lenghts at different
positions on different strings.

Anyway, what do you think?  Is this acceptable or so you have anything
better?

> (and give a meaningful name to `aux`).

Oh happy days when hackers could code "aux" and feel very smug about it.

OK, I'll call it "last-b".

>> +           (funcall update-score 0 start)
>>             (while md
>> -             (put-text-property start (pop md)
>> +             (funcall update-score start (car md))
>> +             (put-text-property start
>> +                                (pop md)
>
> The extra line-break after `start` seems spurious.

Were you expecting it to have very deep, hidden meaning?  But OK.

>
>> +           (put-text-property
>> +            0 1 'completion-pcm-commonality-score
>> +            (/ score-numerator (* len (1+ score-denominator)) 1.0) str))
>
> This will signal an error when `str` is the empty string :-(

Ooops, but that that happen?  Can an empty string be pcm-matched
by some pattern?  There's 

> BTW, maybe PCM could also use this scoring for sorting (i.e. we could
> set `completion-score` directly here)
>
>> +           (let ((score (get-text-property 0 
>> 'completion-pcm-commonality-score comp)))
>> +             (put-text-property 0 1 'completion-style-sort-order (- score) 
>> comp)
>
> This will signal an error when `comp` is the empty string :-(

Same here.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]