Re: strip accents and sorting [was: BibTeX issues]

From: Roland Winkler
Subject: Re: strip accents and sorting [was: BibTeX issues]
Date: Fri, 30 Aug 2019 11:27:33 -0500

On Thu Aug 29 2019 martin rudalics wrote:
>  > But (string-lessp "ä-umlaut" "ö-combine") gives nil
> But (string-collate-lessp "ä-umlaut" "ö-combine") gives t

...not for me, which is likely due to my locale setting LC_COLLATE=C.

I could instead use, say, LC_COLLATE=en_US.utf8.  Then the above
call of string-collate-lessp yields t.  But this also implies case
folding and ignoring dots in directory listings, which is not what I
want.  In other words, these locales bundle too many features
together.

Maybe the feature sets of the different locales are documented
*somewhere* in a neat way, and there is a locale with a feature set
that does exactly what I want.  But to the best of my knowledge this
documentation resides outside Emacs, so things get rather
complicated when a locale affects an Emacs session in important or
possibly subtle ways.

> so it should be fairly easy to fix `sort-lines' and friends
> accordingly.

Given that, I am not sure I would like to see `sort-lines' and
friends be fixed "accordingly".  If anything, I'd vote for a user
option, which I would likely use to disable such behavior.

On the other hand, as Eli pointed out in his reply about accented
characters being represented by a single character as compared to
using combining characters:

> The Unicode Standard mandates that they be handled identically,
> including in searching and sorting.  We don't yet implement that
> 100%, but see char-fold.el for a partial (and not very efficient)
> implementation during search.

So I would assume that the locale should not matter at all in the
context of Unicode combining characters.  (Or there should be a way
to control exactly this aspect of Unicode combining characters with
no additional (mis)features bundled with it.)
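
To illustrate what I mean by controlling only this one aspect, here
is a minimal sketch.  It is in Python, only because its standard
unicodedata module makes the point compactly; it is not how Emacs
does or should implement this.  The helper name nfc_key and the
sample strings are my own assumptions for illustration.  The idea is
to normalize both strings to NFC and then compare plain code points,
with no locale involved.

```python
import unicodedata

# Two ways of writing an accented letter (sample data, chosen for illustration):
a_precomposed = "\u00e4"   # "ä" as a single precomposed code point
o_combining = "o\u0308"    # "o" followed by COMBINING DIAERESIS


def nfc_key(s):
    """Hypothetical sort key: compose combining sequences, change nothing else."""
    return unicodedata.normalize("NFC", s)


# Plain code point comparison, roughly what string-lessp does:
# "o" (U+006F) sorts before "ä" (U+00E4), so this is False.
raw_less = a_precomposed < o_combining

# After NFC, the second string is the single code point U+00F6 ("ö"),
# and U+00E4 < U+00F6, so this is True.
nfc_less = nfc_key(a_precomposed) < nfc_key(o_combining)

# Sorting with this key does not fold case and does not ignore dots.
ordered = sorted(["b", o_combining, "B", a_precomposed, ".x"], key=nfc_key)
```

With this key the result is independent of LC_COLLATE, and "B" still
sorts before "b" and ".x" still sorts first, i.e. none of the other
locale features come along.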

I understand that it is a different matter how accented characters
are sorted relative to each other and also relative to un-accented
characters.  So it can make a lot of sense to have different locales
for that aspect.

Maybe I am missing something here.  (And I have not yet looked in
more detail at char-fold.el, mentioned by Eli, which could be a
better way to go within the Emacs world.)

