strip accents and sorting [was: BibTeX issues]

emacs-devel

From:	Roland Winkler
Subject:	strip accents and sorting [was: BibTeX issues]
Date:	Wed, 28 Aug 2019 22:26:38 -0500

On Wed Aug 28 2019 Eli Zaretskii wrote:
> > From: Roland Winkler <address@hidden>
> > If there was a generic function strip-accents, then BibTeX mode could
> > certainly use it within its bibtex-generate-autokey machinery.
> 
> I don't think we have such a function, but it shouldn't be hard to
> write one, using the facilities in ucs-normalize.el.

Interesting! What are the intended use cases for ucs-normalize.el
and the algorithms that it implements?

I had never thought much about this.  But there is obviously a
problem when one tries to sort a database whose keys may contain
non-ASCII Unicode characters.  (This problem must be well known in
the Unicode world.)  Naively one might hope that string-lessp sorts
the following lines in this order:

  ä-combine
  ä-umlaut
  ö-combine
  ö-umlaut

But (string-lessp "ä-umlaut" "ö-combine") gives nil so that sort-lines gives

  ä-combine
  ö-combine
  ä-umlaut
  ö-umlaut

Of course, this is because a German umlaut can be represented
either as a single precomposed character (U+00E4 for ä) or as a base
letter followed by a combining diaeresis (a plus U+0308).  The two
representations look the same, but string-lessp compares character
codes, so for string-lessp they are different strings.
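
To make the difference concrete, here is a minimal sketch that builds
both forms from their code points and compares them before and after
normalization with ucs-normalize.el.  The helper name my-same-text-p is
hypothetical and only serves this illustration; it is not part of Emacs.

```elisp
(require 'ucs-normalize)

(defun my-same-text-p (a b)
  "Return non-nil if A and B are equal after NFC normalization.
Hypothetical helper for illustration, not part of Emacs."
  (string= (ucs-normalize-NFC-string a)
           (ucs-normalize-NFC-string b)))

;; Build both forms from code points so the example does not depend
;; on how this message was encoded in transit.
(let ((precomposed (string #xE4))         ; ä as a single character
      (decomposed  (string ?a #x0308)))   ; a + combining diaeresis
  (list (string= precomposed decomposed)          ; plain comparison
        (my-same-text-p precomposed decomposed))) ; after NFC
;; => (nil t)
```

Normalizing every key to one form (NFC here, but NFD would work just
as well) before comparing removes the mismatch between the two
representations, though it does not by itself move ä next to a.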

This can be particularly annoying when a database (be it BibTeX,
BBDB, or whatever) is populated by copying records from different
sources that may represent such characters in different ways.

Now, one solution would be to decompose the characters and then
simply strip off the combining characters.  Or is there a possibility
to teach a sorting algorithm that the first letter of ä-combine is
"the same" as the first letter of ä-umlaut and all this should
appear near a-plain instead of past o-plain?
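
The first idea could be sketched roughly as follows, assuming that
"combining characters" means characters of Unicode general category
Mn (nonspacing mark).  Both my-strip-accents and
my-accent-insensitive-lessp are hypothetical names introduced for this
sketch; neither exists in Emacs.

```elisp
(require 'ucs-normalize)
(require 'seq)

(defun my-strip-accents (string)
  "Return STRING decomposed to NFD with nonspacing marks removed.
Hypothetical helper, not an existing Emacs function."
  (concat
   (seq-remove (lambda (char)
                 ;; Assumption: only category Mn counts as an accent.
                 (eq (get-char-code-property char 'general-category) 'Mn))
               (ucs-normalize-NFD-string string))))

(defun my-accent-insensitive-lessp (a b)
  "Return non-nil if A sorts before B once accents are stripped.
Hypothetical predicate for use with `sort'."
  (string-lessp (my-strip-accents a) (my-strip-accents b)))

;; Both umlaut forms reduce to the plain base letter.
(list (my-strip-accents (string #xE4))        ; precomposed ä
      (my-strip-accents (string ?o #x0308)))  ; o + combining diaeresis
;; => ("a" "o")
```

Passing my-accent-insensitive-lessp as the predicate to sort places
both forms of ä next to a-plain and ahead of anything starting with o.
This treats ä exactly like a, which is not what every language's
collation rules prescribe.  Emacs also provides string-collate-lessp,
which delegates to the system locale; whether it treats the two
representations alike depends on the platform, so that is not assumed
here.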

Roland