bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
From: Michael Albinus
Subject: bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
Date: Wed, 27 Aug 2014 13:24:48 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4.50 (gnu/linux)
Eli Zaretskii <eliz@gnu.org> writes:
> Here are a few more thoughts about related issues:
>
> 1. Why does str_collate return a ptrdiff_t value? AFAIK, wcscoll
> etc. return int data type, and of rather small values.
Hm, yes. Both wcscoll and w32_compare_strings return int, so I've
changed that for str_collate accordingly.
> 2. Should we signal an error if the input strings are not pure-ASCII
> or multibyte? Unibyte strings will at best cause incorrect
> results.
Maybe we should convert the strings to multibyte, via string_to_multibyte()?
If a string is already multibyte, that does no harm.
> And what about strings with invalid codepoints,
> e.g. those outside of the Unicode range, which can happen inside
> Lisp strings?
> 3. What about errors in wcscoll? The current code ignores them;
> however, the value returned by wcscoll in case of an error is not
> documented, so it could be random. Should we signal an error if
> errno gets set by wcscoll?
wcscoll sets errno to EINVAL when a codepoint is out of range. I've added a
check for this case, which signals an error:
(string-collate-equalp (string 1) (string ?\U0020FFFF))
=> error: Non-Unicode character: 0x20ffff
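
A minimal C sketch of the errno check described above. The helper name
collate_checked and its return convention are assumptions for illustration;
this is not the actual str_collate code in Emacs. It only shows the usual
pattern of clearing errno before wcscoll and inspecting it afterwards, since
wcscoll has no reserved error return value.

```c
#include <assert.h>
#include <errno.h>
#include <wchar.h>

/* Hypothetical helper, not the actual str_collate from Emacs.
   Returns 0 and stores the comparison result in *RESULT on success;
   returns -1 if wcscoll reported an error through errno.  */
static int
collate_checked (const wchar_t *a, const wchar_t *b, int *result)
{
  assert (a != NULL && b != NULL && result != NULL);

  /* wcscoll has no reserved error return value, so errno must be
     cleared before the call and inspected afterwards.  */
  errno = 0;
  int r = wcscoll (a, b);
  if (errno != 0)
    return -1;  /* e.g. EINVAL for a codepoint the locale cannot handle */

  *result = r;
  return 0;
}
```

A caller would translate the -1 return into whatever error it reports to
the user, as the Lisp example above does for the out-of-range character.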
> 4. How to control the optional features of the collating sequence? I
> mean, for example, the fact that punctuation characters are ignored
> in the .UTF-8 locales on glibc hosts (or so it seems). At least on
> Windows, a somewhat higher degree of control is available, but it
> must be specified separately of the locale ID. E.g., the
> comparison function accepts flags to ignore punctuation and
> symbols, width differences, diacritics, etc. Should we have another
> variable, perhaps w32-specific, to request these features?
> Alternatively, we could use .UTF-8 on Windows to communicate that,
> although that sounds like a kludge.
On POSIX systems, I'm not aware of any way to configure such optional features
via glibc. The finest-grained selection is what you do with LC_COLLATE.
If we want to offer finer-grained settings, we would need to use a library
like libicu (http://icu-project.org/). That could be done, but it should be optional.
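
To illustrate how coarse the LC_COLLATE selection is, here is a rough C
sketch that compares two strings under a named collation locale and then
restores the previous setting. The helper collate_in_locale is hypothetical
and not part of Emacs; it uses only standard setlocale and wcscoll. The only
knob is the locale name itself, and there is no parameter for ignoring
punctuation or diacritics.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: compare A and B under the LC_COLLATE locale NAME,
   then restore the previous LC_COLLATE setting.  Returns 0 on success,
   -1 if the locale is unavailable or memory is exhausted.  */
static int
collate_in_locale (const char *name, const wchar_t *a, const wchar_t *b,
                   int *result)
{
  const char *current = setlocale (LC_COLLATE, NULL);
  if (current == NULL)
    return -1;

  /* Copy the name: later setlocale calls may overwrite the returned string.  */
  char *saved = malloc (strlen (current) + 1);
  if (saved == NULL)
    return -1;
  strcpy (saved, current);

  if (setlocale (LC_COLLATE, name) == NULL)
    {
      free (saved);
      return -1;  /* locale NAME is not installed on this system */
    }

  *result = wcscoll (a, b);

  setlocale (LC_COLLATE, saved);
  free (saved);
  return 0;
}
```

Note that setlocale changes process-wide state, so a real implementation
would have to take care in the presence of threads.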
> 5. The locale names on Windows are different from Posix: Windows uses
> 3-letter abbreviations of the country and the language,
> e.g. "fra_FRA" instead of the Posix "fr_FR". Do we want the locale
> string values used for let-binding the above-mentioned variable to
> be portable across systems? Then we'd need some conversion
> database on MS-Windows.
Here I'm a bit undecided. We could leave it to the users to find the
proper locale name, but this is inconvenient. OTOH it would be a lot of work
to install a mapping system, and we would need to maintain it. What if
a new "en_SC" (Scotland) locale appeared? We would need to
track such changes in Emacs forever ...
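
For the sake of discussion, such a mapping could be as simple as a static
lookup table. The sketch below is hypothetical: the struct, the table, and
the function name are invented for illustration, and the only entry is the
"fr_FR" / "fra_FRA" pair quoted above. It mainly shows the maintenance
problem: any locale missing from the table, such as the imagined "en_SC",
simply cannot be translated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping entry from a POSIX locale name to the
   corresponding MS-Windows name.  */
struct locale_alias
{
  const char *posix;
  const char *w32;
};

/* Illustrative table; a real one would need an entry per supported
   locale and would have to be kept up to date by hand.  */
static const struct locale_alias locale_aliases[] = {
  { "fr_FR", "fra_FRA" },
};

/* Return the Windows name for POSIX, or NULL if the table has no entry.  */
static const char *
posix_to_w32_locale (const char *posix)
{
  size_t n = sizeof locale_aliases / sizeof locale_aliases[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (locale_aliases[i].posix, posix) == 0)
      return locale_aliases[i].w32;
  return NULL;
}
```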
> 6. I think we will want case-insensitive version of this function.
That's also on my todo list. But I'm a little bit undecided whether we
should add it to the string-collate-* functions, or whether there should be
further functions.
Maybe we could use sort-fold-case as the indicator for this? Or is that too
specific?
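
One possible shape for the case-insensitive variant at the C level, purely
as a sketch: lower-case copies of both strings with towlower, then collate
the copies. The names fold_copy and collate_ignore_case are assumptions, not
existing Emacs functions, and per-character towlower is only an
approximation of proper case folding.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Return a freshly allocated lower-cased copy of S, or NULL on
   allocation failure.  The caller frees the result.  */
static wchar_t *
fold_copy (const wchar_t *s)
{
  size_t n = wcslen (s);
  wchar_t *copy = malloc ((n + 1) * sizeof *copy);
  if (copy == NULL)
    return NULL;
  /* i <= n also copies the terminating L'\0', which towlower
     leaves unchanged.  */
  for (size_t i = 0; i <= n; i++)
    copy[i] = (wchar_t) towlower ((wint_t) s[i]);
  return copy;
}

/* Hypothetical helper: collate A and B ignoring case.  Returns 0 and
   stores the comparison in *RESULT, or -1 on allocation failure.  */
static int
collate_ignore_case (const wchar_t *a, const wchar_t *b, int *result)
{
  wchar_t *fa = fold_copy (a);
  wchar_t *fb = fold_copy (b);
  int status = -1;

  if (fa != NULL && fb != NULL)
    {
      *result = wcscoll (fa, fb);
      status = 0;
    }

  free (fa);  /* free (NULL) is a no-op */
  free (fb);
  return status;
}
```

Whether this is exposed as an extra argument, as separate Lisp functions, or
driven by a variable such as sort-fold-case is exactly the open question.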
Best regards, Michael.