Re: nocaseglob
From: Chet Ramey
Subject: Re: nocaseglob
Date: Tue, 23 Jan 2007 14:14:26 -0500
User-agent: Thunderbird 1.5.0.9 (Macintosh/20061207)
Tim Waugh wrote:
> On Tue, 2007-01-23 at 17:17 +0100, Andreas Schwab wrote:
>> glibc definitely uses strcoll as well. Most likely python has its own
>> implementation which gets it wrong.
>
> No, really, this is going through glibc's __collseq_table_lookup
> function. The Python example is just an easy-to-run distilled test
> case.
But it doesn't matter what undocumented internal function glibc is using.
The portable, standard way to perform character comparison using the
current locale is strcoll(). If I can't get the same results using
strcoll(), glibc is clearly doing something different internally. (And
there is no portable standard way to obtain the current collating sequence.
The best you can do is sort sets of characters like I did.)
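For readers who want to try the "sort sets of characters" approach, here is a minimal sketch. It is not the code used to produce the collating sequences mentioned in this thread; the names charcoll and sort_by_collation are hypothetical, and it assumes single-byte characters only.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: compare two characters as one-character strings
   using the collating rules of the current locale. */
static int
charcoll(const void *a, const void *b)
{
    char s1[2], s2[2];

    s1[0] = *(const char *)a; s1[1] = '\0';
    s2[0] = *(const char *)b; s2[1] = '\0';
    return strcoll(s1, s2);
}

/* Sort the characters of `set' in place into collating order.
   Hypothetical helper; single-byte characters only. */
void
sort_by_collation(char *set)
{
    qsort(set, strlen(set), sizeof(char), charcoll);
}
```

Call setlocale(LC_ALL, "") first so that strcoll() uses the locale from the environment, then pass a writable array of the characters of interest. Characters that collate as equal may come out in either order, since qsort() is not a stable sort.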
Try running the attached program. Run it like

    rangecmp -v start test end

e.g.,

    rangecmp -v A h Z

Here are the results I get:

    $ LC_ALL=C ./rangecmp -v A h Z
    default locale = C
    strcoll (h, A) -> 1
    strcoll (h, Z) -> 1

    $ ./rangecmp -v A h Z
    default locale = en_US.UTF-8
    strcoll (h, A) -> 7
    strcoll (h, Z) -> -18

    $ LC_ALL=en_US ./rangecmp -v A h Z
    default locale = en_US
    strcoll (h, A) -> 7
    strcoll (h, Z) -> -18
strcoll indicates that, in the "en_US" locale, `h' sorts between `A' and
`Z'. In the "C" locale, it does not. This is consistent with the
collating sequences I posted earlier.
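The same two comparisons can be folded into a single range test. The sketch below only illustrates the idea; in_collating_range is a hypothetical name, and this is not a claim about how bash or glibc implement bracket-expression ranges internally.

```c
#include <locale.h>
#include <string.h>

/* Return nonzero if `test' collates at or after `start' and at or
   before `end' in the current locale.  Hypothetical helper. */
int
in_collating_range(const char *start, const char *test, const char *end)
{
    return strcoll(test, start) >= 0 && strcoll(test, end) <= 0;
}
```

After setlocale(LC_ALL, ""), in_collating_range("A", "h", "Z") is true whenever strcoll (h, A) is non-negative and strcoll (h, Z) is non-positive, which is the case in the en_US results above and not in the "C" results.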
Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
Live Strong. No day but today.
Chet Ramey, ITS, CWRU chet@case.edu http://cnswww.cns.cwru.edu/~chet/
#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

static void
usage(void)
{
    fprintf(stderr, "rangecmp: usage: rangecmp [-v] start test end\n");
}

int
main(int c, char **v)
{
    int i, verbose, r1, r2;
    char *dlocale;

    verbose = 0;
    while ((i = getopt(c, v, "v")) != -1) {
        switch (i) {
        case 'v':
            verbose = 1; break;
        case '?':
        default:
            usage();
            exit(2);
        }
    }
    c -= optind;
    v += optind;

    /* Require the three operands: start, test, end. */
    if (c < 3) {
        usage();
        exit(2);
    }

    /* Adopt the locale named by the environment (LC_ALL, LC_COLLATE, LANG). */
    dlocale = setlocale(LC_ALL, "");
    if (verbose)
        printf("default locale = %s\n", dlocale ? dlocale : "''");

    /* Compare the test string against each end of the range. */
    r1 = strcoll (v[1], v[0]);
    printf("strcoll (%s, %s) -> %d\n", v[1], v[0], r1);
    r2 = strcoll (v[1], v[2]);
    printf("strcoll (%s, %s) -> %d\n", v[1], v[2], r2);

    exit(0);
}