bug-bash

Re: nocaseglob


From: Chet Ramey
Subject: Re: nocaseglob
Date: Tue, 23 Jan 2007 14:14:26 -0500
User-agent: Thunderbird 1.5.0.9 (Macintosh/20061207)

Tim Waugh wrote:
> On Tue, 2007-01-23 at 17:17 +0100, Andreas Schwab wrote:
>> glibc definitely uses strcoll as well.  Most likely python has its own
>> implementation which gets it wrong.
> 
> No, really, this is going through glibc's __collseq_table_lookup
> function.  The Python example is just an easy-to-run distilled test
> case.

But it doesn't matter what undocumented internal function glibc is using.
The portable, standard way to perform character comparison using the
current locale is strcoll().  If I can't get the same results using
strcoll(), glibc is clearly doing something different internally.  (And
there is no portable standard way to obtain the current collating sequence.
The best you can do is sort sets of characters like I did.)
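Sorting a set of characters by collation order can be sketched roughly as follows. This is only an illustration under stated assumptions: it handles single-byte characters only, and the names coll_char_cmp and sort_by_collation are hypothetical (they are not part of the attached program).

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: compare two single-byte characters by wrapping
   each one in a one-character string and calling strcoll(). */
static int
coll_char_cmp(const void *a, const void *b)
{
        char    s1[2], s2[2];
        int     r;

        s1[0] = *(const char *)a; s1[1] = '\0';
        s2[0] = *(const char *)b; s2[1] = '\0';
        r = strcoll(s1, s2);
        /* Break ties by byte value so the result is deterministic. */
        return r ? r : (unsigned char)s1[0] - (unsigned char)s2[0];
}

/* Hypothetical helper: sort the characters of `set' in place using the
   collating order of the current locale (LC_COLLATE). */
static void
sort_by_collation(char *set)
{
        qsort(set, strlen(set), sizeof(char), coll_char_cmp);
}
```

Call setlocale(LC_ALL, "") first so the environment's locale is in effect, then pass a writable buffer. In the "C" locale the result is plain byte order, so a buffer holding "hZA" becomes "AZh".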

Try running the attached program.  Run it like

rangecmp -v start test end

e.g.,

rangecmp -v A h Z

Here are the results I get:

$ LC_ALL=C ./rangecmp -v A h Z
default locale = C
strcoll (h, A) -> 1
strcoll (h, Z) -> 1
$ ./rangecmp -v A h Z
default locale = en_US.UTF-8
strcoll (h, A) -> 7
strcoll (h, Z) -> -18
$ LC_ALL=en_US ./rangecmp -v A h Z
default locale = en_US
strcoll (h, A) -> 7
strcoll (h, Z) -> -18

strcoll indicates that, in the "en_US" locale, `h' sorts between `A' and
`Z'.  In the "C" locale, it does not.  This is consistent with the
collating sequences I posted earlier.
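
The two comparisons can be folded into a single range test. The following is a minimal sketch, assuming a closed range and a hypothetical helper name coll_in_range; it is not claimed to be how bash or glibc implements bracket ranges.

```c
#include <string.h>

/* Hypothetical helper: return nonzero if `test' collates at or between
   `start' and `end' in the current locale, using only strcoll(). */
static int
coll_in_range(const char *start, const char *test, const char *end)
{
        return strcoll(test, start) >= 0 && strcoll(test, end) <= 0;
}
```

With the results shown above, the "en_US" values (7 and -18) satisfy both conditions, while the "C" values (1 and 1) fail the second one.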

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                       Live Strong.  No day but today.
Chet Ramey, ITS, CWRU    chet@case.edu    http://cnswww.cns.cwru.edu/~chet/

#include <stdio.h>
#include <locale.h>

#include <string.h>
#include <stdlib.h>
#include <unistd.h>

static void
usage(void)
{
        fprintf(stderr, "rangecmp: usage: rangecmp [-v] start test end\n");
}

int
main(int c, char **v)
{
        int     i, verbose, r1, r2;
        char    *dlocale;

        verbose = 0;
        while ((i = getopt(c, v, "v")) != -1) {
                switch (i) {
                case 'v':
                        verbose = 1; break;
                case '?':
                default:
                        usage();
                        exit(2);
                }
        }
        c -= optind;
        v += optind;

        /* start, test, and end are all required */
        if (c < 3) {
                usage();
                exit(2);
        }

        dlocale = setlocale(LC_ALL, "");
        if (verbose)
                printf("default locale = %s\n", dlocale ? dlocale : "''");
        r1 = strcoll (v[1], v[0]);
        printf("strcoll (%s, %s) -> %d\n", v[1], v[0], r1);
        r2 = strcoll (v[1], v[2]);
        printf("strcoll (%s, %s) -> %d\n", v[1], v[2], r2);

        exit(0);
}
