Re: make check fails if no en_US.iso88591 locale
From: Ludovic Courtès
Subject: Re: make check fails if no en_US.iso88591 locale
Date: Thu, 10 Sep 2009 17:33:02 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)
Mike Gran <address@hidden> writes:
> I could fix the test by testing only characters 0 to 127 in a C locale
> if a Latin-1 locale can't be found.
Yes, that'd be nice.
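The fallback Mike describes could look roughly like the sketch below. It is written in C only to illustrate the logic; the real test is in Scheme. The function name `select_test_range` and the list of candidate locale names are assumptions made for the example, since locale naming differs between systems.

```c
#include <locale.h>
#include <stddef.h>

/* Candidate Latin-1 locale names.  These are assumptions for the
   sketch; the names actually installed vary from system to system.  */
static const char *const latin1_candidates[] = {
  "en_US.iso88591", "en_US.ISO-8859-1", "en_US.ISO8859-1"
};

/* Install the first Latin-1 locale that is available and return the
   highest character code the test should cover: 255 when a Latin-1
   locale was found, 127 after falling back to the C locale.  */
static int
select_test_range (void)
{
  size_t n = sizeof latin1_candidates / sizeof latin1_candidates[0];
  size_t i;

  for (i = 0; i < n; i++)
    if (setlocale (LC_ALL, latin1_candidates[i]) != NULL)
      return 255;

  /* No Latin-1 locale: restrict the test to ASCII in the C locale.  */
  setlocale (LC_ALL, "C");
  return 127;
}
```

A test driver would call `select_test_range ()` once and loop over character codes from 0 up to the returned value.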
> I can also fix the test by using the 'setbinary' function
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (help setbinary)
`setbinary' is a primitive procedure in the (guile) module.
 -- Scheme Procedure: setbinary
     Sets the encoding for the current input, output, and error ports
     to ISO-8859-1.  That character encoding allows ports to operate on
     binary data.

     It also sets the default encoding for newly created ports to
     ISO-8859-1.

     The previous default encoding for new ports is returned
--8<---------------cut here---------------end--------------->8---
It seems to do a lot of things, which aren't clear from the name. ;-)
What can be done about it?
At least it should be renamed, to `set-port-binary-mode!' or similar.
Then it'd be nice if that functionality could be split in several
functions, some operating on a per-port basis. After all, one can
already do:
  (for-each (lambda (p)
              (set-port-encoding! p "ISO-8859-1"))
            (list (current-input-port) (current-output-port)
                  (current-error-port)))

So we just lack:

  ;; encoding for newly created ports
  (set-default-port-encoding! "ISO-8859-1")
With that `setbinary' can be implemented in Scheme.
> to force the encodings on stdin and stdout to a default value that
> will pass through binary data, instead of calling 'setlocale'.
Hmm, I think I'd still prefer `setlocale'.
regexec(3) doesn't say anything about the string encoding. Do libc
implementations actually expect plain ASCII or Latin-1? Or do they
adapt to the current locale's encoding?
> I looked in the POSIX spec on Regex for specific advice using 128-255 in
> regex in the C locale. I didn't see anything offhand. The spec does
> spend a lot of time talking about the interaction between the locale and
> regular expressions. I get the impression from the spec that using
> regex on 128-255 in the C locale is an unexpected use of regular
> expressions.
http://www.opengroup.org/onlinepubs/9699919799/functions/regexec.html
reads:
  If, when regexec() is called, the locale is different from when the
  regular expression was compiled, the result is undefined.
It makes me think that, if a process runs with a UTF-8 locale and passes
raw UTF-8 bytes to regcomp(3) and regexec(3), it may work.
Hmm, the program below, with UTF-8-encoded source, works both with a
Latin-1 and a UTF-8 locale:
  #include <stdlib.h>
  #include <regex.h>
  #include <locale.h>

  int
  main (int argc, char *argv[])
  {
    regex_t rx;
    regmatch_t match;

    setlocale (LC_ALL, "fr_FR.utf8");
    regcomp (&rx, "ça", REG_EXTENDED);

    return regexec (&rx, "ça va ?", 1, &match, 0) == 0
      ? EXIT_SUCCESS : EXIT_FAILURE;
  }
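The program above ignores the return values of setlocale(3) and regcomp(3), so on a machine without the requested locale it silently runs in the C locale. A more defensive variant might look like the sketch below. The helper name `match_in_locale` and its return codes are invented for the example. It compiles and executes the pattern under the same locale, which is what the POSIX text quoted above requires.

```c
#include <locale.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: switch to LOCALE, compile PATTERN as an
   extended regex and run it against SUBJECT.  Both regcomp and
   regexec run after the single setlocale call, so they see the same
   locale.  Note that the process-wide locale is left changed.

   Returns 1 on a match, 0 on no match, -1 if the locale is not
   available, -2 if the pattern does not compile.  */
static int
match_in_locale (const char *locale, const char *pattern,
                 const char *subject)
{
  regex_t rx;
  int status;

  if (setlocale (LC_ALL, locale) == NULL)
    return -1;

  /* REG_NOSUB: only a yes/no answer is needed, no match offsets.  */
  if (regcomp (&rx, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -2;

  status = regexec (&rx, subject, 0, NULL, 0);
  regfree (&rx);

  return status == 0 ? 1 : 0;
}
```

To repeat the experiment independently of the source file's encoding, the UTF-8 bytes can be spelled out, for instance `match_in_locale ("fr_FR.utf8", "\xc3\xa7" "a", "\xc3\xa7" "a va ?")`. The string literal is split so that the following `a` is not parsed as part of the hex escape.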
Do you think it would work to just leave `regexp.test' as it is in 1.8?
Thanks,
Ludo'.