Re: Text collation
From: Kevin Ryde
Subject: Re: Text collation
Date: Thu, 30 Nov 2006 10:08:03 +1100
User-agent: Gnus/5.110006 (No Gnus v0.6) Emacs/21.4 (gnu/linux)
Ludovic Courtès writes:
>
> I have come up with an `(ice-9 i18n)' module that contains
> locale-dependent text collation functions and also character case
> mapping and functions to read numbers. There would be a lot more things
> to add, like `strfmon ()', but I think that's a good start.
I would worry that r6rs may address these things too, leaving
guile-specifics as, well, a dead-end. Though I can see this stuff is
of use now.
Myself I've been using a couple of bits from localeconv and
nl_langinfo. Some way to get at that would be a good addition (though
hopefully in a cleaner way than the C level).
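For reference, a minimal sketch of the C-level calls in question. The
helper names are made up for illustration and are not part of Guile or
of the patch; only localeconv() and nl_langinfo() are real.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the current locale's decimal point into BUF.
   localeconv() returns a pointer to static storage, so the string is
   copied out before anything else can overwrite it.  */
static void
current_decimal_point (char *buf, size_t size)
{
  const struct lconv *lc = localeconv ();

  assert (buf != NULL && size > 0);
  snprintf (buf, size, "%s", lc->decimal_point);
}

/* Hypothetical helper: copy the current locale's codeset name into BUF.
   nl_langinfo() also returns a pointer to static storage.  */
static void
current_codeset (char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  snprintf (buf, size, "%s", nl_langinfo (CODESET));
}
```

Both helpers read whatever locale the last setlocale() installed, which
is the awkward part a Scheme-level interface could hide.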
> [0] http://sources.redhat.com/ml/libc-alpha/2006-09/msg00033.html
You could stick that link and perhaps the tllocale.ps.gz one in i18n.c
for reference, since it's not in the glibc manual.
> address@hidden The ice-9 i18n Module
See if you can think of a better section name.
> address@hidden {Scheme Procedure} make-locale category_mask locale_name
> [base_locale]
> ...
> +A @code{system-error} exception (@pxref{Handling Errors}) is raised by
> address@hidden when @var{locale_name} does not match any of the
> +locales compiled on the system.
This bit could be moved to earlier in the description. And perhaps
something non-committal like "locale_name must be known to the
system".
> address@hidden {Scheme Procedure} string-locale<? s1 s2 [locale]
> address@hidden {Scheme Procedure} string-locale>? s1 s2 [locale]
> address@hidden {Scheme Procedure} string-locale-ci<? s1 s2 [locale]
> address@hidden {Scheme Procedure} string-locale-ci>? s1 s2 [locale]
> address@hidden {Scheme Procedure} string-locale-ci=? s1 s2 [locale]
These could be described in one block I think, to avoid five very
similar descriptions. Likewise the char ones.
> +... Note that SRFI-13 provides procedures that
> +look similar (@pxref{Alphabetic Case Mapping}). However, the SRFI-13
> +procedures are locale-independent.
That's the intention of the srfi I guess, but it's not true currently,
is it? Don't they use toupper() and therefore get whatever nonsense
the current setlocale() gives? Perhaps better to leave the description
of srfi-13 to that section.
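To make the distinction concrete, here is a sketch of what a genuinely
locale-independent upcase looks like in C, as opposed to toupper(),
whose result depends on the LC_CTYPE category set by setlocale(). The
function names are hypothetical and the code assumes an ASCII-compatible
execution character set; it is not what Guile's srfi-13 code does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: upcase ASCII letters only.  Unlike toupper(),
   the result never depends on setlocale().  Assumes ASCII, where the
   letters 'a'..'z' are contiguous.  */
static int
ascii_toupper (int c)
{
  return (c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c;
}

/* Upcase the ASCII letters of S in place; all other bytes, including
   those of multibyte sequences, are left untouched.  */
static void
ascii_string_upcase (char *s)
{
  assert (s != NULL);
  for (; *s != '\0'; s++)
    *s = (char) ascii_toupper ((unsigned char) *s);
}
```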
> address@hidden {Scheme Procedure} string-locale-upcase str [locale]
> address@hidden {Scheme Procedure} string-locale-downcase str [locale]
Do you need a caveat about multibyte characters there, for now? Like
"Note that in the current implementation Guile has no notion of
multibyte characters and in a multibyte locale characters may not be
converted correctly."
> address@hidden {Scheme Procedure} locale-string->integer str [base [locale]]
> address@hidden {Scheme Procedure} locale-string->inexact str [locale]
I think you should cross-reference strtol and strtod here, since their
parsing is rather idiosyncratic. I'd even be a bit tempted to name
them strtol and strtod in guile, to make it clear they're only one
possible way of parsing. Except those names aren't very nice ...
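As an illustration of the idiosyncrasies, a small sketch around
strtol(). The wrapper name parse_long is hypothetical; the behaviour
shown (skipping leading whitespace, stopping silently at the first
unparsable character, prefix detection with base 0) is standard
strtol().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper: parse a long from STR in BASE and store the
   number of characters consumed in *USED.  When no digits are found,
   strtol() returns 0 and consumes nothing, so *USED is 0.  */
static long
parse_long (const char *str, int base, size_t *used)
{
  char *end;
  long value;

  assert (str != NULL && used != NULL);
  value = strtol (str, &end, base);
  *used = (size_t) (end - str);
  return value;
}
```

For instance "  42abc" in base 10 gives 42 with four characters
consumed, and "010" in base 0 is read as octal 8, neither of which a
caller would necessarily guess.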
> +... Return two values:
Consider @pxref{Multiple Values}, since multi-values are (thankfully)
fairly rare.
> @c Local Variables:
> @c TeX-master: "guile.texi"
> address@hidden ispell-local-dictionary: "american"
Best leave that out please, it'll only annoy those of us who don't
have that dictionary installed.
> +Note that @code{setlocale} affects locale settings for the whole
> +process. For a safer, thread-safe and reentrant alternative,
Go easy on the advertising! :)
> - scmconfig.h.top gettext.h
> + scmconfig.h.top libgettext.h
I don't think that's good. Best leave the name gettext.h to gettext's
own header, and use another name for guile's. Gettext got there first,
and it doesn't really matter which guile header has which prototypes.
> +/* This mutex is used to serialize invocations of `setlocale ()' on non-GNU
> + systems (i.e., systems where a reentrant locale API is not available).
> + See `i18n.c' for details. */
> +scm_i_pthread_mutex_t scm_i_locale_mutex;
There's an scm_i_misc_mutex for use when protection is (or should be)
rarely needed.
> +++ mod/libguile/i18n.c
> +
> +#ifndef USE_GNU_LOCALE_API
> +# include "libguile/posix.h" /* for `scm_i_locale_mutex' */
> +#endif
No need to conditionalize that, it's ok if it's only used sometimes,
it does no harm.
> +/* Provide the locale category masks as found in glibc (copied from
> + <locale.h> as found in glibc 2.3.6). This must be kept in sync with
> + `locale-categories.h'. */
> +# define LC_CTYPE_MASK (1 << LC_CTYPE)
> +# define LC_COLLATE_MASK (1 << LC_COLLATE)
> +# define LC_MESSAGES_MASK (1 << LC_MESSAGES)
> +# define LC_MONETARY_MASK (1 << LC_MONETARY)
> +# define LC_NUMERIC_MASK (1 << LC_NUMERIC)
> +# define LC_TIME_MASK (1 << LC_TIME)
I think you should put some privately selected bits there, not depend
on LC_CTYPE etc being in range 0 to 31.
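A sketch of what privately selected bits might look like, with an
explicit mapping back to the system's LC_* values. All MY_* names and
the mapping function are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical private mask bits, independent of the numeric values
   the system happens to give LC_CTYPE etc.  */
enum
{
  MY_LC_COLLATE_MASK  = 1 << 0,
  MY_LC_CTYPE_MASK    = 1 << 1,
  MY_LC_MESSAGES_MASK = 1 << 2,
  MY_LC_MONETARY_MASK = 1 << 3,
  MY_LC_NUMERIC_MASK  = 1 << 4,
  MY_LC_TIME_MASK     = 1 << 5
};

/* Store in *CATEGORY the LC_* value for a single mask BIT.  Return 1
   on success, 0 if BIT is not a known single bit (or the category does
   not exist on this system).  */
static int
mask_bit_to_category (int bit, int *category)
{
  assert (category != NULL);
  switch (bit)
    {
    case MY_LC_COLLATE_MASK:  *category = LC_COLLATE;  return 1;
    case MY_LC_CTYPE_MASK:    *category = LC_CTYPE;    return 1;
#ifdef LC_MESSAGES   /* POSIX, not ISO C */
    case MY_LC_MESSAGES_MASK: *category = LC_MESSAGES; return 1;
#endif
    case MY_LC_MONETARY_MASK: *category = LC_MONETARY; return 1;
    case MY_LC_NUMERIC_MASK:  *category = LC_NUMERIC;  return 1;
    case MY_LC_TIME_MASK:     *category = LC_TIME;     return 1;
    default:                  return 0;
    }
}
```

The emulation would then loop over the bits of a mask and call
setlocale() with each mapped category, without assuming anything about
the range of the LC_* constants.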
> +/* Alias for glibc's locale type. */
> +typedef locale_t scm_t_locale;
I suppose the emulation could provide locale_t. Might make it hard to
exercise on an actual gnu system. A #define locale_t would likely be
ok.
> +SCM_DEFINE (scm_locale_p, "locale?", 1, 0, 0,
> ...
> + if (SCM_SMOB_PREDICATE (scm_tc16_locale_smob_type, obj))
> + return SCM_BOOL_T;
> + return SCM_BOOL_F;
scm_from_bool perhaps.
> +#ifdef USE_GNU_LOCALE_API
> + freelocale ((locale_t)c_locale);
> +#else
> + c_locale->base_locale = SCM_UNDEFINED;
> + free (c_locale->locale_name);
> + scm_gc_free (c_locale, sizeof (* c_locale), "locale");
> +#endif
A possibility there, and with other funcs, would be to implement a
compatible freelocale(), instead of sticking conditionals in each
usage.
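Roughly what that could look like, as a simplified sketch: the struct
and the emu_* names are hypothetical, and plain malloc/free stand in for
the scm_gc_* allocation and the base_locale field of the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for locale_t on systems without newlocale().  */
typedef struct emu_locale
{
  int category_mask;
  char *locale_name;
} *emu_locale_t;

/* Allocate an emulated locale object holding a copy of LOCALE_NAME.
   Return NULL if memory is exhausted.  */
static emu_locale_t
emu_newlocale (int category_mask, const char *locale_name)
{
  emu_locale_t loc;
  size_t len;

  assert (locale_name != NULL);
  loc = malloc (sizeof *loc);
  if (loc == NULL)
    return NULL;
  len = strlen (locale_name) + 1;
  loc->locale_name = malloc (len);
  if (loc->locale_name == NULL)
    {
      free (loc);
      return NULL;
    }
  memcpy (loc->locale_name, locale_name, len);
  loc->category_mask = category_mask;
  return loc;
}

/* Same calling convention as glibc's freelocale(), so call sites need
   no #ifdef.  Accepting NULL is an extra convenience of this sketch.  */
static void
emu_freelocale (emu_locale_t loc)
{
  if (loc == NULL)
    return;
  free (loc->locale_name);
  free (loc);
}
```

On non-GNU systems a single "#define freelocale emu_freelocale" (and
likewise for locale_t) in one header would then let the smob free
function call freelocale() unconditionally.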
> +#ifdef USE_GNU_LOCALE_API
> +
> + c_locale = newlocale (c_category_mask, c_locale_name, c_base_locale);
> + if (!c_locale)
> + locale = SCM_BOOL_F;
Your docs call for an exception on unknown locale don't they?
And should you tell the gc something about the size of a locale_t, and
perhaps extra for its underlying data? To approximate memory used,
for the gc triggers.
> +void
> +scm_init_i18n ()
> +{
> + scm_add_feature ("ice-9-i18n");
Is there any point adding a feature after the module is loaded? :)
I expect a better name would be possible too.
> +(define (under-french-locale-or-unresolved thunk)
> + ;; On non-GNU systems, an exception may be raised only when the locale is
> + ;; actually used rather than at `make-locale'-time. Thus, we must guard
> + ;; against both.
> + (if %french-locale
> + (catch 'system-error thunk
> + (lambda (key . args)
> + (throw 'unresolved)))
> + (throw 'unresolved)))
Do you mean 'unsupported rather than 'unresolved, when fr_FR isn't
available from the system?
> +(with-test-prefix "number parsing"
Some french number parsing too? Just to show there's a point to
locale dependent parsing :).