Re: Internationalising strings

From: John Darrington
Subject: Re: Internationalising strings
Date: Tue, 13 Jun 2006 06:19:16 +0800
User-agent: Mutt/1.5.9i

On Mon, Jun 12, 2006 at 09:36:51AM -0700, Ben Pfaff wrote:
     John Darrington <address@hidden> writes:
     > On Sat, Jun 10, 2006 at 02:42:36PM -0700, Ben Pfaff wrote:
     >      > 2.  Obviously macros like CC_ALNUM are only correct for the C locale.
     >      >     Not a problem so long as everyone's aware of it, but naive
     >      >     programmers might make some mistakes ...
     >      I'm aware of the problem and trying to think of a good solution.
     > I've been thinking a bit about it too.  In the case of parsing input
     > syntax, I think the only solution is to convert the syntax to 
     > (wchar_t *) using mbstowcs before doing anything with it. 
     > Thus, functions like
     >   bool lex_is_id1(char c);  from data/identifier.c 
     > become
     >   bool lex_is_id1(wchar_t c);
     > Testing for alphanumeric characters is then a matter of calling 
     > iswalnum from wctype.h.
     It's one option.  I'm not sure whether it's the best option.
     (Why do you think it is the only option?)

Well, perhaps it's not the only solution.  It was the only one that I'd
considered in much detail.
     I'm considering implementing something like mb[u]iter.[ch] from
     gnulib in the struct string implementation.  I like that code and
     that idea.  It abstracts away the nastiness of multibyte strings
     pretty well.

That would also be an option, and might be the most appropriate one
for new implementations (such as inside str.c).
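
To make the iterator idea concrete, here is a minimal sketch in plain
standard C.  It is not gnulib's mbuiter API: it uses mbrtowc from
wchar.h, and the names mb_cursor, mb_cursor_init, mb_cursor_next and
count_alnum_mb are hypothetical, invented for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical cursor over a NUL-terminated multibyte string.
   Not gnulib's mbuiter; just the same idea built on mbrtowc. */
struct mb_cursor
  {
    const char *p;      /* Next undecoded byte. */
    const char *end;    /* The terminating NUL. */
    mbstate_t state;    /* Conversion state for mbrtowc. */
  };

void
mb_cursor_init (struct mb_cursor *it, const char *s)
{
  assert (it != NULL && s != NULL);
  it->p = s;
  it->end = s + strlen (s);
  memset (&it->state, 0, sizeof it->state);   /* Initial shift state. */
}

/* Decodes the next character into *OUT and advances the cursor.
   Returns 1 on success, 0 at end of string, -1 on an invalid or
   incomplete multibyte sequence. */
int
mb_cursor_next (struct mb_cursor *it, wchar_t *out)
{
  if (it->p == it->end)
    return 0;

  size_t n = mbrtowc (out, it->p, (size_t) (it->end - it->p), &it->state);
  /* (size_t) -1 is an invalid sequence and (size_t) -2 an incomplete one.
     0 would mean a NUL before END, which strlen rules out, so it is
     treated as an error too. */
  if (n == (size_t) -1 || n == (size_t) -2 || n == 0)
    return -1;

  it->p += n;
  return 1;
}

/* Counts alphanumeric characters in S, or returns -1 if S is not a
   valid multibyte string in the current locale. */
long
count_alnum_mb (const char *s)
{
  struct mb_cursor it;
  wchar_t wc;
  long count = 0;
  int status;

  mb_cursor_init (&it, s);
  while ((status = mb_cursor_next (&it, &wc)) == 1)
    if (iswalnum ((wint_t) wc))
      count++;
  return status == 0 ? count : -1;
}
```

The caller never indexes or increments the char pointer directly, which
is the point of the abstraction.  A real program would call
setlocale (LC_ALL, "") first so that mbrtowc decodes the user's
encoding rather than that of the C locale.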

But I think that converting existing arrays of char into arrays of
wchar_t might be easier to apply to existing code.  The problem with
multi-byte sequences is that one cannot index into them, or use
pointer increments to iterate through them.  A few calls to mbstowcs
and some well-chosen replacements of "char" with "wchar_t" might do a
lot of the work for us.
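
A rough sketch of what that conversion could look like, assuming the
locale has already been set up.  The helpers to_wide, is_id1_wide and
count_alnum_wide are hypothetical names, and is_id1_wide is only a
simplified stand-in; it does not reproduce the real rules in
data/identifier.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: converts the NUL-terminated multibyte string MBS
   to a newly allocated wide string, using the current locale.
   Returns NULL on an invalid sequence or allocation failure.
   The caller frees the result. */
wchar_t *
to_wide (const char *mbs)
{
  assert (mbs != NULL);

  /* A wide string never has more characters than the multibyte string
     has bytes, so strlen + 1 elements always leave room for the
     terminator. */
  size_t cap = strlen (mbs) + 1;
  wchar_t *ws = malloc (cap * sizeof *ws);
  if (ws == NULL)
    return NULL;

  if (mbstowcs (ws, mbs, cap) == (size_t) -1)
    {
      free (ws);
      return NULL;
    }
  return ws;
}

/* Simplified stand-in for a wide-character lex_is_id1: letters only. */
bool
is_id1_wide (wchar_t c)
{
  return iswalpha ((wint_t) c) != 0;
}

/* Counts alphanumeric characters in MBS using ordinary array indexing,
   which is safe once the text is an array of wchar_t.
   Returns -1 if the conversion fails. */
long
count_alnum_wide (const char *mbs)
{
  wchar_t *ws = to_wide (mbs);
  if (ws == NULL)
    return -1;

  long count = 0;
  for (size_t i = 0; ws[i] != L'\0'; i++)
    if (iswalnum ((wint_t) ws[i]))
      count++;

  free (ws);
  return count;
}
```

After the one conversion call, the existing loops that index or
increment a pointer keep their shape; only the element type and the
ctype calls change.  The cost is an extra allocation and a copy of the
text.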

Now if only we'd used C++, then we could simply overload the *
operator with a call to mbui_cur, and the ++ operator with
mbui_advance ...


PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See any PGP keyserver for public key.