freeride-devel
[FR-devel] FW: i18n (was Re: Andy Roonie)


From: Curt Hibbs
Subject: [FR-devel] FW: i18n (was Re: Andy Roonie)
Date: Fri, 28 Jun 2002 09:57:59 -0700

Vruz, I just wanted to make sure you saw this (and related) posts on the
Ruby ML. It appears that this guy has been making his own mods to the Ruby
core to support I18N.

Curt

-----Original Message-----
From: Benjamin Peterson [mailto:address@hidden]
Sent: Friday, June 28, 2002 9:31 AM
To: ruby-talk ML
Subject: i18n (was Re: Andy Roonie)


>|Like many others, I would be happy to devote a large
>|amount of time to Ruby.  In my particular case it
>|would be to i18n, since I can't use Ruby without it.

>|But in practice, I have no way to find out whether
>|someone in Japan is already making an i18n effort, or
>|whether any changes I made would be accepted, or
>|whether matz has decided what i18n should consist of,
>|so it doesn't really make sense for me to do anything
>|at all.

>You can tell me what you like to see in the future, although I cannot
>promise you anything (yet).  I mean I'd like to hear about the spec,
>not about the implementation.

Well, since you asked for my Christmas list, here it
is!  My wishes for the spec are very similar to those
you stated years ago:

>>>There's only one I18N policy for Ruby.
>>>  It should not cause me trouble handling Japanese.
  ([ruby-talk:02587])

I would simply like to amend it slightly, thus:

>>>It should not cause me trouble handling text.

To meet this spec, I think the following features
would be needed.

* Files in text mode should be read in to provide a
stream of characters, not bytes.  It will sometimes be
necessary to specify the encoding explicitly, but most
common ones can be guessed.  Ruby should NOT stop
reading a file when it comes to a 0x1a character!
*splutter*

* Files in text mode should appear in Ruby as a stream
of characters, and be written out to disk as bytes in
the specified format.

* Consoles and other IO devices are like files in this
respect.  To my Ruby program, it looks like I am just
sending and getting 'characters'.  In the Ruby engine
code, it will be necessary to translate them to
whatever encoding is specified for the
console/port/whatever.

* Strings should be made of characters.  length() should
return character length, each() should iterate by
characters, [4] should get me the 4th character in the
string.  Bytes and encodings are an implementation
detail and I do not want to have to think about them
when I think of a 'string'.

* Regular expressions should work, even if I am
searching for a Hangul character followed by an
accent-independent 'e' in a Chinese document.  They
should operate on characters, not bytes.

* All characters that exist in Unicode plane 0 should
be specifiable, handled identically, handled fast, and
handled in constant time in Ruby.  Other characters
like Unicode surrogates and TRON characters are not
essential; they may require special syntax and slower
processing or may be unsupported totally.

* Source string literals should be able to contain any
Unicode character.  There is no need for source to be
able to be in any arbitrary encoding, though.  UTF-8
would probably be good.

* Finally, although generally I want to think of a
string as just characters, sometimes I need to deal
with software that thinks in terms of bytes and
INSISTS on EUC-KR or ASMO-708 or some other strange
encoding.  For these cases, it would be necessary to
translate a string into a particular encoding like so:

    a = "my string".get_encoded_bytes("EUCKR")
    #  a is now an array of bytes...
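
For comparison, here is a rough sketch of how the string wishes above
might look in a later Ruby (1.9 or newer), where each string carries an
encoding.  This is an assumption about later behaviour, not something
available in the Ruby discussed in this thread.  get_encoded_bytes is
the hypothetical name used above; it is only approximated here with
String#encode followed by String#bytes.

```ruby
# Sketch only: assumes Ruby 1.9+ string semantics.
s = "\u{D55C}e\u{301}"   # a Hangul syllable, then 'e' plus a combining acute

char_count = s.length          # counts characters (code points), not bytes
byte_count = s.bytesize        # size of the underlying UTF-8 bytes
second     = s[1]              # indexing is by character
chars      = s.each_char.to_a  # iteration by character

# Rough stand-in for the hypothetical get_encoded_bytes("EUCKR"):
euckr_bytes = "\u{D55C}\u{AE00}".encode("EUC-KR").bytes
```

Note that length here counts code points, so the 'e' and its combining
accent are still two separate items; treating them as one user-visible
character would need extra handling.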

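The file and regular expression wishes above can be sketched in a
similar way.  Again this assumes a later Ruby (1.9 or newer), where
File.open accepts an "external:internal" encoding pair in the mode
string; the file name demo.txt is made up for the example.

```ruby
require "tmpdir"

# Sketch only: assumes Ruby 1.9+ IO encoding support.
original  = "\u{D55C}\u{AE00}"   # two Hangul syllables
text      = nil
disk_size = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "demo.txt")   # hypothetical file name

  # The program writes characters; Ruby stores them as EUC-KR bytes.
  File.open(path, "w:EUC-KR") { |f| f.write(original) }
  disk_size = File.size(path)

  # The bytes on disk are EUC-KR; the program sees UTF-8 characters.
  text = File.open(path, "r:EUC-KR:UTF-8") { |f| f.read }
end

# The regular expression matches characters, not bytes.
match = text =~ /\p{Hangul}{2}/
```

The encoding is given explicitly here; nothing in this sketch guesses
it from the file contents.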

*pauses for breath*

I would of course be willing to work on any of these
things if there were a plan.

>For your information, you can get and
>see my experimental M17N implementation from the CVS
>ruby_m17n branch.

I know, but I figured something must have changed
since then, even if there is no physical expression of
it in CVS.

Speaking of things in CVS, though, I should
congratulate Kosako-san on providing a non-GNU regular
expression library and thus removing a painful
licensing issue.  Ah, how wonderful Oniguruma is!  How
yet more wonderful it could be if it worked on wide
chars!

Benjamin
