Re: patch to gnustep-base (Unicode and others)
From: Richard Frith-Macdonald
Subject: Re: patch to gnustep-base (Unicode and others)
Date: Mon, 8 Apr 2002 07:09:12 +0100
On Sunday, April 7, 2002, at 11:15 PM, Serg Stoyan wrote:
> Hello, Richard Frith-Macdonald.
>
> RFM> > Here is a patch to gnustep-base, with additions such as:
> RFM> > - fixes the behaviour of NSString's initWithCString* methods by commenting out
> RFM> > GSString's versions. Without this change the initWithCString* methods don't
> RFM> > convert the C string into Unicode, which is not OpenStep compliant;
> RFM>
> RFM> Perhaps you can explain more ... as far as I can see the above is simply
> RFM> wrong. Certainly initWithCString* methods are not supposed to convert to
> RFM> unicode (as a general rule), and OpenStep doesn't say they should - so
> RFM> I'm guessing you have some meaning in mind that is not immediately
> RFM> obvious to me.
> Here is the citation from "OpenStep Specification" (c) 1994 NeXT Computer
> Inc., Class NSString, page 2-127:
> "- (id)initWithCString:(const char *)byteString
> Initializes the receiver, a newly allocated NSString, by converting the
> one-byte characters in byteString into Unicode characters. byteString must
> be a null-terminated C string in the default C string encoding."
OK ... guess I was wrong about that ... it *does* seem to say strings should be
converted to unicode ... but that's incorrect/misleading documentation.
If you look in the class description documentation, it tells you that -
'While the actual representation of character strings stored in NSString and
NSMutableString is independent of any particular implementation, you can in general
think of the contents of NSString and NSMutableString objects as being, canonically,
Unicode characters (defined by the unichar data type)'
Really, this means that you should not take the method descriptions too literally:
they describe the behaviour of an API, not internal implementation details such as
whether the bytes are actually converted at the moment the string is created.
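To illustrate the API-versus-representation point, here is a minimal C sketch (all names are hypothetical; this is not GNUstep's actual GSString code) of a string object that stores the one-byte C string untouched and only produces Unicode values when a character is requested. It assumes the default C string encoding is ISO-8859-1 (Latin-1), where every byte value equals its Unicode code point; an encoding such as KOI8-R would need a lookup table instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t demo_unichar;  /* stand-in for Foundation's 16-bit unichar */

/* Hypothetical string object: keeps the C string bytes exactly as given. */
typedef struct {
    unsigned char *bytes;  /* raw bytes, never converted at init time */
    size_t length;         /* number of one-byte characters */
} demo_string;

/* Analogue of -initWithCString: copies the bytes, performs no conversion.
   Returns 0 on success, -1 if allocation fails. */
int demo_init_with_cstring(demo_string *s, const char *byteString)
{
    s->length = strlen(byteString);
    s->bytes = malloc(s->length + 1);
    if (s->bytes == NULL)
        return -1;
    memcpy(s->bytes, byteString, s->length + 1);
    return 0;
}

/* Analogue of -characterAtIndex: converts one byte on demand.
   ASSUMPTION: Latin-1, so the byte value is the Unicode code point. */
demo_unichar demo_character_at_index(const demo_string *s, size_t index)
{
    assert(index < s->length);
    return (demo_unichar)s->bytes[index];
}

/* Frees the stored bytes and resets the object. */
void demo_release(demo_string *s)
{
    free(s->bytes);
    s->bytes = NULL;
    s->length = 0;
}
```

Initialisation stays cheap, yet every caller still sees Unicode characters, which is the sense in which the OpenStep wording describes behaviour rather than storage.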
> RFM> > - adds 2 languages to Resources/Languages: Russian and Ukrainian;
> RFM>
> RFM> Thanks, but I can't use them ... as I don't know what encoding you have
> RFM> created them in. I have added a README file to the Resources/Languages
> RFM> subdirectory to say what format language files *should* be in (and
> RFM> corrected some errors in the existing files).
> It's ok. I've just updated from CVS and created these files by cvtenc'ing
> them, just like the README says. But... when I start any app I get this
> message:
> File NSDictionary.m: 458. In [GSDictionary -initWithContentsOfFile:] Contents of file
> '/home/stoyan/GNUstep/System/Libraries/Resources/Languages/Russian'
> does not contain a dictionary
All I can suggest here is making sure you have the latest code installed.
I fixed a bug in loading 16-bit unicode property lists a day or two ago.
> Here are some of my environment vars:
> [stoyan@localhost]$ echo $GNUSTEP_STRING_ENCODING; echo $LANG
> NSKOI8RStringEncoding
> ru_RU.KOI8-R
> I've attached Russian and UkraineRussian (conforming to Locale.aliases)
> files as well.
Thanks, I've added them (I converted them to ascii with \u escapes for consistency
with the other files, but that should make no difference).
> I guess we can use 2 types of language files -- a plain text property list,
> with the encoding in its file name, and a non-printable unicode file. For
> example, in the case of Russian:
> Languages/Russian.KOI8-R         <-- plain proplist in KOI8-R encoding
> Languages/Russian.WindowsCP1251  <-- plain proplist in Windows 1251 encoding
> Languages/Russian                <-- Unicode file, created with 'cvtenc'
Property lists should be ascii ... so I prefer to keep an ascii property list
containing \u escape sequences for non-ascii characters, and to create the other
files temporarily (for editing) using cvtenc.
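A minimal C sketch of producing such an ascii property-list string from 16-bit Unicode code units. It assumes the \uXXXX form mentioned above (a backslash, `u`, and four hex digits per UTF-16 code unit), escapes `"` and `\` with a backslash, and sends control characters through the same \u path; the function name is hypothetical and this is not the code that cvtenc or GNUstep actually uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes `count` UTF-16 code units as a quoted,
   ascii-only property-list string into `out`. Returns the number of
   characters written (excluding the NUL), or 0 if `out` is too small. */
size_t plist_quote_ascii(const uint16_t *units, size_t count,
                         char *out, size_t outsize)
{
    size_t pos = 0;

    /* Worst case is 6 chars per unit ("\uXXXX"), plus 2 quotes and a NUL. */
    if (outsize < count * 6 + 3)
        return 0;

    out[pos++] = '"';
    for (size_t i = 0; i < count; i++) {
        uint16_t c = units[i];
        if (c == '"' || c == '\\') {
            /* Quote and backslash keep their plain backslash escapes. */
            out[pos++] = '\\';
            out[pos++] = (char)c;
        } else if (c < 0x20 || c > 0x7E) {
            /* Non-printable or non-ascii: \u plus four hex digits. */
            pos += (size_t)snprintf(out + pos, outsize - pos, "\\u%04X",
                                    (unsigned)c);
        } else {
            out[pos++] = (char)c;
        }
    }
    out[pos++] = '"';
    out[pos] = '\0';
    return pos;
}
```

Because the output is pure ascii, the file reads the same under any editor or locale setting; an editable KOI8-R or CP1251 copy can then be generated from it when needed.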
> In this case we use the Unicode file, and the proplist files remain for
> editors.
But keeping multiple copies in different formats could let them get out of
sync with each other if you are not careful.
> Or we can use proplist files with the appropriate encoding scheme, if we have
> to use them (no unicode file for some reason).
Property list files are ascii.
Strictly speaking, anything non-ascii is not a legal property-list file, so
while unicode files are also portable, I'd still prefer to stick to ascii files
with \u escape sequences. That is, if we are sticking to one portable format
for consistency, I'd prefer it to be ascii.
> PS: Another thing I've noticed (and I guess it should be somewhere in the
> Documentation) is about using non-ascii characters when initializing an
> NSString variable. I mean that a definition such as:
> NSString *some_string = @"some non-ascii characters";
> is deprecated. In this case the string isn't converted into Unicode and
> the result is unpredictable, or something.
Well, the OpenStep spec simply tells you not to do it (I'd say that's closer to
'illegal' than 'deprecated') in the NSString class description.
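One portable way to follow that rule, sketched here as an assumption rather than documented GNUstep practice: keep source literals ascii-only and spell non-ascii text as explicit 16-bit Unicode code units, which Objective-C code could pass to NSString's -initWithCharacters:length:. The plain C below (and therefore valid Objective-C) shows the code-unit array for the Russian word "Привет" together with a hypothetical helper that checks whether a literal is ascii-only.

```c
#include <stddef.h>
#include <stdint.h>

/* "Привет" (Russian for "hello") as UTF-16 code units.
   In Objective-C this array could be handed to
   [[NSString alloc] initWithCharacters: privet length: PRIVET_LEN]
   so the result no longer depends on the source file's encoding. */
static const uint16_t privet[] = {
    0x041F, 0x0440, 0x0438, 0x0432, 0x0435, 0x0442
};
#define PRIVET_LEN (sizeof privet / sizeof privet[0])

/* Hypothetical check: returns 1 if every byte of the literal is
   7-bit ascii (safe inside @"..."), 0 otherwise. */
int is_ascii_literal(const char *literal)
{
    for (const unsigned char *p = (const unsigned char *)literal;
         *p != '\0'; p++) {
        if (*p > 0x7F)
            return 0;
    }
    return 1;
}
```

Spelling the characters as code units costs readability, but it makes the string's contents independent of whichever encoding the editor and compiler happen to assume for the source file.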
Where do you think this should be documented in GNUstep?