Re: patch to gnustep-base (Unicode and others)

bug-gnustep


From:	Richard Frith-Macdonald
Subject:	Re: patch to gnustep-base (Unicode and others)
Date:	Mon, 8 Apr 2002 07:09:12 +0100

On Sunday, April 7, 2002, at 11:15 PM, Serg Stoyan wrote:

Hello, Richard Frith-Macdonald.

 RFM> > Here is a patch to the gnustep-base, with additions such as:
 RFM> > - fixes NSString's initWithCString* methods behaviour by commenting out
 RFM> >   GSString's. Without it initWithCString* methods don't convert C
 RFM> >   string into Unicode and this is not OpenStep compliant;
 RFM>
 RFM> Perhaps you can explain more ... as far as I can see the above is simply
 RFM> wrong. Certainly initWithCString* methods are not supposed to convert to
 RFM> unicode (as a general rule), and OpenStep doesn't say they should - so
 RFM> I'm guessing you have some meaning in mind that is not immediately
 RFM> obvious to me.
  Here is the citation from "OpenStep Specification" (c) 1994 NeXT Computer
  Inc. Class NSString, page 2-127:
  "- (id)initWithCString:(const char *)byteString
  Initializes the receiver, a newly allocated NSString, by converting the
  one-byte characters in byteString into Unicode characters. byteString must
  be a null-terminated C string in the default C string encoding."

OK ... guess I was wrong about that ... it *does* seem to say strings should be
converted to unicode ... but that's incorrect/misleading documentation.

If you look in the class description documentation, it tells you that -

'While the actual representation of character strings stored in NSString and
NSMutableString is independent of any particular implementation, you can in
general think of the contents of NSString and NSMutableString objects as being,
canonically, Unicode characters (defined by the unichar data type)'

Really, this means that you should not take the method descriptions too
literally; they are describing an API, not particular internal implementation
details.
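
To make the API-versus-representation point concrete, here is a minimal sketch in plain C (which also compiles as Objective-C). Everything in it is hypothetical: `sketch_string`, `sketch_from_cstring`, `sketch_from_unichars` and `sketch_char_at` are illustration names, not gnustep-base code, and the one-byte encoding is assumed to be Latin-1 so that a byte value equals its Unicode code point. The real default C string encoding may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned short unichar;   /* 16-bit unit, named after the OpenStep type */

/* Hypothetical string type; NOT the real GSString layout. */
typedef struct {
    int wide;                     /* 0: one-byte storage, 1: unichar storage */
    size_t length;                /* number of characters */
    const unsigned char *bytes;   /* used when wide == 0 */
    const unichar *units;         /* used when wide == 1 */
} sketch_string;

/* Keeps the C string as bytes; nothing is converted at init time. */
sketch_string sketch_from_cstring(const char *s)
{
    sketch_string r = { 0, strlen(s), (const unsigned char *)s, NULL };
    return r;
}

/* Keeps the caller's 16-bit units as they are. */
sketch_string sketch_from_unichars(const unichar *u, size_t n)
{
    sketch_string r = { 1, n, NULL, u };
    return r;
}

/* The accessor always answers in unichars, whatever the storage is.
 * Assumption: one-byte storage is Latin-1, so widening the byte is enough. */
unichar sketch_char_at(const sketch_string *s, size_t i)
{
    assert(i < s->length);
    return s->wide ? s->units[i] : (unichar)s->bytes[i];
}
```

A caller that only uses `sketch_char_at` cannot tell which storage was chosen, which is roughly what "canonically Unicode" means in the class description.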

 RFM> > - adds 2 languages into Resources/Languages: Russian and Ukrainian;
 RFM>
 RFM> Thanks, but I can't use them ... as I don't know what encoding you have
 RFM> created them in. I have added a README file to the Resources/Languages
 RFM> subdirectory to say what format language files *should* be in (and
 RFM> corrected some errors in the existing files).
  It's ok. I've just updated from CVS and created these files by cvtenc'ing
  them, just like the README says. But... when I start any app I get this
  message:

  File NSDictionary.m: 458. In [GSDictionary -initWithContentsOfFile:] Contents of file '/home/stoyan/GNUstep/System/Libraries/Resources/Languages/Russian' does not contain a dictionary


All I can suggest here is making sure you have the latest code installed.
I fixed a bug in loading 16-bit unicode property lists a day or two ago.

  Here are some of my environment vars:

  [stoyan@localhost]$ echo $GNUSTEP_STRING_ENCODING; echo $LANG
  NSKOI8RStringEncoding
  ru_RU.KOI8-R

  I've attached Russian and UkraineRussian (conforming to Locale.aliases)
  files as well.

Thanks, I've added them (I converted to ascii with \u escapes for consistency
with the other files, but that should make no difference).

  I guess we can use 2 types of language files -- plain text property list,
  with encoding in its file name and non-printable unicode file. For example,
  in case of russian:

  Languages/Russian.KOI8-R        <-- plain proplist in KOI8-R encoding
  Languages/Russian.WindowsCP1251 <-- plain proplist in Windows 1251 encoding
  Languages/Russian               <-- Unicode file, created with 'cvtenc'

Property lists should be ascii ... so I prefer to keep an ascii property list
containing \u escape sequences for non-ascii characters, and create the other
files temporarily (for editing) using cvtenc.

  In this case we use the Unicode file, and the proplist files remain for
  editors.

But keeping multiple copies in different formats could let them get out of
sync with each other if you are not careful.

  Or we can use proplist files with appropriate encoding scheme, if we have
  to use it (no unicode file for some reason).


Property list files are ascii.

Strictly speaking, anything non-ascii is not a legal property-list file, so
while unicode files are also portable, I'd still prefer to stick to ascii files
with \u escape sequences. That is, if we are sticking to one portable format
for consistency, I'd prefer it to be the ascii.
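
As a rough illustration of the "ascii with \u escape sequences" form, the sketch below turns 16-bit characters into 7-bit text, in plain C (which also compiles as Objective-C). It is not how cvtenc or gnustep-base does it: `escape_to_ascii` is a hypothetical helper, and the four-hex-digit lowercase `\uXXXX` spelling is an assumption about the escape format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef unsigned short unichar;   /* 16-bit unit, named after the OpenStep type */

/* Hypothetical helper, not part of gnustep-base.
 * Copies `len` unichars from `src` into `dst` as 7-bit ASCII. Printable
 * ASCII is copied unchanged; everything else (and the backslash itself,
 * so the output stays unambiguous) becomes a six-character \uXXXX escape.
 * Returns the number of characters written, not counting the final NUL,
 * or (size_t)-1 if `dst` (capacity `cap` bytes) is too small. */
size_t escape_to_ascii(const unichar *src, size_t len, char *dst, size_t cap)
{
    size_t used = 0;

    assert(dst != NULL && cap > 0);
    for (size_t i = 0; i < len; i++) {
        unichar u = src[i];

        if (u >= 0x20 && u <= 0x7e && u != '\\') {
            if (used + 1 >= cap) return (size_t)-1;   /* char + NUL */
            dst[used++] = (char)u;
        } else {
            if (used + 6 >= cap) return (size_t)-1;   /* escape + NUL */
            snprintf(dst + used, 7, "\\u%04x", (unsigned)u);
            used += 6;
        }
    }
    dst[used] = '\0';
    return used;
}
```

For example, the two Cyrillic characters U+0414 and U+0430 followed by `!` come out as the 13 ASCII characters `\u0414\u0430!`, which any editor or mail system can carry regardless of its encoding.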

  PS: Another thing I've mentioned (and I guess it should be somewhere in the
  Documentation) is about using non-ascii characters when initializing an
  NSString variable. I mean that using a definition such as:

  NSString  *some_string = @"some non-ascii characters";

  is deprecated. In this case the string is not converted into Unicode and
  the result is unpredictable, or something.

Well, OpenStep spec simply tells you not to do it (I'd say that's closer to
'illegal' than 'deprecated') in the NSString class description.
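
One way to catch such literals early is to scan the bytes for anything outside 7-bit ASCII. The plain C sketch below (also valid Objective-C) only illustrates that check; `first_non_ascii` is a hypothetical helper, and nothing in gnustep-base or the compiler is claimed to work this way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte in `s` that is
 * outside 7-bit ASCII, or -1 if the whole string is plain ASCII. */
long first_non_ascii(const char *s)
{
    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        if ((unsigned char)s[i] > 0x7f) {
            return (long)i;
        }
    }
    return -1;
}
```

A byte above 0x7f has no meaning until an encoding is chosen, which is why the result of putting one in a string constant depends on the environment.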

Where do you think this should be documented in GNUstep ?


