Re: Merging CoreBase into Base
From: Richard Frith-Macdonald
Subject: Re: Merging CoreBase into Base
Date: Mon, 12 Aug 2013 17:18:37 +0100
On 12 Aug 2013, at 16:56, Stefan Bidi <address@hidden> wrote:
> There are a couple of reasons to use UTF-16:
> (1) The CF/Foundation APIs assume UTF-16. CFStringGetCharacterAtIndex() and
> CFStringGetCharacters() would be extremely inefficient for anything that
> isn't either ASCII, Latin1 or UTF-16. Just look at what base has to do to
> support UTF-8. It traverses through the whole string every time you call
> -characterAtIndex:.
> (2) Almost all ICU APIs use UTF-16.
>
> To address your concern about endianness, I don't think this is a problem at
> all. The API to the outside world is still the same. We store all strings
> in the host endianness and export them with the BOM if
> isExternalRepresentation is specified.
>
> I can't use libc functions on almost anything except the most basic string
> functions. Not even printf can be used because of the %@ specifier.
I would say that CoreBase should do the same as base here: hold most strings
as Latin-1 to keep them small, and expand to UTF-16 only when required.
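
To make the idea concrete, here is a rough sketch in C of a string that stays in Latin-1 until something needs UTF-16. The type and function names (SketchString, sketch_char_at, sketch_widen) are made up for illustration and are not the layout that base or CoreBase actually uses. The point is only that a Latin-1 byte has the same value as its UTF-16 code unit, so indexed access is constant time in both forms and widening is a simple copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical string type: NOT the real base or CoreBase layout. */
typedef struct {
    int    is_wide;   /* 0: Latin-1 bytes, 1: UTF-16 code units */
    size_t length;    /* number of characters (code units) */
    union {
        uint8_t  *narrow;   /* owned Latin-1 buffer */
        uint16_t *wide;     /* owned UTF-16 buffer, host byte order */
    } data;
} SketchString;

/* O(1) in either representation: each Latin-1 byte maps directly to
 * the UTF-16 code unit with the same numeric value. */
static uint16_t sketch_char_at(const SketchString *s, size_t i)
{
    assert(i < s->length);
    return s->is_wide ? s->data.wide[i] : (uint16_t)s->data.narrow[i];
}

/* Expand the string to UTF-16 in place. Returns 0 on success,
 * -1 if memory could not be allocated (string is left unchanged). */
static int sketch_widen(SketchString *s)
{
    if (s->is_wide)
        return 0;
    uint16_t *w = malloc((s->length ? s->length : 1) * sizeof *w);
    if (w == NULL)
        return -1;
    for (size_t i = 0; i < s->length; i++)
        w[i] = s->data.narrow[i];   /* zero-extend each byte */
    free(s->data.narrow);
    s->data.wide = w;
    s->is_wide = 1;
    return 0;
}
```

A string would only be widened when a character outside Latin-1 has to be stored, or when a caller needs a contiguous UTF-16 buffer (for example to hand to an ICU function).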
I think your worry about UTF-8 is needless. The slow code path is used only for
literal strings, which are almost always so short that the occasional call that
has to step through them is not a performance issue.
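
For reference, this is roughly what "stepping through" a UTF-8 string to answer an indexed lookup looks like. It is a sketch only, not the code in base: the function name utf8_unit_at is invented, and it assumes well-formed UTF-8 input with no validation. The cost is linear in the index, which matters little for a literal of a few dozen bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the UTF-16 code unit at position `index`
 * of a UTF-8 buffer, or -1 if the index is out of range or the buffer
 * ends mid-sequence. Assumes well-formed UTF-8 (no validation). */
static int32_t utf8_unit_at(const uint8_t *s, size_t len, size_t index)
{
    size_t pos = 0;    /* byte offset into s */
    size_t unit = 0;   /* UTF-16 code units passed so far */

    while (pos < len) {
        uint8_t  b = s[pos];
        uint32_t cp;   /* decoded code point */
        size_t   n;    /* bytes in this sequence */

        if (b < 0x80)      { cp = b;        n = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; n = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; n = 3; }
        else               { cp = b & 0x07; n = 4; }

        if (pos + n > len)
            return -1;                      /* truncated sequence */
        for (size_t k = 1; k < n; k++)
            cp = (cp << 6) | (s[pos + k] & 0x3F);
        pos += n;

        if (cp < 0x10000) {                 /* one UTF-16 unit */
            if (unit == index)
                return (int32_t)cp;
            unit += 1;
        } else {                            /* surrogate pair */
            cp -= 0x10000;
            if (unit == index)
                return (int32_t)(0xD800 + (cp >> 10));
            if (unit + 1 == index)
                return (int32_t)(0xDC00 + (cp & 0x3FF));
            unit += 2;
        }
    }
    return -1;                              /* index past the end */
}
```

Every call starts again from byte 0, so a loop over all indexes of an n-unit string does O(n^2) work; for short literals that is still negligible.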